Query Scheduling in a Data Warehouse
Scheduling of ad hoc queries is a responsibility of the query manager. Simultaneous large ad hoc queries, if not controlled, can severely affect the performance of any system. This is especially true when the queries are run using parallelism, because a single query can potentially consume all the CPU resources made available to it. Control is typically achieved through queuing mechanisms that ensure large jobs do not clash, which means the query manager has to integrate with the schedule manager.
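A minimal sketch of such a queuing mechanism, in Python, is below. Everything in it is hypothetical rather than any product's API: the class `QueryQueue`, its `queue`, `abort`, `requeue`, `dispatch` and `finish` methods, and the rule that at most `max_large` large queries may run at once. Jobs wait in a single priority heap. Lower priority numbers run first, and ties fall back to submission order.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueryJob:
    priority: int                          # lower value = more urgent
    seq: int                               # submission order; breaks priority ties (FIFO)
    query_id: str = field(compare=False)
    is_large: bool = field(compare=False, default=False)


class QueryQueue:
    """Hypothetical queue manager: large queries are throttled so that at
    most `max_large` of them run at the same time."""

    def __init__(self, max_large: int = 1) -> None:
        self.max_large = max_large
        self._waiting: list[QueryJob] = []        # heap ordered by (priority, seq)
        self._running: dict[str, QueryJob] = {}
        self._seq = itertools.count()

    def queue(self, query_id: str, priority: int, is_large: bool = False) -> None:
        heapq.heappush(self._waiting, QueryJob(priority, next(self._seq), query_id, is_large))

    def abort(self, query_id: str) -> QueryJob | None:
        """Remove a job whether it is waiting or running; return it if found."""
        if query_id in self._running:
            return self._running.pop(query_id)
        for i, job in enumerate(self._waiting):
            if job.query_id == query_id:
                self._waiting.pop(i)
                heapq.heapify(self._waiting)      # restore the heap invariant
                return job
        return None

    def requeue(self, query_id: str, priority: int) -> None:
        """Abort a job and put it back in the queue at a new priority."""
        job = self.abort(query_id)
        if job is not None:
            self.queue(job.query_id, priority, job.is_large)

    def finish(self, query_id: str) -> None:
        self._running.pop(query_id, None)

    def dispatch(self) -> list[str]:
        """Start every waiting job that can run without exceeding the large-query limit."""
        started, deferred = [], []
        large_running = sum(j.is_large for j in self._running.values())
        while self._waiting:
            job = heapq.heappop(self._waiting)
            if job.is_large and large_running >= self.max_large:
                deferred.append(job)              # would clash with a running large job
                continue
            large_running += job.is_large
            self._running[job.query_id] = job
            started.append(job.query_id)
        for job in deferred:                      # deferred jobs keep their place in line
            heapq.heappush(self._waiting, job)
        return started


q = QueryQueue(max_large=1)
q.queue("sales_rollup", priority=2, is_large=True)
q.queue("inventory_scan", priority=1, is_large=True)
q.queue("lookup", priority=3)
print(q.dispatch())   # ['inventory_scan', 'lookup']; sales_rollup waits
q.finish("inventory_scan")
print(q.dispatch())   # ['sales_rollup']
```

Small queries are never held back in this sketch, so a deferred large job does not block them. A real query manager would also have to hand started jobs over to the schedule manager and decide what counts as "large"; here that is assumed to be known when the job is submitted.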
The query manager will need to be able to issue commands to queue, abort and requeue query jobs. It will also need to handle multiple queues and queue priorities. One aspect of query control that is conspicuous by its absence is the ability to predict how long a query will take to complete. Some RDBMSs have a prediction facility, but to date these have been notoriously unreliable. Prediction tools are a new field, and the first of them is only just arriving on the market. In the absence of a good predictive tool, the best that can be done is to draw on the experience of similar queries, but doing so requires a large amount of historical query data.
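Estimating run time from the experience of similar queries can be sketched as follows. The notion of "similar" used here is an assumption: two queries count as similar if their SQL text matches once string and numeric literals are replaced by `?` and case and whitespace are normalized (`query_signature`). `RuntimeHistory` and its `min_samples` threshold are likewise hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re
import statistics
from collections import defaultdict


def query_signature(sql: str) -> str:
    """Hypothetical similarity key: SQL text with literals replaced by '?',
    lower-cased, and with runs of whitespace collapsed to single spaces."""
    sig = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals (with '' escapes)
    sig = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sig)    # standalone numeric literals
    return " ".join(sig.lower().split())


class RuntimeHistory:
    """Records past run times per signature and estimates from them."""

    def __init__(self, min_samples: int = 3) -> None:
        self.min_samples = min_samples
        self._runs: dict[str, list[float]] = defaultdict(list)

    def record(self, sql: str, seconds: float) -> None:
        self._runs[query_signature(sql)].append(seconds)

    def estimate(self, sql: str) -> float | None:
        runs = self._runs.get(query_signature(sql), [])
        if len(runs) < self.min_samples:
            return None          # too little history to say anything useful
        return statistics.median(runs)


h = RuntimeHistory(min_samples=3)
for secs in (120.0, 95.0, 410.0):
    h.record("SELECT region, SUM(amount) FROM sales WHERE year = 2023 GROUP BY region", secs)
print(h.estimate("select region, sum(amount) from sales where year = 2024 group by region"))  # 120.0
```

The median is used rather than the mean so that a single unusually slow run, for example one that clashed with another large query, does not dominate the estimate. Returning `None` below `min_samples` reflects the point that this approach is only as good as the historical data behind it.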