Monday, August 1, 2011

Parallel Query Option in Oracle


The Parallel Query Option is a mechanism that allows a large query to be split up (transparently to the end-user) into a number of smaller queries that can all be run simultaneously.

The separate sections of the query are operated by processes that can run virtually independently of each other, so that under suitable conditions the performance of the base query can scale almost linearly with the requested degree of parallelism.

The Parallel Query Option (PQO) should not be confused with the Oracle Parallel Server (OPS) which allows multiple instances of Oracle (usually running on a number of separate machines) to address a single set of database files and uses a high-speed ‘distributed lock manager’ to synchronise the ownership of blocks and locks across the separate SGAs.  Parallel Query Option does work on Oracle Parallel Server, but for the rest of this article I shall assume that we are running only a single instance.

Parallel Query Option: the purpose  :  The Parallel Query Option is based on the premises that: 
  • Disc is very much slower than CPU
  • Seek time is much larger than read time
  • An inefficient job that ‘wastes’ CPU may give better overall response time than an efficient job that leaves the CPU idle most of the time.
As a simple example, this means that a very large tablescan can produce results more quickly than even a good indexed access path if:
  •  the table is split over many discs
  •  different discs can be accessed in parallel with minimal contention
  • there are multiple CPUs in use so that there is no contention for CPU (although PQO can help even on single CPU systems).

  For example : 
   SQL> select count(*) from BIGT  where COLX = 99;

The above statement would be split down internally into a number of separately executed queries of the form:

AQL> select /*+ rowid(A1) PIV_SSF */ count(*)
from “BIGT” A1
where ROWID between :1 and :2
and COLX = 99;

The predicate
ROWID between :1 and :2
combined with the hint:
/*+ rowid (alias) */
is the real key to the operation of the PQO.  The predicate and hint ensure that the query operates only on a selected set of contiguous blocks from the table

Since it is possible to restrict a query to a range of blocks, Oracle need only determine that different blocks can be found on different devices to be able to run multiple copies of the query in parallel in such a way that they don’t interfere with each other;  and this separation of I/O is how a near linear performance gain can be achieved by PQO.

Clearly, the PQO is of most benefit when planning fairly large database systems (since these are the ones likely to have a significant number of discs and CPUs).  As a very rough guideline, I probably wouldn’t ‘design for PQO’ on a database of less than 25 GB. 

Implementing PQO:
There are three main steps to using PQO.  First, make sure that it is linked into our Oracle server (this usually happens by default);  second, set up the init.ora to allow for parallel query slaves and to control the side-effects of parallel query slaves;  third change the definitions of tables, views, or application SQL statements to allow the optimiser to invoke parallelism.

Init.ora Parameters
To use PQO, we need to have a number of ‘floating’ processes available in the system that can be called by our normal shadow process (dedicated server). These processes, called the ‘parallel query slaves’  will have names like ora_p000_{SID}, and we can have up to 256 of them.

The parameter parallel_min_servers specifies the number of such processes that we want permanently running and waiting for calls.  Since such processes take up machine resources we may want to keep this number low for normal running:  if we set  parallel_max_servers  Oracle will then spawn further processes on demand up to this limit.  One odd feature of dynamically spawned processes, however, is that they are owned at the O/S level by the user running the session’s shadow process, and not by any special ‘Oracle id.

To ensure that the dynamically spawned processes die when they are no longer needed, we  can set
parallel_server_idle_time to the number of minutes  a process is allowed to be idle before it dies.

Once we have the capability of running parallel queries, there are two other parameters which we should consider.  These relate to the way the system performance will degrade as the load increases.

Consider the problem of a query that takes 3 minutes to complete when running with a degree of parallelism of 15.  What happens if the database is so busy that no parallel slaves are free when someone starts the query.  Is it acceptable for the query to run serially and take 45 minutes, or should it simply die with a warning message ?

Set parallel_min_percent to a number between 0 and 100, and if Oracle is unable to get that percentage of the demanded degree of parallelism  it returns an error (ORA-12827 insufficient parallel query slaves available).  For example, we want a tablescan at degree 6.  We have parallel_min_percent set to 50.  If the query can run at least at degree 3 with the current number of slaves available it will run, but it there are insufficient slaves available for running at degree 3 the error is raised.

Associated with this parameter is optimizer_percent_parallel which affects the assumptions that the optimiser makes about the degree of parallelism at which a query is expected to run. 

Imagine a query which works best at a high degree of parallelism (20 say) by using hash-joins, but gives an acceptable performance at a lower level of parallelism (5 to 15 say) by switching to nested loops.  If the optimiser decides that the default degree of the query is 20, then it will always generate the hashing path, even when the system is busy and there are not enough free slaves around to run the query at degree 20.  Setting this parameter to a number between 0 and 100, however, tells the optimiser to reduce the assumed degree of parallelism - setting it to 50 in our example would make the optimiser choose the path that was appropriate to degree 10, hence taking the nested loop path.

Just for the record, I have only found a few extreme examples where a change in the value of this parameter has affected the path, and even then the actual performance effect was not hugely significant.  It is worth noting that the option exists though, as there are likely to be some types of application where it can be used to flatten out response times.

Both these parameters can be set at the session level using:

    SQL> alter session set parm = nnn;

which is very convenient both for testing and in production running.

Invoking PQO
After setting our parameters and restarting our database, what makes it happen ?

The first possibility is to give a table an explicit (or implicit) degree of parallelism:

alter table PX parallel (degree 4);
alter table PX parallel (degree default)

Whenever a query involves this table, the optimiser will try to find a path that uses parallel techniques on the table. 
Two points that are specific to 7.3.2:
First, the Cost Based Optimiser will always take over in queries where there are parallel tables available even if the current optimiser goal is Rule.  Second, the default keyword translates into ‘the smaller of the number of CPUs and the number of devices across which the table appears to be spread’. 

The second possibility for introducing parallelism into a query is to put an explicit hint into it:

SQL> select /*+ parallel (px, 4)*/
count(*) from PX;

SQL> select /*+ parallel (px, default) */
count(*) from PX;

The third option is to define views which contain hints, so that application code does not have to worry about degrees of parallelism,

create or replace view big_px as
SQL> select
/*+
    parallel (px1, 4)
    parallel (px2, 6)
*/
from px1, px2
where
px1.id = px2.x_id ;

An interesting, and possibly irritating, feature of the view definition above is the question of what happens when there are several parallel tables in a query.

In general it is not possible to predict the degree of parallelism that such a query will use, as it depends on the usage, the driving table, and the join paths used.  In the view above I could only say that, if no other parallel tables are introduced, the final query will run with degree 1, 4 or 6.

How Many Slaves:
So far in this article I have repeatedly used the expression ‘degree of parallelism’ and avoided saying anything explicitly about the number of parallel slave processes involved in a parallel query.

It is now time to come clean, and admit that the degree and the number of slaves are not always the same.  Specifically, a query of degree N may take 2N slaves.  Basically this is to handle sorting. 

Consider a query like:

SQL> select /*+ parallel (fa,3) */
prod_id, count(*)
from fa
group by prod_id ;

When the PQO cuts in, the degree (3)  specifies the number of slaves to be used to scan the table.   By looking at the other and other_tag columns of the 7.3 Explain Plan utility, it is possible to determine that each slave will perform the required aggregation on each subset of the table that it is required to scan.  The partial results are then shared out to another set of 3 slaves using some form of hashing algorithm to achieve a complete and even share.  Each slave in this second set sorts and aggregates the results it has received before forwarding them to the normal shadow process (which is given the title ‘query co-ordinator’.  

There are four things we can infer :

1) If we fire off a query that involves sorting, then half the slaves will be I/O intensive as they read the disc, and the others will be CPU intensive in brief bursts and quiet the rest of the time.

2) There will be large demands on memory as every slave will require its own sort_area_size, as will the normal shadow.

3) There must be some memory used up in the ‘n-squared’ lines of communication going on between the two layers of slaves:  this is actually in the SGA, and a guideline I have seen suggests an addition to the SGA of:
 parallel_max_servers * (largest degree used + 2) * 3K

4) If we have a query that returns a lot of rows, then the ultimate bottleneck on response may be the link between the normal shadow and the front-end, which is often the only serialised part of the process.  This is a good reason for not using PQO through the multi-threaded server.

One interesting little point that is not obvious is that we cannot rely on a group by to do an implicit order by.  In the example, slave p000 may return results about prod_ids 1,4,7,11 whilst slave p001 returns results about prod_ids 2,5,8,12:  unless the query does an explicit order, the result set returned to SQL*Plus will be in the order 1,4,7,11,2,5,8,12. etc

Conclusion:
The parallel query option is in principle a very simple concept which in practice means reviewing the assumptions we may have made about optimising Oracle over the last few years since it relies very much on a huge amount of table-scanning.

The main benefit of the PQO is in large databases, where a significant number of disks is available for spreading I/O;  but behind this it also requires lots of memory and multiple CPUs.


Enjoy  :-) 

No comments: