Stat Wars – using the Force (DBMS_STATS) for Incremental Stats Gathering

We haven’t been queuing for days to see that film.
Darth Debbie was quite insistent that she really did have better things to do than
queue up outside a cinema behind someone dressed as a Wookie.
Even the potential opportunity of uttering the line “will someone get this walking carpet out of my way ?!”, has not moved her on this point.
All things Star Wars are not to be mentioned in our house at the moment. So, any resemblance to a certain Space Opera in what follows is purely coincidental.

Anyway, a Long Time Ago in a Database far, far away….

It wasn’t easy being a DBA on the Jedi Archives Database.
Strong with the Force they may have been, but the users weren’t particularly patient.
On top of that, there was still some muttering going on about that unfortunate data loss incident with Obi Wan Kenobi and the missing planetary system.

All in all then, when complaints began coming in about the nightly batch overrunning, it was with a sense of dread that the Geeki began to investigate….

It was one of those batch jobs in which data was loaded into a table partition on the Oracle 11g R2 Database via a standard ETL process.
The table was partitioned daily so each load was into a new table partition.
That data was then used immediately by a subsequent report.

This required a call to DBMS_STATS to ensure that the Optimizer stats are up to date.
The problem was that it was taking several hours to gather stats, despite the partition name and the degree of parallelism being specified in the call.

It was at this point that our hard-pressed DBA recalled the wise words of his master – “when all else fails, read the manual you must!”

Incremental Stats Gathering – what the Manual saya

Whilst it’s fair to say that there were one or two issues with Incremental Stats when they were first introduced in 11g, these kinks are ironed out by version 11.2.03.

To start with then, this is what the Oracle Documentation has to say on when to gather Manual Statistics :

“For tables that are bulk-loaded, run the statistics-gathering procedures on the tables immediately following the load process.
Preferably, run the procedures as part of the same script or job that is running the bulk load.”

This confirms that the stats gathering step in the batch was appropriate and necessary.

There was also something quite interesting about the option of gathering Incremental Stats :

“An alternative to mandatory full table scans is gathering incremental statistics. When the following criteria are met, the database updates global statistics incrementally by scanning only the partitions that have changed:

  • The INCREMENTAL value for the partitioned table is true.
  • The PUBLISH value for the partitioned table is true.
  • The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table.

“Gathering table statistics incrementally has the following consequences:

  • The SYSAUX tablespace consumes additional space to maintain global statistics for partitioned tables.
  • If a table uses composite partitioning, then the database only gathers statistics for modified subpartitions. The database does not gather statistics at the subpartition level for unmodified subpartitions. In this way, the database reduces work by skipping unmodified partitions.
  • If a table uses incremental statistics, and if this table has a locally partitioned index, then the database gathers index statistics at the global level and for modified (not unmodified) index partitions. The database does not generate global index statistics from the partition-level index statistics. Rather, the database gathers global index statistics by performing a full index scan.”

In summary, if you specify the appropriate parameters, Oracle will work out what stats to gather and not waste time refreshing statistics on data that has not changed.

Making sure that the target table has INCREMENTAL and PUBLISH set to TRUE is fairly straightforward and can be achieved as simply as :

dbms_stats.set_table_prefs(my_table_owner, my_table, 'INCREMENTAL', 'TRUE');
dbms_stats.set_table_prefs(my_table_owner, my_table, 'PUBLISH', 'TRUE');

… where my_table_owner is the owner and my_table is the name of the table you want to gather stats on.

That rest should be simple enough for our Jedi DBA. After all, the Recommended parameter values for DBMS_STATS to perform Incremental Stats Gathering should be the defaults, right ?
I’m sure someone once said something along the lines of “beware the quick and easy path…”

The Parameter values for DBMS_STATS

The fact is that DBMS_STATS.GATHER_TABLE_STATS has a number of parameters, not all of which are mandatory. However, the defaults used for some of them are not necessarily what is required for Incremental Stats Gathering to take place.

You can find the default values that this procedure uses here.

The recommended settings to enable incremental stats gathering are subtly different :

Parameter Name Recommended Value
estimate_percent DBMS_STATS.AUTO_SAMPLE_SIZE
method_opt FOR ALL COLUMNS SIZE AUTO
degree DBMS_STATS.AUTO_DEGREE
granularity AUTO
cascade DBMS_STATS.AUTO_CASCADE
no_invalidate DBMS_STATS.AUTO_INVALIDATE

It may be that most of the default values match up to those recommended. However this is at least one parameter that definitely doesn’t have the same default value as that recommended.

The DEGREE parameter, which determines the degree of parallelism to be used in the Stats Gathering operation is defaulted to NULL.
This may seem strange at first, until you realise that this is because Parallel Stats Gathering is only available in the Enterprise Edition of the Oracle 11g R2 database.

If your running a Data Warehouse application, you will almost certainly be on Enterprise Edition.
If you have partitioned tables, Enterprise Edition is a pre-requisite as partitioning is an option for EE.
If you want to make sure, you can always check by running :

select banner
from v$version
where banner like 'Oracle Database %'
/

The output should be something like :

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production.

Assuming this is the case, you should be able to specify the recommended DBMS_STATS.AUTO_DEGREE as the value for the DEGREE parameter without fear of any licensing reprucussions later on.

One other point for consideration is whether or not you want to collect Histograms.
It’s not unheard in Data Warehouses for bind variable peeking to have been disabled. If your database is set up this way then the optimizer will never look at histograms. Therefore, it seems a bit pointless to collect and store them.

If you want to know if this is the case, you can find out easily enough….

select value
from v$parameter
where name = '_optim_peek_user_binds'
/

If this query returns ‘TRUE’, or does not return any rows, then bind variable peeking is enabled and histograms are required. Otherwise, it isn’t and Histograms will not be used by the Optimizer.

Incremental Stats – the code

Now that we know what the optimal parameter values are likely to be, we can knock up something like this to gather stats on our partitioned tables :

create or replace package incremental_stats
as
	procedure gather_table_stats
	(
		i_owner in user_users.username%type default null,
		i_table in user_tables.table_name%type
	);
end incremental_stats;
/

create or replace package body incremental_stats
as
    -- Private 
	function bind_peeking_enabled_fn
		return boolean
	is
		l_value v$parameter.value%type;
	begin
		select value
		into l_value
		from v$parameter
		where name = '_optim_peek_user_binds';

		if l_value = 'TRUE' then
			return true;
		else
			return false;
		end if;
	exception when no_data_found then
	    -- parameter not set...
		return true;
	end bind_peeking_enabled_fn;
	
	procedure gather_table_stats
	(
		i_owner in user_users.username%type default null,
		i_table in user_tables.table_name%type
	)
	is
	--
	-- Gather table stats using Oracle recommended settings (as at 11G R2).
	-- See http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#i13546
	-- The one exception to this is opt_method - if we're not using bind-variable peeking at the moment
	-- there's no point in collectiong histograms. 
	--
			
		l_method_opt varchar2(30);
	begin
		
		-- Check to see if bind_variable peeking is enabled
		if bind_peeking_enabled_fn then
			l_method_opt := 'FOR ALL COLUMNS SIZE AUTO';
		else
			l_method_opt := 'FOR ALL COLUMNS SIZE 1';
		end if;
		
		-- Ensure that table prefs are set to facilitate incremental stats gathering for partitions
		dbms_stats.set_table_prefs(i_owner, i_table, 'INCREMENTAL', 'TRUE');
		dbms_stats.set_table_prefs(i_owner, i_table, 'PUBLISH', 'TRUE');
		
		--
		-- NOTE - apart from the parameters specified here, all other parameters
		-- for DBMS_STATS.GATHER_TABLE_STATS derive their default value from the existing preferences.
		-- These can be set by a call to DBMS_STATS.SET_TABLE_PREFS. 
		--
		dbms_stats.gather_table_stats
		(
			ownname => i_owner,
			tabname => i_table,
			degree => dbms_stats.auto_degree,
			method_opt => l_method_opt
		);
	end gather_table_stats;
	
end incremental_stats;
/

All we need now is a test case to show that table stats are only gathered on the partition(s) that have changed.

A Simple Test

To start with, we need a partitioned table….

create table star_wars_characters
(
    film_name varchar2(100) not null,
    character_name varchar2(100)
)
partition by list(film_name)
(
    partition originals values ( 'A NEW HOPE', 'THE EMPIRE STRIKES BACK', 'RETURN OF THE JEDI'),
    partition prequels values ('THE PHANTOM MENACE', 'ATTACK OF THE CLONES', 'REVENGE OF THE SITH'),
    partition sequels values ('THE FORCE AWAKENS')
)
/

…with some data already in it…

-- Phantom Menace

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'ANAKIN SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'DARTH MAUL')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'PADME AMIDALLA')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'SENATOR PLAPATINE')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'YODA')
/

insert into star_wars_characters( film_name, character_name)
values('THE PHANTOM MENACE', 'MACE WINDU')
/

-- Attack of the Clones

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'ANAKIN SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'DARTH MAUL')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'PADME AMIDALLA')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'SENATOR PLAPATINE')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'YODA')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'MACE WINDU')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'COUNT DOOKU')
/

insert into star_wars_characters( film_name, character_name)
values('ATTACK OF THE CLONES', 'JANGO FETT')
/

-- Revenge of the Sith

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'ANAKIN SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'DARTH MAUL')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'PADME AMIDALLA')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'SENATOR PLAPATINE')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'YODA')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'MACE WINDU')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'COUNT DOOKU')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'JANGO FETT')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'ANAKIN SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'DARTH MAUL')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'PADME AMIDALLA')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'SENATOR PLAPATINE')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'YODA')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'MACE WINDU')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'COUNT DOOKU')
/

insert into star_wars_characters( film_name, character_name)
values('REVENGE OF THE SITH', 'SENATOR ORGANA')
/

-- A New Hope
insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'LUKE SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'DARTH VADER')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'HAN SOLO')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'PRINCESS LEIA')
/

insert into star_wars_characters( film_name, character_name)
values('A NEW HOPE', 'GRAND MOFF TARKIN')
/

-- Empire Strikes Back
insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'LUKE SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'DARTH VADER')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'HAN SOLO')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'PRINCESS LEIA')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'GRAND MOFF TARKIN')
/

insert into star_wars_characters( film_name, character_name)
values('THE EMPIRE STRIKES BACK', 'LANDO CALRISSIAN')
/

-- Return of the Jedi
insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'LUKE SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'DARTH VADER')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'OBI WAN KENOBI')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'R2-D2')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'C3PO')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'HAN SOLO')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'PRINCESS LEIA')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'GRAND MOFF TARKIN')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'LANDO CALRISSIAN')
/

insert into star_wars_characters( film_name, character_name)
values('RETURN OF THE JEDI', 'ADMIRAL ACKBAR')
/ 

-- Force Awakens

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'KYLO RENN')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'FINN')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'REY')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'BB8')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'HAN SOLO')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'PRINCESS LEIA')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'LUKE SKYWALKER')
/

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'CHEWBACCA')
/

commit;

…and gather stats.

begin
    dbms_stats.gather_table_stats('MIKE', 'STAR_WARS_CHARACTERS');
end;
/

We can see when the stats were last gathered on each partition…

select partition_name, to_char(last_analyzed, 'DD-MON-YYYY HH24:MI')
from user_tab_partitions
where table_name = 'STAR_WARS_CHARACTERS'
/

PARTITION_NAME	     TO_CHAR(LAST_ANALYZED,'DD-
-------------------- --------------------------
ORIGINALS	     19-DEC-2015 21:36
PREQUELS	     19-DEC-2015 21:36
SEQUELS 	     19-DEC-2015 21:36

Now, if we insert data into a single parition…

insert into star_wars_characters( film_name, character_name)
values('THE FORCE AWAKENS', 'CAPTAIN PHASMA')
/

commit;

… and run our stats gathering package…

begin
    incremental_stats.gather_table_stats('MIKE', 'STAR_WARS_CHARACTERS');
end;
/

… we can see that stats have only been gathered on the partitions that have changed…

select partition_name, to_char(last_analyzed, 'DD-MON-YYYY HH24:MI')
from user_tab_partitions
where table_name = 'STAR_WARS_CHARACTERS'
/

PARTITION_NAME	     TO_CHAR(LAST_ANALYZED,'DD-
-------------------- --------------------------
ORIGINALS	     19-DEC-2015 21:36
PREQUELS	     19-DEC-2015 21:36
SEQUELS 	     19-DEC-2015 21:44

So, is this the panacea for all of your Stats Gathering woes ?
The Geeki reckon that they’ve observed some significant improvements in Stats gathering execution times. One example was where a job went from running in over 4 hours to under 18 minutes.
They’ve also noticed that there is less of a tendency for batch jobs to serialize because some stats job is hogging all of the parallel servers on the database.

That said, you need to remember that this is a Galaxy far, far away and there’s no knowing how the behaviour of Midi-Chlorians may affect runtime.
The best thing to do would be to test it on your application with your partitioned tables ( which will probably be a bit bigger than the one in my test case).

Planning our trip to finally see the film has hit a bit of a snag. When I suggested that we might go in fancy dress, Deb told me that she’d rather kiss a Wookie.
Sometimes I suspect that, despite her protestations to the contrary, she’s more of a fan than she lets on.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s