Looking after an overnight batch process can be a bit fraught at times.
If it’s a good day, you might begin the morning with a nice cup of coffee and a leisurely scroll through the logs to confirm that all is well.
In contrast, if the batch has overrun you may well find yourself scrambling through those same logs whilst gulping down large quantities of caffeine in a desperate attempt to hot-wire your brain into working out the elapsed time between each entry. Not great. Especially when you consider that, as Terry Pratchett put it,
“Coffee is a way of stealing time that should by rights belong to your older self”.
A better approach might be to get Oracle to do it for you.
What we’re going to look at here is :
- the INTERVAL data type that holds the difference between two TIMESTAMP values
- using the LAG SQL function to report the interval time between timestamps across rows of a query
- creating some graphs using SQLDeveloper User Defined Reports for when it’s just too early for words and numbers.
The Batch Log table
Running on Oracle 18cXE, our overnight batch logs to the BATCH_LOG table :
create table batch_log (
    id number generated always as identity,
    log_ts timestamp not null,
    job_name varchar2(50),
    status varchar2(25))
/
I’ve used the following script to populate the table with some test data :
set serverout on size unlimited
clear screen

declare
    tbl_jobs dbms_utility.name_array;
    tbl_statuses dbms_utility.name_array;

    start_ts timestamp := to_timestamp('16-MAR-2021 01:00', 'DD-MON-YYYY HH24:MI');
    v_ts timestamp;
    v_duration pls_integer;
begin
    tbl_jobs(1) := 'EXTRACT';
    tbl_jobs(2) := 'TRANSFORM';
    tbl_jobs(3) := 'LOAD';
    tbl_jobs(4) := 'REPORT';

    tbl_statuses(1) := 'STARTED';
    tbl_statuses(2) := 'COMPLETED';

    for i in 1..7 loop
        v_ts := start_ts;
        for j in 1..tbl_jobs.count loop
            v_duration :=
                case mod(j, 4)
                    when 3 then trunc(dbms_random.value(8,11))
                    when 2 then trunc(dbms_random.value(30,41))
                    when 1 then trunc(dbms_random.value(25,31))
                    else trunc(dbms_random.value(15,21))
                end;
            for k in 1..tbl_statuses.count loop
                insert into batch_log( log_ts, job_name, status)
                values(
                    case
                        when tbl_statuses(k) = 'STARTED' then v_ts
                        else v_ts + (v_duration/1440)
                    end,
                    tbl_jobs(j),
                    tbl_statuses(k));
            end loop;
            v_ts := v_ts + (v_duration/1440);
        end loop;
        start_ts := start_ts + interval '1' day;
    end loop;
end;
/

-- Now simulate a long running Load job
update batch_log
set log_ts = log_ts + (60/1440)
where id >= (
    select max(id)
    from batch_log
    where job_name = 'LOAD'
    and status = 'COMPLETED');

commit;
As a result, the table now has a week’s worth of log entries.
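If you want a quick sanity check before going any further, something like this (my own ad-hoc query, not part of the setup script) should confirm that each run date has a STARTED and COMPLETED entry for each of the four jobs :

```sql
select trunc(log_ts) as run_date, count(*) as log_entries
from batch_log
group by trunc(log_ts)
order by 1;
```

With the test data above, we'd expect seven run dates with eight entries apiece.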
The INTERVAL data type
You can find lots of information about Oracle’s DATETIME and INTERVAL data types in the documentation. Here’s the 18c version, for example (all of the code here is written and tested on 18cXE).
You will see from this that we have two distinct INTERVAL data types – Year to Month and Day to Second.
It’s the latter of these which is relevant in the current context, unless you’re having extreme performance issues on your overnight batch.
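As a quick illustration (using made-up timestamps rather than our batch log), subtracting one TIMESTAMP from another yields an INTERVAL DAY TO SECOND, and you can also write interval literals directly :

```sql
-- Timestamp subtraction returns an INTERVAL DAY TO SECOND
select timestamp '2021-03-16 02:30:00' - timestamp '2021-03-16 01:00:00' as elapsed
from dual;

-- ...and an interval literal can be added to a timestamp
select timestamp '2021-03-16 01:00:00' + interval '0 01:30:00' day to second as finish_ts
from dual;
```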
If you have any scheduler jobs running, you can see an example of the INTERVAL data type in action in the _scheduler_jobs views :
select owner, job_name, last_run_duration
from dba_scheduler_jobs
where run_count > 0;
OWNER           JOB_NAME                                 LAST_RUN_DURATION
--------------- ---------------------------------------- -------------------
SYS             PURGE_LOG                                +00 00:00:00.924991
SYS             PMO_DEFERRED_GIDX_MAINT_JOB              +00 00:00:00.117572
SYS             CLEANUP_NON_EXIST_OBJ                    +00 00:00:00.221356
SYS             CLEANUP_ONLINE_IND_BUILD                 +00 00:00:00.382708
SYS             CLEANUP_TAB_IOT_PMO                      +00 00:00:00.129863
SYS             CLEANUP_TRANSIENT_TYPE                   +00 00:00:00.071285
SYS             CLEANUP_TRANSIENT_PKG                    +00 00:00:00.094254
SYS             CLEANUP_ONLINE_PMO                       +00 00:00:00.024660
SYS             FILE_SIZE_UPD                            +00 00:00:00.166189
SYS             ORA$AUTOTASK_CLEAN                       +00 00:00:00.151000
SYS             DRA_REEVALUATE_OPEN_FAILURES             +00 00:00:00.387121
SYS             BSLN_MAINTAIN_STATS_JOB                  +00 00:00:00.443146
SYS             RSE$CLEAN_RECOVERABLE_SCRIPT             +00 00:00:00.324392
SYS             SM$CLEAN_AUTO_SPLIT_MERGE                +00 00:00:00.023483
APEX_180200     ORACLE_APEX_PURGE_SESSIONS               +00 00:00:00.868204
APEX_180200     ORACLE_APEX_MAIL_QUEUE                   +00 00:00:00.026859
APEX_180200     ORACLE_APEX_WS_NOTIFICATIONS             +00 00:00:01.685787
APEX_180200     ORACLE_APEX_DAILY_MAINTENANCE            +00 00:00:00.943306
ORDS_METADATA   CLEAN_OLD_ORDS_SESSIONS                  +00 00:00:00.640826

19 rows selected.
According to the documentation, INTERVALs are not subject to a format model in the same way that dates and timestamps are. However, you can re-format their contents using the EXTRACT function.
For example, if we want to convert the LAST_RUN_DURATION from our previous query into seconds, then we can run :
select owner, job_name,
    (extract( minute from last_run_duration) * 60) +
    extract( second from last_run_duration) as last_run_secs
from dba_scheduler_jobs
where run_count > 0
order by 3 desc
/
OWNER           JOB_NAME                                 LAST_RUN_SECS
--------------- ---------------------------------------- -------------
APEX_180200     ORACLE_APEX_WS_NOTIFICATIONS                  1.685787
APEX_180200     ORACLE_APEX_DAILY_MAINTENANCE                  .943306
SYS             PURGE_LOG                                      .924991
APEX_180200     ORACLE_APEX_PURGE_SESSIONS                     .868204
ORDS_METADATA   CLEAN_OLD_ORDS_SESSIONS                        .640826
SYS             BSLN_MAINTAIN_STATS_JOB                        .443146
SYS             DRA_REEVALUATE_OPEN_FAILURES                   .387121
SYS             CLEANUP_ONLINE_IND_BUILD                       .382708
SYS             RSE$CLEAN_RECOVERABLE_SCRIPT                   .324392
SYS             CLEANUP_NON_EXIST_OBJ                          .221356
SYS             FILE_SIZE_UPD                                  .166189
SYS             ORA$AUTOTASK_CLEAN                             .151
SYS             CLEANUP_TAB_IOT_PMO                            .129863
SYS             PMO_DEFERRED_GIDX_MAINT_JOB                    .117572
SYS             CLEANUP_TRANSIENT_PKG                          .094254
SYS             CLEANUP_TRANSIENT_TYPE                         .071285
APEX_180200     ORACLE_APEX_MAIL_QUEUE                         .026859
SYS             CLEANUP_ONLINE_PMO                             .02466
SYS             SM$CLEAN_AUTO_SPLIT_MERGE                      .023483

19 rows selected.
What we need now is a way to calculate the durations between rows in a single query…
The LAG function
All we need to do is use the LAG function to find the elapsed time between the timestamp of the current record and that of its predecessor in the result set :
select trunc(log_ts) as run_date, job_name, status, log_ts,
    case when status = 'COMPLETED' then
        log_ts - lag( log_ts) over( order by log_ts)
    end as duration,
    case when status = 'COMPLETED' then
        extract( hour from log_ts - lag(log_ts) over (order by log_ts)) * 60 +
        extract( minute from log_ts - lag(log_ts) over (order by log_ts))
    end as duration_mins
from batch_log
order by log_ts
/
We could use what we have so far as the basis for an ad-hoc script to give us information about batch runtimes. Then again, if it’s one of those mornings where you’re struggling a bit, looking at a picture might make things a bit easier. That last listing was from SQLDeveloper, which is handy considering where we’re going next …
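For instance, an ad-hoc summary of this kind (a sketch of my own, not a definitive script) might report the average and worst runtime per job across the week :

```sql
with durations as (
    select job_name,
        case when status = 'COMPLETED' then
            extract( hour from log_ts - lag(log_ts) over( order by log_ts)) * 60 +
            extract( minute from log_ts - lag(log_ts) over( order by log_ts))
        end as duration_mins
    from batch_log)
select job_name,
    round(avg(duration_mins), 1) as avg_mins,
    max(duration_mins) as max_mins
from durations
where duration_mins is not null
group by job_name
order by 3 desc;
```

With the test data above, the LOAD job's simulated overrun should stand out in the MAX_MINS column.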
SQLDeveloper User Defined Chart Reports – The Line Chart
We can see the total runtime for the batch for each day using the query :
with batch_runs as (
    select job_name, status, log_ts,
        case status when 'COMPLETED' then
            (extract( hour from log_ts - lag( log_ts, 1) over( order by log_ts)) * 60) +
            extract(minute from log_ts - lag(log_ts, 1) over( order by log_ts))
        end as duration_mins
    from batch_log
    where (
        (job_name = 'EXTRACT' and status = 'STARTED')
        or
        (job_name = 'REPORT' and status = 'COMPLETED'))
    order by log_ts)
select trunc(log_ts) as run_date, duration_mins
from batch_runs
where duration_mins is not null
order by 1
/
RUN_DATE    DURATION_MINS
----------- -------------
16-MAR-2021            88
17-MAR-2021            83
18-MAR-2021            90
19-MAR-2021            84
20-MAR-2021            94
21-MAR-2021            91
22-MAR-2021           155

7 rows selected.
Jeff Smith has a nifty way of turning a SQLDeveloper query grid into a chart report, which I’ve largely plagiarised below.
First, we run the query in a SQLDeveloper worksheet. Then, in the result grid, right-click and select Save Grid as Report :
This will take you to the Create Report Window :
As you can see, SQLDeveloper has reformatted the query a bit.
Now we can add a Name for the report.
Before we go any further, it’s probably a good idea to specify a connection. This will help us when we’re configuring the report as we’ll be able to use live data to see how things look.
The other thing we need to do is to specify a series name so that we can produce a line graph.
This means that the report query now looks like this :
SELECT "RUN_DATE", "DURATION_MINS", 'Overnight Runs' as series_name
FROM(
with batch_runs as (
    select job_name, status, log_ts,
        case status when 'COMPLETED' then
            (extract( hour from log_ts - lag( log_ts, 1) over( order by log_ts)) * 60) +
            extract(minute from log_ts - lag(log_ts, 1) over( order by log_ts))
        end as duration_mins
    from batch_log
    where (
        (job_name = 'EXTRACT' and status = 'STARTED')
        or
        (job_name = 'REPORT' and status = 'COMPLETED'))
    order by log_ts)
select trunc(log_ts) as run_date, duration_mins
from batch_runs
where duration_mins is not null
order by 1
)
…and with a connection specified, I can check everything is still working as expected by clicking the Test Report button…
…which provides the expected output.
Now I know that my query still works, I can change the report Style from Table to Chart :
Now we need to click on the Property node of the navigation tree in the left-hand pane of the Create Report Window.
Here, we change the Chart Type to Line :
Next, we need to specify the Data for the report, so we click on the Data tree node.
As we’re connected to a database, we can Fetch Column Names so we can select the Group, Series and Value columns from a drop-down.
Now we can check the Use Live Data checkbox and click Refresh Report to see what the output will look like :
Finally, click the Apply button in the bottom right of the Create Report window and the new report will appear in the User Defined Reports node of the main Reports pane :
Now we can run the report in the usual way…
If we want a more granular view of what’s happening during the batch run, but we’re not quite ready to deal with actual text, we can try using…
A Stacked Bar Chart
This time, we want to know the elapsed time for each process in the batch. Therefore, we’ll use this query as the basis for our report :
select trunc(log_ts) as run_date, job_name, status, log_ts,
    case when status = 'COMPLETED' then
        log_ts - lag( log_ts) over( order by log_ts)
    end as duration,
    case when status = 'COMPLETED' then
        extract( hour from log_ts - lag(log_ts) over (order by log_ts)) * 60 +
        extract( minute from log_ts - lag(log_ts) over (order by log_ts))
    end as duration_mins
from batch_log
order by log_ts
The Report creation is as before, except that this time we choose a Chart Type of Bar – Vertical Stack.
Additionally, in order to view the total time for each day, we go to the Plot Area Property and check the Cumulative box in the Data Tips section :
Running this new report, we can see that the increase in duration is due to the Load process taking rather longer than normal :
Of course, you’ll need to investigate further to find out exactly what’s going on but, thanks to some handy SQL features and a couple of SQLDeveloper Chart Reports, at least you know where to start looking.