Cumulative Total in Oracle SQL – Cricket, but not as we know it

If you spend you working life playing with databases, sooner or later, you’ll come across a situation where you want to have a running total of something or other ( usually money) included in the result set of a query.
For many years, this required quite a bit of messing about. Such techniques a covered nicely in this article by Kevin Meade.

Also mentioned by Kevin, and what I’m covering here, is the method of achieving this without too much fuss, using Oracle’s Analytic Functions. Yes, the syntax is a bit arcane on the face of it, but bear with me here.

But before I get into that, I need some data.
For this, I’m going to wander down memory lane to my days of cricket played on rather “sporting” wickets in local parks and occasionally ( if we were playing away at a really posh club), the Village Green.

Let’s walk through a typical over from my very part-time and very brief bowling career from a database perspective.
The commentary will be in the style of those ball-by-ball text commentaries you can get on the internet

First of all, let’s have a look at the table :

    over_no NUMBER(2),
    ball_no NUMBER(2),
    runs NUMBER(2),
    wkt VARCHAR2(1) DEFAULT 'N')

Now to collect some data….

With Smithers standing at the top of his run-up several questions are about to be answered. “Are they short of bowlers this week then ?” being the most obvious.
On the plus side, I think we can now answer the question : “Who ate all the pies ?”
Helmets have been donned in anticipation…by the people playing tennis on the courts beyond the mid-wicket boundary.

First ball…over-pitched and drilled through extra-cover for four. No wonder his team-mates claim that the safest place to field when he’s bowling is behind several inches of armour plating…

INSERT INTO overs( over_no, ball_no, runs)
VALUES( 1, 1, 4)

Next ball, overcompensates. The resulting long-hop is still rising as it disappears over the anxious looking tennis players. Hmmm, that looks less like a pull-shot than a drive off the first tee…

INSERT INTO overs( over_no, ball_no, runs)
VALUES( 1, 2, 6)

Eventful over so far…but at least the runs have temporarily dried up – drilled straight to mid-off, who can’t get out of the way in time. No run

INSERT INTO overs( over_no, ball_no, runs, wides)
VALUES( 1, 3, 0)

Better still, next delivery batsmen is surprised by the variation ball…a straight one on the stumps and is BOWLED…

INSERT INTO overs( over_no, ball_no, runs, wkt)
VALUES( 1, 4, 0, 'Y')

Classing Mike Smithers wicket – playing on the batsman’s pity. Next delivery to the new batsman, normal service is resumed. Some nifty fielding on the boundary keeps them down to 3….

INSERT INTO overs( over_no, ball_no, runs)
VALUES( 1, 5, 3)

And finally…defended into the off-side for an unusual dot-ball…a sigh of relief from fielders and tennis players alike as the over comes to an end…

INSERT INTO overs( over_no, ball_no, runs)
VALUES( 1, 6, 0)

For the over-worked scorer, it would be really handy to have some way of keeping track of the cumulative total after each ball ( how many runs for how many wickets).
This is where the Analytic Function stuff comes in…

SELECT runs, wkt,
    SUM( runs) OVER ( PARTITION BY over_no order by ball_no)
        ||' FOR '
        || SUM(CASE wkt WHEN 'Y' THEN 1 ELSE 0 END) 
            OVER ( PARTITION BY over_no ORDER BY ball_no) as Total
FROM overs
WHERE over_no = 1
ORDER BY ball_no

The output is :

---------- --- --------------------
	 4 N   4 FOR 0
	 6 N   10 FOR 0
	 0 N   10 FOR 0
	 0 Y   10 FOR 1
	 3 N   13 FOR 1
	 0 N   13 FOR 1

6 rows selected.

The syntax could probably do with some explaining.
First of all, we’re getting the sum of the column…but not for the entire result set – just for the result set so far. We need to specify over what subset of  rows ( in this case, the over_no column). If the over_no value changes, then Oracle will break and start calculating again for the new over_no value. Yes, I realise that this would mean I’d have to re-work this example, but I didn’t often get asked to bowl a second over after the events of my first !
We also need to make sure that we’re getting our running total in the correct order – hence the order by.)
To put it another way :
SUM – Get me a running total of this
OVER…PARTITION BY – within this subset of data
ORDER BY…calculate in this order

A couple of points to note – the Analytic Function is sorting the rows in the result set using a combination of the column you’re partitioning over and the column(s) you’re ordering by. It’s probably a good idea therefore, to make sure that this combination of columns will uniquely identify a row in the result set.
The same really applies to the order by clause. You want to make sure you can identify a single row within the subset defined by your partition columns.
Right, off to play with someone else’s bowling figures ( I don’t want to complicate things with lots of big numbers !)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.