Oracle Float to Integer fun and Sachin Tendulkar

I recently spent some time working with Venkata, an Oracle programmer and keen cricket fan :

Venkata smiling the smile of a man whose team have won the cricket world cup twice more than...er...Wales.

Venkata smiling the smile of a man whose team have won the cricket world cup twice more than…er…Wales.

This post is about the fun and games involved in the bulk loading of data into Oracle – especially when it includes floating point values.
In Venkata’s honour, I’m going to explore this topic through the medium of the career of Sachin Tendulkar…

At this point it’s probably worth wandering off the point to explore some of the highlights of The Little Master’s career.

India’s tour of England in 1996 has been largely forgotten. However, it proved to be a watershed in our hero’s test career.
The first test of the series at Edgebaston was not untypical of the time – Indian batsmen skittled in conditions totally alien to them. The could only muster a paltry 219 in their second innings and none of them got beyond 18…apart from Tendulkar, ninth out for 122, made with a serenity that contrasted starkly with the chaos surrounding him.
It was in the very next match, at Lords, that India granted test debuts to two batsmen who themselves would prove quite useful over the years. Surav Ganguly marked the occasion with a century. Rahul Dravid fell an agonising five runs short of doing the same.
The fourth member of India’s vaunted middle-order announced himself in the next series we look at, against Australia.
Having been soundly beaten in the first test of the series, India came back the hard way in the second test.
VVS Laxmans 281 not out, with the not inconsiderable assistance of Dravid (180) and Harbhajan (13-196) became only the third team in the history of Test Cricket to win after following on. This against arguably the best team to have ever played the game who had their record winning streak of 16 consecutive tests ended somewhat emphatically.
The 2004/05 series against Bangladesh is included because Tendulkar’s highest test score (248 not out) was made in this series.
The 2009/10 series against South Africa is indicative of the journey that India made during the course of Tendulkar’s career.
At this time they were vying with South Africa for the status of the world’s leading test nation.

Back to the techie stuff. What I’m going to cover is :

  • What happens when you insert a floating point value into an INTEGER column
  • How you can overcome this using SQL*Loader
  • An alternative approach using external tables

The target table

The target table looks like this :

CREATE TABLE sachin_tests(
    season VARCHAR2(7),
    opponents VARCHAR2(50),
    venue VARCHAR2(4),
    matches INTEGER,
    runs INTEGER,
    high_score VARCHAR2(4),
    average FLOAT)
/

The data we’re inserting into the table is a selection of Tendulkar’s performance in the test series I’ve already mentioned.
As this table is populated via a data load, this data is in the form of a comma delimited file :

season, opponents, venue, matches, runs, high_score, average,
1996,ENGLAND,AWAY,3,428,177,85.60,
2000/01,AUSTRALIA,HOME,3,304,126,50.66,
2004/05,BANGLADESH,AWAY,2,284,248*,284.00,
2009/10,SOUTH AFRICA,HOME,2,213,106,71.00,

The limitations of datatype validation

As you’d expect, the structure of this table does serve to provide some elementary validation.
If we issue a statement like this …

INSERT INTO sachin_tests( 
    season, 
    opponents, 
    venue,
    matches, 
    runs, 
    high_score, 
    average)
VALUES( 
    '1996 or thereabouts', 
    'ENGLAND', 
    'EDGBASTON', 
    3, 
    428, 
    '177', 
    85.60);

We’ll get …

    '1996 or thereabouts',
    *
ERROR at line 10:
ORA-12899: value too large for column "MIKE"."SACHIN_TESTS"."SEASON" (actual:
19, maximum: 7)

However, what happens when we try inserting a float into an integer column ?

INSERT INTO sachin_tests( 
    season, 
    opponents, 
    venue,
    matches, 
    runs, 
    high_score, 
    average)
VALUES(
    '1996', 
    'ENGLAND', 
    'AWAY',
    3, 
    85.60, 
    '177', 
    428);

Run this and well get…

1 row created.

So, the record has been inserted without error. But have a look at the value in the table…

SQL> SELECT runs
  2  FROM sachin_tests
  3  WHERE season = '1996'
  4  AND opponents = 'ENGLAND'
  5  AND venue = 'AWAY'
  6  /

      RUNS
----------
	86

SQL> 

The float value we’ve inserted has been rounded. Is this some insidious bug in Oracle ?
Well, no. The ANSI standard for SQL stipulates that a float value inserted into an integer column will either be truncated or rounded. The action is implementation specific. In Oracle’s case, the value is rounded.

Therefore, if we want to protect against this, we need to write our own validation.

The trouble with triggers

At this point, you may be thinking that the obvious solution is to simply write a trigger to check that
the INTEGER values being inserted are not in fact FLOATS. Let’s knock up a quick test :

CREATE OR REPLACE TRIGGER st_bi_trg
    BEFORE INSERT ON sachin_tests
    FOR EACH ROW
BEGIN
    --
    -- Output the value to be inserted into the runs column
    --
    DBMS_OUTPUT.PUT_LINE(:new.runs);
    --
    -- Check to make sure that the runs value being inserted is a whole number
    --
    IF MOD( :new.runs, 1) > 0 THEN
        RAISE_APPLICATION_ERROR(-20000, 'Cannot insert a FLOAT into an INTEGER column.');
    END IF;
END;
/

Now let’s run the offending insert statement again. This time, we’ll set the serveroutput so that we can see the value of RUNS as the trigger fires :

set serveroutput on
  1  INSERT INTO sachin_tests( season, opponents, venue,
  2	 matches, runs, high_score, average)
  3  VALUES(
  4	 '1996', 'ENGLAND', 'AWAY',
  5*	 3, 85.60, '177', 428)
SQL> /
86

1 row created.

SQL> 

So, the value has been rounded before we even get to the trigger. An alternative approach is required here.

The SQL*LOADER option

The data we have to load is in a flat-file. So, let’s see what SQL*LOADER can do for us.

OPTIONS(SKIP=1)
LOAD DATA
INFILE 'tests.dat'
INTO TABLE sachin_tests
APPEND FIELDS TERMINATED BY ','
TRAILING NULLCOLS(
    season CHAR(7),
    opponents CHAR(50),
    venue CHAR(4),
    matches INTEGER EXTERNAL "DECODE(TRUNC(:matches),:matches,:matches,'Error Matches must be an Integer')",
    runs INTEGER EXTERNAL "DECODE(TRUNC(:runs), :runs, :runs, 'Error Runs must be an Integer')",
    high_score CHAR(4),
    average CHAR(7))

If we run this for our correctly formatted file :

$ sqlldr userid=mike control=tests.ctl
Password:

SQL*Loader: Release 11.2.0.2.0 - Production on Sun Jun 23 14:15:47 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 4
$ 

Now, let’s play around with the .dat file and try the insert we’ve been having problems with up to now…

season, opponents, venue, matches, runs, high_score, average,
1996,ENGLAND,AWAY,3,85.60,177,428,
2000/01,AUSTRALIA,HOME,3,304,126,50.66,
2004/05,BANGLADESH,AWAY,2,284,248*,284.00,
2009/10,SOUTH AFRICA,HOME,2,213,106,71.00,

Save the file and run it again …

sqlldr userid=mike control=tests.ctl

This time, if we check in the SQL*Loader log file (tests.log), we’ll see :

Record 1: Rejected - Error on table SACHIN_TESTS, column RUNS.
ORA-01722: invalid number


Table SACHIN_TESTS:
  3 Rows successfully loaded.
  1 Row not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

SQL*Loader has successfully filtered out our rogue record.

The External Table option

But what if we want to encapsulate the entire process within the database, rather than having to look at the log file on the OS.
Well, we can load the data into an External Table which we use as a staging area.
We can then insert the records either into the target table, or an error table using a multi-table insert statement.

A couple of housekeeping chores are required when setting up an external table.

First, we need to specify a directory to hold the tests.dat file. This needs to be created on the OS and be a directory to which the Oracle owner (usually oracle on *nix), has access…

cd /u01/app/oracle/admin
sudo su oracle
mkdir ext_tabs

Now copy the tests.data file to the new directory.

Next, we need to create a directory object in the database :

CREATE DIRECTORY ext_tabs AS
    '/u01/app/oracle/admin/ext_tabs'

The owner of the directory object in the database should automatically have READ and WRITE privileges.

Finally, the external table. The table will be structured to allow all numeric values to be loaded without rounding them…

CREATE TABLE tests_staging_ext(
    season VARCHAR2(7),
    opponents VARCHAR2(50),
    venue VARCHAR2(4),
    matches NUMBER,
    runs NUMBER,
    high_score VARCHAR2(4),
    average NUMBER)
    ORGANIZATION EXTERNAL
    (
        type oracle_loader
        default directory ext_tabs
        access parameters (
            records delimited by newline
            fields terminated by ','
            missing field values are NULL
            reject rows with all null fields( 
                season, opponents, venue,
                matches, runs, high_score, average)
        )
        location( 'tests.dat')
    )
    REJECT LIMIT UNLIMITED
/

Yep, we’ll take any numeric value into a number column.
If we do a quick check, we should see that all of the records have been loaded into our external table…

SQL> SELECT COUNT(*)   
  2  FROM tests_staging_ext 
  3  /

  COUNT(*)
----------
	 4

SQL> 

We’ll then write some code to sort out the load into our proper target table as well as an error table. The error table looks like this :

CREATE TABLE sachin_test_errors(
    season VARCHAR2(7),
    opponents VARCHAR2(50),
    venue VARCHAR2(4),
    matches NUMBER,
    runs NUMBER,
    high_score VARCHAR2(4),
    average NUMBER,
    rejection_reason VARCHAR2(100))
/

At this point, you may well be thinking that something like this could now do the job…

INSERT FIRST
    WHEN TRUNC(matches) != TRUNC(matches)
        OR TRUNC(runs) != TRUNC(runs)
    THEN
        INTO sachin_test_errors(
            season, opponents, venue,
            matches, runs, high_score, average, rejection_reason)
        VALUES(
            season, opponents, venue,
            matches, runs, high_score, average,
            'FLOAT value found where INTEGER expected.')
    ELSE
        INTO sachin_tests
SELECT season, opponents, venue,
    matches, runs, high_score, average
FROM tests_staging_ext
/

…unfortunately, this will cause us to trip over the same issue as we had with the trigger.
In order for this to work properly, we need to push the logic to check for FLOAT values into the SELECT statement itself…

INSERT FIRST
    WHEN rejection_reason IS NOT NULL
    THEN
        INTO sachin_test_errors(
            season, opponents, venue,
            matches, runs, high_score, average, rejection_reason)
        VALUES(
            season, opponents, venue,
            matches, runs, high_score, average,
            rejection_reason)
    ELSE
        INTO sachin_tests( 
            season, opponents, venue,
            matches, runs, high_score, average)
        VALUES(
            season, opponents, venue,
            matches, runs, high_score, average)
SELECT season, opponents, venue,
    matches, runs, high_score, average,
    CASE
        WHEN matches != TRUNC(matches) THEN 'MATCHES has a FLOAT value'
        WHEN runs != TRUNC(runs) THEN 'RUNS has a FLOAT value'
        ELSE
            NULL
    END as rejection_reason
FROM tests_staging_ext
/

Now, if we look at our target table, we should have the three valid records loaded

SELECT *
FROM sachin_tests
/

SEASON  OPPONENTS            VENU MATCHES RUNS HIGH                                 AVERAGE
------- -------------------- ---- ------- ---- ---- ---------------------------------------
2000/01 AUSTRALIA            HOME       3  304 126                                    50.66 
2004/05 BANGLADESH           AWAY       2  284 248*                                     284 
2009/10 SOUTH AFRICA         HOME       2  213 106                                       71 

The rejected record should be in the errors table …

SELECT *
FROM sachin_test_errors
/

SEASON  OPPONENTS            VENU MATCHES RUNS HIGH AVERAGE REJECTION_REASON             
------- -------------------- ---- ------- ---- ---- ------- ------------------------------
1996    ENGLAND              AWAY       3   86 177      428 RUNS has a FLOAT value        

There, I’ve managed to get through the entire post without mentioning the Champions Trophy final which, England have managed to let India win. At least Venkata will still be smiling.

About these ads

One thought on “Oracle Float to Integer fun and Sachin Tendulkar

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s