Defaulting Null values in SQL*Loader

Having spent the last couple of months working with a South African ( hello Trevor) and being reminded, oh so gently, of that nation’s recent cricketing success, I was hoping to get the opportunity to return the favour via the medium of Rugby Union.

The Springboks narrow defeat of England in the Autumn International did little to derail this cunning plan. Unfortunately, my emergency backup nationality failed at a crucial moment as the hitherto invincible All Blacks were roundly thrashed by England a week later.
People in glass-houses…

Anyway, what follows is a solution to a fairly specific problem we were confronted with recently.

The problem

Imagine you have a flat file which is loaded by means of SQL*Loader into a database table.
For operational reasons, it is not possible to alter the structure of the table.

The table may look something like this :

    fixture_date DATE NOT NULL,
    home_team VARCHAR2(30) NOT NULL,
    away_team VARCHAR2(30) NOT NULL,
    home_points NUMBER NOT NULL,
    away_points NUMBER NOT NULL)

The file that you are required to load into the table is called fixtures.dat and looks like this …


The data is the fixture list for this year’s Rugby Championship. No, I’m not making any comment about any of the results.

Notice a problem ? Yes, the HOME_POINTS and AWAY_POINTS columns in the table are mandatory, but the file itself contains nulls for these values in some of the records.
So the question is – how do we load this file directly into the table using only SQL*Loader ?

The SQL*Loader Control File

These days, you’ll often find SQL*Loader used for External Tables.
In this instance, we’re going to be using this utility in it’s more traditional command-line mode.
Fortunately, the command itself is fairly simple, because most of the messy stuff can be written separately in a control file.

The control file specifies

  • the action ( LOAD DATA)
  • the file containing the data (INFILE)
  • the table to load into
  • The delimiter used in the file (FIELDS TERMINATED BY)
  • treat any missing values at the end of a line as null (TRAILING NULLCOLS)

Finally, and most significantly from our point of view, it specifies how any data needs to be modified before loading.

The formatting of date data is fairly common place in control files.
However, you can also manipulate individual data items in more sophisticated ways such as, for example, assigning a default value when a null exists…

In this case, the control file looks something like this :

INFILE 'fixtures.dat'
INTO TABLE results
    fixture_date DATE "DDMMYYYY",
    home_points "NVL(:home_points,0)",
    away_points "NVL(:away_points,0)")   

Time to put it to the test (NOTE – in this case, I’m connecting to a schema where the RESULTS table already exists) :

$ sqlldr userid=mike control=results.ctl

SQL*Loader: Release - Production on Thu Dec 27 12:50:46 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 12

It looks as if all the rows have been loaded successully so we can now check the actual contents of the table …

SQL> SELECT fixture_date, home_team, away_team,  
  2     home_points, away_points
  3  FROM results
  4  WHERE fixture_date = TO_DATE('15092012', 'DDMMYYYY')
  5  /

------------ -------------------- -------------------- ----------- -----------
15-SEP-12    AUSTRALIA		  ARGENTINA			 0	     0
15-SEP-12    NEW ZEALAND	  SOUTH AFRICA			 0	     0


This is by no means the neatest way of manipulating data during upload into Oracle and there may well be performance penalties using this method. However, when you’re stuck for a better alternative, it’s worth remembering that SQL*Loader does have a fair amount of flexibility.

There, I’ve managed to get all the way through this post without a single reference to South Africa’s success or otherwise in this particular competition…but the results can be found here if you just have to know how it went for them.

That’s the great thing about writing a blog, you always get the last word…unless a certain Welsh person decides to remind you just which team won the Six Nations this year!

About these ads

3 thoughts on “Defaulting Null values in SQL*Loader

  1. Hi! I was very intrigued by the foregoing article and it gave me some ideas for further development of a database that I run that holds the match statistics for a local youth rugby (i.e. rugby union/football) league.

    However, I was wondering do you, or other readers, have any ideas or pointers on how best to handle rugby’s bonus points; i.e. 1 BP for 4 tries and/or 1 BP for losing team coming within 7 points or less of the winning score.

    I have written a view for Standings that reads from a Master Results table. The Standings table is correct except for one statistic: Bonus Points. Unfortunately, as written, the logic in the View does not distinguish between the losing team coming within 7 points of the winner but awards a bonus point to both teams. I’ve been tearing my hair out trying to fix this.

    I’ve posted on a couple of Forums but have not had a response.
    Apologies if this is not the right place or correct protocol for posting.

    • Richard,

      I’m assuming that you’re using Oracle or some other relational database ( and not one of those new-fangled No-SQL ones). I’m also assuming that points are awarded as follows :

      a win is 4 points
      a draw is 2 points
      a bonus point for scoring 4 or more tries in a game (win or lose)
      a bonus point for losing by 7 points or less

      So…you don’t get another bonus point for 8 tries in a game. Also, you can get up to 2 bonus points if you lose the game, but only one at best, if you win.

      I’ve created a simplified results table. It’s probably not the way you’ve implemented it, but it makes this example a bit simpler.

      CREATE TABLE team_results(
          team_name VARCHAR2(30),
          opposition VARCHAR2(30),
          points_for NUMBER(3),
          tries NUMBER(2),
          points_ag NUMBER(3))
      -- Now for some test data...
      INSERT INTO team_results(
          team_name, opposition,
          points_for, tries, points_ag)
          'WALES', 'AUSTRALIA',
          12, 0 , 14)
      INSERT INTO team_results(
         team_name, opposition,
         points_for, tries, points_ag)
          'WALES', 'NEW ZEALAND',
      INSERT INTO team_results(
          team_name, opposition,
          points_for, tries, points_ag) 
          'WALES', 'FRANCE',
      -- Unlike Wales, South Africa have scored four or more tries in a game th‎is season...
      -- as well as having a draw...
      INSERT INTO team_results(
          team_name, opposition,
          points_for, tries, points_ag) 
      INSERT INTO team_results(
          team_name, opposition,
          points_for, tries, points_ag) 
      INSERT INTO team_results(
          team_name, opposition,
          points_for, tries, points_ag) 
          'SOUTH AFRICA', 'ENGLAND',

      So, if we look at the results:

      SELECT team_name, points_for, points_ag, tries 
      FROM team_results
      ORDER BY team_name
      ------------------------------ ---------- ---------- ----------
      SOUTH AFRICA			       19	  26	      2
      SOUTH AFRICA			       31	   9	      5
      SOUTH AFRICA			       14	  14	      1
      WALES				       12	  33	      1
      WALES				       12	  14	      0
      WALES				       26	  17	      1
      6 rows selected.

      … for Wales, we can see that they’ll get One point for the loss to Australia ( less than 7 point margin), nothing from their defeat to New Zealand, and four points from their win over France.
      This makes a total of 5 points, including 1 bonus points.

      As for South Africa, they’ll get five points from their win over Australia ( 4 plus a bonus point for tries scored),
      One for their defeat to Australia ( 7 points or less)
      And two points from their draw against England.
      This give them a total of 8 points (including 2 bonus points).

      To derive this from the raw data, we can use the CASE statement in our query.
      We’re also using an in-line view to tidy up the calculation of overall points

      SELECT team_name, pld, bp, bp+points as total_points
      FROM (
          SELECT team_name,
              COUNT(team_name) as pld,
                      WHEN points_for > points_ag THEN 4
                      WHEN points_for = points_ag THEN 2
                      ELSE 0
                 END) as points,
                      WHEN tries >= 4 THEN 1
                      ELSE 0
                 END) +
                      WHEN points_ag > points_for AND points_for + 7 >= points_ag THEN 1
                      ELSE 0
                  END) as bp
          FROM team_results
          GROUP BY team_name)

      Run this and we get :

      ------------------------------ ---------- ---------- ------------
      SOUTH AFRICA				3	   2		8
      WALES					3	   1		5

      I think the short answer to your question about bonus point calculation is to
      use the CASE statement, which as you can see, can handle compound conditions
      (lost, but by 7 points or less).

      There are a couple of posts which may be of interest ( although they’re based
      on the round-ball game, rather than the oval one).
      This post outlines some possibilities when designing the database.
      This one makes use of Oracle’s RANK function so that you can order your result
      set by multiple criteria easily.



      P.S. Sorry some of the formatting is a bit wonky. Not quite sure why this is but I hope you get the gist.

      • Mike,

        First, apologies for not responding sooner! It has been a busy week and I’m only now getting to my “weekend job” as volunteer webmaster & database manager for our local youth rugby football league.

        BTW – I’m using MS SQL 2012 managed through MS Visual Studio 2012.

        Many thanks for the suggested schema. At first reading, it looks like just what I need. I plan to take a crack this weekend at adapting your scripts to my existing schema. I’ll let you know how I get on.

        One observation from a NON-expert when it comes to relational databases. Local, volunteer run, sports leagues, whether rugby, soccer, or whatever sport you like, are ubiquitous around the world. I did a fair bit of research online looking for a SQL script that basically amounted to a cheat sheet on how to set up tables/views and queries for a rugby football Results table and a Standings table – such as the topic of this thread. I could not find anything that I could easily adapt for the Standings. I am, in fact, quite surprised that some sports mad academic in the IT department at some university somewhere has not published online a set of “sports league” scripts for use by volunteers & amateurs like me.

        Anyhow, many thanks for the help & advice!



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s