Loading selected fields from a delimited file into Oracle using an External Table

If you’ve dealt with ETL processes for any length of time, sooner or later, you’ll be faced with the need to load data from a delimited file into your database.
In the case of Oracle, External Tables are tailor-made for this purpose.
However, whilst they might make loading an entire file is very simple, exactly how do you persuade them to just load certain fields ?

What we’ll be looking at here is :

  • an external table that loads an entire file
  • an external table that just loads the first few fields of each record in the file
  • an external table that loads a selection of fields that are not contiguous in the file

Example file and Environment

All of these examples have been tested on Oracle 21cXE, although I’ve not seen any behaviour here that’s not consistent with Oracle versions back to 11.2.

The example file is simply a csv (called employees.csv) containing all the records from the EMPLOYEES table in the HR sample schema.

The external table which reads the entire file is called ALL_EMPS_XT :

create table all_emps_xt(
    employee_id number(6,0),
    first_name varchar2(20),
    last_name varchar2(25),
    email varchar2(25),
    phone_number varchar2(20),
    hire_date date,
    job_id varchar2(10),
    salary number(8,2),
    commission_pct number(2,2),
    manager_id number(6,0),
    department_id number(4,0))
    organization external (
        type oracle_loader
        default directory my_files
        access parameters (
            records delimited by newline
            logfile 'all_emps.log'
            badfile 'all_emps.bad'
            discardfile 'all_emps.dis'
            skip 1
            load when (1:3) != 'TRL'
            fields terminated by ',' optionally enclosed by '"'
            missing field values are null (
                employee_id integer external(6),
                first_name char(20),
                last_name char(25),
                email char(25),
                phone_number char(20),
                hire_date date "DD-MON-YYYY",
                job_id char(10),
                salary float external,
                commission_pct float external,
                manager_id integer external(6),
                department_id integer external(4))
        )
    location('employees.csv')
    )
    reject limit unlimited
/    

Sure enough, if we select from the external table, we can see that all of the fields have been populated as expected …

select *
from all_emps_xt
where job_id in ('AD_PRES', 'SA_MAN')
order by employee_id
/

… and that all 107 data records have been loaded :

select count(*) from all_emps_xt;

  COUNT(*)
----------
       107

Loading first few fields only

Let’s say that we weren’t interested in all of the fields in the file and we just wanted the first five – i.e. :

  • EMPLOYEE_ID
  • FIRST_NAME
  • LAST_NAME
  • EMAIL
  • PHONE_NUMBER

This is simple enough, we just specify definitions for those fields in the access parameters clause and our external table will ignore all of the fields after phone_number, simply by making sure that

missing field values are null

…is still specified in the access parameters clause of the External Table. This one is called START_EMPS_XT :

create table start_emps_xt(
    employee_id number(6,0),
    first_name varchar2(20),
    last_name varchar2(25),
    email varchar2(25),
    phone_number varchar2(20))
    organization external (
        type oracle_loader
        default directory my_files
        access parameters (
            records delimited by newline
            logfile 'start_emps.log'
            badfile 'start_emps.bad'
            discardfile 'start_emps.dis'
            skip 1
            load when (1:3) != 'TRL'
            fields terminated by ',' optionally enclosed by '"'
            missing field values are null (
                employee_id integer external(6),
                first_name char(20),
                last_name char(25),
                email char(25),
                phone_number char(20))
        )
    location('employees.csv')
    )
    reject limit unlimited
/    

select *
from start_emps_xt
where employee_id between 100 and 110
order by employee_id
/

Selecting non-contiguous fields to load

This is where it gets a bit more tricky…but not much, as it turns out. Say we want to load :

Field NoField Name
1EMPLOYEE_ID
2FIRST_NAME
3LAST_NAME
6HIRE_DATE
10MANAGER_ID
11DEPARTMENT_ID

We simply need to include placeholder values for in the access parameters clause for the columns we don’t want to load. In this case, I’ve used the actual field names but not specified a data type for them. That way, if I ever need to add one of these columns into the external table definition, I know where it appears in the file and the change is compartively simple. Anyhow, the table is called SELECTED_EMPS_XT :

create table selected_emps_xt(
    employee_id number(6,0),
    first_name varchar2(20),
    last_name varchar2(25),
    hire_date date,
    manager_id number(6,0),
    department_id number(4,0))
    organization external (
        type oracle_loader
        default directory my_files
        access parameters (
            records delimited by newline
            logfile 'selected_emps.log'
            badfile 'selected_emps.bad'
            discardfile 'selected_emps.dis'
            skip 1
            load when (1:3) != 'TRL'
            fields terminated by ',' optionally enclosed by '"'
            missing field values are null (
                employee_id integer external(6),
                first_name char(20),
                last_name char(25),
                email,
                phone_number,
                hire_date date "DD-MON-YYYY",
                job_id,
                salary,
                commission_pct,
                manager_id integer external(6),
                department_id integer external(4))
        )
    location('employees.csv')
    )
    reject limit unlimited
/    

Sure enough, if we query the table :

select *
from selected_emps_xt
where department_id = 60
order by hire_date
/

Author: mikesmithers

Back in 1993, I discovered that I could get paid money for doing fun stuff with computers. Over the years, I've specialised in Oracle Databases as a developer, a DBA and sometimes, an architect. It's my evil alter-ego - The Antikyte - who writes a blog about my various technical adventures. Yes, that is his Death Star parked in the Disabled Bay. I currently live in the South-West of England with Deb, my long-suffering wife.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.