If you’ve dealt with ETL processes for any length of time, sooner or later, you’ll be faced with the need to load data from a delimited file into your database.
In the case of Oracle, External Tables are tailor-made for this purpose.
However, whilst they might make loading an entire file is very simple, exactly how do you persuade them to just load certain fields ?
What we’ll be looking at here is :
- an external table that loads an entire file
- an external table that just loads the first few fields of each record in the file
- an external table that loads a selection of fields that are not contiguous in the file
Example file and Environment
All of these examples have been tested on Oracle 21cXE, although I’ve not seen any behaviour here that’s not consistent with Oracle versions back to 11.2.
The example file is simply a csv (called employees.csv) containing all the records from the EMPLOYEES table in the HR sample schema.
The external table which reads the entire file is called ALL_EMPS_XT :
create table all_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number(8,2),
commission_pct number(2,2),
manager_id number(6,0),
department_id number(4,0))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'all_emps.log'
badfile 'all_emps.bad'
discardfile 'all_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20),
hire_date date "DD-MON-YYYY",
job_id char(10),
salary float external,
commission_pct float external,
manager_id integer external(6),
department_id integer external(4))
)
location('employees.csv')
)
reject limit unlimited
/
Sure enough, if we select from the external table, we can see that all of the fields have been populated as expected …
select *
from all_emps_xt
where job_id in ('AD_PRES', 'SA_MAN')
order by employee_id
/
… and that all 107 data records have been loaded :
select count(*) from all_emps_xt; COUNT(*) ---------- 107
Loading first few fields only
Let’s say that we weren’t interested in all of the fields in the file and we just wanted the first five – i.e. :
- EMPLOYEE_ID
- FIRST_NAME
- LAST_NAME
- PHONE_NUMBER
This is simple enough, we just specify definitions for those fields in the access parameters clause and our external table will ignore all of the fields after phone_number, simply by making sure that
missing field values are null
…is still specified in the access parameters clause of the External Table. This one is called START_EMPS_XT :
create table start_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'start_emps.log'
badfile 'start_emps.bad'
discardfile 'start_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20))
)
location('employees.csv')
)
reject limit unlimited
/
select *
from start_emps_xt
where employee_id between 100 and 110
order by employee_id
/
Selecting non-contiguous fields to load
This is where it gets a bit more tricky…but not much, as it turns out. Say we want to load :
Field No | Field Name |
---|---|
1 | EMPLOYEE_ID |
2 | FIRST_NAME |
3 | LAST_NAME |
6 | HIRE_DATE |
10 | MANAGER_ID |
11 | DEPARTMENT_ID |
We simply need to include placeholder values for in the access parameters clause for the columns we don’t want to load. In this case, I’ve used the actual field names but not specified a data type for them. That way, if I ever need to add one of these columns into the external table definition, I know where it appears in the file and the change is compartively simple. Anyhow, the table is called SELECTED_EMPS_XT :
create table selected_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
hire_date date,
manager_id number(6,0),
department_id number(4,0))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'selected_emps.log'
badfile 'selected_emps.bad'
discardfile 'selected_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email,
phone_number,
hire_date date "DD-MON-YYYY",
job_id,
salary,
commission_pct,
manager_id integer external(6),
department_id integer external(4))
)
location('employees.csv')
)
reject limit unlimited
/
Sure enough, if we query the table :
select *
from selected_emps_xt
where department_id = 60
order by hire_date
/