This post is dedicated to Morten Braten and William Robertson.
Thanks to both of you for saving me a lot of time (and typing)…
XML, YAML, JSON. When it comes to transferring data between systems, they’ve all been flavour of the month at one time or another. However, good old CSV is still the format of choice when the batch window opens.
Not only is it simple, it adds comparatively little overhead to the size of the data file being generated.
OK – CSV used to mean “Comma-delimited” but these days it’s become synonymous with delimited data.
Whatever separator character you use, generating CSV is considerably easier these days…unless you’re using PL/SQL in a batch (as opposed to interactive) process.
That said, Connor MacDonald does have a clever way of utilising the functionality available in a command line tool such as SQL*Plus by means of a DBMS_SCHEDULER job.
If for some reason that won’t work for you, it looks like you’re going to have to hack out some custom PL/SQL to get the job done…or you could have a look at a couple of the options available in what other technologies would refer to as “the ecosystem”.
What I’m going to cover is :
Hand-cranking delimited code
We’re working in a Data Warehouse running on Oracle. We need to provide a data feed to a downstream system.
The feed is comma separated with values enclosed by quotes.
The query to generate the data for the feed is reasonably straight forward :
select dept.department_id, dept.department_name, loc.city, coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas'; -- this will be passed as a parameter in the run.
All ok so far. However, when it comes to tweaking the query to generate the output, things get a bit more fiddly.
The query in the package will have to change to something like this :
select '"'||dept.department_id||'",' ||'"'||dept.department_name||'",' ||'"'||loc.city||'",' ||'"'||coun.country_id||'"' from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas';
Typically, these sorts of files are generated with dozens (sometimes hundreds) of attributes. If looking at all of those quotes and commas is going to give you a headache, you might seek to mitigate the worst effects by replacing the characters with their ASCII values :
select chr(34)||dept.department_id||chr(34)||chr(44) ||chr(34)||dept.department_name||chr(34)||chr(44) ||chr(34)||loc.city||chr(34)||chr(44) ||chr(34)||coun.country_id||chr(34) from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas';
The double-quotes surrounding the data values are designed to escape any commas in the data and prevent them from being read as field terminators. Unfortunately, this doesn’t help much if the data also contains double-quotes. For this reason, I tend to prefer a pipe character as a delimiter. This allows us to dispense with the double-quotes. As a double-pipe is the SQL concatenation operator, let’s use the ASCII value instead for the delimiter itself :
select dept.department_id||chr(124) ||dept.department_name||chr(124) ||loc.city||chr(124) ||coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas';
Wouldn’t it be nice if there was some way we could just persuade Oracle to magically write our delimiters for us.
Well, I don’t know about Oracle, but Morten Braten will have a go for you…
The Alexandria CSV_UTIL_PKG package
As with most Alexandria packages, there’s a demo on how to use CSV_UTIL_PKG in demos/csv_util_pkg_demo.sql.
However, as explained in Morten’s post on this subject, you need to create some types first …
create type t_str_array as table of varchar2(4000); / create type t_csv_line as object ( line_number number, line_raw varchar2(4000), c001 varchar2(4000), c002 varchar2(4000), c003 varchar2(4000), c004 varchar2(4000), c005 varchar2(4000), c006 varchar2(4000), c007 varchar2(4000), c008 varchar2(4000), c009 varchar2(4000), c010 varchar2(4000), c011 varchar2(4000), c012 varchar2(4000), c013 varchar2(4000), c014 varchar2(4000), c015 varchar2(4000), c016 varchar2(4000), c017 varchar2(4000), c018 varchar2(4000), c019 varchar2(4000), c020 varchar2(4000) ); / create type t_csv_tab as table of t_csv_line; /
With these types in place, we can install the package from the Alexandria GitHub repo.
The files we’re looking for are under the ora directory:
- csv_util_pkg.pks
- csv_util_pkg.pkb
Download them and run them in the order they are listed here ( the .pks is the header and the .pkb is the body).
Now, we can take some of the commas out of our code…
select csv_util_pkg.array_to_csv( t_str_array( dept.department_id, dept.department_name, loc.city, coun.country_id)) from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas';
By default, the results are output using a comma as the separator. However, we can change this easily enough:
select csv_util_pkg.array_to_csv( t_str_array(dept.department_id, dept.department_name, loc.city, coun.country_id), chr(124)) from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas';
The output is pretty much what we’re looking for :
60|IT|Southlake|US 50|Shipping|South San Francisco|US 10|Administration|Seattle|US 30|Purchasing|Seattle|US 90|Executive|Seattle|US 100|Finance|Seattle|US 110|Accounting|Seattle|US 120|Treasury|Seattle|US 130|Corporate Tax|Seattle|US 140|Control And Credit|Seattle|US 150|Shareholder Services|Seattle|US 160|Benefits|Seattle|US 170|Manufacturing|Seattle|US 180|Construction|Seattle|US 190|Contracting|Seattle|US 200|Operations|Seattle|US 210|IT Support|Seattle|US 220|NOC|Seattle|US 230|IT Helpdesk|Seattle|US 240|Government Sales|Seattle|US 250|Retail Sales|Seattle|US 260|Recruiting|Seattle|US 270|Payroll|Seattle|US 20|Marketing|Toronto|CA
Now all you need to do is put this into your PL/SQL procedure and write some code to save it to a file ( possibly using the Alexandria FILE_UTIL_PKG package).
If that seems like too much effort though…
William Robertson’s CSV package
Whilst the Alexandria package is a bit of an all-rounder, William Robertson has produced something that’s rather more tailored to producing CSV files.
The package – simply called CSV – has a function that produces delimited format from a refcursor. It also contains a procedure to write csv data to a file.
Let’s take a closer look…
Installation of the package simply involves downloading the code from here, and running it. Both the package header and body are in a single file – csv.pkg.
The package’s REPORT function takes a slightly different approach in that it takes a ref cursor as an argument :
select column_value from table(csv.report(cursor( select dept.department_id, dept.department_name, loc.city, coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas'))) /
Once again, the delimiter character is comma by default. Once again, we can override this :
select column_value from table(csv.report(cursor( select dept.department_id, dept.department_name, loc.city, coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas'), chr(124))) /
We can go even further. You can add common things that you may need to include in a csv that’s serving as a feed file for another system. These include :
- a header record with a delimited list of attributes
- a label attribute for each row of data to make these rows easier to identify for the program loading the csv
- a rowcount as a trailer record
All of which can be accomplished thus :
select column_value from table(csv.report(cursor( select dept.department_id, dept.department_name, loc.city, coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = 'Americas'), p_separator => chr(124), p_label => 'DATA', p_heading => 'Y', p_rowcount => 'Y')) / [show output]
Not only that, but the package also facilitates creating the file itself…
Generating a file from the CSV package
First of all, we need access to a directory. So, the DBA needs to do something like :
grant read, write on directory app_files to hr;
Now, connected as HR, we can create a PL/SQL procedure to generate our csv files like this :
create or replace procedure depts_by_region( i_region regions.region_name%type) as l_fname varchar2(100); rc sys_refcursor; begin l_fname := i_region||'_depts.csv'; open rc for select dept.department_id, dept.department_name, loc.city, coun.country_id from departments dept inner join locations loc on dept.location_id = loc.location_id inner join countries coun on loc.country_id = coun.country_id inner join regions reg on reg.region_id = coun.region_id where reg.region_name = i_region; csv.write_file( p_dataset => rc, p_separator => chr(124), p_label => 'DATA', p_heading => 'Y', p_rowcount => 'Y', p_directory => 'APP_FILES', p_filename => l_fname); end; /
…and run it like this :
begin depts_by_region('Americas'); end; /
Sure enough, when we look on the server, we can see :
[mike@frea ~]$ cd /opt/oracle/app_files/ [mike@frea app_files]$ ls -l *.csv -rw-r--r-- 1 oracle oinstall 840 Feb 11 20:56 Americas_depts.csv
…and the file looks like this :
HEADING|DATA|DEPARTMENT_ID|DEPARTMENT_NAME|CITY|COUNTRY_ID DATA|60|IT|Southlake|US DATA|50|Shipping|South San Francisco|US DATA|10|Administration|Seattle|US DATA|30|Purchasing|Seattle|US DATA|90|Executive|Seattle|US DATA|100|Finance|Seattle|US DATA|110|Accounting|Seattle|US DATA|120|Treasury|Seattle|US DATA|130|Corporate Tax|Seattle|US DATA|140|Control And Credit|Seattle|US DATA|150|Shareholder Services|Seattle|US DATA|160|Benefits|Seattle|US DATA|170|Manufacturing|Seattle|US DATA|180|Construction|Seattle|US DATA|190|Contracting|Seattle|US DATA|200|Operations|Seattle|US DATA|210|IT Support|Seattle|US DATA|220|NOC|Seattle|US DATA|230|IT Helpdesk|Seattle|US DATA|240|Government Sales|Seattle|US DATA|250|Retail Sales|Seattle|US DATA|260|Recruiting|Seattle|US DATA|270|Payroll|Seattle|US DATA|20|Marketing|Toronto|CA ROW_COUNT|DATA|24
There are some limitations. For example, William points out that any row that’s longer than 4000 characters will cause the package to break. However, unless you’re generating fairly “wide” lines, he’s pretty much written your code for you.