Generating CSV files from PL/SQL the Very Easy Way.

This post is dedicated to Morten Braten and William Robertson.
Thanks to both of you for saving me a lot of time (and typing)…

XML, YAML, JSON. When it comes to transferring data between systems, they’ve all been flavour of the month at one time or another. However, good old CSV is still the format of choice when the batch window opens.
Not only is it simple, it adds comparatively little overhead to the size of the data file being generated.

OK – CSV used to mean “Comma-delimited” but these days it’s become synonymous with delimited data.
Whatever separator character you use, generating CSV is considerably easier these days…unless you’re using PL/SQL in a batch (as opposed to interactive) process.
That said, Connor MacDonald does have a clever way of utilising the functionality available in a command line tool such as SQL*Plus by means of a DBMS_SCHEDULER job.

If for some reason that won’t work for you, it looks like you’re going to have to hack out some custom PL/SQL to get the job done…or you could have a look at a couple of the options available in what other technologies would refer to as “the ecosystem”.
What I’m going to cover is :

  • The pain of hand-cranking delimited SQL
  • A solution offered in Morten Braten’s Alexandria PL/SQL Utility Library
  • An alternative solution made available by William Robertson
  • The final PL/SQL procedure
  • Hand-cranking delimited code

    We’re working in a Data Warehouse running on Oracle. We need to provide a data feed to a downstream system.
    The feed is comma separated with values enclosed by quotes.
    The query to generate the data for the feed is reasonably straight forward :

    select dept.department_id, dept.department_name,
        loc.city, coun.country_id
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';  -- this will be passed as a parameter in the run.  
    

    All ok so far. However, when it comes to tweaking the query to generate the output, things get a bit more fiddly.
    The query in the package will have to change to something like this :

    select '"'||dept.department_id||'",'
        ||'"'||dept.department_name||'",'
        ||'"'||loc.city||'",'
        ||'"'||coun.country_id||'"'
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';    
    

    Typically, these sorts of files are generated with dozens (sometimes hundreds) of attributes. If looking at all of those quotes and commas is going to give you a headache, you might seek to mitigate the worst effects by replacing the characters with their ASCII values :

    select chr(34)||dept.department_id||chr(34)||chr(44)
        ||chr(34)||dept.department_name||chr(34)||chr(44)
        ||chr(34)||loc.city||chr(34)||chr(44)
        ||chr(34)||coun.country_id||chr(34)
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';    
    

    The double-quotes surrounding the data values are designed to escape any commas in the data and prevent them from being read as field terminators. Unfortunately, this doesn’t help much if the data also contains double-quotes. For this reason, I tend to prefer a pipe character as a delimiter. This allows us to dispense with the double-quotes. As a double-pipe is the SQL concatenation operator, let’s use the ASCII value instead for the delimiter itself :

    select dept.department_id||chr(124)
        ||dept.department_name||chr(124)
        ||loc.city||chr(124)
        ||coun.country_id
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';      
    

    Wouldn’t it be nice if there was some way we could just persuade Oracle to magically write our delimiters for us.
    Well, I don’t know about Oracle, but Morten Braten will have a go for you…

    The Alexandria CSV_UTIL_PKG package

    As with most Alexandria packages, there’s a demo on how to use CSV_UTIL_PKG in demos/csv_util_pkg_demo.sql.
    However, as explained in Morten’s post on this subject, you need to create some types first …

    create type t_str_array as table of varchar2(4000);
    /
     
    create type t_csv_line as object (
      line_number  number,
      line_raw     varchar2(4000),
      c001         varchar2(4000),
      c002         varchar2(4000),
      c003         varchar2(4000),
      c004         varchar2(4000),
      c005         varchar2(4000),
      c006         varchar2(4000),
      c007         varchar2(4000),
      c008         varchar2(4000),
      c009         varchar2(4000),
      c010         varchar2(4000),
      c011         varchar2(4000),
      c012         varchar2(4000),
      c013         varchar2(4000),
      c014         varchar2(4000),
      c015         varchar2(4000),
      c016         varchar2(4000),
      c017         varchar2(4000),
      c018         varchar2(4000),
      c019         varchar2(4000),
      c020         varchar2(4000)
    );
    /
     
    create type t_csv_tab as table of t_csv_line;
    /
    

    With these types in place, we can install the package from the Alexandria GitHub repo.
    The files we’re looking for are under the ora directory:

    • csv_util_pkg.pks
    • csv_util_pkg.pkb

    Download them and run them in the order they are listed here ( the .pks is the header and the .pkb is the body).

    Now, we can take some of the commas out of our code…

    select csv_util_pkg.array_to_csv(
        t_str_array(
            dept.department_id, dept.department_name,
            loc.city, coun.country_id))
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';
    

    By default, the results are output using a comma as the separator. However, we can change this easily enough:

     
    select csv_util_pkg.array_to_csv(
        t_str_array(dept.department_id, dept.department_name, loc.city, coun.country_id),
        chr(124))
    from departments dept
    inner join locations loc
        on dept.location_id = loc.location_id
    inner join countries coun
        on loc.country_id = coun.country_id  
    inner join regions reg
        on reg.region_id = coun.region_id
    where reg.region_name = 'Americas';
    

    The output is pretty much what we’re looking for :

    60|IT|Southlake|US
    50|Shipping|South San Francisco|US
    10|Administration|Seattle|US
    30|Purchasing|Seattle|US
    90|Executive|Seattle|US
    100|Finance|Seattle|US
    110|Accounting|Seattle|US
    120|Treasury|Seattle|US
    130|Corporate Tax|Seattle|US
    140|Control And Credit|Seattle|US
    150|Shareholder Services|Seattle|US
    160|Benefits|Seattle|US
    170|Manufacturing|Seattle|US
    180|Construction|Seattle|US
    190|Contracting|Seattle|US
    200|Operations|Seattle|US
    210|IT Support|Seattle|US
    220|NOC|Seattle|US
    230|IT Helpdesk|Seattle|US
    240|Government Sales|Seattle|US
    250|Retail Sales|Seattle|US
    260|Recruiting|Seattle|US
    270|Payroll|Seattle|US
    20|Marketing|Toronto|CA
    

    Now all you need to do is put this into your PL/SQL procedure and write some code to save it to a file ( possibly using the Alexandria FILE_UTIL_PKG package).
    If that seems like too much effort though…

    William Robertson’s CSV package

    Whilst the Alexandria package is a bit of an all-rounder, William Robertson has produced something that’s rather more tailored to producing CSV files.

    The package – simply called CSV – has a function that produces delimited format from a refcursor. It also contains a procedure to write csv data to a file.
    Let’s take a closer look…

    Installation of the package simply involves downloading the code from here, and running it. Both the package header and body are in a single file – csv.pkg.

    The package’s REPORT function takes a slightly different approach in that it takes a ref cursor as an argument :

    select column_value
    from table(csv.report(cursor(
        select dept.department_id, dept.department_name,
            loc.city, coun.country_id
        from departments dept
        inner join locations loc
            on dept.location_id = loc.location_id
        inner join countries coun
            on loc.country_id = coun.country_id  
        inner join regions reg
            on reg.region_id = coun.region_id
        where reg.region_name = 'Americas')))
    /    
    

    Once again, the delimiter character is comma by default. Once again, we can override this :

    select column_value
    from table(csv.report(cursor(
        select dept.department_id, dept.department_name,
            loc.city, coun.country_id
        from departments dept
        inner join locations loc
            on dept.location_id = loc.location_id
        inner join countries coun
            on loc.country_id = coun.country_id  
        inner join regions reg
            on reg.region_id = coun.region_id
        where reg.region_name = 'Americas'), 
        chr(124)))
    /    
    

    We can go even further. You can add common things that you may need to include in a csv that’s serving as a feed file for another system. These include :

    • a header record with a delimited list of attributes
    • a label attribute for each row of data to make these rows easier to identify for the program loading the csv
    • a rowcount as a trailer record

    All of which can be accomplished thus :

    select column_value
    from table(csv.report(cursor(
        select dept.department_id, dept.department_name,
            loc.city, coun.country_id
        from departments dept
        inner join locations loc
            on dept.location_id = loc.location_id
        inner join countries coun
            on loc.country_id = coun.country_id  
        inner join regions reg
            on reg.region_id = coun.region_id
        where reg.region_name = 'Americas'), 
        p_separator => chr(124),
        p_label => 'DATA',
        p_heading => 'Y',
        p_rowcount => 'Y'))
    /    
    
    [show output]
    

    Not only that, but the package also facilitates creating the file itself…

    Generating a file from the CSV package

    First of all, we need access to a directory. So, the DBA needs to do something like :

    grant read, write on directory app_files to hr;
    

    Now, connected as HR, we can create a PL/SQL procedure to generate our csv files like this :

    create or replace procedure depts_by_region( i_region regions.region_name%type)
    as
        l_fname varchar2(100);
        rc sys_refcursor;
    begin
        
        l_fname := i_region||'_depts.csv';
        
        open rc for
        select dept.department_id, dept.department_name,
            loc.city, coun.country_id
        from departments dept
        inner join locations loc
            on dept.location_id = loc.location_id
        inner join countries coun
            on loc.country_id = coun.country_id  
        inner join regions reg
            on reg.region_id = coun.region_id
        where reg.region_name = i_region; 
        
        csv.write_file(
            p_dataset => rc,
            p_separator => chr(124),
            p_label => 'DATA',
            p_heading => 'Y',
            p_rowcount => 'Y',
            p_directory => 'APP_FILES',
            p_filename => l_fname);
    end;
    /
    

    …and run it like this :

    begin
        depts_by_region('Americas');
    end;
    /
    

    Sure enough, when we look on the server, we can see :

    [mike@frea ~]$ cd /opt/oracle/app_files/
    [mike@frea app_files]$ ls -l *.csv
    -rw-r--r-- 1 oracle oinstall 840 Feb 11 20:56 Americas_depts.csv
    

    …and the file looks like this :

    HEADING|DATA|DEPARTMENT_ID|DEPARTMENT_NAME|CITY|COUNTRY_ID
    DATA|60|IT|Southlake|US
    DATA|50|Shipping|South San Francisco|US
    DATA|10|Administration|Seattle|US
    DATA|30|Purchasing|Seattle|US
    DATA|90|Executive|Seattle|US
    DATA|100|Finance|Seattle|US
    DATA|110|Accounting|Seattle|US
    DATA|120|Treasury|Seattle|US
    DATA|130|Corporate Tax|Seattle|US
    DATA|140|Control And Credit|Seattle|US
    DATA|150|Shareholder Services|Seattle|US
    DATA|160|Benefits|Seattle|US
    DATA|170|Manufacturing|Seattle|US
    DATA|180|Construction|Seattle|US
    DATA|190|Contracting|Seattle|US
    DATA|200|Operations|Seattle|US
    DATA|210|IT Support|Seattle|US
    DATA|220|NOC|Seattle|US
    DATA|230|IT Helpdesk|Seattle|US
    DATA|240|Government Sales|Seattle|US
    DATA|250|Retail Sales|Seattle|US
    DATA|260|Recruiting|Seattle|US
    DATA|270|Payroll|Seattle|US
    DATA|20|Marketing|Toronto|CA
    ROW_COUNT|DATA|24
    

    There are some limitations. For example, William points out that any row that’s longer than 4000 characters will cause the package to break. However, unless you’re generating fairly “wide” lines, he’s pretty much written your code for you.

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out /  Change )

    Google photo

    You are commenting using your Google account. Log Out /  Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out /  Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out /  Change )

    Connecting to %s

    This site uses Akismet to reduce spam. Learn how your comment data is processed.