To Boldly gzip from PL/SQL

Lempel-Ziv. Sounds like something you might find on a starship. I mean, you can just imagine Scotty telling Kirk that they can’t go to warp because the Lempel-Ziv compressors are offline.
In fact, the Lempel-Ziv algorithm is what GNU’s Gzip program uses to compress files.
Fortunately, the UTL_COMPRESS package employs the same algorithm, which means that you can create archives using one tool that can be manipulated by the other…

Starfleet HR have generated a delimited file of their Captains from their Oracle database for loading into their data mart, which runs on a different server (and doesn’t use Oracle).
This file needs to be compressed to make the transfer as efficient as possible.
At present, the file looks like this on the database server :

ls -l starfleet_captains.csv
-rw-r--r-- 1 oracle oinstall 343 Apr 26 15:49 starfleet_captains.csv

Compressing the file

Using UTL_COMPRESS, we can easily create a .gz archive from the file…

declare
    l_dir all_directories.directory_name%type := 'APP_FILES';
    fname varchar2(100) := 'starfleet_captains.csv';

    src_file bfile;
    lz_compressed blob;
    lz_compressed_len integer;
    fh utl_file.file_type;
    
    v_buffer raw(32767);
    buf_size pls_integer := 32767;
    v_pos integer := 1;
begin

    src_file := bfilename( l_dir, fname);
    dbms_lob.fileopen( src_file, dbms_lob.file_readonly);
    
    -- Compress the source file into a temporary BLOB at maximum quality (9)
    lz_compressed := utl_compress.lz_compress( src_file, 9); 
    lz_compressed_len := dbms_lob.getlength( lz_compressed);
    dbms_lob.fileclose( src_file);
    
    -- Write the compressed BLOB out to a .gz file in 32K chunks
    fh := utl_file.fopen( l_dir, fname||'.gz', 'wb');
    
    while v_pos < lz_compressed_len loop
        dbms_lob.read( lz_compressed, buf_size, v_pos, v_buffer);
        utl_file.put_raw( fh, v_buffer, true);
        v_pos := v_pos + buf_size;
    end loop;
    utl_file.fclose(fh);
    
    dbms_lob.freetemporary( lz_compressed);
    -- Now remove the original, uncompressed file
    utl_file.fremove( l_dir, fname);
end;
/
/

The LZ_COMPRESS function accepts two arguments. The first is the file we’re compressing.
The second is the “quality” – a number between 1 and 9, with 1 being the fastest execution but the least compression, and 9 being the greatest compression but the slowest execution. If you omit it, the quality defaults to 6.
We’ll come back to that shortly.
In the meantime, we can see that our delimited file has now been replaced with a compressed file…

ls -l starfleet_captains*
-rw-r--r-- 1 oracle oinstall 242 Apr 26 15:58 starfleet_captains.csv.gz

…which can now be manipulated using gzip…

gzip -l starfleet_captains.csv.gz 
         compressed        uncompressed  ratio uncompressed_name
                265                 343  34.7% starfleet_captains.csv
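While we’re here, gzip -t is a quick way to confirm that what UTL_COMPRESS has written really is a well-formed gzip stream – it decompresses the archive in memory and verifies the stored CRC, without writing anything to disk. A small stand-in example (recreating a file rather than re-using the one above) :

```shell
# Recreate a small archive as a stand-in for the one UTL_COMPRESS produced
echo 'CAPTAIN|SHIP' > starfleet_captains.csv
gzip -f starfleet_captains.csv

# -t (test) exits 0 and prints nothing if the archive is intact
gzip -t starfleet_captains.csv.gz && echo 'archive OK'
```

A non-zero exit code here would mean the archive is corrupt, which is worth knowing before you ship it to another server.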

…and indeed, gunzip…

gunzip starfleet_captains.csv.gz

cat starfleet_captains.csv

CAPTAIN|SHIP
Archer|Enterprise NX-01
Hernandez|Columbia NX-02
Decker|USS Enterprise
Georgiou|USS Shenzou
Kirk|USS Enterprise
Lorca|USS Discovery
Pike|USS Enterprise
Styles|USS Excelsior
Terrell|USS Reliant
Tracey|USS Exeter
Adams|USS Heart of Gold
Janeway|USS Voyager
Picard|USS Stargazer
Sisko|USS Defiant
Thomas|USS Puget Sound
ROW_COUNT|15

Incidentally, some of this data does raise questions.
For example, what’s so special about Puget Sound that it gets a Starship as well as an Oracle User Group ?
Also, was Janeway overlooked for the command of the Heart of Gold because she’s not a tea drinker ?
Full disclosure : the Heart of Gold wasn’t my own geeky Hitchhiker’s Guide to the Galaxy reference sneaking in. I got these ships and captains from a Star Trek Wiki.

Coming back to more immediate questions, how does UTL_COMPRESS perform with different compression settings for larger files…

Performance test

Let’s generate a more substantial file (using William Robertson’s CSV package)…

declare
    rc sys_refcursor;
begin
    open rc for
        select rownum as id,
            case mod( rownum, 8) 
                when 1 then 'Archer'
                when 2 then 'Pike'
                when 3 then 'Georgiou'
                when 4 then 'Lorca'
                when 5 then 'Kirk'
                when 6 then 'Picard'
                when 7 then 'Sisko'
                when 0 then 'Janeway'
            end as captain,
            case mod(rownum, 8)
                when 1 then 'Enterprise NX-01'
                when 2 then 'USS Enterprise'
                when 3 then 'USS Shenzou'
                when 4 then 'USS Discovery'
                when 5 then 'USS Enterprise'
                when 6 then 'USS Stargazer'
                when 7 then 'USS Defiant'
                when 0 then 'USS Voyager'
            end as ship
        from dual
        connect by rownum < 1000000;
    
    csv.write_file( 
        p_dataset => rc,
        p_separator => chr(124),
        p_heading => 'Y',
        p_rowcount => 'Y',
        p_directory => 'APP_FILES',
        p_filename => 'famous_federation_captains.csv');
end;
/

If we look at the file on disk, we can see that it’s hefty enough for our purposes :

ls -lh famous_federation_captains.csv
-rw-r--r-- 1 oracle oinstall 27M Apr 27 17:31 famous_federation_captains.csv

To start with, let’s try compressing this using the default value for quality (6) :

set timing on
set feedback on
declare
    l_dir all_directories.directory_name%type := 'APP_FILES';
    fname varchar2(100) := 'famous_federation_captains.csv';

    src_file bfile;
    lz_compressed blob;
    lz_compressed_len integer;
    fh utl_file.file_type;
    
    v_buffer raw(32767);
    buf_size pls_integer := 32767;
    v_pos integer := 1;
begin

    src_file := bfilename( l_dir, fname);
    dbms_lob.fileopen( src_file, dbms_lob.file_readonly);
    
    -- No quality argument this time, so LZ_COMPRESS uses the default (6)
    lz_compressed := utl_compress.lz_compress( src_file); 
    lz_compressed_len := dbms_lob.getlength( lz_compressed);
    dbms_lob.fileclose( src_file);
    
    fh := utl_file.fopen( l_dir, fname||'.gz', 'wb');
    
    while v_pos < lz_compressed_len loop
        dbms_lob.read( lz_compressed, buf_size, v_pos, v_buffer);
        utl_file.put_raw( fh, v_buffer, true);
        v_pos := v_pos + buf_size;
    end loop;
    utl_file.fclose(fh);
    
    dbms_lob.freetemporary( lz_compressed);
    -- Now remove the original, uncompressed file
    utl_file.fremove( l_dir, fname);
end;
/
/

The output is :

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.329

If we examine the compressed file, we can see that the compression ratio is around 90% :

ls -lh famous_federation_captains.csv.gz 
-rw-r--r-- 1 oracle oinstall 2.7M Apr 27 17:37 famous_federation_captains.csv.gz
gzip -l famous_federation_captains.csv.gz 
         compressed        uncompressed  ratio uncompressed_name
            2779405            27388901  89.9% famous_federation_captains.csv

Trying this with the quality value set to the minimum (1) :

...
lz_compressed := utl_compress.lz_compress( src_file, 1); 
...

…results in a faster runtime…

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.202

…but a marginally bigger file…

 ls -lh famous_federation_captains.csv.gz 
-rw-r--r-- 1 oracle oinstall 2.9M Apr 27 17:43 famous_federation_captains.csv.gz
gzip -l famous_federation_captains.csv.gz 
         compressed        uncompressed  ratio uncompressed_name
            2994045            27388901  89.1% famous_federation_captains.csv

Finally, let’s go for the maximum quality setting (9) :

...
lz_compressed := utl_compress.lz_compress( src_file, 9); 
...

The runtime is a little longer…

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.393

…and the resulting file is more or less the same as for the default :

ls -lh famous_federation_captains.csv.gz 
-rw-r--r-- 1 oracle oinstall 2.7M Apr 27 17:47 famous_federation_captains.csv.gz
gzip -l famous_federation_captains.csv.gz 
         compressed        uncompressed  ratio uncompressed_name
            2762750            27388901  89.9% famous_federation_captains.csv

Obviously, this is a limited test (I’m running Oracle 18cXE on CentOS). However, the quality setting may become a consideration if you’re either tight on space or need to speed up your compression job.
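As a point of comparison outside the database, gzip’s own command line exposes the same trade-off through its -1 (--fast) to -9 (--best) flags, so you can reproduce the size comparison in the shell. A sketch, using a throwaway, highly repetitive file rather than the Starfleet data :

```shell
# Build a throwaway, highly compressible sample file
for i in $(seq 1 5000); do echo 'Kirk|USS Enterprise'; done > sample.csv

# -c writes to stdout, so the original survives and we keep both archives
gzip -1 -c sample.csv > sample_fast.csv.gz
gzip -9 -c sample.csv > sample_best.csv.gz

ls -l sample.csv sample_fast.csv.gz sample_best.csv.gz
```

As with UTL_COMPRESS, -9 shouldn’t produce a larger file than -1, although on small or repetitive inputs like this one the difference can be marginal.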

Uncompressing GZIPPED files

As we’ve seen, UTL_COMPRESS can produce files that are compatible with GZIP. But can it read files that have been compressed using GZIP ?

Let’s take a look :

gzip captains_log.txt
ls -l captains_log.txt.gz 
-rw-rw-rw- 1 oracle oinstall 561 Apr 27 18:00 captains_log.txt.gz

Want to read the Captain’s log ?

set serverout on size unlimited
clear screen
declare
    l_dir all_directories.directory_name%type := 'APP_FILES';
    fname varchar2(100) := 'captains_log.txt.gz';
    zipped_file bfile;
    v_unzipped blob := empty_blob();
    v_text clob;
    
    v_dest_offset integer := 1;
    v_src_offset integer := 1;
    v_lang integer := dbms_lob.default_lang_ctx;
    v_warn integer;
begin
    zipped_file := bfilename( l_dir, fname);
    dbms_lob.fileopen(zipped_file, dbms_lob.file_readonly);

    v_unzipped := utl_compress.lz_uncompress( zipped_file);
    dbms_lob.fileclose(zipped_file);
    -- Now we have the uncompressed contents of the file, let's print it out...
    
    dbms_lob.createtemporary( v_text, true);
    
    dbms_lob.converttoclob(
        dest_lob => v_text,
        src_blob => v_unzipped,
        amount => dbms_lob.lobmaxsize,
        dest_offset => v_dest_offset,
        src_offset => v_src_offset,
        blob_csid => dbms_lob.default_csid,
        lang_context => v_lang,
        warning => v_warn);
   
    dbms_output.put_line(v_text);     

    -- Tidy up the temporary LOBs
    dbms_lob.freetemporary( v_text);
    dbms_lob.freetemporary( v_unzipped);
end;
/

Run this and Kirk’s inner-most thoughts (or possibly the lyrics to Star Trekkin’ by The Firm) are there for all to see…
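For completeness, the command-line equivalent of reading the log without disturbing the archive is gunzip -c (also available as zcat), which decompresses to standard output and leaves the .gz file in place. A stand-in example (the real log’s contents are Kirk’s business) :

```shell
# Create a stand-in log entry and compress it
printf 'Captains Log, stardate 4523.3. An example entry.\n' > captains_log.txt
gzip -f captains_log.txt

# -c sends the uncompressed text to stdout; captains_log.txt.gz stays on disk
gunzip -c captains_log.txt.gz
```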

Acknowledgements

Despite appearances, I do actually do some research before kicking out this stuff.
On this occasion, the following links were of some assistance :
