Lempel-Ziv. Sounds like something you might find on a starship. I mean, you can just imagine Scotty telling Kirk that they can’t go to warp because the Lempel-Ziv compressors are offline.
In fact, the Lempel-Ziv algorithm is what GNU’s Gzip program uses to compress files.
Fortunately, the UTL_COMPRESS package employs the same algorithm, which means that you can create archives using one tool that can be manipulated by the other…
Starfleet HR have generated a delimited file of their Captains from their Oracle database for loading into their data mart, which runs on a different server ( and doesn’t use Oracle).
This file needs to be compressed to make the transfer as efficient as possible.
At present, the file looks like this on the database server :
ls -l starfleet_captains.csv -rw-r--r-- 1 oracle oinstall 343 Apr 26 15:49 starfleet_captains.csv
Compressing the file
Using UTL_COMPRESS, we can easily create a .gz archive from the file…
declare l_dir all_directories.directory_name%type := 'APP_FILES'; fname varchar2(100) := 'starfleet_captains.csv'; src_file bfile; lz_compressed blob; lz_compressed_len integer; fh utl_file.file_type; v_buffer raw(32767); buf_size pls_integer := 32767; v_pos integer := 1; begin src_file := bfilename( l_dir, fname); dbms_lob.fileopen(src_file, dbms_lob.file_readonly); lz_compressed := utl_compress.lz_compress( src_file, 9); lz_compressed_len := dbms_lob.getlength( lz_compressed); fh := utl_file.fopen( l_dir, fname||'.gz', 'wb'); while v_pos < lz_compressed_len loop dbms_lob.read( lz_compressed, buf_size, v_pos, v_buffer); utl_file.put_raw( fh, v_buffer, true); v_pos := v_pos + buf_size; end loop; utl_file.fclose(fh); -- Now need to delete original utl_file.fremove(l_dir, fname); end; /
The LZ_COMPRESS function accepts two arguments. The first is the file we’re compressing.
The second is the “quality” – a number between 1 and 9 with 1 being the fastest execution, but least compression, and 9 being the greatest compression and the slowest execution.
We’ll come back to that shortly.
In the meantime, we can see that our delimited file has now been replaced with a compressed file…
ls -l starfleet_captains* -rw-r--r-- 1 oracle oinstall 242 Apr 26 15:58 starfleet_captains.csv.gz
…which can now be manipulated using gzip…
gzip -l starfleet_captains.csv.gz compressed uncompressed ratio uncompressed_name 265 343 34.7% starfleet_captains.csv
…and indeed, gunzip…
gunzip starfleet_captains.csv.gz cat starfleet_captains.csv CAPTAIN|SHIP Archer|Enterprise NX-01 Hernandez|Columbia NX-02 Decker|USS Enterprise Georgiou|USS Shenzou Kirk|USS Enterprise Lorca|USS Discovery Pike|USS Enterprise Styles|USS Excelsior Terrell|USS Reliant Tracey|USS Exeter Adams|USS Heart of Gold Janeway|USS Voyager Picard|USS Stargazer Sisko|USS Defiant Thomas|USS Puget Sound ROW_COUNT|15
Incidentally, some of this data does raise questions.
For example, what’s so special about Puget Sound that it gets a Starship as well as an Oracle User Group ?
Also, was Janeway overlooked for the command of the Heart of Gold because she’s not a tea drinker ?
Full disclosure : that wasn’t my geeky reference to Hitchicker’s Guide to the Galaxy. I got these ships and captains from a Star Trek Wiki.
Coming back to more immediate questions, how does UTL_COMPRESS perform with different compression settings for larger files…
Performance test
Let’s generate a more substantial file (using William Robertson’s CSV package)…
declare rc sys_refcursor; begin open rc for select rownum as id, case mod( rownum, 8) when 1 then 'Archer' when 2 then 'Pyke' when 3 then 'Georgiou' when 4 then 'Lorca' when 5 then 'Kirk' when 6 then 'Picard' when 7 then 'Sisko' when 0 then 'Janeway' end as captain, case mod(rownum, 8) when 1 then 'Enterprise NX-01' when 2 then 'USS Enterprise' when 3 then 'USS Shenzou' when 4 then 'USS Discovery' when 5 then 'USS Enterprise' when 6 then 'USS Stargazer' when 7 then 'USS Defiant' when 0 then 'USS Voyager' end as ship from dual connect by rownum < 1000000; csv.write_file( p_dataset => rc, p_separator => chr(124), p_heading => 'Y', p_rowcount => 'Y', p_directory => 'APP_FILES', p_filename => 'famous_federation_captains.csv'); end; /
If we look at the file on disk, we can see that it’s hefty enough for our purposes :
ls -lh famous_federation_captains.csv -rw-r--r-- 1 oracle oinstall 27M Apr 27 17:31 famous_federation_captains.csv
To start with, let’s try compressing this using the default value for quality (6) :
set timing on set feedback on declare l_dir all_directories.directory_name%type := 'APP_FILES'; fname varchar2(100) := 'famous_federation_captains.csv'; src_file bfile; lz_compressed blob; lz_compressed_len integer; fh utl_file.file_type; v_buffer raw(32767); buf_size pls_integer := 32767; v_pos integer := 1; begin src_file := bfilename( l_dir, fname); dbms_lob.fileopen(src_file, dbms_lob.file_readonly); lz_compressed := utl_compress.lz_compress( src_file); lz_compressed_len := dbms_lob.getlength( lz_compressed); fh := utl_file.fopen( l_dir, fname||'.gz', 'wb'); while v_pos < lz_compressed_len loop dbms_lob.read( lz_compressed, buf_size, v_pos, v_buffer); utl_file.put_raw( fh, v_buffer, true); v_pos := v_pos + buf_size; end loop; utl_file.fclose(fh); -- Now need to delete original utl_file.fremove(l_dir, fname); end; /
The output is :
PL/SQL procedure successfully completed. Elapsed: 00:00:00.329
If we examing the compressed file we can see that the compression ratio is around 90% :
ls -lh famous_federation_captains.csv.gz -rw-r--r-- 1 oracle oinstall 2.7M Apr 27 17:37 famous_federation_captains.csv.gz gzip -l famous_federation_captains.csv.gz compressed uncompressed ratio uncompressed_name 2779405 27388901 89.9% famous_federation_captains.csv
Trying this with the quality value set to the minimum (1) :
... lz_compressed := utl_compress.lz_compress( src_file, 1); ...
…results in a faster runtime…
PL/SQL procedure successfully completed. Elapsed: 00:00:00.202
…but a marginally bigger file…
ls -lh famous_federation_captains.csv.gz -rw-r--r-- 1 oracle oinstall 2.9M Apr 27 17:43 famous_federation_captains.csv.gz gzip -l famous_federation_captains.csv.gz compressed uncompressed ratio uncompressed_name 2994045 27388901 89.1% famous_federation_captains.csv
Finally, let’s go for the maximum quality setting (9) :
... lz_compressed := utl_compress.lz_compress( src_file, 9); ...
The runtime is a little longer…
PL/SQL procedure successfully completed. Elapsed: 00:00:00.393
…and the resulting file is more or less the same as for the default :
ls -lh famous_federation_captains.csv.gz -rw-r--r-- 1 oracle oinstall 2.7M Apr 27 17:47 famous_federation_captains.csv.gz gzip -l famous_federation_captains.csv.gz compressed uncompressed ratio uncompressed_name 2762750 27388901 89.9% famous_federation_captains.csv
Obviously, this is a limited test. I’m running Oracle 18cXE on CentOS. However, it may become a consideration for you if you’re either tight on space or need to speed up your compression job.
Uncompressing GZIPPED files
As we’ve seen, UTL_COMPRESS can produce files that are compatible with GZIP. But can it read files that have been compressed using GZIP ?
Let’s take a look :
gzip captains_log.txt ls -l captains_log.txt.gz -rw-rw-rw- 1 oracle oinstall 561 Apr 27 18:00 captains_log.txt.gz
Want to read the Captain’s log ?
set serverout on size unlimited clear screen declare l_dir all_directories.directory_name%type := 'APP_FILES'; fname varchar2(100) := 'captains_log.txt.gz'; zipped_file bfile; v_unzipped blob := empty_blob(); v_text clob; v_dest_offset integer := 1; v_src_offset integer := 1; v_lang integer := dbms_lob.default_lang_ctx; v_warn integer; begin zipped_file := bfilename( l_dir, fname); dbms_lob.fileopen(zipped_file, dbms_lob.file_readonly); v_unzipped := utl_compress.lz_uncompress( zipped_file); dbms_lob.fileclose(zipped_file); -- Now we have the uncompressed contents of the file, let's print it out... dbms_lob.createtemporary( v_text, true); dbms_lob.converttoclob( dest_lob => v_text, src_blob => v_unzipped, amount => dbms_lob.lobmaxsize, dest_offset => v_dest_offset, src_offset => v_src_offset, blob_csid => dbms_lob.default_csid, lang_context => v_lang, warning => v_warn); dbms_output.put_line(v_text); dbms_lob.freetemporary( v_text); end; /
Run this and Kirk’s inner-most thoughts (or possibly the lyrics to Star Trekking by The Firm) are there for all to see…
Acknowledgements
Despite appearances, I do actually do some research before kicking out this stuff.
On this occasion, the following links were of some assistance :