Generating an md5sum on Oracle Database LOBs – or how to organise your Holiday Snaps

Following the trend in these straightened times, Deb and I decided to stay at home this year rather than going away on holiday.
I say “decided”, but this was really more due to the fact we were terribly grown up and bought a house last year.
As a result, the only recreation we could afford was a walk around the garden…whilst pushing a lawn-mower.
In an attempt to recall happier times, I’ve had a look back of some of the photos from our last proper holiday, in Canada.
As well as providing some happy memories, this also gives me the opportunity to explore how to compare an operating system file ( such as a jpeg) with a LOB held in the database.

Before diving into the technical stuff, here are a couple of pics that I’m going to use in the examples that follow :

How did I end up married to such a gorgeous creature ?

How did I end up married to such a gorgeous creature ?

...my innate sense of style and acute fashion-sense, of course !

…my innate sense of style and acute fashion-sense, of course !


All I’ll say in my defence is that these disposable yellow cagouls are still all the rage. If you ever go to the caves behind Niagara Falls, you’ll see they are what everyone is wearing.

md5sum in Linux

If you want to find out whether two binary files on a linux os are identical, one way is to use md5sum :

$ md5sum mike_yellow.jpg
4009fc1cf89609d7d4fd5dffe70b28d3  mike_yellow.jpg

Wouldn’t it be good if you could just run md5sum in the database for your LOBs ? Well, funny you should say that…

The BLOB example

To illustrate the point, we’re going to create a table to hold our jpegs…

CREATE TABLE holiday_snaps(
    file_name VARCHAR2(50),
    picture BLOB)
/

We need to define a directory in Oracle from where the files will be loaded into the database…

CREATE DIRECTORY my_files AS '/u01/app/oracle/my_files'
/

Finally, we need some code to load binary files into the table.

CREATE OR REPLACE PROCEDURE load_pics_pr( 
    i_dir VARCHAR2, i_fname VARCHAR2)
IS
    l_file BFILE;
    l_dest_loc BLOB;
BEGIN
    l_file := BFILENAME( i_dir, i_fname);
    --
    -- Create the record and get the locator
    --
    INSERT INTO holiday_snaps( file_name, picture)
    VALUES( i_fname, EMPTY_BLOB())
    RETURNING picture INTO l_dest_loc;
    --
    -- Open the file
    --
    DBMS_LOB.OPEN(l_file, DBMS_LOB.LOB_READONLY);
    --
    -- Now load the file into the table
    --
    DBMS_LOB.LOADFROMFILE ( 
        dest_lob => l_dest_loc,
        src_lob => l_file,
        amount => DBMS_LOB.GETLENGTH(l_file) );
    --
    -- tidy up
    --
    DBMS_LOB.CLOSE(l_file);
END load_pics_pr;
/

Now, we copy the files to the directory we’ve specified…

sudo cp *.jpg /u01/app/oracle/my_files/.

…and call the load procedure…

BEGIN
    load_pics_pr( i_dir => 'MY_FILES', i_fname => 'deb_canada.jpg');
    load_pics_pr( i_dir => 'MY_FILES', i_fname => 'mike_yellow.jpg');
END;
/
COMMIT;

We now have two rows in our table…

SELECT file_name 
FROM holiday_snaps
/

FILE_NAME
--------------------
deb_canada.jpg
mike_yellow.jpg

Sometime later, we stumble across a couple of files somewhere on the disk. Are these the same as the ones we have in the database ? How can we tell ?

Generate an MD5SUM for a BLOB

The answer is, with a bit of code like this….

CREATE OR REPLACE FUNCTION get_md5sum_blob_fn( i_blob IN BLOB)
    RETURN RAW
IS
BEGIN
    RETURN
        DBMS_CRYPTO.HASH
        (
            src => i_blob,
            typ => DBMS_CRYPTO.HASH_MD5
        );
END;
/

NOTE – the owner of this function will need EXECUTE on the DBMS_CRYPTO package.

Using this function, we can generate md5sums for our BLOBS and then compare them with the md5sums of the files on disk…

SELECT file_name, get_md5sum_blob_fn(picture) md5sum
FROM holiday_snaps
/
FILE_NAME	     MD5SUM
-------------------- ----------------------------------------
deb_canada.jpg	     34730B6E7CE47F023FEB302E12F15DFD
mike_yellow.jpg      4009FC1CF89609D7D4FD5DFFE70B28D3

If we now get the md5sums for the files, we can see that they match :

$ md5sum *.jpg
34730b6e7ce47f023feb302e12f15dfd  deb_canada.jpg
4009fc1cf89609d7d4fd5dffe70b28d3  mike_yellow.jpg

With some slight modifications, we can do the same for CLOBS…

The CLOB example

This requires a similar setup as for BLOBs. We’ll use the same directory that we created earlier.
The table will look something like this :

CREATE TABLE holiday_plans(
    file_name VARCHAR2(50),
    file_contents CLOB)
/

The file we’re going to load is called hols.txt and contains the following :

We're all going on a summer holiday !

With the file copied to the directory, we now need the load procedure :

CREATE OR REPLACE PROCEDURE load_holiday_plans_pr( 
    i_dir VARCHAR2, i_fname VARCHAR2)
IS
    l_file BFILE;
    l_dest_loc CLOB;
    l_dest_off NUMBER(38) := 1;
    l_src_off NUMBER(38) := 1;
    l_context NUMBER(38) := DBMS_LOB.DEFAULT_LANG_CTX;
    l_warning NUMBER(38);
    l_length NUMBER;
    l_csid NUMBER := DBMS_LOB.DEFAULT_CSID;
BEGIN
    l_file := BFILENAME( i_dir, i_fname);
    --
    -- Create the record and get the locator
    --
    INSERT INTO holiday_plans( file_name, file_contents)
    VALUES( i_fname, EMPTY_CLOB())
    RETURNING file_contents INTO l_dest_loc;
    --
    -- Open the file
    --
    DBMS_LOB.OPEN(l_file, DBMS_LOB.LOB_READONLY);
    --
    -- Now load the file into the table
    --
    l_length := DBMS_LOB.GETLENGTH(l_file);
    DBMS_LOB.LOADCLOBFROMFILE ( 
        dest_lob => l_dest_loc,
        src_bfile => l_file,
        amount => l_length,
        dest_offset => l_dest_off,
        src_offset => l_src_off,
        bfile_csid => l_csid,
        lang_context => l_context,
        warning => l_warning);   
    --
    -- tidy up
    --
    DBMS_LOB.CLOSE(l_file);
END load_holiday_plans_pr;
/

We run this …

BEGIN
    load_holiday_plans_pr(i_dir => 'MY_FILES', i_fname => 'hols.txt');
END;
/

COMMIT;

SQL> select file_name from holiday_plans;

FILE_NAME
--------------------------------------------------
hols.txt

SQL> 

The checksum generation code for CLOBS is almost identical it’s BLOB counterpart…

CREATE OR REPLACE FUNCTION get_md5sum_clob_fn( i_clob IN CLOB)
    RETURN RAW
IS
BEGIN
    RETURN
        DBMS_CRYPTO.HASH
        (
            src => UTL_RAW.CAST_TO_RAW(i_clob),
            typ => DBMS_CRYPTO.HASH_MD5
        );
END;
/

Run this and we get…

SELECT file_name, get_md5sum_fn(file_contents) 
FROM holiday_plans
WHERE file_name = 'hols.txt';

FILE_NAME	     GET_MD5SUM_FN(FILE_CONTENTS)
-------------------- ----------------------------------------
hols.txt	     A88DA5D09E0993D2DC54CDC6DF8AA833

…which matches the md5sum output on Linux for the file …

 $ md5sum hols.txt
a88da5d09e0993d2dc54cdc6df8aa833  hols.txt

You could put the comparison in a shell script to save you the bother of checking all 32 characters match :

#!/bin/sh
#
# Script to compare a jpeg on the os with one stored in the 
# HOLIDAY_SNAPS table.
#
# usage is :
# compfiles.sh file_name you want to compare
# Assumption : file exists in directory we're running in
#
echo 'Enter database connect string : '
read conn_str
osfile=$1
oscksum=`md5sum $1|awk '{print$1}'`
msg=`sqlplus -s $conn_str <<-END_SCRIPT
    set serveroutput on
    set feedback off
    DECLARE 
        l_dummy PLS_INTEGER;
    BEGIN
        SELECT 1
        INTO l_dummy 
        FROM holiday_snaps
        WHERE file_name = '$1'
        AND get_md5sum_blob_fn(picture) = UPPER('$oscksum');
        DBMS_OUTPUT.PUT_LINE('File is in the database');
    EXCEPTION WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('Checksums differ.');
    END;
    /
    quit
END_SCRIPT`
echo $msg
exit 0

Whilst sorting through these pictures, I have uncovered something which may offer an alternative explanation to why I’m so irresistible. I think the file is called beer_goggles.jpg…

another town, another bar bill ( I had the diet coke).

another town, another bar bill ( I had the diet coke).

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s