Generating an md5sum on Oracle Database LOBs – or how to organise your Holiday Snaps

Following the trend in these straightened times, Deb and I decided to stay at home this year rather than going away on holiday.
I say “decided”, but this was really more due to the fact we were terribly grown up and bought a house last year.
As a result, the only recreation we could afford was a walk around the garden…whilst pushing a lawn-mower.
In an attempt to recall happier times, I’ve had a look back of some of the photos from our last proper holiday, in Canada.
As well as providing some happy memories, this also gives me the opportunity to explore how to compare an operating system file ( such as a jpeg) with a LOB held in the database.

Before diving into the technical stuff, here are a couple of pics that I’m going to use in the examples that follow :

How did I end up married to such a gorgeous creature ?

How did I end up married to such a gorgeous creature ?

...my innate sense of style and acute fashion-sense, of course !

…my innate sense of style and acute fashion-sense, of course !


All I’ll say in my defence is that these disposable yellow cagouls are still all the rage. If you ever go to the caves behind Niagara Falls, you’ll see they are what everyone is wearing.

md5sum in Linux

If you want to find out whether two binary files on a linux os are identical, one way is to use md5sum :

$ md5sum mike_yellow.jpg
4009fc1cf89609d7d4fd5dffe70b28d3  mike_yellow.jpg

Wouldn’t it be good if you could just run md5sum in the database for your LOBs ? Well, funny you should say that…

The BLOB example

To illustrate the point, we’re going to create a table to hold our jpegs…

CREATE TABLE holiday_snaps(
    file_name VARCHAR2(50),
    picture BLOB)
/

We need to define a directory in Oracle from where the files will be loaded into the database…

CREATE DIRECTORY my_files AS '/u01/app/oracle/my_files'
/

Finally, we need some code to load binary files into the table.

CREATE OR REPLACE PROCEDURE load_pics_pr( 
    i_dir VARCHAR2, i_fname VARCHAR2)
IS
    l_file BFILE;
    l_dest_loc BLOB;
BEGIN
    l_file := BFILENAME( i_dir, i_fname);
    --
    -- Create the record and get the locator
    --
    INSERT INTO holiday_snaps( file_name, picture)
    VALUES( i_fname, EMPTY_BLOB())
    RETURNING picture INTO l_dest_loc;
    --
    -- Open the file
    --
    DBMS_LOB.OPEN(l_file, DBMS_LOB.LOB_READONLY);
    --
    -- Now load the file into the table
    --
    DBMS_LOB.LOADFROMFILE ( 
        dest_lob => l_dest_loc,
        src_lob => l_file,
        amount => DBMS_LOB.GETLENGTH(l_file) );
    --
    -- tidy up
    --
    DBMS_LOB.CLOSE(l_file);
END load_pics_pr;
/

Now, we copy the files to the directory we’ve specified…

sudo cp *.jpg /u01/app/oracle/my_files/.

…and call the load procedure…

BEGIN
    load_pics_pr( i_dir => 'MY_FILES', i_fname => 'deb_canada.jpg');
    load_pics_pr( i_dir => 'MY_FILES', i_fname => 'mike_yellow.jpg');
END;
/
COMMIT;

We now have two rows in our table…

SELECT file_name 
FROM holiday_snaps
/

FILE_NAME
--------------------
deb_canada.jpg
mike_yellow.jpg

Sometime later, we stumble across a couple of files somewhere on the disk. Are these the same as the ones we have in the database ? How can we tell ?

Generate an MD5SUM for a BLOB

The answer is, with a bit of code like this….

CREATE OR REPLACE FUNCTION get_md5sum_blob_fn( i_blob IN BLOB)
    RETURN RAW
IS
BEGIN
    RETURN
        DBMS_CRYPTO.HASH
        (
            src => i_blob,
            typ => DBMS_CRYPTO.HASH_MD5
        );
END;
/

NOTE – the owner of this function will need EXECUTE on the DBMS_CRYPTO package.

Using this function, we can generate md5sums for our BLOBS and then compare them with the md5sums of the files on disk…

SELECT file_name, get_md5sum_blob_fn(picture) md5sum
FROM holiday_snaps
/
FILE_NAME	     MD5SUM
-------------------- ----------------------------------------
deb_canada.jpg	     34730B6E7CE47F023FEB302E12F15DFD
mike_yellow.jpg      4009FC1CF89609D7D4FD5DFFE70B28D3

If we now get the md5sums for the files, we can see that they match :

$ md5sum *.jpg
34730b6e7ce47f023feb302e12f15dfd  deb_canada.jpg
4009fc1cf89609d7d4fd5dffe70b28d3  mike_yellow.jpg

With some slight modifications, we can do the same for CLOBS…

The CLOB example

This requires a similar setup as for BLOBs. We’ll use the same directory that we created earlier.
The table will look something like this :

CREATE TABLE holiday_plans(
    file_name VARCHAR2(50),
    file_contents CLOB)
/

The file we’re going to load is called hols.txt and contains the following :

We're all going on a summer holiday !

With the file copied to the directory, we now need the load procedure :

CREATE OR REPLACE PROCEDURE load_holiday_plans_pr( 
    i_dir VARCHAR2, i_fname VARCHAR2)
IS
    l_file BFILE;
    l_dest_loc CLOB;
    l_dest_off NUMBER(38) := 1;
    l_src_off NUMBER(38) := 1;
    l_context NUMBER(38) := DBMS_LOB.DEFAULT_LANG_CTX;
    l_warning NUMBER(38);
    l_length NUMBER;
    l_csid NUMBER := DBMS_LOB.DEFAULT_CSID;
BEGIN
    l_file := BFILENAME( i_dir, i_fname);
    --
    -- Create the record and get the locator
    --
    INSERT INTO holiday_plans( file_name, file_contents)
    VALUES( i_fname, EMPTY_CLOB())
    RETURNING file_contents INTO l_dest_loc;
    --
    -- Open the file
    --
    DBMS_LOB.OPEN(l_file, DBMS_LOB.LOB_READONLY);
    --
    -- Now load the file into the table
    --
    l_length := DBMS_LOB.GETLENGTH(l_file);
    DBMS_LOB.LOADCLOBFROMFILE ( 
        dest_lob => l_dest_loc,
        src_bfile => l_file,
        amount => l_length,
        dest_offset => l_dest_off,
        src_offset => l_src_off,
        bfile_csid => l_csid,
        lang_context => l_context,
        warning => l_warning);   
    --
    -- tidy up
    --
    DBMS_LOB.CLOSE(l_file);
END load_holiday_plans_pr;
/

We run this …

BEGIN
    load_holiday_plans_pr(i_dir => 'MY_FILES', i_fname => 'hols.txt');
END;
/

COMMIT;

SQL> select file_name from holiday_plans;

FILE_NAME
--------------------------------------------------
hols.txt

SQL> 

The checksum generation code for CLOBS is almost identical it’s BLOB counterpart…

CREATE OR REPLACE FUNCTION get_md5sum_clob_fn( i_clob IN CLOB)
    RETURN RAW
IS
BEGIN
    RETURN
        DBMS_CRYPTO.HASH
        (
            src => UTL_RAW.CAST_TO_RAW(i_clob),
            typ => DBMS_CRYPTO.HASH_MD5
        );
END;
/

Run this and we get…

SELECT file_name, get_md5sum_fn(file_contents) 
FROM holiday_plans
WHERE file_name = 'hols.txt';

FILE_NAME	     GET_MD5SUM_FN(FILE_CONTENTS)
-------------------- ----------------------------------------
hols.txt	     A88DA5D09E0993D2DC54CDC6DF8AA833

…which matches the md5sum output on Linux for the file …

 $ md5sum hols.txt
a88da5d09e0993d2dc54cdc6df8aa833  hols.txt

You could put the comparison in a shell script to save you the bother of checking all 32 characters match :

#!/bin/sh
#
# Script to compare a jpeg on the os with one stored in the 
# HOLIDAY_SNAPS table.
#
# usage is :
# compfiles.sh file_name you want to compare
# Assumption : file exists in directory we're running in
#
echo 'Enter database connect string : '
read conn_str
osfile=$1
oscksum=`md5sum $1|awk '{print$1}'`
msg=`sqlplus -s $conn_str <<-END_SCRIPT
    set serveroutput on
    set feedback off
    DECLARE 
        l_dummy PLS_INTEGER;
    BEGIN
        SELECT 1
        INTO l_dummy 
        FROM holiday_snaps
        WHERE file_name = '$1'
        AND get_md5sum_blob_fn(picture) = UPPER('$oscksum');
        DBMS_OUTPUT.PUT_LINE('File is in the database');
    EXCEPTION WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('Checksums differ.');
    END;
    /
    quit
END_SCRIPT`
echo $msg
exit 0

Whilst sorting through these pictures, I have uncovered something which may offer an alternative explanation to why I’m so irresistible. I think the file is called beer_goggles.jpg…

another town, another bar bill ( I had the diet coke).

another town, another bar bill ( I had the diet coke).

Advertisements

3 thoughts on “Generating an md5sum on Oracle Database LOBs – or how to organise your Holiday Snaps

  1. Hi, I am getting different results from DBMS_CRYPTO.HASH using MD5. My table is sequence and my clob column is sequence. I am using your code. My database is Oracle 11g and the character set is AL32UTF8. I changed the name of your function slightly. Here is my query. My table name is sequence and the sequence column is type clob. The md5 column is set using our code is RAW(32).

    select GET_MD5_FROM_CLOB_FN( sequence )
    from sequence
    where GET_MD5_FROM_CLOB_FN( sequence ) in (select md5sum from sequence);

    Thanks,

    David

  2. Clarification:

    select GET_MD5_FROM_CLOB_FN( sequence )
    from sequence
    where GET_MD5_FROM_CLOB_FN( sequence ) in (select md5sum from sequence);

    No rows selected.

    • David,

      as far as I can work out, I think you are saying your DDL looks like this :

      create table sequence
      (
          sequence clob,
          md5sum raw(32)
      )    
      /
      
      
      create or replace function get_md5_from_clob_fn( i_clob IN CLOB)
          return raw
      is
      begin
          return dbms_crypto.hash(utl_raw.cast_to_raw(i_clob), dbms_crypto.hash_md5);
      end get_md5_from_clob_fn;
      /
      

      As an aside, I’d suggest you may want to rename the table and the column as using keywords to name database objects usually ends in unexpected behaviour/general confusion/both.

      Anyhow, I’ve not been able to replicate your issue.
      I inserted a record :

      insert into sequence( sequence, md5sum)
      values('I have a dog called Cat', get_md5_from_clob_fn('I have a dog called Cat'))
      /
      
      commit;
      

      Then ran…

      
      select sequence, md5sum, get_md5_from_clob_fn(sequence) as Function_value
      from sequence
      /
      
      
      SEQUENCE                       MD5SUM                           FUNCTION_VALUE                  
      ------------------------------ -------------------------------- --------------------------------
      I have a dog called Cat        75E44A8646D81FB24C9CFC3A1405670B 75E44A8646D81FB24C9CFC3A1405670B
      
      

      I also tried :

      select *
      from sequence
      where md5sum = get_md5_from_clob_fn(sequence)
      /
      
      
      SEQUENCE                       MD5SUM                          
      ------------------------------ --------------------------------
      I have a dog called Cat        75E44A8646D81FB24C9CFC3A1405670B
      
      

      …as well as you’re suggestion …

      select GET_MD5_FROM_CLOB_FN( sequence )
      from sequence
      where GET_MD5_FROM_CLOB_FN( sequence ) in (select md5sum from sequence)
      /
      
      GET_MD5_FROM_CLOB_FN(SEQUENCE)
      ---------------------------------
      75E44A8646D81FB24C9CFC3A1405670B
      
      
      

      I ran these tests on Oracle 11g :

      select banner 
      from v$version
      where banner like 'Oracle Database %'
      /
      
      BANNER                                                                          
      --------------------------------------------------------------------------------
      Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
      
      

      …using the same character set as you…

      select sys_context('userenv', 'language') 
      from dual
      /
      
      SYS_CONTEXT('USERENV','LANGUAGE')
      -------------------------------
      ENGLISH_UNITED KINGDOM.AL32UTF8
      

      I’m not too sure what to suggest beyond what you may well have done already :

      check to make sure that the checksum is being correctly inserted into the table and that your insert transactions are committed
      check there’s nowhere in your code where you are assuming that the checksum characters will be in lower case

      Failing that, can you please post the following to allow me to dig a bit further :

      The script you used to create your table
      The code for your function
      The code to insert records into the table
      A query returning all of the columns in the table and the md5sum returned by the function, together with the output

      HTH,

      MIke

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.