Getting a File Listing from a Directory in PL/SQL

It’s General Election time here in the UK.
Rather than the traditional two-way fight to form a government, this time around we seem to have a reasonably broad range of choice.
In addition to red and blue, we also have purple and – depending on where you live in the country, multiple shades of yellow and green.
The net effect is to leave the political landscape looking not so much like a rainbow as a nasty bruise.

The message coming across from the politicians is that everything that’s wrong in this country is down to foreigners – Eastern Europeans…or English (once again, depending on your location).
Strangely, the people who’ve been running our economy and public services for the last several years tend not to get much of a mention.
Whatever we end up choosing, our ancient electoral system is not set up to cater for so many parties attracting a significant share of support.

The resulting wrangling to cobble together a Coalition Government will be hampered somewhat by our – equally ancient – constitution.

That’s largely because, since Magna Carta, no-one’s bothered to write it down.

In olden times, if you wanted to find out what files were in a directory from inside the database, you’re options were pretty undocumented as well.
Fortunately, times have changed…

What I’m going to cover here is how to use an External Table pre-process to retrieve a file listing from a directory from inside the database.
Whilst this technique will work on any platform, I’m going to focus on Linux in the examples that follow…

An Overview of the Solution

First of all we’re going to write a shell script to give us a file listing on Linux.
This will include all of the details supplied by ls -l listing, in a comma delimited format.
The directory we’re going to get a list of files for is the DATA_PUMP_DIR.
Then we’ll create an External Table to read the output of the script.
Here goes then….

Getting a useable file listing on Linux

The first thing we need here is a shell script for the external table to execute in it’s pre-processor directive.
The output needs to be in a format that can be easily loaded by the external table.

Let’s have a look then…

/u01/app/oracle/admin/XE/dpdump $ ls -l
total 322320
-rw-r--r-- 1 oracle dba     87816 Aug 22  2014 export.log
-rw-r--r-- 1 oracle dba         0 Apr 16 13:13 file_list_dummy.txt
-rw-r----- 1 oracle dba 327385088 Aug 22  2014 full220814.dmp
-rw-r--r-- 1 oracle dba      2889 Jan  9 15:29 hr_dev_imp.log
-rw-r----- 1 oracle dba    524288 Jan  9 15:16 hr_export.dmp
-rw-r--r-- 1 oracle dba      2171 Jan  9 15:16 hr_export.log
-rw-r----- 1 oracle dba   1560576 Feb 12 11:46 ossca_pre_050_export.dmp
-rw-r--r-- 1 oracle dba      3331 Feb 12 11:46 ossca_pre_050_export.log
-rw-r----- 1 oracle dba    462848 Mar  5 13:44 test_schemas.dmp
-rw-r--r-- 1 oracle dba      1759 Mar  5 13:44 test_schemas_exp.log

There are a few things to note from here.
Firstly, there are up to eight distinct columns in the output. You’ll notice that files over 6 months old are reported with a date and then a year. Newer files just have a date.

Secondly, the widths of the fields themselves vary in length between files, most notably the file sizes.
We need to tidy up this output a bit if we’re going to be able to load it easily into the External Table.

After some trial and error, it looks like the following will meet our needs :

ls -l --time-style=long-iso | awk 'BEGIN {OFS = ",";} {print $1, $2, $3, $4, $5, $6" "$7, $8}'
total,322320,,,, ,
-rw-r--r--,1,oracle,dba,87816,2014-08-22 13:30,export.log
-rw-r--r--,1,oracle,dba,0,2015-04-16 13:13,file_list_dummy.txt
-rw-r-----,1,oracle,dba,327385088,2014-08-22 13:30,full220814.dmp
-rw-r--r--,1,oracle,dba,2889,2015-01-09 15:29,hr_dev_imp.log
-rw-r-----,1,oracle,dba,524288,2015-01-09 15:16,hr_export.dmp
-rw-r--r--,1,oracle,dba,2171,2015-01-09 15:16,hr_export.log
-rw-r-----,1,oracle,dba,1560576,2015-02-12 11:46,ossca_pre_050_export.dmp
-rw-r--r--,1,oracle,dba,3331,2015-02-12 11:46,ossca_pre_050_export.log
-rw-r-----,1,oracle,dba,462848,2015-03-05 13:44,test_schemas.dmp
-rw-r--r--,1,oracle,dba,1759,2015-03-05 13:44,test_schemas_exp.log

The Shell Script

There are a few things we need to bear in mind for this script.
As it’s going to be called from the external table, we cannot assume that any environment variables have been set (even $PATH).
Therefore, the commands need to be called using their absolute path, rather than just their name.

Additionally, the script will be passed the fully qualified path and filename of the location of our external table as the first argument. This is going to be useful as we’ll need to direct the shell script to the appropriate directory to list.
We can strip the filename from this argument and use the directory by using basename…

If you want to find out what the paths for these commands are ( and they can differ between Linux Distros), then you can do the following :

which ls
which awk
which basename

The final script looks like this :

# Generate a comma separated ls -l for the directory in which the external table's
# location file resides
targetDir=`/usr/bin/dirname $1`
/bin/ls -l --time-style=long-iso $targetDir | /usr/bin/awk 'BEGIN {OFS = ",";} {print $1, $2, $3, $4, $5, $6" "$7, $8}'
exit 0

I’ve saved this file as

Now for…

The External Table

The output we’ll be reading starts with the Total number of blocks in the directory, so we’ll want to skip that.
As for the files themselves, each file record will have the following attributes :

  • File Permissions
  • Number of Hard Links
  • File Owner
  • Group of the file owner
  • File size in bytes
  • Date the file was last modified
  • The file name

As we’re using a pre-processor in our external table, the usual rules will apply.

The External Table needs to have a location file specified at runtime, even though it won’t actually be reading the contents of that file in this instance.

So, our table will look something like this :

create table list_files_xt
    permissions varchar2(15),
    hard_links number,
    file_owner varchar2(32),
    group_name varchar2(32),
    size_bytes number,
    last_modified date,
    file_name varchar2(255)
    organization external
        type oracle_loader
        default directory data_pump_dir
        access parameters
            records delimited by newline
            preprocessor pre_proc_dir : ''            
            skip 1
            fields terminated by ','            
                last_modified date 'YYYY-MM-DD HH24:MI',

Deploying the Components

First, we need to put the script in place. To work out where, check the path of the directory object that we’re referencing in the External Table definition ( in this case, PRE_PROC_DIR).
Then we’ll need to create the external table’s dummy file in the directory specified as the Default Directory ( DATA_PUMP_DIR).

So, we need to know the OS location of these directories…

select directory_name, directory_path
from all_directories
where directory_name in ('PRE_PROC_DIR', 'DATA_PUMP_DIR')

--------------- ----------------------------------------
PRE_PROC_DIR	/u01/app/oracle/pre_proc
DATA_PUMP_DIR	/u01/app/oracle/admin/XE/dpdump/

Now to make sure that the shell script is in the right place and has the appropriate permissions :

sudo cp /u01/app/oracle/pre_proc/.
sudo chmod o+x /u01/app/oracle/pre_proc/

If we now check the permissions on the file, we can see that all users have execute :

ls -l /u01/app/oracle/pre_proc/
-rw-r--r-x 1 root root 281 Apr 23 13:02 /u01/app/oracle/pre_proc/

Next we need to create the dummy file for the External table. The oracle os user needs to have read and write permissions on this file.
The simplest way to achieve this is to create the file as the oracle os user :

sudo su oracle
touch /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt

Now we can see that we’ve got an empty file ….

ls -l /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt
-rw-r--r-- 1 oracle dba 0 Apr 23 13:05 /u01/app/oracle/admin/XE/dpdump/list_files_dummy.txt

Finally, we just need to login to the database and create the table using the script above.
Now let’s see what export files I’ve got :

select *
from list_files_xt
where file_name like '%.dmp'
order by last_modified

---------- ---------- ---------- ----------- ---------- ------------------ --------------------------------------------------
-rw-r-----          1 oracle     dba          327385088 22-AUG-14          full220814.dmp
-rw-r-----          1 oracle     dba             524288 09-JAN-15          hr_export.dmp
-rw-r-----          1 oracle     dba            1560576 12-FEB-15          ossca_pre_050_export.dmp
-rw-r-----          1 oracle     dba             462848 05-MAR-15          test_schemas.dmp

NOTE – no minorities were persecuted in the writing of this post.

5 thoughts on “Getting a File Listing from a Directory in PL/SQL

    • Matthias,

      Fair point – I had originally intended to include something about how you could dynamically change the external table default directory to use the same table to list multiple directories (assuming you had the appropriate “location” file in each directory. However, in the end I felt that it was better to keep things relatively simple in terms of this example.
      All of which just goes to show that I should really wait until I’ve finished writing the post before coming up with the title 🙂

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.