Dude, Where’s My File ? Finding External Table Files in the midst of (another) General Election

It’s early summer in the UK, which means it must be time for an epoch defining vote of some kind. No, I’m not talking about Britain’s Got Talent.
Having promised that there wouldn’t be another General Election until 2020, our political classes have now decided that they can’t go any longer without asking us what we think. Again.
Try as I might, it may not be possible to prevent the ear-worm phrases from the current campaign slipping into this post.
What I want to look at is how you can persuade Oracle to tell you the location on disk of any files associated with a given external table.
Specifically, I’ll be covering :

  • getting the name of the Database Server
  • finding the fully qualified path of the datafile the external table is pointing to
  • finding other files associated with the table, such as logfiles

In the course of this, we’ll be challenging the orthodoxy of Western Capitalism “If You Can Do It In SQL…” with the principle of DRY ( Don’t Repeat Yourself).
Hopefully I’ll be able to come up with a solution that is “Strong and Stable” and yet at the same time “Works For The Many, Not the Few”…

The Application

For the most part, I’ve written this code against Oracle 11g Express Edition. However, there are two versions of the final script, one of which is specifically for 12c. I’ll let you know which is which when we get there.

I have an external table which I use to load data from a csv file.

Initially, our application’s external table looks like this :

create table plebiscites_xt
(
    vote_year number(4),
    vote_name varchar2(100)
)
    organization external
    (
        type oracle_loader
        default directory my_files
        access parameters
        (
            records delimited by newline
            badfile 'plebiscites.bad'
            logfile 'plebiscites.log'
            skip 1
            fields terminated by ','
            (
                vote_year integer external(4),
                vote_name char(100)
            )
        )
            location('plebiscites.csv')
    )
reject limit unlimited
/

I’ve created the table in the MIKE schema.

The file that we’re currently loading – plebiscites.csv contains the following :

year,vote_name
2014,Scottish Independence Referendum
2015,UK General Election
2016,EU Referendum
2017,UK General Election

For the purposes of this exercise, I’ll assume that the file is uploaded frequently ( say once per day). I’ll also assume that there’s some ETL process that loads the data from the external table into a more permanent table elsewhere in the database.

As is the nature of this sort of ETL, there are times when it doesn’t quite work as planned.
This is when, equipped with just Read-Only access to production, you will need to diagnose the problem.

In these circumstances, just how do you locate any files that are associated with the external table?
Furthermore, how do you do this without having to create any database objects of your own ?

Finding the server that the files are on

There are a couple of ways to do this.
You could simply look in V$INSTANCE…

select host_name 
from v$instance
/

Alternatively…

select sys_context('userenv', 'server_host')
from dual
/

…will do the same job.
Either way, you should now have the name of the server that your database is running on and, therefore, the server from which the file in question will be visible.
Now to find the location of the data file itself…

Finding the datafile

In keeping with the current standard of public discourse, we’re going to answer the question “How do you find an External Table’s current Location File when not connected as the table owner” by answering a slightly different question ( i.e. as above but as the table owner)…

Our search is simplified somewhat by the fact that the location of any external table is held in the _EXTERNAL_LOCATIONS dictionary views :

select directory_owner, directory_name, location
from user_external_locations
where table_name = 'PLEBISCITES_XT'
/

With this information, we can establish the full path of the file by running…

select dir.directory_path||'/'||uel.location as xt_file
from user_external_locations uel
inner join all_directories dir
    on dir.owner = uel.directory_owner
    and dir.directory_name = uel.directory_name
where uel.table_name = 'PLEBISCITES_XT'
/

…which results in…

XT_FILE                                                                        
--------------------------------------------------------------------------------
/u01/app/oracle/my_files/plebiscites.csv                                        

This is all rather neat and simple. Unfortunately, our scenario of having to investigate an issue with the load is likely to take place in circumstances that render all of this of limited use, at best.

Remember, the scenario here is that we’re investigating an issue with the load on a production system. Therefore, it’s quite likely that we are connected as a user other than the application owner.
In my case, I’m connected as a user with CREATE SESSION and the LOOK_BUT_DONT_TOUCH role, which is created as follows :

create role look_but_dont_touch
/

grant select any dictionary to look_but_dont_touch
/

grant select_catalog_role to look_but_dont_touch
/

As well as the table’s data file, we’re going to want to look at any logfiles, badfiles and discardfiles associated with the table.

Finding other External Table files

At this point it’s worth taking a look at how we can find these additional files. Once again, we have two options.
First of all, we can simply check the table definition using DBMS_METADATA…

set long 5000
set pages 100
select dbms_metadata.get_ddl('TABLE', 'PLEBISCITES_XT', 'MIKE')
from dual
/

…alternatively, we can use the _EXTERNAL_TABLES to home in on the ACCESS_PARAMTERS defined for the table…

set long 5000
select access_parameters
from dba_external_tables
where owner = 'MIKE'
and table_name = 'PLEBISCITES_XT'
/

For our table as it’s currently defined, this query returns :

records delimited by newline
            badfile 'plebiscites.bad'
            logfile 'plebiscites.log'
            skip 1
            fields terminated by ','
            (
                vote_year integer external(4),
                vote_name char(100)
            )

In either case, we end up with a CLOB that we need to search to find the information we need.
To do this programatically, you may be tempted to follow the time-honoured approach of “If you can do it in SQL, do it in SQL”…

with exttab as
(
    select dir.directory_path,  
        regexp_replace( ext.access_parameters, '[[:space:]]') as access_parameters
    from dba_external_tables ext
    inner join dba_directories dir
        on dir.owner = ext.default_directory_owner
        and dir.directory_name = ext.default_directory_name
    where ext.owner = 'MIKE' 
    and ext.table_name = 'PLEBISCITES_XT'
)
select directory_path||'/'||
    case when instr(access_parameters, 'logfile',1,1) > 0 then
        substr
        ( 
            access_parameters, 
            instr(access_parameters, 'logfile') +8, -- substring start position
            instr(access_parameters, chr(39), instr(access_parameters, 'logfile') +8, 1) - (instr(access_parameters, 'logfile') +8) -- substr number of characters
        ) 
        else to_clob('Filename not specified')    
    end as log_file_name,
    directory_path||'/'||
    case when instr(access_parameters, 'badfile',1,1) > 0 then
        substr
        ( 
            access_parameters, 
            instr(access_parameters, 'badfile') +8, -- substring start position
            instr(access_parameters, chr(39), instr(access_parameters, 'badfile') +8, 1) - (instr(access_parameters, 'badfile') +8) -- substr number of characters
        ) 
        else to_clob('Filename not specified')    
    end as bad_file_name,
    directory_path||'/'||
    case when instr(access_parameters, 'discardfile',1,1) > 0 then    
        substr
        ( 
            access_parameters, 
            instr(access_parameters, 'discardfile') +12, -- substring start position
            instr(access_parameters, chr(39), instr(access_parameters, 'discardfile') +12, 1) - (instr(access_parameters, 'discardfile') +12) -- substr number of characters
        ) 
        else to_clob('Filename not specified')    
    end as discard_file_name    
from exttab    
/

…which returns…

Hmmm, it’s possible that a slightly more pragmatic approach is in order here…

set serveroutput on size unlimited
declare
    function get_file
    ( 
        i_owner in dba_external_tables.owner%type,
        i_table in dba_external_tables.table_name%type, 
        i_ftype in varchar2
    )
        return varchar2
    is
        separator constant varchar2(1) := '/';
        dir_path dba_directories.directory_path%type;
        access_params dba_external_tables.access_parameters%type;
        start_pos pls_integer := 0;
        end_pos pls_integer := 0;
    begin
        select dir.directory_path, lower(regexp_replace(ext.access_parameters, '[[:space:]]'))
        into dir_path, access_params
        from dba_external_tables ext
        inner join dba_directories dir
            on dir.owner = ext.default_directory_owner
            and dir.directory_name = ext.default_directory_name
        where ext.owner = upper(i_owner) 
        and ext.table_name = upper(i_table);

        start_pos := instr( access_params, i_ftype||chr(39),1,1) + length(i_ftype||chr(39));
        if start_pos - length(i_ftype||chr(39)) = 0 then
            return 'Filename Not Specified';
        end if;    
        end_pos := instr(access_params, chr(39), start_pos, 1);
        return dir_path||separator||substr(access_params, start_pos, end_pos - start_pos);
    end get_file;

begin
    dbms_output.put_line('LOGFILE '||get_file('MIKE', 'PLEBISCITES_XT', 'logfile'));
    dbms_output.put_line('BADFILE '||get_file('MIKE', 'PLEBISCITES_XT','badfile'));
    dbms_output.put_line('DISCARDFILE '||get_file('MIKE', 'PLEBISCITES_XT','discardfile'));
end;
/

Yes, it’s PL/SQL. No, I don’t think I’ll be getting a visit from the Database Police as this is a rather more DRY method of doing pretty much the same thing…

LOGFILE /u01/app/oracle/my_files/plebiscites.log
BADFILE /u01/app/oracle/my_files/plebiscites.bad
DISCARDFILE Filename Not Specified


PL/SQL procedure successfully completed.

As we’re about to find out, this solution also falls short of being a panacea…

Separate Directory Definitions

What happens when the directories that the files are created in are different from each other ?
Let’s re-define our table :

drop table plebiscites_xt
/

create table plebiscites_xt
(
    vote_year number(4),
    vote_name varchar2(100)
)
    organization external
    (
        type oracle_loader
        default directory my_files
        access parameters
        (
            records delimited by newline
            badfile 'plebiscites.bad'
            logfile my_files_logs:'plebiscites.log'
            discardfile my_files_discards:'plebiscites.disc'
            skip 1
            fields terminated by ','
            (
                vote_year integer external(4),
                vote_name char(100)
            )
        )
            location('plebiscites.csv')
    )
reject limit unlimited
/

You’ll notice here that we’ve added a discard file specification. More pertinently, the directory location for both the discard file and the log file are now specified.
Therefore, our solution needs some tweaking to ensure that it is fit for the many. In fact, while we’re at it, we may as well add the location file in as well….

set serveroutput on size unlimited
declare
    separator constant varchar2(1) := chr(47); -- '/'

    loc_dir_path dba_directories.directory_path%type;
    loc_file user_external_locations.location%type;
    
    function get_file( i_table user_external_tables.table_name%type, i_ftype in varchar2)
        return varchar2
    is
        squote constant varchar2(1) := chr(39); -- " ' "
        colon constant varchar2(1) := chr(58); -- ':'
        
        dir_path dba_directories.directory_path%type;
        access_params dba_external_tables.access_parameters%type;
        
        filedef_start pls_integer := 0;
        filedef_end pls_integer := 0;
        filedef_str clob;
        
        dir_defined boolean;
        
        dir_start pls_integer := 0;
        dir_end pls_integer := 0;
        
        dir_name dba_directories.directory_name%type;
        
        fname_start pls_integer := 0;
        fname_end pls_integer := 0;
        
        fname varchar2(4000);
        
    begin
        select dir.directory_path, lower(regexp_replace(ext.access_parameters, '[[:space:]]'))
        into dir_path, access_params
        from dba_external_tables ext
        inner join dba_directories dir
            on dir.owner = ext.default_directory_owner
            and dir.directory_name = ext.default_directory_name
        where ext.table_name = upper(i_table);
        
        filedef_start := instr(access_params, i_ftype, 1,1); 
        
        if filedef_start = 0 then
            return 'Filename Not Specified';
        end if;
        filedef_end := instr(access_params, squote, filedef_start, 2) + 1;
        filedef_str := substr(access_params, filedef_start, filedef_end - filedef_start);

        dir_defined := instr( filedef_str, colon, 1, 1) > 0;
        if dir_defined then 

            dir_start := length(i_ftype) + 1; 
            dir_end := instr( filedef_str, colon, 1, 1);
            dir_name := substr(filedef_str, dir_start, dir_end - dir_start);
            begin
                select directory_path
                into dir_path
                from dba_directories
                where directory_name = upper(dir_name);
            exception when no_data_found then
                return 'The directory object specified for this file does not exist';
            end;    
        end if;    
        
        fname_start := instr(filedef_str, squote, 1, 1) + 1; 
        fname_end := instr(filedef_str, squote, 1, 2);
        fname := substr( filedef_str, fname_start, fname_end - fname_start);
        return dir_path||separator||fname;
    end get_file;

begin
    -- Get the current file that the XT is pointing to 
    select dir.directory_path, ext.location
        into loc_dir_path, loc_file 
    from dba_external_locations ext
    inner join dba_directories dir
        on dir.owner = ext.directory_owner
        and dir.directory_name = ext.directory_name
        and ext.table_name = 'PLEBISCITES_XT';
        
    dbms_output.put_line('LOCATION '||loc_dir_path||separator||loc_file);    
    dbms_output.put_line('LOGFILE '||get_file('PLEBISCITES_XT', 'logfile'));
    dbms_output.put_line('BADFILE '||get_file('PLEBISCITES_XT','badfile'));
    dbms_output.put_line('DISCARDFILE '||get_file('PLEBISCITES_XT','discardfile'));
    dbms_output.put_line('PREPROCESSOR '||get_file('plebiscites_xt', 'preprocessor'));
end;
/

Run this and we get :

LOCATION /u01/app/oracle/my_files/plebiscites.csv
LOGFILE /u01/app/oracle/my_files/logs/plebiscites.log
BADFILE /u01/app/oracle/my_files/plebiscites.bad
DISCARDFILE /u01/app/oracle/my_files/discards/plebiscites.disc
PREPROCESSOR Filename Not Specified


PL/SQL procedure successfully completed.

Having made such a big thing of preferring the DRY principle to the “Do it in SQL” doctrine, I feel it’s only fair to point out that the new features of the WITH clause in 12c does tend to blur the line between SQL and PL/SQL somewhat…

set lines 130
column ftype format a20
column file_path format a60
with function get_file( i_table in dba_external_tables.table_name%type, i_ftype in varchar2)
    return varchar2
    is
    
        separator constant varchar2(1) := chr(47); -- '/'
        squote constant varchar2(1) := chr(39); -- " ' "
        colon constant varchar2(1) := chr(58); -- ':'
        
        dir_path dba_directories.directory_path%type;
        access_params dba_external_tables.access_parameters%type;
        
        filedef_start pls_integer := 0;
        filedef_end pls_integer := 0;
        filedef_str clob;
        
        dir_defined boolean;
        
        dir_start pls_integer := 0;
        dir_end pls_integer := 0;
        
        dir_name dba_directories.directory_name%type;
        
        fname_start pls_integer := 0;
        fname_end pls_integer := 0;
        
        fname varchar2(4000);
        
    begin
        select dir.directory_path, lower(regexp_replace(ext.access_parameters, '[[:space:]]'))
        into dir_path, access_params
        from dba_external_tables ext
        inner join dba_directories dir
            on dir.owner = ext.default_directory_owner
            and dir.directory_name = ext.default_directory_name
        where ext.table_name = upper(i_table);
        
        filedef_start := instr(access_params, i_ftype, 1,1); 
        
        if filedef_start = 0 then
            return 'Filename Not Specified';
        end if;
        filedef_end := instr(access_params, squote, filedef_start, 2) + 1;
        filedef_str := substr(access_params, filedef_start, filedef_end - filedef_start);

        dir_defined := instr( filedef_str, colon, 1, 1) > 0;
        if dir_defined then 

            dir_start := length(i_ftype) + 1; 
            dir_end := instr( filedef_str, colon, 1, 1);
            dir_name := substr(filedef_str, dir_start, dir_end - dir_start);
            begin
                select directory_path
                into dir_path
                from dba_directories
                where directory_name = upper(dir_name);
            exception when no_data_found then
                return 'The directory object specified for this file does not exist';
            end;    
        end if;    
        
        fname_start := instr(filedef_str, squote, 1, 1) + 1; 
        fname_end := instr(filedef_str, squote, 1, 2);
        fname := substr( filedef_str, fname_start, fname_end - fname_start);
        return dir_path||separator||fname;
    end get_file;

select 'LOCATION ' as ftype, dir.directory_path||sys_context('userenv', 'platform_slash')||ext.location as file_path
from user_external_locations ext
inner join dba_directories dir
    on dir.owner = ext.directory_owner
    and dir.directory_name = ext.directory_name
    and ext.table_name = 'PLEBISCITES_XT'
union select 'LOGFILE', get_file('plebiscites_xt', 'logfile') from dual
union select 'BADFILE', get_file('plebiscites_xt', 'badfile') from dual
union select 'DISCARDFILE', get_file('plebiscites_xt', 'discardfile') from dual
union select 'PREPROCESSOR', get_file('plebiscites_xt', 'preprocessor') from dual
/

Hopefully that’s something to think about in between the Party Election Broadcasts.

Advertisements

2 thoughts on “Dude, Where’s My File ? Finding External Table Files in the midst of (another) General Election

  1. Nice work Mike! DRY principle and when to use PL/SQL vs SQL well worth thinking about. I took your code and tried to make a DRY pure-SQL version:

    with file_params (file_type, nxtpos) as (
        select 'badfile', 8 from dual union all
        select 'logfile', 8 from dual union all
        select 'discardfile', 12 from dual union all
        select 'preprocessor', 13 from dual
    ), file_types (access_parameters, default_dir, file_type, fp_yn, d_beg) as (
        select ext.access_parameters, ext.default_directory_name, fpr.file_type,
               case when instr(lower(ext.access_parameters), fpr.file_type, 1, 1) > 0 then 'Y' else 'N' end,
               instr(lower(ext.access_parameters), fpr.file_type) + fpr.nxtpos
          from file_params fpr
         cross join dba_external_tables ext
         where ext.table_name = 'PLEBISCITES_XT'
           and ext.owner = 'TEST'
    ), file_pos (access_parameters, default_dir, file_type, fp_yn, d_beg, f_beg, f_end) as (
        select access_parameters, default_dir, file_type, fp_yn, d_beg,
               instr(access_parameters, '''', d_beg, 1) + 1,
               instr(access_parameters, '''', d_beg, 2) - 1
          from file_types
    ), file_dirs (file_type, dir, fname) as (
    	select file_type, 
    	       case fp_yn when 'Y' then nvl(to_char(substr(access_parameters, d_beg, f_beg - d_beg - 2)), default_dir) end,
               case fp_yn when 'Y' then to_char(substr(access_parameters, f_beg, f_end - f_beg + 1)) end
    	  from file_pos
         union all
        select 'inputfile', directory_name, location
          from dba_external_locations
         where table_name = 'PLEBISCITES_XT'
    )
    select fdi.file_type, fdi.dir, dir.directory_path, fdi.fname
      from file_dirs fdi
      left join dba_directories dir
        on dir.directory_name = upper(fdi.dir)
     order by 1
    /
    FILE_TYPE       DIR             DIRECTORY_PATH       FNAME
    --------------- --------------- -------------------- --------------------
    badfile         INPUT_DIR       C:\input             plebiscites.bad
    discardfile     discards_dir    C:\input\discards    plebiscites.disc
    inputfile       INPUT_DIR       C:\input             plebiscites.csv
    logfile         logs_dir        C:\input\logs        plebiscites.log
    preprocessor
    

    Strong and Stable and Dry…or Weak and Wobbly and Wet? Perhaps the people should decide? (What could possibly go wrong? 🙂 )

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s