Putting VALIDATE_CONVERSION front and centre, or possibly center

I recently had an interesting conversation with Suzanne, a native of Pittsburgh PA, about how the Americans and British spell some words differently.
Adding some local colour ( color) to the discussion, Suzanne recounted the story of when the US Postal service, seeking to save money on printing, proposed removing “superfluous” characters from place names.
Well, the burghers of Pittsburgh weren’t having any of that thank-you-very-much and so the City’s name remains unabridged to this day. The denizens of Harrisburg weren’t so fortunate.
Whilst we may be separated by a common language, as programmers who work with Oracle, Suzanne and I do share the challenge of loading data into tables when the format of that data may not be entirely what we were expecting.
If you’re fortunate enough to inhabit the sunlit uplands of Oracle 12c R2 and beyond, we’re about to explore the shiny new VALIDATE_CONVERSION function which promises to make your life that little bit easier.
For those languishing on 11g, we’ll explore how we might implement a similar function in PL/SQL.

The Data to be converted

Let’s pretend we’ve loaded some data from a file. By definition the actual data as it exists in the file is a collection of character strings. We’ve taken a permissive approach to the load (i.e. load everything if at all possible). The target table for our load is :

create table historic_events_stg(
    id varchar2(4000),
    event_desc varchar2(4000),
    event_ts varchar2(4000))
/    

A DML representation of the data load would look like this :

insert into historic_events_stg(id, event_desc, event_ts)
    select 'ONE', q'[Webster's Dictionary first published]', 'April 14 1828' from dual union all
    select '2', q'[Wright Brother's first flight]', 'DEC-17-1903' from dual union all
    select '3', 'NHS established in the UK', '5 July 1948' from dual union all
    select '4', 'First Manned Moon Landing',  'July 20 1969 20:17:40' from dual union all
    select '5', 'Tim Berners-Lee proposes the World Wide Web',  '19901112' from dual union all
    select '6', q'[JK Rowling's first Harry Potter book published]', '30-JUN-1997' from dual union all
    select '7', 'iPhone released in the USA', '06/29/2007' from dual;
commit;    

Now, we’d like to load the data from the staging table into our application table which looks like this :

create table historic_events (
    id number,
    event_desc varchar2(4000),
    event_ts timestamp with time zone)
/    

In 11g and previously this might prove a bit tricky without the use of something like Log Errors :

insert into historic_events select * from historic_events_stg;

ERROR at line 1:
ORA-01722: invalid number

Sometimes, it would be nice if we could just see which values are going to cause problems before we even attempt to load the data…

The VALIDATE_CONVERSION function

Let’s see what might be causing our error…

select id, 
    validate_conversion(id as number) as isnum
from historic_events_stg;

Yes, it’s that simple, just ask validate_conversion what happens when you try to convert all of the IDs in the table to numbers :

ID  ISNUM
--- -----
ONE     0
2       1
3       1
4       1
5       1
6       1
7       1

If the conversion is going to succeed, the function returns a 1. Otherwise, it returns a zero.
It works for DATES too, although some extra effort may be needed.

In my current session, the NLS_DATE_FORMAT is :

select sys_context('userenv', 'nls_date_format') from dual;

SYS_CONTEXT('USERENV','NLS_DATE_FORMAT')
----------------------------------------
DD-MON-YYYY

So, when I check my date strings in the staging table, most of them fail to pass muster…

select event_ts, 
    validate_conversion(event_ts as date) as isdate
from historic_events_stg; 

EVENT_TS                  ISDATE
------------------------- ------
April 14 1828                  0
DEC-17-1903                    0
5 July 1948                    1
July 20 1969 20:17:40 UTC      0
19901112                       0
30-JUN-1997                    1
06/29/2007                     0

We can specify the date format we’re validating to so we get a slightly different result with :

select event_ts, 
    validate_conversion(event_ts as date, 'MON-DD-YYYY') as isdate
from historic_events_stg;

EVENT_TS                       ISDAT
------------------------------ -----
April 14 1828                      1
DEC-17-1903                        1
5th July 1948                      0
July 20th 1969 20:17:40 UTC        0
19901112                           0
30-JUN-1997                        0
06/29/2007                         0

Unfortunately, it looks like we’re going to have to go through the record set and re-format most of the dates.
OK, it’s more likely that you’d go back to whoever is giving you the feed and ask them to just pick one date format.

The other possibility is to use the function to select the correct format to use for each conversion…

select event_ts,
    case 
        when validate_conversion( event_ts as date, 'MON-DD-YYYY') = 1 
            then to_date(event_ts, 'MON-DD-YYYY')
        when validate_conversion( event_ts as date, 'DD Month YYYY') = 1 
            then to_date( event_ts, 'DD Month YYYY')
        when validate_conversion( event_ts as date, 'DD-MON-YYYY') = 1 
            then to_date( event_ts, 'DD-MON-YYYY')
        when validate_conversion( event_ts as date, 'MM/DD/YYYY') = 1 
            then to_date(event_ts, 'MM/DD/YYYY')
        when validate_conversion( event_ts as date, 'YYYYMMDD') = 1 
            then to_date(event_ts, 'YYYYMMDD')
        when validate_conversion(event_ts as timestamp with time zone, 'MON DD YYYY HH24:MI:SS TZR') = 1 
            then to_timestamp_tz(event_ts, 'MON DD YYYY HH24:MI:SS TZR')
    end as converted_event_ts
from historic_events_stg; 

EVENT_TS                       CONVERTED_EVENT_TS                                
------------------------------ --------------------------------------------------
April 14 1828                  14-APR-28 00.01.00.000000000 EUROPE/LONDON        
DEC-17-1903                    17-DEC-03 00.00.00.000000000 EUROPE/LONDON        
5 July 1948                    05-JUL-48 00.00.00.000000000 EUROPE/LONDON        
July 20 1969 20:17:40 UTC      20-JUL-69 20.17.40.000000000 UTC                  
19901112                       12-NOV-90 00.00.00.000000000 EUROPE/LONDON        
30-JUN-1997                    30-JUN-97 00.00.00.000000000 EUROPE/LONDON        
06/29/2007                     29-JUN-07 00.00.00.000000000 EUROPE/LONDON        

7 rows selected. 

To be honest, I think I’d prefer the first option if I had a choice.
If you’re not on 12c yet, all of this is somewhat academic. If you want to take advantage of similar functionality, you’re going to have to roll-your-own…

The sort-of Validate Conversion function

Using the documentation for the VALIDATE_CONVERSION function as a rough guide, we can come up with something reasonably serviceable in PL/SQL :

create or replace function is_valid_conversion(
    i_expr in varchar2,
    i_target_type in varchar2,
    i_format in varchar2 default null)
    return pls_integer deterministic
is

--
-- Mimic the VALIDATE_CONVERSION function that's available in 12c and above.
-- NOTE - setting of NLS params specifically excluded here.
-- This function simply works against the base data types i.e. :
-- NUMBER
-- DATE
-- TIMESTAMP
-- TIMESTAMP WITH TIME ZONE
-- This should cover most of the common use-cases for this function.

    dummy_date date;
    dummy_num number;
    dummy_ts timestamp;
    dummy_tstz timestamp with time zone;
    
    l_type varchar2(30);
    l_format varchar2(50);
    
    e_missing_type exception;
    e_unsupported_type exception;
begin
    -- Sanity check the input parameters
    if i_target_type is null then
        raise e_missing_type;
    elsif upper( i_target_type) not in ('NUMBER', 'DATE', 'TIMESTAMP', 'TIMESTAMP WITH TIME ZONE')
    then
        raise e_unsupported_type;
    end if;
    
    if i_expr is null then
        -- will convert to any of the data types we're dealing with here
        return 1;
    end if;
    l_type := upper( i_target_type);
    -- Now test the conversion
    if l_type = 'NUMBER' then
        if i_format is not null then
            dummy_num := to_number( i_expr, i_format);
        else
            dummy_num := to_number(i_expr);
        end if;
    elsif l_type = 'DATE' then
        l_format := coalesce( i_format, sys_context('userenv', 'nls_date_format'));
        dummy_date := to_date( i_expr, l_format);
    elsif l_type = 'TIMESTAMP' then
        l_format := coalesce( i_format, sys_context('userenv', 'nls_timestamp_format'));
        dummy_ts := to_timestamp( i_expr, l_format);
    elsif l_type = 'TIMESTAMP WITH TIME ZONE' then
        select coalesce( i_format, value)
        into l_format
        from v$nls_parameters
        where parameter = 'NLS_TIMESTAMP_TZ_FORMAT';
        
        dummy_tstz := to_timestamp_tz( i_expr, i_format);
    end if;
    
    return 1;
exception 
    when e_missing_type then 
        raise_application_error(-20000, 'A type to convert to must be specified');
    when e_unsupported_type then
        raise_application_error(-20001, q'[Target type is not supported by this function. You haven't written that bit yet !]');
    when others then
        -- conversion has failed
        return 0;
end is_valid_conversion;
/

Sure enough, when you run this, the results are the same :

select id, 
    is_valid_conversion( id, 'NUMBER')
from historic_events_stg;

ID  ISNUM
--- -----
ONE     0
2       1
3       1
4       1
5       1
6       1
7       1
select event_ts, 
    is_valid_conversion(event_ts, 'DATE') as isdate
from historic_events_stg; 

EVENT_TS                  ISDATE
------------------------- ------
April 14 1828                  0
DEC-17-1903                    0
5 July 1948                    1
July 20 1969 20:17:40 UTC      0
19901112                       0
30-JUN-1997                    1
06/29/2007                     0
select event_ts, 
    is_valid_conversion(event_ts, 'DATE', 'MON-DD-YYYY') as isdate
from historic_events_stg; 

EVENT_TS                           ISDATE
------------------------------ ----------
April 14 1828                           1
DEC-17-1903                             1
5 July 1948                             0
July 20 1969 20:17:40 UTC               0
19901112                                0
30-JUN-1997                             0
06/29/2007                              0
 
select event_ts,
    is_valid_conversion(event_ts, 'TIMESTAMP WITH TIME ZONE')
from historic_events_stg
where id = '4';

EVENT_TS                          IS_TSTZ
------------------------------ ----------
July 20 1969 20:17:40 UTC               1

Of course, this is no substitute for a built-in SQL function in terms of both functionality and performance. However, it may provide the basis of something useful if you’re not in a position to play with all those 12c goodies just yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.