Using a full outer join to “diff” two tables

I was inspired to write this post by Joshua Ottwell’s thoughts on finding rows present in one table but not another.

What follows is an exploration of how we can use a Full Outer Join to perform a “diff” on the data in two tables. We’ll also look at doing something similar for two distinct result sets from the same table.

In this instance, we want to identify :

  • records that exist in the first table but not the second
  • records that exist in the second table but not the first
  • records that exist in both tables but where some values differ

Before going any further, I should say that the example that follows will make more sense if you consider Terry Pratchett’s observation that :

“In ancient times cats were worshipped as gods; they have not forgotten this.”

You’ll need an offering if you want to use this computer !

The Family Firm

Let’s say I have a report that I want to change.
I’ve simulated the output of the report and captured it in a table like this :

create table employee_report_baseline( 
    id number,
    emp_name varchar2(100),
    job_title varchar2(100) )
/

insert into employee_report_baseline( id, emp_name, job_title)
values(1,'DEBBIE', 'CEO');

insert into employee_report_baseline( id, emp_name, job_title)
values(2, 'MIKE', 'MINION');

insert into employee_report_baseline( id, emp_name, job_title)
values(3, 'TEDDY', 'DOG');

commit;

The output of the new version of the report is simulated like this :

create table employee_report_new 
    as 
    select * 
    from employee_report_baseline 
    where emp_name != 'MIKE';

update employee_report_new
set job_title = 'LADY-IN-WAITING'
where emp_name = 'DEBBIE';

insert into employee_report_new( id, emp_name, job_title)
values(4, 'CLEO', 'CAT');

commit;

Identifying missing or new records

First of all, let’s see if the records returned are the same for both reports. Happily ID is a unique key for the dataset, which makes the comparison fairly simple :

select bsl.id, nr.id, 
    case 
        when bsl.id is null then 'Added in New Report' 
        when nr.id is null then 'Missing from New Report'
    end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
    on bsl.id = nr.id
where bsl.id is null 
    or nr.id is null
order by 1 nulls last;    

The absence of the key value from either table indicates that the record is in one table but not the other.

Sure enough, when we run this query we get :

BASELINE_ID NEW_ID STATUS                        
----------- ------ ------------------------------
          2        Missing from New Report       
                 4 Added in New Report           

All Differences

If we want to identify all of the differences, including changed records, we can do so with just a little extra typing :

select 
    nvl(bsl.id, nr.id) as id, 
    bsl.emp_name as old_emp_name, 
    nr.emp_name as new_emp_name,
    bsl.job_title as old_job_title,
    nr.job_title as new_job_title,
    case 
        when bsl.id is null then 'Added in New Report'
        when nr.id is null then 'Missing from New Report'
        else 'Different between Report Versions'
    end as status    
from employee_report_baseline bsl
full outer join employee_report_new nr
    on bsl.id = nr.id
where bsl.id is null
    or nr.id is null
    or nvl(bsl.emp_name, 'X') != nvl(nr.emp_name, 'X')
    or nvl(bsl.job_title, 'X') != nvl(nr.job_title, 'X')
order by 1 nulls last
/

As well as the two records that are missing from either result set, we can see that Debbie has been demoted ( or possibly usurped). Cleo is now the top dog(!) :

Comparing table definitions

One thing to bear in mind when using a full outer join is that it will match any row in the tables being joined.
This gets a bit annoying when you want to do something with a subset of data such as comparing the column definitions of two tables in USER_TAB_COLUMNS in Oracle.

To demonstrate :

create table old_firm(
    id number,
    first_name varchar2(500),
    last_name varchar2(500),
    job_title varchar2(100),
    start_date date,
    end_date date)
/

create table new_firm(
    id number,
    first_name varchar2(250),
    last_name varchar2(500),
    salary number,
    start_date date,
    end_date varchar2(20))
/

If we want to compare only the records relating to these two tables then we’ll need a couple of in-line-views to restrict the result sets that our full outer join will look at.
Incidentally, as we’re running this on Oracle 19c, we can throw in an in-line function as well to save a bit of typing…

with function col_diff( 
    i_old_name in varchar2 default null, 
    i_new_name in varchar2 default null,
    i_old_type in varchar2 default null,
    i_new_type in varchar2 default null,
    i_old_len in number default null,
    i_new_len in number default null)
    return varchar2
    is
    begin
        if i_old_name is null then
            return 'NEW';
        elsif i_new_name is null then    
            return 'MISSING';
        end if;    
        -- compare the attributes for the columns
        if i_old_type != i_new_type then
            return 'TYPE';
        elsif i_old_len != i_new_len then
            return 'LENGTH';
        end if;
    end;    
ofm as ( 
    select column_name, data_type, data_length
    from user_tab_columns
    where table_name = 'OLD_FIRM'),
nfm as (    
    select column_name, data_type, data_length
    from user_tab_columns
    where table_name = 'NEW_FIRM')
select
    nvl(ofm.column_name, nfm.column_name) as column_name,
    case col_diff( ofm.column_name, nfm.column_name, ofm.data_type, nfm.data_type, ofm.data_length, nfm.data_length)
        when 'NEW' then 'New Column'
        when 'MISSING' then 'Missing Column in NEW_FIRM'
        when 'TYPE' then 'Type Mismatch. OLD_FIRM type is '||ofm.data_type||' NEW_FIRM is '||nfm.data_type
        when 'LENGTH' then 'Length Mismatch. OLD_FIRM length is '||ofm.data_length||' NEW_FIRM is '||nfm.data_length
    end as status    
from ofm
full outer join nfm
    on ofm.column_name = nfm.column_name
where (ofm.column_name is null or nfm.column_name is null)
or( ofm.data_type != nfm.data_type or ofm.data_length != nfm.data_length)
/

I’ll have to leave it there. The “boss” is demanding Dreamies to appease her wrath for being referred to as “top dog” earlier.

Author: mikesmithers

Back in 1993, I discovered that I could get paid money for doing fun stuff with computers. Over the years, I've specialised in Oracle Databases as a developer, a DBA and sometimes, an architect. It's my evil alter-ego - The Antikyte - who writes a blog about my various technical adventures. Yes, that is his Death Star parked in the Disabled Bay. I currently live in the South-West of England with Deb, my long-suffering wife.

6 thoughts on “Using a full outer join to “diff” two tables”

    1. Ken,

      If you have an example of how to use minus to return all of the changes in a single query, I’d be interested to see it.

      As far as I can see, using MINUS in a single query will only give you records that are in the first table but not the second and records that have changed. It won’t give you records that are in the second table but not the first.
      For example :

      select id, emp_name, job_title
      from employee_report_baseline
      minus
      select id, emp_name, job_title
      from employee_report_new
      /
      Gives me the row that's changed and the row that's been removed but not the row that's been added :
      
      ID EMP_NAME             JOB_TITLE           
      -- -------------------- --------------------
       1 DEBBIE               CEO                 
       2 MIKE                 MINION              
      

      To find any records that are in the baseline but not the new report, I’d have to run a second query, this time with the baseline table in the top half of the MINUS :

      select id, emp_name, job_title
      from employee_report_new
      minus
      select id, emp_name, job_title
      from employee_report_baseline
      /
      
      ID EMP_NAME             JOB_TITLE           
      -- -------------------- --------------------
       1 DEBBIE               LADY-IN-WAITING     
       4 CLEO                 CAT                 
      

      Therefore, to achieve a similar effect with minus to what I’ve got with a full outer join, I’d need something like :

      with 
          bldiff as (
              select id, emp_name, job_title
              from employee_report_baseline
              minus
              select id, emp_name, job_title
              from employee_report_new),
          newdiff as (
              select id, emp_name, job_title
              from employee_report_new        
              minus
              select id, emp_name, job_title
              from employee_report_baseline)
      select id, emp_name, job_title
      from bldiff
      union
      select id, emp_name, job_title
      from newdiff
      order by id;
      
      
      ID EMP_NAME             JOB_TITLE           
      -- -------------------- --------------------
       1 DEBBIE               CEO                 
       1 DEBBIE               LADY-IN-WAITING     
       2 MIKE                 MINION              
       4 CLEO                 CAT                 
      

      Mike

      Like

  1. “nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘Y’)” holds if both are null -> do you want this ?
    Or do you want
    “nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘X’)” ?
    (
    which should be equivalent to
    “decode(bsl.emp_name, nr.emp_name, 1, 0) = 0”
    and also to
    “lnnvl(bsl.emp_name, nr.emp_name)”
    )

    Like

    1. Matthias,

      This is what happens when you get the Cat to peer review your code !
      You’re right of course, we don’t want return rows where the value is null in both tables.
      I’ve amended the code accordingly.

      I haven’t come across LNNVL before so thanks for not only spotting the error, but for teaching me something
      new as well 🙂

      Mike

      Liked by 1 person

  2. actually
    “… or nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘X’) …”
    is also false I think (as I noted later on after posting my last comment).
    [
    take bsl.emp_name = ‘X’ and nr.emp_name is null, that won’t be outputed
    ]

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.