I was inspired to write this post by Joshua Ottwell’s thoughts on finding rows present in one table but not another.
What follows is an exploration of how we can use a Full Outer Join to perform a “diff” on the data in two tables. We’ll also look at doing something similar for two distinct result sets from the same table.
In this instance, we want to identify :
- records that exist in the first table but not the second
- records that exist in the second table but not the first
- records that exist in both tables but where some values differ
Before going any further, I should say that the example that follows will make more sense if you consider Terry Pratchett’s observation that :
“In ancient times cats were worshipped as gods; they have not forgotten this.”
The Family Firm
Let’s say I have a report that I want to change.
I’ve simulated the output of the report and captured it in a table like this :
create table employee_report_baseline(
id number,
emp_name varchar2(100),
job_title varchar2(100) )
/
insert into employee_report_baseline( id, emp_name, job_title)
values(1,'DEBBIE', 'CEO');
insert into employee_report_baseline( id, emp_name, job_title)
values(2, 'MIKE', 'MINION');
insert into employee_report_baseline( id, emp_name, job_title)
values(3, 'TEDDY', 'DOG');
commit;
The output of the new version of the report is simulated like this :
create table employee_report_new
as
select *
from employee_report_baseline
where emp_name != 'MIKE';
update employee_report_new
set job_title = 'LADY-IN-WAITING'
where emp_name = 'DEBBIE';
insert into employee_report_new( id, emp_name, job_title)
values(4, 'CLEO', 'CAT');
commit;
Identifying missing or new records
First of all, let’s see if the records returned are the same for both reports. Happily ID is a unique key for the dataset, which makes the comparison fairly simple :
select bsl.id, nr.id,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
order by 1 nulls last;
The absence of the key value from either table indicates that the record is in one table but not the other.
Sure enough, when we run this query we get :
BASELINE_ID NEW_ID STATUS ----------- ------ ------------------------------ 2 Missing from New Report 4 Added in New Report
All Differences
If we want to identify all of the differences, including changed records, we can do so with just a little extra typing :
select
nvl(bsl.id, nr.id) as id,
bsl.emp_name as old_emp_name,
nr.emp_name as new_emp_name,
bsl.job_title as old_job_title,
nr.job_title as new_job_title,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
else 'Different between Report Versions'
end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
or nvl(bsl.emp_name, 'X') != nvl(nr.emp_name, 'X')
or nvl(bsl.job_title, 'X') != nvl(nr.job_title, 'X')
order by 1 nulls last
/
As well as the two records that are missing from either result set, we can see that Debbie has been demoted ( or possibly usurped). Cleo is now the top dog(!) :
Comparing table definitions
One thing to bear in mind when using a full outer join is that it will match any row in the tables being joined.
This gets a bit annoying when you want to do something with a subset of data such as comparing the column definitions of two tables in USER_TAB_COLUMNS in Oracle.
To demonstrate :
create table old_firm(
id number,
first_name varchar2(500),
last_name varchar2(500),
job_title varchar2(100),
start_date date,
end_date date)
/
create table new_firm(
id number,
first_name varchar2(250),
last_name varchar2(500),
salary number,
start_date date,
end_date varchar2(20))
/
If we want to compare only the records relating to these two tables then we’ll need a couple of in-line-views to restrict the result sets that our full outer join will look at.
Incidentally, as we’re running this on Oracle 19c, we can throw in an in-line function as well to save a bit of typing…
with function col_diff(
i_old_name in varchar2 default null,
i_new_name in varchar2 default null,
i_old_type in varchar2 default null,
i_new_type in varchar2 default null,
i_old_len in number default null,
i_new_len in number default null)
return varchar2
is
begin
if i_old_name is null then
return 'NEW';
elsif i_new_name is null then
return 'MISSING';
end if;
-- compare the attributes for the columns
if i_old_type != i_new_type then
return 'TYPE';
elsif i_old_len != i_new_len then
return 'LENGTH';
end if;
end;
ofm as (
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'OLD_FIRM'),
nfm as (
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'NEW_FIRM')
select
nvl(ofm.column_name, nfm.column_name) as column_name,
case col_diff( ofm.column_name, nfm.column_name, ofm.data_type, nfm.data_type, ofm.data_length, nfm.data_length)
when 'NEW' then 'New Column'
when 'MISSING' then 'Missing Column in NEW_FIRM'
when 'TYPE' then 'Type Mismatch. OLD_FIRM type is '||ofm.data_type||' NEW_FIRM is '||nfm.data_type
when 'LENGTH' then 'Length Mismatch. OLD_FIRM length is '||ofm.data_length||' NEW_FIRM is '||nfm.data_length
end as status
from ofm
full outer join nfm
on ofm.column_name = nfm.column_name
where (ofm.column_name is null or nfm.column_name is null)
or( ofm.data_type != nfm.data_type or ofm.data_length != nfm.data_length)
/
I’ll have to leave it there. The “boss” is demanding Dreamies to appease her wrath for being referred to as “top dog” earlier.
use the minus command… you made this too hard
LikeLike
Ken,
If you have an example of how to use minus to return all of the changes in a single query, I’d be interested to see it.
As far as I can see, using MINUS in a single query will only give you records that are in the first table but not the second and records that have changed. It won’t give you records that are in the second table but not the first.
For example :
To find any records that are in the baseline but not the new report, I’d have to run a second query, this time with the baseline table in the top half of the MINUS :
Therefore, to achieve a similar effect with minus to what I’ve got with a full outer join, I’d need something like :
Mike
LikeLike
“nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘Y’)” holds if both are null -> do you want this ?
Or do you want
“nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘X’)” ?
(
which should be equivalent to
“decode(bsl.emp_name, nr.emp_name, 1, 0) = 0”
and also to
“lnnvl(bsl.emp_name, nr.emp_name)”
)
LikeLike
Matthias,
This is what happens when you get the Cat to peer review your code !
You’re right of course, we don’t want return rows where the value is null in both tables.
I’ve amended the code accordingly.
I haven’t come across LNNVL before so thanks for not only spotting the error, but for teaching me something
new as well 🙂
Mike
LikeLiked by 1 person
actually
“… or nvl(bsl.emp_name, ‘X’) != nvl(nr.emp_name, ‘X’) …”
is also false I think (as I noted later on after posting my last comment).
[
take bsl.emp_name = ‘X’ and nr.emp_name is null, that won’t be outputed
]
LikeLike
Mathias,
That is true, but as I’m not Elon Musk so, I’m unlikely to ever name a cat ‘X’ 🙂
Mike
LikeLiked by 1 person