Oracle SQL offers support for POSIX regular expressions by means of a suite of native functions.
Whilst providing the ability for complex search conditions that regular expressions offer, REGEXP_SUBSTR has acquired a reputation for being a fair bit slower when compared to the standard SUBSTR function.
What I’m going to demonstrate here is :
- how SUBSTR seems generally to be faster than REGEXP_SUBSTR
- how the performance of REGEXP_SUBSTR can improve dramatically when used with INSTR
- REGEXP_SUBSTR performs better when it matches the start of a string
To start with though, well discover why you’ll never see a Sith Lord on Sesame Street ( hint : it’s the way they count in a Galaxy Far, Far Away)…
A Test Table
We have a two-column table that looks like this :
create table star_wars_numbers
(
id number generated always as identity,
episode varchar2(4000)
)
/
insert into star_wars_numbers( episode)
values('Four');
insert into star_wars_numbers( episode)
values('Four|Five');
insert into star_wars_numbers( episode)
values('Four|Five|Six');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight|Nine');
commit;
Whilst the contents of the EPISODES column may be the result of arcane Sith ritual, Moof-Milker activity in the design process cannot be ruled out.
Either way, the EPISODES column contains up to 11 values, with multiple columns being separated by a ‘|’.
Extracting each value in SQL
Using SUBSTR and INSTR to separate out the values in EPISODES is, as ever, quite a bit of typing…
select
id,
episode,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from star_wars_numbers
/
…although these days ( version 12c and later), we can save ourselves a bit of effort by using an inline function…
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
if i_pos = 1 then
return substr(i_string, 1, case when instr(i_string, '|') > 0 then instr(i_string, '|') - 1 else length(i_string) end);
end if;
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id, episode,
extract_episode( episode, 1) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Whether you find the equivalent regexp query more elegant or just horrible is somewhat subjective :
select
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Irrespective of aesthetic considerations, I should explain the regexp in use here, if only so I can understand it when I read this a few months from now :
[^|] - match any character that isn't pipe ("|") + - match one or more times
Next, we need to find out how the regexp function stacks up when it comes to performance, and we’re not going to find that with a mere 11 rows of data…
Performance Test Setup
Let’s make some use of all that space in the Galaxy Far, Far Away…
create table galaxy (
id number generated always as identity,
episode varchar2(4000))
/
begin
for i in 1..100000 loop
insert into galaxy(episode)
select episode from star_wars_numbers;
commit;
end loop;
end;
/
exec dbms_stats.gather_table_stats(user, 'galaxy');
All of the following tests were run on Oracle 19c Enterprise Edition running on Oracle Linux.
Everything was run in a SQL*Plus session from the command line on the database server.
The queries were run in SQL*Plus with the following settings :
set lines 130 clear screen set autotrace trace set timing on
Each query was executed at least twice consecutively to ensure that results were not skewed by the state of the buffer cache.
It’s also worth noting that, I found no low-level explanation for the performance discrepancies between the two functions when trawling through trace files. Therefore, I’ve concentrated on elapsed time as the main performance metric in these tests.
Test 1 – All Fields extracted in the Select Clause
Let’s start with the SUBSTR function ( referred to as “Standard” henceforth) :
select
id,
substr
(
episode, -- input string
1, -- start position
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end -- number of characters to extract
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
Runnnig this query, we get :
1100000 rows selected. Elapsed: 00:00:20.32 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 35857676 bytes sent via SQL*Net to client 811886 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Before we look at the REGEXP_SUBSTR equivalent, it’s probably worth considering the more streamlined in-line function version of this query :
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from galaxy
/
Whilst it’s a bit more readable, it’s also a lot slower :
1100000 rows selected. Elapsed: 00:00:41.76 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 35857676 bytes sent via SQL*Net to client 810042 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
REGEXP_SUBSTR however, takes slow to a new level…
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected. Elapsed: 00:01:27.25 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 35857680 bytes sent via SQL*Net to client 809519 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Now see what happens when we give REGEXP_SUBSTR a little help from the humble INSTR (the “hybrid” query) :
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
decode( instr(episode, '|'), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,1) + 1)) as "The Empire Strikes Back",
decode( instr(episode, '|',1, 2), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,2) + 1)) as "Return of the Jedi",
decode( instr(episode, '|',1, 3), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,3) + 1)) as "The Phantom Menace",
decode( instr(episode, '|',1, 4), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,4) + 1)) as "Attack of the Clones",
decode( instr(episode, '|',1, 5), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,5) + 1)) as "Revenge of the Sith",
decode( instr(episode, '|',1, 6), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,6) + 1)) as "The Force Awakens",
decode( instr(episode, '|',1, 7), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,7) + 1)) as "Rogue One",
decode( instr(episode, '|',1, 8), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,8) + 1)) as "Solo",
decode( instr(episode, '|',1, 9), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,9) + 1)) as "The Last Jedi",
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
Yes, I have cheated a bit on the aesthetics here by using the more compact DECODE instead of CASE.
However, this does not affect the runtime of the query, which is a bit faster than the pure REGEXP_SUBSTR equivalent :
1100000 rows selected. Elapsed: 00:00:30.83 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 35857680 bytes sent via SQL*Net to client 810158 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Test 1 Results Summary
Query Method | Time (secs) |
SUBSTR ( Standard) | 20.32 |
REGEXP_SUBSTR with INSTR (Hybrid) | 30.83 |
SUBSTR with in-line function | 41.76 |
REGEXP_SUBSTR ( Regexp) | 87.25 |
The performance of the Hybrid approach does raise the question of how the REGEXP_SUBSTR compares when we’re just looking to extract a single field from a string, rather than all of them…
Test 2 – Extract the first field only
In this instance we’re just looking for the first field in EPISODES.
In this context, the hybrid approach doesn’t apply because we’re always starting our search at the start of the input string.
Starting, once again with the Standard approach :
select id,
substr
(
episode, 1,
case
when instr(episode, '|', 1,1) > 0 then instr(episode, '|', 1,1) -1
else length(episode)
end
) as "A New Hope"
from galaxy
/
1100000 rows selected. Elapsed: 00:00:05.33 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 25086067 bytes sent via SQL*Net to client 808790 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Now for REGEXP_SUBSTR :
select id,
regexp_substr( episode, '[^|]+') as "A New Hope"
from galaxy
/
1100000 rows selected. Elapsed: 00:00:06.38 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 25086067 bytes sent via SQL*Net to client 808868 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Interestingly, whilst it’s still not as fast, the Regexp runtime is only 20% longer than the Standard.
This contrasts markedly with the 430% longer that the Regexp took for our first test.
Test 2 Results Summary
Query Method | Time (secs) |
---|---|
Standard | 5.33 |
Regexp | 6.38 |
Does this relative performance hold true for any single field in the input string ?
Test 3 – Extract the last field only
Starting with the Standard query :
select
id,
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected. Elapsed: 00:00:05.44 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 25686033 bytes sent via SQL*Net to client 808915 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Once again, the Regexp Query is much slower…
select
id,
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected. Elapsed: 00:00:16.16 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 25686033 bytes sent via SQL*Net to client 808888 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
…unless we use the hybrid approach…
select
id,
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected. Elapsed: 00:00:05.60 Execution Plan ---------------------------------------------------------- Plan hash value: 3574103611 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 | ---------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 80459 consistent gets 7616 physical reads 0 redo size 25686033 bytes sent via SQL*Net to client 808736 bytes received via SQL*Net from client 73335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1100000 rows processed
Test 3 Results Summary
Query Method | Time (secs) |
---|---|
Standard | 5.44 |
Hybrid | 5.60 |
Regexp | 16.16 |
Conclusions
These tests seem to bear out the fact that, in general, SUBSTR offers better performance than REGEXP_SUBSTR.
However, the performance of REGEXP_SUBSTR can be greatly improved if used in conjunction with INSTR.
As always, the decision as to which function to use will depend on the specific circumstances you find yourself in.