At a time when the debate rages about how much you should trust what AI tells you, it’s probably worth recalling Deep Thought’s wildly incorrect assertion that the answer to the Ultimate Question of Life, the Universe, and Everything is forty-two.
As any Database specialist will know, the answer is the same as it is to the question “How big is my Oracle Table ?” which is, of course, “It depends”.
What it depends on is whether you want to know the volume of data held in the table, or the amount of space the database is using to store the table and any associated segments (e.g. indexes).
Connecting to my trusty Free Tier OCI Oracle Instance ( running 19c Enterprise Edition), I’ve set out on my journey through (disk) space to see if I can find some more definitive answers…
How big is my table in terms of the raw data stored in it ?
Before going any further, I should be clear on the database language settings and character set being used in the examples that follow. Note particularly that NLS_LENGTH_SEMANTICS is set to BYTE and that the sample data I'll be using is plain ASCII, so each character occupies a single byte :
select parameter, value
from gv$nls_parameters
order by parameter
/

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CALENDAR                   GREGORIAN
NLS_CHARACTERSET               AL32UTF8
NLS_COMP                       BINARY
NLS_CURRENCY                   £
NLS_DATE_FORMAT                DD-MON-YYYY
NLS_DATE_LANGUAGE              ENGLISH
NLS_DUAL_CURRENCY              €
NLS_ISO_CURRENCY               UNITED KINGDOM
NLS_LANGUAGE                   ENGLISH
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NUMERIC_CHARACTERS         .,
NLS_SORT                       BINARY
NLS_TERRITORY                  UNITED KINGDOM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24.MI.SSXFF
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24.MI.SSXFF TZR
NLS_TIME_FORMAT                HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT             HH24.MI.SSXFF TZR

19 rows selected.
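Incidentally, if you ever need to check whether a particular character takes up more than one byte in your character set, comparing LENGTH with LENGTHB is a quick way to do it. A sketch using the pound sign, which occupies two bytes in AL32UTF8 :

select length('£') as length_in_chars,
    lengthb('£') as length_in_bytes
from dual;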
Now, let's see if we can work out how much raw data is held in a table.
We’ll start with a very simple example :
create table marvin as
select 1000 + rownum as id
from dual
connect by rownum <= 1024;
Marvin may have a brain the size of a planet but his tabular namesake has more modest space requirements.
Its 1024 records are all 4 digits long.
Therefore, the size of the table data should be 4096 bytes, right ?
“Hang on”, you're thinking, “why not just look up the size in USER_SEGMENTS and make this a really short post ?”
Well :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
USER_SEGMENTS will give you a size in bytes, but it’s not the same as the amount of raw data.
We’ll come back to this in a bit.
For now though, we can cross-check the size from elsewhere in the data dictionary, provided the stats on the table are up-to-date.
To ensure that this is so, I can run :
exec dbms_stats.gather_table_stats('MIKE', 'MARVIN');
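If you want to confirm when stats were last gathered on a table, and whether Oracle currently considers them stale, USER_TAB_STATISTICS holds both pieces of information. A quick sketch :

select last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'MARVIN';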
Gathering stats ensures that the relevant columns of the USER_TABLES view are populated, which means we can estimate the data volume by running the following query :
select num_rows, avg_row_len,
num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'MARVIN'
/
…which returns…
  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      1024           4          4096
That looks promising.
We can further verify this by running :
select sum(length(id)) as data_in_bytes
from marvin
/

DATA_IN_BYTES
-------------
         4096
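As an aside, LENGTH counts characters, whereas VSIZE reports the number of bytes Oracle uses to store a value internally; for NUMBER columns the two are not necessarily the same, so if you're interested in physical storage rather than displayed digits it's worth comparing them. A sketch :

select sum(length(id)) as character_count,
    sum(vsize(id)) as internal_bytes
from marvin;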
OK, now let’s see what happens with a slightly more complex data set, and an index as well …
create table hitchikers
(
id number generated always as identity,
character_name varchar2(250),
quote varchar2(4000),
constraint hitchikers_pk primary key (id)
)
/
declare
procedure ins( i_character in varchar2, i_quote in varchar2)
is
begin
insert into hitchikers( character_name, quote)
values(i_character, i_quote);
end;
begin
for i in 1..1024 loop
ins('Deep Thought', 'Forty-Two');
ins('Trillian', q'[we have normality... anything you can't cope with is, therefore, your own problem]');
ins('Ford Prefect', 'Time is an illusion. Lunchtime doubly so.');
ins('Zaphod Beeblebrox', q'[If there's anything more important than my ego around, I want it caught and shot right now!]');
ins(null, 'Anyone who is capable of getting themselves made President should on no account be allowed to do the job');
ins('Marvin', q'[Life! Loathe it or hate it, you can't ignore it.]');
ins('Arthur Dent', 'This must be Thursday. I never could get the hang of Thursdays');
ins('Slartibartfast', q'[I'd rather be happy than right any day]');
end loop;
commit;
end;
/
commit;
Once stats are present on the table, we can check the expected data size as before :
select num_rows, avg_row_len,
    num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'HITCHIKERS'
/

  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      8192          75        614400
This time, the size in bytes figure we get back is not exact, as we can confirm with :
select sum( length(id) + nvl(length(character_name),0) + nvl(length(quote), 0)) as data_in_bytes
from hitchikers
/

DATA_IN_BYTES
-------------
       598957
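Part of the difference is down to how AVG_ROW_LEN is arrived at : as far as I'm aware, it's based on the per-column averages held in USER_TAB_COLUMNS ( AVG_COL_LEN ), which are rounded and include a small amount of per-column overhead. Comparing the two is straightforward enough — a sketch :

select sum(avg_col_len) as sum_of_avg_col_len
from user_tab_columns
where table_name = 'HITCHIKERS';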
To verify the actual size in bytes, we can dump the contents of the table into a csv file. In this case, I've used SQLDeveloper's export facility to produce hitchikers.csv.
The resulting file is a different size again :
ls -l hitchikers.csv

-rw-rw-r-- 1 mike mike 656331 May 13 11:50 hitchikers.csv
This can be accounted for by the characters added as part of the csv formatting.
First, the csv file includes a header row :
head -1 hitchikers.csv

"ID","CHARACTER_NAME","QUOTE"
Including the line terminator this is 30 bytes :
head -1 hitchikers.csv |wc -c
30
The format in each of the 8192 data rows includes :
- a comma after all but the last attribute on a row
- a line terminator after the last attribute
- double quotes enclosing each of the two VARCHAR attributes.
For example :
grep ^42, hitchikers.csv
42,"Trillian","we have normality... anything you can't cope with is, therefore, your own problem"
That’s a total of 7 extra bytes per data row.
Add all that up and it comes to 57374 bytes which are a consequence of csv formatting.
Subtract that from the file size and we get back to the calculated data size we started with :
656331 - 57374 = 598957
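If you'd rather let the database do the arithmetic — 8192 data rows at 7 extra bytes each, plus the 30-byte header row — a one-liner will do it :

select 656331 - ((8192 * 7) + 30) as data_in_bytes
from dual;

…which brings us back to 598957.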
This confirms that the figures in USER_TABLES are approximate and you’ll need to bear this in mind if you’re relying on them to calculate the size of the data in a table.
Whilst we're here, let's see what effect compression might have on our ability to determine the raw data size.
We can do this by creating a table that has the same structure as HITCHIKERS and contains the same data, but which is compressed :
create table magrathea
(
id number,
character_name varchar2(250),
quote varchar2(4000),
constraint magrathea_pk primary key (id)
)
row store compress advanced
/
insert into magrathea( id, character_name, quote)
select id, character_name, quote
from hitchikers
/
commit;
exec dbms_stats.gather_table_stats(user, 'MAGRATHEA');
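Before comparing sizes, it's worth double-checking which tables are actually flagged as compressed, and with what. USER_TABLES records this in its COMPRESSION and COMPRESS_FOR columns — a quick sketch :

select table_name, compression, compress_for
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA');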
It turns out that, for the purposes of our raw data calculation, the effect of table compression is…none at all :
select num_rows, avg_row_len, num_rows * avg_row_len
from user_tables
where table_name = 'MAGRATHEA'
/

  NUM_ROWS AVG_ROW_LEN NUM_ROWS*AVG_ROW_LEN
---------- ----------- --------------------
      8192          75               614400
However, if you look at the number of blocks used to store the table, the effects of compression are more evident :
select table_name, blocks
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

TABLE_NAME                         BLOCKS
------------------------------ ----------
MAGRATHEA                              20
HITCHIKERS                             95
Incidentally, it's worth noting that, as with the data size, the number of blocks reported in USER_TABLES is somewhat approximate : it's populated when stats are gathered and counts only the blocks holding data, whereas USER_SEGMENTS counts all of the blocks allocated to the segment.
USER_SEGMENTS reports the number of blocks for each table as :
select segment_name, blocks
from user_segments
where segment_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

SEGMENT_NAME                       BLOCKS
------------------------------ ----------
MAGRATHEA                              24
HITCHIKERS                            104
So it looks like compression will affect the amount of database space required to store an object but not the size of the actual data. This brings us nicely on to…
How big is my table in terms of the amount of space it’s taking up in the database ?
Let’s go back to MARVIN. Remember, this table contains 4K of raw data, but USER_SEGMENTS claims that it’s quite a bit larger :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
To understand how Oracle has come up with this figure, you need to consider that :
- the smallest unit of space that Oracle addresses is measured in blocks
- the size of these blocks is defined at tablespace level.
- any object that uses space is allocated that space in units of an extent – which is a number of contiguous blocks.
If we take a look at MARVIN, we can see that the table resides in the DATA tablespace and has been allocated a single extent of 8 blocks :
select tablespace_name, bytes, blocks, extents
from user_segments
where segment_name = 'MARVIN';

TABLESPACE_NAME                     BYTES     BLOCKS    EXTENTS
------------------------------ ---------- ---------- ----------
DATA                                65536          8          1
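If you want to see the individual extents that make up a segment, USER_EXTENTS lists them — a sketch ( for MARVIN this should just be the one eight-block extent ) :

select extent_id, blocks, bytes
from user_extents
where segment_name = 'MARVIN';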
The block size is defined at the tablespace level and is held in USER_TABLESPACES in bytes :
select block_size
from user_tablespaces
where tablespace_name = 'DATA';

BLOCK_SIZE
----------
      8192
If we now multiply the number of blocks in the table by the size of those blocks, we get back to the size that USER_SEGMENTS is reporting :
select seg.blocks * tsp.block_size
from user_segments seg
inner join user_tablespaces tsp
    on seg.tablespace_name = tsp.tablespace_name
where seg.segment_name = 'MARVIN';

SEG.BLOCKS*TSP.BLOCK_SIZE
-------------------------
                    65536
MARVIN is a table with no ancillary segments, such as indexes.
To find the total space being used for the HITCHIKERS table, we'll also need to consider the space being taken up by its index, HITCHIKERS_PK :
select seg.segment_name, seg.segment_type, seg.blocks, ts.block_size, seg.bytes
from user_segments seg
inner join user_tablespaces ts
    on ts.tablespace_name = seg.tablespace_name
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SEGMENT_NAME         SEGMENT_TYPE        BLOCKS BLOCK_SIZE      BYTES
-------------------- --------------- ---------- ---------- ----------
HITCHIKERS           TABLE                  104       8192     851968
HITCHIKERS_PK        INDEX                   24       8192     196608
…in other words…
select sum(seg.bytes)
from user_segments seg
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SUM(SEG.BYTES)
--------------
       1048576
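If you'd rather not hard-code the index name, you can let the data dictionary find any indexes on the table for you — a sketch along these lines :

select sum(seg.bytes) as total_bytes
from user_segments seg
where seg.segment_name = 'HITCHIKERS'
or seg.segment_name in (
    select index_name
    from user_indexes
    where table_name = 'HITCHIKERS')
/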
On the subject of ancillary segments, what about LOBS ?
create table the_guide(
id number generated always as identity,
message clob);
declare
v_msg clob;
begin
for i in 1..1000 loop
v_msg := v_msg||q'[Don't Panic!]';
end loop;
insert into the_guide(message) values( v_msg);
end;
/
commit;
Unlike the table and index segments we've looked at so far, LOBSEGMENT and LOBINDEX segments have system-generated names, so you can't pick them out of USER_SEGMENTS using the name of their parent table.
Instead, we need to look in USER_LOBS to find the LOBSEGMENT belonging to a table, and in USER_INDEXES for the associated LOBINDEX.
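For example, just to see what THE_GUIDE's ancillary segments are actually called ( a sketch ) :

select column_name, segment_name, index_name
from user_lobs
where table_name = 'THE_GUIDE';

Armed with those names, we can pull everything back from USER_SEGMENTS in a single query :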
select segment_name, segment_type, bytes, blocks
from user_segments
where(
    segment_name = 'THE_GUIDE'
    or
    segment_name in (
        select segment_name
        from user_lobs
        where table_name = 'THE_GUIDE')
    or
    segment_name in (
        select index_name
        from user_indexes
        where table_name = 'THE_GUIDE')
    )
/

SEGMENT_NAME                   SEGMENT_TYPE         BYTES     BLOCKS
------------------------------ --------------- ---------- ----------
THE_GUIDE                      TABLE                65536          8
SYS_IL0000145509C00002$$       LOBINDEX             65536          8
SYS_LOB0000145509C00002$$      LOBSEGMENT         1245184        152
In this instance, although the table segment itself is only taking up 65536 bytes, when you add in all of the supporting objects, the total space requirement increases to 1376256 bytes.
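If all you want is the grand total rather than the breakdown, the same predicate can simply be wrapped in a SUM — a sketch :

select sum(bytes) as total_bytes
from user_segments
where segment_name = 'THE_GUIDE'
or segment_name in (
    select segment_name
    from user_lobs
    where table_name = 'THE_GUIDE')
or segment_name in (
    select index_name
    from user_indexes
    where table_name = 'THE_GUIDE')
/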
If you've managed to make it this far, then meet me at the Big Bang Burger Bar for a Pan Galactic Gargle Blaster. I need a drink after that.
Great post! Interestingly, following along on a free tier 19c ADW, I noticed both the table I explicitly compressed and the one I didn’t were the same number of blocks!
Looking at the generated DDL, I saw that the table I didn’t explicitly compress on creation still showed as “column store compress” (as opposed to the “row store compress” on the table where that was specified on table creation).
So I’m guessing that column store compress is the default for data warehouses, but not for ATP.
Phil,
Thanks. That’s good to know.
As you correctly surmise, I was using an ATP instance here.
I checked the generated DDL for the HITCHIKERS table I created in the post and it’s showing “NOCOMPRESS”.
Mike