This page provides information about the tables in the database.
Name | Type |
---|---|
schema_id | int8 |
schema | varchar |
table_id | int8 |
table | varchar |
owner_user_id | int4 |
owner | varchar |
creation | timestamp |
age | interval |
idle | interval |
types:sort | varchar |
types:dist | varchar |
counts:columns | int2 |
counts:constraints | int8 |
counts:sortkeys | int2 |
counts:slices | int2 |
store:blocks:sorted | int8 |
store:blocks:unsorted | int8 |
store:blocks:total | int8 |
store:rows:sorted | int8 |
store:rows:unsorted | int8 |
store:rows:total | int8 |
store:sorting % | float4 |
i/o:bytes read:disk | int8 |
i/o:bytes read:network | int8 |
i/o:rows:inserted | int8 |
i/o:rows:deleted | int8 |
i/o:bytes processed:in memory | int8 |
i/o:bytes processed:on disk | int8 |
related pages | varchar |
The schema ID. This column is emitted in CSV exports only.
The schema name.
The table ID. This column is emitted in CSV exports only.
Surprisingly, table IDs turn out to be unique across all databases.
The table name.
The owner user ID. This column is emitted in CSV exports only.
The owner user name.
The creation timestamp of the table.
The age of the table (i.e. now() minus the creation timestamp).
Duration since the start of the most recent table access in `stl_delete`, `stl_insert` or `stl_scan` - which is to say, the time since the table last had rows deleted or inserted, or was last scanned. If empty, there are no records in any of these system tables within the time period being examined by the page.
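As an illustrative sketch (in Python rather than SQL, with made-up timestamps standing in for the most recent records in `stl_delete`, `stl_insert` and `stl_scan`; the `idle_duration` function is hypothetical), the idle duration is the time since the latest of the three most recent access times, or empty when there are no records at all:

```python
from datetime import datetime, timedelta

def idle_duration(now, last_delete, last_insert, last_scan):
    """Return the idle interval, or None when no access is recorded.

    Each argument after `now` is the start time of the most recent
    matching record for this table, or None if there are no records
    within the examined time period.
    """
    accesses = [t for t in (last_delete, last_insert, last_scan) if t is not None]
    if not accesses:
        return None  # emitted as an empty column value
    return now - max(accesses)

now = datetime(2024, 1, 1, 12, 0, 0)
# no deletes; an insert at 09:00 and a scan at 11:30
print(idle_duration(now, None, datetime(2024, 1, 1, 9, 0, 0),
                    datetime(2024, 1, 1, 11, 30, 0)))  # 0:30:00
```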
Value | Sorting Type |
---|---|
compound | Compound |
interleaved | Interleaved |
unsorted | Unsorted |
Note that it is not possible for me to write a view to tell if a table has auto sorting. This is because the system table which contains this information is owned by `rdsdb`, the root user controlled by AWS, and so is not accessible.
The only way I can find out is to use `svv_table_info`, a view provided by AWS which provides information about tables.
However, firstly, the view is flawed. It only displays information about tables which have at least one row; empty tables are not listed.
Secondly, the view is huge. It’s a lot of SQL and a lot of work, and indeed repeats many of the table reads I am already performing in my own view. It also looks complex, and so I think it is probably buggy.
As such, I’ve opted not to use it, and so this page does not tell you if auto sorting is active.
As it is, I would advise you never to use `AUTO`, anyway.
You cannot have a knowingly correctly designed system when the system contains undocumented black boxes provided by third parties, and where those black boxes can and do silently change.
Finally, note that auto is the default setting when creating a table - if you do not specify sorting, that’s what you get - and so to get an unsorted table you now need, after making the table, to issue an `ALTER TABLE` command to deactivate `AUTO`.
The number of columns. Tables have system columns (there are three), and these are not included in the count.
The number of constraints: `NOT NULL`, unique, primary key, and foreign keys.
The number of sortkeys.
The number of slices which hold blocks from this table.
If the number of slices holding blocks is less than the number of slices in the cluster, then obviously, not all slices are participating in queries which read this table. This is improper, and you should look to fix it. However, tables with fewer records than there are slices in the cluster will naturally use fewer slices than are present in the cluster. Finally, note that tables with `all` distribution have a full copy of themselves on a single slice on each node. Their rows, when they are read, are broadcast to all slices. Remember only to use `all` for small tables; if you use it with large tables, you end up producing a system which does not scale, because only one slice per node can write to that table.
The number of sorted blocks.
The number of unsorted blocks.
The total number of blocks.
The number of sorted rows.
The number of unsorted rows.
The total number of rows.
Sorting is not a per-table concept, but a per-table, per-slice concept - indeed, it is really per-table, per-column, per-slice, but there isn’t much to gain by adding in the per-column level, as will become clear.
What I mean by this is that sorting in the system tables and in the docs is presented as a single number for a table.
This is wrong, and is misleading.
The key issue is that `VACUUM` does not operate on an entire table at once, but in fact on a subset of slices.
If you watch a table, in real time, while `VACUUM` runs, you see one group of slices go through the process of being sorted - this is quite an involved process, and the slices being vacuumed go through various contortions in how many sorted and unsorted blocks they possess (both go up and down), until eventually it all sorts itself out and the slice is sorted.
This takes quite a while, and the user can interrupt the process at any time.
This leaves the slice in a partially vacuumed state - and, critically, it means that slices can vary in how vacuumed they are.
A table is as fast as its least sorted slice.
What we care about, then, is the least sorted slice.
This column gives that percentage: it is computed on a per-slice basis, by dividing the number of sorted blocks on the slice by the total number of blocks on the slice, and the value for the least sorted slice is the value shown.
This leads us to the per-slice nature of sorting.
It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than the other slices - we can imagine, for ease of reasoning, a fairly extreme scenario where all slices have, say, three sorted blocks, except for one slice, which has ten sorted blocks.
If we then perform an update which adds one unsorted block to every slice, the three-block slices become four-block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.
The ten-block slice becomes an eleven-block slice, with one unsorted block, and so is 10/11 = 90.9% sorted.
However, I think this is normally not much of a consideration - usually tables are well balanced between slices.
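A minimal sketch of this computation (in Python, using the block counts from the scenario above; the `table_sorting_pct` function is hypothetical, standing in for what the view computes in SQL from the block system tables):

```python
# Per-slice sorted fraction: sorted blocks / total blocks on that slice.
# The table's sorting % is that of its least sorted slice, since a
# table is as fast as its least sorted slice.

def table_sorting_pct(slices):
    """slices: list of (sorted_blocks, total_blocks), one pair per slice."""
    return min(100.0 * s / t for s, t in slices)

# The scenario above: three slices with 3 sorted blocks + 1 unsorted,
# one slice with 10 sorted blocks + 1 unsorted (10/11 = ~90.9% sorted).
slices = [(3, 4), (3, 4), (3, 4), (10, 11)]
print(round(table_sorting_pct(slices), 1))  # 75.0 - the 75% slices dominate
```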
A final note: the automatic `VACUUM` which Redshift has running in the background is described as backing off, and stopping work, when the user load on the cluster is too high, or when the user issues a `VACUUM`. Obviously, if this is true, then this is a source of partially vacuumed slices.
(As it is, auto-vacuum seems to be negligible in practice. It runs too infrequently, and does too little work, to have a detectable impact upon clusters. Also, in my testing, I only ever saw it run delete-only `VACUUM`.)
This column then shows the total number of bytes read from disk, as best I can judge from the types indicated in `stl_scan`.
This column then shows the total number of bytes read from the network, as best I can judge from the types indicated in `stl_scan`.
Importantly, it counts only the receive side of network activity - the step is `scan`, after all, not `broadcast` - so we’re not counting bytes twice.
The number of rows inserted into the table.
For tables with `all` distribution, this is the physical number of rows (i.e. one copy of each row per node), not the logical number of rows.
The number of rows deleted from the table.
For tables with `all` distribution, this is the physical number of rows (i.e. one copy of each row per node), not the logical number of rows.
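As a sketch (in Python, with a made-up node count and row count; the `logical_rows` function is hypothetical), recovering the logical row count from the physical count for an `all`-distribution table is a straight division:

```python
# With `all` distribution, every node holds a full copy of the table,
# so the physical row count is the logical row count times the node count.

def logical_rows(physical_rows, node_count):
    # physical_rows should be an exact multiple of node_count
    assert physical_rows % node_count == 0
    return physical_rows // node_count

# e.g. 1,000 logical rows inserted on a 4-node cluster
# appear as 4,000 physical rows
print(logical_rows(4000, 4))  # 1000
```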
This column then shows the total number of bytes processed by the `stl_aggr`, `stl_hash`, `stl_save`, `stl_sort` and `stl_unique` steps, when running in memory rather than on disk.
This column then shows the total number of bytes processed by the `stl_aggr`, `stl_hash`, `stl_save`, `stl_sort` and `stl_unique` steps, when running on disk rather than in memory.