This page provides information about the tables in the database.
Name | Type |
---|---|
schema_id | int8 |
schema | varchar |
table_id | int8 |
table | varchar |
owner_user_id | int4 |
owner | varchar |
creation | timestamp |
age | interval |
idle | interval |
types:sort | varchar |
types:dist | varchar |
counts:columns | int2 |
counts:constraints | int8 |
counts:sortkeys | int2 |
counts:slices | int2 |
store:blocks:sorted | int8 |
store:blocks:unsorted | int8 |
store:blocks:total | int8 |
store:rows:sorted | int8 |
store:rows:unsorted | int8 |
store:rows:total | int8 |
store:sorting % | float4 |
i/o:bytes read:disk | int8 |
i/o:bytes read:network | int8 |
i/o:rows:inserted | int8 |
i/o:rows:deleted | int8 |
i/o:bytes processed:in memory | int8 |
i/o:bytes processed:on disk | int8 |
related pages | varchar |
The schema ID. This column is emitted in CSV exports only.
The schema name.
The table ID. This column is emitted in CSV exports only.
Surprisingly, table IDs turn out to be unique across all databases.
The table name.
The owner user ID. This column is emitted in CSV exports only.
The owner user name.
The creation timestamp of the table.
The age of the table (i.e. now() minus the creation timestamp).
Duration since the start of the most recent table access in `stl_delete`, `stl_insert` or `stl_scan` - which is to say, the time since the table last had rows deleted or inserted, or was last scanned. If empty, there are no records in any of these system tables within the time period being examined by the page.
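As an illustrative sketch (in Python rather than SQL, with made-up timestamps standing in for the most recent records in `stl_delete`, `stl_insert` and `stl_scan`; the `idle_duration` function is hypothetical), the idle duration is the time since the latest of the three most recent access times, or empty when there are no records at all:

```python
from datetime import datetime, timedelta

def idle_duration(now, last_delete, last_insert, last_scan):
    """Return the idle interval, or None when no access is recorded.

    Each argument after `now` is the start time of the most recent
    matching record for this table, or None if there are no records
    within the examined time period.
    """
    accesses = [t for t in (last_delete, last_insert, last_scan) if t is not None]
    if not accesses:
        return None  # emitted as an empty column value
    return now - max(accesses)

now = datetime(2024, 1, 1, 12, 0, 0)
# no deletes; an insert at 09:00 and a scan at 11:30
print(idle_duration(now, None, datetime(2024, 1, 1, 9, 0, 0),
                    datetime(2024, 1, 1, 11, 30, 0)))  # 0:30:00
```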
Value | Sorting Type |
---|---|
compound | Compound |
interleaved | Interleaved |
unsorted | Unsorted |
Note that it is not possible for me to write a view to tell if a table has auto sorting. This is because the system table which contains this information is owned by `rdsdb`, the root user controlled by AWS, and so is not accessible.
The only way I can find out is to use `svv_table_info`, a view provided by AWS which provides information about tables.
However, firstly, the view is flawed. It only displays information about tables which have at least one row; empty tables are not listed.
Secondly, the view is huge. It’s a lot of SQL and a lot of work, and indeed repeats many of the table reads I am already performing in my own view. It also looks complex, and so I think it is probably buggy.
As such, I’ve opted not to use it, and so this page does not tell you if auto sorting is active.
As it is, I would advise you never to use `AUTO`, anyway.
You cannot have a knowingly correctly designed system when the system contains undocumented black boxes provided by third parties, and where those black boxes can and do silently change.
Finally, note that auto is the default setting when creating a table - if you do not specify sorting, that’s what you get - and so to get an unsorted table you now need, after making the table, to issue an `ALTER TABLE` command to deactivate `AUTO`.
The number of columns. Tables have system columns (there are three), and these are not included in the count.
The number of constraints: `NOT NULL`, unique, primary key, and foreign keys.
The number of sortkeys.
The number of slices which hold blocks from this table.
If the number of slices holding blocks is less than the number of slices in the cluster, then obviously, not all slices are participating in queries which read this table. This is improper, and you should look to fix it. However, tables with fewer records than there are slices in the cluster will naturally use fewer slices than are present in the cluster. Finally, note that tables with `all` distribution have a full copy of themselves on a single slice on each node. Their rows, when they are read, are broadcast to all slices. Remember only to use `all` for small tables; if you use it with large tables, you end up producing a system which does not scale, because only one slice per node can write to that table.
The number of sorted blocks.
The number of unsorted blocks.
The total number of blocks.
The number of sorted rows.
The number of unsorted rows.
The total number of rows.
Sorting is not a per-table concept, but a per-table, per-slice concept - indeed, it is really per-table, per-column, per-slice, but there isn’t much to gain by adding in the per-column level, as will become clear.
What I mean by this is that sorting in the system tables and in the docs is presented as a single number for a table.
This is wrong, and is misleading.
The key issue is that `VACUUM` does not operate on an entire table at once, but in fact on a subset of slices.
If you watch a table, in real time, while `VACUUM` runs, you see one group of slices go through the process of being sorted - this is quite an involved process, and the slices being vacuumed go through various contortions in how many sorted and unsorted blocks they possess (both go up and down), until eventually it all sorts itself out and the slice is sorted.
This takes quite a while, and the user can interrupt the process at any time.
This leaves the slice in a partially vacuumed state - and, critically, it means that slices can vary in how vacuumed they are.
A table is as fast as its least sorted slice.
What we care about, then, is the least sorted slice.
This column gives that percentage: it is computed on a per-slice basis, by dividing the number of sorted blocks on the slice by the total number of blocks on the slice, and the value for the least sorted slice is the value shown.
This leads us to the per-slice nature of sorting.
It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than the other slices - we can imagine, for ease of reasoning, a fairly extreme scenario where all slices have, say, three sorted blocks, except for one slice, which has ten sorted blocks.
If we then perform an update which adds one unsorted block to every slice, the three-block slices become four-block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.
The ten-block slice becomes an eleven-block slice, with one unsorted block, and so is 10/11 = 90.9% sorted.
However, I think this is normally not much of a consideration - usually tables are well balanced between slices.
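A minimal sketch of this computation (in Python, using the block counts from the scenario above; the `table_sorting_pct` function is hypothetical, standing in for what the view computes in SQL from the block system tables):

```python
# Per-slice sorted fraction: sorted blocks / total blocks on that slice.
# The table's sorting % is that of its least sorted slice, since a
# table is as fast as its least sorted slice.

def table_sorting_pct(slices):
    """slices: list of (sorted_blocks, total_blocks), one pair per slice."""
    return min(100.0 * s / t for s, t in slices)

# The scenario above: three slices with 3 sorted blocks + 1 unsorted,
# one slice with 10 sorted blocks + 1 unsorted (10/11 = ~90.9% sorted).
slices = [(3, 4), (3, 4), (3, 4), (10, 11)]
print(round(table_sorting_pct(slices), 1))  # 75.0 - the 75% slices dominate
```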
A final note: the automatic `VACUUM` which Redshift has running in the background is described as backing off, and stopping work, when the user load on the cluster is too high, or when the user issues a `VACUUM`. Obviously, if this is true, then this is a source of partially vacuumed slices.
(As it is, auto-vacuum seems to be negligible in practice. It runs too infrequently, and does too little work, to have a detectable impact upon clusters. Also, in my testing, I only ever saw it run delete-only `VACUUM`.)
This column then shows the total number of bytes read from disk, as best I can judge from the types indicated in `stl_scan`.
This column then shows the total number of bytes read from the network, as best I can judge from the types indicated in `stl_scan`.
Importantly, it counts only the receive side of network activity - the step is `scan`, after all, not `broadcast` - so we’re not counting bytes twice.
The number of rows inserted into the table.
For tables with `all` distribution, this is the physical number of rows (i.e. one copy of each row per node), not the logical number of rows.
The number of rows deleted from the table.
For tables with `all` distribution, this is the physical number of rows (i.e. one copy of each row per node), not the logical number of rows.
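As a sketch (in Python, with a made-up node count and row count; the `logical_rows` function is hypothetical), recovering the logical row count from the physical count for an `all`-distribution table is a straight division:

```python
# With `all` distribution, every node holds a full copy of the table,
# so the physical row count is the logical row count times the node count.

def logical_rows(physical_rows, node_count):
    # physical_rows should be an exact multiple of node_count
    assert physical_rows % node_count == 0
    return physical_rows // node_count

# e.g. 1,000 logical rows inserted on a 4-node cluster
# appear as 4,000 physical rows
print(logical_rows(4000, 4))  # 1000
```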
This column then shows the total number of bytes processed by the `stl_aggr`, `stl_hash`, `stl_save`, `stl_sort` and `stl_unique` steps, when running in memory rather than on disk.
This column then shows the total number of bytes processed by the `stl_aggr`, `stl_hash`, `stl_save`, `stl_sort` and `stl_unique` steps, when running on disk rather than in memory.