I’ve just rediscovered something I bumped into quite a while ago.
It is not possible for me to write a view to tell if a table has AUTO sorting. This is because the system table which contains this information is not accessible to the user (admin or otherwise).
The only way to find out is to use SVV_TABLE_INFO, as this view is owned by Amazon, and has the privs necessary to access this information.
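For what it's worth, here's a minimal sketch of such a check via SVV_TABLE_INFO - on my reading, the sortkey1 column reports "AUTO(SORTKEY)" for tables with automatic sorting, but treat that string as an assumption and verify it against your own cluster;
-- sketch : list tables whose sorting is managed automatically
-- (assumes sortkey1 reports 'AUTO(SORTKEY)' for such tables)
select "schema", "table", sortkey1
from   svv_table_info
where  sortkey1 like 'AUTO%';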
I’ve said it before and I’ll say it again : the system tables are an unbelievable mess.
Note though AUTO is the default - if you do not specify sorting, that’s what you get - and so if you actually want an unsorted table you now need, after making the table, to issue an ALTER TABLE command to deactivate AUTO.
Somebody hand me a weeping emoji, please.
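For reference, my understanding is the command in question looks like this (table_1 being a placeholder table name; check the ALTER TABLE docs for your cluster version);
-- assumption : ALTER SORTKEY NONE is what removes automatic sorting
alter table table_1 alter sortkey none;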
There is at times a need in a query to obtain the current timestamp.
In principle, a current timestamp can be the time the current transaction started, the time at which the current query started, or the time the function to produce the timestamp was called (so multiple calls in one row will produce multiple differing timestamps).
Redshift, however, has only three documented aliases or functions to return a current timestamp.
name | type | output | occurrence | docs |
---|---|---|---|---|
getdate() | function | timestamp | transaction | here |
sysdate | alias | timestamp | transaction | here |
timeofday() | function | varchar | call | here |
So, what have we actually got here?
First, getdate() returns no fractional seconds. It’s a deformed timestamp. No use to anyone, for anything. Why does it even exist? I could just use date_trunc().
dev=# select getdate();
getdate
---------------------
2022-04-17 19:33:04
(1 row)
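To illustrate the point, truncating the transaction timestamp from sysdate to seconds should give exactly what getdate() gives;
-- same second-resolution output as getdate(), but by choice
select date_trunc( 'second', sysdate );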
Second, sysdate. This gives us the timestamp for the beginning of the current transaction. Good!
dev=# begin;
BEGIN
dev=# select sysdate;
timestamp
----------------------------
2022-04-17 15:22:11.981487
(1 row)
dev=# select sysdate;
timestamp
----------------------------
2022-04-17 15:22:11.981487
(1 row)
Third, timeofday(). This, rather baroquely, gives us a varchar human-readable timestamp for the time the function was called. Weird, but also the only alias or function which provides a timestamp of any kind at the time the function was called. You can then go right ahead and convert it back to the timestamp it was originally by casting to timestamp, but… like… the engineer in me is crying on the inside, you know?
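The cast itself is nothing more than this (the from clause is only there to force worker-node execution, a point which comes up below);
-- convert the human-readable string back to a timestamp
select timeofday()::timestamp from stl_bcast limit 1;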
So this leaves us with fundamental functionality missing.
What we need, for the basic full set of functionality, is this;
- a timestamp for the start of the current transaction
- a timestamp for the start of the current statement
- a timestamp for the time of the function call
What we have is this;
- a timestamp for the start of the current transaction
- a varchar of the timestamp at time of the function call
This is really basic missing functionality. Ga-ga baby-talk “do you have a real database” stuff. Redshift keeps adding large blocks of new external functionality - Concurrency Scaling Clusters, Spectrum, AQUA, you name it - but this is an example of the bread and butter basics, the stuff everyone uses, not properly implemented and not being fixed.
Now let’s take a look at the docs, and see what problems they have.
getdate()
The following example uses the GETDATE function to return the full timestamp for the current date.
Except it doesn’t provide a full timestamp, as the timestamp has no fractional seconds.
sysdate
There’s only the one error here; the page talks of sysdate as a function, when it’s an alias.
timeofday()
This is where all the cogs whirr and fling themselves out of the mechanism in a desperate bid for freedom.
Here’s the one-liner of documentation;
TIMEOFDAY is a special alias used to return the weekday, date, and time as a string value. It returns the time of day string for the current statement, even when it is within a transaction block.
It’s not an alias, it’s a function. More importantly, this function does not return the current time for the current statement. It returns the current time at the time the function is called; so if you called it say twice in one row, you’d have two different times.
We can see this with the following demonstration;
dev=# select timeofday(), timeofday() from stl_bcast limit 5;
timeofday | timeofday
-------------------------------------+-------------------------------------
Sun Apr 17 15:06:25.590826 2022 UTC | Sun Apr 17 15:06:25.590833 2022 UTC
Sun Apr 17 15:06:25.590837 2022 UTC | Sun Apr 17 15:06:25.590839 2022 UTC
Sun Apr 17 15:06:25.590842 2022 UTC | Sun Apr 17 15:06:25.590844 2022 UTC
Sun Apr 17 15:06:25.590846 2022 UTC | Sun Apr 17 15:06:25.590848 2022 UTC
Sun Apr 17 15:06:25.590850 2022 UTC | Sun Apr 17 15:06:25.590852 2022 UTC
(5 rows)
Finally, we can now take a detour into the world of undocumented current timestamp aliases and functions.
Here though, we must now remind ourselves to be careful of the difference between the leader node and the worker nodes; very likely everything from Postgres will run on the leader node, but it is an entirely different matter as to whether any given functionality is available on the worker nodes.
Everything that follows has been tested on worker nodes, by dint of
having a Redshift table in the from
clause.
name | type | output | occurrence |
---|---|---|---|
current_time | alias | timetz | transaction |
current_timestamp | alias | timestamptz | transaction |
localtimestamp | alias | timestamp | transaction |
So, basically, a weird little bunch.
All issue timestamps for the transaction, and so are the same as sysdate, but the output varies.
Obviously current_time returns only the time part, and both it and current_timestamp return types with a time zone (timetz and timestamptz, respectively); localtimestamp, on the other hand, returns a plain timestamp, in our local time zone, rather than a timestamp with time zone.
Inconsistent behaviour, to my eye.
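A quick way to see all three side by side, with sysdate for comparison (output omitted here; the types are as described above);
-- the from clause forces execution on the worker nodes
select current_time, current_timestamp, localtimestamp, sysdate
from   stl_bcast
limit  1;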
On a Redshift system, all these will normally be UTC, and you can convert to a timestamp easily enough by using at time zone 'utc' (you cannot mix timestamp with time zone and timestamp).
Still, if you’re porting code to or from Postgres, which lacks
sysdate
(it comes from Oracle),
current_timestamp at time zone 'utc'
is useful, as it works
in the same way on both systems.
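So the portable expression looks like this (the from clause is only there to force worker-node execution on Redshift; on Postgres you would drop it);
-- transaction start as a plain timestamp, in UTC
select current_timestamp at time zone 'utc' from stl_bcast limit 1;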
It’s actually hard to know if these aliases are supported or not, as the lack of documentation may just be an oversight (there is at least one other docs page I know of which simply does not exist, that for the SQL command ALTER FUNCTION).
As an aside, there’s a docs page here, which lists “Deprecated leader node-only functions”. In one sense, the page is correct - all of these aliases and functions do run on the leader node, they’re part of Postgres - but in another it’s not, because as we’ve seen, the devs somewhere along the line have actually and deliberately spent the time to implement three of these aliases on the worker nodes. Aliases and functions don’t turn up on the worker nodes unless the RS devs put them there. So… is this page out of date? or are we implicitly being told not to use these aliases even though they are on the worker nodes? who knows. The Redshift docs should come with a crystal ball.
Note that Redshift doesn’t actually understand timezones, and so just has “current time”, which AWS keep set to UTC. You can nowadays set a per-session offset to that time, and the offset for that timezone will be applied to the output from the time functions, but the cluster still only knows about a single time, not about timezones, and it still isn’t using your time; you’re just getting the offset you specified for your session.
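The per-session offset is set in the usual way, something along these lines (the timezone name is just an example);
-- apply a per-session display offset to the time functions
set timezone to 'Europe/Zurich';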
Functions from Postgres which do not work on worker nodes are;
function |
---|
now() |
clock_timestamp() |
statement_timestamp() |
transaction_timestamp() |
Note the alias localtime
does not work, which is
curious, given that localtimestamp
does. Probably an
oversight.
extract()
So, to begin with; where Redshift has a leader node which is derived from Postgres, and worker nodes which are not derived from Postgres, any given functionality (such as a function or data type) is implemented either only on the leader node (often the case), only on the worker nodes (very rarely the case), or is implemented on both (often the case).
When functionality is implemented on both, behaviour and output can be different on the leader node and the worker nodes. The docs say absolutely nothing about this, and so it’s a booby-trap just waiting to catch the unwary.
We can see a simple (and in this case, harmless) example of this with
the interval
data type.
First, on the leader node;
dev=# select '2020-01-01 10:00:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp;
?column?
-----------------
01:00:00.123456
(1 row)
Second, on a worker node;
dev=# select '2020-01-01 10:00:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp from stl_bcast limit 1;
?column?
------------
3600123456
(1 row)
So we see here the leader node renders an interval to text as ‘HH:MM:SS.ssssss’, whereas a worker node prints it as the total number of microseconds (I suspect internally, they’re both a count of microseconds).
A query runs on the worker nodes if it uses any resource on the
worker nodes, such as a normal Redshift table or a worker node specific
function. So here, the first query, which uses no worker node resources,
runs on the leader node. The second query, which has a from
clause specifying a table on the worker nodes, executes on the worker
nodes.
Now, the data type here is an interval
, not an
integer
, and it cannot simply be cast to an
integer
; both the leader node and worker nodes object to
this. The problem is that the database doesn’t know what
integer you want - the number of days in the interval? number of
minutes? number of microseconds?
You need to be able to convert interval
to
integer
, though, because although some functions, such as
avg()
, work directly with interval
, others,
such as stddev_pop()
, do not; and also because sometimes
you want to divide a quantity by an interval
, to figure out
number-of-whatevers per minute or second or whatever, and
integers
cannot be divided by an interval
. You
need to convert the interval
to an integer
, so that the division gives you 1 unit of whatever per however many microseconds, and you then convert the microseconds integer into something more human-friendly, like seconds or minutes.
To convert an interval
to an integer
, you
need to use extract
.
Now, the very first thing to notice is that the documentation is
wrong, because it indicates extract
does not support
interval
. The SQL specification requires
interval
is supported, and if interval
were
not supported, there would be no way to convert an interval
to an integer
, and so fundamental SQL functionality would
be missing; so I think it’s a documentation error, and it’s not
like it’s going to be lonely in there, you know?
The extract
function takes two arguments, separated by
the keyword from
, like so;
extract( unit-of-time from time );
The second argument is a time, timestamp, date or interval, so one of the time/date data types, and the first argument is the unit of time to extract - such as the number of seconds, or days, or years, and so on.
Now, where I write “the unit of time to extract”, what I mean is that we pick a unit of time - say, day, or hour, or millisecond - and extract gives us not the value of the entire time expressed in that unit, but rather the number of that unit which is specified in the time.
So if we have say five days, ten hours and six minutes, and we ask for hours, we’ll get 10.
Here’s an example, where we have an interval of 1 day, 3 hours, 15 minutes, 0 seconds, 123456 microseconds.
dev=# select extract( hour from '2020-01-02 12:15:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp );
date_part
-----------
3
(1 row)
dev=# select extract( minute from '2020-01-02 12:15:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp );
date_part
-----------
15
(1 row)
dev=# select extract( microsecond from '2020-01-02 12:15:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp );
date_part
-----------
123456
(1 row)
So, on the face of it, this is pretty useless for converting a time
of any kind into an integer
, because what we want is the
entire value of the time.
However, there is a unit of time to extract called
epoch
.
The docs are great (as in completely not great). What does an epoch give you? who knows. No clue. Nothing. Nada. It’s only the most important use of extract; I think this is an example of a general and widespread problem with the docs: the author is not technical, and no technical staff ever read the docs, so oversights like this (or the numbers of flat-out blatant factual errors which I’ve found over the years) are never noticed.
What epoch
gives you is the total time in
seconds, including the fractional part of seconds, like so;
dev=# select extract( epoch from '2020-01-02 12:15:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp );
date_part
--------------
98100.123456
(1 row)
So, bingo, right? problem solved! we now have our
interval
as a float, and we can do maths.
Well… no. Why do you think I’m writing this article? :-)
All these queries, you’ll note, have been on the leader node.
The leader node was originally Postgres, and in my experience, it does the right thing; but it’s totally irrelevant because there’s very little work you ever do which is leader node only (some system table work, maybe).
We have to look at what happens on the worker nodes.
Let’s try epoch
on a worker node.
dev=# select extract( epoch from '2020-01-02 12:15:00.123456'::timestamp - '2020-01-01 09:00:00.000000'::timestamp ) from stl_bcast limit 1;
date_part
--------------
98100.123456
(1 row)
Well, seems reasonable, right? it’s the same as before.
Sorry. Still wrong. I have a card up my sleeve ;-)
You see, what you’ve seen there is a worker node operating on an interval computed from two literal timestamps; that works.
But let’s now make a table and put our two example timestamps in there.
dev=# create table table_1 ( column_1 timestamp, column_2 timestamp );
CREATE TABLE
dev=# insert into table_1( column_1, column_2 ) values ( '2020-01-02 12:15:00.123456'::timestamp, '2020-01-01 09:00:00.000000'::timestamp );
INSERT 0 1
dev=# select * from table_1;
column_1 | column_2
----------------------------+---------------------
2020-01-02 12:15:00.123456 | 2020-01-01 09:00:00
(1 row)
And now epoch
again…
dev=# select extract( epoch from column_1 - column_2 ) from table_1;
date_part
-----------
98100
(1 row)
Oh dear. Not great. It turns out - as we’ll see in more detail - that
extract
on worker nodes does not work correctly
with an interval
computed from two timestamps taken from a
row from a table.
But there’s more - as it also turns out extract
on
worker nodes does work correctly when you’re using
interval
literals.
It’s only computed interval
s which go wrong.
dev=# select extract(millisecond from interval '1 day 500 millisecond') from stl_bcast limit 1;
date_part
-----------
500
(1 row)
dev=# select extract(epoch from interval '1 day 500 millisecond') from stl_bcast limit 1;
date_part
-----------
86400.5
(1 row)
Finally, included for completeness, we also have extract
on single timestamps, both literals and from tables.
With this, everything works except epoch, which is missing the fractional part of the seconds.
First, here’s from a literal;
dev=# select extract( microsecond from '2020-01-02 12:15:00.123456'::timestamp );
date_part
-----------
456
(1 row)
dev=# select extract( year from '2020-01-02 12:15:00.123456'::timestamp );
date_part
-----------
2020
(1 row)
dev=# select extract( epoch from '2020-01-02 12:15:00.123456'::timestamp );
date_part
------------
1577967300
(1 row)
Second, the same value but from our test table;
dev=# select extract( microsecond from column_1 ) from table_1;
date_part
-----------
456
(1 row)
dev=# select extract( year from column_1 ) from table_1;
date_part
-----------
2020
(1 row)
dev=# select extract( epoch from column_1 ) from table_1;
date_part
------------
1577967300
(1 row)
So we have five different sets of behaviour;
1. extract on the leader node (intervals and timestamps, literal or from tables)
2. extract on worker nodes, on an interval computed from two literal timestamps
3. extract on worker nodes, on an interval computed from timestamps taken from table rows
4. extract on worker nodes, on an interval literal
5. extract on worker nodes, on a single timestamp, whether a literal or from a table
Of these five, #1 and #2 work correctly, but you basically never use them, or they exist within their own bubble (leader node queries on the system tables).
The important use case, #3, is broken.
Just to make life a bit more complicated, #4, which at times you will use in conjunction with #3, works correctly, and so when the two are used together the output from #3 and #4 has to be normalized, so they’re in the same terms (either both microseconds, or both seconds with the fractional part); because if you use microseconds with #3 you get the full time, but if you use microseconds with #4, you get only the microseconds part of the time.
Finally, with #5, epoch
is broken.
(As an aside, there is one other function which is very similar to
extract()
, this being date_part()
. On worker
nodes this function does not support interval
- it will
throw an error - but it does support timestamp
, and in this
case both for literals and from rows, when used with epoch
it works correctly and provides fractional seconds; so you can use
date_part()
to work around this bug with epoch
in extract()
. Horrific mess, isn’t it?)
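As a sketch of that workaround, using the test table from above, something like this should give the duration in seconds with the fractional part intact;
-- subtract two epoch values (fractional seconds preserved),
-- rather than extracting epoch from a computed interval
select date_part( epoch, column_1 ) - date_part( epoch, column_2 ) as duration_in_seconds
from   table_1;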
Let’s look at #3 and #4 in more detail, to see what’s going wrong.
To begin with, looking at #3, let’s try minute
, and see
what we get.
dev=# select extract( microsecond from column_1 - column_2 ) from table_1;
date_part
-------------
98100123456
(1 row)
dev=# select extract( minute from column_1 - column_2 ) from table_1;
date_part
-----------
1635
(1 row)
dev=# select extract( epoch from column_1 - column_2 ) from table_1;
date_part
-----------
98100
(1 row)
What’s happening is that rather than getting the number of the
specified unit-of-time, we’re getting the entire period of
interval
in the specified unit-of-time, but never with any
fractional part.
So rather than getting 15 for minute
, we get 1635.
The epoch
unit of time is giving us seconds, which is
correct, but with no fractional part, which is not, and we have no way
of getting the fractional part.
So, generally speaking, in case #3, where extract
is
broken, where we would use epoch
and get seconds and
fractional seconds, we now must use microsecond
, as this is
the only way to get the whole time, and that means converting that value
to what we expect by dividing by 1000000 (number of microseconds in a
second); or, alternatively, converting the values we obtain from
correctly working extract
(which will use
epoch
) into microseconds, by multiplying by 1000000.
So for example, say I had a sum of intervals, let’s say a total time spent on processing something, and I wanted to put that in terms of minutes; I need to divide the sum by the interval of one minute. This is what has to be done;
extract(microseconds from total_processing_time) / (extract(epoch from interval '1 minute') * 1000000)
Or, alternatively;
(extract(microseconds from total_processing_time) / 1000000) / extract(epoch from interval '1 minute')
This is simple enough, but it took quite a bit of investigation into what was happening before I could figure it out.
Two final notes.
First, the Redshift docs say nothing of it, but interval literals in Redshift do not support microseconds or milliseconds. It’s just terrible. No one checks, no one notices, Support deflect all incoming customer feedback. The easiest way I’ve found around this is to use a fractional second, e.g. interval '0.000001' second.
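So where you would want, say, a 250 millisecond interval literal, you write it as a fraction of a second;
-- a 250 millisecond interval, written as fractional seconds
select sysdate + interval '0.25' second from stl_bcast limit 1;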
Second, this leaves us with the problem of making millisecond or
microsecond intervals from a column of integer milliseconds or
microseconds (which you will sometimes find in the system tables),
rather than as literals. I was not able to work out myself how to do this; I had to ask in the dbt slack, and then someone pointed out you
can use dateadd()
for this - and you can - and the reason I
missed it is because the documentation has a bug in the prototype for
the function. It lists the second argument as an interval
,
rather than an integer
.
So what you do is add your microseconds to an arbitrary timestamp, and then subtract that same arbitrary timestamp, like so;
dateadd( 'microsecond', stv_wlm_query_state.queue_time, '2000-01-01 00:00:00.000000'::timestamp ) - '2000-01-01 00:00:00.000000'::timestamp as query_duration
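Put into a complete query against stv_wlm_query_state, that looks something like this (a sketch; check the column names against the system table);
-- turn the integer microsecond column queue_time into an interval
select query,
       dateadd( 'microsecond', queue_time, '2000-01-01 00:00:00.000000'::timestamp )
         - '2000-01-01 00:00:00.000000'::timestamp as time_in_queue
from   stv_wlm_query_state;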
Bi-weekly Redshift benchmarks are up.
The ra3
node types appear to now be available in
af-south-1
, ap-northeast-3
and
eu-south-1
. The only region now (of those monitored - no
China, no Gov) which lacks ra3
is
ap-southeast-3
.
No further regions have received the disk read performance upgrade seen over the last two benchmarks (which halves disk read benchmark duration). Currently nine regions have this, thirteen do not.
The disk read improvement has been revoked from
dc2.large
in ap-northeast-3
.
Node/region benchmark performance is quite a mish-mash, but there seem to be certain common themes.
- dc2.large in some regions, but not all, is particularly slow for disk write (normal region ~5 seconds, slow regions ~15 seconds)
- dc2.large
https://www.amazonredshift-observatory.ch/cross_region_benchmarks/index.html