The STT now has MVP for diffs.
Each release of Redshift now has a link to the diff with the previous version.
Currently it shows tables and views (not functions), and the diff is computed by comparing columns - which columns there are, their names, data types and constraints. So it doesn’t compare, say, view text (which I do want to do).
Also, if a table or view is dropped, there’s no indication of this.
As I say - MVP. I was kicking back for a bit and finally got something going for diffs; I’ve been meaning to do this for a long time.
I recently investigated variability across dc2.large nodes, and found it to be high. As such, I am now ignoring dc2.large variations unless they are outside of the normal (large) range.

ap-northeast-2 dc2.large: network performance is slow again, at 7 seconds rather than 4 seconds.

ap-northeast-2, ca-central-1, eu-west-1, eu-west-2, us-west-1 and us-west-2 ra3.xlplus: the unusual slowness of the disk-read-write test two weeks ago has passed, and performance has returned to normal.

eu-central-1 dc2.large: disk-read-write has significantly improved, now about 3 seconds instead of 4.5s. If this persists, it’s an upgrade.

us-east-1: all node types except dc2.large have received significant processor performance upgrades, losing about one second each from the processor benchmark. This is big news, given how widely used this region is.
https://www.amazonredshift-observatory.ch/cross_region_benchmarks/index.html
Woken up about 7am-ish by one of those big rolling booms.
First missile attack in a month, and there was a month or so of peace and quiet prior to the attack a month ago.
I would have been asleep when the air raid warning went off.
I think Russia saves up its missiles for a month, from current production, and then launches an attack.
varbyte sorting

So, I’ve been focusing on the Comboboulator, until it’s released, but I took an hour or two off recently to have a quick look at how the varbyte data type sorts.
This is not a full investigation, just something done by hand, looking at two data types, which throws some light on this otherwise uninvestigated matter.
You can’t put a varbyte into a sortkey, but the Zone Map is always present, and so Redshift is computing a sorting value for every value in a varbyte column, just as it does for every column.
The method by which sorting values are computed is intuitive and easy to reason about for about half of the data types, and is varying degrees of screwy and/or crazy for the others.
A sorting value is an int8, so it’s eight bytes, and I obtain it by making a table (table_1) with a single varbyte(64) column (so plenty of length), issuing the following commands, and then converting minvalue or maxvalue (they should always be the same, as there is only one value, but this is not in fact always the case in Redshift, although it does seem to be the case with varbyte) to a bit pattern.
dev=# truncate table table_1;
TRUNCATE TABLE and COMMIT TRANSACTION
dev=# insert into table_1( column_1 ) values ('\000'::char(1)::varbyte);
INSERT 0 1
dev=# select minvalue, maxvalue from stv_blocklist where tbl = 110205 and col = 0;
minvalue | maxvalue
-------------------+-------------------
72057594037927968 | 72057594037927968
(1 row)
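For reference, the conversion to a bit pattern is a one-liner in Python (masking to 64 bits handles the sorting value being a signed int8):

```python
def to_bit_pattern(sorting_value: int) -> str:
    """Render a signed int8 sorting value as its 64-bit two's-complement pattern."""
    return format(sorting_value & 0xFFFFFFFFFFFFFFFF, '064b')

# The minvalue from the stv_blocklist query above.
print(to_bit_pattern(72057594037927968))
# 0000000100000000000000000000000000000000000000000000000000100000
```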
My initial guess was that the first eight bytes of the varbyte value would be used, and if the value was shorter than eight bytes, it would be padded with binary 0.
Here’s the first set of test data;
0::int2::varbyte -> 144115188075855872 -> 0000001000000000000000000000000000000000000000000000000000000000
1::int2::varbyte -> 144115188075856128 -> 0000001000000000000000000000000000000000000000000000000100000000
2::int2::varbyte -> 144115188075856384 -> 0000001000000000000000000000000000000000000000000000001000000000
255::int2::varbyte -> 144115188075921152 -> 0000001000000000000000000000000000000000000000001111111100000000
256::int2::varbyte -> 144115188075855873 -> 0000001000000000000000000000000000000000000000000000000000000001
257::int2::varbyte -> 144115188075856129 -> 0000001000000000000000000000000000000000000000000000000100000001
32767::int2::varbyte -> 144115188075921279 -> 0000001000000000000000000000000000000000000000001111111101111111
So, what do we have?
We have the two-byte value of the int2, which - Intel being little-endian - has its most significant byte to the right (i.e. the byte order looks reversed), and then, finally, in the most significant byte of the sorting value, we have the decimal value 2 (00000010).
Let’s gather some more evidence, and see what happens with int4.
0::int4::varbyte -> 288230376151711744 -> 0000010000000000000000000000000000000000000000000000000000000000
1::int4::varbyte -> 288230376168488960 -> 0000010000000000000000000000000000000001000000000000000000000000
2::int4::varbyte -> 288230376185266176 -> 0000010000000000000000000000000000000010000000000000000000000000
255::int4::varbyte -> 288230380429901824 -> 0000010000000000000000000000000011111111000000000000000000000000
256::int4::varbyte -> 288230376151777280 -> 0000010000000000000000000000000000000000000000010000000000000000
We see the four bytes of the int4, and now the most significant byte contains the value 4.
So what happens with the first byte when we have an int8, which is going to occupy the whole sorting value?
0::int8::varbyte -> 0 -> 0
1::int8::varbyte -> 72057594037927936 -> 0000000100000000000000000000000000000000000000000000000000000000
2::int8::varbyte -> 144115188075855872 -> 0000001000000000000000000000000000000000000000000000000000000000
3::int8::varbyte -> 216172782113783808 -> 0000001100000000000000000000000000000000000000000000000000000000
15::int8::varbyte -> 1080863910568919040 -> 0000111100000000000000000000000000000000000000000000000000000000
16::int8::varbyte -> 1152921504606846976 -> 0001000000000000000000000000000000000000000000000000000000000000
255::int8::varbyte -> -72057594037927936 -> 1111111100000000000000000000000000000000000000000000000000000000
256::int8::varbyte -> 281474976710656 -> 0000000000000001000000000000000000000000000000000000000000000000
And the answer is, it goes away.
Mmm.
Let’s take a quick peek at char, before drawing some conclusions.
\000::char(1)::varbyte -> 72057594037927968 -> 0000000100000000000000000000000000000000000000000000000000100000
\001::char(1)::varbyte -> 72057594037927937 -> 0000000100000000000000000000000000000000000000000000000000000001
\002::char(1)::varbyte -> 72057594037927938 -> 0000000100000000000000000000000000000000000000000000000000000010
Mmm. So, I think \000 writes a binary 0, but what I get in the sorting value is binary 32 - a space.
I understand that char terminates with a NULL, C-style, and so I suspect what’s happened here is that I’ve fooled Redshift into thinking the string is actually an empty string, and then Redshift has padded with spaces.
So we get one byte of a space, and the most significant byte (which seems to hold the number of bytes in the sorting value, when that number is 7 or less) holds a 1 (00000001).
Moving on to \001, now I really do get a binary 1, no space padding, and the count byte remains 1.
We see that \002 then does as we would expect.
So what happens with longer chars?
\001::char(2)::varbyte -> 144115188075864065 -> 0000001000000000000000000000000000000000000000000010000000000001
\001::char(3)::varbyte -> 216172782115889153 -> 0000001100000000000000000000000000000000001000000010000000000001
\001::char(4)::varbyte -> 288230376690688001 -> 0000010000000000000000000000000000100000001000000010000000000001
\001::char(8)::varbyte -> 2314885530818453505 -> 0010000000100000001000000010000000100000001000000010000000000001
Well now, isn’t this interesting. In all cases we’re writing binary 1. What we see is that the binary 1 is written (it’s the right-most byte in the displayed sorting value), the char is then padded with spaces (00100000) to its length, after its length it is padded with binary 0, and then - if we’re 7 bytes or less - we get a count byte.
That’s quite something.
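None of this is documented, so what follows is only a model fitted to the bit patterns above - a Python sketch, assuming the integer casts present their bytes in big-endian order, that the first varbyte byte lands in the least significant byte of the sorting value, and that a count byte occupies the most significant byte when there are seven bytes or fewer:

```python
def sorting_value(b: bytes) -> int:
    """Model of the observed zone map sorting value for a varbyte value.

    The first byte of the value lands in the least significant byte of the
    sorting value, later bytes in successively higher bytes, and - when there
    are seven bytes or fewer - a count byte (the number of bytes) occupies
    the most significant byte.
    """
    assert len(b) <= 8
    v = int.from_bytes(b, 'little')
    if len(b) <= 7:
        v |= len(b) << 56
    # The sorting value is a signed int8; reinterpret as two's complement.
    return v - (1 << 64) if v >= (1 << 63) else v

# Integer casts behave as if the varbyte holds the big-endian bytes.
assert sorting_value((1).to_bytes(2, 'big'))   == 144115188075856128
assert sorting_value((256).to_bytes(4, 'big')) == 288230376151777280
# An int8 fills all eight bytes, so the count byte goes away.
assert sorting_value((255).to_bytes(8, 'big')) == -72057594037927936
# A char is padded with spaces (0x20) to its length.
assert sorting_value(b'\x01' + b' ' * 7)       == 2314885530818453505
```

The asserted values are the sorting values observed above; the model reproduces all of them, but it is an inference, not AWS’s documented behaviour.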
So, where does this leave us?
This is not easy to reason about.
The varbyte type stores bytes, which can come from any source - we could be issuing queries which convert char, or int, or what-have-you, into varbyte, all going into the same column.
Firstly, we see that the data type of the original data matters - not just the value of the original data.
If we put the value 5 into a varbyte, its sorting value will differ depending on whether it is an int4 or an int8.
That’s not good. That’s hard to reason about.
Secondly, for values which are 7 bytes or less, we have this seemingly completely strange byte-count to reason about. What is that about?
Because it sits in the most significant byte, its effect is to sort short values such that values with the same number of bytes are grouped together - we get all the one-byte values, sorted with respect to themselves, then all the two-byte values, sorted with respect to themselves, and so on.
Why? Why would I want to sort my data first according to its length in bytes and then by its value, rather than only by its value?
Why would a four-byte decimal 10 be sorted as a larger value than a two-byte decimal 10?
I can’t see a reason for this, and I think sorting should be based on and only on value.
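To make the length-first ordering concrete, here is a small check using the packing apparent in the bit patterns above (a sketch; the helper is hypothetical, not a Redshift function):

```python
# Hypothetical packing, as observed: first byte in the least significant
# position, byte count in the most significant byte.
def model(b: bytes) -> int:
    return (len(b) << 56) | int.from_bytes(b, 'little')

# The count byte dominates the comparison, so length wins over value:
two_byte_10  = model((10).to_bytes(2, 'big'))
four_byte_10 = model((10).to_bytes(4, 'big'))
assert four_byte_10 > two_byte_10

# Worse: a two-byte 10 sorts above a one-byte 255.
assert model((10).to_bytes(2, 'big')) > model((255).to_bytes(1, 'big'))
```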
Thirdly, the padding behaviour of char shows up in sorting, so we have different sorting behaviour depending on whether we’re looking at bytes padded with a space (up to the end of the char’s length) or with binary 0 (up to the end of the sorting value) - so here again the data type, rather than the value, is driving sorting.
I may be wrong, but I think this is too hard to reason about, and so we must limit our use of varbyte to that which we can reason about.
I completely understand the third issue (though not that it is undocumented), because char is defined as being padded with spaces, so that really is its value - but sorting in general should have been arranged so it depends on and only on value. That’s the only way it can be reasoned about.
What this means is that if you actually want to knowingly control the sorting of your varbyte, you need to ensure all the data in the column comes from the same single data type. Anything else is likely rapidly going to become too complicated to reason about correctly.
So, I read a lot in the press about people sheltering from or during missile attacks, in their basements, or in shared shelters, in subways, or what-have-you.
I have never done this, not once, ever.
Now it might be the shelters are populated and people are using them, I wouldn’t know; I’ve never been in them.
But when the air raid warnings go off while I’m outside, absolutely nothing happens - everyone just gets on with their business, although I do typically go home, because all the public buildings close so I normally can’t do whatever I need to do, but that’s purely practical.
Kyiv is huge, the number of missiles relative to its size is tiny, and the area of any single explosion is relatively minute; no one is aiming at me, and I’m not near a target (which would lead to real risk, from near-misses).
Of course, when there is a missile attack, people do die and are maimed - Kyiv has three million people in it. If you fire missiles into it, you are going to kill people.
But the odds of it being me are so staggeringly minute that going into shelter is completely needless.
I’m of the view that people who do shelter are those who are risk averse, maybe also have kids who could be nervous, and also lack the knowledge to accurately judge the risks involved during these attacks.
So today was a typical missile raid.
I slept through the air raid alarm, was woken by the deep, rolling boom of a missile strike (I’d missed that one), listened for a bit to hear if there were any more (which would mean there might be enough going on that it was worth getting out of bed to have a look), but there weren’t, so I stayed in bed (I was warm, comfy and sleepy), and went back to sleep.
The power hasn’t been interrupted, I have a meeting in an hour or so, and then I’ll be going swimming as usual.
But we must also note that, much as this was uneventful for me, Putin has taken goodness knows how much money from ordinary people in Russia, used it to make missiles, fired those missiles into Kyiv, and about a dozen completely ordinary people - people who are you and me, the people who did the work which made the wealth which Putin took to build missiles - have been killed or maimed.
For any given entity, there is how complex it is, and to what extent the documentation can explain that complexity.
I am coming to the view that IAM is too complex to use, in part because of its intrinsic complexity, and in part because of the confused and confusing documentation.
Whenever I find I must turn to IAM for functionality, I then spend days blundering around in dense fog, getting nowhere. It is the death of productivity.
I am currently trying to figure out how to share an AMI, which is in one region only, with another account, to another region.
It’s been two days.
Last time I was involved with something like this was VPC.
It took me five days of sheer hell to figure out how to start a Redshift cluster in a VPC.
I am now starting to wonder whether there are other ways to share an AMI, which do not involve IAM, because anything is more productive than using IAM.
So, interesting news.
A few days ago, a new Redshift release came out in four regions, 1.0.47470.
This version has been rolled back, and these four regions are now back on 1.0.47357.
This is very unusual. I think I’ve seen it once before, in over a year.
Note this is fine if you’re on maintenance track - it’s there to insulate against such issues - but remember AWS removed tracks from serverless; you get whatever they give you.
I archive the system tables for each release, and I recently introduced a diff between a release and the previous release (currently for tables and views - not functions yet). The diff is currently based on changes to columns, either adding, removing or changing, and so shows new objects, and changed objects, but not yet deleted objects.
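In outline, the diff is nothing more than set operations over the column definitions. A sketch in Python, with entirely hypothetical object and column data (and note, as in the MVP, dropped objects are not reported):

```python
# Hypothetical catalogs: object name -> {column name: (type, nullable)}.
old = {
    "pg_table_def":  {"column": ("name", False), "type": ("text", False)},
    "stv_blocklist": {"tbl": ("integer", False)},
}
new = {
    "pg_table_def":  {"column": ("name", False), "type": ("text", True)},
    "stv_blocklist": {"tbl": ("integer", False)},
    "svv_copy_job":  {"job_id": ("bigint", False)},
}

# New objects: present only in the newer release.
new_objects = sorted(set(new) - set(old))
# Changed objects: present in both, with columns added, removed or changed.
changed_objects = sorted(name for name in set(old) & set(new)
                         if old[name] != new[name])
# Dropped objects (set(old) - set(new)) are, as in the MVP, not reported.

print(new_objects)      # ['svv_copy_job']
print(changed_objects)  # ['pg_table_def']
```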
For the last couple of releases, there have been no system table changes but now, finally, there are!
https://www.amazonredshift-observatory.ch/system_table_tracker/1.0.47357_to_1.0.47470/index.html
And from this we can get an idea of what work has been going in since the previous release, which in this particular case, perhaps may shed some light on why the rollback occurred.
However, right now, being MVP, it’s hard work to use. It needs to show which objects are new, and the number of changed columns, and the system table tracker in general needs to show whether an entity is accessible or not.
There are about one hundred changed objects, which is too many to inspect by hand.
I’ve had a look round, manually, and what I noticed is that the new version, which has been rolled back, brought in what I think is the beta functionality to automatically copy files from S3 into Redshift.
We see this new table;
Which is now gone.
This is a SYS view, and the SQL of SYS views is censored by AWS, so we can’t know what it does, but I note there is no matching table - I suspect this view may be a veneer over a function. Once I add diffs over functions, we’ll be able to see.
There are a few interesting new views (which hopefully are accessible);
We can also note details like SVV_TABLE_INFO having a new column (create_time) added;
I think there’s a fair chance the docs will not be updated to include this new column, but we’ll see. I’ll keep an eye out for it.
It has come to my attention that AWS have a trademark on “Amazon Redshift”, so I’ve moved the site from “Amazon Redshift Research Project” to “Redshift Research Project”.
A little late this fortnight, as I’ve been travelling.
ap-northeast-2 dc2.large: networking remains slow, at 7s rather than 4s.

ca-central-1 dc2.large: disk-read-write unusually fast, at 3.29s (normally 4s to 5.5s).

eu-south-1 dc2.large: disk-read-write unusually fast, at 3.18s (normally 4.5s).

eu-west-1 dc2.large: disk read staggeringly slow, at 0.64s (normally 0.06s) - and the standard deviation here is 0.04s, so this was not the result of a freak test result, although it could be the result of a freak cluster.

us-east-1 dc2.large: processor benchmark faster than it has ever been, at 2.90s. Processor benchmarks are normally extremely consistent, so changes are meaningful. The previous benchmark was 3.35s, before that 3.81s, and it had been at 3.8s for a long time.
There are quite a few new AWS regions; I need to add in those which support Redshift.
Hopefully for the next benchmark run.
https://www.redshift-observatory.ch/cross_region_benchmarks/index.html
Surprise for me just now - I issued an explain, and it’s turned up in the query history page. I’m pretty sure they did not show up before.
So I’ve just spent half an hour investigating something which has been in the back of my mind for quite a while now.
The official docs do not actually define what is required for a merge join to occur, but one requirement is specified (although now I look for it, I cannot find it - all I can find is an AWS blog post which mentions it): the tables involved must be 80% or more sorted.
This is an interesting statement - what does 80% sorted actually mean?
Remember that a table is distributed over slices, each slice holding a part (ideally equal in number of rows) of the table. Does 80% sorted mean every single slice must be over 80% sorted? Or does it mean all slices, taken together, are 80% or more sorted? Does it mean blocks, or does it mean rows?
No one knows.
So I’ve done a little digging.
It looks like 80% sorted means by rows and for the table as a whole, so for all slices.
In fact, it looks like Redshift is using the unsorted column in SVV_TABLE_INFO, which to me is just mind-blowing. That column counts the total number of unsorted rows and the total number of rows, and uses those two numbers to say how sorted the table is.
I consider this flatly wrong, because the speed of a query is that of the least sorted slice. A table, to my eye, is as sorted as its least sorted slice, and sorting must be considered on the basis of blocks (disk I/O), not rows.
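The gap between the two metrics is easy to show. A sketch in Python, with a hypothetical four-slice table where all the unsorted rows sit on one slice:

```python
# Hypothetical per-slice (sorted_rows, unsorted_rows) for a four-slice table,
# with all the unsorted rows concentrated on one slice.
slices = [(10, 0), (10, 0), (10, 0), (10, 3)]

total_rows     = sum(s + u for s, u in slices)
total_unsorted = sum(u for s, u in slices)

# Table-level metric (what the unsorted column in svv_table_info reports).
table_unsorted_pct = 100 * total_unsorted / total_rows

# Worst-slice metric (the slice which actually bounds query speed).
worst_slice_pct = max(100 * u / (s + u) for s, u in slices)

print(f"{table_unsorted_pct:.1f}%")  # 7.0% - the table looks well sorted
print(f"{worst_slice_pct:.1f}%")     # 23.1% - but one slice is well past 20%
```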
So I made two tables, both with a single int8 column, key distribution, compound sorting on that one column.
I put ten rows in both, vacuum full to 100 percent, and I get a merge join.
If I add enough extra rows (one by one) to a single slice that unsorted in svv_table_info hits 20%, I get a hash join.
If I truncate both tables, populate again to 10 rows, and now add an extra row to every slice, so four unsorted rows at a time (2x dc2.large), then when I get to my third insert, and so have 12 unsorted rows, I get to a bit more than 20% unsorted and the hash join returns.
I think Redshift is using the wrong metric to determine whether or not to merge join.
This leads to another question.
If I have a slice which is, say, 50% sorted, while the other slices are 100% sorted, and I’m getting a merge join, what’s actually going on under the hood?