I’m still investigating AZ64 encoding.
With content I’ve fully investigated in the past, I can produce a white paper in a timely manner each week - it’s just a matter of writing the script and making the write up - but with new investigations, they take as long as they take.
AZ64 has a lot going on. Having come to know how it works quite well, I’m of the view it’s actually part of the ongoing effort to automate Redshift.
So, not formally investigated in any way, but I’ve been populating tables with self joins and a limit clause which I compute, to generate lots of rows, and to populate an exact number of rows, and it looks like at least for self-joins (and so probably for normal joins) the number of rows produced by the join directly and proportionately affects the time taken by the query, even though there is a limit clause.
So even though you might be saying “limit 1000”, that the self-join is producing say a billion rows, that’s still taking as long as it would take.
I’m still working on AZ64.
In fact I’ve done nothing for the last two weeks except trying to bend my brain around what AZ64 is getting up to internally.
There seems to be quite a bit of seemingly strange and arbitrary behaviour. I’m at the point where I can explain most of it, in the sense that I can do the math and produce the same numbers, but what’s going on doesn’t entirely in and of itself make sense.
It’s like Feynman on Physics - I can describe the spin of an electron, but I can’t tell you why an election has spin.
I about an hour ago figured out another one of the internal mechanisms.
I think I have one more to go.
I’m not sure I’ll be able to publish tomorrow (Sunday) though, because the test suite I think will take about six hours to run, and I need a clean run through before I can publish - the debugging run throughs absorb plenty of time in the first place, and then you need the time for a full, passing test run.
New Redshift version, 1.0.32946, is out. System table dump here;
https://amazonredshift-observatory.ch/system_table_explorer/index.html
Just to say I’ve not died and gone to the great database in the sky :-)
I was working on AZ64, and I’ve figured out a major part of it, but there’s one more major part to figure out, and I didn’t think I’d publish this weekend, so I decided to finish the MVCC white paper.
Unfortuntely, I had to handle some other matters which took up all of Saturday, and so I’ve spent today taking my existing MVCC content and bringing up to standard, and writing the script to go with the white paper, but I need another day to get to the point of being ready to publish.
I am looking to publish the MVCC paper tomorrow.
It could perfectly well do with another week of full-time work, but I don’t have that kind of time right now, and the core content, the explanation of MVCC, is in good shape. I’d like to do a bunch more work in the test script, and actually explore a bit one or two things I’m not clear about with regard to how Redshift behaves with table locks, but I can’t justify missing another weekend for that.
Separately, a new Redshift version is out. I’ll publish the system tables tomorrow (in the middle of working on some stuff for the MVCC test script, right now).
Well, I came close to finishing the MVCC paper on Sunday, but…
So I’m still working on it, and during the weekdays, that means only an hour or two most days.
Changing subject, something occurred to me; so, I make full system table dumps every time there’s a new Redshift release.
Might be interesting to produce a list of the diffs between each successive version.
The MVCC paper is close to being finished - I’m finishing off, with no new work to undertake.
I would have had it done over the weekend, but I had to attend to other matters on Saturday and most of Sunday.
I’m currently tidying up / finishing off the accompanying test
script, and in the process I’ve learned something new, and awkward; when
you issue a query which requests a lock and blocks because of it, that
query is considered running (which it is, even though it’s blocked), but
only on the leader node, so you can only see it in the incorrectly named
STV_RECENTS
- which has no column for transaction ID.
However, if you look in the system table for locks, you can see its
requested lock, and you can see a transaction ID - but I can’t see any
way to figure out which query that lock request is for, because the only
ID available in STV_RECENTS
is a process ID, but it’s a
leader node process ID, completely different to a worker node
process ID (which is what’s given in STV_LOCKS
).
The docs should probably mention somewhere that leader node and worker node process IDs are totally different, but we ought to have world peace, too, and they’re both about as likely.
In any event, it looks then like all locks are collected on the leader node, before a query is passed over the worker nodes and enters WLM.
In other news, a new Redshift version is out, and I shall produce the system table dump today.
Home 3D Друк Blog Bring-Up Times Cross-Region Benchmarks Email Forums Mailing Lists Redshift Price Tracker Redshift Version Tracker Replacement System Tables Reserved Instances Marketplace System Table Tracker The Known Universe White Papers