The next white paper is going to be “Query Compilation”.
I’m currently running the final test runs of the test script.
I’ve completed this for a Redshift with compile offload (takes less than an hour), and I’m now running it on a Redshift without compile offload, and I’m wondering if it will finish even overnight, or if the cluster will crash (I handle out-of-memory errors, but not cluster crashes =-)
Once this has been done, I then need to check if a bug/flaw I knew of which allowed you to place a permanent load on the leader node still exists, and if so, I need to run both sets of tests twice, once with and once without load on the leader node.
I also need to run the test script on each node type (dc, ds, ra3).
So three node types, and if the “without” tests run overnight with an idle leader node… well, it may be a week until I’m done running the test script!
I just ran a test script for 21 hours on Redshift and the bloody cluster disconnected me during the final test!
After a lot of work and a lot of preparation - because there’s only one bite at this cherry per Redshift release, due to the query compilation cache - tomorrow I will make the actual, real runs (two regions) of the query compilation test script.
I think then I should have time (I’m now finally working part time, two days a week) during the day to finish and publish the white paper; if not, it will certainly be the day after (barring anything unexpected).
I am working on the query compilation white paper.
Because of the query compilation cache, I can only run the test script once - the second time the results are wrong because of the cache.
This morning, after a lot of preparation work, I kicked the test off in eu-west-1, which takes about three hours.
After the test was on its way, I was working on my laptop and found an SWF file. Right-clicking offered VLC, so I thought, let’s see what happens.
What happens is that VLC fucks your laptop.
The desktop froze (I could move the mouse, but nothing responded), and I was unable to recover, even after killing VLC and XFCE, and had to reboot. This broke the test, which I cannot now re-run, because to the extent it did run, it has partially populated the cache.
Right now, I do not think much of VLC.
I spent today trying to run the test script, in particular in a region without compile off-load. There are few regions which have ra3 and do not have compile off-load - about five, as I recall.
One way or another, I managed to burn through all of them with both the current and trailing versions of Redshift, and obtained only one pure set of results, for Hong Kong trailing. The performance in that region was so slow that, together with the finding that the query offload (in regions which offer it) actually operates differently in different regions, I am now facing the possibility of significant differences between regions - so much so that it would be misleading to test only one region of each kind. In fact, I need to figure out what’s going on here.
As it is, however, I now have to wait for the next Redshift release in a region with ra3 and without offload, to be able to obtain more pure result sets from non-offload regions (getting results from offload regions is easy; there’s lots of them).
As an aside, relating to this, it’s become clear that the query compilation cache is per-region (as well as per-Redshift version). I don’t know yet if it’s also per-node type. The wonderful claim from AWS that the cache is now “unlimited” looks more and more limited every day.
So I now need to write a script which connects to every region and fires up a cluster of every type and notes the version number. I think I’ll make a web-page out of it, and run it daily.
New toy on the site.
Too late to write anything now, except to note : in and of itself, the RVT (Redshift Version Tracker) is fun but not very useful (except that it shows us when Redshift releases go wrong, because an out-of-band release rapidly follows a normal release, and also in that it records cluster bring-up time, and it’ll be interesting to see how that varies by node type (I can tell you now ds2 nodes are sloooow) but far more interesting to see how it varies by region), but what’s much more important is that I have completed a reliable framework to concurrently create clusters of each node type in every region, issue SQL to them, and gather the results.
I’ve been planning to create a benchmark suite for a long time, although it’s quite hard to do because it’s difficult to isolate the different computational resources involved in a query, but anyway - issuing the same benchmark across all node types (well, the smaller types anyway - I’m not made of money) and across regions will be interesting, especially as the work on query compilation has already produced evidence of significant Redshift performance differences depending on region.
So, I’ve not blogged for a while, because I’ve been completely busy writing code.
I was working on the query compilation white paper, but I ran out of regions I could use for testing, until new Redshift releases were made, and so I needed to write something which would let me know when this happened.
I also needed (and had already begun) to write a decent version of the framework used by the test code, for spinning up clusters, connecting to them, and so on (the original code was exploratory, to figure out how to handle all the AWS VPC stuff).
Consequently, I completed this code and extended it to spin up clusters concurrently in arbitrary regions, and so then had a nice framework for the Redshift Version Tracker: spin up a cluster in (almost) every region, get the version, store to Postgres, produce a web-page.
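By way of illustration only - this is not the real framework, and the cluster identifier, node type and credentials below are placeholders; I also read the version from describe_clusters here rather than by issuing SQL - the core of the idea, in boto3, looks something like this:

# Sketch only : not the real framework. The cluster identifier, node type and
# credentials are placeholders; the version is read from describe_clusters
# rather than by issuing SQL to the cluster.

from concurrent.futures import ThreadPoolExecutor

import boto3

regions = boto3.session.Session().get_available_regions('redshift')

def note_version(region):
    client = boto3.client('redshift', region_name=region)
    client.create_cluster(ClusterIdentifier='version-tracker',
                          ClusterType='single-node',
                          NodeType='dc2.large',
                          MasterUsername='admin',
                          MasterUserPassword='PlaceholderPassword1')
    try:
        client.get_waiter('cluster_available').wait(ClusterIdentifier='version-tracker')
        cluster = client.describe_clusters(ClusterIdentifier='version-tracker')['Clusters'][0]
        return region, cluster['ClusterVersion'], cluster['ClusterRevisionNumber']
    finally:
        client.delete_cluster(ClusterIdentifier='version-tracker',
                              SkipFinalClusterSnapshot=True)

# one thread per region; in practice the results go to Postgres, not stdout
with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    for region, version, revision in pool.map(note_version, regions):
        print(region, version, revision)

The real work, of course, is everything around that - the VPC setup, and making it all reliable.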
(I initially had this running on the 10th, just as a new release was coming out, but I improved and extended the data being stored, such that the initial two days of results lacked data which would be available for later days, so I ditched those days; and so now, from the 12th onward, we’re in an inter-release period, I think, when nothing happens. A shame! It was already very interesting to see what happens, and we can already see that it looks like there are normally two versions of Redshift released.)
But there’s more - I’ve had in mind for some time to make a benchmark suite, to test disk, network and processor. I’ve also already had evidence, from the query compilation work, of both quantitative and qualitative cross-region differences in Redshift performance.
Why not make a cross-region benchmark suite? and run it, say, weekly?
That’s what I’ve been working on.
The framework/database for it is about done - the problem now is actually defining the benchmarks.
It’s really quite challenging to isolate disk and network (processor I can isolate, I think, and so I have a benchmark for that). For example, my initial network test was to run a query which first ran locally on all slices, and then ran again, but now with a distribute step. Problem is, all network steps write to a temp table, and that temp table is then read; you always get disk as well.
I’m now about to check to see if a simultaneous equation can figure out how much of the duration of the benchmark is disk and how much is network.
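To give an idea of what I mean by a simultaneous equation (the numbers here are invented, purely to show the arithmetic, and the sketch uses numpy): if two benchmark variants perform known multiples of disk work and network work, their two measured durations give two equations in two unknowns, which a linear solve separates.

# Sketch only : the coefficients and timings are invented, purely to show the idea.

import numpy as np

# say benchmark A does 2 units of disk work and 1 unit of network work,
# and benchmark B does 2 units of disk work and 3 units of network work
# (e.g. the same query run once locally and once with a distribute step)
coefficients = np.array([[2.0, 1.0],
                         [2.0, 3.0]])

# measured durations of the two benchmarks, in seconds (invented values)
durations = np.array([5.0, 9.0])

# solve for the per-unit cost of disk and of network
disk_cost, network_cost = np.linalg.solve(coefficients, durations)

print(f"per-unit disk cost    : {disk_cost:.2f}s")
print(f"per-unit network cost : {network_cost:.2f}s")

Whether the coefficients can actually be known well enough on a real cluster is, of course, the whole problem.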
So, current plan : get cross-region benchmarks up, finish the query compilation paper, finish the AZ64 paper, then start working on the AMI of Redshift tools. Once the AMI and the initial tool are up, I’m thinking of implementing Permifrost; I have long had the necessary queries to do so.
I’m putting this out because I realised there’s a flaw in my table design, which I’m now fixing. This means I need to ditch all existing data, because the code previously did not collect enough information to fill the new tables, so I can’t convert the existing data to the new table design.
It’ll take another ten days or so to get a reasonable data set, so here’s what I have so far; this being the accumulated data only for bring-up times.
node_type_name | region_name | num_records | bring_up_mean | mean_sd | bring_up_min | bring_up_max
----------------+----------------+-------------+---------------+----------+--------------+--------------
ds2.xlarge | me-south-1 | 8 | 00:14:24 | 00:09:47 | 00:04:05 | 00:35:54
dc2.large | ap-southeast-3 | 8 | 00:06:45 | 00:05:17 | 00:02:05 | 00:13:39
dc2.large | ap-southeast-1 | 8 | 00:06:17 | 00:03:14 | 00:01:31 | 00:13:44
dc2.large | us-east-1 | 9 | 00:05:35 | 00:09:08 | 00:01:25 | 00:30:56
ra3.xlplus | us-east-1 | 9 | 00:05:32 | 00:08:32 | 00:01:15 | 00:27:43
ds2.xlarge | us-east-1 | 9 | 00:04:54 | 00:06:45 | 00:01:25 | 00:22:43
ra3.xlplus | ap-south-1 | 8 | 00:04:30 | 00:01:59 | 00:00:59 | 00:06:37
ra3.xlplus | ca-central-1 | 8 | 00:04:22 | 00:01:48 | 00:01:06 | 00:05:42
ra3.xlplus | ap-east-1 | 9 | 00:04:22 | 00:01:54 | 00:01:17 | 00:06:54
ra3.xlplus | eu-west-3 | 8 | 00:04:18 | 00:01:54 | 00:00:57 | 00:05:39
dc2.large | me-south-1 | 8 | 00:04:16 | 00:00:51 | 00:02:18 | 00:05:26
dc2.large | us-east-2 | 8 | 00:03:59 | 00:02:21 | 00:01:25 | 00:06:48
dc2.large | eu-south-1 | 10 | 00:03:54 | 00:05:20 | 00:01:48 | 00:19:54
dc2.large | ap-south-1 | 8 | 00:03:49 | 00:05:56 | 00:01:21 | 00:19:31
dc2.large | ap-east-1 | 9 | 00:03:26 | 00:01:53 | 00:01:42 | 00:06:25
ra3.xlplus | ap-southeast-1 | 8 | 00:03:25 | 00:02:12 | 00:01:06 | 00:05:59
ra3.xlplus | ap-southeast-2 | 8 | 00:03:22 | 00:02:06 | 00:01:03 | 00:06:00
ds2.xlarge | ap-east-1 | 9 | 00:02:39 | 00:03:12 | 00:01:16 | 00:11:42
ds2.xlarge | ap-northeast-3 | 8 | 00:02:33 | 00:01:05 | 00:01:44 | 00:04:35
dc2.large | ap-northeast-2 | 8 | 00:02:28 | 00:02:13 | 00:01:14 | 00:08:15
dc2.large | af-south-1 | 8 | 00:02:21 | 00:00:10 | 00:02:01 | 00:02:34
ds2.xlarge | us-east-2 | 8 | 00:01:56 | 00:01:46 | 00:01:03 | 00:06:37
ds2.xlarge | eu-south-1 | 10 | 00:01:53 | 00:00:32 | 00:01:15 | 00:03:00
dc2.large | ap-northeast-3 | 8 | 00:01:44 | 00:00:13 | 00:01:23 | 00:02:05
ds2.xlarge | af-south-1 | 8 | 00:01:44 | 00:00:20 | 00:01:20 | 00:02:21
dc2.large | eu-north-1 | 9 | 00:01:40 | 00:00:22 | 00:01:13 | 00:02:32
dc2.large | ap-northeast-1 | 9 | 00:01:38 | 00:00:18 | 00:01:22 | 00:02:22
dc2.large | ca-central-1 | 9 | 00:01:34 | 00:00:09 | 00:01:26 | 00:01:58
ds2.xlarge | ap-southeast-1 | 9 | 00:01:34 | 00:00:45 | 00:01:11 | 00:03:41
dc2.large | us-west-2 | 8 | 00:01:32 | 00:00:20 | 00:01:10 | 00:02:21
dc2.large | us-west-1 | 9 | 00:01:31 | 00:00:12 | 00:01:15 | 00:01:55
ds2.xlarge | us-west-1 | 8 | 00:01:30 | 00:00:30 | 00:01:05 | 00:02:49
dc2.large | eu-west-1 | 8 | 00:01:30 | 00:00:04 | 00:01:19 | 00:01:35
dc2.large | ap-southeast-2 | 9 | 00:01:29 | 00:00:11 | 00:01:11 | 00:01:52
ds2.xlarge | ap-northeast-1 | 8 | 00:01:29 | 00:00:18 | 00:01:05 | 00:02:04
dc2.large | eu-central-1 | 27 | 00:01:27 | 00:00:07 | 00:01:07 | 00:01:38
dc2.large | sa-east-1 | 8 | 00:01:24 | 00:00:08 | 00:01:16 | 00:01:42
dc2.large | eu-west-2 | 9 | 00:01:23 | 00:00:12 | 00:01:07 | 00:01:45
dc2.large | eu-west-3 | 8 | 00:01:23 | 00:00:06 | 00:01:14 | 00:01:35
ds2.xlarge | eu-north-1 | 9 | 00:01:23 | 00:00:13 | 00:01:04 | 00:01:54
ds2.xlarge | ap-southeast-2 | 9 | 00:01:23 | 00:00:16 | 00:01:06 | 00:01:47
ds2.xlarge | eu-west-1 | 8 | 00:01:22 | 00:00:08 | 00:01:13 | 00:01:35
ra3.xlplus | ap-northeast-1 | 9 | 00:01:21 | 00:00:14 | 00:01:05 | 00:01:53
ds2.xlarge | ap-northeast-2 | 9 | 00:01:21 | 00:00:13 | 00:01:09 | 00:01:43
ra3.xlplus | eu-west-1 | 9 | 00:01:20 | 00:00:12 | 00:01:03 | 00:01:39
ds2.xlarge | ap-south-1 | 9 | 00:01:20 | 00:00:09 | 00:01:10 | 00:01:39
ds2.xlarge | us-west-2 | 9 | 00:01:20 | 00:00:11 | 00:01:01 | 00:01:42
ds2.xlarge | sa-east-1 | 7 | 00:01:19 | 00:00:06 | 00:01:10 | 00:01:31
ra3.xlplus | us-east-2 | 9 | 00:01:19 | 00:00:16 | 00:00:58 | 00:01:51
ra3.xlplus | us-west-1 | 9 | 00:01:19 | 00:00:16 | 00:00:55 | 00:01:44
ra3.xlplus | us-west-2 | 9 | 00:01:19 | 00:00:12 | 00:01:05 | 00:01:43
ds2.xlarge | ca-central-1 | 9 | 00:01:16 | 00:00:08 | 00:01:05 | 00:01:32
ra3.xlplus | ap-northeast-2 | 8 | 00:01:15 | 00:00:13 | 00:00:58 | 00:01:38
ds2.xlarge | eu-west-3 | 8 | 00:01:15 | 00:00:08 | 00:01:04 | 00:01:29
ra3.xlplus | sa-east-1 | 9 | 00:01:14 | 00:00:08 | 00:01:00 | 00:01:30
ra3.xlplus | eu-north-1 | 9 | 00:01:13 | 00:00:07 | 00:01:04 | 00:01:26
ds2.xlarge | eu-central-1 | 10 | 00:01:12 | 00:00:08 | 00:01:02 | 00:01:27
ds2.xlarge | eu-west-2 | 9 | 00:01:12 | 00:00:07 | 00:01:04 | 00:01:29
ra3.xlplus | eu-west-2 | 9 | 00:01:11 | 00:00:06 | 00:00:57 | 00:01:19
ra3.xlplus | eu-central-1 | 11 | 00:01:09 | 00:00:06 | 00:00:58 | 00:01:24
(60 rows)
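(For what it’s worth, a summary like this is straightforward to produce in Postgres. The sketch below assumes a hypothetical bring_up_times table with one row per bring-up; the real table design isn’t shown here, and the connection details are placeholders.)

# Sketch only : assumes a hypothetical Postgres table bring_up_times with
# columns (node_type_name text, region_name text, bring_up_duration interval).
# Connection details are placeholders.

import psycopg2

sql = """
select node_type_name,
       region_name,
       count(*)               as num_records,
       avg(bring_up_duration) as bring_up_mean,
       stddev(extract(epoch from bring_up_duration)) * interval '1 second' as mean_sd,
       min(bring_up_duration) as bring_up_min,
       max(bring_up_duration) as bring_up_max
  from bring_up_times
 group by node_type_name, region_name
 order by bring_up_mean desc;
"""

connection = psycopg2.connect(host='localhost', dbname='tracker', user='postgres', password='placeholder')
cursor = connection.cursor()
cursor.execute(sql)
for row in cursor.fetchall():
    print(row)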
So I’m working on the cross-region benchmarking white paper.
I fire up three two-node clusters, one per node type, in twenty-two regions (more or less - a few regions do not support all node types).
The code I currently have is structured to bring all these clusters up, benchmark them, and then once that’s all done, write the results to Postgres.
Problem is, I am, as you might expect, running into reliability issues, and so the system for producing benchmarks simply isn’t reliable enough. I’ve had one successful full run (and each full run costs about 15 USD).
Obviously, unexpected disconnections can occur - but that’s an obvious, network-induced problem, and is handled by automatic re-connection.
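Re-connection itself is the easy part; broadly along these lines (a sketch only, not the actual code - the connection details and retry policy are placeholders):

# Sketch only : re-running a query after an unexpected disconnection.
# Connection details and retry policy are placeholders, not the real code.

import time

import psycopg2

def connect():
    return psycopg2.connect(host='example-cluster.abc123.eu-west-1.redshift.amazonaws.com',
                            port=5439, dbname='dev', user='admin', password='placeholder')

def execute_with_reconnect(sql, max_attempts=3):
    connection = connect()
    for attempt in range(max_attempts):
        try:
            cursor = connection.cursor()
            cursor.execute(sql)
            return cursor.fetchall()
        except (psycopg2.OperationalError, psycopg2.InterfaceError):
            # dropped connection; give up on the last attempt, otherwise
            # back off, reconnect and try again
            if attempt + 1 == max_attempts:
                raise
            time.sleep(5)
            connection = connect()

rows = execute_with_reconnect('select 1;')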
The problem I’m looking at right now seems to be a failure in Redshift, where the “Workload Manager” (presumably a process providing WLM) failed.
Here’s what I find in STL_ERROR:
userid | process | recordtime | pid | errcode | context | error
--------+----------------------------------+----------------------------+------------+---------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | pulse | 2022-01-23 15:07:37.815633 | 9704 | 0 | CRITICAL_PROCESS_MISSING | Process WLM Process (9990) is not running.
0 | pulse | 2022-01-23 15:07:28.063039 | 9704 | 0 | CRITICAL_PROCESS_MISSING | Process WLM Process (9990) is not running.
0 | pulse | 2022-01-23 15:06:58.059784 | 9704 | 0 | CRITICAL_PROCESS_MISSING | Process WLM Process (9990) is not running.
1 | wlm | 2022-01-23 15:06:33.508236 | 1073897540 | 1038 | wlm | Workload Manager (pid 9990) failure
1 | wlm | 2022-01-23 15:06:33.508181 | 1073897540 | 1009 | | 'error: could not complete because of conflict with concurrent transaction code: 13000 context: PG ERROR query: 0 location: wlm_main.cpp:152 process: wlm [pid=9990] '
1 | padbmaster | 2022-01-23 15:06:33.507743 | 1073807462 | 9 | | could not complete because of conflict with concurrent transaction
(8 rows)
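(For reference, the columns above come straight from STL_ERROR; something along these lines pulls the recent rows - the connection details are placeholders.)

# Sketch only : recent rows from STL_ERROR, via psycopg2.
# Connection details are placeholders.

import psycopg2

sql = """
select userid, process, recordtime, pid, errcode, context, error
  from stl_error
 order by recordtime desc
 limit 20;
"""

connection = psycopg2.connect(host='example-cluster.abc123.eu-west-1.redshift.amazonaws.com',
                              port=5439, dbname='dev', user='admin', password='placeholder')
cursor = connection.cursor()
cursor.execute(sql)
for row in cursor.fetchall():
    print(row)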
And it looks like it fell over because, in its own internal workings, it issued queries on the system tables it uses and hit a serialization isolation failure (and indeed, at this point, I’m issuing multiple concurrent insert queries to the same table, where each query runs on a single slice per node).
Note that I am not myself getting a serialization isolation failure. Rather, from my point of view - my Python code - I’m issuing SQL and the thread doing so just stops.
My guess is there’s no disconnection, so the call to psycopg2 to issue SQL doesn’t fail - the thread just sits there.
It’s not obvious to me how I can deal directly with this problem, client side : I issue a query, and it never returns.
Probably to psycopg2 it just looks like a normal, long-running query.
I’m going to experiment now with the statement timeout functionality in psycopg2.
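(Strictly, statement_timeout is a server-side setting - in milliseconds, and Redshift supports it - which psycopg2 simply passes along. A minimal sketch, with placeholder connection details:)

# Sketch only : applying a statement timeout to a connection.
# statement_timeout is a server-side setting (milliseconds), issued here with
# SET after connecting. Connection details are placeholders.

import psycopg2

connection = psycopg2.connect(host='example-cluster.abc123.eu-west-1.redshift.amazonaws.com',
                              port=5439, dbname='dev', user='admin', password='placeholder')
cursor = connection.cursor()

# abort any statement which runs for more than five minutes
cursor.execute('set statement_timeout to 300000;')

try:
    cursor.execute('select count(*) from stl_query;')   # stand-in for a benchmark query
    print(cursor.fetchall())
except psycopg2.Error as error:
    # on timeout the server cancels the statement and the driver raises an error
    # (QueryCanceled / SQLSTATE 57014 on stock Postgres; I've not yet verified
    # exactly what Redshift returns)
    print('query cancelled or failed :', error)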
In other news, in the current benchmark run, I had a dc2.large cluster which took 73 minutes to come up. It looks like its benchmark performance was really off-the-scale bad, too.
I have noted for a long time a strong correlation between bring-up time and the performance of a cluster when you’re using it.