SYS_TRANSACTION_HISTORY
As regular readers (both of you =-) will know, I’m a big fan of the system tables.
As irregular readers will not know, that was irony.
As a part of this, I absolutely avoid using all of the new SYS_ views, because AWS have blocked users from seeing the source code of those views, I think as part of their effort to hide that Serverless is not serverless.
As such, I have no idea what the SYS views are doing, and there are so many issues in the system tables that you MUST be able to see the source of the views, to figure out the problems, so you can work around the problems, so you can get at least something out of them.
Without that, you’re going to get screwed over, because the information you’re getting is not going to be what you think you’re getting.
With a current client, someone was using SYS_TRANSACTION_HISTORY, and coming up with very different numbers/stats to me for commit behaviour.
So today I had a bit of a look at it - and what I've found is that the column commit_start_time is in fact the start of the commit queuing, not the start of the commit work.
Commit queue times on our system vary from zero (rare) to two minutes - which completely destroys any analysis which thinks it's looking at the time taken for actual work.
The docs say nothing useful (they say this - I quote - "The start time of the commit", which is in fact wrong), and I believe the client was using these numbers under the mistaken impression they were measuring actual work time only. I mean, that's what you would expect, because having only the start of queuing and the end of work is absolutely bloody useless, and no one with any sense (for which I think we could well read real-world experience) would provide that.
I think SYS_TRANSACTION_HISTORY is actually pulling in data from STL_COMMIT_STATS, which has separate columns for start of queuing and start of work, and if we could view the source, we'd be able to see that and know what was wrong. As it is we can't, so the SYS views simply have to be avoided. They might as well not exist for all the good they do, except they do exist, and people are using them and being harmed by them.
AWS, with regard to Redshift, have a culture of secrecy, and this is absolutely and categorically harmful to Redshift users.
I use Pandoc for all my doc work - I write markdown, and convert.
Right now Pandoc is being a complete fail.
I have a text file, an email, I want to convert to PDF.
I’ve quoted with ">", and added backticks where I want to add content into the document.
What do I get?
The quotes disappear and there’s no way to control the font size (I mean there is, it just doesn’t do anything).
So I’m converting the email to HTML, setting the style for <p> to give me pages when I need it, and using wkhtmltopdf, because it’s easier to do this than the work to figure out how to solve these incredible problems with Pandoc.
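For what it's worth, the workaround pipeline is short. A sketch, assuming the email is in email.txt and the <p> styling is hand-edited into the intermediate HTML (filenames are made up):

```shell
# Convert the email to standalone HTML with Pandoc, then render the PDF
# with wkhtmltopdf instead of Pandoc's own PDF path.
pandoc email.txt -f markdown -t html -s -o email.html
# ...hand-edit the <p> style in email.html for page breaks, then:
wkhtmltopdf email.html email.pdf
```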
Rumour has it that STL tables (views after 2019) are not tables as such, but are table-like interfaces onto log files.
I’ve just done a bit of ad hoc checking; they do not appear to be column store (processing two columns takes as long as one column), which is expected, but unexpectedly, they do not appear to take part in transactions.
I ran a query on STL_QUERY, on a busy system. The first part is a CTE which produces one row per row in the table. Then I have two more CTEs, both of which use the first.
In both CTEs, I count the number of rows. The number of rows should be identical - the second and third CTE are both using, and using only, the first CTE as the one and only table in their from clause.
However - I get different counts. The AWS metrics queue differs by 9, and one of the user-defined queues, which is busy, differs by 204 (the total query time was some minutes, so it fits just fine).
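The shape of the query was like this (a reconstruction, not the exact original - the CTE names are mine):

```sql
-- One base CTE over stl_query, then two CTEs which each count its rows.
-- On a busy system the two counts come back different, which only makes
-- sense if each reference to the base CTE re-reads the still-growing log.
with base as (
    select query, starttime
    from   stl_query
),
c1 as ( select count(*) as n from base ),
c2 as ( select count(*) as n from base )
select c1.n as first_count,
       c2.n as second_count
from   c1 cross join c2;
```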
So it looks like to read STL tables correctly, we need to add a limiting clause, something like starttime < current_timestamp, to ensure we get the same data throughout a query.
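A sketch of what that limiting clause looks like in practice. One caveat: if CTEs really are textual substitution, current_timestamp may end up being evaluated separately at each reference, so a literal captured just before running the query is the safer form:

```sql
-- Pin reads of stl_query to a single cutoff, so the data is stable for
-- the duration of the query. Safer still is a pre-captured literal
-- timestamp in place of current_timestamp.
select count(*)
from   stl_query
where  starttime < current_timestamp;
```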
This, assuming I have it right, absolutely and categorically needed to be documented by AWS. It’s incredible that it is not.
(I can do a stronger check of this - emit current_timestamp from the query, then look to see how many queries turned up after that time, and see if it matches up - but that’s for when I have a bit of spare time.)
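That stronger check could look something like this (a sketch; the second statement runs afterwards, with the captured timestamp pasted in by hand):

```sql
-- Step 1: count the rows and capture the time the statement ran.
select getdate() as ran_at, count(*) as n from stl_query;
-- Step 2 (run afterwards, substituting the captured ran_at value):
-- select count(*) from stl_query where starttime > 'ran_at-value-here';
-- If this second count is about the size of the CTE mismatch, the
-- "rows arriving mid-query" explanation holds.
```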
What also surprises me here is that the first CTE isn’t being materialized. Maybe it’s because the underlying table is an STL. It looks to me like all that might be happening is textual substitution within the query.
This is in the RS release notes;
https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html#cluster-version-183
Amazon Redshift patch 183 Improves performance of INSERT/COPY statements for Serverless or provisioned data warehouses elastically resized by 2x or higher in size.
(Because Serverless is an elastically resized normal cluster.)
SYS_QUERY_HISTORY
prod=# select count(*) from stl_wlm_query;
count
---------
7327667
(1 row)
prod=# select count(*) from sys_query_history;
count
----------
14815791
(1 row)
prod=# select count(*) from stl_wlm_query where query in ( select query_id from sys_query_history );
count
-------
0
(1 row)
query_id in sys_query_history looks like it is NOT the query ID everywhere else.
Incredibly, this is not documented.
I’m absolutely speechless.
If this is really true, the people doing this are raving lunatics, and the people responsible for docs need to be put in a chair and have kids’ TV slimey goo dumped on them until they quit AWS.
Also worth reminding here that AWS specifically block you from viewing the SQL of the SYS views, so we now can’t figure out what’s going on.
SYS views
An AWS chap pointed me at this;
https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html
Which has in it this;
Improving query identifier tracking using the SYS monitoring views
And this tells us the query identifiers in STL views differ to those in SYS views. There is documentation about this.
What I would say though is that if you have something called “query ID” in one view, and something else called “query ID” in another view, and nothing on the page of either view telling readers the two things with the same name are not the same, then no one is going to think they are two different things with the same name, and so no one will look in the docs for an explanation of this.
Also, as far as I can see (the docs to my eye specifically avoid saying this, but it looks to be so), there remains no way to map STL query ID to SYS query ID, so the new system tables are wholly segregated from all the existing system tables with regard to queries (transaction ID is the same in both, and maybe you can try to kludge it that way, but that’s not a solution, that’s a slow and awkward workaround).
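For completeness, a sketch of that transaction-ID kludge. Transaction IDs do match across the old and new system tables (xid in STL, transaction_id in SYS), so a SYS query can at least be narrowed down to the set of STL queries in the same transaction - which is not a mapping, since one transaction can hold many queries:

```sql
-- Narrow a SYS query down to the STL queries sharing its transaction.
-- This is a workaround, not a query-ID mapping.
select sqh.query_id as sys_query_id,
       sq.query     as stl_query_id,
       sqh.transaction_id
from   sys_query_history sqh
       join stl_query sq on sq.xid = sqh.transaction_id
limit  10;
```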
I have to say, the looseness of the writing in the docs drives me crazy.
In SYS views, the query_id column records user-submitted queries in their original form.
First, the query_id column records the query ID of queries. It does not record the query. Trivial issue? Here, yes. Other places, NO. Other places you find out what’s written is just plain wrong, and you could only come to know that by investigating what actually goes on, so you can then understand how the author wrote loosely.
Second, what is “original form”? It’s not explained. We then get an explanation that a query can be broken down into child queries, and if you want to see those queries, you use another table. We are left to infer “original form” means before being broken down. That’s not how you write technical documentation.
Third, it’s glossed over, but I suspect (I’ve not checked) where the text says “user-submitted” it means, for example, all the queries issued by user rdsdb are being censored. All the queries used to run the console, health checks, and so on. They actually perform a fair bit of work. You should understand what you’re not seeing.
I stay away from the docs. Firstly, my experience has been I cannot find what I’m looking for and secondly, when I do read the docs, I don’t trust them, and the writing often drives me crazy.
I’ve been working on a product, replacement system tables.
One of the views gives information about nodes.
Today we had the weekly cluster reboot.
The reboot ran, and I had a look at the nodes, and saw this;
node | ps | fs | ts | disks | disk_size | total_disk_size | disk_used | rms_used | total_used
------+----+----+----+-------+-----------+-----------------+-----------+----------+------------
0 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12401559 | -1 | 12401558
1 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12492962 | -2 | 12492960
2 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12452633 | -2 | 12452631
3 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12464522 | -2 | 12464520
4 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12238519 | -2 | 12238517
5 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12546885 | -2 | 12546883
6 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 2213868 | 10102487 | 12316355
7 | 0 | 16 | 16 | 4 | 6772561 | 27090244 | 12061677 | -2 | 12061675
(8 rows)
rms_used is the number of blocks in RMS (S3 or S3-like). Where you see -1 or -2, really this means 0.
disk_used is the number of blocks on local disk (blocks being counted from stv_blocklist, one row per block).
ps/fs/ts is partial slices, full slices, total slices (compute/data slices, to use AWS terminology).
So what do we see here? Well, we had 8 nodes, and we still do, but one of them has almost all its data in RMS.
I waited a bit and looked again and saw that the number in RMS was going down, at maybe 5 GB per second (my sense of it by eye) and those blocks were being transferred onto local disk.
What we had in fact was a new replacement node, and it arrived when we rebooted.
With about 12 terabytes on the node, and 5 GB per second, it would copy down fairly quickly - but then we issued an elastic resize.
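As a sanity check on “fairly quickly”: Redshift blocks are 1 MB each, so the rms_used figure converts directly to a data volume, and the eyeballed 5 GB/s gives a rough copy-down time:

```python
# Back-of-envelope copy-down time. Assumes the standard Redshift block
# size of 1 MB; the rate is eyeballed, not measured.
blocks_in_rms = 10_102_487   # rms_used for node 6 in the first listing
block_size_mb = 1            # Redshift block size in MB
total_gb = blocks_in_rms * block_size_mb / 1024
rate_gb_per_s = 5            # eyeballed
minutes = total_gb / rate_gb_per_s / 60
print(f"~{total_gb:,.0f} GB in RMS, ~{minutes:.0f} minutes at {rate_gb_per_s} GB/s")
```

That comes out at roughly 9,900 GB and about half an hour, which squares with what I saw.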
After that, we saw (as expected) this;
node | ps | fs | ts | disks | disk_size | total_disk_size | disk_used | rms_used | total_used
------+----+----+----+-------+-----------+-----------------+-----------+----------+------------
0 | 0 | 19 | 19 | 4 | 6772561 | 27090244 | 12997353 | 1570626 | 14567979
1 | 0 | 19 | 19 | 4 | 6772561 | 27090244 | 13097940 | 1636651 | 14734591
2 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 13051688 | 815274 | 13866962
3 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 13074460 | 1063083 | 14137543
4 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 12837869 | 878998 | 13716867
5 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 13156211 | 878311 | 14034522
6 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 3937495 | 9899686 | 13837181
(7 rows)
So, here is what we have: all the nodes were now downloading, and the query which produces this output took about 30s to 35s to run, rather than being instant.
I watched this for a while and data copied down pretty quickly, and about five mins later we were like this;
node | ps | fs | ts | disks | disk_size | total_disk_size | disk_used | rms_used | total_used
------+----+----+----+-------+-----------+-----------------+-----------+----------+------------
0 | 0 | 19 | 19 | 4 | 6772561 | 27090244 | 13919059 | 644379 | 14563438
1 | 0 | 19 | 19 | 4 | 6772561 | 27090244 | 14042085 | 690853 | 14732938
2 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 13865367 | -2 | 13865365
3 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 14038631 | 97511 | 14136142
4 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 13715259 | -2 | 13715257
5 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 14032959 | -2 | 14032957
6 | 1 | 18 | 19 | 4 | 6772561 | 27090244 | 4932616 | 8902971 | 13835587
(7 rows)
And a minute or two later everything was down except for the new node. The query to produce this output was back to running instantly.
We then re-activated the ETL system, and at that point the rate at which the new node downloaded became a lot slower - maybe 100 GB/hour.
So we’re currently looking at about another four hours before the new node has caught up.
I’m surprised the shadow of an elastic resize is so long once the system is active, but hey, it makes sense - so it looks best to let the system fully download data from RMS before going to work. So you should totally get hold of this view from me, so you can know when that’s happened :-)
My experience of cluster level work is that RS does quite a bit of work to look as if the operation has been done quickly, but actually under the hood it’s still working hard (and you just don’t know about it - you only see it in query performance).