This source provides information about all queries - queued, running, aborted, completed, the lot.
The first question is the ordering of queries.
Now, in Redshift, there are leader node queries and there are worker node queries, and they are not quite the same.
Leader node queries run directly on the leader node; they do not participate in WLM (the queuing system for worker node queries), and so have no notion of queuing and are not compiled; they have only a start time, when they began executing.
Worker node queries participate in WLM, have a notion of queuing and are compiled; so they have the time they enter the WLM system, the time they begin queuing, and once they begin executing, the time they began executing (note, these days, compilation is now intermixed with execution - it used to be that all compilation occurred prior to execution - more about this in cluster_query_compile_plan).
(Leader node queries have only a process ID and a transaction ID. Worker node queries also have a task ID and a query ID. Note the transaction IDs for leader node queries are in a very different number range from the transaction IDs for worker node queries. However, transactions which contain both leader node and worker node queries do seem to work correctly.)
For ordering, leader node queries use their start time, and worker node queries use the time they enter WLM; the actual state of the query (queued, running, etc) is not considered in the ordering of queries.
The idea here is to see the flow of queries in actual chronological order, conveying the query flow as it arrives at the cluster.
A query - with the same query ID - can queue and execute more than once, and so can here have multiple rows. As far as I can tell, this is rare, so I’ve not taken any particular steps to deal with it, but be aware query ID is, surprisingly, not unique.
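The non-uniqueness can be checked directly; a minimal sketch, run against your own cluster, which lists any query IDs appearing in more than one row of stl_query:

```sql
-- Query IDs which appear more than once in stl_query.
-- Rare, but it does happen; such IDs produce multiple rows in this view.
select query,
       count(*) as occurrences
from   stl_query
group  by query
having count(*) > 1
order  by occurrences desc;
```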
Queries which use Python functions obey different and more restrictive queuing rules, so you can see queries which apparently are being leap-frogged in their queue; this is why.
Name | Type |
---|---|
start | timestamp |
end | timestamp |
state | varchar |
routing:condition | varchar |
queue_id | int8 |
routing:queue | varchar |
routing:csc | varchar |
owner_user_id | int4 |
owner | varchar |
pid | int4 |
xid | int8 |
tid | int4 |
qid | int4 |
segments:compiled | int8 |
segments:total | int8 |
durations:queue | interval |
durations:compile | interval |
durations:execute | interval |
durations:perceived | interval |
i/o:bytes read:disk | int8 |
i/o:bytes read:network | int8 |
i/o:rows:inserted | int8 |
i/o:rows:deleted | int8 |
i/o:rows:returned | int8 |
i/o:bytes processed:in memory | int8 |
i/o:bytes processed:on disk | int8 |
text | varchar |
related pages | varchar |
For leader node queries, the time the query began executing, as there is no queuing on the leader node.
For worker node queries, the time the query was recognized by the WLM system (which is to say, the very first moment the query existed, prior even to the routing rules dispatching the query to a queue).
The time the query completed.
The state of the query. According to stv_wlm_query_state, some possible values are;
State |
---|
Classified |
Completed |
Dequeued |
Evicted |
Evicting |
Initialized |
Invalid |
Queued |
QueuedWaiting |
Rejected |
Returning |
Running |
TaskAssigned |
The docs are ten years old, manually updated, and I think not reviewed by technical staff, and so are quite often incorrect in some way; they specifically indicate this is not an exhaustive list, so take it as a rough guide.
It’s not obvious to me what all of these mean, and the documentation is silent beyond providing this list.
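One way to get a feel for which states actually occur on a given cluster is simply to count them; a minimal sketch against stv_wlm_query_state (which of course shows only queries currently in the WLM system):

```sql
-- Distribution of WLM states for queries currently in WLM.
select state,
       count(*) as queries
from   stv_wlm_query_state
group  by state
order  by queries desc;
```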
The full text of the conditions of the routing rule which routed this query.
The queue ID. This column is emitted in CSV exports only.
The name of the queue which received this query.
CSC if this query went to a CSC cluster, NULL otherwise.
The owner user ID. This column is emitted in CSV exports only.
The owner user name.
The process ID.
When a user connects to the cluster, a process is spawned, which accepts that connection. That’s the process the ID is for. That process is also the process which issues queries, and when you need to cancel a query, you in fact cancel the process, which by being cancelled, inherently also cancels the query it is running.
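A sketch of this in practice; stv_recents is used to find the processes currently running queries, and the process ID given to cancel is hypothetical:

```sql
-- Find the processes currently running queries.
select pid,
       starttime,
       left(query, 48) as query_text
from   stv_recents
where  status = 'Running';

-- Cancel the process; this also cancels the query it is running.
cancel 12345;  -- hypothetical process ID
```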
The transaction ID.
The task ID.
I’m not clear what tasks are really about. I think it’s to do with queries being bounced around queues by routing rules, but I’ve not looked into it. I did think it would mean that a query ID was used for one and only one query, but this is not the case; to my considerable surprise, you can find the same query ID showing up more than once in stl_query, for example.
The query ID. This is really the WLM query ID; leader node queries have no query ID, at least, not that you see in the system tables. They do really have query IDs, of course, but they’re on the leader node only and they never show up in the system tables. All we see from the leader node are transaction IDs and process IDs.
Of the total number of segments, the number which were marked as compiled in svl_compile_info.
However, this system table has over the years not changed, and so now is inadequate to represent what’s actually going on with compilation.
The total number of segments for this query.
The time spent queuing. I take this from stv_wlm_query_state. I know the same query can be queued multiple times; I do not yet know what happens to the queue time, whether it represents the cumulative queue time, or only the time of the most recent queuing.
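A sketch of reading the queue time directly; stv_wlm_query_state reports times in microseconds, as I understand it:

```sql
-- Queue and execution time so far, for queries currently in WLM.
select query,
       state,
       queue_time / 1000000.0 as queue_seconds,
       exec_time  / 1000000.0 as exec_seconds
from   stv_wlm_query_state;
```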
It used to be compiling occurred in full, prior to execution. These days, compiling occurs in-line with execution, with each stream, if it needs it, performing compilation before it begins.
Additionally, it is possible for a query to be executed more than once - for example, it might enter SQA, be executed for a while, then be stopped, and be queued again, and execute again, in a normal queue.
In this situation, it could be some or even all compilation occurs during the first execution in SQA, and so occurs potentially long before the second execution of the query.
Given all this, I can’t see any reliable way now to untangle compilation time from execution time.
So this column is simply the total time of every compile that has been performed, regardless of when or how it happened.
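A sketch of such a total, assuming the AWS-documented svl_compile system table (one row per query segment, with start/end times and a compiled flag); illustrative only, as this page's column is computed from its own source:

```sql
-- Total compile time per query, summed over every segment compile,
-- regardless of when in the query's life the compile happened.
select query,
       count(*)          as segments,
       sum(compile)      as segments_compiled,
       sum(case when compile = 1
                then datediff(microsecond, starttime, endtime)
                else 0 end) / 1000000.0 as compile_seconds
from   svl_compile
group  by query;
```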
It used to be compiling occurred in full, prior to execution. These days, compiling occurs in-line with execution, with each stream, if it needs it, performing compilation before it begins.
What this means is that the times recorded in the system tables, which are the start of execution and the end of execution, no longer mean what they used to mean.
When compilation occurred prior to execution, the start and end execution times showed how long the query took to execute.
These days, compilation time is mixed into the execution time, so you would need to subtract the compilation time.
Unfortunately, it is also now possible for a query to be executed more than once - for example, it might enter SQA, be executed for a while, then be stopped, and be queued again, and execute again, in a normal queue.
In this situation, it could be some or even all compilation occurs during the first execution in SQA, and so occurs potentially long before the second execution of the query.
I think, although I am not yet certain, that the recorded execution start time is immediately prior to the beginning of the first compilation.
As such, in the usual case, where a query executes only the once, subtracting the compile time from the execution time should give the actual execution time.
In the cases with multiple execution, of course, this will be wrong.
Right now, I simply subtract the execution start from the execution end, and that’s the execution time.
You need to remember it includes compile time.
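A sketch of the subtraction, under the assumptions above (a single execution, with compile time taken from the AWS-documented svl_compile table); this will be wrong for multiply-executed queries:

```sql
-- Elapsed time from stl_query, alongside summed compile time;
-- elapsed_seconds minus compile_seconds approximates pure execution.
select q.query,
       datediff(microsecond, q.starttime, q.endtime) / 1000000.0
           as elapsed_seconds,
       coalesce(sum(case when c.compile = 1
                         then datediff(microsecond, c.starttime, c.endtime)
                         else 0 end), 0) / 1000000.0
           as compile_seconds
from   stl_query q
       left join svl_compile c on c.query = q.query
group  by q.query, q.starttime, q.endtime;
```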
This is the time from the query entering the WLM system, until the query completes - this is how long the user waits for their query to complete.
Note that it is possible for a query - with the same query ID, no less - to run more than once. I suspect in this case the times in the system tables for each run of the query are for that one run, and so the perceived time for the user is actually from the first run entering WLM to the final run completing.
Right now I do nothing about this - I look at only the current query.
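A sketch of perceived time for completed queries, using stl_wlm_query (from entering the service class through to the end of execution); as above, this looks at each row individually, so multiply-run queries are not handled:

```sql
-- Perceived duration: from entering WLM to the end of execution.
select query,
       datediff(microsecond, service_class_start_time, exec_end_time)
           / 1000000.0 as perceived_seconds
from   stl_wlm_query;
```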
This column then shows the total number of bytes read from disk, as best I can judge the types indicated in stl_scan.
This column then shows the total number of bytes read from network, as best I can judge the types indicated in stl_scan. Importantly, it only counts the receive side of network activity - the step is scan, after all, not broadcast, so we’re not counting bytes twice.
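The underlying judgement can be inspected directly; a sketch which totals bytes per scan type, so you can see which type values carry the volume (which type values mean disk and which mean network is the judgement call):

```sql
-- Bytes and rows read per query, grouped by stl_scan type.
select query,
       type,
       sum(bytes) as bytes_read,
       sum(rows)  as rows_read
from   stl_scan
group  by query, type;
```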
The number of rows inserted into the table.
For tables with all distribution, this is the physical number of rows (i.e. one per node), not the logical number of rows.
The number of rows deleted from the table.
For tables with all distribution, this is the physical number of rows (i.e. one per node), not the logical number of rows.
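A sketch of the physical row counts, summing stl_insert and stl_delete over slices; for tables with all distribution, each node's copy counts separately:

```sql
-- Physical rows inserted per query and table.
select query, tbl, sum(rows) as rows_inserted
from   stl_insert
group  by query, tbl;

-- Physical rows deleted per query and table.
select query, tbl, sum(rows) as rows_deleted
from   stl_delete
group  by query, tbl;
```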
The number of rows returned from the leader node to the SQL client.
This column then shows the total number of bytes processed by the stl_aggr, stl_hash, stl_save, stl_sort and stl_unique steps, when running in memory rather than on disk.
This column then shows the total number of bytes processed by the stl_aggr, stl_hash, stl_save, stl_sort and stl_unique steps, when running on disk rather than in memory.
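A sketch of the in-memory/on-disk split for one of these step types (stl_hash here; the other step tables carry the same is_diskbased flag):

```sql
-- Bytes processed by hash steps, split by in-memory vs on-disk.
select query,
       sum(case when is_diskbased = 'f' then bytes else 0 end)
           as bytes_in_memory,
       sum(case when is_diskbased = 't' then bytes else 0 end)
           as bytes_on_disk
from   stl_hash
group  by query;
```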
The query text, truncated at the 48 character mark.