First pass at getting function diffs up.
Works, but the links to old/new are currently broken.
Functions can’t be diffed like tables or views, because they are not unique by name, so I can’t tell if a function has been modified. Imagine I have ten functions with the same name but different arguments, and three of them change their arguments but keep the same name - which old function has changed into which new function? So for functions there is no concept of modified - only old and new.
Finally, functions diff on everything - which is to say, schema name, language name, volatility, return base data type, function name, argument base data types and their order.
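For what it’s worth, here’s a minimal sketch (assuming the standard pg_catalog layout, which Redshift inherits from Postgres, and using raw type OIDs rather than resolving base types) of the per-function identity the diff works from - every column contributes, since the name alone is not unique;

    -- one row per function, keyed on everything the diff compares
    select    n.nspname      as schema_name,
              l.lanname      as language_name,
              p.provolatile  as volatility,
              p.prorettype   as return_type_oid,
              p.proname      as function_name,
              p.proargtypes  as argument_type_oids  -- argument types, in order
    from      pg_catalog.pg_proc      as p
    join      pg_catalog.pg_namespace as n on n.oid = p.pronamespace
    join      pg_catalog.pg_language  as l on l.oid = p.prolang
    order by  schema_name, function_name;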
Here’s an example of a release with a lot of action.
https://www.redshift-observatory.ch/system_table_tracker/1.0.46534_to_1.0.46607/index.html
I have wanted for a very long time to publish a PDF which describes all the major considerations involved in using Redshift - the cluster, nodes, tables, queries, data distribution, everything.
The problem with this document is that although, by ordering the content appropriately and explaining concepts as it goes along, it can be made viable for weakly technical readers, it is too long for such readers; they do not want all the details, only the key, central issues.
As such, I’ve today written a much shorter, punchier document, which directly covers the main considerations - in fact, it’s pretty much the written version of the lecture I have found myself giving, again and again, to clients.
I need to write the conclusion, and then the document needs a certain time to mature and be polished. It shouldn’t take too long - certainly this week.
So I finally added the four new regions to the back-end I have for tracking Redshift stuff.
The regions are;
Now, I Googled beforehand to see which regions support Redshift, and didn’t find anything which obviously listed this information, but I did find a page listing Redshift API support and so on, and all these regions were listed.
Well, having just now tried to fire up dc2.large clusters, I can say none of these regions support Redshift - at least, not for me, doing that.
Pricing info exists in all these regions for RS, though, and that’s now up.
Too late now for a write-up, but two new versions of Redshift have come out - and it looks like in the first (which is still the latest in some regions) some system tables were dropped accidentally, as they have been reinstated in the second version. That, or my code is buggy, which is possible.
https://www.redshift-observatory.ch/system_table_tracker/1.0.48004_to_1.0.48714/index.html
https://www.redshift-observatory.ch/system_table_tracker/1.0.48714_to_1.0.48805/index.html
An example of this is sys_copy_job, but most of the tables dropped in 48714 have come back in 48805.
I ran the benchmarks, too, but again too late for a write-up. Tomorrow.
I need to spin up a 48714, manually, to make sure some of the tables really are missing.
(Addendum. I’ve manually brought up a 48714 and a 48805, and confirmed sys_copy_job is missing in the former and has returned in the latter. Looks like my code is okay :-)
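As an aside, a rough sketch of the sort of existence check involved - this assumes the SYS views are visible through the standard catalog tables, which may not hold;

    -- does a relation named sys_copy_job exist on this cluster?
    -- no rows returned means the view is absent in this version
    select  n.nspname, c.relname, c.relkind
    from    pg_catalog.pg_class     as c
    join    pg_catalog.pg_namespace as n on n.oid = c.relnamespace
    where   c.relname = 'sys_copy_job';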
So, I’ve been working on-and-off for some time on a white paper which covers the fundamentals of Redshift.
This is not that white paper.
That white paper is completely thorough, and although it is written for a weakly technical audience (managers and better), being thorough - for there are a lot of concepts to cover - it is long, and being long detracts from its approachability. Only IT professionals will read it, I think.
So I’ve now produced this white paper, which is written not to be completely thorough, but to be short and punchy, and to cover the critical concepts as succinctly as possible.
If you’re using, or thinking of using, Redshift, this is the white paper to read now, or to have someone read for you, so you can learn about everything you currently have no idea about whatsoever, which you would otherwise run into - quite possibly having a miserable, and then failed, project before moving on to a different database.
This white paper is basically the hour-long conversation I have had innumerable times, with I don’t know how many people, who have been having problems with Redshift, or were thinking of using Redshift.
Thank God for writing, as from now on I can just send them a URL :-)
(A bit delayed - the benchmarks were taken on the 6th April, which is about a week late, and published only today, which is two weeks late.)
ap-northeast-2, for both dc2.large and ra3.xlplus, is showing very high standard deviation for the disk-read-write benchmark. Deviation should be about 0.05s, but is about 2s. This happened six weeks ago as well.
eu-west-1 dc2.large still very slow for disk-read. Normally 0.07s to 0.18s; two weeks ago, 0.64s; this week, 0.52s. Seems to be an ongoing issue.
eu-west-2 dc2.large disk-read-write seems to have been upgraded. Currently 3.2s; usually 4.5s to 6s.
us-east-1 dc2.large has the same disk-read problem as eu-west-1. Currently 0.53s; should be 0.07s.
http://www.redshift-observatory.ch/cross_region_benchmarks/index.html
ap-northeast-2 dc2.large network performance fell back again, to ~8s - normally 4s.
ap-southeast-1 ra3.xlplus network performance fell back from ~2.3s to 3s. This is unusual - ra3 nodes are usually very consistent.
eu-central-1 dc2.large disk-read-write benchmark ran at 6.3s; the norm for the last six months has been ~4s (before that, before the general upgrades, ~7s).
eu-central-1 ra3.xlplus, looking back over the benchmarks, is flipping between about 2.5s and 3.5s. This is not seen in other regions.
eu-west-1 dc2.large (like eu-central-1) ran slowly on the disk-read-write benchmark, 5.5s rather than the usual ~4s, but the last month of ~10x slowdown on the read benchmark has gone, with read performance returning to 0.06s.
us-west-1 dc2.large disk-read-write benchmark continues to show very high standard deviation, with the benchmark at about 4.6s but the deviation being about 2.5s (in other regions, usually about 0.2s).
https://www.redshift-observatory.ch/cross_region_benchmarks/index.html
New PDF, MERGE, which oddly enough is about the new MERGE command.
https://www.redshift-observatory.ch/white_papers/downloads/MERGE.pdf
Serverless has been in GA for about a year, and I’ve not investigated, in part because it’s expensive (sixty-second minimum billing for a single query, including system table queries), and in part because the system tables have been very nearly completely removed, which will make investigation time consuming.
However, over the year I’ve been picking up hints and clues, and forming something of a feeling for what might be going on, and now I’m writing out my current thoughts.
Historically, Redshift has been a clustered database: there is a cluster, composed of nodes, and the database is itself a server (a database server), which runs on that server hardware (the cluster).
As of July 2022, AWS introduced a new Redshift variant, Amazon Redshift Serverless.
What “Serverless” here means is not defined.
From a user’s point of view, what’s changed is that the general “computing power” of the cluster is now defined in, and only in, terms of “Redshift Processing Units”.
Exactly what these are is not defined, but the minimum number is 32 and the maximum is 512, and billing is no longer per second of cluster up-time, but per second of Redshift Processing Unit usage (with a sixty-second minimum charge). Exactly what usage is, is also not defined.
The most obvious issue is that it is not clear in what way(s) Serverless scales.
Does it scale by accelerating individual queries? Does it scale in the number of concurrent queries? Is a single Redshift Processing Unit a single full-speed query, such that more Redshift Processing Units make no difference to performance? Or, if I have a single query and I add Redshift Processing Units, will that query go faster?
Time to examine such clues and hints as I’ve found.
There’s a fascinating reply, from an AWS staffer on re:Post, here;
https://repost.aws/questions/QUacMGrtjYSAaVzCmStUidoA/concurrency-scaling-with-redshift-serverless
If you are at maximum RPU’s then Redshift will queue up user queries for execution.
Now, it might be that old-style queues and slots have gone, and this is a new type of queue for when serverless capacity is all in use, but I doubt it - my guess is that we still have queues and slots, and that what we have is AutoWLM deciding how to allocate resources between queries.
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-billing.html
Amazon Redshift Serverless scales up from its base RPU capacity to handle periods of higher load. In some cases, RPU capacity can remain at a higher setting for a period after query load falls.
That does not sound very serverless to me. That sounds like servers being added and removed by some background process.
There’s an article from FiveTran, which is Serverless cheerleader-squad (and compares Serverless vs a single node dc2.large, which shows you where that article and author are at), but it has a fascinating nugget of information: changing the maximum number of Redshift Processing Units takes some minutes.
https://www.fivetran.com/blog/i-took-redshift-serverless-for-a-spin-heres-what-i-found
64 RPUs → 128 RPUs = 5 min 27 sec
128 RPUs → 64 RPUs = 5 min 25 sec
128 RPUs → 256 RPUs = 7 min 15 sec
32 RPUs → 512 RPUs = 4 min 49 sec
512 RPUs → 256 RPUs = 3 min 58 sec
That does not look serverless. Serverless is like lambda, where you have a large array of instances and when you fire off a lambda function, it goes to one of those servers; you don’t have to wait while you change the maximum number of lambdas available to you. That sounds like a cluster - you are adjusting how many nodes, or perhaps their type, are provisioned in your cluster.
Next we have this, which is a page with the filename “serverless known issues” in the URL, but the page title is now the more diplomatic “Considerations when using Amazon Redshift Serverless” :-)
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-known-issues.html
What we see here is that we need in our VPC about one IP address per 3 Redshift Processing Units. That has to be a node.
However, militating against the idea of a leader node is this;
https://docs.aws.amazon.com/redshift/latest/dg/declare.html#declare-constraints
The maximum cursor size for Serverless is small, at 150,000 MB - unless it’s a typo, which it might well be, because the docs are not well checked.
We do though see clearly here that the cursor is being materialized in full on some kind of resource-constrained hardware.
(We can also see that single node ra3 clusters get a very small cursor maximum, just as single node clusters of other types do - we can reasonably think that single node ra3, like dc2, is squeezing a leader node and worker node into a single node of hardware, and so should be avoided.)
The amount of store allowed to Serverless is large enough that it must be cloud-based, and that must mean RMS.
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-capacity.html#serverless-rpu-capacity
The 8, 16, and 24 RPU base RPU capacities are targeted towards workloads that require less than 128TB of data. If your data requirements are greater than 128 TB, you must use a minimum of 32 RPU.
and
Configurations of 8 or 16 RPU support Redshift managed storage capacity of up to 128 TB. If you’re using more than 128 TB of managed storage, you can’t downgrade to less than 32 RPU.
That much store can’t be local disk.
My current guess is that Serverless is a single normal Redshift cluster, with Concurrency Scaling Clusters providing extra capacity (on ra3 nodes CSC can perform write operations to RMS, but with lots of limitations), and AutoWLM handling queues and slots - so it’s all existing functionality - and what has actually changed is, and only is, billing.
Billing is now per second of total cluster size - i.e. the cluster itself, plus any CSC clusters currently running.
When you change maximum Redshift Processing Units, you’re actually getting a new cluster spun up, which is appropriately provisioned to the maximum you’ve chosen.
The system tables have been almost completely removed, in part, I would guess, because if they were retained it would be obvious Serverless was in fact still a single cluster; this means you now have no clue what state your tables are in, or your users, privileges, queues, queries, you name it. This is not a serious product; a database with no meaningful system tables is a toy, not something for use in business critical systems.
The one caveat to all this is that it does not explain the low cursor size limit, but that limit needs to be verified.
The next steps for me are to get a cluster running, check to see if single queries scale as more Redshift Processing Units are added, and check the cursor size maximum.
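When I get to the cursor check, this is roughly the sort of probe I have in mind - a sketch only, with a hypothetical table sized to straddle the documented limit;

    begin;

    -- declare a cursor over a result set of known size, varying that size
    -- to find the point at which the declare/fetch fails
    declare probe_cursor cursor for
    select  *
    from    some_suitably_large_table;

    fetch forward 100 from probe_cursor;

    close probe_cursor;

    end;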
Version 2 of the MERGE document has been published, with the explanation from mauro (from Slack DBT) as to the purpose of the temp table.
The abstract, below, is the new version 2 abstract.
The MERGE command is implemented as a wrapper around existing SQL commands, calling CREATE TEMP TABLE, UPDATE, and INSERT or DELETE. The SQL standard specifies the MERGE command will either match (and so update) or not match (and so insert) each source row. When manually issuing an UPDATE followed by an INSERT, it is possible for a row to both match and not match, when the change performed by the update causes a row which did match to no longer match. This problem is solved by using a temp table to store all rows which are to be inserted, prior to running the update. However, in the case when the data is such that this problem does not occur, the temp table is not necessary, and is pure overhead. As such, MERGE is a kind of worst-case implementation; it has to always work correctly, so it always uses a temp table, even when it is not necessary. Finally, MERGE mandates the use of, and only of, a table for merge source rows, and I can see no reason for this, as none of the wrapped commands require it, and all accept a sub-query or view.
https://www.redshift-observatory.ch/white_papers/downloads/MERGE.pdf
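To make the match/no-match problem concrete, here is a minimal sketch of the manual UPDATE-then-INSERT pattern, with hypothetical tables target_rows(id, status, val) and source_rows(id, status, val), matching on id and on status = 'active';

    -- rows to insert are computed BEFORE the update runs, because the update
    -- changes a column used in the match condition (status); a source row which
    -- matched before the update no longer matches afterwards, and a naive
    -- insert-where-not-matched run after the update would insert it a second time
    create temp table staging_inserts as
    select  s.id, s.status, s.val
    from    source_rows s
    left join target_rows t on t.id = s.id and t.status = 'active'
    where   t.id is null;

    update  target_rows
    set     status = 'inactive',
            val    = s.val
    from    source_rows s
    where   target_rows.id     = s.id
    and     target_rows.status = 'active';

    insert into target_rows ( id, status, val )
    select  id, status, val
    from    staging_inserts;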
In a couple of regions - us-east-1 and us-east-2, and now also sa-east-1 - at least with dc2.large (the only node type I’ve tested with so far), something odd is happening.
Clusters start up, and I can see they are available and I can get an IP, but the clusters continue to claim to have an active operation (even though they are in the available state). This means firstly that my scripting is getting confused when starting clusters and considers those clusters failed (but in fact they have been started, and I now must manually delete them), and also that I cannot in fact delete them manually until the seemingly-spurious ongoing operation finally completes.
I’ve not looked more closely yet (it’s 2am).
I’ve been noticing this for a week or so; I thought maybe it was transient, but it seems to be ongoing.
Two new RS versions out - 1.0.49676, which is now in two regions only, and 1.0.49780, which is in all the other regions.
Quite a lot of change (new function user_is_member_of_role() in 49676 looks particularly interesting), but also 49676 has introduced quite a bit of new stuff which 49780 has removed (including that new function).
https://www.redshift-observatory.ch/system_table_tracker/1.0.49087_to_1.0.49676/index.html
https://www.redshift-observatory.ch/system_table_tracker/1.0.49676_to_1.0.49780/index.html