Index
- Initial Meeting / Discovery Phase
- Cluster Cost Reduction
- Emergency Fire-Fighting
- Consultation
- Design
- Implementation
- Training
Initial Meeting / Discovery Phase
So, first, we meet - usually by email of course - and you outline your situation and what you're looking for.
Once you have put me in the picture, I will need to ask questions (by email or in a meeting) and possibly have a look around your system, to form a picture of the situation adequate for producing a quote.
Everything is quote-based, because the work varies so much, with a single exception: cluster cost reduction. Here the price is one month of the saving made, and nothing else.
What follows in this document is an enumeration and description of typical services and work. It's not prescriptive, so if you have something different in mind, that's absolutely fine.
Cluster Cost Reduction
AWS bills monthly. The price of this service is the amount saved for one month, and only that; there is no time-based monetary cost or minimum fee. There is of course non-monetary cost, in that I will need to be vetted and given access to your system, and I will very likely need to ask questions about what you're doing with your system and to organize changes so that they fit in safely with your ongoing use of it.
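As a rough illustration of the pricing, here is a minimal Python sketch. The function name and the dollar figures are hypothetical, made up purely for the example; the only rule taken from the text above is that the fee is one month of the saving, with no minimum.

```python
from decimal import Decimal


def cost_reduction_fee(monthly_bill_before, monthly_bill_after):
    """One-off fee: one month of the saving made; zero if nothing is saved."""
    saving = Decimal(monthly_bill_before) - Decimal(monthly_bill_after)
    # No minimum fee: if the bill did not go down, nothing is owed.
    return max(saving, Decimal("0"))


# Hypothetical figures, for illustration only.
fee = cost_reduction_fee("12000", "7500")
print(fee)  # 4500
```

So if the monthly bill went from 12,000 to 7,500, the fee would be a single payment of 4,500, and the saving from the second month onwards is all yours.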
The main method to achieve cost reduction is to reduce the number of nodes in the cluster, and/or to change the node type in use. To do either or both, it is necessary to do work to reduce disk use and/or improve query efficiency.
In general, in practice, there are two sets of changes: one set of superficial changes, which give about 2x improvement, and a second set of fundamental changes - which require the correct use of sorting and so usually a complete system redesign - which give 10x or even 100x improvement. You may or may not wish to undertake the latter, particularly as it requires training for your staff; if a correct design is taken over by staff who do not know about Redshift, it will be operated incorrectly and/or developed further in ways which are incorrect, and the performance gains will go out the window. Sorting is a lot of work, but if you want timely SQL on Big Data, you have to live it, because there's no other way.
Emergency Fire-Fighting
A significant subset of clients reach out when their Redshift cluster is in crisis - essential business functions are not executing in a timely manner.
In this situation, it's a case of diving in and putting out fires immediately, if not sooner.
With regard to fire-fighting, with Redshift there are basically two sets of incorrect operation. There is fundamental incorrect operation, where sorting is not correctly operated, and fixing this typically requires a system redesign. And there is superficial incorrect operation: Redshift has a number of novel, unexpected and exciting ways in which to go wrong, but these can quite easily be fixed and/or circumvented.
Normally clients have both sets of problems, so a failing system can be got back on its feet quickly by fixing the superficial incorrect operation.
This gives enough performance recovery that essential business functions return to running in a timely manner.
Given the need for speed, there's no formal investigation or anything like that; it's a case of finding out what's wrong as quickly as possible and as each issue is uncovered, drawing up a fix, again, as quickly as possible.
This happens in part through meeting(s) with client engineer(s), where I ask questions and where necessary inspect the system (I have my own tooling for this) and start digging.
Normally in the first meeting there's a slew of fixes. It happens quickly.
With fixes, it has always been the client who instructs their engineers to implement them (as it would take time for me to get up to speed on their system).
Consultation
Consultation is everything which is not actual practical work - so, basically, sitting down with me in a meeting and me explaining whatever it is you need to know.
Design
Design work has two stages.
The first stage consists of an initial meeting to determine if the use case(s) are viable for Redshift, and then, if they are, further meetings to find out about all the data sources and to find out about how the data is to be used and queried.
The second stage is producing the design. In practice, the time this takes basically depends upon the total number of columns across all tables, because non-trivial work has to be done for each column, and as there are usually quite a few columns, that work usually ends up dominating the total time taken.
Typically we're looking at two or three days for small systems, maybe two weeks for large systems, assuming that design work does not uncover deeper issues or questions (such as discovering that various data sources do not in fact join up, data quality issues serious enough to prevent implementation, and so on), and that client engineers are available without undue delay for questions.
Implementation
The time needed for implementation is pretty well known, as the design exists, and so a quote can be given before work begins.
As with design, it depends primarily upon the total number of columns, as non-trivial work has to be performed on a per-column basis.
Training
Training is absolutely, completely and totally essential for Redshift systems.
Redshift is not inherently or automatically scalable or timely on Big Data; for this to happen, the table design must be correct for sorting, cluster admin need to understand sorting and the implications of sorting, as do developers, and also users need to understand sorting.
All of these conditions must be met for Redshift to scale. If any of them are not met, Redshift does not scale.
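To give a feel for why sorting matters so much, here is a toy Python sketch. Redshift keeps the minimum and maximum value of each block of a column (zone maps) and skips blocks which cannot contain the rows a query wants. The block size, the data and the function names below are all hypothetical and greatly simplified; this is an illustration of the principle, not a model of Redshift internals.

```python
def build_zone_map(values, block_size):
    """Split values into fixed-size blocks and record (min, max) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_read(zone_map, low, high):
    """Count the blocks whose (min, max) range overlaps the query range [low, high]."""
    return sum(1 for bmin, bmax in zone_map if bmax >= low and bmin <= high)


# 1,000 hypothetical values, stored in 10 blocks of 100 values each.
sorted_values = list(range(1000))
# The same values in a scrambled (but deterministic) order.
unsorted_values = [(i * 7) % 1000 for i in range(1000)]

sorted_map = build_zone_map(sorted_values, block_size=100)
unsorted_map = build_zone_map(unsorted_values, block_size=100)

# A query wanting only the values 500 to 509.
print(blocks_to_read(sorted_map, 500, 509))    # 1
print(blocks_to_read(unsorted_map, 500, 509))  # 10
```

With the data sorted, the query reads one block; with the same data unsorted, every block has to be read. The same applies to queries: one written so that it cannot use the sort order ends up reading everything, which is why admin, developers and users all need to understand sorting.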
Training for admin, developers and users differs.
Admin training is the simplest and shortest. User training comes next and is a lot more complex, primarily because users need to understand sorting well enough to write queries correctly. Developers need the most comprehensive training, as they need to understand sorting well enough to design groups of tables correctly.
Training is available in a number of different formats: lectures to large groups, smaller groups (up to four or so) with full interaction, or one on one.
Additionally, I find that training, especially in complex matters (such as sorting), is iterative, and that users do much better when they build up training and experience together over time. So what I prefer, and what I think works best, is to have training over a period of time, in small doses as and when needed, so each trainee can take steps forward as and when they're ready for them (and work on any particular point until it's understood). This approach also allows training to blend in with normal work.