How do I increase the scalability of my system?

Before we tackle that question, we must first ask: what do we mean by scalability, and how exactly are we measuring it? Are we confident those are the right metrics to care about? How do we correctly break that goal down into component sub-goals?

For web-based systems, scalability is often expressed as the scale of possible usage: how many users your system can handle concurrently and correctly. And promptly, too, lest we forget that every hundred milliseconds of latency can translate into a massive difference in user behavior or usage patterns (users get tired of waiting and move on). If you want to improve the scalability of your system, here is my high-level approach:

  1. Begin measuring. Obtain your baseline. Measure the changes.
  2. Enumerate all relevant factors. Understand which levers are best to pull.
  3. Determine an acceptable long tail. Getting to 100% is extremely expensive on the margin.
  4. Find the biggest bottleneck. Get the most bang for your buck.
  5. Consider both the software and hardware. Add more hardware for a small constant factor of improvement. Change your software if you need an order of magnitude improvement.
  6. Do the math. Know the unit economics of how your system consumes resources.
  7. Test. Simulate load and collect data, make informed estimates, and obtain confidence in your system.
  8. Iterate. Iterate until satisfied.

Understanding the scalability of your system is all about data collection. Once you have good data about your system’s resource usage, throughput, and ceiling limits, you will have a pretty good understanding of your system’s scalability.

Steps For Web-based Systems

Since clients like to hire my company to scale web applications, here are more specifics for scaling a web-based application in particular:

  1. Measure. If you don’t measure at all, you can’t know whether anything you’re doing has any effect on your system’s scalability. Get your baseline so that you can quantify the effect of each change. Measure system response time, because the faster your system responds, the more people you can serve in parallel. Measure memory and other resource pressure as usage increases. Measure each individual system component you will be optimizing along the way (a minimal timing sketch follows after this list). Measuring will inform your decision on what to optimize next, be it lowering memory consumption or reducing demand on the database or network. Keep in mind that “improvements in one area will often bring about improved performance in another, but not always; sometimes one can even be at the expense of another”.

  2. Enumerate all relevant factors. While it may be most helpful to measure what is within your control, recall that there are often several factors both upstream and downstream that can have a big influence on your metrics (e.g. ISPs, caching, users’ choice of device). By systematically going through the list of factors both inside and outside of your control, you avail yourself of all your options and increase your chances of catching something you might have previously missed. Are you sending extra JavaScript or CSS over the network that you don’t actually need? Are you serving your static assets yourself instead of offloading them to a CDN? Are you caching as effectively as you could be? Is your data center located needlessly far from your users? As you enumerate, think about how much contributing force each factor has on the metric you care about.

  3. Determine an acceptable long tail. Understanding that response times will fall on some non-uniform distribution is part of a mature mindset for improving scalability. If the 99th percentile of requests meets an ideal standard, but a small long tail of requests does not, is that acceptable? Solving the long tail is typically much harder, and sometimes several times more costly, than solving the initial easy cases, so incurring the cost to squeeze the last 0.1% of requests into the ideal bucket will very likely not be cost-effective. Define the minimal bar for your system’s success and what kind of long tail can be tolerated (see the percentile sketch after this list for a quick way to look at your tail).

  4. Find the biggest bottleneck. Look for where limited resources (e.g. CPU, memory, disk space, network bandwidth) are creating demand bottlenecks. A starved process will quickly become a bottleneck for the whole system. Profile where time is spent to discover application code that was written poorly or processes that were configured sub-optimally. A very common mistake in application code is making more database queries than it needs to; the solution is to reduce and optimize your database queries and add caching (the query sketch after this list shows the classic pattern). By and large, the bottleneck for most web-based applications will be the database. As you clear away bottlenecks, you may find that eventually your biggest bottleneck becomes how much load the database can handle. For nearly all our clients, we recommend a strategy of powering the database with increasingly bigger and better-resourced machines. This strategy gets them very far, and in truth, a single-machine database will suffice for the vast majority of web applications. If your needs really do surpass the limit of how big and fast a database can get on a single machine, you will have to resort to a distributed database. Avoid this for as long as possible; the added complexity is a bitter pill you will be swallowing for a long time.

  5. Consider both the software and hardware. In general, we tell our clients to add more hardware if they are looking for a small constant factor of improvement. If they need an order of magnitude improvement, we look to the software. A system that needs an order of magnitude performance improvement likely has very bad database access patterns in the code or big-O algorithmic problems, two of the biggest performance culprits in modern web software. Hardware is the right solution for a common class of cases, since servers are cheap compared to developer time. But if your code is needlessly resource-hungry in a big way, scaling hardware can be a bad idea and will mean paying for extra machines needlessly. Good judgement is needed to know when it’s worth investing in performance optimization and when to stop scaling a bad system that isn’t working. In the end, well-designed software takes into account the scaling ability of hardware. It’s not at all surprising that software and hardware work best when they work together.

  6. Do the math. Because server resources will always act as a limit on your system, understanding the unit economics of your resource consumption is helpful. Do you know how much RAM and CPU is needed to process some unit of user requests? If you already have data on how many servers were handling peak user numbers before your servers started to fall over, take that data and do the math to estimate your resource consumption (e.g. how much RAM/CPU is needed per some unit of users). If you don’t have that data, seek it out in whatever way you can. Even something as crude as SSH-ing into your machines and using htop to watch a single machine’s performance as you hit it with increasing amounts of load can serve as a good starting point. Use your estimates to calculate a sufficient server size and how many servers you need, then multiply by the cost of a server to get an estimated operational cost per unit of time and per user (a back-of-the-envelope sketch follows after this list). Also measure how much database disk space each unit of users takes up. If you don’t know beforehand how many users will need to be served, use whatever information you do have to make reasonable assumptions, and come up with a plan of action in case those assumptions turn out to be wrong. Remember to build in a buffer, and cross-verify your estimates across different data sources where possible (APM software, your cloud host provider, etc.).

  7. Always test. Software systems are frequently complex enough that coming up with answers is hard without testing end-to-end. How much load can my new system take? Sometimes you won’t know until you have hit the failure limit. If possible, simulate the load you want to handle. If you cannot because it is too expensive or difficult, then you have to extrapolate from smaller, less expensive, easier-to-perform tests. Create a workload that is representative of your expected load and run it. Remember to set up monitoring beforehand so that you can collect data and identify the key performance bottlenecks. Make estimates from that data: the more and better data you have to base your estimates on, the more confidence you can have in them. In addition to the representative workload test, run a worst-case scenario test to figure out your upper ceiling. To do this, find the most expensive set of actions in the system and then create an entry point that takes those actions. One crude but simple approach is creating a temporary, protected “load testing” endpoint configured to perform your most expensive and/or most common operations (frequently involving some expensive interaction with the database) and then using code or a tool to make a ton of concurrent requests against that specific endpoint while you watch and measure your system (a sketch of this follows after this list). Crude can be effective enough. However you do it, test, and test as close to your real-world scenario as is cost-effective. If you cannot test the entire system end-to-end, test each system component. If you cannot do that, stress test just your bottlenecks: if you understand where your application’s bottlenecks will most likely be and you stress test those places, you’ll obtain a fair picture of the overall load the system can handle.

  8. Iterate. Is it fast enough? Is it memory-efficient enough? If not, figure out why. Employ the scientific method and model the problem mathematically. Iterate until satisfied.
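
The sketches below are referenced in the steps above. First, for step 1, a minimal sketch of collecting response-time measurements. It is written in Python, and the `render_dashboard` handler and the in-memory `latencies` store are hypothetical stand-ins for your own application code and metrics backend:

```python
import time
from functools import wraps

# In-memory store of observed latencies, keyed by operation name.
# In a real system you would ship these to your metrics backend instead.
latencies = {}

def timed(name):
    """Record the wall-clock duration of the wrapped function under `name`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                latencies.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("render_dashboard")
def render_dashboard(user_id):
    time.sleep(0.05)  # placeholder for real request-handling work
    return f"dashboard for user {user_id}"
```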
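
For step 3, looking at percentiles rather than averages is what makes the long tail visible. Here is a small sketch using the crude nearest-rank method; the `observed_latencies` values are made up and stand in for whatever you have actually collected:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: crude, but fine for a first look at the tail."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Made-up response times (seconds); note the single slow outlier in the tail.
observed_latencies = [0.042, 0.051, 0.048, 0.047, 1.900, 0.044, 0.046, 0.050]

print("p50  :", percentile(observed_latencies, 50))
print("p99  :", percentile(observed_latencies, 99))
print("p99.9:", percentile(observed_latencies, 99.9))
```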
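
For step 4, the single most common database mistake we see is the N+1 query pattern. The sketch below assumes a hypothetical `db.query` helper and hypothetical `orders` and `users` tables; the point is the shape of the access pattern, not the specific API:

```python
# N+1 pattern: one query for the list, then one extra query per item.
def order_summaries_slow(db, order_ids):
    summaries = []
    for order_id in order_ids:
        # `db.query` is a hypothetical helper standing in for your data layer.
        order = db.query("SELECT * FROM orders WHERE id = ?", (order_id,))
        user = db.query("SELECT * FROM users WHERE id = ?", (order["user_id"],))
        summaries.append((order, user))
    return summaries

# Better: one round trip, letting the database do the join.
def order_summaries_fast(db, order_ids):
    placeholders = ", ".join("?" for _ in order_ids)
    return db.query(
        "SELECT o.*, u.name FROM orders o "
        "JOIN users u ON u.id = o.user_id "
        f"WHERE o.id IN ({placeholders})",
        tuple(order_ids),
    )
```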
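
For step 6, the math itself is simple once you have the per-unit numbers. Every figure in this sketch is an illustrative placeholder; substitute the numbers you actually measured:

```python
import math

# All figures are illustrative placeholders; use your own measurements.
peak_concurrent_users = 20_000   # the load you want to plan for
users_per_gb_ram      = 500      # observed from your monitoring
users_per_vcpu        = 1_000    # observed from your monitoring
server_ram_gb         = 32
server_vcpus          = 8
server_cost_per_month = 150.0    # whatever your provider actually charges

servers_for_ram = peak_concurrent_users / (users_per_gb_ram * server_ram_gb)
servers_for_cpu = peak_concurrent_users / (users_per_vcpu * server_vcpus)

# Whichever resource runs out first dictates fleet size; add ~30% headroom.
servers_needed = math.ceil(max(servers_for_ram, servers_for_cpu) * 1.3)
monthly_cost = servers_needed * server_cost_per_month
cost_per_user = monthly_cost / peak_concurrent_users

print(f"servers needed: {servers_needed}")
print(f"monthly cost  : ${monthly_cost:.2f} (${cost_per_user:.4f} per peak user)")
```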
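
Finally, for step 7, a crude load test can be as little as a script that makes many concurrent requests against one endpoint while you watch your monitoring. The URL below is a hypothetical protected load-testing endpoint on a staging environment, and the concurrency and request counts are arbitrary starting points; dedicated tools (ab, wrk, k6, and the like) do this better, but the script shows the idea:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical protected endpoint wired to your most expensive operations.
URL = "https://staging.example.com/internal/load-test"
CONCURRENCY = 50
TOTAL_REQUESTS = 1_000

def hit(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=30) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(hit, range(TOTAL_REQUESTS)))

durations = sorted(d for _, d in results)
errors = sum(1 for ok, _ in results if not ok)
print(f"errors: {errors}/{TOTAL_REQUESTS}")
print(f"p50: {durations[len(durations) // 2]:.3f}s  "
      f"p99: {durations[int(len(durations) * 0.99) - 1]:.3f}s")
```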

Conclusion

Getting a good understanding of your system’s scalability requires methodical experimentation. Knowing which changes to make from there takes some expertise. Taking the steps above will set you on the right path, but prior experience can help a great deal. If you are looking to hire software experts who have the experience to help you make the right calls, you can reach out to my team. We already serve in that capacity for dozens of groups, and we’d be happy to do the same for you.