Basics of Scalability, Latency and Cloud Elasticity

 


What is Scalability?

  • Scalability is an application’s ability to handle an increased workload without sacrificing performance.
  • For example, if your app takes x seconds to respond to a single user request, it should take roughly the same x seconds to respond to each of a million concurrent user requests.
  • The app’s back-end infrastructure should not crumble under the load of a million concurrent requests. It should scale well when subjected to heavy traffic and maintain the system’s latency.


What is Latency?

  • Latency is the time a system takes to respond to a user request. Let’s say you send a request to an app to fetch an image and the system takes 2 seconds to respond to your request. The latency of the system is 2 seconds.
  • Minimal latency is what efficient software systems strive for. No matter how much the traffic load on a system builds up, the latency should not go up; this is the essence of scalability.
  • If the latency remains the same under the increased load, we can say that the application scaled well and is highly scalable.
  • This latency is generally divided into two parts:
    • Network latency - the time the network takes to send a data packet from point A to point B.
      • The network should be efficient enough to handle the increased traffic load on the website.
      • To cut down network latency, businesses use a CDN (Content Delivery Network), which deploys servers across the globe, as close to the end user as possible.
      • These close-to-the-user locations are also known as Edge locations.
    • Application latency - the time the application takes to process a user request.
      • The first step in cutting application latency is to run stress and load tests on the application and scan for the bottlenecks that slow the system down as a whole; a minimal latency probe is sketched below.
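
To make this concrete, below is a minimal latency probe sketched in Java. It is not a substitute for a proper load-testing tool (JMeter, Gatling, and the like exist for that), and the endpoint URL and request count are placeholder assumptions. It fires a batch of concurrent requests and reports rough percentile latencies:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class LatencyProbe {
        public static void main(String[] args) throws Exception {
            URI target = URI.create("https://example.com/api/image"); // placeholder endpoint
            int concurrentRequests = 50;                              // assumed load level

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(target).GET().build();
            ExecutorService pool = Executors.newFixedThreadPool(concurrentRequests);

            // Fire all requests at once and time each one individually.
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < concurrentRequests; i++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                    return (System.nanoTime() - start) / 1_000_000; // millis
                }));
            }

            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) latencies.add(f.get());
            pool.shutdown();

            // Percentiles say far more than an average under load.
            Collections.sort(latencies);
            System.out.println("p50: " + latencies.get(latencies.size() / 2) + " ms");
            System.out.println("p99: " + latencies.get((int) (latencies.size() * 0.99)) + " ms");
        }
    }

If the p99 figure climbs sharply as you raise the number of concurrent requests, you have found the point where the system stops scaling, and a bottleneck worth hunting down.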


How can we scale?

  • There are two ways to scale an application:
    • Vertically
      • Vertical scaling means adding more power to our server. Let’s say our app is hosted on a server with 16 gigs of RAM. To handle the increased load, we upgrade the RAM to 32 gigs.
      • Ideally, when the traffic starts to build on the app, the first step should be to scale vertically. Vertical scaling is also called scaling up.
      • It is simple, since no code refactoring or complex new configuration is needed.
      • There is a limit to scaling up: due to obvious hardware limitations, you cannot scale up infinitely.
      • Dynamically scaling up and down in real time is also not possible.
      • Since everything runs on a single machine that gets scaled up, availability is an issue: if that machine goes down, the whole app goes down.
      • Choose this when traffic is predictable, consistent, and within a certain limit.
    • Horizontally
      • Horizontal scaling, also known as scaling out, means adding more machines to the existing hardware resource pool. This increases the computational power of the system as a whole.
      • There is no limit to how much we can scale horizontally, assuming we have infinite resources.
      • We can also scale dynamically in real time as the traffic on our website climbs and drops over time.
      • Since multiple machines are available, availability is not an issue: if one server goes down, the others pick up the load.
      • Code needs to be adapted to work with distributed systems.
        • Why so?
          • Your code has to be stateless.
          • Avoid static instances of a class, because if the server goes down, that data is lost. Generally, static instances hold values shared by all objects, and this pool lives per classloader; if the server goes down, the classloader is gone, and hence the data is gone.
          • Hence, keep shared state in distributed memory like Redis or Memcached instead, as sketched below.
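
As a minimal sketch of that rule, here is a toy Java example using the Jedis client for Redis. The localhost connection details and the "visits" key are assumptions for illustration:

    import redis.clients.jedis.Jedis;

    public class VisitCounter {

        // DON'T: a static field lives inside one server's classloader. Every
        // server in the pool keeps its own copy, and a crash wipes the value.
        private static long localVisits = 0;

        // DO: push shared state out to Redis, which every server can reach.
        private final Jedis redis = new Jedis("localhost", 6379);

        public long recordVisit() {
            return redis.incr("visits"); // atomic, shared by all servers in the pool
        }
    }

Because the count lives in Redis rather than inside any one JVM, every server sees the same value, and losing a server loses no data.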


Cloud Elasticity

  • The process of adding and removing servers on the fly, stretching out and shrinking back to the original infrastructural computational capacity as traffic demands, is popularly known as cloud elasticity. It saves businesses tons of money every single day! A toy sketch of the decision logic behind it follows.
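
As a rough illustration of what an autoscaler does, here is a toy Java sketch of the decision rule, in the spirit of a target-tracking policy. The per-server capacity and the min/max bounds are made-up assumptions; in practice the cloud platform (e.g., AWS Auto Scaling) evaluates this against live metrics for you:

    public class ElasticityPolicy {
        private static final int REQUESTS_PER_SERVER = 1_000; // assumed capacity of one server
        private static final int MIN_SERVERS = 2;             // floor, for availability
        private static final int MAX_SERVERS = 20;            // ceiling, for cost control

        // Given the current request rate, how many servers should be running?
        static int desiredServers(int requestsPerSecond) {
            int needed = (int) Math.ceil(requestsPerSecond / (double) REQUESTS_PER_SERVER);
            return Math.max(MIN_SERVERS, Math.min(MAX_SERVERS, needed));
        }

        public static void main(String[] args) {
            System.out.println(desiredServers(350));    // 2  -> scaled in, saving money
            System.out.println(desiredServers(12_500)); // 13 -> scaled out under load
        }
    }

The floor keeps the site available even when traffic is quiet; the ceiling caps cost. Everything in between stretches and shrinks with the traffic, which is exactly the elasticity described above.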
