Scaling a growing app can be an intimidating topic. Most internet resources and blog posts on the subject revolve around scaling Ruby applications, so it is worth asking where the ceiling is and how far Ruby on Rails can take a rapidly growing application.
Scaling Increases Throughput, Not Speed
Scaling hosts only speeds up response times when requests are spending significant time waiting to be served by the application. If no requests are waiting to be served, scaling is just a waste of money.
Ruby on Rails can scale an application from 1 up to 1,000 requests per minute without trouble, and the key to understanding this scaling process is understanding how HTTP routing and application servers actually work.
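To act on the "scale on waiting, not on speed" rule, you first need to measure how long requests sit in the queue. A minimal Rack middleware sketch is below; it assumes the `X-Request-Start` header that Heroku's router sets (milliseconds since the epoch), while the class name, env key, and log format are illustrative choices, not part of any particular library.

```ruby
# A minimal sketch: estimate request queue time from the X-Request-Start
# header set by Heroku's router (milliseconds since the epoch). The class
# name, env key, and log format below are illustrative assumptions.
class QueueTimeLogger
  def initialize(app, logger: nil)
    @app = app
    @logger = logger
  end

  def call(env)
    start_header = env["HTTP_X_REQUEST_START"]
    if start_header
      # Time between the router receiving the request and the app
      # server beginning to process it, in milliseconds.
      queue_ms = (Time.now.to_f * 1000).round - start_header.to_i
      env["app.queue_time_ms"] = queue_ms
      @logger&.info("queue_time=#{queue_ms}ms")
    end
    @app.call(env)
  end
end
```

If a middleware like this reports queue times that are consistently near zero, adding more dynos will not make anything faster.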
How Are Requests Routed to App Servers?
One important decision when scaling a web application with Ruby is which application server to choose. Many posts on scaling Ruby are outdated because of how dramatically the landscape has changed over the past five years, and especially over the last year's whirlwind of changes. Nevertheless, each application server still has its own advantages; to weigh them, we need to understand how requests are routed to those servers. Many developers still have only a vague idea of exactly how requests are queued and routed.
Lifecycle of Requests
When a request arrives at your Heroku app, the first place it stops is a load balancer. The load balancer's job is to distribute load evenly across Heroku's routers, and it passes the request off to whichever router is best suited for it.
Heroku does not disclose how many routers it runs, but we can assume the number is large, likely above 100. A router's job is to locate the application's dynos and pass the request to one of them. Locating the dynos may take one to five minutes, during which the router attempts to connect with a random dyno of the application. Once Heroku has chosen a random dyno, it waits five seconds for that dyno to claim the request and establish an open connection. While it waits, the request sits in that router's request queue.
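The lifecycle above can be sketched as a toy simulation: the router picks a random dyno and waits up to five seconds for it to accept the connection, and while it waits the request sits in that router's queue. Every name and structure here is hypothetical; this is not Heroku's actual code.

```ruby
# Toy simulation of the routing behavior described above.
# All names are hypothetical -- this is not Heroku's code.
CONNECT_TIMEOUT = 5 # seconds the router waits for a dyno to accept

Dyno = Struct.new(:name, :busy) do
  # A busy dyno cannot accept the connection within the timeout.
  def accepts_within?(_timeout)
    !busy
  end
end

def route(request, dynos, queue)
  queue << request                 # request waits in the router's queue
  dyno = dynos.sample              # router picks a random dyno
  if dyno.accepts_within?(CONNECT_TIMEOUT)
    queue.delete(request)
    "#{request} served by #{dyno.name}"
  else
    "#{request} still queued after #{CONNECT_TIMEOUT}s"
  end
end
```

The important takeaway from the model is that time spent in the router queue is invisible to the dyno itself, which is why queue time has to be measured at the edge of the application.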
Application Server Choices
- Phusion Passenger 5
- Puma (Threaded)
- Puma (Clustered)
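As a concrete illustration of the last option, Puma's clustered mode mixes processes and threads. A minimal `config/puma.rb` might look like the sketch below; the worker and thread counts are examples to tune for your dynos, not recommendations.

```ruby
# config/puma.rb -- a minimal sketch of Puma in clustered mode.
# Worker and thread counts are illustrative; tune them per dyno size.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))    # forked OS processes
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count                # threads per worker

preload_app!  # load the app before forking so workers share memory

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")
```

Threaded-only Puma is the same configuration with `workers` omitted, trading copy-on-write memory savings for a single process.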
Each of these servers makes different claims about speed, but all of them can handle a thousand requests per minute. Scaling an application is not only about response times: an application slows down as its request queues grow. Check the request queues first, because if they are empty, scaling is a waste of money and time. The same applies to worker hosts, which should be scaled based on the depth of the job queue.
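The rule of thumb above, scale on queue depth rather than response time, can be sketched as a simple check. The method name and thresholds below are made-up illustrations of the heuristic, not values from any tool.

```ruby
# A sketch of the scaling heuristic above: add capacity only when
# requests (or jobs) are actually waiting. The name and the threshold
# are illustrative assumptions.
def scale_advice(queue_depth:, hosts:)
  per_host = queue_depth.to_f / hosts
  if queue_depth.zero?
    "queues empty: adding hosts wastes money"
  elsif per_host > 5 # illustrative threshold of waiting items per host
    "deep queue: scale up"
  else
    "queues shallow: hold steady"
  end
end
```

The same check works for web dynos (router queue depth) and worker hosts (job queue depth); only the queue being measured changes.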