What is Server Load Balancing?

Server load balancing is a technique, implemented with dedicated hardware or software, that distributes IP-based requests from the Internet or an intranet across the servers in a server farm according to one of several scheduling methods. It is most commonly applied to server clusters, especially high availability clusters.

After the initial set-up, the administrator can adapt these methods or scheduling rules to specific requirements. To think of it in more basic terms, server load balancing is like taking several individual servers and making them appear as one great big server. This is called clustering. In the most extreme cases, you can even have several clusters of servers and load balance across these separate clusters. You might see this on a site like YouTube.
Reasons for Load Balancing

There are many reasons a company would consider upgrading to a load balanced solution, but the most common are scalability, high availability, and predictability. Consider a company with a website hosted on a dedicated server that is accessed thousands of times a day. Regardless of its size, the company needs to be confident that:

The site can handle sudden and/or gradual increases in traffic
The site is available all the time, 24 hours a day, 7 days a week, 365 days a year (24x7x365)
The site gives a consistent experience on each visit

1. Scalability

Scalability is being able to handle sudden and/or gradual increases in traffic. As you grow your web presence, ideally, you will have more people visiting your site and viewing more pages or making more requests. If you have a single server, these increases may overwhelm the server and result in downtime. If you need to upgrade, you will experience similar downtime as you move to the new server. Server load balancing allows you to seamlessly scale your hardware as needed. Imagine you had a farm, and each year you plowed a bigger field or one year started a second field. If you have a one-horse plow, you could look to buy a younger, stronger horse or get a new plow, but you’d need to take time away from farming to go find the right ones and buy them. Instead, you could simply add your other horse to the team to balance the extra work needed. Eventually you may need to have two teams of horse plows, but you can easily plan to scale up to that. The farm is like your web traffic, and the horses are like your servers. By adding a second server, you can balance the workload across both servers, and if needed, you can add a whole other cluster of servers – but you can easily plan to scale up to that, too.

2. High Availability

High availability is ensuring your site is available all the time, 24x7x365. With a single server, you may need to do maintenance or upgrade your hard drive or RAM. During that time, the site will need to be taken down, and customers might find their way to your competition. With server load balancing, you can upgrade or run maintenance on one server while its requests are automatically sent to the rest of the servers in the cluster. Comparing this to our farm, if you have a one-horse team, your horse may get sick or need to be reshod, meaning you can’t plow your field. If you had a multi-horse team, the rest of the horses could work a bit harder to complete the plowing without anyone really noticing the difference.

3. Predictability

Predictability is providing users with a consistent experience each time they visit. You may have certain times of day when more people visit your site, or you may run a promotion or release a new service that causes a surge in traffic. This is similar to the Slashdot effect. With a single server, peak times may mean a painfully slow website or web application. Customers may get frustrated and leave. With multiple servers, these peaks are more easily absorbed by sharing the workload. Comparing this to our farm, in the summer, the horse may slow down or may not be able to work at all in the afternoon sun. With several horses working together, working in the heat isn’t as big an issue because the workload is being shared by several horses.

Getting a more efficient server or improved network hardware alone will not meet all of these requirements, since that only improves performance; it is like simply replacing your horse or your plow. To attain a high level of availability with minimal downtime and fast access speeds, it is recommended to have two or more dedicated servers operating simultaneously. These mirrored servers must then be load balanced so that failover is automatic and poor application performance on any of the online servers is detected. If one mirrored server fails, another takes over automatically. The server load balancer knows the extent of the load on each server and can therefore direct queries in the most efficient way.
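The failover behavior described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the pool of (host, port) pairs, the function names, and the use of a plain TCP connect as the health check are all hypothetical, and a real load balancer would typically use richer checks and run them in the background rather than on every request.

```python
import socket


def tcp_health_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def healthy_servers(servers, check=tcp_health_check):
    """Keep only the servers that currently pass the health check."""
    return [(host, port) for (host, port) in servers if check(host, port)]


def route(servers, check=tcp_health_check):
    """Pick the first healthy server; a failed mirror is skipped automatically."""
    pool = healthy_servers(servers, check)
    if not pool:
        raise RuntimeError("no healthy servers available")
    return pool[0]
```

The health check is passed in as a parameter so the routing logic can be exercised without a network, for example by supplying a function that reports one mirror as down and confirming that route() returns the other.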
Different Methods for Server Load Balancing

There are a few methods for server load balancing and they are listed below.
NOTE: The method that you should use depends on your application and the types of servers that you have running.

Round Robin – This is the most common method of server load balancing. Each server that is load balanced is given an arbitrary label (e.g., A, B, C, D), and visitors are sent to the servers in that order. The first visitor goes to A, the second to B, and so on. In this example, the fifth visitor would be directed back to server ‘A.’
Weighted Round Robin – This is a similar method of server load balancing that is often used when servers of different capacity are being balanced. For example, you may have a single processor dual core server and a dual processor quad core server. The dual processor quad core server is approximately twice as fast and can handle twice the workload of the single processor dual core server. In this example, you could have the first visitor and the second visitor sent to the dual processor quad core server, while the third visitor would be sent to the single processor dual core server. This would be a situation where the servers are given a 2:1 ratio or weight, meaning twice as many visitors are sent to one server as are sent to the other.
Least Connections – As the name implies, this method of server load balancing checks to see which server has the fewest connections (the fewest visitors making requests), and sends new visitors to that server. This method is recommended if the application you are using has a big variance in the length of each session, with a mix of very short sessions and very long sessions. This method is generally less suitable than Weighted Round Robin for HTTP, unless you have some large file downloads of 100MB or more mixed in with web pages.
Fastest Response – As the name implies, this method of server load balancing checks to see which server responds to a query the fastest and sends new visitors to that server. This works well for applications where the response time of the application and the operating system is directly related to the server load. However, this method is not advisable for most web servers.
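
The four methods above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular load balancer implements them: the Server class, its fields (weight, active_connections, last_response_ms), and the function names are all hypothetical, and a real balancer would update the connection counts and response times itself as traffic flows.

```python
import itertools


class Server:
    """Hypothetical record of what a balancer might track per server."""

    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight            # relative capacity, e.g. 2 for a 2:1 ratio
        self.active_connections = 0     # visitors currently being served
        self.last_response_ms = 0.0     # most recently measured response time


def round_robin(servers):
    """Endless iterator over the servers in fixed order: A, B, C, D, A, ..."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Like round robin, but each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active_connections)


def fastest_response(servers):
    """Pick the server with the lowest last measured response time."""
    return min(servers, key=lambda s: s.last_response_ms)


# Round robin over four servers: the fifth visitor goes back to A.
farm = [Server(name) for name in "ABCD"]
rr = round_robin(farm)
print([next(rr).name for _ in range(5)])   # ['A', 'B', 'C', 'D', 'A']

# 2:1 weighting: two visitors to the faster server, then one to the slower.
quad, dual = Server("quad", weight=2), Server("dual", weight=1)
wrr = weighted_round_robin([quad, dual])
print([next(wrr).name for _ in range(3)])  # ['quad', 'quad', 'dual']
```

The weighted version here simply repeats each server in the rotation; production balancers often interleave the picks more smoothly, but the long-run 2:1 ratio is the same.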

Supported Server Protocols for Load Balancing

The server load balancer supports:

Most UDP- and TCP/IP-based protocols
DNS
HTTP
HTTPS (SSL)
SMTP
POP
IMAP
NNTP
FTP

The load balancer also supports popular applications such as:

Streaming Media
Active Server Pages (ASP)
SQL
UDP/IP-based protocols that include DNS, WAP, RADIUS, and others
