Recently, while working on a SQL server optimization project, we had the opportunity to look into one interesting problem. We had a huge database (to the tune of 800GB) which was being hammered with approximately 30000 transactions per second. Database load was expected to grow by a factor of 100 in coming days and the idea was to devise a solution which could handle that load. This was a SQL server 2005 enterprise edition database hosted on an 8 processor fifth Generation server. We wouldn’t say this server was on its knees with this load but yes there were wait times longer than expected and to add to that there were times when data traffic suddenly went up significantly and in those times DB was not able to keep up.
Though this is not a very common scenario in many of the modern day applications out there but this definitely is a hallmark of databases handling loads from specific industries like banking. Applications intended for these industries normally have huge volume of small database transactions. In this article, we present one of the approaches you can take to handle a scenario such as this.
Introducing Broker Hub
To demonstrate the problem which has these type of database requirements, let’s use the example of a Broker Hub – a stock broking hub. Stock broking applications have very high volume of small database transactions and also there are spurts in database activity depending on market conditions. For simplicity sake, let’s assume that we were at a point when database design and usage pattern for Broker hub database was in the most optimal state.
First choice – Scale up
So to optimize Broker Hub further, we had a number of ideas and first choice was obviously to increase the hardware capacity. Increasing the hardware capacity did help the case. We could handle upwards of 50000 transactions a second by moving to a better system with 16 processors and a SAN array of high speed disks. But above 50000 in our load environments, we could still see the database to be the bottleneck.
Next obvious idea was to try SQL Server 2008 which has support for performance optimization features like advanced compression (reducing the overall disk IO) and support for virtually unlimited number of objects (2,147,483,647) and database size (524,272 TB). Again we could see the difference. Without enabling features like compression we could achieve a bigger number of around 70000 transactions per second.
Would it work? Probably not long term.
Probably enabling compression and using other features to optimize performance would have resulted in a higher figure but the problem here was that there was a limit to this. Adding hardware or moving to newer version of a database (or even to a different database) wouldn’t have given us the virtually unlimited (100 times 30000 transactions per minute) capacity we were looking at. Obviously we needed some way to deploy more than one server to split the load and thus increase the capacity to handle very high transaction loads. In simpler terms we needed a scale out solution instead of scale up for our database. Looking around for out of the box solutions in the market didn’t help, simply because there aren’t many scale-out solutions available in the market to load balance SQL server. Oracle has launched such a solution but even that requires syncing between different servers in cluster which takes up a lot of network bandwidth thus reducing the overall effectiveness of the solution.
Scale out – Approach
After a lot of brainstorming sessions, it was decided that it was time to create our own scale out solution. The idea was to create a design which could help us scale out as our user base grew but at the same time being able to handle sudden increases in transaction volumes. But as it happens all the time, we didn’t have too much money to be spent on this. We looked at various existing products such as SharePoint to see how they stored their data and came up with a very simple first draft of the scale out solution. It looked very similar to way SharePoint does load balancing for its data stores. We had a cluster of database servers connected to the Broker Hub. Each database was configured to handle a set of users with specific user Ids and thus held data only for those users. Merge replication was used to replicate data from all databases to a central database which was used for all reporting. Here’s how it looked like:

Broker Hub front end had to be modified a bit in terms of process flow. In this case, each user logging into Broker Hub had to be connected to a specific database based on his user id. The task of identifying which database has the information related to the user trying to log in was handled by a new module added to the application called DB Load balancer. The process flow in this approach looked like this:
- User foo with User ID (n) logs in.
- Application calls Custom DB load balancer to find out that all data specific to user foo is on database server DB 2.
- Application creates a connection to DB 2 on behalf of this user.
- All user transactions from User 1 are directed to DB 2 thereafter.
- Periodically user data is synchronized to the master database.
Design Decisions for this approach
This approach was pretty simple to implement and could achieve the results we were looking for. But there were a few practical issues:
- We have to introduce concept of ID buckets. So every transaction table on each server was assigned unique bucket from which it could allocate Ids. With this we overcame the problem of how to maintain unique IDs.
- All the masters would be replicated using transaction with update option. So a change on one gets replicated all over.
- Adding new users was not as simple as before. Every database was configured for a specific set of users and this set increased sequentially. This meant that we always had to add the new user in the last server available and if last server was full then we had to deploy a new database even for a single user (and probably even before the last server was at its full capacity). This was accepted as an acceptable fact as this would happen once a while.
- Any admin report had to combine the data from all the servers to be useful which meant an additional job to aggregate all the data. This was accepted as a design reality.
Conclusion
So we could see very clearly that we were achieving the desired performance figures for this approach. Our aim was to find out how best we could achieve load balancing capabilities for the specific database load our application was creating and that was achieved. It actually became a life saver for us in more than one situation.

By Robert Nice on Jun 4, 2010
There’s really only one serious database that, out of the box, solves such problems for you, and that’s Oracle. Still…, too late now.
Scaling a database is one of the hardest tasks a system developer may ever have to face. The central problem of how to synchronize multiple servers together, especially for transaction durability is by no means trivial.
For those not up to speed with database inner workings, the basic problem is what happens if you’re reading data from server 1, while somebody is deleting data from server 2? Server 1 and 2 need to communicate with each other and lock records appropriately to ensure consistency. This communication is, as described in the article, often a bottleneck itself.
The article states that the bandwidth required to cluster the servers will impact the network. Obviously this is a hypothetical scenario, but that’s what secondary network interfaces are for. In almost every case, including this one, the real problem is hard disk I/O. Hard disks just can’t spin fast enough to keep up with main memory and/or network interfaces. That’s where big buckets of cache come in. Where possible, you should have enough to store the database in memory, or at least the indices. It makes a huge difference.
Once that’s all been exhausted however, I agree with the article, you do have no choice but to enter the world of multi server clustering. The most popular and straightforward approach is described here, but it has some major shortfalls for many (most) applications.
In the above scenario the database is basically chopped into pieces, and when you login, it finds which server your data is on and directs you to it. As you can see from the ‘Design Decisions’ for this system, functionality was actually decreased, and effort was increased, just to increase capacity. As somebody much wiser than me once told me, when it gets really difficult, you’re doing it wrong.
Let’s go over the issues and explore the possible solutions:
1. We have to introduce concept of ID buckets.
You really don’t. It’s straightforward to have one table on a server that tracks where everything is, after all that’s how the load balancer works here anyway. When a user is created, simply query the load balancer, or assign round-robin to a server, and record it in the table. Instead of having ids 1-100, 101-200 etc. on subsequent servers, simply add a column to the database. In most systems newer users tend to be more active, so it will also distribute load better.
2. All the masters would be replicated…
There’s no real indication here of what a ‘master’ is so this is a moot point. But it brings up the topic of multi-master replication and why this solution is the way it is. If you copy each server to every other server, you have to ensure there are no clashes. I assume this is why ‘buckets’ of ids have been assigned to each server. This is a straightforward yet clumsy (but quick) approach. If instead you prefix your primary keys to include a numerical column indicating the server, all such issues go away, and my comments to #1 still apply.
3. Adding new users was not as simple as before.
This is also completely mitigated by #1
4. Any admin report had to combine the data…
If you’ve done #2 properly and truly have multi-master replication firing properly, then you can query any server for any data set. The need for reassembly and offline processes diminishes.
The key, like so many things in the world of databases, is in the table design. If you know you’re going to be scaling to multiple servers, and you should always aim to, then you should design your tables to not clash when merged together. These simple steps early on save you countless hours when the marketing department sends you that ‘100 x traffic’ memo. You just order more hardware and plug it in.
Obviously one solution doesn’t fit all scenarios, and the article and this response are making many assumptions and generalizations. The article is a great starting point for minimally modifying a system to greatly increase capacity, and should be understood before going any further.
Reply
Zack Cao Reply:
April 11th, 2011 at 5:50 am
Hi, Robert
Very good points!
I’m curious about the out of box solution from Oracle. Is it Oracle RAC (Real Application Clusters) or other technologies Oracle provides, or combination of different technologies that Oracle provides? Could you elaborate a little more about how Oracle can achieve HA as well as load balancing with optimal performance?
Thanks,
Zack
Reply
By Shariq on Sep 15, 2010
@Robert You are right that their is a way you can do this thing on 1 table.I agree no need to create so many databases even tough i work on sql server and you had the old concept that was available and well known in oracle. I still don’t find logic when people say create headroom by leaving CPU of database server. Servers are designed to consume CPU. This writter probably belongs to the same headroom class of people.
Reply
By madhavi on Mar 16, 2011
I think it will also be very easier to do all this task by using Oracle. what you all say?
Reply