Recently, while working on a SQL Server optimization project, we had the opportunity to look into an interesting problem. We had a huge database (around 800 GB) that was being hammered with approximately 30,000 transactions per second. The load was expected to grow by a factor of 100 in the coming days, and the idea was to devise a solution that could handle it. This was a SQL Server 2005 Enterprise Edition database hosted on an 8-processor fifth-generation server. We wouldn't say the server was on its knees under this load, but wait times were longer than expected, and when traffic suddenly spiked the database was not able to keep up.
Though this is not a very common scenario in modern applications, it is a hallmark of databases serving specific industries such as banking. Applications in these industries typically generate a huge volume of small database transactions. In this article, we present one approach you can take to handle such a scenario.
Introducing Broker Hub
To demonstrate a problem with this kind of database requirement, let's use the example of a Broker Hub – a stock broking hub. Stock broking applications have a very high volume of small database transactions, and there are also spurts in database activity depending on market conditions. For simplicity's sake, let's assume we were at a point where the database design and usage pattern for the Broker Hub database were already in their most optimal state.
First choice – Scale up
To optimize Broker Hub further, we had a number of ideas, and the first choice was obviously to increase hardware capacity. That did help: we could handle upwards of 50,000 transactions per second by moving to a better system with 16 processors and a SAN array of high-speed disks. But above 50,000 in our load environments, the database was still the bottleneck.
The next obvious idea was to try SQL Server 2008, which supports performance optimization features such as advanced compression (reducing overall disk I/O), a virtually unlimited number of objects (2,147,483,647), and a maximum database size of 524,272 TB. Again we could see a difference: without even enabling features like compression, we could achieve around 70,000 transactions per second.
Would it work? Probably not long term.
Enabling compression and other optimization features would probably have produced a higher figure, but there was a limit to this approach. Adding hardware or moving to a newer version of the database (or even to a different database) wouldn't have given us the virtually unlimited (100 times 30,000 transactions per second) capacity we were looking for. Clearly, we needed a way to deploy more than one server to split the load and thus increase the capacity to handle very high transaction volumes. In simpler terms, we needed a scale-out solution for our database instead of a scale-up one. Looking for off-the-shelf solutions in the market didn't help, simply because there aren't many scale-out solutions available to load balance SQL Server. Oracle has launched such a solution, but even that requires syncing between the different servers in the cluster, which takes up a lot of network bandwidth and reduces the overall effectiveness of the solution.
Scale out – Approach
After a lot of brainstorming sessions, it was decided that it was time to create our own scale-out solution. The idea was to create a design that could help us scale out as our user base grew, while also being able to handle sudden increases in transaction volume. But, as always, we didn't have too much money to spend on this. We looked at how various existing products such as SharePoint store their data and came up with a very simple first draft of a scale-out solution, which looked quite similar to the way SharePoint load balances its data stores. We had a cluster of database servers connected to the Broker Hub. Each database was configured to handle a set of users with specific user IDs and thus held data only for those users. Merge replication was used to replicate data from all databases to a central database used for all reporting. Here's how it looked:
The Broker Hub front end had to be modified a bit in terms of process flow. Each user logging into Broker Hub now had to be connected to a specific database based on their user ID. The task of identifying which database held the information for the user trying to log in was handled by a new module added to the application, called the DB Load Balancer. The process flow in this approach looked like this:
- User foo with User ID (n) logs in.
- Application calls Custom DB load balancer to find out that all data specific to user foo is on database server DB 2.
- Application creates a connection to DB 2 on behalf of this user.
- All transactions from user foo are directed to DB 2 thereafter.
- Periodically user data is synchronized to the master database.
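The routing step above can be sketched as a small lookup table keyed by user-ID ranges. This is an illustrative sketch only: the class name, server names, ID ranges, and connection strings are all hypothetical, not taken from the real system.

```python
# Hypothetical sketch of the custom DB Load Balancer module:
# map a user ID to the database server that holds that user's data.
from bisect import bisect_right

class DBLoadBalancer:
    """Resolves a user ID to the shard (database server) owning it."""

    def __init__(self, shards):
        # shards: list of (first_user_id, connection_string) pairs.
        # Each shard owns all IDs from its first ID up to (but not
        # including) the next shard's first ID.
        self.shards = sorted(shards)
        self.starts = [start for start, _ in self.shards]

    def resolve(self, user_id):
        # Find the last shard whose starting ID is <= user_id.
        idx = bisect_right(self.starts, user_id) - 1
        if idx < 0:
            raise ValueError(f"no shard configured for user {user_id}")
        return self.shards[idx][1]

# Illustrative configuration: 100,000 users per database server.
balancer = DBLoadBalancer([
    (1,      "Server=DB1;Database=BrokerHub"),
    (100001, "Server=DB2;Database=BrokerHub"),
    (200001, "Server=DB3;Database=BrokerHub"),
])

print(balancer.resolve(150000))  # a user in the second range lands on DB 2
```

The application would call `resolve()` once at login, cache the returned connection for the session, and direct all of that user's transactions to it thereafter.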
Design Decisions for this approach
This approach was pretty simple to implement and could achieve the results we were looking for. But there were a few practical issues:
- We had to introduce the concept of ID buckets. Every transaction table on each server was assigned a unique bucket from which it could allocate IDs. This solved the problem of maintaining globally unique IDs.
- All the master tables would be replicated using transactional replication with the updatable subscriptions option, so a change on one server gets replicated everywhere.
- Adding new users was not as simple as before. Every database was configured for a specific, sequentially growing set of users. This meant new users always had to be added on the last available server, and if that server was full we had to deploy a new database even for a single user (possibly even before the last server reached full capacity). This was accepted as a reasonable trade-off, since it would only happen once in a while.
- Any admin report had to combine the data from all the servers to be useful which meant an additional job to aggregate all the data. This was accepted as a design reality.
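The ID-bucket scheme in the first bullet above can be illustrated with a short sketch. The bucket size and server numbering here are assumptions for illustration; the key point is that each server only ever hands out IDs from its own pre-assigned range, so IDs stay globally unique without any cross-server coordination.

```python
# Illustrative sketch of per-server ID buckets. Bucket size is an
# assumption; in practice it would be sized to the expected row volume.
BUCKET_SIZE = 1_000_000

class BucketAllocator:
    """Allocates IDs for one transaction table on one server."""

    def __init__(self, server_index):
        # Server 0 owns IDs [1, 1_000_000], server 1 owns
        # [1_000_001, 2_000_000], and so on.
        self.low = server_index * BUCKET_SIZE + 1
        self.high = (server_index + 1) * BUCKET_SIZE
        self.next_id = self.low

    def allocate(self):
        if self.next_id > self.high:
            raise RuntimeError("bucket exhausted; assign a new bucket")
        new_id = self.next_id
        self.next_id += 1
        return new_id

db1 = BucketAllocator(0)
db2 = BucketAllocator(1)
print(db1.allocate())  # 1
print(db2.allocate())  # 1000001 -- never collides with DB 1's range
```

In SQL Server terms, the same effect can be had by seeding each server's `IDENTITY` columns at the start of its bucket range.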
With this approach we could see very clearly that we were achieving the desired performance figures. Our aim was to find the best way to achieve load balancing for the specific database load our application was creating, and that was achieved. It actually became a life saver for us in more than one situation.