We recently undertook an exercise to analyze the database performance, and subsequently optimize the overall performance for large, data-intensive business application. This post outlines the approach we adopted to analyze the problem, and how we went about arriving at solutions.
A Bit of Background
The application database is SQL Server 2005, single instance (non replicated, non clustered). The database has patient data (Electronic Medical Records) and the end clients (clinics and hospitals) access this data via web services. The database had around 3 million records, which they expected to scale to around 5 million. The software product vendor was receiving performance related complaints from end customers, and had solicited our help in resolving these issues.
Analysis
The first step was to establish the nature of the performance problems. For this we worked with the business and development teams of ISV to determine the nature of the performance issues. Users had reported performance problems in the following situations:
- Longer than expected response times for transactions or queries.
- Insufficient transaction throughput to complete the required workload.
- Decrease in transaction throughput.
We needed to
- Determine if the performance degrades with increased load or with increase in data in the system.
- Determine the volume of historical data and if they cause long running queries.
- Examine the hardware configuration (RAID, multiple disk controllers) of disk subsystems.
- Take regular measurements of resource utilization and database activity and determine whether there is a disproportionate utilization of CPU, memory, disks or network. System Monitor (aka Performance Monitor) helps to identify potential hardware bottlenecks w.r.t. a few key counters (see http://www.sql-server-performance.com/articles/per/performance_audit_part2_p1.aspx) and potentially obvious performance problems.
The next step was to establish performance objectives. For this we developed a plan for measuring the system performance. The most common metrics used for measuring database application performance are:
- transactions per second (TPS)
- response time
We then needed to define clear objectives for the above metrics. Selecting a tool (we evaluated some of the many available; refer: http://www.testingfaqs.org/t-load.html) that can automate measuring the TPS and/or response times by generating peak / off–peak / average load conditions.
We would regularly monitor the application performance on these metrics to get a feedback on whether the optimization changes are reaping any improvements.
The next post shall cover a discussion of the various solution approaches that could be adopted to improve the database problems once the problems (or at least some pattern that may be imminent in the problem) have been identified. I shall describe the various areas and perspectives of looking for a solution to the database performance problems.
