The ChallengeAt BluePI we do a lot of performance re-engineering projects. Some of them involve tuning the application code, some the architecture, however frequently it is the database that is the bottleneck. One peculiar situation we encountered was when one our customers bemoaned that the free, MYSQL was giving much better response under load as compared to MS SQL Server. The difference was huge! 90% of samples of the login use case in SQL server was 1000 times slower than MYSQL. It goes without saying that SQL Server was not an option with this sort of performance. So we were tasked to determine the root cause and then remediate it.
- Application code may be poorly written
- Database Indexes may not be properly tuned
- SQL queries may be unoptimised
- Network/Disk or any other physical resources may be constrained
Application code may be poorly writtenAs the application code does not change between the two databases it is unlikely that the application code would be a problem. So the only investigation that we did was around the use of database connection pools. A poorly optimised connection pool could be leaking and slowing down the database. We checked thoroughly the number of connections, the application was using for MS SQL Server through SQL profiler. We checked each and every request but application was correctly using the connection pool and releasing connection after transactions. So application code could be ruled out as a possible cause.
Database Indexes may not be properly tunedInterestingly the index definitions were exactly the same in both MS SQL and MySQL. At first look it appeared that there were nothing wrong with the indexes as MYSQL seemed to be performing really well with the same index definition. Some of the tables were pretty big - millions of rows. So we decided to turn our focus on addressing the size by closely looking at the partitioning scheme. MYSQL provides different types of partitioning - range, hash, key, list, and composite partitioning which can be defined at the time of creating a table. On the other hand SQL Server only offers range partitioning. In SQL server this is only possible after creating the table, creating partitioning objects in SQL Server (partition schemes and functions) that are then applied to the corresponding tables. We decided to park this as mysql did not have any partitioning.
SQL queries are not optimised correctly.The possibility that fetch queries for MS SQL Server were not optimised and was leading to higher query execution time was now the focus of our attention. We calculated the time on all queries which were being executed by application through SQL profiler and we found one query was taking a lot of time and was being called from different places in the application code. Having found this one slow query on MS SQL we questioned how is same query on MYSQL is performing so well. The Table which was being used in this query contained huge data and the data fetch was taking more time in SQL server as compared to MYSQL. To resolve this, we implemented partitioning on SQL server(RDS instance of AWS) to parallely divide data in various tables on the basis of different event id on different months. Voila! It worked well and lowered query times to a large extent. In SQL server 2008 R2, we simply put the check constraints on date so that there would be different tables for each month. On the top of that we put one view which do the union all on all the various table to get the complete data. In SQL server views are logical views of table that are be faster in comparison to physical read. In SQL server 2012 enterprise edition we have in built function which do parallel partition on table on the basis of different parameter.
Components and ConceptsThe following terms are applicable to table and index partitioning.
Partition functionA database object that defines how the rows of a table or index are mapped to a set of partitions based on the values of certain column, called a partitioning column. That is, the partition function defines the number of partitions that the table will have and how the boundaries of the partitions are defined. For example, given a table that contains sales order data, you may want to partition the table into twelve (monthly) partitions based on a datetime column such as a sales date.
Partition schemeA database object that maps the partitions of a partition function to a set of filegroups. The primary reason for placing your partitions on separate filegroups is to make sure that you can independently perform backup operations on partitions. This is because you can perform backups on individual filegroups.
Partitioning columnThe column of a table or index that a partition function uses to partition the table or index. Computed columns that participate in a partition function must be explicitly marked PERSISTED. All data types that are valid for use as index columns can be used as a partitioning column, except time stamp. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified. Also, Microsoft .NET Framework common language runtime (CLR) user-defined type and alias data type columns cannot be specified.
Aligned indexAn index that is built on the same partition scheme as its corresponding table. When a table and its indexes are in alignment, SQL Server can switch partitions quickly and efficiently while maintaining the partition structure of both the table and its indexes. An index does not have to participate in the same named partition function to be aligned with its base table. However, the partition function of the index and the base table must be essentially the same, in that 1) the arguments of the partition functions have the same data type, 2) they define the same number of partitions, and 3) they define the same boundary values for partitions.
Nonaligned indexAn index partitioned independently from its corresponding table. That is, the index has a different partition scheme or is placed on a separate filegroup from the base table. Designing an nonaligned partitioned index can be useful in the following cases:
- The base table has not been partitioned.
- The index key is unique and it does not contain the partitioning column of the table.
- You want the base table to participate in collocated joins with more tables using different join columns.
Partition eliminationThe process by which the query optimizer accesses only the relevant partitions to satisfy the filter criteria of the query. So with approach we lower done the lo gin number from lac to thousand at 90% percentile but this is still very high in comparison to MYSQL. [table id=1 alternating_row_colors=true /] [table id=3 alternating_row_colors=true /]
ConclusionFinally we resolved this issue and performance of MS SQL server improved a lot and was even better then MYSQL. The issue was the table which we partitioned out on first attempt contained an index on one specific column which is same is MYSQL too. The catch was indexing works very differently if we compare MYSQL vs SQL server. In MYSQL once you create normal index it does various kind of optimisation by itself, which DBA will not know unless s/he digs deeper. On the other hand, in case of MS SQL server if you are creating simple index (non-clustered index) then the possibility is, it might not be optimised. Once we have moved particular column from normal index to clustered index things work like a charm and overall performance of an application turned out to be even better than MYSQL. [table id=4 alternating_row_colors=true /]
Click Here to know the use of SQL in Janusgraph with Cassandra Category : Database