Suddenly working system performance nose dived, now what?

We had a ETL process, which used to process a small table of thirty million records a time. These records were moved in a staging table for the extensive processing. Once the processing completes, we truncate the staging table and get the data for the next day.

The system ran fine for couple of days, then suddenly the ETL processed slowed and the ETL complete time increased by many folds, from 15 minutes it reached to four hours or so.

Our first attempt to find the cause of slow query performance was to check the Query plan. There I noticed that we are using an Index Lookup.

   SELECT @DLStartTime = MIN(DownloadTime)      FROM etl.VLSCRawDataFromStager(nolock)      WHERE (DownloadURI = @DownloadURI AND ASID = @ASID) OR TransactionID = @TransactionId</p>  <p>

We had couple of non-clustered indexes on DownloadURI, ASID and on TransactionId. And a clustered Index on RowId. Looking at the query, you may think the query must be using these simple non-crusted indexes to get the DownloadTime. However, the Query plan looks like as follows:

execution

 

Query is using the key lookup operator. The use of a Key Lookup operator in a query plan indicates that the query might benefit from performance tuning. For example, query performance might be improved by adding a covering index. read more on http://msdn.microsoft.com/en-us/library/bb326635.aspx

Detail of two non-clustered indexes seeks, output shows as  ‘RowId’ and then this ‘RowId’ is used in turn to do a Key lookup on clustered Index for the output ‘DownloadTime’. Key lookup fetches the extra columns from the clustered index when the non clustered index that’s used to retrieve the rows doesn’t have all the columns required.

Bookmark lookup is not a cheap operation. The  plan can be improved by  adding one or more columns to an existing index so as to eliminate a bookmark lookup, in SQL Server 2005 onwards, you can add columns using the include clause of the create index statement.  Included columns are more efficient than key columns; they save disk space and make searching and updating the index more efficient.  SQL decide on the query plan depending on the number of rows returned. Read more on

http://sqlserverpedia.com/blog/sql-server-bloggers/key-lookup-threshold/comment-page-1/
http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx

We modified one of the index and included other columns.

 CREATE NONCLUSTERED INDEX [VLSCRawDataFromStager_ndx2] ON [etl].[VLSCRawDataFromStager]    (  [DownloadTime] ASC,  [DownloadURI] ASC,  [ASID] ASC,  [TransactionID] ASC   )     <br />GO</p>  <p>

This change removed the Key lookup from the query plan, and suddenly our performance increased by many fold.

It worked for few hours, but before our celebration beer can become warm, the performance again tanked. Now what?

What the hell happened?  For every run we are truncating the tables, and the system ran fine for weeks, now it is tanking, specially when we removed the nasty key lookup. We thought we improved the system but system performance is not improved.

Now, I used sys.dm_db_index_physical_stats check the indexes, this returns fragmentation information for the data and indexes of the specified table or view.

 select * from sys.dm_db_index_physical_stats (DB_ID(N'DataBaseName'), OBJECT_ID(N'TableName'), NULL, NULL , NULL); 

and of course we see the indexes are fragmented more then 95%. I  realized the truncate table statement does not get rid of Indexes, TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. read more http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx

Overlooking this fact threw us in wrong direction, else this is the first thing we should be checking, but we thought truncate table is cleaning the indexes too, which turn out not be the case. However, the simple fix was to get rid of indexes too, and recreate them after every truncate. This fix bought our ETL performance back to acceptable level.

 

P.S

However the result set from sys.dm_db_index_physical_stats has index_id and object_id as the column name. You may waste few minutes to write query to get the index name from these ids.  So here is the query:

 select i.*, stat.* from sys.dm_db_index_physical_stats (DB_ID(N'DataBaseName'), OBJECT_ID(N'TableName'), NULL, NULL , NULL) stat join sys.indexes i on i.object_id = stat.object_id and i.index_id = stat.index_id 

OBJECT_ID is evaluated in the current database context, not in the context of the database which you specified in for DB_ID. If the current database does not have the table, then Object_ID will return NULL and your result will be wrong

To get all the indexes in a database it is simply:

 select i.*, stat.* from sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), NULL, NULL, NULL , NULL) stat join sys.indexes i on i.object_id = stat.object_id and i.index_id = stat.index_id where stat.avg_fragmentation_in_percent &gt; 10   <p></p> 

Read more at http://msdn.microsoft.com/en-us/library/ms188917.aspx

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s