Saturday, 4 June 2011

SQL SERVER PERFORMANCE AND WAIT QUEUE

Waits and Queues: A Performance Methodology

As an application, SQL Server may request system resources as it executes a user query and waits for its request to be completed.  Waits are represented by SQL Server wait statistics. SQL Server 2005 tracks wait information any time that a user connection or session_id is waiting. This wait information is summarized and categorized across all connections so that a performance profile can be obtained for a given work load. Therefore, SQL Server wait types identify and categorize user (or thread) waits from an application workload or user perspective.

Queues measure system resources and utilization.  The queues part of performance is represented by Performance Monitor objects and counters and other sources of information. Performance Monitor counters measure various aspects of performance such as transfer rates for disks or the processor time consumed.  SQL Server object counters are exposed to Performance Monitor using the dynamic management view (DMV) sys.dm_os_performance_counters. Thus, Performance Monitor counters show performance from a resource point of view. 

Associations or correlations of wait types to performance counters, and interesting performance counter ratios and comparisons round out the picture.  The association of waits and queues allows one to eliminate irrelevant counters insofar as the performance bottleneck is concerned and focus effectively on the problem area.  Comparisons of one counter to another provide perspective in order to draw the right conclusion.  For example, say you encounter 1000 lock waits during a production workload.  Does this cause a serious performance problem? To determine what role these play in overall performance you need to also consider the total number of lock requests (how many locks are requested overall vs. how many result in lock waits), the wait time or duration of the lock waits, and the time span of the test.  If the total number of lock requests was 10 million, perhaps 1000 waits is insignificant.  Further, if the associated lock wait time is 50 seconds overall, and the workload is over 8 hours, this too is insignificant.  On the other hand, if you average a 50 second wait for EACH of the 1000 lock waits, this IS significant.  In sum, associations or correlations allow us to determine relevancy to overall performance.

Application performance can be easily explained by looking at SQL Server waits and System or Resource queues. In SQL Server 2005, the dynamic management view (DMV) sys.dm_os_wait_stats provides a valuable source of wait information from an application perspective. The system or resource perspective uses Performance Monitor counters and other data sources to provide a breakdown of system resource usage according to resource queues. Taken together, the value of the application and resource perspectives used by the waits and queues methodology enables us to narrow in on the bottleneck and eliminate the irrelevant performance data.
Execution Model (simplified)
The best analogy to depict the execution model for SQL Server is the grocery store checkout line. The cashier is the CPU. The customer who is currently being checked out by the cashier is the running session. The customers who are waiting in line represent the runnable queue. If customer1 who is being checked out requires a price check on a product, customer1 must wait until the price check is completed.  Meanwhile, the next in line, customer2, is immediately checked out by the cashier until the price check is completed for customer1.  When the price check is completed, the cashier can resume the check out of customer1.  This is the simplest illustration of the SQL Server execution model called SQLOS.
The SQL Server SQLOS uses schedulers to manage the execution of user requests.  SQLOS Schedulers map to CPUs.  Assuming a 4-CPU Server, there would be 4 SQLOS schedulers by default.  The following diagrams depict a simplified version of execution model using a single SQLOS scheduler.  The execution model in Figure 1 depicts how SQL Server user requests or sessions (denoted by SPIDs) are scheduled for execution.   

Figure 1: Execution Model – Running, runnable and suspended status, Runnable Queue and Wait List

Figure 2 shows how SQL Server sessions rotate between the following statuses: Running (only one session can be running or executing, per scheduler), Runnable (sessions waiting for CPU), or Suspended.  SPIDs with suspended statuses are placed in Waiter List until the requested resources are available. If a running session needs a data page that is not in cache, or needs a page that is blocked by another user’s lock, the session is moved to the wait list. The next SPID or session_id in the runnable queue is scheduled to start running. 
Figure 2: Execution Model – How status changes affect SPIDs

The status change sequence of events is as follows:
  1. SPID60 needs a page not in cache.  Thus its status changes from Running to Suspended with wait type IO_Completion
  2. SPID60 moved to Waiter List
  3. SPID51 moves from Runnable queue with a runnable status to Running status, SPID64 then moves to the top of the Runnable queue
  4. SPID56 is waiting for a parallel process to complete.  When the parallel process is completed, the status for SPID56 changes from Suspended with wait type CXPACKET to Runnable
  5. SPID56 moved to the bottom of the Runnable queue

Figure 3 depicts execution “after” session_ids (or SPIDs) have rotated clockwise due to status changes. 

Figure 3: Execution Model – After status change
The wait list means a thread has to wait for a resource. Example of resource waits includes IOs to complete, a lock to be released, a memory grant, and so on. When the session is moved to the wait list, a wait type is assigned and time is accumulated. When the resource becomes available, the thread is moved to the runnable queue and it executes as soon as the CPU is available.  The clockwise rotation between running, runnable and suspended states continues until the user request is completed.

Waiter List and Wait Types
When a SQL Server 2005 session_id goes into a suspended status, a wait state is assigned indicating the reason why the session_id is suspended.  The waiter list, shown in a DMV called sys.dm_os_waiting_tasks, contains currently suspended sessions and reasons for the suspension including the session_id, wait_type and the session’s accumulated wait time for this wait type in the column wait_duration_ms. If the wait is due to blocking where a lock cannot be obtained until another session releases their lock, the session holding as the lock, also known as the blocker and  blocked resource are shown in the columns blocking_session_id and resource.
The current wait list can be seen in sys.dm_os_waiting_tasks. The current runnable queue is found in sys.dm_exec_requests where the status is “runnable”. The total time that is spent waiting in sys.dm_os_waiting_tasks is found in the column wait_time_ms and the time that is spent waiting for CPU in the runnable queue is called signal_wait _time_ms. Resource waits can be computed by subtracting signal_wait_time_ms from wait_time_ms. A runnable queue is unavoidable with an OLTP workload because there are large volumes of identical transactions. The key question is not the length of the runnable queue but rather how much time is spent waiting for CPU compared to the resource waits of the waiter list. The difference between resource and signal waits shows the extent of CPU pressure, if any, on overall performance. A low signal (where signal is less than 25% of the total waits) to resource wait ratio indicates there is little CPU pressure.
Dynamic Management Views (DMVs) and Functions (DMFs)
Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs) expose changing server state information that typically spans many sessions, many transactions, and many requests. Dynamic management views and functions reflect what’s going on inside the server process itself or across all sessions in the server. They are useful for diagnostics, memory and process tuning, and monitoring potentially across all sessions in the server.
Useful DMVs for performance tuning purposes include sys.dm_exec_requests, sys.dm_os_waiting_tasks, and sys.dm_os_wait_stats.
Sys.dm_exec_requests
Each SQL Server session has a unique session_id in the system DMV sys.dm_exec_requests. The stored procedure sp_who2 provides a list of these sessions in addition to other connection information such as command, resource, wait types, wait time, and status.  User queries will have a session_id > 50.  Common status values are ‘running’, ‘runnable’ and ‘suspended’, as described in the Execution Model discussion.  A session status of ‘Sleeping’ indicates SQL Server is waiting for the next SQL Server command.
Sys.dm_os_waiting_tasks
The waiter list that shows all waiting sessions and the reasons for the waits can be found in the DMV sys.dm_os_waiting_tasks. The session_id, wait type, and associated wait time can be seen.  In addition, if the session is blocked from acquiring a lock, the session holding (known as blocking) the lock as well as the blocked resource is shown in the columns blocking_session_id and resource.

Sys.dm_os_wait_stats
Sys.dm_os_wait_stats is the DMV that contains wait statistics, which are aggregated across all session ids since the last restart of SQL Server or since the last time that the wait statistics were reset manually using DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR).  Resetting wait statistics can be helpful before running a test or workload.

Anytime a session_id waits for a resource, the session_id is moved to the waiter list along with an associated wait type.  The DMV sys.dm_os_waiting_tasks shows the waiter list at a given moment in time.  Waits for all session_ids are aggregated in sys.dm_os_wait_stats. 

The stored procedures track_waitstats_2005 and get_waitstats_2005 can be used to measure the wait statistics for a given workload.

Track_waitstats_2005 stored procedure

Track_waitstats_2005 is a stored procedure that captures wait statistics from the DMV sys.dm_os_wait_stats and provides a ranking of descending order based on percentage.  You can use this ranking to identify the greatest opportunities for performance improvements.
The script location is: http://www.microsoft.com/technet/scriptcenter/scripts/sql/sql2005/waitstats/sql05vb049.mspx

Get_waitstats_2005 stored procedure
The stored procedure get_waitstats_2005 reports the wait types that are collected by track_waitstats_2005. The get_waitstats_2005 procedure can be run during the execution of track_waitstats or after track_waitstats is completed.   Running get_waitstats_2005 during the execution of track_waitstats_2005 will return a report of intermediate results while running get_waitstats_2005 at the conclusion of track_waitstats_2005 will return the final wait statistics report.   The report provides a detailed picture of different wait types during the time measured, and the accumulated wait time for each. 
Get_waitstats_2005 reports information about waits. Total wait time is composed of resource waits and signal waits. Resource waits are computed by subtracting signal waits from total waits.  Because signal waits represent the amount of time spent waiting in the runnable queue for CPU resources, they are a measure of CPU pressure. The application blueprints identify the significance CPU pressure by comparing signal waits with total waits.
The script location is: http://www.microsoft.com/technet/scriptcenter/scripts/sql/sql2005/perf/sql05vb021.mspx

Performance and Tuning Blue Prints
In this white paper, we examine different types of applications, how they use resources, and how you can performance tune the applications for different workloads. An OLTP workload differs significantly from a relational data warehouse or reporting application workload and it is useful to understand these differences and how it affects the objectives for high performance.
Although resource bottlenecks differ from application to application, the performance and tuning methodology called Waits and Queues is highly accurate and the results are reproducible. Once you resolve one bottleneck, there will be another as you scale the application larger.
An OLTP workload is generally characterized by high numbers of small identical transactions. In contrast, a data warehouse or reporting application is characterized by a few large transactions, each very different. These distinctions result in very different objectives and resource usage profiles. The blueprints reflect these distinctions.
OLTP blueprint
An OLTP application is characterized by a high volume of small identical transactions, which frequently include SELECT, INSERT, UPDATE, and DELETE operations.
Unlike large data warehouse or reporting transactions where multiple CPUs work in parallel, dividing up the query into smaller pieces, the small transactions of OLTP do not require parallelism. Parallelism is multiple CPUs working in parallel, dividing up the query into smaller pieces.
While a query is divided across multiple CPUs it will run faster, but it does so by sacrificing CPU resources, as it requires merges and sorts of the smaller pieces before presenting the final result set. An OLTP transaction is small to begin with so there is no need for parallel operations that basically sacrifice CPU and memory resources for speed of execution. Plus, with the high transaction volumes of OLTP, it is important not to waste CPU resources. Parallelism is most appropriate for the big, low volume transactions of data warehouse or reporting applications.
The implications are significant for database design, resource usage and system performance.
OLTP Performance blue print objectives: There are likely to be performance and scalability problems if any of resource issues the following tables are true.
Note   The values in Value column are good starting point. The actual values will vary.
Common scenarios to avoid in OLTP
Database Design

Rule
Description
Value
Source
Problem Description
1
High Frequency queries having a high number of table joins.
>4
Sys.dm_exec_sql_text
Sys.dm_exec_cached_plans
High frequency queries with lots of joins can be too normalized for high OLTP scalability.
2
Frequently updated tables having # indexes.
>3
Sys.indexes
sys.dm_db_operational_index_stats
Excessive index maintenance for OLTP.
3
Big IOs
Table Scans
Range Scans
>1
Perfmon object
SQL Server Access Methods
Sys.dm_exec_query_stats
A missing index flushes the cache.
4
Unused Indexes.
Index not in Sys.dm_db_index_usage_stats.  If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats
Avoid Index maintenance for unused indexes.


CPU

Rule
Description
Value
Source
Problem Description
1
Signal Waits
>25%
Sys.dm_os_wait_stats
Time in runnable queue is pure CPU wait.
2
Plan reuse
<90%
Perfmon object
SQL Server Statistics
OLTP identical transactions should ideally have >95% plan reuse.
3
Parallelism: Cxpacket waits
>5%
Sys.dm_os_wait_stats
Parallelism reduces OLTP throughput. CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well tuned OLTP application would not parallelize unless an index is missing, there is an incomplete WHERE clause, or the query is not a true OLTP transaction.


Memory

Rule
Description
Value
Source
Problem Description
1
Page life expectancy
<300 sec
Perfmon object
SQL Server Buffer Manager
SQL Server Buffer Nodes
Page life expectancy is the average number of seconds a data page stays in cache.  Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads, thus possible missing index.
2
Page life expectancy
Drops by 50%
Perfmon object
SQL Server Buffer Manager
Page life expectancy is the average number of seconds a data page stays in cache.  Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads, thus possible missing index.
3
Memory Grants Pending
>1
Perfmon object
SQL Server Memory Manager
Current number of processes waiting for a workspace memory grant.
4
SQL cache hit ratio
<90%
SQL cache hit ratio falls under 90% for sustained periods of time greater than 60 sec.
It is likely that large scans have to be performed, which in turn flushes out the buffer cache.


IO

Rule
Description
Value
Source
Problem Description
1
Average Disk sec/read
>20 ms
Perfmon object
Physical Disk
Reads should take 4-8 ms without any IO pressure.
2
Average Disk sec/write
>20 ms
Perfmon object
Physical Disk
Writes (sequential) can be as fast as 1 ms for transaction log.
3
Big IOs
Table Scans
Range Scans
>1
Perfmon object
SQL Server Access Methods
A missing index flushes the cache.
4
If Top 2 values for wait stats are any of the following:
ASYNCH_IO_COMPLETION
IO_COMPLETION
LOGMGR
WRITELOG
PAGEIOLATCH_x
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include IO, there is an IO bottleneck.
5
Low bytes per sec.

Perfmon object
Physical Disk



Blocking

Rule
Description
Value
Source
Problem Description
1
Block percentage
>2%
Sys.dm_db_index_operational_stats
Frequency of blocks.
2
Block process report
30 sec
Sp_configure profiler
Report of statements.
3
Average Row Lock Waits
>100ms
Sys.dm_db_index_operational_stats
Duration of blocks.
4
If Top 2 values for wait stats are any of the following:
LCK_M_BU
LCK_M_IS
LCK_M_IU
LCK_M_IX
LCK_M_RIn_NL
LCK_M_RIn_S
LCK_M_RIn_U
LCK_M_RIn_X
LCK_M_RS_S
LCK_M_RS_U
LCK_M_RX_S
LCK_M_RX_U
LCK_M_RX_X
LCK_M_S
LCK_M_SCH_M
LCK_M_SCH_S
LCK_M_SIU
LCK_M_SIX
LCK_M_U
LCK_M_UIX
LCK_M_X
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include locking, there is a blocking bottleneck.
5
High number of deadlocks
>5 per hour
Trace flag 1204 to display in the errorlog and or the profiler deadlock graph.
If the deadlock occurs with the same participant SQL commands or operations multiple times, it is likely that there is a locking problem.


Network

Rule
Description
Value
Source
Problem Description
1
High network latency coupled with an application that has many round trips to the database.
Output queue length >2
Perfmon object: Network Interface
Indicates that the latency between the application server and the database is high.
Could be caused by significant network infrastructure between the application and the instance of SQL Server.
2
Network bandwidth is used up.
Packets Outbound Discarded
Packets Outbound Errors
Packets Received Discarded
Packets Received Errors
Perfmon object: Network Interface
Dropped packets are detected.


In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows: 
1.      Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
2.      CPU can be reduced with plan reuse and join reduction.
3.      IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
4.      Memory is optimal when there are no sudden drops in Page Life Expectancy.
5.      Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
6.      Blocking can be reduced with index design and short transactions.
DataWarehouse blueprint
Compared to OLTP, data warehouse applications are characterized by low volumes of large transactions. This workload profile is exactly the opposite of OLTP. Data warehouse and reporting applications typically include big SELECT or read operations. The implications are significant for database design, resource usage, and system performance.  Index fragmentation, cache turnover, and IO performance are often important factors for these workloads.
Data warehouse performance blue print objectives: There are performance problems if any of the resource issues in the following tables are true.
Note   While the actual value in the value column could be debated, it is a good starting point when identifying common performance problems with DataWarehousing and Reporting applications.
Common Scenarios to avoid with DataWarehousing
Database Design

Rule
Description
Value
Source
Problem Description
1
Excessive sorting and RID lookup operations should be reduced with covered indexes.

Sys.dm_exec_sql_text
Sys.dm_exec_cached_plans
Large data warehouse can benefit from more indexes. Indexes can be used to cover queries and avoid sorting. The cost of index overhead is only paid when data is loaded.
2
Excessive fragmentation:
Average fragmentation_in_percent should be <25%
>25%
sys.dm_db _index_physical_stats
Reducing index fragmentation through index rebuilds can benefit big range scans, common in data warehouse and Reporting scenarios. 
3
Scans and ranges are common. Look for missing indexes
>= 1
Perfmon object
SQL Server Access Methods
Sys.dm_db_missing_index_group_stats
Sys.dm_db_missing_index_groups
Sys.dm_db_missing_index_details
A missing index flushes the cache.
4
Unused Indexes should be avoided
If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats
Index maintenance for unused indexes should be avoided.


Resource issue: CPU

Rule
Description
Value
Source
Problem Description
1
Signal Waits
> 25%
Sys.dm_os_wait_stats
Time in runnable queue is pure CPU wait.
2
Avoid plan reuse
> 25%
Perfmon object
SQL Server Statistics
Data warehouse has fewer transactions than OLTP, each with significantly bigger IO. Therefore, having the correct plan is more important than reusing a plan. Unlike OLTP, data warehouse queries are not identical.
3
Parallelism: Cxpacket waits
<10%
Sys.dm_os_wait_stats
Parallelism is desirable in data warehouse or reporting workloads.


Resource issue: Memory

Rule
Description
Value
Source
Problem Description
1
Memory grants pending
>1
Perfmon object
SQL Server Memory Manager
Memory grant not available for query to run.  Check for
Sufficient memory and page life expectancy.
2
Page life expectancy
Drops by 50%
Perfmon object
SQL Server Buffer Manager
Page life expectancy is the average number of seconds a data page stays in cache.  Low values could indicate a cache flush that is caused by a big read. 
Look for possible missing index.


Resource issue: IO

Rule
Description
Value
Source
Problem Description
1
Average Disk sec/read
>20 ms
Perfmon object
Physical Disk
Reads should take 4-8ms without any IO pressure.
2
Average Disk sec/write
>20 ms
Perfmon object
Physical Disk
Writes (sequential) can be as fast as 1 ms for transaction log.
3
Big scans
>1
Perfmon object
SQL Server Access Methods
A missing index flushes the cache.
4
If Top 2 values for wait stats are any of the following:
ASYNCH_IO_COMPLETION
IO_COMPLETION
LOGMGR
WRITELOG
PAGEIOLATCH_x
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include IO, there is an IO bottleneck


Resource issue: Blocking

Rule
Description
Value
Source
Problem Description
1
Block percentage
>2%
Sys.dm_db_index_operational_stats
Frequency of blocks.
2
Block process report
30 sec
Sp_configure, profiler
Report of statements.
3
Average Row Lock Waits
>100ms
Sys.dm_db_index_operational_stats
Duration of blocks.
4
If Top 2 values for wait stats are any of the following:
LCK_M_BU
LCK_M_IS
LCK_M_IU
LCK_M_IX
LCK_M_RIn_NL
LCK_M_RIn_S
LCK_M_RIn_U
LCK_M_RIn_X
LCK_M_RS_S
LCK_M_RS_U
LCK_M_RX_S
LCK_M_RX_U
LCK_M_RX_X
LCK_M_S
LCK_M_SCH_M
LCK_M_SCH_S
LCK_M_SIU
LCK_M_SIX
LCK_M_U
LCK_M_UIX
LCK_M_X
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include IO, there is a blocking bottleneck.
Consider using row versioning to minimize shared locking blocks.


Exactly the opposite of OLTP applications, reporting or relational data warehouse applications are characterized by small numbers of (different) big transactions. These are frequently SELECT intensive operations. The implications are significant for database design, resource usage, and system performance.
Reporting and data warehouse performance objectives are as follows:
1.      Data warehouse and relational data warehouse designs can have more indexes as the cost of index maintenance is paid only one time, during the batch update process.
2.      Plan reuse should generally be avoided. Plan reuse may result in picking up a plan that was good for some other query (with different data distribution), but may not be good for this query.  The time taken for plan generation of a large DataWarehouse query is not nearly as important as having the right plan. 
3.      Sorts can and should be minimized with correct index usage.
4.      Missing index situations should be investigated and corrected.
5.      Large IOs such as range scans benefits from on disk contiguity. Index fragmentation should be frequently monitored and kept to a minimum with index rebuilds.
6.      Blocking is generally uncommon as most data warehouse transactions are read operations.
7.      Parallelism is generally desirable for data warehouse applications.

No comments:

Post a Comment