SOS_SCHEDULER_YIELD is a fairly common SQL Server wait type, and it can indicate one of two things:

  • The SQL Server CPU scheduler is utilized properly and is working efficiently

  • There is pressure on the CPU

The first thing to understand is that SOS_SCHEDULER_YIELD, despite its name, is not really a wait type at all, at least not in the same sense as other wait types. As a result, it is often misinterpreted despite being so prevalent in SQL Server. The starting point for explaining what SOS_SCHEDULER_YIELD is will be the definition from Microsoft’s Books Online:

“Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.”

To properly understand what this means, and when and why the SOS_SCHEDULER_YIELD wait type occurs, a better knowledge of the SQL Server scheduler and how it functions is needed. So let’s take a detailed look at the SQL OS scheduler design and how it works.

SQL OS is in charge of scheduling SQL Server threads itself, using cooperative scheduling rather than leaving that decision to the Windows OS scheduler, and it assigns a scheduler to every CPU core in order to manage threads. All requests made by users are therefore scheduled for execution via SQL OS, which creates one scheduler for each logical CPU core. If SQL Server is running on a machine with two octa-core processors (and hyper-threading is not in use), SQL OS will create 16 schedulers for that instance to process user requests.

The SQL OS scheduler (or SOS Scheduler) consists of three components:

Processor – the physical or logical CPU core, which processes threads one at a time

Waiter list – the waiter list holds the threads that are in the suspended state because they have to wait for a resource to become available. The waiter list does not impose any time limit on the threads it contains, and there is no parameter that can be set to limit the time a thread can spend in it. However, a timeout specified for the query execution session takes precedence, so a thread can still be canceled as a consequence of the execution timeout

Runnable queue – the runnable queue keeps threads in strict First-In-First-Out (FIFO) order. A thread that transitions from the waiter list or the processor into the runnable queue is placed at the last position in the queue. The only situation in which the FIFO order can be overridden is when Resource Governor is enabled. In that case, workload groups can be assigned different importance levels (High, Medium and Low), and a thread with higher importance can move ahead of a lower-importance thread in the runnable queue. Note that Resource Governor can be considered a special case, as it is not often seen in practice

A thread in the scheduler can be in one of three states:

RUNNING – the thread is running on a processor core. Only one thread at a time can be running on each core

SUSPENDED – a thread that requests a resource that is not available is moved to the waiter list. It stays in the waiter list until the resource becomes available

RUNNABLE – a thread that has all its resources available but has been moved to the runnable queue, where it waits until the CPU becomes available
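
The three components and three states described above can be sketched as a small toy model. This is only an illustration under simplifying assumptions, not how SQL OS is implemented: the `Scheduler` class, its method names and the SPID numbers are all hypothetical, and the waiter list is modeled as a plain unordered set.

```python
from collections import deque


class Scheduler:
    """Toy model of one scheduler: a processor, a waiter list and a runnable queue."""

    def __init__(self, running=None, waiting=(), runnable=()):
        self.running = running                  # RUNNING: at most one thread per core
        self.waiter_list = set(waiting)         # SUSPENDED: unordered, no time limit
        self.runnable_queue = deque(runnable)   # RUNNABLE: strict FIFO

    def state_of(self, spid):
        """Return the state name for a SPID, or None if the scheduler does not know it."""
        if spid == self.running:
            return "RUNNING"
        if spid in self.waiter_list:
            return "SUSPENDED"
        if spid in self.runnable_queue:
            return "RUNNABLE"
        return None

    def suspend_running(self):
        """The running thread needs an unavailable resource: RUNNING -> SUSPENDED."""
        spid, self.running = self.running, None
        if spid is not None:
            self.waiter_list.add(spid)
        return spid

    def resource_available(self, spid):
        """The awaited resource is free: SUSPENDED -> RUNNABLE, at the tail of the queue."""
        self.waiter_list.remove(spid)
        self.runnable_queue.append(spid)

    def dispatch(self):
        """If the processor is idle, the head of the runnable queue becomes RUNNING."""
        if self.running is None and self.runnable_queue:
            self.running = self.runnable_queue.popleft()
        return self.running


# Illustrative SPIDs and an assumed starting layout.
scheduler = Scheduler(running=53, waiting=[72], runnable=[51])
scheduler.suspend_running()        # SPID 53 starts waiting for a resource
scheduler.resource_available(72)   # SPID 72 joins the runnable queue behind SPID 51
scheduler.dispatch()               # SPID 51 is at the head, so it gets the processor
```

A `deque` is used for the runnable queue because appending at the tail and popping from the head matches the FIFO behavior described; Resource Governor importance is deliberately left out of the sketch.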

Two different scenarios describe how the scheduler works, and both are explained here:

  1. A thread passes through all three states before completing the user request

    A running thread that tries to access a required resource that is not immediately available becomes suspended and is moved to the waiter list. It stays in this suspended state until the requested resource becomes available. Note that there is no time limit for a thread in the waiter list, nor a limit on the number of threads in it. The moment the resource becomes available, the thread is moved to the runnable queue, where it waits for the processor to become available. When the processor becomes available, the thread runs and completes the request, after which it goes idle again (unless it has to wait for a resource again, in which case the sequence is repeated).

    What can be seen in the image is:

    1. SPID 53 requires a resource that is not immediately available, so it is moved to the waiter list with the appropriate wait type

    2. The resource that SPID 72 was waiting for is now available, so it is moved from the waiter list to the last position in the runnable queue

    3. SPID 51 enters the running state, as it is its turn and the CPU is available

  2. A thread passes through two states before completing the user request

    Each thread is assigned a fixed quantum (the time a thread is given to use the CPU free of interruptions) of 4 milliseconds by SQL OS, and it cannot be changed. Each thread is responsible for checking (via a SQL OS helper routine) whether its quantum is exhausted and, if so, yields voluntarily so that the next thread can use its own CPU time (quantum). In this case, the yielding thread is not moved to the waiter list, because its resources are actually available and there is nothing left to wait for, so it is moved directly to the bottom of the runnable queue of its scheduler.

    This second scenario is exactly where SQL OS registers the thread’s movement from the running state into the runnable queue (runnable state) as the SOS_SCHEDULER_YIELD wait type. Since there is nothing to wait for (the thread is not waiting for any resource; it is just voluntarily yielding to the runnable queue), SOS_SCHEDULER_YIELD has 0 milliseconds of resource wait time. What it does accumulate is signal wait time, which shows how long the thread waited in the runnable queue before getting back to the running state.

    In the example above:

    1. SPID 53 has exhausted its quantum (4 milliseconds) and is in transition to the runnable queue

    2. SPID 51 is moving to the running state, as the processor became available after SPID 53 voluntarily moved to the runnable queue
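
The second scenario can be illustrated with a small round-robin simulation. It is a rough sketch under stated assumptions: the threads are purely CPU-bound (they never wait for a resource), there is a single scheduler, and the function name `simulate_yields` and the CPU demands passed to it are made up for the example. Only the 4-millisecond quantum comes from the text above.

```python
from collections import deque

QUANTUM_MS = 4  # quantum length stated in the article


def simulate_yields(cpu_needed_ms):
    """Round-robin CPU-bound threads on one scheduler.

    cpu_needed_ms maps a SPID to the CPU time (in ms) its request needs.
    Returns, per SPID, the number of voluntary yields and the wait times
    those yields would accumulate.
    """
    runnable = deque(cpu_needed_ms)      # FIFO runnable queue, in insertion order
    remaining = dict(cpu_needed_ms)
    stats = {
        spid: {"yields": 0, "resource_wait_ms": 0, "signal_wait_ms": 0}
        for spid in cpu_needed_ms
    }
    yielded_at = {}                      # clock value at which each thread last yielded
    clock = 0

    while runnable:
        spid = runnable.popleft()        # head of the queue gets the processor
        if spid in yielded_at:
            # Time spent in the runnable queue since the yield is signal wait.
            stats[spid]["signal_wait_ms"] += clock - yielded_at.pop(spid)

        slice_ms = min(QUANTUM_MS, remaining[spid])
        clock += slice_ms
        remaining[spid] -= slice_ms

        if remaining[spid] > 0:
            # Quantum exhausted with work left: yield straight to the tail of
            # the runnable queue. No resource is awaited, so resource wait stays 0.
            stats[spid]["yields"] += 1
            yielded_at[spid] = clock
            runnable.append(spid)

    return stats


stats = simulate_yields({53: 12, 51: 8})
alone = simulate_yields({53: 12})
```

In this model a thread that has the scheduler to itself still yields after every quantum but gets the processor back immediately, so its signal wait stays at zero; signal wait only grows when other threads are queued ahead of it.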

High SOS_SCHEDULER_YIELD causes

As can be seen from the previous example, SOS_SCHEDULER_YIELD is a common wait type, and what it actually indicates is that a thread has exhausted its quantum. If SOS_SCHEDULER_YIELD is the prevalent wait type, it might indicate CPU pressure, but this doesn’t necessarily mean that the CPU is not powerful enough to process the user requests.

Less experienced administrators often resort to setting MAXDOP (max degree of parallelism) to 1, or setting the CTFP (cost threshold for parallelism) value too high, in order to troubleshoot excessive CXPACKET waits, and in that way create high SOS_SCHEDULER_YIELD wait type values as a direct consequence of this inappropriate troubleshooting. A common but incorrect interpretation of high SOS_SCHEDULER_YIELD values is that the CPU is the bottleneck and more CPU power is needed. What high SOS_SCHEDULER_YIELD values actually indicate is that the query causing them needs more CPU time than it is getting, as it could finish faster with such additional resources.

Here is an example of how a single-threaded query will always use one CPU core. While that core is utilized 100%, creating a bottleneck in query execution and resulting in a high SOS_SCHEDULER_YIELD wait type value, the remaining seven cores are more or less idle. This example clearly shows that the CPU is neither the bottleneck nor the root cause of the high SOS_SCHEDULER_YIELD waits, as there is obviously an abundance of CPU resources available. Instead, the query needs more CPU power but is limited to a single core for its execution. So it is poor optimization of the SQL Server, in this case, that caused the query to be executed on a single core instead of running up to 6 times faster in parallel on 6 cores.
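
The arithmetic behind this example can be spelled out in a few lines. This is a back-of-the-envelope sketch: the helper names are hypothetical, the 60,000 ms of CPU work is an arbitrary figure, and the parallel estimate assumes perfect linear scaling with no parallelism overhead, which real queries rarely achieve.

```python
def overall_utilization(per_core_busy_percent):
    """Average utilization across all cores, in percent."""
    return sum(per_core_busy_percent) / len(per_core_busy_percent)


def ideal_elapsed_ms(cpu_work_ms, cores_used):
    """Elapsed time under perfect linear scaling (an idealized assumption)."""
    return cpu_work_ms / cores_used


# One core pinned at 100% by the single-threaded query, seven cores idle.
cores = [100.0] + [0.0] * 7
server_wide = overall_utilization(cores)

# Arbitrary amount of CPU work, run serially and then spread over 6 cores.
serial_ms = ideal_elapsed_ms(60_000, 1)
parallel_ms = ideal_elapsed_ms(60_000, 6)
speedup = serial_ms / parallel_ms
```

With one of eight cores saturated, the server-wide average is only 12.5%, which is why the machine looks idle while the query itself is CPU-bound on its single scheduler.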

Spinlocks are often cited as one of the reasons for excessive SOS_SCHEDULER_YIELD wait type values, but this is actually wrong. For those who want a more in-depth analysis of why spinlocks are not related to an excessive amount of SOS_SCHEDULER_YIELD waits, please see this article: SOS_SCHEDULER_YIELD waits and the LOCK_HASH spinlock

So the question is when and what to troubleshoot when high SOS_SCHEDULER_YIELD wait type values are present. To answer that question, three different scenarios have to be described, since the answer depends on whether SOS_SCHEDULER_YIELD waits are frequent and/or coupled with high signal wait times:

  • SOS_SCHEDULER_YIELD is frequent but with a low signal wait time

    This is a typical scenario in which SOS_SCHEDULER_YIELD waits are not an indicator of a problem. It just indicates that a lot of threads are running on the scheduler and that all of them are executing with the same priority, without any single thread monopolizing the CPU. In this case, something else should be investigated as the cause of the performance issues.

  • SOS_SCHEDULER_YIELD is highly frequent and has a high signal wait time

    This is the scenario in which CPU pressure is highly likely, as it might indicate that a large number of CPU-intensive queries are competing for CPU resources, hence the high frequency of SOS_SCHEDULER_YIELD waits. The high signal wait time in this case shows that a thread that yielded and was moved to the runnable queue had to wait a long time for the scheduler to bring it back to the running state.

    Usually this requires some query tuning, as it is the query itself that is causing the problem. Look at the execution plan for unexpected CPU-intensive operations such as heavy data type conversion, large index and/or table scans, or user-defined functions. Also check for nonclustered indexes that have been dropped.

  • SOS_SCHEDULER_YIELD is not frequent but has a high signal wait time

    Such a scenario indicates that not many threads are waiting for the CPU in order to get to the running state, so it is something external to SQL Server (the system) that is monopolizing the CPU resources rather than some SQL Server thread. There can be various reasons for this, such as a Windows application that is CPU intensive or running with higher priority and therefore doesn’t return CPU control to a SQL Server thread that is in the runnable queue.

    One reason that is often overlooked, but should be investigated, is the power management feature on the server. When enabled, it makes the CPU frequency fluctuate, scaling it up and down according to the feature’s own estimate of how much CPU power is required. The frequency changes, albeit quite frequent, are often not fast enough to keep pace with SQL Server demands and thus cause SOS_SCHEDULER_YIELD waits to accumulate. Turning off power management and allowing the CPU to run at its highest speed is usually the easiest solution here.
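
The three scenarios can be summarized as a small decision helper. This is a sketch only: the parameter names mirror the `waiting_tasks_count` and `signal_wait_time_ms` columns of `sys.dm_os_wait_stats`, but the function itself and both thresholds are hypothetical values chosen for illustration, since the article gives no numeric cut-offs for "frequent" or "high".

```python
NORMAL = "frequent, low signal wait: normal scheduler activity, look elsewhere"
CPU_PRESSURE = "frequent, high signal wait: likely CPU pressure, tune the queries"
EXTERNAL = "infrequent, high signal wait: check external CPU consumers and power management"
INSIGNIFICANT = "infrequent, low signal wait: not significant"


def classify_yield_waits(waiting_tasks_count, signal_wait_time_ms,
                         frequent_count=100_000, high_avg_signal_ms=10.0):
    """Map SOS_SCHEDULER_YIELD counters to one of the scenarios above.

    frequent_count and high_avg_signal_ms are assumed thresholds; they
    would need calibrating against a baseline for the server in question.
    """
    if waiting_tasks_count <= 0:
        return INSIGNIFICANT

    # Average time a yielded thread spent in the runnable queue.
    avg_signal_ms = signal_wait_time_ms / waiting_tasks_count
    frequent = waiting_tasks_count >= frequent_count
    high_signal = avg_signal_ms >= high_avg_signal_ms

    if frequent and high_signal:
        return CPU_PRESSURE
    if frequent:
        return NORMAL
    if high_signal:
        return EXTERNAL
    return INSIGNIFICANT


busy_but_healthy = classify_yield_waits(1_000_000, 500_000)
under_pressure = classify_yield_waits(1_000_000, 50_000_000)
starved_externally = classify_yield_waits(1_000, 100_000)
```

Using the average signal wait per yield, rather than the raw total, keeps a long-running instance with many harmless yields from being mistaken for one under CPU pressure.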

Translated from: https://www.sqlshack/how-to-handle-excessive-sos_scheduler_yield-wait-type-values-in-sql-server/
