SQL Server deadlocks

In this article, we will discuss deadlocks in SQL Server, then analyze a real deadlock scenario and walk through the troubleshooting steps.

Plenty of theoretical advice and examples about deadlock problems can be found on the web. In this article, however, we will tackle a real deadlock story and work through the solution steps, so that we get a chance to work on a case-based problem.

First of all, let's explain the deadlock concept. A deadlock occurs when two or more operations each want to access a resource that is locked by the other. In this situation neither process can proceed, because each one is waiting for the other to release its lock. SQL Server ends this contention by intervening: it chooses a victim from the transactions involved in the deadlock and forces it to roll back all of its actions.
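
The circular wait described above can be reproduced with two sessions that update two tables in opposite order. This is only a minimal sketch: the tables dbo.TableA and dbo.TableB and their columns Id and Val are hypothetical and are not part of the scenario in this article.

```sql
-- Hypothetical setup (not part of the article's scenario).
CREATE TABLE dbo.TableA (Id INT PRIMARY KEY, Val INT);
CREATE TABLE dbo.TableB (Id INT PRIMARY KEY, Val INT);
INSERT INTO dbo.TableA (Id, Val) VALUES (1, 0);
INSERT INTO dbo.TableB (Id, Val) VALUES (1, 0);
GO

-- Session 1 (first query window): run statement by statement.
BEGIN TRAN;
UPDATE dbo.TableA SET Val = Val + 1 WHERE Id = 1;  -- locks the row in TableA
-- ... wait here until session 2 has run its first UPDATE ...
UPDATE dbo.TableB SET Val = Val + 1 WHERE Id = 1;  -- blocked by session 2
COMMIT TRAN;

-- Session 2 (second query window): run statement by statement.
BEGIN TRAN;
UPDATE dbo.TableB SET Val = Val + 1 WHERE Id = 1;  -- locks the row in TableB
UPDATE dbo.TableA SET Val = Val + 1 WHERE Id = 1;  -- blocked by session 1
COMMIT TRAN;
```

Once both sessions are blocked on their second UPDATE, neither can continue, so SQL Server picks one of them as the victim and rolls it back with error 1205.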

As this explanation shows, a deadlock in SQL Server is a special kind of contention problem. Each deadlock also has its own characteristics, so the solution approach differs from case to case. Now, let's take a look at the problem scenario.

The problem scenario

In this real scenario, an in-house application returns an error to its users, and the users report the error to the development team.

The development team realizes that it is a deadlock issue but cannot find the root cause. The team therefore decides to consult an experienced database administrator. In the next sections, we will see how the database administrator analyzes and resolves this deadlock problem.

Prerequisites

The development team used the following table to store the order numbers, and the following query to create the first row of each day.

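CREATE TABLE [TestTblCounter](
  [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
  [SerialNumber] [int] NULL,
  [LogDate] [datetime] NULL);
GO
-- Create the first row of the day if it does not exist yet.
-- Style 112 yields 'yyyymmdd', which converts implicitly to midnight of the current day.
IF NOT EXISTS(SELECT Id
              FROM TestTblCounter
              WHERE LogDate = CONVERT(VARCHAR(100), GETDATE(), 112))
BEGIN
    INSERT INTO TestTblCounter (SerialNumber, LogDate)
    VALUES (1, CONVERT(VARCHAR(100), GETDATE(), 112));
END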

The table holds only one row per day.

The following stored procedure was used to create a new order number.

CREATE PROCEDURE CreateLogNo
AS
     DECLARE @LogNo AS VARCHAR(50), @LogCounter AS INT = 0;
     BEGIN TRAN;

     -- Increment today's counter.
     UPDATE TestTblCounter
       SET SerialNumber = SerialNumber + 1
     WHERE LogDate = CONVERT(VARCHAR(100), GETDATE(), 112);

     -- Read the counter back; TABLOCKX requests an exclusive lock on the whole table.
     SELECT @LogCounter = SerialNumber
     FROM TestTblCounter WITH(TABLOCKX)
     WHERE LogDate = CONVERT(VARCHAR(100), GETDATE(), 112);

     SELECT @LogCounter AS LogNumber;
     COMMIT TRAN

We will also use the SQLQueryStress tool to generate a workload similar to that of the production system.

Monitoring deadlocks in SQL Server with the system_health session

The error message, quoted below, clearly indicated a deadlock problem, so the database administrator decided to investigate the deadlocks. As a first step, he decided to check the system_health session for deadlocks.

Transaction (Process ID XX) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
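
The message ends with "Rerun the transaction", and a caller can do exactly that while the root cause is being investigated. The following is a minimal sketch, assuming the dbo.CreateLogNo procedure from this article; the attempt limit and the delay are hypothetical values chosen for illustration. Error number 1205 identifies the deadlock victim error.

```sql
DECLARE @Attempt INT = 1,
        @MaxAttempts INT = 3,   -- hypothetical limit
        @Done BIT = 0;

WHILE @Done = 0
BEGIN
    BEGIN TRY
        EXEC dbo.CreateLogNo;
        SET @Done = 1;          -- success, leave the loop
    END TRY
    BEGIN CATCH
        -- Clean up any transaction left open by the failed call.
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        -- Re-raise anything that is not a deadlock, or a deadlock on the last attempt.
        IF ERROR_NUMBER() <> 1205 OR @Attempt >= @MaxAttempts
        BEGIN
            THROW;
        END;

        SET @Attempt += 1;
        WAITFOR DELAY '00:00:00.200';  -- hypothetical short pause before retrying
    END CATCH;
END;
```

A retry only hides the symptom, so it complements rather than replaces the analysis in the following sections.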

system_health is the default Extended Events session of SQL Server, and it starts automatically when the database engine starts. The session collects various system data, including deadlock information, which makes it a good starting point for investigating deadlock problems. The following query reads the .xel files of the system_health session and returns the deadlocks that the session has captured.

-- Build the path pattern of the system_health .xel files.
DECLARE @xelfilepath NVARCHAR(260);
SELECT @xelfilepath = dosdlc.path
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;
SELECT @xelfilepath = @xelfilepath + N'system_health_*.xel';

-- Load only the deadlock reports into a temporary table.
DROP TABLE IF EXISTS #TempTable;
SELECT CONVERT(XML, event_data) AS EventData
INTO #TempTable
FROM sys.fn_xe_file_target_read_file(@xelfilepath, NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report';

-- Return each deadlock with its UTC time, local time, and XML report.
SELECT EventData.value('(event/@timestamp)[1]', 'datetime2(7)') AS UtcTime,
       CONVERT(DATETIME, SWITCHOFFSET(CONVERT(DATETIMEOFFSET,
           EventData.value('(event/@timestamp)[1]', 'VARCHAR(50)')),
           DATENAME(TzOffset, SYSDATETIMEOFFSET()))) AS LocalTime,
       EventData.query('event/data/value/deadlock') AS XmlDeadlockReport
FROM #TempTable
ORDER BY UtcTime DESC;

When we click any cell of the XmlDeadlockReport column, the deadlock report opens.

Monitoring deadlocks in SQL Server using Extended Events

The database administrator found some clues about the deadlock problem in the data captured by the system_health session. However, because of its file size limits, the system_health session retains only the most recent events, so it cannot be relied on to show every deadlock in SQL Server. He therefore decided to create a new Extended Events session that captures all deadlocks.

Extended Events is a system monitoring tool that collects events and system information from SQL Server, including deadlock information. First, we launch SQL Server Management Studio and navigate to the Sessions folder, which is located under Management > Extended Events. We right-click the Sessions folder and select New Session.

In the New Session screen, we give the session a name and check the Start the event session immediately after session creation checkbox, so that the session starts as soon as it has been created.

On the Events tab, we select the events that we want to capture. For this session, we select the following events:

  • database_xml_deadlock_report
  • xml_deadlock_report
  • xml_deadlock_report_filtered

We click the Configure button and select the global fields (actions) that will be captured along with the events:

  • client app name
  • client connection id
  • client hostname
  • database id
  • database name
  • nt username
  • sql text
  • username

On the Data Storage tab, we select the event_file target type to store the captured data and click the OK button.

The session is created and then started automatically to capture deadlock events.
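
The same session can also be scripted instead of being built in the wizard. The following is a sketch of a roughly equivalent definition, not the exact script that SSMS generates. It assumes the session name MonitorDeadlock used later in this article; the file name MonitorDeadlock.xel is an assumption, and the availability of the three events depends on the SQL Server version.

```sql
CREATE EVENT SESSION [MonitorDeadlock] ON SERVER
ADD EVENT sqlserver.database_xml_deadlock_report (
    ACTION (sqlserver.client_app_name, sqlserver.client_connection_id,
            sqlserver.client_hostname, sqlserver.database_id,
            sqlserver.database_name, sqlserver.nt_username,
            sqlserver.sql_text, sqlserver.username)),
ADD EVENT sqlserver.xml_deadlock_report (
    ACTION (sqlserver.client_app_name, sqlserver.client_connection_id,
            sqlserver.client_hostname, sqlserver.database_id,
            sqlserver.database_name, sqlserver.nt_username,
            sqlserver.sql_text, sqlserver.username)),
ADD EVENT sqlserver.xml_deadlock_report_filtered (
    ACTION (sqlserver.client_app_name, sqlserver.client_connection_id,
            sqlserver.client_hostname, sqlserver.database_id,
            sqlserver.database_name, sqlserver.nt_username,
            sqlserver.sql_text, sqlserver.username))
-- Assumed file name; a name without a folder is written to the default log directory.
ADD TARGET package0.event_file (SET filename = N'MonitorDeadlock.xel');
GO

-- Equivalent of the "start immediately" checkbox in the wizard.
ALTER EVENT SESSION [MonitorDeadlock] ON SERVER STATE = START;
GO
```

Scripting the session makes it easy to recreate the same definition on another server.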

Analyzing and resolving deadlocks in SQL Server

In this section, we first simulate the deadlock problem and then try to find its root cause. We start SQLQueryStress with the following parameters and wait for the query executions to complete.

When we open the details of the exception, it shows the exception message.

To find out more about the deadlock, we check the Extended Events session that was created to capture deadlock events. We expand the MonitorDeadlock session, right-click the target node, and select View Target Data. The captured deadlocks are shown in the right pane.

The xml_deadlock_report event includes more details about the deadlock, including the deadlock graph.

When we interpret the deadlock graph, SPID 65 (the victim) has acquired an intent exclusive lock and wants to place an update lock on the TestTblCounter table. SPID 64 has acquired an exclusive lock on TestTblCounter and wants to place an exclusive lock on the same table because of the TABLOCKX hint.
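
The same details can be read from the deadlock XML with a query instead of the graphical view. The sketch below lists one row per process involved in each captured deadlock. It assumes that the MonitorDeadlock session writes files named MonitorDeadlock*.xel into the same log directory as system_health; adjust the path if the target was configured differently.

```sql
-- Assumed location and name of the session's event files.
DECLARE @xelfilepath NVARCHAR(260);
SELECT @xelfilepath = dosdlc.path + N'MonitorDeadlock*.xel'
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;

WITH DeadlockEvents AS (
    SELECT CONVERT(XML, event_data) AS EventData
    FROM sys.fn_xe_file_target_read_file(@xelfilepath, NULL, NULL, NULL)
    WHERE object_name = 'xml_deadlock_report'
)
SELECT de.EventData.value('(event/@timestamp)[1]', 'datetime2(7)') AS UtcTime,
       p.n.value('@spid', 'int') AS Spid,
       p.n.value('@lockMode', 'varchar(10)') AS RequestedLockMode,
       p.n.value('@waitresource', 'nvarchar(256)') AS WaitResource,
       -- A process is the victim when its id appears in the victim list.
       CASE WHEN p.n.value('@id', 'varchar(50)') =
                 de.EventData.value('(event/data/value/deadlock/victim-list/victimProcess/@id)[1]', 'varchar(50)')
            THEN 1 ELSE 0 END AS IsVictim,
       p.n.value('(inputbuf)[1]', 'nvarchar(max)') AS InputBuffer
FROM DeadlockEvents AS de
CROSS APPLY de.EventData.nodes('event/data/value/deadlock/process-list/process') AS p(n)
ORDER BY UtcTime DESC;
```

The RequestedLockMode and WaitResource columns show what each process was waiting for, which is the information used in the interpretation above.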

The TABLOCKX hint places an exclusive lock on the table until the statement or the transaction completes. Its disadvantage is that it reduces concurrency, so other sessions wait longer for locks. When we decide to use it, we need to take lock and contention problems into account. For this scenario in particular, the use of the hint is improper. When we reconsider the query, the update statement modifies some rows and the select statement then fetches the same modified rows, but because of the TABLOCKX hint it places an exclusive lock on the whole table until it completes. The most pointless part of this query is the line that assigns the value to the variable, because when several rows match, the row whose value ends up in the variable is not guaranteed.

Why do we need to place an exclusive lock on every row of the table if we only want the last updated or inserted row? We can therefore remove the TABLOCKX hint, which causes the deadlocks in this query. To get the value of the last inserted or updated row, we can use the OUTPUT clause instead.

ALTER PROCEDURE [dbo].[CreateLogNo]
AS
BEGIN
    DECLARE @LogNo AS VARCHAR(50), @LogCounter AS INT = 0;
    -- Receives the new serial number from the OUTPUT clause.
    DECLARE @UptTable AS TABLE (SerNumber INT);

    BEGIN TRAN;

    IF NOT EXISTS (SELECT Id
                   FROM TestTblCounter
                   WHERE LogDate = CONVERT(VARCHAR(100), GETDATE(), 112))
    BEGIN
        -- First call of the day: create today's row, starting at 1.
        INSERT INTO TestTblCounter (SerialNumber, LogDate)
        OUTPUT INSERTED.SerialNumber INTO @UptTable
        VALUES (1, CONVERT(VARCHAR(100), GETDATE(), 112));
    END
    ELSE
    BEGIN
        -- Increment today's counter and capture the new value without TABLOCKX.
        UPDATE TestTblCounter
          SET SerialNumber = SerialNumber + 1
        OUTPUT INSERTED.SerialNumber INTO @UptTable
        WHERE LogDate = CONVERT(VARCHAR(100), GETDATE(), 112);
    END;

    SELECT TOP (1) @LogCounter = SerNumber FROM @UptTable ORDER BY SerNumber DESC;
    SELECT @LogCounter AS LogNumber;
    COMMIT TRAN;
END

When we use SQLQueryStress to simulate a new workload of 200 concurrent users against the changed stored procedure, we no longer face any deadlock issues.

Conclusion

In this article, we explained deadlocks in SQL Server and then analyzed a true story experienced by a development team. The key to finding a solution is to understand and interpret the deadlock report and graph correctly; otherwise, it is very difficult to find the root cause of the problem.

To recap the solution steps:

  • Check the system_health session for deadlocks
  • Create an extended event session to capture the deadlocks
  • Analyze the deadlock reports and graphs to figure out the problem
  • If possible, improve or change the queries involved in the deadlock

Translated from: https://www.sqlshack/how-to-resolve-deadlocks-in-sql-server/
