《MySQL Waiting for table metadata lock故障分析》要點:
本文介紹了MySQL Waiting for table metadata lock故障分析,希望對您有用。如果有疑問,可以聯(lián)系我們。
源起
線上執(zhí)行ALTER時,通過show processlist查看到出現(xiàn)Waiting for table metadata lock?,導(dǎo)致后面的查詢都無法執(zhí)行.
5217122 | create_table_04 | 172.100.207.148:31291 | finance | Query | 1829 | Waiting for table metadata lock | ALTER TABLE `Pay` MODIFY COLUMN `pay` smallint(6) NULL DEFAULT 0 COMMENT '付??' 5217155 | bx_live_dml | 172.100.210.4:34730 | finance | Prepare | 1714 | Waiting for table metadata lock | select * from Pay where (pay =4 or pay =8) and projectId ='CSZY0'
由于當(dāng)時直接把ALTER給kill了,所以基本沒有現(xiàn)場,所以下面模擬什么情況MySQL產(chǎn)生這種事故.
MySQL元數(shù)據(jù)鎖
MySQL DBA對于Waiting for table metadata lock肯定不會陌生,一般都是進行alter操作時被堵住了,導(dǎo)致了我們在show processlist 時,看到線程的狀態(tài)是在等metadata lock.
為了在并發(fā)環(huán)境下維護表元數(shù)據(jù)的數(shù)據(jù)一致性,在表上有活動事務(wù)(顯式或隱式)的時候,不可以對元數(shù)據(jù)進行寫入操作.因此從MySQL5.5版本開始引入了MDL鎖(metadata lock),來保護表的元數(shù)據(jù)信息,用于解決或者保證DDL操作與DML操作之間的一致性.對于引入MDL,其主要解決了2個問題,一個是事務(wù)隔離問題,比如在可重復(fù)隔離級別下,會話A在2次查詢期間,會話B對表結(jié)構(gòu)做了修改,兩次查詢結(jié)果就會不一致,無法滿足可重復(fù)讀的要求;另外一個是數(shù)據(jù)復(fù)制的問題,比如會話A執(zhí)行了多條更新語句期間,另外一個會話B做了表結(jié)構(gòu)變更并且先提交,就會導(dǎo)致slave在重做時,先重做alter,再重做update時就會出現(xiàn)復(fù)制錯誤的現(xiàn)象.
所以在對表進行上述操作時,如果表上有活動事務(wù)(未提交或回滾),請求寫入的會話會等待在Metadata lock wait .例如下面的這種情形:
若沒有MDL鎖的保護,則事務(wù)2可以直接執(zhí)行DDL操作,并且導(dǎo)致事務(wù)1出錯,5.1版本即是如此.5.5版本加入MDL鎖就在于保護這種情況的發(fā)生,由于事務(wù)1開啟了查詢,那么獲得了MDL鎖,鎖的模式為SHARED_READ,事務(wù)2要執(zhí)行DDL,則需獲得EXCLUSIVE鎖,兩者互斥,所以事務(wù)2需要等待.
注:支持事務(wù)的InnoDB引擎表和不支持事務(wù)的MyISAM引擎表,都會出現(xiàn)Metadata Lock Wait等待現(xiàn)象.一旦出現(xiàn)Metadata Lock Wait等待現(xiàn)象,后續(xù)所有對該表的訪問都會阻塞在該等待上,導(dǎo)致連接堆積,業(yè)務(wù)受影響.
引起Metadata lock wait出現(xiàn)的場景:Waiting for table metadata lock
場景一:當(dāng)前有執(zhí)行DML操作時執(zhí)行ALTRE操作.
# SESSION A mysql> insert into sbtest2 select * from sbtest1; # SESSION B mysql> alter table sbtest2 add test1 int; //等待SESSION A執(zhí)行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 7 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec) # SESSION D mysql> select * from sbtest2 limit 10; //等待元數(shù)據(jù)鎖; # SESSION E mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 20 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 13 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 308 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | select * from sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 4 rows in set (0.00 sec)
從上述例子可以看出,我們在執(zhí)行DDL語句的時候得事先看一下,進程中是否已經(jīng)存在某些DML語句占用了表的元數(shù)據(jù)鎖,這樣會導(dǎo)致DDL語句處于鎖等待狀態(tài).一旦出現(xiàn)Waiting for table metadata lock等待現(xiàn)象,后續(xù)所有對該表的訪問都會阻塞在該等待上,包括讀操作,導(dǎo)致連接堆積,業(yè)務(wù)受影響.
場景二:當(dāng)前有對表的長時間查詢或使用mysqldump/mysqlpump時,使用alter會被堵住.
# SESSION A mysql> select *,sleep(10) from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執(zhí)行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | 267 | root | localhost | sbtest | Query | 12 | User sleep | select *,sleep(10) from sbtest2 | | 271 | root | localhost | sbtest | Query | 8 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ 4 rows in set (0.00 sec)
場景三:顯示或者隱式開啟事務(wù)后未提交或回滾,比如查詢完成后未提交或者回滾,使用alter會被堵住.
# SESSION A mysql> begin; mysql> select * from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執(zhí)行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 36 | | NULL | | 271 | root | localhost | sbtest | Query | 30 | Waiting for table metadata lock | alter table sbtest2 add test2 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 3 rows in set (0.00 sec)
場景四:表上有失敗的查詢事務(wù),比如查詢不存在的列,語句失敗返回,但是事務(wù)沒有提交,此時alter仍然會被堵住.
# SESSION A mysql> begin; mysql> select error from sbtest2; ERROR 1054 (42S22): Unknown column 'error' in 'field list' # SESSION B mysql> alter table sbtest2 add test3 int; //等待SESSION A提交或回滾; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 7 | | NULL | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Sleep | 413 | | NULL | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 4 rows in set (0.00 sec) # SESSION D mysql> select * from information_schema.innodb_trx; Empty set (0.00 sec)
其實SESSION A中的事務(wù)并未開啟,但是由于select獲取表元數(shù)據(jù)的語句,語法上是有效的,雖然執(zhí)行失敗了,但是任然不會釋放元數(shù)據(jù)鎖,故而導(dǎo)致SESSION B的alter動作被阻塞.
通過SESSION D查看當(dāng)前打開事務(wù)時,你會發(fā)現(xiàn)沒有,從而找不到原因.所以當(dāng)出現(xiàn)這種場景時,如何判斷是哪個進程導(dǎo)致的呢,我們可以嘗試查看表performance_schema. events_statements_current,分析進程狀態(tài)來進行判斷.
mysql> select * from performance_schema. events_statements_current\G *************************** 1. row *************************** THREAD_ID: 293 EVENT_ID: 32 END_EVENT_ID: 32 EVENT_NAME: statement/sql/select SOURCE: socket_connection.cc:101 TIMER_START: 212721717099954000 TIMER_END: 212721717213807000 TIMER_WAIT: 113853000 LOCK_TIME: 0 SQL_TEXT: select error from sbtest2 DIGEST: 0bbb2d5d1be45e77debea68111264885 DIGEST_TEXT: SELECT ERROR FROM `sbtest2` CURRENT_SCHEMA: sbtest OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 1054 RETURNED_SQLSTATE: 42S22 MESSAGE_TEXT: Unknown column 'error' in 'field list' ERRORS: 1
然后找到其sid, kill掉該session,也可以kill掉DDL所在的session解決可以解決此問題.
另外,測試時SESSION A要顯式開啟一個事務(wù),否則查詢會隱式回滾結(jié)束,無法重現(xiàn)上面的場景.SESSION B執(zhí)行alter后,沒有立即阻塞住,而是立馬開始copy to tmp table,這個過程結(jié)束后,才進行了MDL鎖等待.這怎么解釋呢,應(yīng)該是執(zhí)行alter操作主要分為創(chuàng)建臨時新表->插入老表的數(shù)據(jù)->臨時新表rename to老表三個步驟,在這種情況下,到最后一步才需要MDL鎖,所以copy過程中不會阻塞.由于沒有查詢在進行,而且查詢也沒有進入innodb層 (失敗返回),所以show processlist和information_schema.innodb_trx沒有可以參考的信息.
出現(xiàn)以上幾種情況時,這個時候如果進行如下操作就會引起MDL:
使用Profile分析場景三:顯示或者隱式開啟事務(wù)后未提交或回滾,比如查詢完成后未提交或者回滾,使用alter會被堵住
# SESSION A mysql> set profiling=on; mysql> begin; mysql> select * from sbtest.sbtest2 limit 1; # SESSION B mysql> set profiling=on; mysql> alter table sbtest.sbtest5 add test2 int; //等待SESSION A執(zhí)行完; # SESSION C mysql> set profiling=on; mysql> select * from sbtest.sbtest2 limit 1; # SESSION D mysql> set profiling=on; mysql> show processlist; +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | 325 | root | localhost | NULL | Query | 25 | Waiting for table metadata lock | alter table sbtest.sbtest2 add test5 int | | 326 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 1 | | 327 | root | localhost | NULL | Query | 0 | starting | show processlist | | 328 | root | localhost | NULL | Sleep | 50 | | NULL | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ 4 rows in set (0.00 sec)
然后回滾SESSION A,等待SESSION B和SESSION C執(zhí)行完,查看profile.
查看SESSION A
# SESSION A mysql> show profiles; +----------+-------------+------------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+------------------------------------------+ | 1 | 42.81646375 | alter table sbtest.sbtest2 add test5 int | +----------+-------------+------------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+----------+ | Status | Duration | +--------------------------------+----------+ | starting | 0.000069 | | checking permissions | 0.000004 | | checking permissions | 0.000004 | | init | 0.000005 | | Opening tables | 0.000128 | | setup | 0.000032 | | creating table | 0.000654 | | After create | 0.000053 | | Waiting for table metadata loc | 1.000069 | ..................... | After create | 0.000013 | | Waiting for table metadata loc | 0.871435 | | After create | 0.000042 | | System lock | 0.000013 | | preparing for alter table | 0.002475 | | altering table | 9.752928 | | committing alter table to stor | 0.185624 | | end | 0.000021 | | query end | 0.000010 | | closing tables | 0.000007 | | freeing items | 0.000020 | | cleaning up | 0.000011 | +--------------------------------+----------+ 83 rows in set, 1 warning (0.00 sec)
查看SESSION C
mysql> show profiles; +----------+-------------+--------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+--------------------------------------+ | 1 | 10.75216050 | select * from sbtest.sbtest2 limit 1 | +----------+-------------+--------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+-----------+ | Status | Duration | +--------------------------------+-----------+ | starting | 0.000080 | | checking permissions | 0.000007 | | Opening tables | 0.000012 | | Waiting for table metadata loc | 10.751829 | | Opening tables | 0.000094 | | init | 0.000019 | | System lock | 0.000010 | | optimizing | 0.000004 | | statistics | 0.000011 | | preparing | 0.000009 | | executing | 0.000003 | | Sending data | 0.000040 | | end | 0.000006 | | query end | 0.000008 | | closing tables | 0.000008 | | freeing items | 0.000014 | | cleaning up | 0.000009 | +--------------------------------+-----------+ 17 rows in set, 1 warning (0.00 sec)
從上述測試可以看出,SESSION C需要打開表時碰到了元數(shù)據(jù)鎖.MySQL不論SESSION A執(zhí)行的是select還是delete,此時alter table語句無法獲取到metadata獨占鎖,會進行等待;所以會影響SESSION C的讀取.
這是最基本的一種情形,這個和MySQL 5.6中的online ddl并不沖突.一般alter table的操作過程中,在after create步驟會獲取metadata獨占鎖,當(dāng)進行到altering table的過程時(通常是最花時間的步驟),對該表的讀寫都可以正常進行,這就是online ddl的表現(xiàn),并不會像之前在整個alter table過程中阻塞寫入,當(dāng)然并不是所有ALTER語句都支持online ddl.
總之,alter table的語句是很危險的(其實他的危險其實是未提交事物或者長事務(wù)導(dǎo)致的),在操作之前最好確認(rèn)對要操作的表沒有任何進行中的操作、沒有未提交事務(wù)、也沒有顯式事務(wù)中的報錯語句.如果有alter table的維護任務(wù),在無人監(jiān)管的時候運行,最好通過lock_wait_timeout設(shè)置好超時時間,避免長時間的metedata鎖等待.
原文來自微信公眾號:運維那點事
轉(zhuǎn)載請注明本頁網(wǎng)址:
http://www.fzlkiss.com/jiaocheng/2733.html