Spring Boot error when querying Doris
ERROR [http-nio-10020-exec-12] [http-nio-10020-exec-12raceId] [] [5] @@GlobalExceptionAdvice@@ | server error
org.springframework.dao.RecoverableDataAccessException:
### Error querying database. Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 426 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
; Communications link failure
The last packet successfully received from the server was 426 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 426 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
Error from the scheduled Doris insert into select job
ERROR 2013 (HY000) at line 7: Lost connection to MySQL server during query
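Analysis
Possible cause: slow queries
Slow queries were putting heavy pressure on the cluster.
Several slow queries ran for 120s to 400s, which the Doris cluster cannot sustain. Since the global query_timeout is 60s, someone's job presumably set the session variable to 600s or higher.
I asked the developers to take the slow query jobs offline and tune the SQL.
Once the jobs with 100+ second queries were taken offline, things returned to normal.
A little later, however, the Spring Boot service alerted again and the same error was back.
Doris parameters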
interactive_timeout=3880000
wait_timeout=3880000
Doris FE node warning log
2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.checkTimeout():365] kill wait timeout connection, remote: 1.1.1.1:57399, wait timeout: 3880000
2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.kill():339] kill timeout query, 1.1.1.1:57399, kill connection: true
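To see which clients were being disconnected, the kill events can be pulled out of the FE warning log. The following is a minimal Java sketch that assumes the message format of the two log lines above; the class FeKillLogParser and its field names are illustrative and not part of Doris.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not part of Doris): extracts the remote address and the
// wait timeout from "kill wait timeout connection" lines in the FE warning log.
class FeKillLogParser {
    // Assumes the message format shown in the log excerpt above.
    private static final Pattern KILL_LINE = Pattern.compile(
            "kill wait timeout connection, remote: ([^,\\s]+), wait timeout: (\\d+)");

    // The two fields of interest from one matching log line.
    static final class KillEvent {
        final String remote;
        final long waitTimeout;

        KillEvent(String remote, long waitTimeout) {
            this.remote = remote;
            this.waitTimeout = waitTimeout;
        }
    }

    // Returns the parsed event, or empty if the line is not a kill message.
    static Optional<KillEvent> parse(String line) {
        Matcher m = KILL_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new KillEvent(m.group(1), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String line = "2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) "
                + "[ConnectContext.checkTimeout():365] kill wait timeout connection, "
                + "remote: 1.1.1.1:57399, wait timeout: 3880000";
        parse(line).ifPresent(e ->
                System.out.println(e.remote + " wait timeout=" + e.waitTimeout));
    }
}
```

Counting the parsed events per minute would show whether the kills line up with the time the alerts began.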
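Doris monitoring
The monitoring chart shows that the number of connections dropped sharply at 15:44.
ELK logs
The ELK logs also show that the Spring Boot errors when querying Doris started at 15:44.
So what operation or variable change at 15:44 affected the cluster?
Going by the error
wait_timeout is 3880000s, which is about 44.9 days, whereas the default in the source code is 28800s (8 hours):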
interactive_timeout=3880000
wait_timeout=3880000
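As a quick sanity check on those magnitudes, the two values can be converted from seconds into days and hours. This is a small Java sketch using java.time.Duration; the class name TimeoutUnits is made up for illustration.

```java
import java.time.Duration;

// Illustrative helper: renders a timeout given in seconds as days/hours/minutes/seconds.
class TimeoutUnits {
    static String describe(long seconds) {
        Duration d = Duration.ofSeconds(seconds);
        long days = d.toDays();
        long hours = d.toHours() % 24;
        long minutes = d.toMinutes() % 60;
        long secs = d.getSeconds() % 60;
        return days + "d " + hours + "h " + minutes + "m " + secs + "s";
    }

    public static void main(String[] args) {
        System.out.println(describe(3880000L)); // 44d 21h 46m 40s
        System.out.println(describe(28800L));   // 0d 8h 0m 0s
    }
}
```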
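Nobody had deployed anything, nobody had done a cutover, and I hold the cluster admin account and had not changed any parameters, so I still could not tell why the values had changed. I checked the operation records in the fe.audit audit log, and sure enough:
Someone (an insider) using DataGrip 2020.2.3 ran set GLOBAL at 15:44 and changed: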
interactive_timeout=3880000
wait_timeout=3880000
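A search like this can be scripted. The sketch below scans log lines for set GLOBAL statements and reports the variable and value. It is a rough Java example that matches only the statement text and makes no assumption about the surrounding audit-log fields; the class name SetGlobalFinder and the file path passed on the command line are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper: finds "set global <name> = <value>" statements in log lines.
class SetGlobalFinder {
    // Case-insensitive; captures the variable name and the value up to whitespace or ';'.
    private static final Pattern SET_GLOBAL = Pattern.compile(
            "(?i)\\bset\\s+global\\s+(\\w+)\\s*=\\s*([^\\s;]+)");

    // Returns one "name=value" entry per statement found, in input order.
    static List<String> find(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SET_GLOBAL.matcher(line);
            while (m.find()) {
                hits.add(m.group(1) + "=" + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an audit log file (hypothetical; adjust to your deployment).
        for (String hit : find(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(hit);
        }
    }
}
```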
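After both parameters were set back to 28800s, the cluster's connection count recovered immediately.
Note that, according to discussion with the community, only wait_timeout takes effect in Doris; interactive_timeout exists for MySQL compatibility and has no effect.
Open question: why does a very large wait_timeout in Doris cause connections to fail with Communications link failure, while lowering it restores normal behavior? The code logic needs to be traced to find out.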
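One hypothesis that would fit these symptoms is 32-bit integer overflow. It is an unverified assumption, not something confirmed by the Doris source or by the community. If the timeout check multiplies the number of seconds (an int) by 1000 in 32-bit arithmetic, then 3880000 × 1000 exceeds Integer.MAX_VALUE (2147483647) and wraps to a negative number, so every connection appears to have been idle too long and is killed right away, while 28800 × 1000 still fits. The Java sketch below only demonstrates that arithmetic. Its class and method names are made up and it does not reproduce ConnectContext.checkTimeout().

```java
// Hypothetical reconstruction for illustration only; NOT copied from the Doris source.
class WaitTimeoutOverflowSketch {
    // Suspected problematic form: both operands are int, so the product wraps
    // around before it is widened to long.
    static long thresholdMillisIntMath(int waitTimeoutSeconds) {
        return waitTimeoutSeconds * 1000;
    }

    // Safe form: the long literal forces 64-bit multiplication.
    static long thresholdMillisLongMath(int waitTimeoutSeconds) {
        return waitTimeoutSeconds * 1000L;
    }

    // Assumed shape of an idle check: kill when idle time exceeds the threshold.
    static boolean shouldKill(long idleMillis, long thresholdMillis) {
        return idleMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        System.out.println(thresholdMillisIntMath(3880000));   // -414967296
        System.out.println(thresholdMillisLongMath(3880000));  // 3880000000
        System.out.println(thresholdMillisIntMath(28800));     // 28800000
        // With a negative threshold, even a connection idle for 1 ms is killed.
        System.out.println(shouldKill(1L, thresholdMillisIntMath(3880000)));  // true
        System.out.println(shouldKill(1L, thresholdMillisLongMath(3880000))); // false
        // Largest number of seconds whose millisecond value still fits in an int.
        System.out.println(Integer.MAX_VALUE / 1000);          // 2147483
    }
}
```

If this is indeed the cause, any wait_timeout above 2147483 seconds (about 24.8 days) would trigger it, which is worth checking against the comparison in the FE code.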