Kafka Producer NetworkException and Timeout Exceptions
2021-04-12 19:26
We have faced a similar problem: many NetworkExceptions in the logs and, from time to time, a TimeoutException.
Cause
Once we gathered TCP logs from production, it turned out that some of the TCP connections to the Kafka brokers (we have 3 broker nodes) were being dropped after about 5 minutes of being idle, without the clients being notified (no FIN flags at the TCP layer). When a client later tried to reuse such a connection, an RST flag came back. We could easily match those connection resets in the TCP logs with NetworkExceptions in the application logs.
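To make the failure mode concrete, here is a small hypothetical model (not Kafka client code) of what happens on the next send after a connection has sat idle. It assumes a middlebox that silently forgets idle flows after `middleboxTimeoutMs` and a client that closes its own idle connections after `clientMaxIdleMs` (the role `connections.max.idle.ms` plays). The class, enum and method names are made up for illustration, and the 350 s value is the NAT Gateway timeout mentioned in the Edit. Real clients close idle connections on a periodic check rather than at the exact threshold, so treat the boundaries as approximate.

```java
// Hypothetical model of silent idle-connection drops; not part of the Kafka client.
public class IdleConnectionModel {

    enum Outcome { REUSED_OK, RESET_BY_PEER, RECONNECTED_CLEANLY }

    /**
     * @param idleMs             how long the connection sat unused before the next send
     * @param middleboxTimeoutMs idle timeout after which a NAT/firewall silently drops the flow
     * @param clientMaxIdleMs    client-side idle limit (the role of connections.max.idle.ms)
     */
    static Outcome nextSend(long idleMs, long middleboxTimeoutMs, long clientMaxIdleMs) {
        if (idleMs >= clientMaxIdleMs) {
            // The client already closed the idle connection itself,
            // so the next send opens a fresh connection.
            return Outcome.RECONNECTED_CLEANLY;
        }
        if (idleMs >= middleboxTimeoutMs) {
            // The middlebox forgot the flow without sending FIN;
            // reusing the socket is answered with RST.
            return Outcome.RESET_BY_PEER;
        }
        return Outcome.REUSED_OK;
    }

    public static void main(String[] args) {
        long nat = 350_000L; // 350 s idle timeout
        // Client idle limit above the NAT timeout: a 6-minute gap ends in a reset.
        System.out.println(nextSend(360_000L, nat, 540_000L)); // RESET_BY_PEER
        // Client idle limit of 4 minutes: the same gap ends in a clean reconnect.
        System.out.println(nextSend(360_000L, nat, 240_000L)); // RECONNECTED_CLEANLY
    }
}
```

The model shows why lowering the client-side idle limit below the middlebox timeout removes the resets: the client gives up the connection before the network does.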
As for the TimeoutException, we could not do the same matching, because by the time we found the cause this type of error was no longer occurring. However, we confirmed in a separate test that dropping a TCP connection could also result in a TimeoutException. I guess this is because the Java Kafka client uses a Java NIO SocketChannel under the hood: all messages are buffered and then dispatched once the connection is ready. If the connection is not ready within the timeout (30 seconds), the messages expire, resulting in a TimeoutException.
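The buffering-and-expiry reasoning above can be sketched as a simplified, hypothetical model: each record is buffered at some time, can only be sent once the connection is ready, and fails if its deadline (enqueue time plus the timeout) is reached first. This is not the Kafka client's actual accumulator logic; which client setting governs the timeout depends on the client version (for example `request.timeout.ms` or `delivery.timeout.ms`), and the class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a send buffer with a per-record timeout.
public class BufferExpiryModel {

    /** Returns the enqueue times of records whose deadline passes before the connection is ready. */
    static List<Long> expired(List<Long> enqueueTimesMs, long connectionReadyAtMs, long timeoutMs) {
        List<Long> result = new ArrayList<>();
        for (long t : enqueueTimesMs) {
            // The record's deadline is reached no later than the moment
            // the connection becomes usable, so it fails with a timeout.
            if (t + timeoutMs <= connectionReadyAtMs) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> enqueued = List.of(0L, 10_000L, 25_000L);
        // The connection becomes ready 40 s after the first record was buffered,
        // with a 30 s timeout per record.
        System.out.println(expired(enqueued, 40_000L, 30_000L)); // [0, 10000]
    }
}
```

Only the record buffered at 25 s survives, because its deadline (55 s) is after the connection recovers.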
Solution
For us the fix was to reduce connections.max.idle.ms on our clients to 4 minutes. Once we applied it, the NetworkExceptions were gone from our logs.
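A minimal sketch of the fix, assuming the producer is configured through a plain `java.util.Properties` object as the Java client accepts. `connections.max.idle.ms` is the setting named above; the broker address is a placeholder, and a real producer also needs serializer settings before the properties are passed to `new KafkaProducer<>(props)` from `kafka-clients`, which is left out here to keep the example self-contained.

```java
import java.util.Properties;

public class ProducerIdleConfig {

    // 4 minutes, i.e. shorter than the idle period after which connections were being dropped.
    static final long MAX_IDLE_MS = 4 * 60 * 1000L;

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Let the client close idle connections itself before the network drops them silently.
        props.put("connections.max.idle.ms", String.valueOf(MAX_IDLE_MS));
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("broker1:9092"); // placeholder address
        System.out.println(props.getProperty("connections.max.idle.ms")); // 240000
    }
}
```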
We are still investigating what is dropping the connections.
Edit
The cause of the problem was the AWS NAT Gateway, which was dropping outgoing connections after 350 seconds of inactivity.
https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html#nat-gateway-troubleshooting-timeout