A story about why an on-premises installation should be reviewed at the OS level
Today I would like to share a small story about improving Confluence performance at one large company. The installation handled more than 3.2k requests per minute, according to daily statistics from the frontend reverse proxy logs.
The architecture is quite simple: a reverse proxy (nginx), the Confluence app (Tomcat), and PostgreSQL as the RDBMS.
All of these services run on CentOS 7.
The Confluence logs (both the Tomcat logs and the application logs) showed nothing informative.
So I looked in dmesg and found a plain message about a possible SYN flood on the Confluence app side (port 8090). Of course, it’s quite crazy to see that situation in an organisation in the 2020s.
# dmesg | tail
[ 734.711105] systemd: Started Journal Service.
[1140053.637848] FS-Cache: Loaded
[1140053.662442] FS-Cache: Netfs 'cifs' registered for caching
[1140053.662535] Key type cifs.spnego registered
[1140053.662538] Key type cifs.idmap registered
[1140053.662889] Unable to determine destination address.
[1140083.610257] Unable to determine destination address.
[1179192.646345] TCP: request_sock_TCP: Possible SYN flooding on port 8090. Sending cookies. Check SNMP counters.
[8486015.346237] DCCP: Activated CCID 2 (TCP-like)
[8486015.368881] sctp: Hash tables configured (bind 512/512)
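The kernel message suggests checking the SNMP counters. Here is a minimal sketch of how that might look, assuming a Linux host that exposes those counters in the TcpExt rows of /proc/net/netstat. The helper name tcpext_counter is made up for this example.

```shell
# Print one TcpExt counter from a netstat-style file.
# $1 = counter name, $2 = optional file path (default: /proc/net/netstat)
tcpext_counter() {
    awk -v want="$1" '
        # First TcpExt row holds the counter names.
        $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
        # Second TcpExt row holds the values in the same column order.
        $1 == "TcpExt:" && seen  { for (i = 2; i <= NF; i++) if (name[i] == want) print $i; exit }
    ' "${2:-/proc/net/netstat}"
}

# Show the counters most relevant to a "Possible SYN flooding" message.
if [ -r /proc/net/netstat ]; then
    for c in SyncookiesSent ListenOverflows ListenDrops; do
        printf '%s=%s\n' "$c" "$(tcpext_counter "$c")"
    done
fi
```

If ListenOverflows or ListenDrops keep growing between two runs, that usually suggests the accept queue of a listening socket is filling up under normal load, rather than a real attack.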
After changing the configs below, all the problems went away.
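The exact values used in this case are not reproduced in this post. As a hedged starting point, the sketch below only reads the kernel parameters that commonly bound the listen backlog on Linux (net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, net.ipv4.tcp_syncookies). It changes nothing. The helper name read_sysctl is hypothetical, and the choice of these three keys is an assumption, not the configuration from this story.

```shell
# Read a sysctl value straight from /proc/sys; prints "n/a" if the key is absent.
# $1 = dotted key (e.g. net.core.somaxconn), $2 = optional root (default: /proc/sys)
read_sysctl() {
    path="${2:-/proc/sys}/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then cat "$path"; else echo "n/a"; fi
}

# Assumed set of keys worth reviewing; adjust for your own system.
for key in net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies; do
    printf '%s = %s\n' "$key" "$(read_sysctl "$key")"
done
```

The kernel caps the backlog an application requests at net.core.somaxconn. For Tomcat, the requested backlog comes from the acceptCount attribute of the Connector, so it is worth reviewing both sides together.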
You can find more info here:
If you run into performance degradation on an on-premises installation, please start the investigation at the low level (bare metal/virtualization, OS, Docker, etc.).
Otherwise, it’s quite an interesting journey in the forest of wonder.
I also recommend monitoring network parameters on the servers.
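One cheap check to start with is scanning the kernel log for the same warning that appeared in this story. The sketch below assumes the message format shown in the dmesg output above (other kernel versions may word it differently). The function name syn_flood_ports is made up for illustration.

```shell
# Extract the ports named in "Possible SYN flooding" kernel messages read from stdin.
# Prints each affected port once, in numeric order.
syn_flood_ports() {
    sed -n 's/.*Possible SYN flooding on port \([0-9][0-9]*\).*/\1/p' | sort -un
}

# Typical use (reading the kernel log may require root):
#   dmesg | syn_flood_ports
```

Any output from such a check could be fed into whatever alerting the servers already have, so the problem shows up before users notice it.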