A story about why on-premises installations should be reviewed at the OS level

 Hello, 

Today I would like to share a small story about improving Confluence performance for one large company. Their installation served more than 3.2k requests per minute, according to the daily stats from the front-end reverse proxy logs.

The architecture is quite simple: a reverse proxy (nginx), the Confluence app (Tomcat), and PostgreSQL as the RDBMS.


All of these services run on CentOS 7.

On the Confluence side, neither the Tomcat logs nor the application logs contained anything informative.

So I checked dmesg and found a simple hint: the kernel was reporting a possible SYN flood on the Confluence application port. Of course, it's quite crazy to see that situation in an organisation in the 2020s.

# dmesg | tail
[  734.711105] systemd[1]: Started Journal Service.
[1140053.637848] FS-Cache: Loaded
[1140053.662442] FS-Cache: Netfs 'cifs' registered for caching
[1140053.662535] Key type cifs.spnego registered
[1140053.662538] Key type cifs.idmap registered
[1140053.662889] Unable to determine destination address.
[1140083.610257] Unable to determine destination address.
[1179192.646345] TCP: request_sock_TCP: Possible SYN flooding on port 8090. Sending cookies.  Check SNMP counters.
[8486015.346237] DCCP: Activated CCID 2 (TCP-like)
[8486015.368881] sctp: Hash tables configured (bind 512/512)

 

After I raised the two kernel parameters below, which control the size of the TCP connection queues, all the problems went away.

net.ipv4.tcp_max_syn_backlog
net.core.somaxconn
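
For illustration, here is a minimal sketch of how such a change could be made persistent on CentOS 7. The file name and the value 4096 are my assumptions for the example, not the values used in this story; pick numbers that match your own traffic.

```conf
# /etc/sysctl.d/90-confluence-backlog.conf  (hypothetical file name)
# Illustrative values only; size them for your own load.

# Upper bound on half-open (SYN_RECV) connections queued per listening socket
net.ipv4.tcp_max_syn_backlog = 4096

# Upper bound on the backlog an application can request via listen()
net.core.somaxconn = 4096
```

Running `sysctl --system` as root loads the file without a reboot. Keep in mind that the kernel uses the smaller of net.core.somaxconn and the backlog the application itself passes to listen(), so the application's own setting (for Tomcat, the connector's acceptCount attribute) may need raising too, and the service has to be restarted to reopen its listening socket with the new limit.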

 

You can find more information here:

https://access.redhat.com/solutions/30453

 

Conclusion: 

If you run into performance degradation on an on-premises installation, please start the investigation from the low level (bare metal/virtualization, OS, Docker, etc.).

Otherwise, you are in for quite an interesting journey through the forest of wonder.

I also recommend monitoring network parameters on your servers.
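
As a starting point for that monitoring, here is a small sketch that reads the kernel's TcpExt counters related to listen queue pressure from /proc/net/netstat. The helper name tcpext_counter is hypothetical, and the choice of these three counters is my assumption about what is worth watching; in a real setup you would feed the same numbers into your monitoring system.

```shell
#!/bin/sh
# tcpext_counter NAME [FILE]
# Print the value of one TcpExt counter from a /proc/net/netstat-style file.
# The file holds pairs of lines: a header line with counter names,
# then a line with the values in the same column order.
tcpext_counter() {
    name=$1
    file=${2:-/proc/net/netstat}
    awk -v name="$name" '
        # First TcpExt line: find the column that holds the counter name
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            seen = 1
            next
        }
        # Second TcpExt line: print the value from that column, if found
        $1 == "TcpExt:" && seen {
            if (col) print $col
            exit
        }
    ' "$file"
}

# Counters that grow when the listen queues overflow or SYN cookies are sent
if [ -r /proc/net/netstat ]; then
    for c in SyncookiesSent ListenOverflows ListenDrops; do
        printf '%s=%s\n' "$c" "$(tcpext_counter "$c")"
    done
fi
```

These counters are cumulative since boot, so what matters is whether they keep growing between two readings, not their absolute value.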

 

Cheers,

Gonchik
