While most Linux kernels nowadays ship with sensible sysctl defaults, there’s always room for improvement. Some parameters can be used for performance tuning, others are critical for security hardening.
What is sysctl?
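sysctl is an interface to view and dynamically change kernel parameters in Linux and other *NIX operating systems. In Linux, most of the dynamic kernel settings can be changed via sysctl. The same parameters are also exposed as files under /proc/sys in the virtual /proc filesystem, with the dots in a parameter name replaced by slashes (net.ipv4.ip_forward becomes /proc/sys/net/ipv4/ip_forward).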
How do I use sysctl?
To read values you have two options:
# Option 1: Using the sysctl command to read current parameters:
sysctl net.ipv4.ip_forward   # display specific parameter
sysctl net.ipv4              # display all net.ipv4.* parameters
sysctl -a                    # display all parameters

# Option 2: Using the /proc filesystem:
cat /proc/sys/net/ipv4/ip_forward
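Because the two views are equivalent, the translation between them is mechanical: the dots in a parameter name become slashes below /proc/sys. Here is a small sketch of that mapping as a shell function. `sysctl_key_to_path` is a hypothetical helper, not part of sysctl itself. Leaving names that already contain a slash untouched is an assumption of mine, meant to cope with interface names that themselves contain dots (for example VLAN interfaces such as eth0.100).

```shell
#!/bin/sh
# Hypothetical helper: map a sysctl parameter name to its file under /proc/sys.
sysctl_key_to_path() (
    case $1 in
        */*) key=$1 ;;                              # already slash-separated: keep dots as-is
        *)   key=$(printf '%s' "$1" | tr . /) ;;    # dotted name: dots become slashes
    esac
    printf '/proc/sys/%s\n' "$key"
)

# Usage: read a value without the sysctl binary.
# cat "$(sysctl_key_to_path net.ipv4.ip_forward)"
```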
To write values you can again use either option (both require root privileges):
# Option 1: Using the sysctl command to change a parameter:
sysctl -w net.ipv4.ip_forward=1

# Option 2: Using the /proc filesystem to change a parameter:
echo 1 > /proc/sys/net/ipv4/ip_forward
However, these changes are not persistent. If you want them to survive a reboot, you have to configure them in /etc/sysctl.conf or in a file under /etc/sysctl.d/:
/etc/sysctl.conf
/etc/sysctl.d/
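Please note that configuration changes will not be detected automatically. You have to trigger the reload manually. Without a filename, sysctl -p reads /etc/sysctl.conf; sysctl --system loads all the standard configuration locations, including /etc/sysctl.d/:
sysctl -p [filename]
sysctl --system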
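If you manage several settings, keeping them in a dedicated drop-in file under /etc/sysctl.d/ is tidier than editing /etc/sysctl.conf. The sketch below writes `key=value` arguments to such a file. `persist_sysctl` and the `SYSCTL_DIR` override are hypothetical names of my own; the override only exists so the helper can be tried in a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysctl): write "key=value" arguments to a
# drop-in file. SYSCTL_DIR is an assumed override; it defaults to /etc/sysctl.d.
persist_sysctl() (
    file="${SYSCTL_DIR:-/etc/sysctl.d}/$1"
    shift
    : > "$file" || exit 1              # create or truncate the drop-in
    for setting in "$@"; do
        key=${setting%%=*}             # text before the first "="
        value=${setting#*=}            # text after the first "="
        printf '%s = %s\n' "$key" "$value" >> "$file"
    done
    printf '%s\n' "$file"              # report where the settings went
)

# Usage (as root): persist two settings, then load just that file.
# f=$(persist_sysctl 99-tuning.conf kernel.panic=60 net.ipv4.ip_forward=0)
# sysctl -p "$f"
```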
Tuning Linux with sysctl
Kernel
To automatically reboot a system after a kernel panic, set the following parameter to the number of seconds to wait before rebooting (0 means the kernel never reboots on its own):
kernel.panic = 60
Linux kernels provide a magic SysRq key, which allows the user to perform low-level commands regardless of the system’s state. To enable all of its functions, set (values greater than 1 are interpreted as a bitmask of allowed functions):
kernel.sysrq = 1
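Values other than 0 and 1 are read as a bitmask of allowed SysRq functions (2 = console log level, 4 = keyboard control, 8 = debugging dumps, 16 = sync, 32 = remount read-only, 64 = signalling processes, 128 = reboot/poweroff, 256 = nicing real-time tasks, following the kernel’s SysRq documentation). A hypothetical `sysrq_decode` function makes a given value readable; the short labels it prints are my own.

```shell
#!/bin/sh
# Hypothetical helper: list the SysRq function groups enabled by a kernel.sysrq value.
sysrq_decode() (
    v=$1
    if [ "$v" -eq 0 ]; then echo "disabled"; exit 0; fi
    if [ "$v" -eq 1 ]; then echo "all"; exit 0; fi
    for pair in 2:loglevel 4:keyboard 8:dumps 16:sync 32:remount-ro \
                64:signal 128:reboot 256:nice-rt; do
        bit=${pair%%:*}
        name=${pair#*:}
        # Print each function group whose bit is set in the mask.
        if [ $((v & bit)) -ne 0 ]; then echo "$name"; fi
    done
)

# Usage: sysrq_decode "$(cat /proc/sys/kernel/sysrq)"
```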
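Core dumps of setuid or otherwise privileged processes are suppressed by default. To have them written as well, set the following parameter (such dumps are then restricted to root, and kernel.core_pattern has to be an absolute path or a pipe handler). Note that the core file size limit (ulimit -c) must still allow core dumps: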
fs.suid_dumpable = 2
It can be useful to have the PID appended to the filename of core dumps, so that several crashing processes of the same application don’t overwrite each other’s dump. It’s easy to set up (it only takes effect if kernel.core_pattern doesn’t already contain %p):
kernel.core_uses_pid = 1
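To raise the maximum process ID, and with it the number of processes that can exist at the same time, define the following parameter (the traditional default is 32768; 64-bit systems allow up to 4194304):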
kernel.pid_max = 65536
Memory
To tune the memory (VM) behaviour in Linux, you can set some vm.* parameters.
For example, to control how aggressively the kernel swaps out anonymous memory pages instead of dropping pages from the filesystem cache, change the swappiness value (typically between 0 and 100). The higher the value, the more aggressive the swapping:
vm.swappiness
When you write to a filesystem, the data is first kept in the page cache as “dirty” pages before it is flushed to disk. The maximum amount of dirty memory is defined as a percentage of the available memory (free plus reclaimable pages):
vm.dirty_ratio = 40
When this percentage is reached, processes that write data are blocked until enough dirty pages have been flushed to disk by pdflush (since kernel 2.6.32, per-device flusher threads have taken over this job). This is quite suboptimal, because on a healthy system you don’t want blocked I/O writes at all. Therefore there’s another parameter, which defines the percentage of dirty memory at which the background flush starts writing out dirty pages:
vm.dirty_background_ratio = 10
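To get a feeling for what these percentages mean in bytes, you can multiply them out. This is only an approximation: the sketch assumes MemAvailable from /proc/meminfo as a stand-in for the kernel’s notion of available memory, whose exact calculation differs. `dirty_thresholds_kb` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: approximate dirty-page thresholds in kB.
# Assumption: $1 approximates the kernel's "available memory" (e.g. MemAvailable).
dirty_thresholds_kb() (
    avail_kb=$1 ratio=$2 bg_ratio=$3
    echo "background=$((avail_kb * bg_ratio / 100))"   # background flushing starts here
    echo "blocking=$((avail_kb * ratio / 100))"        # writers get blocked here
)

# Usage on a live system with the values above (40 and 10):
# avail=$(awk '$1 == "MemAvailable:" {print $2}' /proc/meminfo)
# dirty_thresholds_kb "$avail" 40 10
```

With 16 GiB (16777216 kB) available, background writeback would start at roughly 1.6 GiB of dirty data and writers would be blocked at roughly 6.4 GiB.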
pdflush wakes up periodically to write dirty pages to disk. You can optionally change this interval by setting the following parameter (in hundredths of a second, e.g. 500 = 5 s):
vm.dirty_writeback_centisecs = 500
Of course pdflush also needs to know when dirty data is old enough to be written out. Sometimes it makes sense to increase how long dirty data may stay in the cache before it counts as expired. Just overwrite the following parameter (again in hundredths of a second):
vm.dirty_expire_centisecs = 3000
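The two intervals work together: with 500 and 3000, the flusher wakes up every 5 seconds and writes out data that has been dirty for longer than 30 seconds. A hypothetical one-line helper converts the raw centisecond values into seconds:

```shell
#!/bin/sh
# Hypothetical helper: convert a *_centisecs value (hundredths of a second) to seconds.
centisecs_to_seconds() {
    awk -v c="$1" 'BEGIN { printf "%.2f\n", c / 100 }'
}

# Usage on a live system:
# centisecs_to_seconds "$(cat /proc/sys/vm/dirty_writeback_centisecs)"
# centisecs_to_seconds "$(cat /proc/sys/vm/dirty_expire_centisecs)"
```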
If you want more information about the memory on your system, just have a look at:
cat /proc/meminfo
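The fields most relevant to the writeback settings above are Dirty (data waiting to be written) and Writeback (data being written right now). Here’s a hypothetical helper that pulls a single field, in kB, out of meminfo-formatted input:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one /proc/meminfo field read from stdin.
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Usage: watch how much dirty data is waiting to be written back.
# meminfo_field Dirty < /proc/meminfo
# meminfo_field Writeback < /proc/meminfo
```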
Filesystem
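To increase the system-wide maximum number of open file handles, set the following parameter (the per-process limit is configured separately, e.g. with ulimit -n):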
fs.file-max = 65535
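To see how close a system is to that limit, /proc/sys/fs/file-nr reports three numbers: allocated file handles, allocated but unused handles, and the maximum (fs.file-max). A hypothetical helper that summarises it:

```shell
#!/bin/sh
# Hypothetical helper: summarise /proc/sys/fs/file-nr (allocated, unused, max) from stdin.
file_handle_usage() {
    read -r allocated unused max || return 1
    # Integer percentage of fs.file-max currently allocated.
    echo "allocated=$allocated max=$max used_pct=$((allocated * 100 / max))"
}

# Usage: file_handle_usage < /proc/sys/fs/file-nr
```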
Exec Shield
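Exec Shield is a protection against worms and other automated remote attacks on Linux systems. It was invented by Red Hat in 2002. Note that kernel.exec-shield only exists on kernels carrying Red Hat’s Exec Shield patch (such as older RHEL releases); mainline kernels reject it as an unknown key. Address space randomization is controlled by kernel.randomize_va_space: 1 randomizes the stack, mmap base, VDSO and shared libraries, while 2 additionally randomizes the heap and is the default on many current kernels. To enable Exec Shield: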
kernel.exec-shield = 1
kernel.randomize_va_space = 1
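To check what a given randomize_va_space value means on your system, here is a hypothetical helper whose descriptions follow the kernel’s sysctl documentation:

```shell
#!/bin/sh
# Hypothetical helper: describe a kernel.randomize_va_space value.
aslr_mode() {
    case $1 in
        0) echo "ASLR disabled" ;;
        1) echo "conservative: stack, mmap base, VDSO and shared libraries randomized" ;;
        2) echo "full: as 1, plus the heap (brk) randomized" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}

# Usage: aslr_mode "$(cat /proc/sys/kernel/randomize_va_space)"
```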
Network Core
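Some applications are tuned for performance and can make use of huge socket buffers. To increase the maximum buffer size an application may request for its sockets (via the SO_RCVBUF / SO_SNDBUF socket options), you can use the following parameters. TCP’s automatic buffer tuning is bounded separately by net.ipv4.tcp_rmem and net.ipv4.tcp_wmem (see below):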
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
When a system is under heavy load and an interface receives a lot of packets, the kernel might not process them fast enough, and packets that don’t fit into the input queue are dropped. You can increase the number of packets held in the queue (backlog) by changing:
net.core.netdev_max_backlog = 5000
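Whether the backlog actually overflows can be checked in /proc/net/softnet_stat: each row belongs to one CPU, the values are hexadecimal, and the second column counts packets dropped because the backlog queue was full. A hypothetical helper that sums that column:

```shell
#!/bin/sh
# Hypothetical helper: sum the "dropped" column (2nd, hex) of softnet_stat from stdin.
softnet_dropped_total() (
    total=0
    while read -r processed dropped rest; do
        total=$((total + 0x$dropped))   # hex column converted via arithmetic expansion
    done
    echo "$total"
)

# Usage: softnet_dropped_total < /proc/net/softnet_stat
```

If the total keeps growing under load, raising net.core.netdev_max_backlog may help.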
IPv4
First of all, we recommend tuning ICMP a bit. Ignoring ICMP echo requests sent to broadcast addresses prevents your system from being used to amplify ICMP floods (Smurf attacks). We also ignore bogus responses to broadcast frames (a violation of RFC 1122), so that our log isn’t full of kernel warnings:
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
SYN floods are a type of denial-of-service attack and can harm your system. To protect against them, you should enable SYN cookies, enlarge the SYN backlog (queue size) and reduce the number of SYN/ACK retries:
# Turn on SYN cookies to protect from SYN flood attacks.
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 3
To log packets with impossible addresses simply enable:
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
To disable IP source routing (SRR), so that nobody can tell us which path a packet should take:
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
By default, routers route everything, even packets which don’t belong to their network(s). To avoid that, we have to make sure strict reverse path filtering (value 1) is enabled as defined in RFC 3704; the value 2 selects a looser mode:
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
Some applications support higher read and write buffers for sockets. The buffer size parameters are defined by 3 values (min, default, max). To increase the maximum buffer set:
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
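A common rule of thumb for choosing the maximum (not something this article prescribes) is the bandwidth-delay product: the amount of data in flight on a path, i.e. bandwidth times round-trip time. A hypothetical helper for the arithmetic:

```shell
#!/bin/sh
# Hypothetical helper: bandwidth-delay product in bytes.
# $1 = bandwidth in Mbit/s, $2 = round-trip time in ms.
bdp_bytes() {
    # Mbit/s * 1e6 / 8 (bytes per second) * ms / 1000  ==  Mbit/s * ms * 125
    echo $(( $1 * $2 * 125 ))
}

# Example: a 1 Gbit/s path with 64 ms RTT
# bdp_bytes 1000 64   -> 8000000
```

That is close to the 8388608 bytes (8 MiB) used as the maximum above; paths with more bandwidth or latency would need a larger maximum.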
To get better throughput in a network, it might make sense to enable TCP window scaling as defined in RFC 1323 (since obsoleted by RFC 7323); current kernels enable it by default:
net.ipv4.tcp_window_scaling = 1
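Window scaling is needed because the TCP header only has a 16-bit window field (at most 65535 bytes). RFC 7323 lets both sides agree on a shift count of up to 14, so the effective window is the advertised value shifted left by that count. The kernel negotiates this itself; the hypothetical helper below merely illustrates the arithmetic by computing the smallest shift that covers a given buffer size:

```shell
#!/bin/sh
# Hypothetical helper: smallest window-scale shift (0..14) so that
# 65535 << shift reaches the given buffer size in bytes.
wscale_for() (
    buf=$1 shift_count=0
    while [ $((65535 << shift_count)) -lt "$buf" ] && [ "$shift_count" -lt 14 ]; do
        shift_count=$((shift_count + 1))
    done
    echo "$shift_count"
)

# wscale_for 8388608   -> 8  (65535 << 7 = 8388480 is just too small)
```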
Disable ICMP redirects entirely. Please note that send_redirects should stay enabled on routers:
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
# Don't disable send_redirects on routers!
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
Finally, disable IPv4 forwarding on non-routing systems:
net.ipv4.ip_forward = 0
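After applying a set of hardening settings, it is worth verifying that the running kernel actually uses them. The sketch below reads desired settings in sysctl.conf syntax from stdin and compares them with the files under /proc/sys. `sysctl_audit`, `sysctl_normalize` and the `SYSCTL_ROOT` override are hypothetical; the override only exists so the function can be tested against a fake directory tree, and trailing comments after a value are not supported.

```shell
#!/bin/sh
# Hypothetical helpers: compare desired "key = value" lines (stdin) with the
# live values under SYSCTL_ROOT (default /proc/sys). Prints one line per
# mismatch and exits non-zero if any were found.
sysctl_normalize() { tr '\t' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'; }

sysctl_audit() (
    root=${SYSCTL_ROOT:-/proc/sys}
    status=0
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        case $key in ''|'#'*|';'*) continue ;; esac     # blank line or comment
        want=$(printf '%s' "$want" | sysctl_normalize)
        file="$root/$(printf '%s' "$key" | tr . /)"
        if [ -r "$file" ]; then
            have=$(sysctl_normalize < "$file")          # multi-value files are tab-separated
        else
            have="(missing)"
        fi
        if [ "$have" != "$want" ]; then
            echo "$key: have '$have', want '$want'"
            status=1
        fi
    done
    exit "$status"
)

# Usage (the file name is just an example of your own drop-in):
# sysctl_audit < /etc/sysctl.d/99-hardening.conf
```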
IPv6
Those who don’t use IPv6 at all should disable it:
net.ipv6.conf.all.disable_ipv6 = 1
If you’re already using IPv6 you might be interested in the following parameters.
On non-routing systems you should disable router solicitations:
net.ipv6.conf.default.router_solicitations = 0
net.ipv6.conf.all.router_solicitations = 0
You should also not accept router preferences from router advertisements:
net.ipv6.conf.default.accept_ra_rtr_pref = 0
net.ipv6.conf.all.accept_ra_rtr_pref = 0
Don’t try to learn prefix information in router advertisements:
net.ipv6.conf.default.accept_ra_pinfo = 0
net.ipv6.conf.all.accept_ra_pinfo = 0
Don’t learn a default router from router advertisements:
net.ipv6.conf.default.accept_ra_defrtr = 0
net.ipv6.conf.all.accept_ra_defrtr = 0
Disable IPv6 auto configuration, so that no unicast addresses can automatically be configured on your interface from a router advertisement:
net.ipv6.conf.default.autoconf = 0
net.ipv6.conf.all.autoconf = 0
If you don’t want your system to send Duplicate Address Detection (DAD) probes, which are neighbour solicitations for its own addresses, set the number of DAD transmissions to 0:
net.ipv6.conf.default.dad_transmits = 0
net.ipv6.conf.all.dad_transmits = 0
Unless you need more than one global unicast address, you should limit the number of autoconfigured addresses per interface to 1:
net.ipv6.conf.default.max_addresses = 1
net.ipv6.conf.all.max_addresses = 1
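Since every IPv6 setting above is set twice, once for .default and once for .all, a small generator avoids typos when writing the configuration file. `ipv6_conf_lines` is a hypothetical helper that prints both variants for each parameter name, all with the same value:

```shell
#!/bin/sh
# Hypothetical helper: print ".default" and ".all" sysctl.conf lines for each
# net.ipv6.conf parameter given after the value.
ipv6_conf_lines() (
    value=$1
    shift
    for param in "$@"; do
        printf 'net.ipv6.conf.default.%s = %s\n' "$param" "$value"
        printf 'net.ipv6.conf.all.%s = %s\n' "$param" "$value"
    done
)

# Usage: reproduce the router advertisement settings above.
# ipv6_conf_lines 0 router_solicitations accept_ra_rtr_pref accept_ra_pinfo \
#     accept_ra_defrtr autoconf dad_transmits
# ipv6_conf_lines 1 max_addresses
```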
.all & .default
A lot of sysctl parameters have several variants, because there’s a .default, an .all and sometimes even an .<interface> value. While the .<interface> value is obvious, you have to look more closely at the other two.
According to a comment on the linux-kernel mailing list, there’s one major difference:
- The default value will only be applied ONCE, at the point when an interface is created.
- The all value will ALWAYS be applied in addition.
This means that when an interface is created, the default value is applied to it once. However, you can overwrite that with the interface-specific parameter. The global .all parameter is always applied in addition, and in the end the “final value” depends on the logical operator used to combine them.
For example, there are parameters where all settings need to be 1 (aka AND), where only one of the settings needs to be 1 (aka OR) or where the highest value is used (aka MAX).
So it’s important to know that existing interfaces might have a different value than the one you’ve set as default or all.
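The combination rules can be written down as a tiny function. Which operator applies is specific to each parameter; the kernel’s ip-sysctl documentation states, for example, that rp_filter uses the maximum of conf/all and conf/<interface>, that log_martians is enabled if either one is set (OR), and that IPv4 accept_source_route also requires conf/all to be set (AND). `effective_value` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: combine an ".all" value and a per-interface value
# with the given operator (and / or / max).
effective_value() (
    op=$1 all=$2 iface=$3
    case $op in
        and) if [ "$all" -ne 0 ] && [ "$iface" -ne 0 ]; then echo 1; else echo 0; fi ;;
        or)  if [ "$all" -ne 0 ] || [ "$iface" -ne 0 ]; then echo 1; else echo 0; fi ;;
        max) if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi ;;
        *)   echo "unknown operator: $op" >&2; exit 1 ;;
    esac
)

# rp_filter (MAX):           effective_value max 1 0  -> 1, strict despite iface=0
# log_martians (OR):         effective_value or 0 1   -> 1
# accept_source_route (AND): effective_value and 0 1  -> 0
```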
Where to find documentation?
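IMHO the best documentation for sysctl is available directly in the kernel docs. You can find them in the kernel.org git repository. Just have a look at the Documentation/networking/ip-sysctl.txt and Documentation/filesystems/proc.txt files (newer kernel trees ship them as ip-sysctl.rst and proc.rst, and document the remaining sysctls under Documentation/admin-guide/sysctl/).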
You can also have a look at the man pages on your Linux server. For example if you want to know more about the tcp settings you can run:
man 7 tcp