<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't. </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a>

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages. Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.
When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity). </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded. </p>

<li> <p> NOTE: To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages.
</p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
    (25) has reached its process limit "30": new clients may experience
    noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
    condition, increase the process count in master.cf or reduce the
    service time per client
</pre>

</ul>

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary. </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the
responsiveness for remote SMTP clients, as long as the server machine
has enough hardware and software resources to run the additional
processes, and as long as the file system can keep up with the
additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below). Either way, you need to issue a "postfix reload"
command to make the change effective. </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory.
You can reduce the Postfix
memory footprint by using cdb:
lookup tables instead of Berkeley DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes. When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out". </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective. </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays.
When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other). To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail. See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
 1 /etc/postfix/header_checks:
 2     if /^Subject:/
 3     /^Subject: virus found in mail from you/ reject
 4     /^Subject: ..other../ reject
 5     endif
 6
 7     if /^Received:/
 8     /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9     /^Received: from ..other../ reject ....
10     endif
</pre>

</ul>

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix. </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps. The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.
</p>

<li> <p> To hang up connections from blacklisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed. Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind. To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1 /etc/postfix/main.cf:
 2     smtpd_client_restrictions =
 3        permit_mynetworks
 4        reject_rbl_client zen.spamhaus.org=127.0.0.10
 5        reject_rbl_client zen.spamhaus.org=127.0.0.11
 6        reject_rbl_client zen.spamhaus.org
 7
 8     rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9
10 /etc/postfix/rbl_reply_maps:
11     # With Postfix 2.3-2.5 use "421" to hang up connections.
12     zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13        $rbl_class [$rbl_what] blocked using
14        $rbl_domain${rbl_reason?; $rbl_reason}
15
16     zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17        $rbl_class [$rbl_what] blocked using
18        $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so it does not affect the
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below.
</p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace
line 8 above with: </p>

<pre>
 8     rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the next section, "<a href="#adapt">Automatic stress-adaptive
behavior</a>", if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active addresses
whose owners didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100. This prevents clients from keeping idle connections open by
repeatedly sending NOOP or RSET commands. </p>

</ul>

<blockquote>
<pre>
1 /etc/postfix/main.cf:
2     smtpd_timeout = 10
3     smtpd_hard_error_limit = 1
4     smtpd_junk_command_limit = 1
</pre>
</blockquote>

<p> With these measures, no mail should be lost, as long
as these measures are used only temporarily. The next section of
this document introduces a way to automate this process. </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
This is also available as a source code patch for Postfix versions
2.4 and 2.3 from the mirrors listed at
http://www.postfix.org/download.html. </p>

<p> It works as follows. When a "public" network service such as
the SMTP server runs into an "all server ports are busy" condition,
the Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e. with an empty parameter
value): </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> Services that have local access only never have "-o stress"
parameters on the command line. This includes services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services. </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. With earlier Postfix versions
that have stress-adaptive support, append the lines below to the
main.cf file and issue a "postfix reload" command: </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?10}${stress:300}s
2 smtpd_hard_error_limit = ${stress?1}${stress:20}
3 smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active addresses
whose owners didn't bother to unsubscribe.
No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping idle connections open by repeatedly
sending NOOP or RSET commands. </p>

</ul>

<p> The syntax of ${name?value} and ${name:value} is explained at
the beginning of the postconf(5) manual page. </p>

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions. If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there. Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.
</p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen. The processes will run
as if the stress parameter always has an empty value. </p>

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     smtp      inet  n       -       n       -       -       smtpd
7         -o stress=yes
8         -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line. This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective. </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     #
6     submission inet n       -       n       -       -       smtpd
7         -o stress=
8         -o . . .
</pre>
</blockquote>

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> OpenBSD <a href="http://www.openbsd.org/spamd/">spamd</a>
implements a daemon that handles all connections from "new" clients.
Only well-behaved mail clients are allowed to talk to the mail
server. Other clients are tarpitted, and will never get a chance
to affect mail server performance. </p>

<p> At some point in the future, Postfix may come with a simple
front-end daemon that does basic greylisting and pipelining detection
to keep zombies and other ratware away from Postfix itself. This
would use the "pass" service type which has been available in
stable Postfix releases since Postfix 2.5. </p>

<h2><a name="credits"> Credits </a></h2>

<ul>

<li> Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li> The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li> Wietse implemented stress-adaptive behavior as the smallest
possible patch while he should be working on other things.

</ul>

</body> </html>