1<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
2        "http://www.w3.org/TR/html4/loose.dtd">
3
4<html>
5
6<head>
7
8<title>Postfix Stress-Dependent Configuration</title>
9
10<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
11
12</head>
13
14<body>
15
16<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
17Stress-Dependent Configuration</h1>
18
19<hr>
20
21<h2>Overview </h2>
22
23<p> This document describes the symptoms of Postfix SMTP server
24overload. It presents permanent main.cf changes to avoid overload
25during normal operation, and temporary main.cf changes to cope with
26an unexpected burst of mail. This document makes specific suggestions
27for Postfix 2.5 and later which support stress-adaptive behavior,
28and for earlier Postfix versions that don't.  </p>
29
30<p> Topics covered in this document: </p>
31
32<ul>
33
34<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a> 
35
36<li><a href="#concurrency"> Service more SMTP clients at the same time </a> 
37
38<li><a href="#time"> Spend less time per SMTP client </a>
39
40<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>
41
42<li><a href="#legacy"> Temporary measures for older Postfix releases </a>
43
44<li><a href="#adapt"> Automatic stress-adaptive behavior </a>
45
46<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>
47
48<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>
49
50<li><a href="#other"> Other measures to off-load zombies </a>
51
52<li><a href="#credits"> Credits </a>
53
54</ul>
55
56<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>
57
58<p> Under normal conditions, the Postfix SMTP server responds
59immediately when an SMTP client connects to it; the time to deliver
60mail is noticeable only with large messages.  Performance degrades
61dramatically when the number of SMTP clients exceeds the number of
62Postfix SMTP server processes.  When an SMTP client connects while
63all Postfix SMTP server processes are busy, the client must wait
64until a server process becomes available. </p>
65
66<p> SMTP server overload may be caused by a surge of legitimate
67mail (example: a DNS registrar opens a new zone for registrations),
68by mistake (mail explosion caused by a forwarding loop) or by malice
69(worm outbreak, botnet, or other illegitimate activity).  </p>
70
71<p> Symptoms of Postfix SMTP server overload are: </p>
72
73<ul>
74
75<li> <p> Remote SMTP clients experience a long delay before Postfix
76sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>
77
78<ul>
79
80<li> <p> NOTE: Broken DNS configurations can also cause lengthy
81delays before Postfix sends "220 hostname.example.com ...". These
82delays also exist when Postfix is NOT overloaded.  </p>
83
84<li> <p> NOTE:  To avoid "overload" delays for end-user mail
85clients, enable the "submission" service entry in master.cf (present
86since Postfix 2.1), and tell users to connect to this instead of
87the public SMTP service. </p>
88
89</ul>
90
91<li> <p> The Postfix SMTP server logs an increased number of "lost
92connection after CONNECT" events. This happens because remote SMTP
93clients disconnect before Postfix answers the connection. </p>
94
95<ul>
96
97<li> <p> NOTE: A portscan for open SMTP ports can also result in
98"lost connection ..." logfile messages. </p>
99
100</ul>
101
102<li> <p> Postfix 2.3 and later logs a warning that all server ports
103are busy: </p>
104
<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in master.cf or reduce the
 service time per client
</pre>
113
114</ul>
115
116<p> Legitimate mail that doesn't get through during an episode of
117Postfix SMTP server overload is not necessarily lost. It should
118still arrive once the situation returns to normal, as long as the
119overload condition is temporary.  </p>
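
<p> The "submission" service mentioned in the notes above is usually
already present, commented out, in the master.cf file that ships
with Postfix. As a rough sketch, an enabled entry could look like
the following. The two "-o" overrides are illustrative assumptions:
they presume that SASL authentication has already been set up, and
your site may need different or additional settings (for example,
mandatory TLS). Issue a "postfix reload" command to make the change
effective. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # Assumption: SASL authentication is already configured.
    submission inet n       -       n       -       -       smtpd
        -o smtpd_sasl_auth_enable=yes
        -o smtpd_client_restrictions=permit_sasl_authenticated,reject
</pre>
</blockquote>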
120
121<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>
122
123<p> One measure to avoid the "all server processes busy" condition
124is to service more SMTP clients simultaneously. For this you need
125to increase the number of Postfix SMTP server processes. This will
126improve the
127responsiveness for remote SMTP clients, as long as the server machine
128has enough hardware and software resources to run the additional
129processes, and as long as the file system can keep up with the
130additional load. </p>
131
132<ul>
133
134<li> <p> You increase the number of SMTP server processes either
135by increasing the default_process_limit in main.cf (line 3 below),
136or by increasing the SMTP server's "maxproc" field in master.cf
137(line 10 below).  Either way, you need to issue a "postfix reload"
138command to make the change effective.  </p>
139
140<li> <p> Process limits above 1000 require Postfix version 2.4 or
141later, and an operating system that supports kernel-based event
142filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
143</p>
144
145<li> <p> More processes use more memory. You can reduce the Postfix
146memory footprint by using cdb:
147lookup tables instead of Berkeley DB's hash: or btree: tables. </p>
148
<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>
161
162<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
163contain a mistake: they configure a fixed number of policy daemon
164processes.  When you raise the SMTP server's "maxproc" field in
165master.cf, SMTP server processes will report problems when connecting
166to policy server processes, because there aren't enough of them.
167Examples of errors are "connection refused" or "operation timed
168out".  </p>
169
170<p> To fix, edit master.cf and specify a zero "maxproc" field
171in all policy server entries; see line 6 in the example below.
172Issue a "postfix reload" command to make the change effective.  </p>
173
<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>
183
184</ul>
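
<p> As an illustration of the cdb: suggestion above, the following
sketch switches one lookup table from hash: to cdb:. It assumes that
your Postfix build includes CDB support (the output of "postconf -m"
should list "cdb"). The relay_recipient_maps parameter is only an
example, and the file name is hypothetical. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Before: relay_recipient_maps = hash:/etc/postfix/relay_recipients
    # Assumption: "postconf -m" lists cdb as a supported table type.
    relay_recipient_maps = cdb:/etc/postfix/relay_recipients
</pre>
</blockquote>

<p> After changing the table type, rebuild the table with "postmap
cdb:/etc/postfix/relay_recipients" and issue a "postfix reload"
command. </p>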
185
186<h2><a name="time"> Spend less time per SMTP client </a></h2>
187
188<p> When increasing the number of SMTP server processes is not
189practical, you can improve Postfix server responsiveness by eliminating
190delays.  When Postfix spends less time per SMTP session, the same
191number of SMTP server processes can service more clients in a given
192amount of time. </p>
193
194<ul>
195
196<li> <p> Eliminate non-functional RBL lookups (blocklists that are
197no longer in operation). These lookups can degrade performance.
198Postfix logs a warning when an RBL server does not respond. </p>
199
200<li> <p> Eliminate redundant RBL lookups (people often use multiple
201Spamhaus RBLs that include each other).  To find out whether RBLs
202include other RBLs, look up the websites that document the RBL's
203policies. </p>
204
<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>
211
<pre>
/etc/postfix/header_checks:
    if /^Subject:/
    /^Subject: virus found in mail from you/ reject
    /^Subject: ..other../ reject
    endif

    if /^Received:/
    /^Received: from (postfix\.org) / reject forged client name in received header: $1
    /^Received: from ..other../ reject ....
    endif
</pre>
224
225</ul>
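
<p> As a sketch of the "redundant RBL lookups" item above: at the
time of writing, zen.spamhaus.org combines several other Spamhaus
zones, including sbl.spamhaus.org and xbl.spamhaus.org, so listing
those zones alongside zen only adds DNS lookups. Check the RBL
operator's documentation before relying on this; details may have
changed. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    # Before (redundant; up to three RBL lookups per client):
    #   reject_rbl_client sbl.spamhaus.org
    #   reject_rbl_client xbl.spamhaus.org
    #   reject_rbl_client zen.spamhaus.org
    # After (one combined zone):
    smtpd_client_restrictions =
        permit_mynetworks
        reject_rbl_client zen.spamhaus.org
</pre>
</blockquote>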
226
227<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>
228
229<p> Under conditions of overload you can improve Postfix SMTP server
230responsiveness by hanging up on suspicious clients, so that other
231clients get a chance to talk to Postfix.  </p>
232
233<ul>
234
<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps.  The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.  </p>
241
242<li> <p> To hang up connections from blacklisted zombies, you can
243set specific Postfix SMTP server reject codes for specific RBLs,
244and for individual responses from specific RBLs. We'll use
245zen.spamhaus.org as an example; by the time you read this document,
246details may have changed.  Right now, their documents say that a
247response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
248address, which means that the machine is probably running a bot of
249some kind.  To give a 521 response instead of the default 554
250response, use something like: </p>
251
<pre>
 1  /etc/postfix/main.cf:
 2      smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7  
 8      rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9  
10  /etc/postfix/rbl_reply_maps:
11      # With Postfix 2.3-2.5 use "421" to hang up connections.
12      zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13       $rbl_class [$rbl_what] blocked using
14       $rbl_domain${rbl_reason?; $rbl_reason}
15  
16      zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17       $rbl_class [$rbl_what] blocked using
18       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>
272
273<p> Although the above example shows three RBL lookups (lines 4-6),
274Postfix will only do a single DNS query, so it does not affect the
275performance. </p>
276
277<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
278cause Postfix to disconnect). The down-side of replying with 421
279is that it works only for zombies and other malware. If the client
280is running a real MTA, then it may connect again several times until
281the mail expires in its queue. When this is a problem, stick with
282the default 554 reply, and use "smtpd_hard_error_limit = 1" as
283described below.  </p>
284
<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace line
8 above with: </p>

<pre>
 8      rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>
294
295</ul>
296
297<p> More information about automatic stress-adaptive behavior is
298in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
299</p>
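
<p> The first item in the list above also mentions non-RBL
restrictions such as SMTP access maps. The following is a minimal
sketch, assuming Postfix 2.6 or later. The table file name and the
client network (192.0.2.0/24, a range reserved for documentation)
are hypothetical; substitute the clients that you actually want to
hang up on. A cidr: table is read directly, so no "postmap" command
is needed. </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    smtpd_client_restrictions =
        permit_mynetworks
        check_client_access cidr:/etc/postfix/client_access.cidr

/etc/postfix/client_access.cidr:
    # Hypothetical entry. With Postfix 2.3-2.5 use "421 4.7.1" instead.
    192.0.2.0/24    521 5.7.1 Service unavailable
</pre>
</blockquote>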
300
301<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>
302
303<p> See the next section, "<a href="#adapt">Automatic stress-adaptive
304behavior</a>", if you are running Postfix version 2.5 or later, or
305if you have applied the source code patch for stress-adaptive
306behavior from the mirrors listed at http://www.postfix.org/download.html.
307</p>
308
309<p> The following measures can be applied temporarily during overload.
310They still allow <b>most</b> legitimate clients to connect and send
311mail, but may affect some legitimate clients. </p>
312
313<ul>
314
315<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
316postfix-users list from a variety of sysadmins shows that reducing
317the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
318clients. However, it is unlikely to become the Postfix default
319because it's not RFC compliant. Setting smtpd_timeout to 10s (line
3202 below) or even 5s under stress will still allow <b>most</b>
321legitimate clients to connect and send mail, but may delay mail
322from some clients.  No mail should be lost, as long as this measure
323is used only temporarily.  </p>
324
<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as mail from a mailing list that still has a few no-longer-active
subscribers who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>
332
333<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
334100. This prevents clients from keeping idle connections open by
335repeatedly sending NOOP or RSET commands. </p>
336
337</ul>
338
<blockquote>
<pre>
1  /etc/postfix/main.cf:
2      smtpd_timeout = 10
3      smtpd_hard_error_limit = 1
4      smtpd_junk_command_limit = 1
</pre>
</blockquote>
347
348<p> With these measures, no mail should be lost, as long
349as these measures are used only temporarily. The next section of
350this document introduces a way to automate this process. </p>
351
352<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>
353
354<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
355This is also available as a source code patch for Postfix versions
3562.4 and 2.3 from the mirrors listed at
357http://www.postfix.org/download.html.  </p>
358
359<p> It works as follows. When a "public" network service such as
360the SMTP server runs into an "all server ports are busy" condition,
361the Postfix master(8) daemon logs a warning, restarts the service
362(without interrupting existing network sessions), and runs the
363service with "-o stress=yes" on the server process command line:
364</p>
365
<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>
371
372<p> Normally, the Postfix master(8) daemon runs such a service with
373"-o stress=" on the command line (i.e.  with an empty parameter
374value):  </p>
375
<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>
381
382<p> Services that have local access only never have "-o stress"
383parameters on the command line. This includes services internal to
384Postfix such as the queue manager, and services that listen on a
385loopback interface only, such as after-filter SMTP services.  </p>
386
387<p> The "stress" parameter value is the key to making main.cf
388parameter settings stress adaptive. The following settings are the
389default with Postfix 2.6 and later. With earlier Postfix versions
390that have stress-adaptive support, append the lines below to the
391main.cf file and issue a "postfix reload" command: </p>
392
<blockquote>
<pre>
1 smtpd_timeout = ${stress?10}${stress:300}s
2 smtpd_hard_error_limit = ${stress?1}${stress:20}
3 smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>
400
<p> Translation: </p>
402
403<ul>
404
<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s (line 1 above) or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>
415
<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as mail from a mailing list that still has a few no-longer-active
subscribers who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>
423
424<li> <p> Line 3: under conditions of stress, use an
425smtpd_junk_command_limit of 1 instead of the default 100. This
426prevents clients from keeping idle connections open by repeatedly
427sending NOOP or RSET commands. </p>
428
429</ul>
430
431<p> The syntax of ${name?value} and ${name:value} is explained at
432the beginning of the postconf(5) manual page. </p>
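
<p> As a worked example, here is how the smtpd_timeout setting on
line 1 above evaluates for the two values that the master(8) daemon
passes on the command line. ${stress?10} expands to "10" only when
the stress parameter has a non-empty value, ${stress:300} expands
to "300" only when it is empty, and the trailing "s" is literal
text. </p>

<blockquote>
<pre>
With "-o stress=yes":  ${stress?10}${stress:300}s  expands to  10s
With "-o stress=":     ${stress?10}${stress:300}s  expands to  300s
</pre>
</blockquote>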
433
434<p> NOTE: Please keep in mind that the stress-adaptive feature is
435a fairly desperate measure to keep <b>some</b> legitimate mail
436flowing under overload conditions.  If a site is reaching the SMTP
437server process limit when there isn't an attack or bot flood
438occurring, then either the process limit needs to be raised or more
439hardware needs to be added.  </p>
440
441<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>
442
443<p> To find out if your Postfix installation supports stress-adaptive
444behavior, use the "ps" command, and look for the smtpd processes.
445Postfix has stress-adaptive support when you see "-o stress=" or
446"-o stress=yes" command-line options. Remember that Postfix never
447enables stress-adaptive behavior on servers that listen on local
448addresses only. </p>
449
450<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
451and other System-V flavors, use "ps -ef" instead of "ps ax". </p>
452
<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>
460
<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.  </p>
466
467<p> If you configure stress-adaptive behavior in main.cf when it
468isn't supported, nothing bad will happen.  The processes will run
469as if the stress parameter always has an empty value. </p>
470
471<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>
472
473<p> You can manually force stress-adaptive behavior on, by adding
474a "-o stress=yes" command-line option in master.cf. This can be
475useful for testing overrides on the SMTP service. Issue "postfix
476reload" to make the change effective.  </p>
477
478<p> Note: setting the stress parameter in main.cf has no effect for
479services that accept remote connections. </p>
480
<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    smtp      inet  n       -       n       -       -       smtpd
        -o stress=yes
        -o . . .
</pre>
</blockquote>
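
<p> If you would rather not force stress-adaptive behavior on the
public SMTP service while testing, one option is a temporary
additional SMTP service on a different port. This is only a sketch:
port 2525 is an arbitrary choice, and the entry should be removed
again after testing. Issue "postfix reload" to make the change
effective. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # Hypothetical test-only service; 2525 is an arbitrary port.
    2525      inet  n       -       n       -       -       smtpd
        -o stress=yes
</pre>
</blockquote>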
493
494<p> To permanently force stress-adaptive behavior off with a specific
495service, specify "-o stress=" on its master.cf command line.  This
496may be desirable for the "submission" service. Issue "postfix reload"
497to make the change effective.  </p>
498
499<p> Note: setting the stress parameter in main.cf has no effect for
500services that accept remote connections. </p>
501
<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    submission inet n       -       n       -       -       smtpd
        -o stress=
        -o . . .
</pre>
</blockquote>
514
515<h2><a name="other"> Other measures to off-load zombies </a> </h2>
516
517<p> OpenBSD <a href="http://www.openbsd.org/spamd/">spamd</a>
518implements a daemon that handles all connections from "new" clients.
519Only well-behaved mail clients are allowed to talk to the mail
520server. Other clients are tarpitted, and will never get a chance
521to affect mail server performance. </p>
522
523<p> At some point in the future, Postfix may come with a simple
524front-end daemon that does basic greylisting and pipelining detection
525to keep zombies and other ratware away from Postfix itself. This
526would use the "pass" service type which has been available in
527stable Postfix releases since Postfix 2.5. </p>
528
529<h2><a name="credits"> Credits </a></h2>
530
531<ul>
532
533<li>  Thanks to the postfix-users mailing list members for sharing
534early experiences with the stress-adaptive feature.
535
536<li>  The RBL example and several other paragraphs of text were
537adapted from postfix-users postings by Noel Jones.
538
539<li>  Wietse implemented stress-adaptive behavior as the smallest
540possible patch while he should be working on other things.
541
542</ul>
543
544</body> </html>
545