1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>Permanent Message Handling</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
9    <link rel="start" href="index.html" title="Getting Started with Replicated Berkeley DB Applications" />
10    <link rel="up" href="introduction.html" title="Chapter 1. Introduction" />
11    <link rel="prev" href="elections.html" title="Holding Elections" />
12    <link rel="next" href="txnapp.html" title="Chapter 2. Transactional Application" />
13  </head>
14  <body>
15    <div class="navheader">
16      <table width="100%" summary="Navigation header">
17        <tr>
18          <th colspan="3" align="center">Permanent Message Handling</th>
19        </tr>
20        <tr>
21          <td width="20%" align="left"><a accesskey="p" href="elections.html">Prev</a> </td>
22          <th width="60%" align="center">Chapter 1. Introduction</th>
23          <td width="20%" align="right"> <a accesskey="n" href="txnapp.html">Next</a></td>
24        </tr>
25      </table>
26      <hr />
27    </div>
28    <div class="sect1" lang="en" xml:lang="en">
29      <div class="titlepage">
30        <div>
31          <div>
32            <h2 class="title" style="clear: both"><a id="permmessages"></a>Permanent Message Handling</h2>
33          </div>
34        </div>
35      </div>
36      <div class="toc">
37        <dl>
38          <dt>
39            <span class="sect2">
40              <a href="permmessages.html#permmessagenot">When Not to Manage
41                            Permanent Messages</a>
42            </span>
43          </dt>
44          <dt>
45            <span class="sect2">
46              <a href="permmessages.html#permmanage">Managing Permanent Messages</a>
47            </span>
48          </dt>
49          <dt>
50            <span class="sect2">
51              <a href="permmessages.html#permimplement">Implementing Permanent
52                    Message Handling</a>
53            </span>
54          </dt>
55        </dl>
56      </div>
57      <p>
                Messages received by a replica may be marked with a
                special flag that indicates the message is permanent.
60                Custom replicated applications will receive notification of
61                this flag via the <code class="literal">DB_REP_ISPERM</code> return value
62                from the 
63                    <code class="methodname">DB_ENV-&gt;rep_process_message()</code>
64                    
65                method.
66                
67                There is no hard requirement that a replication application look for, or
68                respond to, this return code. However, because robust replicated
69                applications typically do manage permanent messages, we introduce 
70                the concept here. 
71            </p>
72      <p>
                    A message is marked as permanent if it affects
                    transactional integrity. Transaction commit messages,
                    for example, are marked permanent. What the application
                    does about a permanent message is driven by the
                    durability guarantees that the application requires.
79            </p>
80      <p>
                    For example, consider what the Replication Manager does when it
                    has permanent message handling turned on and a
                    transaction commit record is sent to the replicas.
                    First, each replica must commit the data
                    modifications identified by the message. Then, upon
                    a successful commit, the Replication Manager sends the master a
                    message acknowledgment.
88            </p>
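      <p>
                    The replica side of this exchange might be modeled in C roughly as
                    follows. This is an illustrative sketch under stated assumptions, not
                    DB code: the <code class="literal">process_result</code> values, the
                    <code class="literal">lsn</code> structure, and
                    <code class="function">send_ack_to_master()</code> are hypothetical
                    stand-ins, and the sketch assumes the message has already been applied
                    and committed before its result is examined.
            </p>

```c
#include <stdint.h>

/* Hypothetical result codes; in DB itself a permanent record is
 * signaled by the DB_REP_ISPERM return value. */
enum process_result { PROC_OK, PROC_ISPERM, PROC_ERROR };

/* Hypothetical log sequence number: log file number plus offset. */
struct lsn { uint32_t file; uint32_t offset; };

/* Demo transport: counts acknowledgments and remembers the last LSN. */
static int acks_sent = 0;
static struct lsn last_acked;

static void send_ack_to_master(struct lsn lsn) {
    acks_sent++;
    last_acked = lsn;
}

/* Called after a message has been processed (and, for a commit
 * record, committed). Only permanent messages are acknowledged. */
static void after_process(enum process_result r, struct lsn perm_lsn) {
    if (r == PROC_ISPERM)
        send_ack_to_master(perm_lsn);
}
```

      <p>
                    Acknowledging only permanent messages limits acknowledgment traffic
                    to the messages that matter for durability.
            </p>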
89      <p>
90                    For the master (again, using the Replication Manager), things are a little more complicated than
91                simple message acknowledgment.  Usually in a replicated
92                application, the master commits transactions
93                asynchronously; that is, the commit operation does not
94                block waiting for log data to be flushed to disk before
95                returning. So when a master is managing permanent
96                messages, it typically blocks the committing thread
97                immediately before <code class="methodname">commit()</code>
98                returns. The thread then waits for acknowledgments from
99                its replicas. If it receives enough acknowledgments, it
100                continues to operate as normal. 
101            </p>
102      <p>
                If the master does not
                receive message acknowledgments (or, more likely, does not receive
                <span class="emphasis"><em>enough</em></span> acknowledgments), the
                committing thread flushes its log data to disk and then
                continues operations as normal. The master application can
                do this because replicas that fail to handle a message, for
                whatever reason, will eventually catch up to the master. So
                by flushing the transaction logs to disk, the master
                ensures that the data modifications have reached
                stable storage in at least one location (its own disk).
113            </p>
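      <p>
                The master's decision described in the two paragraphs above can be
                sketched as a small C function. The names
                <code class="function">finish_commit()</code> and
                <code class="function">flush_log_to_disk()</code> are hypothetical, and
                the flush is only counted here, not performed; a real master would force
                its transaction log to stable storage at that point.
            </p>

```c
#include <stdbool.h>

/* Demo counter standing in for a real log flush. */
static int log_flushes = 0;

/* Hypothetical stand-in for forcing the master's log buffers to disk. */
static void flush_log_to_disk(void) { log_flushes++; }

/* Master side, after an asynchronous commit: if enough replicas
 * acknowledged, carry on; otherwise make the commit durable locally.
 * Either way the caller continues operating as normal. */
static bool finish_commit(int acks_received, int acks_required) {
    if (acks_received >= acks_required)
        return true;          /* durable on enough replicas */
    flush_log_to_disk();      /* fall back to stable storage on the master */
    return false;
}
```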
114      <div class="sect2" lang="en" xml:lang="en">
115        <div class="titlepage">
116          <div>
117            <div>
118              <h3 class="title"><a id="permmessagenot"></a>When Not to Manage
119                            Permanent Messages</h3>
120            </div>
121          </div>
122        </div>
123        <p>
                            There are two reasons why you might
                            choose not to manage permanent messages.
126                            In part, these go to why you are using
127                            replication in the first place.
128                    </p>
129        <p>
130                        One class of applications uses replication so that
131                        the application can improve transaction
                        throughput. Essentially, the application chooses a
                        reduced transactional durability guarantee so as to
                        avoid the overhead of the disk I/O required
135                        to flush transaction logs to disk. However, the
136                        application can then regain that durability
137                        guarantee to a certain degree by replicating the
138                        commit to some number of replicas.
139                    </p>
140        <p>
141                        Using replication to improve an application's
142                        transactional commit guarantee is called
143                        <span class="emphasis"><em>replicating to the network.</em></span>
144                    </p>
145        <p>
146                        In extreme cases where performance is of critical
147                        importance to the application, the master might
148                        choose to both use asynchronous commits
149                        <span class="emphasis"><em>and</em></span> decide not to wait for
                        message acknowledgments. In this case the master
                        is simply broadcasting its commit activities to its
                        replicas without waiting for any sort of reply. An
                        application like this might also choose to use
                        something other than TCP/IP for its network
                        communications, since that protocol involves a fair
                        amount of packet acknowledgment on its own. Of
                        course, this sort of application should also be
                        very sure about the reliability of both its network and
                        the machines that are hosting its replicas.
160                    </p>
161        <p>
162                            At the other extreme, there is a
163                            class of applications that use replication
                            purely to improve read performance. This sort
                            of application might choose to use synchronous
                            commits on the master because write
                            performance there is not critical. In any
                            case, this kind of application might not care
                            to know whether its replicas have received and
                            successfully handled permanent messages because
                            the primary storage location is assumed to be on
                            the master, not the replicas.
174                    </p>
175      </div>
176      <div class="sect2" lang="en" xml:lang="en">
177        <div class="titlepage">
178          <div>
179            <div>
180              <h3 class="title"><a id="permmanage"></a>Managing Permanent Messages</h3>
181            </div>
182          </div>
183        </div>
184        <p>
                            With the exception of a rare breed of
                            replicated applications, most masters need some
                            view of whether commits are occurring on
                            replicas as expected. At a minimum, this is because
                            masters will not flush their log buffers unless
                            they have reason to believe that permanent
                            messages have not been committed on the
                            replicas.
193                    </p>
194        <p>
195                        That said, it is important to remember that
196                        managing permanent messages involves a fair amount
197                        of network traffic. The messages must be sent to
198                        the replicas and the replicas must acknowledge
199                        them. This represents a performance overhead
200                        that can be worsened by congested networks or
201                        outright outages.
202                    </p>
203        <p>
                        Therefore, when managing permanent messages, you
                        must first decide how many of your replicas must
                        send acknowledgments before your master decides
207                        that all is well and it can continue normal
208                        operations. When making this decision, you could
209                        decide that <span class="emphasis"><em>all</em></span> replicas must
210                        send acknowledgments. But unless you have only one
211                        or two replicas, or you are replicating over a very
212                        fast and reliable network, this policy could prove
213                        very harmful to your application's performance.
214                    </p>
215        <p>
                        A common strategy, therefore, is to wait for
                        acknowledgments from a simple majority of replicas.
                        This ensures that commit activity has occurred on
                        enough machines that you can be reasonably certain
                        that data writes are preserved across your network.
221                    </p>
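        <p>
                        Assuming that only replicas (not the master) are counted, a
                        simple majority works out as in these hypothetical helpers:
                    </p>

```c
#include <stdbool.h>

/* Smallest number of acknowledgments that is strictly more than
 * half of `nreplicas`. With no replicas, nothing can be required. */
static int simple_majority(int nreplicas) {
    if (nreplicas <= 0)
        return 0;
    return nreplicas / 2 + 1;
}

/* True once `acks` acknowledgments form a simple majority. */
static bool majority_acknowledged(int acks, int nreplicas) {
    return acks >= simple_majority(nreplicas);
}
```

        <p>
                        For example, with five replicas the master waits for three
                        acknowledgments; with four replicas it also waits for three.
                    </p>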
222        <p>
223                        Remember that replicas that do not acknowledge a
224                        permanent message are not necessarily unable to
225                        perform the commit; it might be that network
226                        problems have simply resulted in a delay at the
227                        replica. In any case, the underlying DB
228                        replication code is written such that a replica that
229                        falls behind the master will eventually take action
230                        to catch up.
231                    </p>
232        <p>
233                            Depending on your application, it may be
234                            possible for you to code your permanent message
235                            handling such that acknowledgment must come
236                            from only one or two replicas. This is a
237                            particularly attractive strategy if you are
238                            closely managing which machines are eligible to
                            become masters. If you have one or
                            two machines designated to take over as master in the
                            event that the current master goes down, you
                            may want to receive acknowledgments only from
                            those specific machines.
244                    </p>
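        <p>
                            If acknowledgments should count only when they come from
                            designated master candidates, the master can filter incoming
                            acknowledgments by site. The integer site IDs and both functions
                            below are hypothetical; the sketch assumes each site acknowledges
                            a given message at most once.
                    </p>

```c
#include <stdbool.h>
#include <stddef.h>

/* True if `id` appears in the list of designated master candidates. */
static bool is_designated(int id, const int *designated, size_t ndesignated) {
    for (size_t i = 0; i < ndesignated; i++)
        if (designated[i] == id)
            return true;
    return false;
}

/* Counts acknowledgments, ignoring any from sites that are not
 * designated master candidates. */
static int count_designated_acks(const int *acked_ids, size_t nacked,
                                 const int *designated, size_t ndesignated) {
    int n = 0;
    for (size_t i = 0; i < nacked; i++)
        if (is_designated(acked_ids[i], designated, ndesignated))
            n++;
    return n;
}
```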
245        <p>
246                        Finally, beyond simple message acknowledgment, you
247                        also need to implement an acknowledgment timeout
248                        for your application. This timeout value is simply
249                        meant to ensure that your master does not hang
250                        indefinitely waiting for responses that will never
251                        come because a machine or router is down.
252                    </p>
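        <p>
                        One common way to implement such a timeout in C is a condition
                        variable with an absolute deadline, using POSIX
                        <code class="function">pthread_cond_timedwait()</code>. The
                        <code class="literal">ack_waiter</code> structure and its functions
                        are hypothetical; the sketch assumes one waiter per commit and that a
                        separate thread calls <code class="function">record_ack()</code> as
                        acknowledgments arrive.
                    </p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tracker shared by the committing thread and the
 * thread that receives acknowledgments. */
struct ack_waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             acks;   /* acknowledgments received so far */
};

static void ack_waiter_init(struct ack_waiter *w) {
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->acks = 0;
}

/* Called by the communications thread for each acknowledgment. */
static void record_ack(struct ack_waiter *w) {
    pthread_mutex_lock(&w->lock);
    w->acks++;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Blocks until `needed` acknowledgments arrive or `timeout_ms`
 * elapses. Returns true only if enough arrived in time. */
static bool wait_for_acks(struct ack_waiter *w, int needed, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default clock of timedwait */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    int rc = 0;
    /* Loop to absorb spurious wakeups; stop on ETIMEDOUT or error. */
    while (w->acks < needed && rc == 0)
        rc = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
    bool enough = w->acks >= needed;
    pthread_mutex_unlock(&w->lock);
    return enough;
}
```

        <p>
                        If <code class="function">wait_for_acks()</code> returns false, the
                        master would fall back to flushing its own log, as described earlier
                        in this section.
                    </p>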
253      </div>
254      <div class="sect2" lang="en" xml:lang="en">
255        <div class="titlepage">
256          <div>
257            <div>
258              <h3 class="title"><a id="permimplement"></a>Implementing Permanent
259                    Message Handling</h3>
260            </div>
261          </div>
262        </div>
263        <p>
264                            How you implement permanent message handling
265                            depends on which API you are using to implement
266                            replication. If you are using the Replication Manager, then
267                            permanent message handling is configured using
268                            policies that you specify to the framework. In
269                            this case, you can configure your application
270                            to:
271                   </p>
272        <div class="itemizedlist">
273          <ul type="disc">
274            <li>
275              <p>
276                                    Ignore permanent messages (the master
277                                    does not wait for acknowledgments). 
278                                   </p>
279            </li>
280            <li>
281              <p>
282                                           Require acknowledgments from a
283                                           quorum. A quorum is reached when
284                                           acknowledgments are received from the
285                                           minimum number of electable
286                                           peers needed to ensure that
287                                           the record remains durable if
288                                           an election is held. 
289                                   </p>
290              <p>
291                                       An <span class="emphasis"><em>electable peer</em></span> is any other
292                                       site that potentially can be elected master.
293                                   </p>
294              <p>
295                                           The goal here is to be
296                                           absolutely sure the record is
297                                           durable. The master wants to
298                                           hear from enough electable
                                          peers that they have
300                                           committed the record so that if
301                                           an election is held, the master
302                                           knows the record will exist even
303                                           if a new master is selected.
304                                   </p>
305              <p>
306                                           This is the default policy.
307                                   </p>
308            </li>
309            <li>
310              <p>
311                                     Require an acknowledgment from at least one replica. 
312                                   </p>
313            </li>
314            <li>
315              <p>
316                                           Require acknowledgments from
317                                           all replicas.
318                                   </p>
319            </li>
320            <li>
321              <p>
322                                      Require an acknowledgment from at least one electable peer.
323                                   </p>
324            </li>
325            <li>
326              <p>
327                                           Require acknowledgments from all electable peers.
328                                   </p>
329            </li>
330          </ul>
331        </div>
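        <p>
                        The following C sketch shows how each of the policies above could be
                        evaluated against the acknowledgments received so far. The enumeration
                        and structure names are hypothetical and are not the library's own
                        constants; the quorum size is taken as a precomputed input because its
                        exact calculation belongs to the Replication Manager.
                   </p>

```c
#include <stdbool.h>

/* Hypothetical names mirroring the policies listed above. */
enum ack_policy {
    ACKS_NONE,       /* do not wait for acknowledgments */
    ACKS_QUORUM,     /* enough electable peers for durability */
    ACKS_ONE,        /* at least one replica */
    ACKS_ALL,        /* every replica */
    ACKS_ONE_PEER,   /* at least one electable peer */
    ACKS_ALL_PEERS   /* every electable peer */
};

struct ack_counts {
    int replicas_acked;  /* replicas that acknowledged */
    int replicas_total;
    int peers_acked;     /* electable peers among them */
    int peers_total;
    int quorum;          /* peer acks required (assumed precomputed) */
};

static bool policy_satisfied(enum ack_policy p, const struct ack_counts *c) {
    switch (p) {
    case ACKS_NONE:      return true;
    case ACKS_QUORUM:    return c->peers_acked >= c->quorum;
    case ACKS_ONE:       return c->replicas_acked >= 1;
    case ACKS_ALL:       return c->replicas_acked >= c->replicas_total;
    case ACKS_ONE_PEER:  return c->peers_acked >= 1;
    case ACKS_ALL_PEERS: return c->peers_acked >= c->peers_total;
    }
    return false;  /* unknown policy: treat as not satisfied */
}
```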
332        <p>
333                        Note that the Replication Manager simply flushes its transaction
334                        logs and moves on if a permanent message is not
335                        sufficiently acknowledged.
336                   </p>
337        <p>
338                        For details on permanent message handling with the
339                        Replication Manager, see <a class="xref" href="fwrkpermmessage.html" title="Permanent Message Handling">Permanent Message Handling</a>.
340                   </p>
341        <p>
342                        If these policies are not sufficient for your
343                        needs, or if you want your application to take more
344                        corrective action than simply flushing log buffers
345                        in the event of an unsuccessful commit, then you
                        must implement replication using the Base APIs.
347                   </p>
348        <p>
349                        When using the Base APIs, messages are
                        sent from the master to its replicas using a
351                        <code class="function">send()</code> callback that you
352                        implement.  Note, however, that DB's replication 
353                        code automatically sets the permanent 
354                        flag for you where appropriate. 
355                   </p>
356        <p>
357                        If the <code class="function">send()</code> callback returns with a
358                        non-zero status, DB flushes the transaction log 
                        buffers for you. Therefore, your
                        <code class="function">send()</code> callback must block waiting
                        for acknowledgments from your replicas.
362                        As a part of implementing the
363                        <code class="function">send()</code> callback, you implement
364                        your permanent message handling policies. This
365                        means that you identify how many replicas must
366                        acknowledge the message before the callback can
367                        return <code class="literal">0</code>.  You must also
368                        implement the acknowledgment timeout, if any.
369                   </p>
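        <p>
                        A minimal model of such a blocking <code class="function">send()</code>
                        callback follows. It does not use DB's actual callback signature: the
                        <code class="literal">repl_channel</code> structure, the
                        <code class="literal">permanent</code> parameter (standing in for the
                        permanent flag that DB sets on the message), and the demo stubs are
                        all hypothetical.
                   </p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side channel; none of these names are DB's. */
struct repl_channel {
    int  (*broadcast)(const void *buf, size_t len);  /* 0 on success */
    int  (*await_acks)(int needed, long timeout_ms); /* acks received before timeout */
    int  acks_needed;                                 /* the application's policy */
    long ack_timeout_ms;                              /* the acknowledgment timeout */
};

/* Return 0 once the message is adequately handled; a non-zero return
 * for a permanent message causes DB to flush the log buffers. */
static int app_send(const struct repl_channel *ch,
                    const void *buf, size_t len, bool permanent) {
    if (ch->broadcast(buf, len) != 0)
        return 1;                       /* nothing was sent */
    if (!permanent)
        return 0;                       /* no acknowledgment required */
    int got = ch->await_acks(ch->acks_needed, ch->ack_timeout_ms);
    return got >= ch->acks_needed ? 0 : 1;
}

/* Demo stubs standing in for a real network layer. */
static int demo_acks_available = 0;

static int demo_broadcast(const void *buf, size_t len) {
    (void)buf;
    (void)len;
    return 0;
}

static int demo_await_acks(int needed, long timeout_ms) {
    (void)timeout_ms;
    return demo_acks_available < needed ? demo_acks_available : needed;
}
```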
370        <p>
371                        Further, message acknowledgments are sent from the
372                        replicas to the master using a communications
373                        channel that you implement (the replication code
374                        does not provide a channel for acknowledgments).
                        So implementing permanent messages means that when
                        you write your replication communications channel,
                        you must also design it to carry permanent message
                        acknowledgments.
379                   </p>
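        <p>
                        The acknowledgment channel might carry a small message containing the
                        replica's site ID and the log sequence number (LSN) it has committed.
                        The wire format, table, and functions below are hypothetical; the sketch
                        assumes that valid LSNs start at log file 1 (so an all-zero entry means
                        nothing has been acknowledged) and that acknowledging an LSN implies all
                        earlier records are also committed.
                   </p>

```c
#include <stdint.h>

/* Hypothetical acknowledgment message sent from replica to master. */
struct lsn { uint32_t file; uint32_t offset; };
struct ack_msg { int site_id; struct lsn lsn; };

static int lsn_cmp(struct lsn a, struct lsn b) {
    if (a.file != b.file)
        return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

#define MAX_SITES 16

/* Master-side table of the highest LSN each site has acknowledged. */
struct ack_table { struct lsn acked[MAX_SITES]; };

static void ack_table_init(struct ack_table *t) {
    for (int i = 0; i < MAX_SITES; i++)
        t->acked[i] = (struct lsn){0, 0};  /* nothing acknowledged yet */
}

static void ack_table_update(struct ack_table *t, const struct ack_msg *m) {
    if (m->site_id < 0 || m->site_id >= MAX_SITES)
        return;                             /* ignore unknown sites */
    if (lsn_cmp(m->lsn, t->acked[m->site_id]) > 0)
        t->acked[m->site_id] = m->lsn;      /* keep only the newest ack */
}

/* Number of sites that have acknowledged `target` or a later LSN. */
static int ack_table_count(const struct ack_table *t, struct lsn target) {
    int n = 0;
    for (int i = 0; i < MAX_SITES; i++)
        if (lsn_cmp(t->acked[i], target) >= 0)
            n++;
    return n;
}
```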
380        <p>
381                        For more information on implementing permanent
382                        message handling using a custom replication layer,
383                        see the <em class="citetitle">Berkeley DB Programmer's Reference Guide</em>.
384                   </p>
385      </div>
386    </div>
387    <div class="navfooter">
388      <hr />
389      <table width="100%" summary="Navigation footer">
390        <tr>
391          <td width="40%" align="left"><a accesskey="p" href="elections.html">Prev</a> </td>
392          <td width="20%" align="center">
393            <a accesskey="u" href="introduction.html">Up</a>
394          </td>
395          <td width="40%" align="right"> <a accesskey="n" href="txnapp.html">Next</a></td>
396        </tr>
397        <tr>
398          <td width="40%" align="left" valign="top">Holding Elections </td>
399          <td width="20%" align="center">
400            <a accesskey="h" href="index.html">Home</a>
401          </td>
402          <td width="40%" align="right" valign="top"> Chapter 2. Transactional Application</td>
403        </tr>
404      </table>
405    </div>
406  </body>
407</html>
408