<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Permanent Message Handling</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
    <link rel="home" href="index.html" title="Getting Started with Replicated Berkeley DB Applications" />
    <link rel="up" href="introduction.html" title="Chapter&#160;1.&#160;Introduction" />
    <link rel="previous" href="elections.html" title="Holding Elections" />
    <link rel="next" href="txnapp.html" title="Chapter&#160;2.&#160;Transactional Application" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Permanent Message Handling</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="elections.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;1.&#160;Introduction</th>
          <td width="20%" align="right">&#160;<a accesskey="n" href="txnapp.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="permmessages"></a>Permanent Message Handling</h2>
          </div>
        </div>
        <div></div>
      </div>
      <p>
                Messages received by a replica may be marked with a
                special flag that indicates the message is permanent.
                Custom replicated applications are notified of
                this flag by the <tt class="literal">DB_REP_ISPERM</tt> return value
                from the
                <tt class="methodname">DbEnv::rep_process_message()</tt>
                method.
                A replicated application is not required to look for, or
                respond to, this return code. However, because robust replicated
                applications typically do manage permanent messages, we introduce
                the concept here.
      </p>
      <p>
                A message is marked as permanent if it
                affects transactional integrity. Transaction commit
                messages, for example, are marked permanent. What the
                application does about a permanent message is driven by
                the durability guarantees that the application requires.
      </p>
      <p>
                For example, consider what the replication framework does when it
                has permanent message handling turned on and a
                transaction commit record is sent to the replicas.
                First, each replica must commit the data
                modifications identified by the message. Then, upon
                a successful commit, the replication framework on the
                replica sends the master a message acknowledgment.
      </p>
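      <p>
                The replica-side sequence described above (commit first, acknowledge
                second) can be sketched roughly as follows. This is an illustration
                only, not DB code: <tt class="literal">ProcessResult</tt> and
                <tt class="function">on_message_processed()</tt> are hypothetical
                names, and the enumerators merely stand in for the outcome of
                processing a message, of which the permanent case corresponds to
                <tt class="literal">DB_REP_ISPERM</tt>.
      </p>

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the outcome of processing one incoming message.
enum class ProcessResult {
    Applied,      // an ordinary message was processed
    IsPermanent,  // a permanent message was processed and committed locally
    Failed        // processing did not succeed
};

// Called on a replica after a message has been processed. An acknowledgment
// is sent to the master only when a permanent message was committed.
// Returns true if an acknowledgment was sent.
inline bool on_message_processed(ProcessResult result,
                                 const std::function<void()>& send_ack) {
    assert(send_ack && "send_ack must be callable");
    if (result != ProcessResult::IsPermanent) {
        return false;  // nothing to acknowledge
    }
    send_ack();
    return true;
}
```

      <p>
                The acknowledgment is deliberately sent only after the local commit
                has succeeded, so the master never counts a replica that has not
                actually applied the record.
      </p>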
      <p>
                For the master (again, using the replication framework), things
                are a little more complicated than
                simple message acknowledgment. Usually in a replicated
                application, the master commits transactions
                asynchronously; that is, the commit operation does not
                block waiting for log data to be flushed to disk before
                returning. So when a master is managing permanent
                messages, it typically blocks the committing thread
                immediately before <tt class="methodname">commit()</tt>
                returns. The thread then waits for acknowledgments from
                its replicas. If it receives enough acknowledgments, it
                continues to operate as normal.
      </p>
      <p>
                If the master does not
                receive message acknowledgments (or, more likely, it does not receive
                <span class="emphasis"><em>enough</em></span> acknowledgments), the
                committing thread flushes its log data to disk and then
                continues operations as normal. The master application can
                do this because replicas that fail to handle a message, for
                whatever reason, will eventually catch up to the master. By
                flushing the transaction logs to disk, the master
                ensures that the data modifications have reached
                stable storage in at least one location (its own hard drive).
      </p>
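      <p>
                The master's decision at the end of a commit can be summarized in a
                few lines. The sketch below is a simplified model under stated
                assumptions, not the framework's implementation:
                <tt class="literal">CommitOutcome</tt>,
                <tt class="function">finish_commit()</tt>, and the
                <tt class="literal">flush_log</tt> callback are hypothetical names,
                and the acknowledgment count is assumed to have been gathered
                already.
      </p>

```cpp
#include <cassert>
#include <functional>

// How a commit was made durable (hypothetical names for illustration).
enum class CommitOutcome {
    AcknowledgedByReplicas,  // enough replicas confirmed the commit
    FlushedLocally           // too few acknowledgments; log forced to local disk
};

// Decides how the committing thread proceeds once it stops waiting.
// If too few acknowledgments arrived, the log is flushed so that the
// data is on stable storage in at least one place.
inline CommitOutcome finish_commit(int acks_received, int acks_required,
                                   const std::function<void()>& flush_log) {
    assert(acks_required >= 0);
    if (acks_received >= acks_required) {
        return CommitOutcome::AcknowledgedByReplicas;
    }
    flush_log();
    return CommitOutcome::FlushedLocally;
}
```

      <p>
                Either way the committing thread continues afterward; the only
                difference is whether durability came from the network or from the
                master's own disk.
      </p>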
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permmessagenot"></a>When Not to Manage
                            Permanent Messages</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
                        There are two reasons why you might
                        choose not to implement permanent message handling.
                        In part, these depend on why you are using
                        replication in the first place.
        </p>
        <p>
                        One class of applications uses replication to
                        improve transaction
                        throughput. Essentially, the application chooses a
                        reduced transactional durability guarantee so as to
                        avoid the overhead of the disk I/O required
                        to flush transaction logs to disk. However, the
                        application can then regain that durability
                        guarantee to a certain degree by replicating the
                        commit to some number of replicas.
        </p>
        <p>
                        Using replication to improve an application's
                        transactional commit guarantee is called
                        <span class="emphasis"><em>replicating to the network.</em></span>
        </p>
        <p>
                        In extreme cases where performance is of critical
                        importance to the application, the master might
                        choose both to use asynchronous commits
                        <span class="emphasis"><em>and</em></span> not to wait for
                        message acknowledgments. In this case the master
                        is simply broadcasting its commit activities to its
                        replicas without waiting for any sort of reply. An
                        application like this might also choose to use
                        something other than TCP/IP for its network
                        communications, since that protocol involves a fair
                        amount of packet acknowledgment all on its own. Of
                        course, this sort of application should also be
                        very sure about the reliability of both its network and
                        the machines that are hosting its replicas.
        </p>
        <p>
                        At the other extreme, there is a
                        class of applications that use replication
                        purely to improve read performance. This sort
                        of application might choose to use synchronous
                        commits on the master because write
                        performance there is not of critical
                        importance. In any case, this kind of
                        application might not care to know whether its
                        replicas have received and successfully handled
                        permanent messages, because the primary storage
                        location is assumed to be on the master, not
                        the replicas.
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permmanage"></a>Managing Permanent Messages</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
                        With the exception of a rare breed of
                        replicated applications, most masters need some
                        indication of whether commits are occurring on
                        replicas as expected. At a minimum, this is because
                        masters will not flush their log buffers unless
                        they have reason to believe that permanent
                        messages have not been committed on the
                        replicas.
        </p>
        <p>
                        That said, it is important to remember that
                        managing permanent messages involves a fair amount
                        of network traffic. The messages must be sent to
                        the replicas and the replicas must then acknowledge
                        the message. This represents a performance overhead
                        that can be worsened by congested networks or
                        outright outages.
        </p>
        <p>
                        Therefore, when managing permanent messages, you
                        must first decide how many of your replicas must
                        send acknowledgments before your master decides
                        that all is well and it can continue normal
                        operations. When making this decision, you could
                        decide that <span class="emphasis"><em>all</em></span> replicas must
                        send acknowledgments. But unless you have only one
                        or two replicas, or you are replicating over a very
                        fast and reliable network, this policy could prove
                        very harmful to your application's performance.
        </p>
        <p>
                        For this reason, a common strategy is to wait for an
                        acknowledgment from a simple majority of replicas.
                        This ensures that commit activity has occurred on
                        enough machines that you can be reasonably certain
                        that data writes are preserved across your network.
        </p>
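        <p>
                        As a worked example of the simple-majority rule: with five
                        replicas the master waits for three acknowledgments, and with
                        four replicas it also waits for three. A minimal sketch of that
                        arithmetic follows; the function name
                        <tt class="function">simple_majority()</tt> is hypothetical.
        </p>

```cpp
#include <cassert>

// Smallest number of acknowledgments that is more than half of the
// replicas. With no replicas there is nothing to wait for.
constexpr int simple_majority(int replica_count) {
    return replica_count <= 0 ? 0 : replica_count / 2 + 1;
}
```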
        <p>
                        Remember that replicas that do not acknowledge a
                        permanent message are not necessarily unable to
                        perform the commit; it might be that network
                        problems have simply resulted in a delay at the
                        replica. In any case, the underlying DB
                        replication code is written such that a replica that
                        falls behind the master will eventually take action
                        to catch up.
        </p>
        <p>
                        Depending on your application, it may be
                        possible for you to code your permanent message
                        handling such that acknowledgment must come
                        from only one or two replicas. This is a
                        particularly attractive strategy if you are
                        closely managing which machines are eligible to
                        become masters. Assuming that you have one or
                        two machines designated to be a master in the
                        event that the current master goes down, you
                        may only want to receive acknowledgments from
                        those specific machines.
        </p>
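        <p>
                        One way to express a policy that depends on specific machines
                        is to compare the set of sites that have acknowledged against
                        the set of designated standby masters. The sketch below assumes
                        sites are identified by strings; the function name and the site
                        names used with it are hypothetical.
        </p>

```cpp
#include <cassert>
#include <set>
#include <string>

// True when every designated site appears among the sites that have
// acknowledged. Acknowledgments from other sites are simply ignored.
inline bool designated_sites_acknowledged(
        const std::set<std::string>& designated,
        const std::set<std::string>& acknowledged) {
    for (const std::string& site : designated) {
        if (acknowledged.count(site) == 0) {
            return false;  // a designated site has not answered yet
        }
    }
    return true;
}
```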
        <p>
                        Finally, beyond simple message acknowledgment, you
                        also need to implement an acknowledgment timeout
                        for your application. This timeout value is simply
                        meant to ensure that your master does not hang
                        indefinitely waiting for responses that will never
                        come because a machine or router is down.
        </p>
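        <p>
                        A bounded wait of this kind is commonly built on a condition
                        variable. The following is a minimal sketch under that
                        assumption, using only the C++ standard library; the class
                        <tt class="literal">AckWaiter</tt> and its methods are
                        hypothetical and are not part of DB.
        </p>

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Counts acknowledgments and lets a committing thread wait for them,
// but never longer than a caller-supplied timeout.
class AckWaiter {
public:
    // Called by whichever thread receives acknowledgments from replicas.
    void record_ack() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++acks_;
        }
        changed_.notify_all();
    }

    // Blocks until at least `required` acknowledgments have arrived or the
    // timeout expires. Returns true if enough arrived in time.
    bool wait_for_acks(int required, std::chrono::milliseconds timeout) {
        assert(required >= 0);
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout,
                                 [&] { return acks_ >= required; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    int acks_ = 0;
};
```

        <p>
                        A <tt class="literal">false</tt> return here is the signal that
                        the master should stop waiting and fall back to flushing its
                        own log.
        </p>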
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permimplement"></a>Implementing Permanent
                    Message Handling</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
                        How you implement permanent message handling
                        depends on which API you are using to implement
                        replication. If you are using the replication framework, then
                        permanent message handling is configured using
                        policies that you specify to the framework. In
                        this case, you can configure your application
                        to:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                        Ignore permanent messages (the master
                        does not wait for acknowledgments).
              </p>
            </li>
            <li>
              <p>
                        Require acknowledgments from a
                        quorum. A quorum is reached when
                        acknowledgments are received from the
                        minimum number of electable
                        replicas needed to ensure that
                        the record remains durable if
                        an election is held.
              </p>
              <p>
                        The goal here is to be
                        absolutely sure the record is
                        durable. The master wants to
                        hear from enough electable
                        replicas that they have
                        committed the record so that, if
                        an election is held, the master
                        knows the record will exist even
                        if a new master is selected.
              </p>
              <p>
                        This is the default policy.
              </p>
            </li>
            <li>
              <p>
                        Require an acknowledgment from at least one replica.
              </p>
            </li>
            <li>
              <p>
                        Require acknowledgments from
                        all replicas.
              </p>
            </li>
            <li>
              <p>
                        Require an acknowledgment from a
                        peer. (The replication framework allows you to
                        designate one environment as a peer of
                        another.)
              </p>
            </li>
            <li>
              <p>
                        Require acknowledgments from
                        all peers.
              </p>
            </li>
          </ul>
        </div>
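        <p>
                        To make the differences between these policies concrete, the
                        sketch below models each one as a test against a set of
                        acknowledgment counts. It is a simplified model, not the
                        framework's code: the enumerators are hypothetical stand-ins
                        rather than the framework's own constants, and the quorum size
                        is supplied by the caller because the text above defines it
                        only qualitatively.
        </p>

```cpp
#include <cassert>

// Hypothetical names mirroring the six policies listed above.
enum class AckPolicy { None, Quorum, One, All, OnePeer, AllPeers };

// Snapshot of what the master knows when it stops waiting.
struct AckCounts {
    int acks = 0;            // acknowledgments from any replica
    int peer_acks = 0;       // acknowledgments from peers
    int electable_acks = 0;  // acknowledgments from electable replicas
    int replicas = 0;        // total number of replicas
    int peers = 0;           // total number of peers
    int quorum = 0;          // electable acknowledgments needed (assumed input)
};

// True when the received acknowledgments satisfy the chosen policy.
inline bool policy_satisfied(AckPolicy policy, const AckCounts& c) {
    switch (policy) {
        case AckPolicy::None:     return true;
        case AckPolicy::Quorum:   return c.electable_acks >= c.quorum;
        case AckPolicy::One:      return c.acks >= 1;
        case AckPolicy::All:      return c.acks >= c.replicas;
        case AckPolicy::OnePeer:  return c.peer_acks >= 1;
        case AckPolicy::AllPeers: return c.peer_acks >= c.peers;
    }
    return false;  // unreachable for valid enumerators
}
```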
        <p>
                        Note that the replication framework simply flushes its transaction
                        logs and moves on if a permanent message is not
                        sufficiently acknowledged.
        </p>
        <p>
                        For details on permanent message handling with the
                        replication framework, see <a href="fwrkpermmessage.html">Permanent Message Handling</a>.
        </p>
        <p>
                        If these policies are not sufficient for your
                        needs, or if you want your application to take more
                        corrective action than simply flushing log buffers
                        in the event of an unsuccessful commit, then you
                        must write a custom replication implementation.
        </p>
        <p>
                        In a custom replication implementation, messages are
                        sent from the master to its replicas using a
                        <tt class="function">send()</tt> callback that you
                        implement. Note, however, that DB's replication
                        code automatically sets the permanent
                        flag for you where appropriate.
        </p>
        <p>
                        If the <tt class="function">send()</tt> callback returns a
                        non-zero status, DB flushes the transaction log
                        buffers for you. Therefore, your
                        <tt class="function">send()</tt> callback must block while it
                        waits for acknowledgments from your replicas.
                        As a part of implementing the
                        <tt class="function">send()</tt> callback, you implement
                        your permanent message handling policies. This
                        means that you decide how many replicas must
                        acknowledge the message before the callback can
                        return <tt class="literal">0</tt>. You must also
                        implement the acknowledgment timeout, if any.
        </p>
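        <p>
                        The return-value contract described above can be sketched as
                        follows. This is not the real callback signature, which is
                        documented in the DB API reference; the
                        <tt class="literal">Transport</tt> type, the
                        <tt class="literal">kPermanent</tt> flag bit, and
                        <tt class="function">send_message()</tt> are hypothetical
                        stand-ins chosen so that the example is self-contained.
        </p>

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical flag bit standing in for the permanent marker that DB sets.
constexpr std::uint32_t kPermanent = 0x1;

// Hypothetical communications layer supplied by the application.
struct Transport {
    // Sends the payload to every replica; returns false on transport failure.
    std::function<bool(const std::string&)> broadcast;
    // Waits up to the timeout and returns the number of acknowledgments seen.
    std::function<int(std::chrono::milliseconds)> collect_acks;
};

// Returns 0 when the message was sent and, if it is permanent, sufficiently
// acknowledged. Returns non-zero otherwise, which per the text above causes
// the log buffers to be flushed.
inline int send_message(const Transport& transport, const std::string& payload,
                        std::uint32_t flags, int acks_required,
                        std::chrono::milliseconds ack_timeout) {
    if (!transport.broadcast(payload)) {
        return 1;  // could not send at all
    }
    if ((flags & kPermanent) == 0) {
        return 0;  // ordinary message: no acknowledgment needed
    }
    const int acks = transport.collect_acks(ack_timeout);
    return acks >= acks_required ? 0 : 1;
}
```

        <p>
                        Keeping the policy (<tt class="literal">acks_required</tt>) and
                        the timeout as parameters means the same callback body can
                        serve whichever acknowledgment policy the application chooses.
        </p>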
        <p>
                        Further, message acknowledgments are sent from the
                        replicas to the master using a communications
                        channel that you implement (the replication code
                        does not provide a channel for acknowledgments).
                        So implementing permanent messages means that when
                        you write your replication communications channel,
                        you must write it in such a way that it also
                        handles permanent message acknowledgments.
        </p>
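        <p>
                        On the master, the receiving end of such a channel needs some
                        way to match acknowledgments to the message being waited on.
                        The sketch below shows one possible bookkeeping scheme. It
                        assumes, purely for illustration, that each acknowledgment
                        carries a replica identifier and a monotonically increasing
                        sequence number starting at 1, and that acknowledging a later
                        message implies all earlier ones;
                        <tt class="literal">Ack</tt> and
                        <tt class="literal">AckLedger</tt> are hypothetical names.
        </p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical acknowledgment as it might arrive over the application's channel.
struct Ack {
    int replica_id;
    std::uint64_t sequence;  // identifies the newest message the replica committed
};

// Remembers the newest sequence number acknowledged by each replica.
class AckLedger {
public:
    // Records an acknowledgment; stale or reordered ones never move a
    // replica's position backward.
    void record(const Ack& ack) {
        std::uint64_t& highest = highest_acked_[ack.replica_id];
        if (ack.sequence > highest) {
            highest = ack.sequence;
        }
    }

    // Number of replicas known to have committed the given message.
    int replicas_covering(std::uint64_t sequence) const {
        int count = 0;
        for (const auto& entry : highest_acked_) {
            if (entry.second >= sequence) {
                ++count;
            }
        }
        return count;
    }

private:
    std::map<int, std::uint64_t> highest_acked_;
};
```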
        <p>
                        For more information on implementing permanent
                        message handling using a custom replication layer,
                        see the <i class="citetitle">Berkeley DB Programmer's Reference Guide</i>.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="elections.html">Prev</a>&#160;</td>
          <td width="20%" align="center">
            <a accesskey="u" href="introduction.html">Up</a>
          </td>
          <td width="40%" align="right">&#160;<a accesskey="n" href="txnapp.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Holding Elections&#160;</td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top">&#160;Chapter&#160;2.&#160;Transactional Application</td>
        </tr>
      </table>
    </div>
  </body>
</html>