<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Permanent Message Handling</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
    <link rel="home" href="index.html" title="Getting Started with Replicated Berkeley DB Applications" />
    <link rel="up" href="introduction.html" title="Chapter&#160;1.&#160;Introduction" />
    <link rel="previous" href="elections.html" title="Holding Elections" />
    <link rel="next" href="txnapp.html" title="Chapter&#160;2.&#160;Transactional Application" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Permanent Message Handling</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="elections.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;1.&#160;Introduction</th>
          <td width="20%" align="right">&#160;<a accesskey="n" href="txnapp.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="permmessages"></a>Permanent Message Handling</h2>
          </div>
        </div>
        <div></div>
      </div>
      <p>
        Messages received by a replica may be marked with a
        special flag that indicates the message is permanent.
        Custom replicated applications receive notification of
        this flag via the <tt class="literal">DB_REP_ISPERM</tt> return value
        from the message processing method.
        There is no hard requirement that a replicated application look for, or
        respond to, this return code.
        However, because robust replicated
        applications typically do manage permanent messages, we introduce
        the concept here.
      </p>
      <p>
        A message is marked as permanent if it
        affects transactional integrity. For example,
        transaction commit messages are marked permanent.
        What the application does
        about a permanent message is driven by the durability
        guarantees that the application requires.
      </p>
      <p>
        Consider what the replication framework does when it
        has permanent message handling turned on and a
        transactional commit record is sent to the replicas.
        First, the replicas must commit the data
        modifications identified by the message. Then, upon
        a successful commit, the replication framework sends the master a
        message acknowledgment.
      </p>
      <p>
        For the master (again, using the replication framework), things are a little more complicated than
        simple message acknowledgment. Usually in a replicated
        application, the master commits transactions
        asynchronously; that is, the commit operation does not
        block waiting for log data to be flushed to disk before
        returning. So when a master is managing permanent
        messages, it typically blocks the committing thread
        immediately before <tt class="methodname">commit()</tt>
        returns. The thread then waits for acknowledgments from
        its replicas. If it receives enough acknowledgments, it
        continues to operate as normal.
      </p>
      <p>
        If the master does not
        receive message acknowledgments (or, more likely, it does not receive
        <span class="emphasis"><em>enough</em></span> acknowledgments), the
        committing thread flushes its log data to disk and then
        continues operations as normal. The master application can
        do this because replicas that fail to handle a message, for
        whatever reason, will eventually catch up to the master.
        So
        by flushing the transaction logs to disk, the master
        ensures that the data modifications have made it to
        stable storage in at least one location (its own hard drive).
      </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permmessagenot"></a>When Not to Manage
              Permanent Messages</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
          There are two reasons why you might
          choose not to manage permanent messages.
          Both depend in part on why you are using
          replication in the first place.
        </p>
        <p>
          One class of applications uses replication
          to improve transaction
          throughput. Essentially, the application chooses a
          reduced transactional durability guarantee so as to
          avoid the overhead of the disk I/O required
          to flush transaction logs to disk. However, the
          application can then regain that durability
          guarantee to a certain degree by replicating the
          commit to some number of replicas.
        </p>
        <p>
          Using replication to improve an application's
          transactional commit guarantee is called
          <span class="emphasis"><em>replicating to the network.</em></span>
        </p>
        <p>
          In extreme cases where performance is of critical
          importance to the application, the master might
          choose both to use asynchronous commits
          <span class="emphasis"><em>and</em></span> not to wait for
          message acknowledgments. In this case the master
          is simply broadcasting its commit activities to its
          replicas without waiting for any sort of reply. An
          application like this might also choose to use
          something other than TCP/IP for its network
          communications, since that protocol involves a fair
          amount of packet acknowledgment all on its own.
          Of
          course, this sort of application should also be
          very sure about the reliability of both its network and
          the machines that are hosting its replicas.
        </p>
        <p>
          At the other extreme, there is a
          class of applications that use replication
          purely to improve read performance. This sort
          of application might choose to use synchronous
          commits on the master because write
          performance there is not of critical
          importance. In any case, this kind of
          application might not care to know whether its
          replicas have received and successfully handled
          permanent messages because the primary storage
          location is assumed to be on the master, not
          the replicas.
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permmanage"></a>Managing Permanent Messages</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
          With the exception of a rare breed of
          replicated applications, most masters need some
          view as to whether commits are occurring on
          replicas as expected. At a minimum, this is because
          masters will not flush their log buffers unless
          they have reason to expect that permanent
          messages have not been committed on the
          replicas.
        </p>
        <p>
          That said, it is important to remember that
          managing permanent messages involves a fair amount
          of network traffic. The messages must be sent to
          the replicas and the replicas must then acknowledge
          each message. This represents a performance overhead
          that can be worsened by congested networks or
          outright outages.
        </p>
        <p>
          Therefore, when managing permanent messages, you
          must first decide how many of your replicas must
          send acknowledgments before your master decides
          that all is well and it can continue normal
          operations.
          When making this decision, you could
          decide that <span class="emphasis"><em>all</em></span> replicas must
          send acknowledgments. But unless you have only one
          or two replicas, or you are replicating over a very
          fast and reliable network, this policy could prove
          very harmful to your application's performance.
        </p>
        <p>
          For this reason, a common strategy is to wait for an
          acknowledgment from a simple majority of replicas.
          This ensures that commit activity has occurred on
          enough machines that you can be reasonably certain
          that data writes are preserved across your network.
        </p>
        <p>
          Remember that replicas that do not acknowledge a
          permanent message are not necessarily unable to
          perform the commit; it might be that network
          problems have simply resulted in a delay at the
          replica. In any case, the underlying DB
          replication code is written such that a replica that
          falls behind the master will eventually take action
          to catch up.
        </p>
        <p>
          Depending on your application, it may be
          possible for you to code your permanent message
          handling such that acknowledgment must come
          from only one or two replicas. This is a
          particularly attractive strategy if you are
          closely managing which machines are eligible to
          become masters. Assuming that you have one or
          two machines designated to become the master in the
          event that the current master goes down, you
          may only want to receive acknowledgments from
          those specific machines.
        </p>
        <p>
          Finally, beyond simple message acknowledgment, you
          also need to implement an acknowledgment timeout
          for your application. This timeout value is simply
          meant to ensure that your master does not hang
          indefinitely waiting for responses that will never
          come because a machine or router is down.
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="permimplement"></a>Implementing Permanent
              Message Handling</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
          How you implement permanent message handling
          depends on which API you are using to implement
          replication. If you are using the replication framework, then
          permanent message handling is configured using
          policies that you specify to the framework. In
          this case, you can configure your application
          to:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                Ignore permanent messages (the master
                does not wait for acknowledgments).
              </p>
            </li>
            <li>
              <p>
                Require acknowledgments from a
                quorum. A quorum is reached when
                acknowledgments are received from the
                minimum number of electable
                replicas needed to ensure that
                the record remains durable if
                an election is held.
              </p>
              <p>
                The goal here is to be
                absolutely sure the record is
                durable. The master wants to
                hear from enough electable
                replicas that they have
                committed the record so that if
                an election is held, the master
                knows the record will exist even
                if a new master is selected.
              </p>
              <p>
                This is the default policy.
              </p>
            </li>
            <li>
              <p>
                Require an acknowledgment from at least one replica.
              </p>
            </li>
            <li>
              <p>
                Require acknowledgments from
                all replicas.
              </p>
            </li>
            <li>
              <p>
                Require an acknowledgment from a
                peer. (The replication framework allows you to
                designate one environment as a peer of
                another).
              </p>
            </li>
            <li>
              <p>
                Require acknowledgments from
                all peers.
              </p>
            </li>
          </ul>
        </div>
        <p>
          Note that the replication framework simply flushes its transaction
          logs and moves on if a permanent message is not
          sufficiently acknowledged.
        </p>
        <p>
          For details on permanent message handling with the
          replication framework, see <a href="fwrkpermmessage.html">Permanent Message Handling</a>.
        </p>
        <p>
          If these policies are not sufficient for your
          needs, or if you want your application to take more
          corrective action than simply flushing log buffers
          in the event of an unsuccessful commit, then you
          must write a custom replication implementation.
        </p>
        <p>
          In a custom replication implementation, messages are
          sent from the master to its replicas using a
          <tt class="function">send()</tt> callback that you
          implement. Note, however, that DB's replication
          code automatically sets the permanent
          flag for you where appropriate.
        </p>
        <p>
          If the <tt class="function">send()</tt> callback returns with a
          non-zero status, DB flushes the transaction log
          buffers for you. Therefore, you must cause your
          <tt class="function">send()</tt> callback to block waiting
          for acknowledgments from your replicas.
          As a part of implementing the
          <tt class="function">send()</tt> callback, you implement
          your permanent message handling policies. This
          means that you identify how many replicas must
          acknowledge the message before the callback can
          return <tt class="literal">0</tt>. You must also
          implement the acknowledgment timeout, if any.
        </p>
        <p>
          Further, message acknowledgments are sent from the
          replicas to the master using a communications
          channel that you implement (the replication code
          does not provide a channel for acknowledgments).
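        </p>
        <p>
          The blocking and timeout behavior described above can be
          sketched independently of the DB API. The following Python
          sketch is illustrative only, not the DB implementation:
          <tt class="literal">AckTracker</tt>,
          <tt class="literal">send_permanent</tt>,
          <tt class="literal">majority</tt> and the
          <tt class="literal">transmit</tt> argument are hypothetical
          names, and it assumes that your own communications channel
          calls <tt class="literal">record_ack</tt> whenever a replica
          acknowledges the message. It models a callback that returns
          <tt class="literal">0</tt> only when enough distinct replicas
          acknowledge before the timeout expires.
        </p>
<pre class="programlisting">
```python
import threading


class AckTracker:
    """Counts acknowledgments for one permanent message (hypothetical helper)."""

    def __init__(self, required_acks):
        self._required = required_acks
        self._received = set()  # distinct replica ids that have acknowledged
        self._cond = threading.Condition()

    def record_ack(self, replica_id):
        """Called by the application's own channel when a replica acknowledges."""
        with self._cond:
            self._received.add(replica_id)
            self._cond.notify_all()

    def wait_for_acks(self, timeout_seconds):
        """Block until enough replicas acknowledge or the timeout expires."""
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._received) >= self._required,
                timeout=timeout_seconds,
            )


def majority(replica_count):
    """Smallest number of replicas that forms a simple majority."""
    return replica_count // 2 + 1


def send_permanent(tracker, transmit, timeout_seconds):
    """Model of a send() callback handling a permanent message.

    Returns 0 when enough acknowledgments arrive in time, and a non-zero
    status otherwise (the case in which DB flushes its log buffers).
    """
    transmit()  # assumed: writes the message to the replicas
    return 0 if tracker.wait_for_acks(timeout_seconds) else 1
```
</pre>
        <p>
          With three replicas and a majority policy, this sketch
          requires two distinct acknowledgments. A repeated
          acknowledgment from the same replica is counted only once.
        </p>
        <p>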
          So implementing permanent message handling means that when
          you write your replication communications channel,
          you must write it in such a way that it also
          handles permanent message acknowledgments.
        </p>
        <p>
          For more information on implementing permanent
          message handling using a custom replication layer,
          see the <i class="citetitle">Berkeley DB Programmer's Reference Guide</i>.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="elections.html">Prev</a>&#160;</td>
          <td width="20%" align="center">
            <a accesskey="u" href="introduction.html">Up</a>
          </td>
          <td width="40%" align="right">&#160;<a accesskey="n" href="txnapp.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Holding Elections&#160;</td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top">&#160;Chapter&#160;2.&#160;Transactional Application</td>
        </tr>
      </table>
    </div>
  </body>
</html>