<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Designing Your Application for Recovery</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
<link rel="home" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />
<link rel="up" href="filemanagement.html" title="Chapter 5. Managing DB Files" />
<link rel="previous" href="recovery.html" title="Recovery Procedures" />
<link rel="next" href="hotfailover.html" title="Using Hot Failovers" />
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Designing Your Application for Recovery</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
<th width="60%" align="center">Chapter 5. Managing DB Files</th>
<td width="20%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>
</div>
</div>
<div></div>
</div>
<p>
When building your DB application, you should consider how you will run recovery. If you are building a
single-threaded, single-process application, it is fairly simple to run recovery when your application first
opens its environment.
In this case, you need only decide whether to run recovery every time your application opens
its environment (recommended) or only some of the time, presumably triggered by a startup option
controlled by your application's user.
</p>
<p>
However, for multi-threaded and multi-process applications, you need to carefully design your
application's startup code so that recovery runs only when it makes sense to do so.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>
</div>
</div>
<div></div>
</div>
<p>
If your application uses only one environment handle, then handling recovery for a multi-threaded
application is no more difficult than for a single-threaded application. You simply open the environment
in the application's main thread, and then pass that handle to each of the threads that will be
performing DB operations. We illustrate this with our final example in this book (see
<a href="txnexample_c.html">Transaction Example</a>
for more information).
</p>
<p>
Alternatively, you can have each worker thread open its own environment handle. However, in this case,
designing for recovery is more complicated.
</p>
<p>
Generally, when a thread performing database operations fails
or hangs, it is best to simply
restart the application and run recovery upon application
startup as normal. However, not all applications can afford
to restart because a single thread has misbehaved.
</p>
<p>
If you are attempting to continue operations in the face of a misbehaving thread,
then, at a minimum, recovery must be run whenever a thread performing database operations fails or hangs.
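</p>
<p>
As an illustration of the detection side of such a design, the following C sketch assumes that each
worker thread records a heartbeat timestamp whenever it completes a unit of database work, and that a
supervising thread treats a worker whose heartbeat is older than some timeout as hung. A worker that dies
without reporting a clean exit also stops updating its heartbeat, so it is reported as well. The
<tt class="literal">worker_slot</tt> structure and the <tt class="function">worker_slot_init()</tt>,
<tt class="function">worker_heartbeat()</tt>, <tt class="function">worker_finished()</tt>, and
<tt class="function">find_stale_worker()</tt> functions are hypothetical helpers, not part of the DB API,
and the timeout value is an application choice.
</p>

```c
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-worker bookkeeping shared with the supervising thread.
 * Atomics let workers and the supervisor access it without a mutex. */
struct worker_slot {
    atomic_llong last_heartbeat; /* time of the worker's last completed unit of work */
    atomic_int finished;         /* 1 once the worker has exited cleanly */
};

/* Call once for each slot, before the corresponding worker thread starts. */
void worker_slot_init(struct worker_slot *slot, time_t now)
{
    atomic_init(&slot->last_heartbeat, (long long)now);
    atomic_init(&slot->finished, 0);
}

/* Called by the worker after each completed unit of database work. */
void worker_heartbeat(struct worker_slot *slot, time_t now)
{
    atomic_store(&slot->last_heartbeat, (long long)now);
}

/* Called by the worker just before it exits normally. */
void worker_finished(struct worker_slot *slot)
{
    atomic_store(&slot->finished, 1);
}

/* Called periodically by the supervising thread. Returns the index of the
 * first unfinished worker whose last heartbeat is more than `timeout`
 * seconds old (hung, or died without calling worker_finished()), or -1
 * if every worker looks healthy. */
int find_stale_worker(struct worker_slot *slots, size_t n,
                      time_t now, time_t timeout)
{
    for (size_t i = 0; i < n; i++) {
        if (atomic_load(&slots[i].finished))
            continue;
        long long age = (long long)now - atomic_load(&slots[i].last_heartbeat);
        if (age > (long long)timeout)
            return (int)i;
    }
    return -1;
}
```

<p>
The sketch only decides that recovery is needed; it does not run recovery itself. When
<tt class="function">find_stale_worker()</tt> returns a non-negative index, the supervising thread still
has to suspend the other workers, discard the environment handles, and reopen them with recovery, as
described in the steps that follow.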
</p>
<p>
Remember that recovery clears the environment of all
outstanding locks, including any that might be outstanding
from a failed thread. If these locks are not cleared,
other threads performing database operations can back up
behind the locks obtained but never released by the failed
thread. The result will be an application that hangs
indefinitely.
</p>
<p>
To run recovery under these circumstances:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
Suspend or shut down all other threads performing
database operations.
</p>
</li>
<li>
<p>
Discard any open environment handles. Note that
attempting to gracefully close these handles may be
asking for trouble; the close can fail if the
environment is already in need of recovery. For
this reason, it is best and easiest to simply discard the handles.
</p>
</li>
<li>
<p>
Open new handles, running recovery as you open
them.
See <a href="recovery.html#normalrecovery">Normal Recovery</a> for more information.
</p>
</li>
<li>
<p>
Restart all your database threads.
</p>
</li>
</ol>
</div>
<p>
A traditional way to handle this activity is to spawn a watcher thread that is responsible for making
sure all is well with your threads, and for performing the above steps if not.
</p>
<p>
However, in the case where each worker thread opens and maintains its own environment handle, recovery
is complicated for two reasons:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
For some applications and workloads, it might be
worthwhile to give your database threads the
ability to gracefully finalize any ongoing
transactions. If this is the case, your
code must be capable of signaling each thread
to halt DB activities and close its
environment.
If you simply run recovery against the
environment, your database threads will
detect this and fail in the midst of performing their
database operations.
</p>
</li>
<li>
<p>
Your code must be capable of ensuring that only one
thread runs recovery before allowing all other
threads to open their respective environment
handles. Recovery should be single-threaded because
running recovery against an environment deletes and then
recreates that environment. This will cause all
other processes and threads to "fail" when they
attempt operations against the newly recovered
environment. If all threads run recovery
when they start up, then it is likely that some
threads will fail because the environment that they
are using has been recovered. Each such thread must then re-execute its own recovery
path. At best, this is inefficient; at worst, it could cause your application to fall into an
endless recovery cycle.
</p>
</li>
</ol>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>
</div>
</div>
<div></div>
</div>
<p>
Frequently, DB applications use multiple processes to interact with the databases. For example, you may
have a long-running process, such as some kind of server, and then a series of administrative tools that
you use to inspect and administer the underlying databases. Or, in some web-based architectures, different
services are run as independent processes that are managed by the server.
</p>
<p>
In any case, recovery for a multi-process environment is complicated for two reasons:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
In the event that recovery must be run, you might
want to notify processes interacting with the environment
that recovery is about to occur and give them a
chance to terminate gracefully. Whether it is
worthwhile for you to do this depends entirely
upon the nature of your application. Some
long-running applications with multiple processes
performing meaningful work might want to do this.
Other applications, whose processes perform database
operations that are likely to be harmed by error conditions in other
processes, will likely find it not worth the
effort. For this latter group, the chances of
performing a graceful shutdown may be low anyway.
</p>
</li>
<li>
<p>
Unlike in single-process scenarios, it can quickly become wasteful for every process interacting
with the databases to run recovery when it starts up. This is partly because recovery
<span class="emphasis"><em>does</em></span> take some amount of time to run, but mostly because you want to
avoid a situation where your server must
reopen all its environment handles just because you fire up a command-line database
administrative utility that always runs recovery.
</p>
</li>
</ol>
</div>
<p>
DB offers two methods by which you can manage recovery for multi-process DB applications.
Each has different strengths and weaknesses; they are described in the following sections.
</p>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>
</div>
</div>
<div></div>
</div>
<p>
Before continuing, it is worth noting that the following sections describe recovery processes that
can result in one process running recovery while other processes are actively performing
database operations.
</p>
<p>
When this happens, the current database operation in those other processes will
fail abnormally, indicating a <tt class="literal">DB_RUNRECOVERY</tt> condition.
This means that your application should immediately abandon any database operations that it may have
ongoing, discard any environment handles it has opened, and obtain and open new handles.
</p>
<p>
The net effect of this is that any writes performed by unresolved transactions will be lost.
For persistent applications (servers, for example), the services they provide will also be
unavailable for the time it takes to complete recovery and for all participating
processes to reopen their environment handles.
</p>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="db_register"></a>Process Registration</h4>
</div>
</div>
<div></div>
</div>
<p>
One way to handle multi-process recovery is for every process to "register" its environment. In
doing so, the process gains the ability to see whether any other processes are using the
environment and, if so, whether they have suffered an abnormal termination. If an abnormal
termination is detected, the process runs recovery; otherwise, it does not.
</p>
<p>
Note that using process registration also ensures that
recovery is serialized across applications. That is,
only one process at a time has a chance to run
recovery.
Generally this means that the first process
to start up will run recovery, and all other processes
will silently skip recovery because it is not
needed.
</p>
<p>
To cause your application to register its environment, you specify
<span>
the <tt class="literal">DB_REGISTER</tt> flag when you open your environment.
To have DB actually run recovery when registration detects that it is needed, you must also specify
<tt class="literal">DB_RECOVER</tt> or <tt class="literal">DB_RECOVER_FATAL</tt> for your environment open.
</span>
If DB determines during the open that recovery must be run, the flag you specified determines the type of
recovery that is run. If you specify neither type of recovery, then no recovery is run even if
the registration process identifies a need for it. In this case, the environment open simply
fails by
<span>returning <tt class="literal">DB_RUNRECOVERY</tt>.</span>
</p>
<p>
Be aware that there are some limitations and requirements if you want your various processes to
coordinate recovery using this registration process:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
There can be only one environment handle per
environment per process. In the case of multi-threaded
processes, the environment handle must be shared across threads.
</p>
</li>
<li>
<p>
All processes sharing the environment must use registration. If registration is
not used uniformly across all participating processes, then you can see inconsistent results
in terms of your application's ability to recognize that recovery must be run.
</p>
</li>
<li>
<p>
You cannot use this mechanism with the <tt class="methodname">failchk()</tt>
mechanism
described in the next section.
</p>
</li>
</ol>
</div>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="failchk"></a>Failure Checking</h4>
</div>
</div>
<div></div>
</div>
<p>
For very large and robust multi-process applications, the most common way to ensure that all the
processes are working as intended is to use a watchdog process. To assist a watchdog
process, DB offers a failure checking mechanism.
</p>
<p>
When a thread of control fails with open environment handles, resources may be left
locked or corrupted. Other threads of control may encounter these unavailable resources
quickly or not at all, depending on data access patterns.
</p>
<p>
In any case, the DB failure checking mechanism allows a watchdog to detect whether an environment is
unusable as a result of a thread of control failure. It should be called periodically
(for example, once a minute) from the watchdog process. If the environment is deemed unusable, then
the watchdog process is notified that recovery should be run. It is then up to the watchdog to
actually run recovery. It is also the watchdog's responsibility to decide what to do about currently
running processes before running recovery. The watchdog could, for example, attempt to
gracefully shut down or kill all relevant processes before running recovery.
</p>
<p>
Note that failure checking need not be run from a separate process, although conceptually that is
how the mechanism is meant to be used. The same mechanism could be used in a multi-threaded
application that wants to have a watchdog thread.
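</p>
<p>
The control flow of such a watchdog can be sketched in C under some assumptions. In the following
sketch, <tt class="function">run_watchdog()</tt>, its callback types, and the <tt class="literal">demo_*</tt>
stand-ins are hypothetical and are not part of the DB API. The <tt class="literal">check</tt> callback is
where a real watchdog would call <tt class="methodname">DB_ENV->failchk()</tt> and test for
<tt class="literal">DB_RUNRECOVERY</tt>, and the <tt class="literal">recover</tt> callback is where it would
shut down or kill the participating processes (or threads) and reopen the environment with recovery.
</p>

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback types; not part of the DB API.
 * check:   returns true if the environment needs recovery (for example,
 *          because a DB_ENV->failchk() call returned DB_RUNRECOVERY).
 * recover: stops or kills the other participants, then reopens the
 *          environment with recovery. */
typedef bool (*env_check_fn)(void *ctx);
typedef void (*env_recover_fn)(void *ctx);

/* Performs up to max_rounds failure checks (max_rounds < 0 means run
 * forever), sleeping interval_secs seconds after each one. Returns the
 * number of times recovery was triggered. */
int run_watchdog(env_check_fn check, env_recover_fn recover, void *ctx,
                 unsigned interval_secs, int max_rounds)
{
    int recoveries = 0;
    int round = 0;
    while (max_rounds < 0 || round < max_rounds) {
        if (check(ctx)) {
            /* Deciding what to do about running processes is the
             * watchdog's job, so that happens inside recover(). */
            recover(ctx);
            recoveries++;
        }
        if (interval_secs > 0)
            sleep(interval_secs);
        if (max_rounds >= 0)
            round++;
    }
    return recoveries;
}

/* Demonstration stand-ins only: the environment "fails" on check number
 * fail_at, and recovery is merely counted. */
struct demo_env {
    int checks;
    int fail_at;
    int recoveries;
};

bool demo_check(void *ctx)
{
    struct demo_env *env = ctx;
    return ++env->checks == env->fail_at;
}

void demo_recover(void *ctx)
{
    struct demo_env *env = ctx;
    env->recoveries++;
}
```

<p>
In production, the interval would be something like the once-a-minute period suggested above, and
<tt class="literal">max_rounds</tt> would be negative so that the loop runs for the life of the watchdog.
The demonstration stand-ins simply let the loop be exercised without a real environment.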
</p>
<p>
To use failure checking, you must:
</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
<span>
Provide an <tt class="function">is_alive()</tt> callback using the
<tt class="methodname">DB_ENV->set_isalive()</tt>
method.
</span>
DB uses this callback to determine whether a specified process and thread
is alive when failure checking is performed.
</p>
</li>
<li>
<p>
Possibly provide a
<span>
<tt class="literal">thread_id</tt> callback
</span>
that uniquely identifies a process
and thread of control. This
<span>callback</span>
is only necessary if the standard process and thread
identification functions for your platform are not sufficient for use by failure
checking. This is rarely necessary; when it is, it is usually because the thread or process IDs
used by your system cannot fit into an unsigned integer.
</p>
<p>
You provide this callback using the
<tt class="methodname">DB_ENV->set_thread_id()</tt>
method. See the API reference for this method for more information on when setting a thread
ID callback might be necessary.
</p>
</li>
<li>
<p>
Call the
<tt class="methodname">DB_ENV->failchk()</tt>
method, either periodically (once per minute, for example) or
whenever a thread of control in your application exits.
</p>
<p>
If this method determines that a thread of control exited holding read locks, those locks
are automatically released. If the thread of control exited with an unresolved transaction,
that transaction is aborted.
If any other problems exist beyond these such that the
environment must be recovered, the method will
<span>return <tt class="literal">DB_RUNRECOVERY</tt>.</span>
</p>
</li>
</ol>
</div>
<p>
Note that this mechanism should not be mixed with the process registration method of multi-process
recovery described in the previous section.
</p>
</div>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="filemanagement.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Recovery Procedures </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Using Hot Failovers</td>
</tr>
</table>
</div>
</body>
</html>