<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Designing Your Application for Recovery</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />
    <link rel="up" href="filemanagement.html" title="Chapter 5. Managing DB Files" />
    <link rel="prev" href="recovery.html" title="Recovery Procedures" />
    <link rel="next" href="hotfailover.html" title="Using Hot Failovers" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Designing Your Application for Recovery</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 5. 
Managing DB Files</th>
          <td width="20%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="architectrecovery.html#multithreadrecovery">Recovery for Multi-Threaded Applications</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="architectrecovery.html#multiprocessrecovery">Recovery in Multi-Process Applications</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        When building your DB application, you should consider how you will run recovery. If you are building a
        single-threaded, single-process application, it is fairly simple to run recovery when your application first
        opens its environment. In this case, you need only decide whether to run recovery every time you open
        your application (recommended) or only some of the time, presumably triggered by a startup option
        controlled by your application's user.
      </p>
      <p>
        However, for multi-threaded and multi-process applications, you need to carefully consider how you will
        design your application's startup code so as to run recovery only when it makes sense to do so.
      </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>
            </div>
          </div>
        </div>
        <p>
          If your application uses only one environment handle, then handling recovery for a multi-threaded
          application is no more difficult than for a single-threaded application. 
You simply open the environment
          in the application's main thread, and then pass that handle to each of the threads that will be
          performing DB operations. We illustrate this with our final example in this book (see
          <a class="xref" href="txnexample_c.html" title="Transaction Example">Transaction Example</a>
          for more information).
        </p>
        <p>
          Alternatively, you can have each worker thread open its own environment handle. However, in this case,
          designing for recovery is a bit more complicated.
        </p>
        <p>
          When a thread performing database operations fails
          or hangs, it is usually best to simply
          restart the application and run recovery upon application
          startup as normal. However, not all applications can afford
          to restart because a single thread has misbehaved.
        </p>
        <p>
          If you are attempting to continue operations in the face of a misbehaving thread,
          then at a minimum recovery must be run if a thread performing database operations fails or hangs.
        </p>
        <p>
          Remember that recovery clears the environment of all
          outstanding locks, including any that might be outstanding
          from an aborted thread. If these locks are not cleared,
          other threads performing database operations can back up
          behind the locks obtained but never cleared by the failed
          thread. The result will be an application that hangs
          indefinitely.
        </p>
        <p>
          To run recovery under these circumstances:
        </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                Suspend or shut down all other threads performing
                database operations.
              </p>
            </li>
            <li>
              <p>
                Discard any open environment handles. Note that
                attempting to gracefully close these handles may be
                asking for trouble; the close can fail if the
                environment is already in need of recovery. For
                this reason, it is best and easiest to simply discard the handle. 
              </p>
            </li>
            <li>
              <p>
                Open new handles, running recovery as you open
                them.
                See <a class="xref" href="recovery.html#normalrecovery" title="Normal Recovery">Normal Recovery</a> for more information.
              </p>
            </li>
            <li>
              <p>
                Restart all your database threads.
              </p>
            </li>
          </ol>
        </div>
        <p>
          A traditional way to handle this activity is to spawn a watcher thread that is responsible for making
          sure all is well with your threads, and performing the above actions if not.
        </p>
        <p>
          However, in the case where each worker thread opens and maintains its own environment handle, recovery
          is complicated for two reasons:
        </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                For some applications and workloads, it might be
                worthwhile to give your database threads the
                ability to gracefully finalize any ongoing
                transactions. If this is the case, your
                code must be capable of signaling each thread
                to halt DB activities and close its
                environment. If you simply run recovery against the
                environment, your database threads will
                detect this and fail in the midst of performing their
                database operations.
              </p>
            </li>
            <li>
              <p>
                Your code must be capable of ensuring only one
                thread runs recovery before allowing all other
                threads to open their respective environment
                handles. Recovery should be single-threaded because when
                recovery is run against an environment, that environment is
                deleted and then recreated. This will cause all
                other processes and threads to "fail" when they
                attempt operations against the newly recovered
                environment. If all threads run recovery
                when they start up, then it is likely that some
                threads will fail because the environment that they
                are using has been recovered. This will cause the thread to have to re-execute its own recovery
                path. 
At best, this is inefficient and at worst it could cause your application to fall into an
                endless recovery pattern.
              </p>
            </li>
          </ol>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>
            </div>
          </div>
        </div>
        <p>
          Frequently, DB applications use multiple processes to interact with the databases. For example, you may
          have a long-running process, such as some kind of server, and then a series of administrative tools that
          you use to inspect and administer the underlying databases. Or, in some web-based architectures, different
          services are run as independent processes that are managed by the server.
        </p>
        <p>
          In any case, recovery for a multi-process environment is complicated for two reasons:
        </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                In the event that recovery must be run, you might
                want to notify processes interacting with the environment
                that recovery is about to occur and give them a
                chance to gracefully terminate. Whether it is
                worthwhile for you to do this is entirely dependent
                upon the nature of your application. Some
                long-running applications with multiple processes
                performing meaningful work might want to do this.
                Other applications with processes performing database
                operations that are likely to be harmed by error conditions in other
                processes will likely find it to be not worth the
                effort. For this latter group, the chances of
                performing a graceful shutdown may be low anyway.
              </p>
            </li>
            <li>
              <p>
                Unlike single process scenarios, it can quickly become wasteful for every process interacting
                with the databases to run recovery when it starts up. 
This is partly because recovery
                <span class="emphasis"><em>does</em></span> take some amount of time to run, but mostly you want to
                avoid a situation where your server must
                reopen all its environment handles just because you fire up a command line database
                administrative utility that always runs recovery.
              </p>
            </li>
          </ol>
        </div>
        <p>
          DB offers you two methods by which you can manage recovery for multi-process DB applications.
          Each has different strengths and weaknesses, and they are described in the next sections.
        </p>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>
              </div>
            </div>
          </div>
          <p>
            Before continuing, it is worth noting that the following sections describe recovery processes that
            can result in one process running recovery while other processes are actively performing
            database operations.
          </p>
          <p>
            When this happens, the current database operation will
            abnormally fail, indicating a DB_RUNRECOVERY condition.
            This means that your application should immediately abandon any database operations that it may have
            ongoing, discard any environment handles it has opened, and obtain and open new handles.
          </p>
          <p>
            The net effect of this is that any writes performed by unresolved transactions will be lost.
            For persistent applications (servers, for example), the services they provide will also be
            unavailable for the amount of time that it takes to complete a recovery and for all participating
            processes to reopen their environment handles. 
          </p>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="db_register"></a>Process Registration</h4>
              </div>
            </div>
          </div>
          <p>
            One way to handle multi-process recovery is for every process to "register" its environment. In
            doing so, the process gains the ability to see if any other applications are using the
            environment and, if so, whether they have suffered an abnormal termination. If an abnormal
            termination is detected, the process runs recovery; otherwise, it does not.
          </p>
          <p>
            Note that using process registration also ensures that
            recovery is serialized across applications. That is,
            only one process at a time has a chance to run
            recovery. Generally this means that the first process
            to start up will run recovery, and all other processes
            will silently not run recovery because it is not
            needed.
          </p>
          <p>
            To cause your application to register its environment, you specify
            <span>
              the <code class="literal">DB_REGISTER</code> flag when you open your environment.
              You may also specify <code class="literal">DB_RECOVER</code>. However, it is an error to specify
              <code class="literal">DB_RECOVER_FATAL</code> when using
              the <code class="literal">DB_REGISTER</code> flag.
            </span>
            If during the open, DB determines that recovery must be run, it will automatically run the correct
            type of recovery for you, so long as you specify normal recovery
            on your environment open. If you do not specify normal recovery, and you register your environment,
            then no recovery is run even if the registration process identifies a need for it. 
In this case,
            the environment open simply fails by
            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
          </p>
          <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
            <h3 class="title">Note</h3>
            <p>
              If you do not specify normal recovery when you open your first registered environment
              in the application, then that application will fail the environment open by
              <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
              This is because the first process to register must create an internal
              registration file, and recovery is forced when that file is created. To
              avoid an abnormal termination of the environment open, specify recovery on
              the environment open for at least the first process starting in your
              application.
            </p>
          </div>
          <p>
            In addition, if you specify <code class="literal">DB_FAILCHK</code>
            when you register your environment, then a fail check is performed on
            environment open (fail checks are described in the next section). If, during the
            fail check process, an abnormal termination is detected for any of the processes
            involved in the application, DB releases any read locks held by the dead
            process and performs transaction aborts as necessary. This is done in an attempt
            to clean up the environment.
          </p>
          <p>
            In this situation, if a general cleanup of the
            environment is not possible and normal recovery is not specified on environment
            open, then the open will abort,
            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
            However, if this situation occurs and recovery was specified, then the appropriate type of recovery
            (normal or fatal) is run so as to bring the environment back to a healthy state. 
          </p>
          <p>
            Be aware that there are some limitations and requirements if you want your various processes to
            coordinate recovery using registration:
          </p>
          <div class="orderedlist">
            <ol type="1">
              <li>
                <p>
                  There can be only one environment handle per
                  environment per process. In the case of multi-threaded
                  processes, the environment handle must be shared across threads.
                </p>
              </li>
              <li>
                <p>
                  All processes sharing the environment must use registration. If registration is
                  not uniformly used across all participating processes, then you can see inconsistent results
                  in terms of your application's ability to recognize that recovery must be run.
                </p>
              </li>
            </ol>
          </div>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="failchk"></a>Failure Checking</h4>
              </div>
            </div>
          </div>
          <p>
            For very large and robust multi-process applications, the most common way to ensure all the
            processes are working as intended is to make use of a watchdog process. To assist a watchdog
            process, DB offers a failure checking mechanism.
          </p>
          <p>
            When a thread of control fails with open environment handles, the result is that there may be
            resources left locked or corrupted. Other threads of control may encounter these unavailable resources
            quickly or not at all, depending on data access patterns.
          </p>
          <p>
            In any case, the DB failure checking mechanism allows a watchdog to detect if an environment is
            unusable as a result of a thread of control failure. It should be called periodically
            (for example, once a minute) from the watchdog process. If the environment is deemed unusable, then
            the watchdog process is notified that recovery should be run. It is then up to the watchdog to
            actually run recovery. 
It is also the watchdog's responsibility to decide what to do about currently
            running processes before running recovery. The watchdog could, for example, attempt to
            gracefully shut down or kill all relevant processes before running recovery.
          </p>
          <p>
            Note that failure checking need not be run from a separate process, although conceptually that is
            how the mechanism is meant to be used. This same mechanism could be used in a multi-threaded
            application that wants to have a watchdog thread.
          </p>
          <p>
            To use failure checking you must:
          </p>
          <div class="orderedlist">
            <ol type="1">
              <li>
                <p>
                  <span>
                    Provide an <code class="function">is_alive()</code> callback using the
                    <code class="methodname">DbEnv::set_isalive()</code>
                    method.
                  </span>
                  DB uses this method to determine whether a specified process and thread
                  is alive when the failure checking is performed.
                </p>
              </li>
              <li>
                <p>
                  Possibly provide a
                  <span>
                    <code class="literal">thread_id</code> callback
                  </span>
                  that uniquely identifies a process
                  and thread of control. This
                  <span>callback</span>
                  is only necessary if the standard process and thread
                  identification functions for your platform are not sufficient for use by failure
                  checking. This is rarely necessary and is usually because the thread and/or process ids
                  used by your system cannot fit into an unsigned integer.
                </p>
                <p>
                  You provide this callback using the
                  <code class="methodname">DbEnv::set_thread_id()</code>
                  method. See the API reference for this method for more information on when setting a thread
                  id callback might be necessary.
                </p>
              </li>
              <li>
                <p>
                  Call the
                  <code class="methodname">DbEnv::failchk()</code>
                  method. 
You can do this either periodically (once per minute, for example), or
                  whenever a thread of control exits for your application.
                </p>
                <p>
                  If this method determines that a thread of control exited holding read locks, those locks
                  are automatically released. If the thread of control exited with an unresolved transaction,
                  that transaction is aborted. If any other problems exist beyond these such that the
                  environment must be recovered, the method will
                  <span>return <code class="literal">DB_RUNRECOVERY</code>.</span>
                </p>
              </li>
            </ol>
          </div>
        </div>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="filemanagement.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Recovery Procedures </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Using Hot Failovers</td>
        </tr>
      </table>
    </div>
  </body>
</html>