<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Architecting Transactional Data Store applications</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="transapp.html" title="Chapter 11. Berkeley DB Transactional Data Store Applications" />
    <link rel="prev" href="transapp_fail.html" title="Handling failure in Transactional Data Store applications" />
    <link rel="next" href="transapp_env_open.html" title="Opening the environment" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Architecting Transactional Data Store applications</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 11.
            Berkeley DB Transactional Data Store Applications
          </th>
          <td width="20%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="transapp_app"></a>Architecting Transactional Data Store applications</h2>
          </div>
        </div>
      </div>
      <p>
    When building Transactional Data Store applications, the architecture
    decisions involve application startup (running recovery) and handling
    system or application failure.
    For details on performing recovery, see
    <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a>.
      </p>
      <p>
    Recovery in a database environment is a single-threaded procedure; that
    is, one thread of control or process must complete database environment
    recovery before any other thread of control or process operates in the
    Berkeley DB environment.
      </p>
      <p>
    Performing recovery first marks any existing database environment as
    "failed" and then removes it, causing threads of control running in the
    database environment to fail and return to the application. This
    feature allows applications to recover environments without concern for
    threads of control that might still be running in the removed
    environment. The subsequent re-creation of the database environment is
    serialized, so multiple threads of control attempting to create a
    database environment will serialize behind a single creating thread.
      </p>
      <p>
    One consideration when removing (as part of recovering) a database
    environment that may be in use by another thread is the type of mutex
    being used by the Berkeley DB library. In the case of database
    environment failure when using test-and-set mutexes, threads of control
    waiting on a mutex when the environment is marked "failed" will quickly
    notice the failure and will return an error from the Berkeley DB API.
    In the case of environment failure when using blocking mutexes, where
    the underlying system mutex implementation does not unblock mutex
    waiters after the thread of control holding the mutex dies, threads
    waiting on a mutex when an environment is recovered might hang forever.
    Applications blocked on events (for example, an application blocked on
    a network socket, or a GUI event) may also fail to notice environment
    recovery within a reasonable amount of time.
    Systems with such mutex
    implementations are rare, but do exist; applications on such systems
    should use an application architecture where the thread recovering the
    database environment can explicitly terminate any process using the
    failed environment, or configure Berkeley DB for test-and-set mutexes,
    or incorporate some form of long-running timer or watchdog process to
    wake or kill blocked processes should they block for too long.
      </p>
      <p>
    Regardless, it makes little sense for multiple threads of control to
    simultaneously attempt recovery of a database environment, since the
    last one to run will remove all database environments created by the
    threads of control that ran before it. However, for some applications,
    it may make sense to have a single thread of control
    that performs recovery and then removes the database environment, after
    which the application launches a number of processes, any of which will
    create the database environment and continue forward.
      </p>
      <p>
    There are three common ways to architect Berkeley DB Transactional Data
    Store applications. The choice usually depends on whether
    the application consists of a single process or a group of processes
    descended from a single process (for example, a server started when the
    system first boots), or of unrelated
    processes (for example, processes started by web connections or users
    logged into the system).
      </p>
      <div class="orderedlist">
        <ol type="1">
          <li>
            <p>
            The first way to architect Transactional Data Store
            applications is as a single process (the process may or may not
            be multithreaded).
            </p>
            <p>
            When this process starts, it runs recovery on the database
            environment and then opens its databases. The application can
            subsequently create new threads as it chooses.
            Those threads
            can either share already open Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a>
            handles, or create their own. In this architecture, databases
            are rarely opened or closed when more than a single thread of
            control is running; that is, they are opened when only a single
            thread is running, and closed after all threads but one have
            exited. The last thread of control to exit closes the
            databases and the database environment.
            </p>
            <p>
            This architecture is the simplest to implement because thread
            serialization is easy and failure detection does not require
            monitoring multiple processes.
            </p>
            <p>
            If the application's thread model allows processes to continue
            after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method can be used to
            determine if the database environment is usable after thread
            failure. If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or
            <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns
            <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
            the application must
            behave as if there has been a system failure, performing
            recovery and re-creating the database environment. Once these
            actions have been taken, other threads of control can continue
            (as long as all existing Berkeley DB handles are first
            discarded).
            </p>
          </li>
          <li>
            <p>
            The second way to architect Transactional Data Store
            applications is as a group of related processes (the processes
            may or may not be multithreaded).
            </p>
            <p>
            This architecture requires that the order in which threads of control are
            created be controlled, so that database environment recovery is serialized.
            </p>
            <p>
            In addition, this architecture requires that threads of control
            be monitored. If any thread of control exits with open
            Berkeley DB handles, the application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>
            method to detect lost mutexes and locks and determine if the
            application can continue. If the application does not call
            <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the database
            environment can no longer be used, the application must behave
            as if there has been a system failure, performing recovery and
            creating a new database environment. Once these actions have
            been taken, other threads of control can be continued (as long
            as all existing Berkeley DB handles are first discarded), or
            restarted.
            </p>
            <p>
            The easiest way to structure groups of related processes is to
            first create a single "watcher" process (often a script) that
            starts when the system first boots, runs recovery on the
            database environment and then creates the processes or threads
            that will actually perform work. The watcher then has no
            further responsibilities other than to wait on the threads of
            control it has started, to ensure that none of them exits
            unexpectedly. If a thread of control exits, the watcher process
            optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. If the application
            does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the
            environment can no longer be used, the watcher kills all of the
            threads of control using the failed environment, runs recovery,
            and starts new threads of control to perform work.
            </p>
          </li>
          <li>
            <p>
            The third way to architect Transactional Data Store
            applications is as a group of unrelated processes (the
            processes may or may not be multithreaded). This is the most
            difficult architecture to implement because, on some systems,
            it is difficult to find and monitor unrelated
            processes. There are several possible techniques to implement
            this architecture.
            </p>
            <p>
            One solution is to log a thread of control ID when a new
            Berkeley DB handle is opened. For example, an initial
            "watcher" process could run recovery on the database
            environment and then create a sentinel file. Any "worker"
            process wanting to use the environment would check for the
            sentinel file. If the sentinel file does not exist, the worker
            would fail or wait for the sentinel file to be created. Once
            the sentinel file exists, the worker would register its process
            ID with the watcher (via shared memory, IPC or some other
            registry mechanism), and then the worker would open its
            <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed. When the worker finishes
            using the environment, it would unregister its process ID with
            the watcher. The watcher periodically checks to ensure that no
            worker has failed while using the environment. If a worker
            fails while using the environment, the watcher removes the
            sentinel file, kills all of the workers currently using the
            environment, runs recovery on the environment, and finally
            creates a new sentinel file.
            </p>
            <p>
            The weakness of this approach is that, on some systems, it is
            difficult to determine if an unrelated process is still
            running. For example, POSIX systems generally disallow sending
            signals to unrelated processes. The trick to monitoring
            unrelated processes is to find a system resource held by the
            process that will be modified if the process dies.
            On POSIX
            systems, flock- or fcntl-style locking will work, as will
            LockFile on Windows systems. Other systems may have to use
            other process-related information such as file reference counts
            or modification times. In the worst case, threads of control
            can be required to periodically re-register with the watcher
            process: if the watcher has not heard from a thread of control
            in a specified period of time, the watcher will take action,
            recovering the environment.
            </p>
            <p>
            The Berkeley DB library includes one built-in implementation of this approach,
            the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag:
            </p>
            <p>
            If the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is set, each process opening the
            database environment first checks to see if recovery needs to
            be performed. If recovery needs to be performed for any reason
            (including the initial creation of the database environment),
            and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is also specified, recovery will be performed
            and then the open will proceed normally. If recovery needs to
            be performed and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not specified,
            <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
            will be returned. If recovery does not need to be performed,
            <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> will be ignored.
            </p>
            <p>
            Before the actual recovery begins, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_PANIC" class="olink">DB_EVENT_REG_PANIC</a>
            event is set for the environment.
            Processes in the application using
            the <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> method will be notified when they perform their next
            operation in the environment. Processes receiving this event should
            exit the environment. Also, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_ALIVE" class="olink">DB_EVENT_REG_ALIVE</a> event will be
            triggered if there are other processes currently attached to the
            environment. Only the process doing the recovery will receive this
            event notification. It will receive this notification once for each
            process still attached to the environment. The parameter of the
            <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> callback will contain the process identifier of the
            process still attached. The process doing the recovery can then
            signal the attached process or perform some other operation prior to
            recovery (for example, killing the attached process).
            </p>
            <p>
            The <a href="../api_reference/C/envset_timeout.html" class="olink">DB_ENV->set_timeout()</a> method's <a href="../api_reference/C/envset_timeout.html#set_timeout_DB_SET_REG_TIMEOUT" class="olink">DB_SET_REG_TIMEOUT</a> flag can be set to
            establish a wait period before starting recovery. This creates a
            window of time for other processes to receive the DB_EVENT_REG_PANIC
            event and exit the environment.
            </p>
            <p>
            There are three additional requirements for the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a>
            architecture to work:
            </p>
            <div class="itemizedlist">
              <ul type="disc">
                <li>
                  <p>
            First, all applications using the database environment must
            specify the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag when opening the environment.
            However, there is no additional requirement if the application
            chooses a single process to recover the environment, as the
            first process to open the database environment will know to
            perform recovery.
                  </p>
                </li>
                <li>
                  <p>
            Second, there can only be a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database
            environment in each process. As the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> locking is
            per-process, not per-thread, multiple <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles in a single
            environment could race with each other, potentially causing
            data corruption.
                  </p>
                </li>
                <li>
                  <p>
            Third, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> implementation does not explicitly
            terminate processes using the database environment which is
            being recovered. Instead, it relies on the processes
            themselves noticing the database environment has been discarded
            from underneath them. For this reason, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag
            should be used with a mutex implementation that does not block
            in the operating system, as that risks a thread of control
            blocking forever on a mutex which will never be granted. Using
            any test-and-set mutex implementation ensures this cannot
            happen, and for that reason the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is generally
            used with a test-and-set mutex implementation.
                  </p>
                </li>
              </ul>
            </div>
            <p>
            A second solution for groups of unrelated processes is also
            based on a "watcher process".
            This solution is intended for
            systems where it is not practical to monitor the processes
            sharing a database environment, but it is possible to monitor
            the environment to detect if a thread of control has failed
            while holding open Berkeley DB handles. This would be done by having
            a "watcher" process periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method.
            If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the environment can no longer be
            used, the watcher would then take action, recovering the
            environment.
            </p>
            <p>
            The weakness of this approach is that all threads of control
            using the environment must specify an "ID" function and an
            "is-alive" function using the <a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV->set_thread_id()</a> method. (In
            other words, the Berkeley DB library must be able to assign a
            unique ID to each thread of control, and additionally determine
            if the thread of control is still running. It can be difficult
            to portably provide that information in applications using a
            variety of different programming languages and running on a
            variety of different platforms.)
            </p>
            <p>
            A third solution for groups of unrelated processes is a hybrid of the two
            above. In addition to using the built-in sentinel approach provided by
            the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag, the <a href="../api_reference/C/envopen.html#envopen_DB_FAILCHK" class="olink">DB_FAILCHK</a> flag can be specified.
            When using both flags, each process opening the database environment first
            checks to see if recovery needs to be performed.
            If recovery needs to be
            performed for any reason, it will first determine if a thread of control
            exited while holding database read locks, and release those. Then it will
            abort any unresolved transactions. If these steps are successful, the process
            opening the environment will continue without the need for any
            additional recovery. If these steps are unsuccessful, then additional
            recovery will be performed if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is specified; if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not
            specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> will be returned.
            </p>
            <p>
            Because this solution is a hybrid of the first two, all of the requirements of both
            must be met (an "ID" function, an "is-alive" function, a
            single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database environment in each process, and so on).
            </p>
            <p>
            The described approaches are different, and should not be
            combined. Applications might use the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a>
            approach, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> approach, or the hybrid approach, but not more than one in
            the same application.
            For example, a POSIX application written
            as a library underneath a wide variety of interfaces and
            differing APIs might choose the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach for a
            few reasons: first, it does not require making periodic calls
            to the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method; second, when implementing in a
            variety of languages, it may be more difficult to specify
            unique IDs for each thread of control; third, it may be more
            difficult to determine if a thread of control is still running, as
            any particular thread of control is likely to lack sufficient
            permissions to signal other processes. Alternatively, an
            application with a dedicated watcher process, running with
            appropriate permissions, might choose the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> approach
            as supporting higher overall throughput and reliability, as
            that approach allows the application to abort unresolved
            transactions and continue forward without having to recover the
            database environment. The hybrid approach is useful in situations
            where running a dedicated watcher process is not practical but getting the
            equivalent of <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> when calling <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> is important.
            </p>
          </li>
        </ol>
      </div>
      <p>
    When implementing a process to monitor other threads of
    control, it is important that the watcher process's code be as simple and
    well-tested as possible, because the application may hang if the watcher fails.
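      </p>
      <p>
    The watcher structure described above for groups of related processes
    can be sketched roughly as follows. This is a minimal illustration, not
    Berkeley DB code: it assumes a POSIX system, and every name in it
    (run_recovery, kill_workers, watch_generation, and the demo workers) is
    hypothetical. In particular, run_recovery() is only a placeholder marking
    where a real watcher would recover the database environment.
      </p>

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

typedef int (*worker_fn)(int id);

/* Hypothetical placeholder: a real watcher would recover the database
 * environment here, before any worker exists. This stub only counts calls. */
static int recoveries_run = 0;
static void run_recovery(void) { recoveries_run++; }

/* Kill every worker that has not been reaped yet (pid > 0). */
static void kill_workers(const pid_t *pids, int n)
{
    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGKILL);
}

/*
 * One "generation" of the watcher: recover, start the workers, then wait.
 * Returns 0 if every worker exited cleanly, 1 if a worker failed (the rest
 * are killed, so the caller can recover and start over), -1 on bad input.
 */
static int watch_generation(worker_fn work, int nworkers)
{
    pid_t pids[MAX_WORKERS];
    int started = 0, live, failed = 0, status;

    if (work == NULL || nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    run_recovery();                     /* serialized: no worker runs yet */

    for (; started < nworkers; started++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(work(started));       /* child: do the work, then exit */
        if (pid < 0) {                  /* could not start a worker */
            failed = 1;
            kill_workers(pids, started);
            break;
        }
        pids[started] = pid;
    }

    live = started;
    while (live > 0) {
        pid_t pid = wait(&status);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                      /* no children left to wait for */
        }
        for (int i = 0; i < started; i++) {
            if (pids[i] == pid) {       /* mark this worker as reaped */
                pids[i] = 0;
                live--;
            }
        }
        if (!failed && !(WIFEXITED(status) && WEXITSTATUS(status) == 0)) {
            failed = 1;                 /* unexpected exit: stop the others */
            kill_workers(pids, started);
        }
    }
    return failed;
}

/* Demo workers (hypothetical): exit status 0 means a clean shutdown. */
static int demo_worker_ok(int id) { (void)id; return 0; }

static int demo_worker_crash(int id)
{
    if (id == 1)
        return 1;                       /* simulate an unexpected failure */
    for (;;)
        pause();                        /* healthy worker: runs until killed */
}
```

      <p>
    A real watcher would call a function like watch_generation() in a loop,
    starting a new generation of workers each time it reports a failure; as
    described above, it could optionally call DB_ENV->failchk() first and run
    recovery only when the environment can no longer be used. The watcher
    itself does nothing except start, wait for, and kill workers, which keeps
    it small enough to test thoroughly.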
      </p>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="transapp.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Handling failure in Transactional Data Store applications </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Opening the environment</td>
        </tr>
      </table>
    </div>
  </body>
</html>