<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Architecting Data Store and Concurrent Data Store applications</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="cam.html" title="Chapter&#160;10.&#160; Berkeley DB Concurrent Data Store Applications" />
    <link rel="prev" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications" />
    <link rel="next" href="transapp.html" title="Chapter&#160;11.&#160; Berkeley DB Transactional Data Store Applications" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Architecting Data Store and Concurrent Data Store applications</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="cam_fail.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;10.&#160;
            Berkeley DB Concurrent Data Store Applications
          </th>
          <td width="20%" align="right">&#160;<a accesskey="n" href="transapp.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="cam_app"></a>Architecting Data Store and Concurrent Data Store applications</h2>
          </div>
        </div>
      </div>
      <p>When building Data Store and Concurrent Data Store applications, the
architecture decisions involve application startup (cleaning up any
existing databases, the removal of any existing database environment
and creation of a
new environment), and handling system or application
failure. "Cleaning up" databases involves removal and re-creation
of the database, restoration from an archival copy and/or verification
and optional salvage, as described in <a class="xref" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications">Handling failure in Data Store and Concurrent Data Store applications</a>.</p>
      <p>Data Store or Concurrent Data Store applications without database
environments are single-process by definition. These applications
should start up, re-create, restore, or verify and optionally salvage
their databases and run until eventual exit or application or system
failure. After system or application failure, that process can simply
repeat this procedure. This section does not discuss these
applications further.</p>
      <p>Otherwise, the first question of Data Store and Concurrent Data Store
architecture is cleaning up existing databases, removing
existing database environments, and subsequently creating a new
environment. For obvious reasons, the application must serialize the
re-creation, restoration, or verification and optional salvage of its
databases. Further, environment removal and creation must be
single-threaded, that is, one thread of control (where a thread of
control is either a true thread or a process) must remove and re-create
the environment before any other thread of control can use the new
environment. It may simplify matters that Berkeley DB serializes creation of
the environment, so multiple threads of control attempting to create an
environment will serialize behind a single creating thread.</p>
      <p>Removing a database environment first marks the environment as
"failed", causing any threads of control still running in the
environment to fail and return to the application.
This feature allows
applications to remove environments without concern for threads of
control that might still be running in the removed environment.</p>
      <p>One consideration in removing a database environment that may be in use
by another thread is the type of mutex being used by the Berkeley DB library.
In the case of database environment failure when using test-and-set
mutexes, threads of control waiting on a mutex when the environment is
marked "failed" will quickly notice the failure and will return an error
from the Berkeley DB API. In the case of environment failure when using
blocking mutexes, where the underlying system mutex implementation does
not unblock mutex waiters after the thread of control holding the mutex
dies, threads waiting on a mutex when an environment is recovered might
hang forever. Applications blocked on events (for example, an
application blocked on a network socket or a GUI event) may also fail
to notice environment recovery within a reasonable amount of time.
Systems with such mutex implementations are rare, but do exist;
applications on such systems should use an application architecture
where the thread recovering the database environment can explicitly
terminate any process using the failed environment, or configure Berkeley DB
for test-and-set mutexes, or incorporate some form of long-running timer
or watchdog process to wake or kill blocked processes should they block
for too long.</p>
      <p>Regardless, it makes little sense for multiple threads of control to
simultaneously attempt to remove and re-create an environment, since the
last one to run will remove all environments created by the threads of
control that ran before it.
However, for a few applications, it may
make sense to have a single thread of control that
checks the existing databases and removes the environment, after which
the application launches a number of processes, any of which is able
to create the environment.</p>
      <p>With respect to cleaning up existing databases, the database environment
must be removed before the databases are cleaned up. Removing the
environment causes any Berkeley DB library calls made by threads of control
running in the failed environment to return failure to the application.
Removing the database environment first ensures the threads of control
in the old environment do not race with the threads of control cleaning
up the databases, possibly overwriting them after the cleanup has
finished. Where the application architecture and system permit, many
applications kill all threads of control running in the failed database
environment before removing the failed database environment, on general
principles as well as to minimize overall system resource usage. It
does not matter whether the new environment is created before or after the
databases are cleaned up.</p>
      <p>After dealing with database and database environment recovery following
a failure, the next issue to manage is application failure. As described
in <a class="xref" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications">Handling failure in Data Store and Concurrent Data Store applications</a>, when a thread of control in a Data Store or
Concurrent Data Store application fails, it may exit holding data
structure mutexes or logical database locks. These mutexes and locks
must be released to avoid the remaining threads of control hanging
behind the failed thread of control's mutexes or locks.</p>
      <p>There are three common ways to architect Berkeley DB Data Store and Concurrent
Data Store applications.
The one chosen usually depends on whether
the application consists of a single process or group of
processes descended from a single process (for example, a server started
when the system first boots), or of
unrelated processes (for example, processes started by web connections
or users logging into the system).</p>
      <div class="orderedlist">
        <ol type="1">
          <li>The first way to architect Data Store and Concurrent Data Store
applications is as a single process (the process may or may not be
multithreaded).
<p>When this process starts, it removes any existing database environment
and creates a new environment. It then cleans up the databases and
opens those databases in the environment. The application can
subsequently create new threads of control as it chooses. Those threads
of control can either share already open Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and
<a href="../api_reference/C/db.html" class="olink">DB</a> handles, or create their own. In this architecture,
databases are rarely opened or closed when more than a single thread of
control is running; that is, they are opened when only a single thread
is running, and closed after all threads but one have exited. The last
thread of control to exit closes the databases and the database
environment.</p><p>This architecture is simplest to implement because thread serialization
is easy and failure detection does not require monitoring multiple
processes.</p><p>If the application's thread model allows the process to continue after
thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method can be used to determine if
the database environment is usable after the failure.
If the
application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application
must behave as if there has been a system failure, removing the
environment and creating a new environment, and cleaning up any
databases it wants to continue to use. Once these actions have been
taken, other threads of control can continue (as long as all existing
Berkeley DB handles are first discarded) or be restarted.</p></li>
          <li>The second way to architect Data Store and Concurrent Data Store
applications is as a group of related processes (the processes may or
may not be multithreaded).
<p>This architecture requires that the order in which threads of control are
created be controlled, to serialize database environment removal and
creation, and database cleanup.</p><p>In addition, this architecture requires that threads of control be
monitored. If any thread of control exits with open Berkeley DB handles, the
application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method to determine if the
database environment is usable after the exit. If the application does
not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns
<a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application must behave as if there has been
a system failure, removing the environment and creating a new
environment, and cleaning up any databases it wants to continue to use.
Once these actions have been taken, other threads of control can
continue (as long as all existing Berkeley DB handles are first discarded)
or be restarted.</p><p>The easiest way to structure groups of related processes is to first
create a single "watcher" process (often a script) that starts when the
system first boots, removes and creates the database environment, cleans
up the databases and then creates the processes or threads that will
actually perform work. The watcher has no further
responsibilities other than to wait on the threads of control it has
started, to ensure none of them exits unexpectedly. If a thread of
control exits, the watcher process optionally calls the
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. If the application does not call
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns
<a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the environment can no longer be used; the
watcher kills all of the threads of control using the failed
environment, cleans up, and starts new threads of control to perform
work.</p></li>
          <li>The third way to architect Data Store and Concurrent Data Store
applications is as a group of unrelated processes (the processes may or
may not be multithreaded). This is the most difficult architecture to
implement because, on some systems, it is difficult to find
and monitor unrelated processes.
<p>One solution is to log a thread of control ID when a new Berkeley DB handle
is opened. For example, an initial "watcher" process could open/create
the database environment, clean up the databases and then create a
sentinel file. Any "worker" process wanting to use the environment
would check for the sentinel file.
If the sentinel file does not exist,
the worker would fail or wait for the sentinel file to be created. Once
the sentinel file exists, the worker would register its process ID with
the watcher (via shared memory, IPC or some other registry mechanism),
and then the worker would open its <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed.
When the worker finishes using the environment, it would unregister its
process ID with the watcher. The watcher periodically checks to ensure
that no worker has failed while using the environment. If a worker
fails while using the environment, the watcher removes the sentinel
file, kills all of the workers currently using the environment, cleans
up the environment and databases, and finally creates a new sentinel
file.</p><p>The weakness of this approach is that, on some systems, it is difficult
to determine if an unrelated process is still running. For example,
POSIX systems generally disallow sending signals to unrelated processes.
The trick to monitoring unrelated processes is to find a system resource
held by the process that will be modified if the process dies. On POSIX
systems, flock- or fcntl-style locking will work, as will LockFile on
Windows systems. Other systems may have to use other process-related
information such as file reference counts or modification times. In the
worst case, threads of control can be required to periodically
re-register with the watcher process: if the watcher has not heard from
a thread of control in a specified period of time, the watcher will take
action, cleaning up the environment.</p><p>If it is not practical to monitor the processes sharing a database
environment, it may be possible to monitor the environment to detect if
a thread of control has failed while holding open Berkeley DB handles.
This would
be done by having a "watcher" process periodically call the
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns
<a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the watcher would then take action, cleaning up
the environment.</p><p>The weakness of this approach is that all threads of control using the
environment must specify an "ID" function using the
<a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV->set_thread_id()</a> method and an "is-alive" function using the
<a href="../api_reference/C/envset_isalive.html" class="olink">DB_ENV->set_isalive()</a> method. (In other words, the Berkeley DB
library must be able to assign a unique ID to each thread of control,
and additionally determine if the thread of control is still running.
It can be difficult to portably provide that information in applications
using a variety of different programming languages and running on a
variety of different platforms.)</p></li>
        </ol>
      </div>
      <p>When implementing a process to monitor other threads of
control, it is important that the watcher process's code be as simple and
well tested as possible, because the application may hang if the watcher fails.</p>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="cam_fail.html">Prev</a>&#160;</td>
          <td width="20%" align="center">
            <a accesskey="u" href="cam.html">Up</a>
          </td>
          <td width="40%" align="right">&#160;<a accesskey="n" href="transapp.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Handling failure in Data Store and Concurrent Data Store applications&#160;</td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top">&#160;Chapter&#160;11.&#160;
            Berkeley DB
Transactional Data Store Applications
          </td>
        </tr>
      </table>
    </div>
  </body>
</html>