<!--$Id: app.so,v 10.9 2005/12/01 03:18:51 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle. All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Architecting Data Store and Concurrent Data Store applications</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Berkeley DB Concurrent Data Store Applications</dl></b></td>
<td align=right><a href="../cam/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Architecting Data Store and Concurrent Data Store applications</b></p>
<p>When building Data Store and Concurrent Data Store applications, the
principal architecture decisions involve application startup (cleaning
up any existing databases, removing any existing database environment,
and creating a new environment) and handling system or application
failure. "Cleaning up" databases involves removal and re-creation
of the database, restoration from an archival copy and/or verification
and optional salvage, as described in <a href="fail.html">Handling failure
in Data Store and Concurrent Data Store applications</a>.</p>
<p>Data Store or Concurrent Data Store applications without database
environments are single process, by definition. These applications
should start up, re-create, restore, or verify and optionally salvage
their databases and run until eventual exit or application or system
failure.
After system or application failure, that process can simply
repeat this procedure. This document will not discuss these
applications further.</p>
<p>Otherwise, the first question of Data Store and Concurrent Data Store
architecture is how to clean up existing databases, remove any existing
database environment, and create a new environment. For obvious
reasons, the application must serialize the
re-creation, restoration, or verification and optional salvage of its
databases. Further, environment removal and creation must be
single-threaded; that is, one thread of control (where a thread of
control is either a true thread or a process) must remove and re-create
the environment before any other thread of control can use the new
environment. It may simplify matters that Berkeley DB serializes creation of
the environment, so multiple threads of control attempting to create an
environment will serialize behind a single creating thread.</p>
<p>Removing a database environment will first mark the environment as
"failed", causing any threads of control still running in the
environment to fail and return to the application. This feature allows
applications to remove environments without concern for threads of
control that might still be running in the removed environment.</p>
<p>One consideration in removing a database environment that may be in use
by another thread is the type of mutex used by the Berkeley DB library.
In the case of database environment failure when using test-and-set
mutexes, threads of control waiting on a mutex when the environment is
marked "failed" will quickly notice the failure and will return an error
from the Berkeley DB API.
In the case of environment failure when using
blocking mutexes, where the underlying system mutex implementation does
not unblock mutex waiters after the thread of control holding the mutex
dies, threads waiting on a mutex when an environment is recovered might
hang forever. Applications blocked on events (for example, an
application blocked on a network socket or a GUI event) may also fail
to notice environment recovery within a reasonable amount of time.
Systems with such mutex implementations are rare, but do exist;
applications on such systems should use an application architecture
where the thread recovering the database environment can explicitly
terminate any process using the failed environment, configure Berkeley DB
for test-and-set mutexes, or incorporate some form of long-running timer
or watchdog process to wake or kill blocked processes should they block
for too long.</p>
<p>Regardless, it makes little sense for multiple threads of control to
simultaneously attempt to remove and re-create an environment, since the
last one to run will remove all environments created by the threads of
control that ran before it. However, for some applications, it may
make sense to have a single thread of control that
checks the existing databases and removes the environment, after which
the application launches a number of processes, any of which are able
to create the environment.</p>
<p>With respect to cleaning up existing databases, the database environment
must be removed before the databases are cleaned up. Removing the
environment causes any Berkeley DB library calls made by threads of control
running in the failed environment to return failure to the application.
Removing the database environment first ensures the threads of control
in the old environment do not race with the threads of control cleaning
up the databases, possibly overwriting them after the cleanup has
finished.
Where the application architecture and system permit, many
applications kill all threads of control running in the failed database
environment before removing the failed database environment, on general
principles as well as to minimize overall system resource usage. It
does not matter whether the new environment is created before or after
the databases are cleaned up.</p>
<p>After having dealt with database and database environment recovery after
failure, the next issue to manage is application failure. As described
in <a href="fail.html">Handling failure in Data Store and Concurrent Data
Store applications</a>, when a thread of control in a Data Store or
Concurrent Data Store application fails, it may exit holding data
structure mutexes or logical database locks. These mutexes and locks
must be released to avoid the remaining threads of control hanging
behind the failed thread of control's mutexes or locks.</p>
<p>There are three common ways to architect Berkeley DB Data Store and Concurrent
Data Store applications. The choice is usually based on whether the
application consists of a single process or a group of
processes descended from a single process (for example, a server started
when the system first boots), or of
unrelated processes (for example, processes started by web connections
or users logging into the system).</p>
<ol>
<p><li>The first way to architect Data Store and Concurrent Data Store
applications is as a single process (the process may or may not be
multithreaded).
<p>When this process starts, it removes any existing database environment
and creates a new environment. It then cleans up the databases and
opens those databases in the environment. The application can
subsequently create new threads of control as it chooses.
Those threads
of control can either share already open Berkeley DB <a href="../../api_c/env_class.html">DB_ENV</a> and
<a href="../../api_c/db_class.html">DB</a> handles, or create their own. In this architecture,
databases are rarely opened or closed when more than a single thread of
control is running; that is, they are opened when only a single thread
is running, and closed after all threads but one have exited. The last
thread of control to exit closes the databases and the database
environment.</p>
<p>This architecture is simplest to implement because thread serialization
is easy and failure detection does not require monitoring multiple
processes.</p>
<p>If the application's thread model allows the process to continue after
thread failure, the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method can be used to determine if
the database environment is usable after the failure. If the
application does not call <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a>, or
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> returns <a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application
must behave as if there has been a system failure, removing the
environment and creating a new environment, and cleaning up any
databases it wants to continue to use. Once these actions have been
taken, other threads of control can continue (as long as all existing
Berkeley DB handles are first discarded) or be restarted.</p>
<p><li>The second way to architect Data Store and Concurrent Data Store
applications is as a group of related processes (the processes may or
may not be multithreaded).
<p>This architecture requires that the order in which threads of control
are created be controlled to serialize database environment removal and
creation, and database cleanup.</p>
<p>In addition, this architecture requires that threads of control be
monitored.
If any thread of control exits with open Berkeley DB handles, the
application may call the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method to determine if the
database environment is usable after the exit. If the application does
not call <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a>, or <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application must behave as if there has been
a system failure, removing the environment and creating a new
environment, and cleaning up any databases it wants to continue to use.
Once these actions have been taken, other threads of control can
continue (as long as all existing Berkeley DB handles are first discarded)
or be restarted.</p>
<p>The easiest way to structure groups of related processes is to first
create a single "watcher" process (often a script) that starts when the
system first boots, removes and creates the database environment, cleans
up the databases, and then creates the processes or threads that will
actually perform work. The initial thread has no further
responsibilities other than to wait on the threads of control it has
started, to ensure none of them unexpectedly exit. If a thread of
control exits, the watcher process optionally calls the
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method.
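</p>
<p>The watcher's wait-and-restart loop might be sketched as follows, with
the watcher implemented in C rather than as a script. Here
check_environment() and worker_main() are hypothetical stand-ins for the
application's <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> call and for its real work loop:</p>

```c
/*
 * Sketch: a "watcher" that forks worker processes and kills the
 * whole group if a worker exits badly and the environment is no
 * longer usable.  check_environment() and worker_main() are
 * hypothetical stubs standing in for DB_ENV->failchk and the real
 * per-worker work loop.
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

#define NWORKERS 4

/* Hypothetical stub: a real application would call DB_ENV->failchk
 * here and return non-zero on DB_RUNRECOVERY. */
static int check_environment(void) { return (0); }

/* Hypothetical stub: the real work loop goes here. */
static void worker_main(void) { }

/* Return 0 on clean shutdown, -1 if the environment was declared failed. */
static int
watch(void)
{
	pid_t pids[NWORKERS], pid;
	int i, status;

	for (i = 0; i < NWORKERS; i++)
		if ((pids[i] = fork()) == 0) {
			worker_main();
			_exit(0);
		}

	/*
	 * Wait for workers; a non-zero exit status or a signal means a
	 * worker may have died holding mutexes or locks.
	 */
	while ((pid = wait(&status)) > 0)
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			if (check_environment() != 0) {
				/*
				 * Environment unusable: kill the group
				 * (already-reaped pids are simply stale),
				 * then re-create the environment and
				 * restart workers (not shown).
				 */
				for (i = 0; i < NWORKERS; i++)
					(void)kill(pids[i], SIGKILL);
				return (-1);
			}
	return (0);
}
```

<p>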
If the application does not call
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a>, or if <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the environment can no longer be used: the
watcher kills all of the threads of control using the failed
environment, cleans up, and starts new threads of control to perform
work.</p>
<p><li>The third way to architect Data Store and Concurrent Data Store
applications is as a group of unrelated processes (the processes may or
may not be multithreaded). This is the most difficult architecture to
implement because of the difficulty, on some systems, of finding
and monitoring unrelated processes.
<p>One solution is to log a thread of control ID when a new Berkeley DB handle
is opened. For example, an initial "watcher" process could open or
create the database environment, clean up the databases, and then create
a sentinel file. Any "worker" process wanting to use the environment
would check for the sentinel file. If the sentinel file does not exist,
the worker would fail or wait for the sentinel file to be created. Once
the sentinel file exists, the worker would register its process ID with
the watcher (via shared memory, IPC, or some other registry mechanism),
and then the worker would open its <a href="../../api_c/env_class.html">DB_ENV</a> handles and proceed.
When the worker finishes using the environment, it would unregister its
process ID with the watcher. The watcher periodically checks to ensure
that no worker has failed while using the environment.
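</p>
<p>The sentinel-file handshake might be sketched with standard POSIX calls;
the sentinel path would be chosen by the application:</p>

```c
/*
 * Sketch: sentinel-file handshake between a watcher and its workers.
 * The watcher creates the sentinel once the environment is ready;
 * workers check for it before opening DB_ENV handles; the watcher
 * unlinks it before cleaning up a failed environment.
 */
#include <fcntl.h>
#include <unistd.h>

/* Watcher: create the sentinel after the environment is ready.
 * O_EXCL guarantees only one watcher wins any race to create it. */
int
create_sentinel(const char *path)
{
	int fd;

	if ((fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644)) == -1)
		return (-1);
	return (close(fd));
}

/* Worker: proceed only if the sentinel exists. */
int
sentinel_exists(const char *path)
{
	return (access(path, F_OK) == 0);
}

/* Watcher: remove the sentinel before cleaning up a failed environment. */
int
remove_sentinel(const char *path)
{
	return (unlink(path));
}
```

<p>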
If a worker
fails while using the environment, the watcher removes the sentinel
file, kills all of the workers currently using the environment, cleans
up the environment and databases, and finally creates a new sentinel
file.</p>
<p>The weakness of this approach is that, on some systems, it is difficult
to determine if an unrelated process is still running. For example,
POSIX systems generally disallow sending signals to unrelated processes.
The trick to monitoring unrelated processes is to find a system resource
held by the process that will be modified if the process dies. On POSIX
systems, flock- or fcntl-style locking will work, as will LockFile on
Windows systems. Other systems may have to use other process-related
information such as file reference counts or modification times. In the
worst case, threads of control can be required to periodically
re-register with the watcher process: if the watcher has not heard from
a thread of control in a specified period of time, the watcher will take
action, cleaning up the environment.</p>
<p>If it is not practical to monitor the processes sharing a database
environment, it may be possible to monitor the environment to detect if
a thread of control has failed while holding open Berkeley DB handles. This
would be done by having a "watcher" process periodically call the
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method. If <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the watcher would then take action, cleaning up
the environment.</p>
<p>The weakness of this approach is that all threads of control using the
environment must specify an "ID" function and an "is-alive" function
using the <a href="../../api_c/env_set_thread_id.html">DB_ENV->set_thread_id</a> method.
(In other words, the Berkeley DB
library must be able to assign a unique ID to each thread of control,
and additionally determine if the thread of control is still running.
It can be difficult to portably provide that information in applications
using a variety of different programming languages and running on a
variety of different platforms.)</p>
</ol>
<p>Obviously, when implementing a process to monitor other threads of
control, it is important that the watcher process's code be as simple
and well-tested as possible, because the application may hang if the
watcher fails.</p>
<table width="100%"><tr><td><br></td><td align=right><a href="../cam/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font>
</body>
</html>