<!--$Id: app.so,v 10.29 2007/12/07 21:09:25 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle. All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Architecting Transactional Data Store applications</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Berkeley DB Transactional Data Store Applications</dl></b></td>
<td align=right><a href="../transapp/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/env_open.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Architecting Transactional Data Store applications</b></p>
<p>When building Transactional Data Store applications, the architecture
decisions involve application startup (running recovery) and handling
system or application failure. For details on performing recovery,
see the <a href="recovery.html">Recovery procedures</a>.</p>
<p>Recovery in a database environment is a single-threaded procedure, that
is, one thread of control or process must complete database environment
recovery before any other thread of control or process operates in the
Berkeley DB environment. It may simplify matters that Berkeley DB serializes
recovery and creation of a new database environment.</p>
<p>Performing recovery first marks any existing database environment as
"failed" and then removes it, causing threads of control running in the
database environment to fail and return to the application.
This
feature allows applications to recover environments without concern for
threads of control that might still be running in the removed
environment. The subsequent re-creation of the database environment is
serialized, so multiple threads of control attempting to create a
database environment will serialize behind a single creating thread.</p>
<p>One consideration in removing (as part of recovering) a database
environment that may be in use by another thread is the type of mutex
being used by the Berkeley DB library. In the case of database environment
failure when using test-and-set mutexes, threads of control waiting on
a mutex when the environment is marked "failed" will quickly notice the
failure and will return an error from the Berkeley DB API. In the case of
environment failure when using blocking mutexes, where the underlying
system mutex implementation does not unblock mutex waiters after the
thread of control holding the mutex dies, threads waiting on a mutex
when an environment is recovered might hang forever. Applications
blocked on events (for example, an application blocked on a network
socket, or a GUI event) may also fail to notice environment recovery
within a reasonable amount of time. Systems with such mutex
implementations are rare, but do exist; applications on such systems
should use an application architecture where the thread recovering the
database environment can explicitly terminate any process using the
failed environment, or configure Berkeley DB for test-and-set mutexes, or
incorporate some form of long-running timer or watchdog process to wake
or kill blocked processes should they block for too long.</p>
<p>Regardless, it makes little sense for multiple threads of control to
simultaneously attempt recovery of a database environment, since the
last one to run will remove all database environments created by the
threads of control that ran before it.
However, for some applications
it may make sense to have a single thread of control
that performs recovery and then removes the database environment, after
which the application launches a number of processes, any of which will
create the database environment and continue forward.</p>
<p>There are three common ways to architect Berkeley DB Transactional Data Store
applications. The choice usually depends on whether the
application consists of a single process or a group of processes
descended from a single process (for example, a server started when the
system first boots), or whether the application consists of unrelated
processes (for example, processes started by web connections or users
logged into the system).</p>
<ol>
<p><li>The first way to architect Transactional Data Store applications is as
a single process (the process may or may not be multithreaded).
<p>When this process starts, it runs recovery on the database environment
and then opens its databases. The application can subsequently create
new threads of control as it chooses. Those threads of control can
either share already open Berkeley DB <a href="../../api_c/env_class.html">DB_ENV</a> and <a href="../../api_c/db_class.html">DB</a> handles,
or create their own. In this architecture, databases are rarely opened
or closed when more than a single thread of control is running; that is,
they are opened when only a single thread is running, and closed after
all threads but one have exited.
The last thread of control to exit
closes the databases and the database environment.</p>
<p>This architecture is simplest to implement because thread serialization
is easy and failure detection does not require monitoring multiple
processes.</p>
<p>If the application's thread model allows processes to continue after
thread failure, the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method can be used to determine if
the database environment is usable after thread failure. If the
application does not call <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a>, or if
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> returns <a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application
must behave as if there has been a system failure, performing recovery
and re-creating the database environment. Once these actions have been
taken, other threads of control can continue (as long as all existing
Berkeley DB handles are first discarded) or be restarted.</p>
<p><li>The second way to architect Transactional Data Store applications is as
a group of related processes (the processes may or may not be
multithreaded).
<p>This architecture requires that the order in which threads of control are
created be controlled, so that database environment recovery is serialized.</p>
<p>In addition, this architecture requires that threads of control be
monitored. If any thread of control exits with open Berkeley DB handles, the
application may call the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method to detect lost mutexes
and locks and determine if the application can continue.
If the
application does not call <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a>, or if
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> indicates that the database environment can no
longer be used, the application must behave as if there has been a
system failure, performing recovery and creating a new database
environment. Once these actions have been taken, other threads of
control can continue (as long as all existing Berkeley DB handles are
first discarded) or be restarted.</p>
<p>The easiest way to structure groups of related processes is to first
create a single "watcher" process (often a script) that starts when the
system first boots, runs recovery on the database environment and then
creates the processes or threads that will actually perform work. The
initial thread has no further responsibilities other than to wait on the
threads of control it has started, to ensure that none of them exits
unexpectedly. If a thread of control exits, the watcher process optionally
calls the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method. If the application does not call
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> or if <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> indicates that the
environment can no longer be used, the watcher kills all of the threads
of control using the failed environment, runs recovery, and starts new
threads of control to perform work.</p>
<p><li>The third way to architect Transactional Data Store applications is as
a group of unrelated processes (the processes may or may not be
multithreaded). This is the most difficult architecture to implement
because, on some systems, it is difficult to find and
monitor unrelated processes.
<p>One solution is to log a thread of control ID when a new Berkeley DB handle
is opened.
For example, an initial "watcher" process could run recovery
on the database environment and then create a sentinel file. Any
"worker" process wanting to use the environment would check for the
sentinel file. If the sentinel file does not exist, the worker would
fail or wait for the sentinel file to be created. Once the sentinel
file exists, the worker would register its process ID with the watcher
(via shared memory, IPC or some other registry mechanism), and then the
worker would open its <a href="../../api_c/env_class.html">DB_ENV</a> handles and proceed. When the
worker finishes using the environment, it would unregister its process
ID with the watcher. The watcher periodically checks to ensure that no
worker has failed while using the environment. If a worker fails while
using the environment, the watcher removes the sentinel file, kills all
of the workers currently using the environment, runs recovery on the
environment, and finally creates a new sentinel file.</p>
<p>The weakness of this approach is that, on some systems, it is difficult
to determine if an unrelated process is still running. For example,
POSIX systems generally disallow sending signals to unrelated processes.
The trick to monitoring unrelated processes is to find a system resource
held by the process that will be modified if the process dies. On POSIX
systems, flock- or fcntl-style locking will work, as will LockFile on
Windows systems. Other systems may have to use other process-related
information such as file reference counts or modification times.
In the
worst case, threads of control can be required to periodically
re-register with the watcher process: if the watcher has not heard from
a thread of control in a specified period of time, the watcher will take
action, recovering the environment.</p>
<p>The Berkeley DB library includes one built-in implementation of this approach,
the <a href="../../api_c/env_open.html">DB_ENV->open</a> method's <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> flag:</p>
<p>If the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> flag is set, each process opening the
database environment first checks to see if recovery needs to be
performed. If recovery needs to be performed for any reason (including
the initial creation of the database environment), and
<a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> is also specified, recovery will be performed and
then the open will proceed normally. If recovery needs to be performed
and <a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> is not specified, <a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a> will be
returned. If recovery does not need to be performed, <a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a>
will be ignored.</p>
<p>There are three additional requirements for the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a>
architecture to work:</p>
<p><ul type=disc>
<li>First, all applications using the database environment must specify the
<a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> flag when opening the environment. However, there
is no additional requirement that the application choose a single process to
recover the environment, as the first process to open the database
environment will know to perform recovery.
<li>Second, there can be only a single <a href="../../api_c/env_class.html">DB_ENV</a> handle per database
environment in each process. As the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> locking is
per-process, not per-thread, multiple <a href="../../api_c/env_class.html">DB_ENV</a> handles in a single
environment could race with each other, potentially causing data
corruption.
<li>Third, the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> implementation does not explicitly
terminate processes using the database environment that is being
recovered. Instead, it relies on the processes themselves noticing that the
database environment has been discarded from underneath them. For this
reason, the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> flag should be used with a mutex
implementation that does not block in the operating system, as that
risks a thread of control blocking forever on a mutex which will never
be granted. Using any test-and-set mutex implementation ensures this
cannot happen, and for that reason the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> flag is
generally used with a test-and-set mutex implementation.
</ul>
<p>A second solution for groups of unrelated processes is also based on a
"watcher process". This solution is intended for systems where it is
not practical to monitor the processes sharing a database environment,
but it is possible to monitor the environment to detect if a thread of
control has failed holding open Berkeley DB handles. This would be done by
having a "watcher" process periodically call the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method.
If <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> indicates that the environment can no longer be
used, the watcher would then take action, recovering the environment.</p>
<p>The weakness of this approach is that all threads of control using the
environment must specify an "ID" function and an "is-alive" function
using the <a href="../../api_c/env_set_thread_id.html">DB_ENV->set_thread_id</a> method. (In other words, the Berkeley DB
library must be able to assign a unique ID to each thread of control,
and additionally determine if the thread of control is still running.
It can be difficult to portably provide that information in applications
using a variety of different programming languages and running on a
variety of different platforms.)</p>
<p>The two described approaches are different, and should not be combined.
Applications might use either the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> approach or the
<a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> approach, but not both together in the same
application. For example, a POSIX application written as a library
underneath a wide variety of interfaces and differing APIs might choose
the <a href="../../api_c/env_open.html#DB_REGISTER">DB_REGISTER</a> approach for a few reasons: first, it does not
require making periodic calls to the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> method; second,
when implementing in a variety of languages, it may be more difficult
to specify unique IDs for each thread of control; third, it may be more
difficult to determine if a thread of control is still running, as any
particular thread of control is likely to lack sufficient permissions
to signal other processes.
Alternatively, an application with a
dedicated watcher process, running with appropriate permissions, might
choose the <a href="../../api_c/env_failchk.html">DB_ENV->failchk</a> approach as supporting higher overall
throughput and reliability, as that approach allows the application to
abort unresolved transactions and continue forward without having to
recover the database environment.</p>
</ol>
<p>When implementing a process to monitor other threads of
control, it is important that the watcher process's code be as simple and
well-tested as possible, because the application may hang if the watcher fails.</p>
<table width="100%"><tr><td><br></td><td align=right><a href="../transapp/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/env_open.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font>
</body>
</html>