<!--$Id: app.so,v 10.9 2005/12/01 03:18:51 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Architecting Data Store and Concurrent Data Store applications</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Berkeley DB Concurrent Data Store Applications</dl></b></td>
<td align=right><a href="../cam/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Architecting Data Store and Concurrent Data Store applications</b></p>
<p>When building Data Store and Concurrent Data Store applications, the
architecture decisions involve application startup (cleaning up any
existing databases, removing any existing database environment, and
creating a new environment), and handling system or application
failure.  "Cleaning up" databases involves removal and re-creation
of the databases, restoration from an archival copy, and/or verification
and optional salvage, as described in <a href="fail.html">Handling failure
in Data Store and Concurrent Data Store applications</a>.</p>
<p>Data Store or Concurrent Data Store applications without database
environments are single process, by definition.  These applications
should start up; re-create, restore, or verify and optionally salvage
their databases; and run until eventual exit or application or system
failure.  After system or application failure, the process can simply
repeat this procedure.  This document does not discuss such
applications further.</p>
<p>Otherwise, the first question of Data Store and Concurrent Data Store
architecture is cleaning up existing databases, removing any
existing database environment, and creating a new
environment.  For obvious reasons, the application must serialize the
re-creation, restoration, or verification and optional salvage of its
databases.  Further, environment removal and creation must be
single-threaded; that is, one thread of control (where a thread of
control is either a true thread or a process) must remove and re-create
the environment before any other thread of control can use the new
environment.  It may simplify matters that Berkeley DB serializes creation of
the environment, so multiple threads of control attempting to create an
environment will serialize behind a single creating thread.</p>
<p>Removing a database environment will first mark the environment as
"failed", causing any threads of control still running in the
environment to fail and return to the application.  This feature allows
applications to remove environments without concern for threads of
control that might still be running in the removed environment.</p>
<p>One consideration in removing a database environment that may be in use
by another thread of control is the type of mutex used by the Berkeley DB library.
In the case of database environment failure when using test-and-set
mutexes, threads of control waiting on a mutex when the environment is
marked "failed" will quickly notice the failure and will return an error
from the Berkeley DB API.  In the case of environment failure when using
blocking mutexes, where the underlying system mutex implementation does
not unblock mutex waiters after the thread of control holding the mutex
dies, threads waiting on a mutex when an environment is recovered might
hang forever.  Applications blocked on events (for example, an
application blocked on a network socket or a GUI event) may also fail
to notice environment recovery within a reasonable amount of time.
Systems with such mutex implementations are rare, but do exist;
applications on such systems should use an application architecture
where the thread recovering the database environment can explicitly
terminate any process using the failed environment, configure Berkeley DB
for test-and-set mutexes, or incorporate some form of long-running timer
or watchdog process to wake or kill blocked processes should they block
for too long.</p>
<p>Regardless, it makes little sense for multiple threads of control to
simultaneously attempt to remove and re-create an environment, since the
last one to run will remove all environments created by the threads of
control that ran before it.  However, for a few applications, it may
make sense to have a single thread of control that
checks the existing databases and removes the environment, after which
the application launches a number of processes, any of which are able
to create the environment.</p>
<p>With respect to cleaning up existing databases, the database environment
must be removed before the databases are cleaned up.  Removing the
environment causes any Berkeley DB library calls made by threads of control
running in the failed environment to return failure to the application.
Removing the database environment first ensures the threads of control
in the old environment do not race with the threads of control cleaning
up the databases, possibly overwriting them after the cleanup has
finished.  Where the application architecture and system permit, many
applications kill all threads of control running in the failed database
environment before removing the failed database environment, on general
principles as well as to minimize overall system resource usage.  It
does not matter if the new environment is created before or after the
databases are cleaned up.</p>
<p>After having dealt with database and database environment recovery after
failure, the next issue to manage is application failure.  As described
in <a href="fail.html">Handling failure in Data Store and Concurrent Data
Store applications</a>, when a thread of control in a Data Store or
Concurrent Data Store application fails, it may exit holding data
structure mutexes or logical database locks.  These mutexes and locks
must be released to avoid the remaining threads of control hanging
behind the failed thread of control's mutexes or locks.</p>
<p>There are three common ways to architect Berkeley DB Data Store and Concurrent
Data Store applications.  The one chosen is usually based on whether
the application is composed of a single process or a group of
processes descended from a single process (for example, a server started
when the system first boots), or of
unrelated processes (for example, processes started by web connections
or users logging into the system).</p>
<ol>
<p><li>The first way to architect Data Store and Concurrent Data Store
applications is as a single process (the process may or may not be
multithreaded).
<p>When this process starts, it removes any existing database environment
and creates a new environment.  It then cleans up the databases and
opens those databases in the environment.  The application can
subsequently create new threads of control as it chooses.  Those threads
of control can either share already open Berkeley DB <a href="../../api_c/env_class.html">DB_ENV</a> and
<a href="../../api_c/db_class.html">DB</a> handles, or create their own.  In this architecture,
databases are rarely opened or closed when more than a single thread of
control is running; that is, they are opened when only a single thread
is running, and closed after all threads but one have exited.  The last
thread of control to exit closes the databases and the database
environment.</p>
<p>This architecture is simplest to implement because thread serialization
is easy and failure detection does not require monitoring multiple
processes.</p>
<p>If the application's thread model allows the process to continue after
thread failure, the <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> method can be used to determine if
the database environment is usable after the failure.  If the
application does not call <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a>, or
<a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> returns <a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application
must behave as if there has been a system failure, removing the
environment and creating a new environment, and cleaning up any
databases it wants to continue to use.  Once these actions have been
taken, other threads of control can continue (as long as all existing
Berkeley DB handles are first discarded) or be restarted.</p>
<p><li>The second way to architect Data Store and Concurrent Data Store
applications is as a group of related processes (the processes may or
may not be multithreaded).
<p>This architecture requires that the order in which threads of control are
created be controlled to serialize database environment removal and
creation, and database cleanup.</p>
<p>In addition, this architecture requires that threads of control be
monitored.  If any thread of control exits with open Berkeley DB handles, the
application may call the <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> method to determine if the
database environment is usable after the exit.  If the application does
not call <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a>, or <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the application must behave as if there has been
a system failure, removing the environment and creating a new
environment, and cleaning up any databases it wants to continue to use.
Once these actions have been taken, other threads of control can
continue (as long as all existing Berkeley DB handles are first discarded)
or be restarted.</p>
<p>The easiest way to structure groups of related processes is to first
create a single "watcher" process (often a script) that starts when the
system first boots, removes and creates the database environment, cleans
up the databases and then creates the processes or threads that will
actually perform work.  The initial thread has no further
responsibilities other than to wait on the threads of control it has
started, to ensure none of them unexpectedly exit.  If a thread of
control exits, the watcher process optionally calls the
<a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> method.  If the application does not call
<a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a>, or if <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the environment can no longer be used: the
watcher kills all of the threads of control using the failed
environment, cleans up, and starts new threads of control to perform
work.</p>
<p><li>The third way to architect Data Store and Concurrent Data Store
applications is as a group of unrelated processes (the processes may or
may not be multithreaded).  This is the most difficult architecture to
implement because of the difficulty, on some systems, of finding
and monitoring unrelated processes.
<p>One solution is to log a thread of control ID when a new Berkeley DB handle
is opened.  For example, an initial "watcher" process could open/create
the database environment, clean up the databases and then create a
sentinel file.  Any "worker" process wanting to use the environment
would check for the sentinel file.  If the sentinel file does not exist,
the worker would fail or wait for the sentinel file to be created.  Once
the sentinel file exists, the worker would register its process ID with
the watcher (via shared memory, IPC or some other registry mechanism),
and then the worker would open its <a href="../../api_c/env_class.html">DB_ENV</a> handles and proceed.
When the worker finishes using the environment, it would unregister its
process ID with the watcher.  The watcher periodically checks to ensure
that no worker has failed while using the environment.  If a worker
fails while using the environment, the watcher removes the sentinel
file, kills all of the workers currently using the environment, cleans
up the environment and databases, and finally creates a new sentinel
file.</p>
<p>The weakness of this approach is that, on some systems, it is difficult
to determine if an unrelated process is still running.  For example,
POSIX systems generally disallow sending signals to unrelated processes.
The trick to monitoring unrelated processes is to find a system resource
held by the process that will be modified if the process dies.  On POSIX
systems, flock- or fcntl-style locking will work, as will LockFile on
Windows systems.  Other systems may have to use other process-related
information such as file reference counts or modification times.  In the
worst case, threads of control can be required to periodically
re-register with the watcher process: if the watcher has not heard from
a thread of control in a specified period of time, the watcher will take
action, cleaning up the environment.</p>
<p>If it is not practical to monitor the processes sharing a database
environment, it may be possible to monitor the environment to detect if
a thread of control has failed holding open Berkeley DB handles.  This would
be done by having a "watcher" process periodically call the
<a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> method.  If <a href="../../api_c/env_failchk.html">DB_ENV-&gt;failchk</a> returns
<a href="../../ref/program/errorret.html#DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the watcher would then take action, cleaning up
the environment.</p>
<p>The weakness of this approach is that all threads of control using the
environment must specify an "ID" function and an "is-alive" function
using the <a href="../../api_c/env_set_thread_id.html">DB_ENV-&gt;set_thread_id</a> method.  (In other words, the Berkeley DB
library must be able to assign a unique ID to each thread of control,
and additionally determine if the thread of control is still running.
It can be difficult to portably provide that information in applications
using a variety of different programming languages and running on a
variety of different platforms.)</p> </ol>
<p>Obviously, when implementing a process to monitor other threads of
control, it is important that the watcher process's code be as simple and
well-tested as possible, because the application may hang if the watcher fails.</p>
<table width="100%"><tr><td><br></td><td align=right><a href="../cam/fail.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>