1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>Architecting Transactional Data Store applications</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
9    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
10    <link rel="up" href="transapp.html" title="Chapter 11.  Berkeley DB Transactional Data Store Applications" />
11    <link rel="prev" href="transapp_fail.html" title="Handling failure in Transactional Data Store applications" />
12    <link rel="next" href="transapp_env_open.html" title="Opening the environment" />
13  </head>
14  <body>
15    <div class="navheader">
16      <table width="100%" summary="Navigation header">
17        <tr>
18          <th colspan="3" align="center">Architecting Transactional Data Store applications</th>
19        </tr>
20        <tr>
21          <td width="20%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
22          <th width="60%" align="center">Chapter 11. 
23		Berkeley DB Transactional Data Store Applications
24        </th>
25          <td width="20%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
26        </tr>
27      </table>
28      <hr />
29    </div>
30    <div class="sect1" lang="en" xml:lang="en">
31      <div class="titlepage">
32        <div>
33          <div>
34            <h2 class="title" style="clear: both"><a id="transapp_app"></a>Architecting Transactional Data Store applications</h2>
35          </div>
36        </div>
37      </div>
38      <p>
39    When building Transactional Data Store applications, the architecture
40    decisions involve application startup (running recovery) and handling
41    system or application failure.  For details on performing recovery, see
42    the <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a>.
43</p>
44      <p>
45    Recovery in a database environment is a single-threaded procedure; that
46    is, one thread of control or process must complete database environment
47    recovery before any other thread of control or process operates in the
48    Berkeley DB environment.
49</p>
50      <p>
51    Performing recovery first marks any existing database environment as
52    "failed" and then removes it, causing threads of control running in the
53    database environment to fail and return to the application.  This
54    feature allows applications to recover environments without concern for
55    threads of control that might still be running in the removed
56    environment.  The subsequent re-creation of the database environment is
57    serialized, so multiple threads of control attempting to create a
58    database environment will serialize behind a single creating thread.
59</p>
60      <p>
61    One consideration in removing (as part of recovering) a database
62    environment that may be in use by another thread is the type of mutex
63    being used by the Berkeley DB library.  In the case of database
64    environment failure when using test-and-set mutexes, threads of control
65    waiting on a mutex when the environment is marked "failed" will quickly
66    notice the failure and will return an error from the Berkeley DB API.
67    In the case of environment failure when using blocking mutexes, where
68    the underlying system mutex implementation does not unblock mutex
69    waiters after the thread of control holding the mutex dies, threads
70    waiting on a mutex when an environment is recovered might hang forever.
71    Applications blocked on events (for example, an application blocked on
72    a network socket, or a GUI event) may also fail to notice environment
73    recovery within a reasonable amount of time.  Systems with such mutex
74    implementations are rare, but do exist; applications on such systems
75    should use an application architecture where the thread recovering the
76    database environment can explicitly terminate any process using the
77    failed environment, or configure Berkeley DB for test-and-set mutexes,
78    or incorporate some form of long-running timer or watchdog process to
79    wake or kill blocked processes should they block for too long.
80</p>
81      <p>
82    Regardless, it makes little sense for multiple threads of control to
83    simultaneously attempt recovery of a database environment, since the
84    last one to run will remove all database environments created by the
85    threads of control that ran before it.  However, for some applications,
86    it may make sense to have a single thread of control
87    that performs recovery and then removes the database environment, after
88    which the application launches a number of processes, any of which will
89    create the database environment and continue forward.
90</p>
91      <p>
92    There are three common ways to architect Berkeley DB Transactional Data
93    Store applications.  The one chosen usually depends on whether
94    the application consists of a single process or a group of processes
95    descended from a single process (for example, a server started when the
96    system first boots), or whether the application consists of unrelated
97    processes (for example, processes started by web connections or users
98    logged into the system).
99</p>
100      <div class="orderedlist">
101        <ol type="1">
102          <li>
103            <p>
104            The first way to architect Transactional Data Store
105            applications is as a single process (the process may or may not
106            be multithreaded).
107        </p>
108            <p>
109            When this process starts, it runs recovery on the database
110            environment and then opens its databases.  The application can
111            subsequently create new threads as it chooses.  Those threads
112            can either share already open Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a>
113            handles, or create their own.  In this architecture, databases
114            are rarely opened or closed when more than a single thread of
115            control is running; that is, they are opened when only a single
116            thread is running, and closed after all threads but one have
117            exited.  The last thread of control to exit closes the
118            databases and the database environment.
119        </p>
120            <p>
121            This architecture is simplest to implement because thread
122            serialization is easy and failure detection does not require
123            monitoring multiple processes.
124        </p>
125            <p>
126            If the application's thread model allows processes to continue
127            after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method can be used to
128            determine if the database environment is usable after thread
129            failure.  If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>, or
130            <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns 
131            <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
132            the application must
133            behave as if there has been a system failure, performing
134            recovery and re-creating the database environment.  Once these
135            actions have been taken, other threads of control can continue
136            (as long as all existing Berkeley DB handles are first
137            discarded).
138        </p>
139          </li>
140          <li>
141            <p>
142            The second way to architect Transactional Data Store
143            applications is as a group of related processes (the processes
144            may or may not be multithreaded).
145        </p>
146            <p>
147            This architecture requires that the order in which threads of control are
148            created be controlled to serialize database environment recovery.
149        </p>
150            <p>
151            In addition, this architecture requires that threads of control
152            be monitored.  If any thread of control exits with open
153            Berkeley DB handles, the application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
154            method to detect lost mutexes and locks and determine if the
155            application can continue.  If the application does not call
156            <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the database
157            environment can no longer be used, the application must behave
158            as if there has been a system failure, performing recovery and
159            creating a new database environment.  Once these actions have
160            been taken, other threads of control can continue (as long
161            as all existing Berkeley DB handles are first discarded).
162
163        </p>
164            <p>
165            The easiest way to structure groups of related processes is to
166            first create a single "watcher" process (often a script) that
167            starts when the system first boots, runs recovery on the
168            database environment and then creates the processes or threads
169            that will actually perform work.  The initial thread has no
170            further responsibilities other than to wait on the threads of
171            control it has started, to ensure none of them unexpectedly
172            exit.  If a thread of control exits, the watcher process
173            optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method.  If the application
174            does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the
175            environment can no longer be used, the watcher kills all of the
176            threads of control using the failed environment, runs recovery,
177            and starts new threads of control to perform work.
178        </p>
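            <p>
            A minimal sketch of such a watcher on a POSIX system follows.  It
            is an illustration under stated assumptions, not Berkeley DB code:
            run_generation(), kill_workers() and worker_main() are hypothetical
            names, the "failing" worker only simulates an unexpected exit, and
            the point where a real watcher would call DB_ENV-&gt;failchk() or
            run recovery is marked by a comment rather than implemented.
        </p>

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Hypothetical worker body: worker number `failing` simulates a crash. */
static void worker_main(int id, int failing)
{
    if (id == failing)
        _exit(1);               /* unexpected exit */
    for (;;)
        pause();                /* healthy workers run until signalled */
}

/* Kill and reap the first n workers except `skip`; returns how many. */
static int kill_workers(const pid_t *pids, int n, pid_t skip)
{
    int i, killed = 0;

    assert(n <= MAX_WORKERS);
    for (i = 0; i < n; i++) {
        if (pids[i] == skip)
            continue;           /* already reaped by the caller */
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
        killed++;
    }
    return killed;
}

/*
 * Start nworkers workers, block until any one of them exits, then kill
 * and reap the survivors.  Returns the number of survivors killed, or
 * -1 on error.
 */
int run_generation(int nworkers, int failing)
{
    pid_t pids[MAX_WORKERS], dead;
    int i;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    for (i = 0; i < nworkers; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            kill_workers(pids, i, (pid_t)-1);
            return -1;
        }
        if (pids[i] == 0)
            worker_main(i, failing);    /* never returns */
    }

    dead = wait(NULL);          /* the watcher's only job: wait */
    if (dead < 0) {
        kill_workers(pids, nworkers, (pid_t)-1);
        return -1;
    }

    /*
     * A real watcher would decide here whether the environment is still
     * usable; this sketch takes the conservative path of killing every
     * remaining worker.  Recovery and the start of a new generation
     * would follow the return.
     */
    return kill_workers(pids, nworkers, dead);
}
```

            <p>
            For example, run_generation(3, 1) starts three workers, notices
            that the second one has exited, and kills the remaining two.  The
            sketch keeps the watcher deliberately small, since it is the one
            process whose failure leaves nobody to recover the environment.
        </p>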
179          </li>
180          <li>
181            <p>
182            The third way to architect Transactional Data Store
183            applications is as a group of unrelated processes (the
184            processes may or may not be multithreaded).   This is the most
185            difficult architecture to implement because of the level of
186            difficulty in some systems of finding and monitoring unrelated
187            processes.  There are several possible techniques to implement
188            this architecture.
189        </p>
190            <p>
191            One solution is to log a thread of control ID when a new
192            Berkeley DB handle is opened.  For example, an initial
193            "watcher" process could run recovery on the database
194            environment and then create a sentinel file.  Any "worker"
195            process wanting to use the environment would check for the
196            sentinel file.  If the sentinel file does not exist, the worker
197            would fail or wait for the sentinel file to be created.  Once
198            the sentinel file exists, the worker would register its process
199            ID with the watcher (via shared memory, IPC or some other
200            registry mechanism), and then the worker would open its
201            <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed.  When the worker finishes
202            using the environment, it would unregister its process ID with
203            the watcher.  The watcher periodically checks to ensure that no
204            worker has failed while using the environment.  If a worker
205            fails while using the environment, the watcher removes the
206            sentinel file, kills all of the workers currently using the
207            environment, runs recovery on the environment, and finally
208            creates a new sentinel file.
209        </p>
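            <p>
            The sentinel-file part of this protocol could be sketched as
            follows on a POSIX system.  The function names and the idea of
            passing the sentinel path as a parameter are assumptions made for
            illustration; registration of worker process IDs with the watcher
            is not shown.
        </p>

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Watcher: publish the sentinel once recovery has completed.
 * Returns 0 on success (or if the file already exists), -1 on error.
 */
int sentinel_publish(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    return close(fd);
}

/*
 * Watcher: withdraw the sentinel before killing workers and running
 * recovery.  A sentinel that is already gone is not an error.
 */
int sentinel_withdraw(const char *path)
{
    assert(path != NULL);
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;
    return -1;
}

/* Worker: nonzero if the sentinel exists and the worker may proceed. */
int sentinel_present(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}
```

            <p>
            A worker would call sentinel_present() before registering with the
            watcher and opening its handles.  The check is advisory only: the
            sentinel can be withdrawn immediately afterwards, which is why the
            watcher still has to track and kill registered workers.
        </p>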
210            <p>
211            The weakness of this approach is that, on some systems, it is
212            difficult to determine if an unrelated process is still
213            running.  For example, POSIX systems generally disallow sending
214            signals to unrelated processes.  The trick to monitoring
215            unrelated processes is to find a system resource held by the
216            process that will be modified if the process dies.  On POSIX
217            systems, flock- or fcntl-style locking will work, as will
218            LockFile on Windows systems.  Other systems may have to use
219            other process-related information such as file reference counts
220            or modification times.  In the worst case, threads of control
221            can be required to periodically re-register with the watcher
222            process: if the watcher has not heard from a thread of control
223            in a specified period of time, the watcher will take action,
224            recovering the environment.
225        </p>
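            <p>
            As a hedged illustration of the fcntl-style technique, the sketch
            below has each worker hold an exclusive lock on its own file; the
            kernel releases that lock when the worker dies, and the watcher
            detects this with F_GETLK.  The names worker_hold_lock(),
            watcher_lock_holder() and demo_liveness(), and the use of one lock
            file per worker, are assumptions and not part of Berkeley DB.
        </p>

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a request describing an exclusive lock on the whole file. */
static void whole_file_lock(struct flock *fl)
{
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;              /* 0 means "to end of file" */
}

/*
 * Worker: lock the file for the lifetime of the process.  Returns the
 * descriptor, which must stay open, or -1 on error.
 */
int worker_hold_lock(const char *path)
{
    struct flock fl = {0};
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    whole_file_lock(&fl);
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Watcher: pid of the lock holder, 0 if nobody holds it, -1 on error. */
pid_t watcher_lock_holder(const char *path)
{
    struct flock fl = {0};
    pid_t holder;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    whole_file_lock(&fl);
    if (fcntl(fd, F_GETLK, &fl) == -1)
        holder = -1;
    else
        holder = (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
    close(fd);
    return holder;
}

/* Demonstration: returns 0 if a worker is seen alive, then seen dead. */
int demo_liveness(const char *path)
{
    int ready[2];
    char byte;
    pid_t child, alive, dead;

    if (pipe(ready) == -1)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {           /* worker */
        close(ready[0]);
        if (worker_hold_lock(path) < 0 || write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }
    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1) {    /* worker failed to lock */
        close(ready[0]);
        waitpid(child, NULL, 0);
        return -1;
    }
    close(ready[0]);

    alive = watcher_lock_holder(path);      /* expected: child's pid */
    kill(child, SIGKILL);                   /* simulate a crash */
    waitpid(child, NULL, 0);
    dead = watcher_lock_holder(path);       /* expected: 0 */
    unlink(path);
    assert(child > 0);
    return (alive == child && dead == 0) ? 0 : -1;
}
```

            <p>
            The watcher never needs permission to signal the worker in order
            to learn whether it is alive; it only needs read-write access to
            the lock file.  Lock semantics on network file systems vary, so
            this sketch assumes a local file system.
        </p>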
226            <p>
227            The Berkeley DB library includes one built-in implementation of this approach,
228            the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a> method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag:
229        </p>
230            <p>
231            If the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is set, each process opening the
232            database environment first checks to see if recovery needs to
233            be performed.  If recovery needs to be performed for any reason
234            (including the initial creation of the database environment),
235            and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is also specified, recovery will be performed
236            and then the open will proceed normally.  If recovery needs to
237            be performed and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not specified, 
238            <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
239            will be returned.  If recovery does not need to be performed,
240            <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> will be ignored.
241        </p>
242            <p>
243      Prior to the actual recovery beginning, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_PANIC" class="olink">DB_EVENT_REG_PANIC</a>
244      event is set for the environment.  Processes in the application using
245      the <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> method will be notified when they do their next
246      operations in the environment.  Processes receiving this event should
247      exit the environment.   Also, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_ALIVE" class="olink">DB_EVENT_REG_ALIVE</a> event will be
248      triggered if there are other processes currently attached to the
249      environment.   Only the process doing the recovery will receive this
250      event notification. It will receive this notification once for each
251      process still attached to the environment. The parameter of the
252      <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> callback will contain the process identifier of the
253      process still attached.  The process doing the recovery can then
254      signal the attached process or perform some other operation prior to
255      recovery (for example, kill the attached process).
256	</p>
257            <p>
258      The <a href="../api_reference/C/envset_timeout.html" class="olink">DB_ENV-&gt;set_timeout()</a> method's <a href="../api_reference/C/envset_timeout.html#set_timeout_DB_SET_REG_TIMEOUT" class="olink">DB_SET_REG_TIMEOUT</a> flag can be set to
259      establish a wait period before starting recovery.  This creates a
260      window of time for other processes to receive the DB_EVENT_REG_PANIC
261      event and exit the environment.
262	</p>
263            <p>
264            There are three additional requirements for the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a>
265            architecture to work:
266        </p>
267            <div class="itemizedlist">
268              <ul type="disc">
269                <li>
270                  <p>
271            First, all applications using the database environment must
272            specify the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag when opening the environment.
273            However, there is no additional requirement if the application
274            chooses a single process to recover the environment, as the
275            first process to open the database environment will know to
276            perform recovery.
277        </p>
278                </li>
279                <li>
280                  <p>
281            Second, there can only be a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database
282            environment in each process.  As the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> locking is
283            per-process, not per-thread, multiple <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles in a single
284            environment could race with each other, potentially causing
285            data corruption.
286        </p>
287                </li>
288                <li>
289                  <p>
290            Third, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> implementation does not explicitly
291            terminate processes using the database environment which is
292            being recovered.  Instead, it relies on the processes
293            themselves noticing the database environment has been discarded
294            from underneath them.  For this reason, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag
295            should be used with a mutex implementation that does not block
296            in the operating system, as that risks a thread of control
297            blocking forever on a mutex which will never be granted.  Using
298            any test-and-set mutex implementation ensures this cannot
299            happen, and for that reason the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is generally
300            used with a test-and-set mutex implementation.
301        </p>
302                </li>
303              </ul>
304            </div>
305            <p>
306            A second solution for groups of unrelated processes is also
307            based on a "watcher process".  This solution is intended for
308            systems where it is not practical to monitor the processes
309            sharing a database environment, but it is possible to monitor
310            the environment to detect if a thread of control has failed
311            holding open Berkeley DB handles.  This would be done by having
312            a "watcher" process periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method.
313            If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the environment can no longer be
314            used, the watcher would then take action, recovering the
315            environment.
316        </p>
317            <p>
318            The weakness of this approach is that all threads of control
319            using the environment must specify an "ID" function and an
320            "is-alive" function using the <a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV-&gt;set_thread_id()</a> and DB_ENV-&gt;set_isalive() methods, respectively.  (In
321            other words, the Berkeley DB library must be able to assign a
322            unique ID to each thread of control, and additionally determine
323            if the thread of control is still running.  It can be difficult
324            to portably provide that information in applications using a
325            variety of different programming languages and running on a
326            variety of different platforms.)
327        </p>
328            <p>
329	  A third solution for groups of unrelated processes is a hybrid of the two
330	  above.  Along with implementing the built-in sentinel approach with
331	  the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a> method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag, the <a href="../api_reference/C/envopen.html#envopen_DB_FAILCHK" class="olink">DB_FAILCHK</a> flag can be specified.
332	  When using both flags, each process opening the database environment first
333	  checks to see if recovery needs to be performed.  If recovery needs to be
334	  performed for any reason, it will first determine if a thread of control
335	  exited while holding database read locks, and release those.  Then it will
336	  abort any unresolved transactions.   If these steps are successful, the process
337	  opening the environment will continue without the need for any
338	  additional recovery.   If these steps are unsuccessful, then additional
339	  recovery will be performed if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is specified; if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not
340	  specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> will be returned.
341	</p>
342            <p>
343	  Since this solution is a hybrid of the first two, all of the requirements of both
344	  of them must be met (an "ID" function, an "is-alive" function, a
345	  single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database environment, and so on).
346	</p>
347            <p>
348            The described approaches are different, and should not be
349            combined.  Applications might use either the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a>
350            approach, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> approach, or the hybrid approach, but not together in
351            the same application.  For example, a POSIX application written
352            as a library underneath a wide variety of interfaces and
353            differing APIs might choose the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach for a
354            few reasons: first, it does not require making periodic calls
355            to the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method; second, when implementing in a
356            variety of languages, it may be more difficult to specify
357            unique IDs for each thread of control;  third, it may be more
358            difficult to determine if a thread of control is still running, as
359            any particular thread of control is likely to lack sufficient
360            permissions to signal other processes.  Alternatively, an
361            application with a dedicated watcher process, running with
362            appropriate permissions, might choose the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> approach
363            as supporting higher overall throughput and reliability, as
364            that approach allows the application to abort unresolved
365            transactions and continue forward without having to recover the
366            database environment.   The hybrid approach is useful in situations
367	    where running a dedicated watcher process is not practical but getting the
368	    equivalent of  <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> on the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a> is important.
369        </p>
370          </li>
371        </ol>
372      </div>
373      <p>
374    When implementing a process to monitor other threads of
375    control, it is important that the watcher process's code be as simple and
376    well-tested as possible, because the application may hang if it fails.
377</p>
378    </div>
379    <div class="navfooter">
380      <hr />
381      <table width="100%" summary="Navigation footer">
382        <tr>
383          <td width="40%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
384          <td width="20%" align="center">
385            <a accesskey="u" href="transapp.html">Up</a>
386          </td>
387          <td width="40%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
388        </tr>
389        <tr>
390          <td width="40%" align="left" valign="top">Handling failure in Transactional Data Store applications </td>
391          <td width="20%" align="center">
392            <a accesskey="h" href="index.html">Home</a>
393          </td>
394          <td width="40%" align="right" valign="top"> Opening the environment</td>
395        </tr>
396      </table>
397    </div>
398  </body>
399</html>
400