<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Designing Your Application for Recovery</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />
    <link rel="up" href="filemanagement.html" title="Chapter&#160;5.&#160;Managing DB Files" />
    <link rel="prev" href="recovery.html" title="Recovery Procedures" />
    <link rel="next" href="hotfailover.html" title="Using Hot Failovers" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Designing Your Application for Recovery</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;5.&#160;Managing DB Files</th>
          <td width="20%" align="right">&#160;<a accesskey="n" href="hotfailover.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="architectrecovery.html#multithreadrecovery">Recovery for Multi-Threaded Applications</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="architectrecovery.html#multiprocessrecovery">Recovery in Multi-Process Applications</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
            When building your DB application, you should consider how you will run recovery. If you are building a
            single-threaded, single-process application, it is fairly simple to run recovery when your application first
            opens its environment. In this case, you need only decide whether to run recovery every time your
            application opens its environment (recommended) or only some of the time, presumably triggered by a
            startup option controlled by your application's user.
        </p>
      <p>
            However, for multi-threaded and multi-process applications, you need to carefully design your
            application's startup code so that recovery is run only when it makes sense to do so.
        </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>
            </div>
          </div>
        </div>
        <p>
                If your application uses only one environment handle, then handling recovery for a multi-threaded
                application is no more difficult than for a single-threaded application. You simply open the environment
                in the application's main thread, and then pass that handle to each of the threads that will be
                performing DB operations. We illustrate this with our final example in this book (see
                <a class="xref" href="txnexample_c.html" title="Transaction Example">Transaction Example</a>
                for more information).
            </p>
        <p>
                Alternatively, you can have each worker thread open its own environment handle. In this case,
                however, designing for recovery is more complicated.
            </p>
        <p>
                When a thread performing database operations fails
                or hangs, it is usually best to simply
                restart the application and run recovery on application
                startup as normal. However, not all applications can afford
                to restart just because a single thread has misbehaved.
             </p>
        <p>
                If you want to continue operating in the face of a misbehaving thread,
                then at a minimum you must run recovery whenever a thread performing database operations fails or hangs.
            </p>
        <p>
                Remember that recovery clears the environment of all
                outstanding locks, including any held by
                an aborted thread. If these locks are not cleared,
                other threads performing database operations can block
                behind locks that the failed thread acquired but never
                released. The result is an application that hangs
                indefinitely.
            </p>
        <p>
                To run recovery under these circumstances:
            </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                        Suspend or shut down all other threads performing
                        database operations.
                    </p>
            </li>
            <li>
              <p>
                        Discard any open environment handles. Note that
                        attempting to gracefully close these handles may be
                        asking for trouble; the close can fail if the
                        environment is already in need of recovery. For
                        this reason, it is best and easiest to simply discard the handles.
                    </p>
            </li>
            <li>
              <p>
                        Open new handles, running recovery as you open
                        them.
                        See <a class="xref" href="recovery.html#normalrecovery" title="Normal Recovery">Normal Recovery</a> for more information.
                    </p>
            </li>
            <li>
              <p>
                        Restart all your database threads.
                    </p>
            </li>
          </ol>
        </div>
        <p>
                A traditional way to handle this is to spawn a watcher thread that is responsible for making
                sure all is well with your threads, and for performing the above steps if not.
            </p>
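        <p>
                One way for a watcher thread to decide that all is <span class="emphasis"><em>not</em></span> well is
                to give each worker a heartbeat counter that the worker bumps after each unit of database work, and
                have the watcher look for counters that stop moving. The sketch below is plain C11 with no DB calls;
                the structure and function names are hypothetical, and it assumes workers are continuously busy (an
                idle worker would also need to bump its counter, or it will be reported as stalled).
            </p>

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-worker heartbeat, shared by one worker and the watcher. */
struct heartbeat {
    atomic_ulong beats;        /* incremented by the worker after each unit of work */
    unsigned long last_seen;   /* value seen at the previous check (watcher only) */
};

/* Worker side: record that one more unit of database work completed. */
void heartbeat_beat(struct heartbeat *hb)
{
    atomic_fetch_add_explicit(&hb->beats, 1, memory_order_relaxed);
}

/*
 * Watcher side: call once per check interval. Returns the index of the first
 * worker whose counter has not moved since the previous call, or -1 if every
 * worker made progress. Any result other than -1 is the watcher's cue to
 * perform the recovery steps (stop threads, discard handles, reopen with
 * recovery, restart threads).
 */
int heartbeat_find_stalled(struct heartbeat *hbs, size_t n)
{
    int stalled = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned long now =
            atomic_load_explicit(&hbs[i].beats, memory_order_relaxed);
        if (now == hbs[i].last_seen && stalled < 0)
            stalled = (int)i;
        hbs[i].last_seen = now;
    }
    return stalled;
}
```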
        <p>
                However, in the case where each worker thread opens and maintains its own environment handle, recovery
                is complicated for two reasons:
            </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                        For some applications and workloads, it might be
                        worthwhile to give your database threads the
                        ability to gracefully finalize any ongoing
                        transactions. If this is the case, your
                        code must be capable of signaling each thread
                        to halt DB activities and close its
                        environment. If you simply run recovery against the
                        environment, your database threads will
                        detect this and fail in the midst of performing their
                        database operations.
                    </p>
            </li>
            <li>
              <p>
                        Your code must ensure that only one
                        thread runs recovery, and that recovery completes before any other
                        thread opens its environment
                        handle. Recovery should be single-threaded because
                        running recovery against an environment removes and then
                        re-creates that environment. This causes all
                        other processes and threads to "fail" when they
                        attempt operations against the newly recovered
                        environment. If every thread runs recovery
                        when it starts up, then some threads are likely to
                        fail because the environment they are using has
                        just been recovered by another thread. Each such thread must then re-execute its own recovery
                        path. At best, this is inefficient; at worst, it can cause your application to fall into an
                        endless recovery loop.
                    </p>
            </li>
          </ol>
        </div>
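        <p>
                For the second point, one possible approach (a hypothetical sketch, not a DB facility) is a
                process-wide "recovery gate": a mutex plus a generation counter. Each worker remembers the generation
                current when it opened its handle. When an operation returns
                <code class="literal">DB_RUNRECOVERY</code>, the worker enters the gate with that generation. Only the
                first worker to arrive for a given generation wins and runs recovery; later arrivals block until it
                finishes, see that the generation has already advanced, and reopen their handles without recovery.
            </p>

```c
#include <pthread.h>

/* Hypothetical process-wide gate that serializes recovery among threads. */
struct recovery_gate {
    pthread_mutex_t lock;
    unsigned long generation;   /* incremented after each successful recovery */
};

#define RECOVERY_GATE_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, 0 }

/*
 * Called by a worker that got DB_RUNRECOVERY on a handle it opened at
 * generation `seen`. Returns 1 if this caller must run recovery now; the gate
 * then stays locked, so every other caller waits until recovery_gate_done().
 * Returns 0 if another thread already recovered that generation, in which
 * case the caller should open a new handle without DB_RECOVER.
 */
int recovery_gate_enter(struct recovery_gate *g, unsigned long seen)
{
    pthread_mutex_lock(&g->lock);
    if (g->generation == seen)
        return 1;                     /* winner: keep holding the lock */
    pthread_mutex_unlock(&g->lock);
    return 0;
}

/* Called by the winner after its recovery attempt, successful or not. */
void recovery_gate_done(struct recovery_gate *g, int recovered)
{
    if (recovered)
        g->generation++;              /* a failed attempt leaves it for a retry */
    pthread_mutex_unlock(&g->lock);
}

/* Generation to record alongside a freshly opened environment handle. */
unsigned long recovery_gate_generation(struct recovery_gate *g)
{
    unsigned long gen;

    pthread_mutex_lock(&g->lock);
    gen = g->generation;
    pthread_mutex_unlock(&g->lock);
    return gen;
}
```

        <p>
                Holding the mutex for the whole recovery is what keeps the other threads from opening handles against
                a half-recovered environment. The winner opens its new handle with
                <code class="literal">DB_RECOVER</code> before calling <code class="function">recovery_gate_done()</code>;
                everyone else opens without it.
            </p>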
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>
            </div>
          </div>
        </div>
        <p>
                Frequently, DB applications use multiple processes to interact with the databases. For example, you may
                have a long-running process, such as some kind of server, and then a series of administrative tools that
                you use to inspect and administer the underlying databases. Or, in some web-based architectures, different
                services run as independent processes that are managed by the server.
            </p>
        <p>
                In any case, recovery for a multi-process environment is complicated for two reasons:
            </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                        In the event that recovery must be run, you might
                        want to notify processes interacting with the environment
                        that recovery is about to occur and give them a
                        chance to gracefully terminate. Whether this is
                        worthwhile depends entirely
                        on the nature of your application. Some
                        long-running applications with multiple processes
                        performing meaningful work might want to do this.
                        Other applications, whose processes perform database
                        operations that are likely to be harmed by error conditions in other
                        processes, will probably find it not worth the
                        effort. For this latter group, the chances of
                        performing a graceful shutdown may be low anyway.
                    </p>
            </li>
            <li>
              <p>
                        Unlike in single-process scenarios, it can quickly become wasteful for every process interacting
                        with the databases to run recovery when it starts up. This is partly because recovery
                        <span class="emphasis"><em>does</em></span> take some time to run, but mostly because you want to
                        avoid a situation where your server must
                        reopen all its environment handles just because you started a command-line database
                        administration utility that always runs recovery.
                    </p>
            </li>
          </ol>
        </div>
        <p>
                DB offers two mechanisms for managing recovery in multi-process DB applications: process
                registration and failure checking. Each has different strengths and weaknesses, and both are
                described in the following sections.
            </p>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>
              </div>
            </div>
          </div>
          <p>
                    Before continuing, note that the following sections describe recovery processes that
                    can result in one process running recovery while other processes are actively performing
                    database operations.
                </p>
          <p>
                    When this happens, the current database operation in those other processes
                    fails abnormally, indicating a <code class="literal">DB_RUNRECOVERY</code> condition.
                    This means that your application should immediately abandon any database operations that it has
                    in progress, discard any environment handles it has opened, and obtain and open new handles.
                </p>
          <p>
                    The net effect is that any writes performed by unresolved transactions are lost.
                    For long-running applications (servers, for example), the services they provide are also
                    unavailable for the time it takes to complete recovery and for all participating
                    processes to reopen their environment handles.
                </p>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="db_register"></a>Process Registration</h4>
              </div>
            </div>
          </div>
          <p>
                    One way to handle multi-process recovery is for every process to "register" its environment. In
                    doing so, the process gains the ability to see whether any other processes are using the
                    environment and, if so, whether they have suffered an abnormal termination. If an abnormal
                    termination is detected, the process runs recovery; otherwise, it does not.
                </p>
          <p>
                    Note that using process registration also ensures that
                    recovery is serialized across applications. That is,
                    only one process at a time has a chance to run
                    recovery. Generally, this means that the first process
                    to start up runs recovery, and all other processes
                    silently skip recovery because it is not
                    needed.
                </p>
          <p>
                    To cause your application to register its environment, specify
                        <span>
                            the <code class="literal">DB_REGISTER</code> flag when you open your environment.
                            You may also specify <code class="literal">DB_RECOVER</code>. However, it is an error to specify
                            <code class="literal">DB_RECOVER_FATAL</code> together with
                            the <code class="literal">DB_REGISTER</code> flag.
                        </span>
                        If, during the open, DB determines that recovery must be run, it automatically runs the correct
                        type of recovery for you, as long as you specify normal recovery
                        on your environment open. If you register your environment but do not specify normal recovery,
                        then no recovery is run even if the registration process identifies a need for it. In this case,
                        the environment open simply fails by
                            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
                </p>
          <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
            <h3 class="title">Note</h3>
            <p>
                        If you do not specify normal recovery when the first process in your application opens
                        a registered environment, then that environment open fails by
                            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
                        This is because the first process to register must create an internal
                        registration file, and recovery is forced when that file is created. To
                        avoid this failure, specify recovery on the environment open for at least
                        the first process to start in your application.
                    </p>
          </div>
          <p>
                    In addition, if you specify <code class="literal">DB_FAILCHK</code>
                    when you register your environment, then a failure check is performed on
                    environment open (failure checking is described in the next section). If, during the
                    failure check, an abnormal termination is detected for any of the processes
                    involved in the application, DB releases any read locks held by the dead
                    process and aborts its unresolved transactions as necessary. This is done in an attempt
                    to clean up the environment.
                </p>
          <p>
                    In this situation, if a general cleanup of the
                    environment is not possible and normal recovery is not specified on environment
                    open, then the open fails,
                    <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>
                    However, if this situation occurs and recovery was specified, then the appropriate type of recovery
                    (normal or fatal) is run to bring the environment back to a healthy state.
                </p>
          <p>
                    Be aware that there are some limitations and requirements if you want your various processes to
                    coordinate recovery using registration:
                </p>
          <div class="orderedlist">
            <ol type="1">
              <li>
                <p>
                            There can be only one environment handle per
                            environment per process. In the case of multi-threaded
                            processes, the environment handle must be shared across threads.
                        </p>
              </li>
              <li>
                <p>
                            All processes sharing the environment must use registration. If registration is
                            not used uniformly across all participating processes, then your application may not
                            reliably recognize that recovery must be run.
                        </p>
              </li>
            </ol>
          </div>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="failchk"></a>Failure Checking</h4>
              </div>
            </div>
          </div>
          <p>
                    For very large and robust multi-process applications, the most common way to ensure that all
                    processes are working as intended is to use a watchdog process. To assist a watchdog
                    process, DB offers a failure checking mechanism.
                </p>
          <p>
                    When a thread of control fails while it has open environment handles, it may leave
                    resources locked or corrupted. Other threads of control may encounter these unavailable resources
                    quickly or not at all, depending on their data access patterns.
                </p>
          <p>
                    In any case, the DB failure checking mechanism allows a watchdog to detect whether an environment is
                    unusable as a result of a thread of control failure. The watchdog process should invoke it
                    periodically (for example, once a minute). If the environment is deemed unusable, then
                    the watchdog process is notified that recovery should be run. It is then up to the watchdog to
                    actually run recovery. It is also the watchdog's responsibility to decide what to do about currently
                    running processes before running recovery. The watchdog could, for example, attempt to
                    gracefully shut down or kill all relevant processes before running recovery.
                </p>
          <p>
                    Note that failure checking need not be run from a separate process, although conceptually that is
                    how the mechanism is meant to be used. The same mechanism can be used in a multi-threaded
                    application that wants to have a watchdog thread.
                </p>
          <p>
                    To use failure checking, you must:
                </p>
          <div class="orderedlist">
            <ol type="1">
              <li>
                <p>
                            Provide an <code class="function">is_alive()</code> callback using the
                            <code class="methodname">DB_ENV-&gt;set_isalive()</code> method.
                            DB uses this callback to determine whether a specified process and thread
                            are alive when failure checking is performed. You must also declare the approximate
                            number of threads in the environment using
                            <code class="methodname">DB_ENV-&gt;set_thread_count()</code> before the environment is opened.
                        </p>
              </li>
              <li>
                <p>
                            Optionally, provide a <code class="literal">thread_id</code> callback
                            that uniquely identifies a process
                            and thread of control. This callback
                            is necessary only if the standard process and thread
                            identification functions for your platform are not sufficient for use by failure
                            checking. This is rarely the case; when it is, it is usually because the thread or process IDs
                            used by your system cannot fit into an unsigned integer.
                        </p>
                <p>
                            You provide this callback using the
                            <code class="methodname">DB_ENV-&gt;set_thread_id()</code>
                            method. See the API reference for this method for more information on when setting a thread
                            ID callback might be necessary.
                        </p>
              </li>
              <li>
                <p>
                            Call the <code class="methodname">DB_ENV-&gt;failchk()</code>
                            method, either periodically (once per minute, for example) or
                            whenever a thread of control in your application exits.
                        </p>
                <p>
                            If this method determines that a thread of control exited holding read locks, those locks
                            are automatically released. If the thread of control exited with an unresolved transaction,
                            that transaction is aborted. If any other problem requires the
                            environment to be recovered, the method
                                <span>returns <code class="literal">DB_RUNRECOVERY</code>.</span>
                        </p>
              </li>
            </ol>
          </div>
        </div>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a>&#160;</td>
          <td width="20%" align="center">
            <a accesskey="u" href="filemanagement.html">Up</a>
          </td>
          <td width="40%" align="right">&#160;<a accesskey="n" href="hotfailover.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Recovery Procedures&#160;</td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top">&#160;Using Hot Failovers</td>
        </tr>
      </table>
    </div>
  </body>
</html>