1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>Designing Your Application for Recovery</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
9    <link rel="home" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />
10    <link rel="up" href="filemanagement.html" title="Chapter 5. Managing DB Files" />
11    <link rel="previous" href="recovery.html" title="Recovery Procedures" />
12    <link rel="next" href="hotfailover.html" title="Using Hot Failovers" />
13  </head>
14  <body>
15    <div class="navheader">
16      <table width="100%" summary="Navigation header">
17        <tr>
18          <th colspan="3" align="center">Designing Your Application for Recovery</th>
19        </tr>
20        <tr>
21          <td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
22          <th width="60%" align="center">Chapter 5. Managing DB Files</th>
23          <td width="20%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
24        </tr>
25      </table>
26      <hr />
27    </div>
28    <div class="sect1" lang="en" xml:lang="en">
29      <div class="titlepage">
30        <div>
31          <div>
32            <h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>
33          </div>
34        </div>
35        <div></div>
36      </div>
37      <p>
            When building your DB application, you should consider how you will run recovery. If you are building a
            single-threaded, single-process application, it is fairly simple to run recovery when your application first
            opens its environment. In this case, you need only decide whether to run recovery every time you open
            your application (recommended) or only some of the time, presumably triggered by a startup option
            controlled by your application's user.
43        </p>
44      <p>
45            However, for multi-threaded and multi-process applications, you need to carefully consider how you will
46            design your application's startup code so as to run recovery only when it makes sense to do so.
47        </p>
48      <div class="sect2" lang="en" xml:lang="en">
49        <div class="titlepage">
50          <div>
51            <div>
52              <h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>
53            </div>
54          </div>
55          <div></div>
56        </div>
57        <p>
                If your application uses only one environment handle, then handling recovery for a multi-threaded
                application is no more difficult than for a single-threaded application. You simply open the environment
                in the application's main thread, and then pass that handle to each of the threads that will be
                performing DB operations. We illustrate this with our final example in this book (see
                <a href="txnexample_c.html">Transaction Example</a>
                for more information).
67            </p>
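        <p>
                The single-handle pattern is sketched below with POSIX threads. To keep the sketch self-contained, the
                hypothetical <tt class="literal">struct app_env</tt> stands in for the <tt class="literal">DB_ENV</tt>
                handle. In a real application the main thread would open that handle with the
                <tt class="literal">DB_THREAD</tt> flag so that it is free-threaded, and would close it only after every
                worker has been joined. The function and structure names are hypothetical.
            </p>

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the single DB_ENV handle opened by main(). */
struct app_env {
    const char *home;
};

struct worker_arg {
    struct app_env *env;    /* the shared handle, never reopened by workers */
    struct app_env *used;   /* records which handle this worker used */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    /* Real code would begin transactions and perform DB operations
       through arg->env here; it would not open or recover anything. */
    arg->used = arg->env;
    return NULL;
}

/*
 * Hand the one environment handle to n worker threads and wait for them.
 * Returns the number of workers that ran. The caller closes the handle
 * only after this returns, when no thread can still be using it.
 */
int run_workers(struct app_env *env, struct worker_arg *args,
                pthread_t *tids, int n)
{
    int started = 0;

    for (; started < n; started++) {
        args[started].env = env;
        args[started].used = NULL;
        if (pthread_create(&tids[started], NULL, worker, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```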
68        <p>
69                Alternatively, you can have each worker thread open its own environment handle. However, in this case,
70                designing for recovery is a bit more complicated. 
71            </p>
72        <p>
73                Generally, when a thread performing database operations fails
74                or hangs, it is frequently best to simply
75                restart the application and run recovery upon application
76                startup as normal. However, not all applications can afford
77                to restart because a single thread has misbehaved. 
78             </p>
79        <p>
                If you are attempting to continue operations in the face of a misbehaving thread,
                then at a minimum, recovery must be run whenever a thread performing database operations fails or hangs.
82            </p>
83        <p>
84                Remember that recovery clears the environment of all
85                outstanding locks, including any that might be outstanding
86                from an aborted thread. If these locks are not cleared,
87                other threads performing database operations can back up
88                behind the locks obtained but never cleared by the failed
89                thread. The result will be an application that hangs
90                indefinitely.
91            </p>
92        <p>
93                To run recovery under these circumstances:
94            </p>
95        <div class="orderedlist">
96          <ol type="1">
97            <li>
98              <p>
                        Suspend or shut down all other threads performing
                        database operations.
101                    </p>
102            </li>
103            <li>
104              <p>
                        Discard any open environment handles. Note that
                        attempting to gracefully close these handles may be
                        asking for trouble; the close can fail if the
                        environment is already in need of recovery. For
                        this reason, it is best and easiest to simply discard the handle.
110                    </p>
111            </li>
112            <li>
113              <p>
114                        Open new handles, running recovery as you open
115                        them.
116                        See <a href="recovery.html#normalrecovery">Normal Recovery</a> for more information.
117                    </p>
118            </li>
119            <li>
120              <p>
121                        Restart all your database threads.
122                    </p>
123            </li>
124          </ol>
125        </div>
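        <p>
                One way to keep these steps in order is to drive them from a small state machine, sketched below in C.
                The enumeration and <tt class="function">next_step()</tt> are hypothetical names; the code that actually
                suspends threads, discards handles, and reopens the environment with <tt class="literal">DB_RECOVER</tt>
                is application-specific and is only indicated in comments.
            </p>

```c
/* The recovery procedure, one state per step. */
enum recovery_step {
    STEP_SUSPEND_WORKERS,   /* 1. stop all threads doing DB operations   */
    STEP_DISCARD_HANDLES,   /* 2. drop old handles without closing them  */
    STEP_REOPEN_RECOVER,    /* 3. open new handles, running recovery     */
    STEP_RESTART_WORKERS,   /* 4. restart the database threads           */
    STEP_DONE,
    STEP_FAILED
};

/*
 * Given the step just attempted and its result (0 on success), return the
 * step to perform next. Any failure stops the sequence rather than
 * restarting workers against an environment that may not be recovered.
 *
 * A driver loop might look like this, where do_step() is hypothetical:
 *
 *     enum recovery_step s = STEP_SUSPEND_WORKERS;
 *     while (s != STEP_DONE && s != STEP_FAILED)
 *         s = next_step(s, do_step(s));
 */
enum recovery_step next_step(enum recovery_step cur, int result)
{
    if (cur == STEP_DONE || cur == STEP_FAILED)
        return cur;
    if (result != 0)
        return STEP_FAILED;
    return (enum recovery_step)(cur + 1);
}
```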
126        <p>
127                A traditional way to handle this activity is to spawn a watcher thread that is responsible for making
128                sure all is well with your threads, and performing the above actions if not.
129            </p>
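        <p>
                A watcher needs some way to notice that a thread has failed or hung. The sketch below assumes a simple
                heartbeat scheme, which is not part of DB: each worker records <tt class="function">time(NULL)</tt> in its
                own slot after each unit of work, and the watcher periodically looks for a slot older than a timeout.
                The function name and the timeout policy are hypothetical.
            </p>

```c
#include <stddef.h>
#include <time.h>

/*
 * Return the index of the first worker whose last heartbeat is more than
 * `timeout` seconds older than `now`, or -1 if every worker looks healthy.
 * The watcher should pass a snapshot of the heartbeat array taken under
 * the same mutex the workers hold while updating it.
 */
long find_stalled_worker(const time_t *heartbeat, size_t nworkers,
                         time_t now, time_t timeout)
{
    for (size_t i = 0; i < nworkers; i++) {
        if (now - heartbeat[i] > timeout)
            return (long)i;   /* candidate for the recovery steps */
    }
    return -1;
}
```

        <p>
                A long-running transaction can look identical to a hung thread, so the timeout should comfortably exceed
                the longest unit of work you expect.
            </p>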
130        <p>
131                However, in the case where each worker thread opens and maintains its own environment handle, recovery
132                is complicated for two reasons:
133            </p>
134        <div class="orderedlist">
135          <ol type="1">
136            <li>
137              <p>
138                        For some applications and workloads, it might be
139                        worthwhile to give your database threads the
                        ability to gracefully finalize any ongoing
141                        transactions. If this is the case, your
142                        code must be capable of signaling each thread 
143                        to halt DB activities and close its
144                        environment. If you simply run recovery against the
145                        environment, your database threads will
146                        detect this and fail in the midst of performing their
147                        database operations.
148                    </p>
149            </li>
150            <li>
151              <p>
152                        Your code must be capable of ensuring only one
153                        thread runs recovery before allowing all other
154                        threads to open their respective environment
                        handles. Recovery should be single-threaded because when
                        recovery is run against an environment, the environment is
                        deleted and then recreated. This will cause all
                        other processes and threads to "fail" when they
                        attempt operations against the newly recovered
                        environment. If all threads run recovery
                        when they start up, then it is likely that some
                        threads will fail because the environment that they
                        are using has been recovered. Each such thread must then re-execute its own recovery
                        path. At best, this is inefficient, and at worst it could cause your application to fall into an
                        endless recovery pattern.
166                    </p>
167            </li>
168          </ol>
169        </div>
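        <p>
                For the second point, POSIX <tt class="function">pthread_once()</tt> is one straightforward way to
                guarantee that exactly one thread runs recovery at startup while the others wait for it to finish. In
                this sketch the recovery routine only counts how many times it ran; in a real application it would open
                an environment handle with <tt class="literal">DB_RECOVER</tt> and close it again, after which each
                worker would open its own handle without <tt class="literal">DB_RECOVER</tt>. The function and variable
                names are hypothetical. A <tt class="literal">pthread_once_t</tt> cannot be reset, so this covers startup
                only; recovery needed later must be coordinated by other means, such as a watcher thread.
            </p>

```c
#include <pthread.h>

static pthread_once_t recovery_once = PTHREAD_ONCE_INIT;
static int recovery_runs;   /* written only inside the once routine */

static void run_recovery(void)
{
    /* Real code: DB_ENV->open() with DB_CREATE | DB_INIT_TXN | ... |
       DB_RECOVER, then DB_ENV->close() on that handle. */
    recovery_runs++;
}

static void *worker(void *arg)
{
    (void)arg;
    /* The first caller runs run_recovery(); every other caller blocks
       here until it has returned, so no worker sees a half-recovered
       environment. */
    pthread_once(&recovery_once, run_recovery);

    /* Now open this thread's own environment handle WITHOUT DB_RECOVER
       and perform database operations. */
    return NULL;
}

/* Start up to 64 workers, wait for them, and return how many ran. */
int run_workers(int n)
{
    pthread_t tids[64];
    int started = 0;

    if (n > 64)
        n = 64;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```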
170      </div>
171      <div class="sect2" lang="en" xml:lang="en">
172        <div class="titlepage">
173          <div>
174            <div>
175              <h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>
176            </div>
177          </div>
178          <div></div>
179        </div>
180        <p>
181                Frequently, DB applications use multiple processes to interact with the databases. For example, you may
182                have a long-running process, such as some kind of server, and then a series of administrative tools that
183                you use to inspect and administer the underlying databases. Or, in some web-based architectures, different
184                services are run as independent processes that are managed by the server.
185            </p>
186        <p>
187                In any case, recovery for a multi-process environment is complicated for two reasons:
188            </p>
189        <div class="orderedlist">
190          <ol type="1">
191            <li>
192              <p>
                        In the event that recovery must be run, you might
                        want to notify processes interacting with the environment
                        that recovery is about to occur and give them a
                        chance to gracefully terminate. Whether it is
                        worthwhile for you to do this depends entirely
                        upon the nature of your application. Some
                        long-running applications with multiple processes
                        performing meaningful work might want to do this.
                        Other applications, whose processes perform database
                        operations that are likely to be harmed by error conditions in other
                        processes, will probably find it not worth the
                        effort. For this latter group, the chances of
                        performing a graceful shutdown may be low anyway.
206                    </p>
207            </li>
208            <li>
209              <p>
                        Unlike single-process scenarios, it can quickly become wasteful for every process interacting
                        with the databases to run recovery when it starts up. This is partly because recovery
                        <span class="emphasis"><em>does</em></span> take some amount of time to run, but mostly because you want to
                        avoid a situation where your server must
                        reopen all its environment handles just because you fire up a command-line database
                        administrative utility that always runs recovery.
216                    </p>
217            </li>
218          </ol>
219        </div>
220        <p>
221                DB offers you two methods by which you can manage recovery for multi-process DB applications.
222                Each has different strengths and weaknesses, and they are described in the next sections.
223            </p>
224        <div class="sect3" lang="en" xml:lang="en">
225          <div class="titlepage">
226            <div>
227              <div>
228                <h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>
229              </div>
230            </div>
231            <div></div>
232          </div>
233          <p>
                    Before continuing, it is worth noting that the following sections describe recovery processes that
                    can result in one process running recovery while other processes are actively performing
                    database operations.
237                </p>
238          <p>
                    When this happens, the current database operation will
                    abnormally fail, indicating a <tt class="literal">DB_RUNRECOVERY</tt> condition.
                    This means that your application should immediately abandon any database operations that it may have
                    ongoing, discard any environment handles it has opened, and obtain and open new handles.
243                </p>
244          <p>
                    The net effect of this is that any writes performed by unresolved transactions will be lost.
                    For persistent applications (servers, for example), the services they provide will also be
                    unavailable for the amount of time that it takes to complete a recovery and for all participating
                    processes to reopen their environment handles.
249                </p>
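          <p>
                    The handling described above amounts to a retry policy around each transaction, sketched below.
                    Because writes made by unresolved transactions are lost, the whole transaction must be redone from
                    the beginning after the handles are reopened, not just the operation that failed.
                    <tt class="literal">APP_OK</tt> and <tt class="literal">APP_RUNRECOVERY</tt> are hypothetical
                    stand-ins so that the sketch compiles without <tt class="literal">db.h</tt>; real code would compare
                    against 0 and <tt class="literal">DB_RUNRECOVERY</tt>. The attempt limit is likewise an assumption,
                    added so that a persistently broken environment does not cause an endless loop.
                </p>

```c
/* Hypothetical stand-ins for 0 and DB_RUNRECOVERY. */
#define APP_OK           0
#define APP_RUNRECOVERY  (-1)

enum next_action {
    ACTION_DONE,              /* transaction committed */
    ACTION_REOPEN_AND_RETRY,  /* discard handles, open new ones, redo txn */
    ACTION_FAIL               /* other error, or too many attempts */
};

/*
 * Decide what to do after attempt number `attempt` (1-based) of a
 * transaction returned `ret`.
 *
 * Driver outline (do_txn() and reopen_handles() are hypothetical):
 *
 *     for (int attempt = 1;; attempt++) {
 *         int ret = do_txn(env);
 *         enum next_action a = after_transaction(ret, attempt, 3);
 *         if (a != ACTION_REOPEN_AND_RETRY)
 *             break;
 *         reopen_handles(&env);   // discard old handles, open new ones
 *     }
 */
enum next_action after_transaction(int ret, int attempt, int max_attempts)
{
    if (ret == APP_OK)
        return ACTION_DONE;
    if (ret == APP_RUNRECOVERY && attempt < max_attempts)
        return ACTION_REOPEN_AND_RETRY;
    return ACTION_FAIL;
}
```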
250        </div>
251        <div class="sect3" lang="en" xml:lang="en">
252          <div class="titlepage">
253            <div>
254              <div>
255                <h4 class="title"><a id="db_register"></a>Process Registration</h4>
256              </div>
257            </div>
258            <div></div>
259          </div>
260          <p>
261                    One way to handle multi-process recovery is for every process to "register" its environment. In
262                    doing so, the process gains the ability to see if any other applications are using the
263                    environment and, if so, whether they have suffered an abnormal termination. If an abnormal
264                    termination is detected, the process runs recovery; otherwise, it does not.
265                </p>
266          <p>
267                    Note that using process registration also ensures that
268                    recovery is serialized across applications. That is,
269                    only one process at a time has a chance to run
270                    recovery. Generally this means that the first process
271                    to start up will run recovery, and all other processes
272                    will silently not run recovery because it is not
273                    needed.
274                </p>
275          <p>
                    To cause your application to register its environment, you specify
                        <span>
                            the <tt class="literal">DB_REGISTER</tt> flag when you open your environment.
                            Note that you must also specify <tt class="literal">DB_RECOVER</tt> or
                            <tt class="literal">DB_RECOVER_FATAL</tt> for your environment open if you want DB
                            to run recovery for you.
                        </span>
                        If DB determines during the open that recovery must be run, the recovery flag you
                        specified determines the type of recovery that is run. If you do not specify either type of
                        recovery, then no recovery is run even if the registration process identifies a need for it.
                        In this case, the environment open simply fails by
                            <span>returning <tt class="literal">DB_RUNRECOVERY</tt>.</span>
                </p>
291          <p>
                    Be aware that there are some limitations and requirements if you want your various processes to
                    coordinate recovery using this registration process:
294                </p>
295          <div class="orderedlist">
296            <ol type="1">
297              <li>
298                <p>
299                            There can be only one environment handle per
300                            environment per process. In the case of multi-threaded
301                            processes, the environment handle must be shared across threads.
302                        </p>
303              </li>
304              <li>
305                <p>
306                            All processes sharing the environment must use registration. If registration is
307                            not uniformly used across all participating processes, then you can see inconsistent results 
308                            in terms of your application's ability to recognize that recovery must be run.
309                        </p>
310              </li>
311              <li>
312                <p>
                            You cannot use this mechanism with the <tt class="methodname">failchk()</tt>
                            mechanism described in the next section.
316                        </p>
317              </li>
318            </ol>
319          </div>
320        </div>
321        <div class="sect3" lang="en" xml:lang="en">
322          <div class="titlepage">
323            <div>
324              <div>
325                <h4 class="title"><a id="failchk"></a>Failure Checking</h4>
326              </div>
327            </div>
328            <div></div>
329          </div>
330          <p>
331                    For very large and robust multi-process applications, the most common way to ensure all the
332                    processes are working as intended is to make use of a watchdog process. To assist a watchdog
333                    process, DB offers a failure checking mechanism.
334                </p>
335          <p>
                    When a thread of control fails with open environment handles, it may leave resources locked or
                    corrupted. Other threads of control may encounter these unavailable resources quickly or not at
                    all, depending on data access patterns.
339                </p>
340          <p>
                    In any case, the DB failure checking mechanism allows a watchdog to detect if an environment is
                    unusable as a result of a thread of control failure. The watchdog process should run this check
                    periodically (for example, once a minute). If the environment is deemed unusable, then
                    the watchdog process is notified that recovery should be run. It is then up to the watchdog to
                    actually run recovery. It is also the watchdog's responsibility to decide what to do about currently
                    running processes before running recovery. The watchdog could, for example, attempt to
                    gracefully shut down or kill all relevant processes before running recovery.
348                </p>
349          <p>
350                    Note that failure checking need not be run from a separate process, although conceptually that is
351                    how the mechanism is meant to be used. This same mechanism could be used in a multi-threaded
352                    application that wants to have a watchdog thread.
353                </p>
354          <p>
355                    To use failure checking you must:
356                </p>
357          <div class="orderedlist">
358            <ol type="1">
359              <li>
360                <p> 
                            <span>
                                Provide an <tt class="function">is_alive()</tt> callback using the
                                <tt class="methodname">DB_ENV-&gt;set_isalive()</tt>
                                method.
                            </span>
                                DB uses this callback to determine whether a specified process and thread
                                is alive when failure checking is performed.
370                        </p>
371              </li>
372              <li>
373                <p>
                            Possibly provide a
                                <span>
                                <tt class="literal">thread_id</tt> callback
                                </span>
                            that uniquely identifies a process
                            and thread of control. This
                                <span>callback</span>
                            is only necessary if the standard process and thread
                            identification functions for your platform are not sufficient for use by failure
                            checking. This is rarely necessary and is usually because the thread and/or process IDs
                            used by your system cannot fit into an unsigned integer.
390                        </p>
391                <p>
                            You provide this callback using the
                                <tt class="methodname">DB_ENV-&gt;set_thread_id()</tt>
                            method. See the API reference for this method for more information on when setting a thread
                            ID callback might be necessary.
397                        </p>
398              </li>
399              <li>
400                <p>
                            Call the
                                <tt class="methodname">DB_ENV-&gt;failchk()</tt>
                            method either periodically (once per minute, for example) or
                            whenever a thread of control exits for your application.
407                        </p>
408                <p>
409                            If this method determines that a thread of control exited holding read locks, those locks
410                            are automatically released. If the thread of control exited with an unresolved transaction,
                            that transaction is aborted. If any other problem means that the
                            environment must be recovered, the method will
                                <span>return <tt class="literal">DB_RUNRECOVERY</tt>.</span>
415                        </p>
416              </li>
417            </ol>
418          </div>
419          <p>
420                    Note that this mechanism should not be mixed with the process registration method of multi-process
421                    recovery described in the previous section.
422                </p>
423        </div>
424      </div>
425    </div>
426    <div class="navfooter">
427      <hr />
428      <table width="100%" summary="Navigation footer">
429        <tr>
430          <td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>
431          <td width="20%" align="center">
432            <a accesskey="u" href="filemanagement.html">Up</a>
433          </td>
434          <td width="40%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>
435        </tr>
436        <tr>
437          <td width="40%" align="left" valign="top">Recovery Procedures </td>
438          <td width="20%" align="center">
439            <a accesskey="h" href="index.html">Home</a>
440          </td>
441          <td width="40%" align="right" valign="top"> Using Hot Failovers</td>
442        </tr>
443      </table>
444    </div>
445  </body>
446</html>
447