<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Elections</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="rep.html" title="Chapter&#160;12.&#160; Berkeley DB Replication" />
    <link rel="prev" href="rep_mgrmulti.html" title="Running Replication Manager in multiple processes" />
    <link rel="next" href="rep_mastersync.html" title="Synchronizing with a master" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Elections</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="rep_mgrmulti.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;12.&#160;
		Berkeley DB Replication
        </th>
          <td width="20%" align="right">&#160;<a accesskey="n" href="rep_mastersync.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="rep_elect"></a>Elections</h2>
          </div>
        </div>
      </div>
      <p>Replication Manager automatically conducts elections when necessary,
based on configuration information supplied to the
<a href="../api_reference/C/reppriority.html" class="olink">DB_ENV-&gt;rep_set_priority()</a> method.</p>
      <p>A Base API application, by contrast, must initiate elections
itself.  It is never dangerous
to hold an election, as the Berkeley DB election process ensures there is
never more than a single master database environment.  Clients should
initiate an election whenever they lose contact with the master
environment, whenever they see a return of <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_HOLDELECTION" class="olink">DB_REP_HOLDELECTION</a>
from the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method, or when, for whatever reason, they do
not know who the master is.  Applications need not
hold an election immediately when they start, as any existing master
will be discovered after calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>.  If no master has
been found after a short wait period, then the application should call
for an election.</p>
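      <p>The conditions above can be sketched as a small decision function.
This is only an illustrative model: the client_view structure, its fields,
and should_call_election() are hypothetical names introduced here and are not
part of the Berkeley DB API.  A real application would derive these flags from
its own communication layer and from the return value of
DB_ENV-&gt;rep_process_message().</p>
      <pre class="programlisting">#include &lt;stdbool.h&gt;

/* Hypothetical snapshot of what a client currently knows. */
struct client_view {
    bool lost_master_contact;   /* connection to the master was lost */
    bool got_hold_election;     /* DB_REP_HOLDELECTION was returned */
    bool master_known;          /* a master has been discovered */
    unsigned secs_since_start;  /* time since DB_ENV-&gt;rep_start() */
};

/* Return true if the client should call for an election now. */
bool should_call_election(const struct client_view *v,
    unsigned startup_wait_secs)
{
    if (v-&gt;got_hold_election || v-&gt;lost_master_contact)
        return true;
    /* No master yet: wait a short period before electing. */
    if (!v-&gt;master_known)
        return v-&gt;secs_since_start &gt;= startup_wait_secs;
    return false;
}</pre>
      <p>The length of the startup wait is an application choice; the text
above only asks that it be short.</p>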
      <p>For a client to win an election, the replication group must currently
have no master, and the client must have the most recent log records.
If several clients have equivalent log records, the priority of
the database environments participating in the election determines
the winner.  The application specifies the minimum number of replication
group members that must participate in an election for a winner to be
declared.  We recommend at least ((N/2) + 1) members, where N is the number
of sites in the group.  If fewer than a
simple majority are specified, a warning will be given.</p>
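      <p>As a minimal sketch of the selection rule just described, the
following C fragment picks a winner by log position first and priority second,
and computes the recommended ((N/2) + 1) majority.  It is not Berkeley DB's
implementation: the candidate structure, majority(), and pick_winner() are
hypothetical, and a single integer stands in for a real log sequence number.
Other election rules are deliberately left out.</p>
      <pre class="programlisting">#include &lt;stdint.h&gt;

/* Hypothetical summary of what one site brings to an election. */
struct candidate {
    const char *name;
    uint64_t last_lsn;   /* stand-in for "most recent log record" */
    uint32_t priority;   /* as set via DB_ENV-&gt;rep_set_priority() */
};

/* Smallest simple majority of an nsites-member group: (N/2) + 1. */
uint32_t majority(uint32_t nsites)
{
    return nsites / 2 + 1;
}

/*
 * Return the index of the winner among n participants, or -1 if fewer
 * than nvotes sites took part.  The most recent log wins; priority
 * breaks ties between equivalent logs.
 */
int pick_winner(const struct candidate *c, uint32_t n, uint32_t nvotes)
{
    uint32_t i, best = 0;

    if (n == 0 || n &lt; nvotes)
        return -1;
    for (i = 1; i &lt; n; i++) {
        if (c[i].last_lsn &gt; c[best].last_lsn ||
            (c[i].last_lsn == c[best].last_lsn &amp;&amp;
             c[i].priority &gt; c[best].priority))
            best = i;
    }
    return (int)best;
}</pre>
      <p>Note that majority(2) is 2, so in this model a single surviving site
of a two-site group can never reach the recommended threshold on its own.</p>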
      <p>If an application's policy for what site should win an election can be
parameterized in terms of the database environment's information (that
is, the number of sites, available log records and a relative priority
are all that matter), then Berkeley DB can handle all elections transparently.
However, there are cases where the application has more complete
knowledge and needs to affect the outcome of elections.  For example,
applications may choose to handle master selection themselves, explicitly
designating master and client sites.  Such applications may
never need to call for an election.  Alternatively, applications may
choose to use <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>'s arguments to force the correct outcome
to an election.  That is, if an application has three sites, A, B, and
C, and after a failure of C determines that A must become the winner,
the application can guarantee the election's outcome by specifying
priorities appropriately for that election:</p>
      <pre class="programlisting">on A: priority 100, nsites 2
on B: priority 0, nsites 2</pre>
      <p>It is dangerous to configure more than one master environment using the
<a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a> method, and applications should be careful not to do so.
Applications should configure themselves as the master environment
only if they are the only possible master, or if they have won an election.
An application knows it has won an election when it receives the
<a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REP_ELECTED" class="olink">DB_EVENT_REP_ELECTED</a> event.</p>
      <p>Normally, when a master failure is detected, the election should
finish quickly so the application can continue to service
updates.  In that case the participating sites are already up and can vote immediately.
However, when a whole group restarts after an administrative
shutdown, it is possible that a slower booting site has later logs than
any other site.  To cover that case, an application would like to give
the election more time so that all sites have a chance to participate.
Since a starting site cannot determine which case
the whole group is in, the use of a long timeout gives all sites a
reasonable chance to participate.  If an application wanting full
participation sets the <span class="bold"><strong>nvotes</strong></span> argument to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method to
the number of sites in the group and one site does not reboot, a master
can never be elected without manual intervention.</p>
      <p>
In those cases, the desired action at a group level is to hold
a full election if all sites crashed and a majority election if
only a subset of sites crashed or rebooted.  Since an individual site cannot know
which number of votes to require, a mechanism is available to
accomplish this using timeouts.  By setting a long timeout (perhaps
on the order of minutes) using the <span class="bold"><strong>DB_REP_FULL_ELECTION_TIMEOUT</strong></span>
flag to the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV-&gt;rep_set_timeout()</a> method, an application can
allow Berkeley DB to elect a master even without full participation once that timeout expires.
Sites may also want to set a normal election timeout for majority-based
elections using the <span class="bold"><strong>DB_REP_ELECTION_TIMEOUT</strong></span> flag
to the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV-&gt;rep_set_timeout()</a> method.</p>
      <p>
Consider three sites, A, B, and C, where A is the master.  In the
case where all three sites crash and all reboot, all sites
will set a timeout for a full election, say 10 minutes, but only
require a majority for <span class="bold"><strong>nvotes</strong></span> to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method.
Once all three sites are booted the election will complete
immediately if they reboot within 10 minutes of each other.  Now consider
the case where all three sites crash and only two reboot.  The two sites will
enter the election, but after the 10 minute timeout they will
elect with the majority of two sites.  The full election
timeout thus limits how long the group waits for a site to reboot and rejoin
before electing a master without it.</p>
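      <p>The three-site scenario can be modeled with a short C function.  This
is a simplified sketch of the behavior described above, not the library's
election code; election_can_finish() and its parameters are hypothetical
names, and times are expressed in seconds purely for readability.</p>
      <pre class="programlisting">#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/*
 * Illustrative model: may the election be decided right now?
 *   nsites            total sites in the group
 *   nvotes            votes required once the full timeout has expired
 *   participants      sites currently taking part in the election
 *   elapsed_secs      time since the election began
 *   full_timeout_secs the full election timeout
 */
bool election_can_finish(uint32_t nsites, uint32_t nvotes,
    uint32_t participants, uint32_t elapsed_secs,
    uint32_t full_timeout_secs)
{
    if (participants &gt;= nsites)
        return true;               /* everyone is present: no need to wait */
    if (elapsed_secs &lt; full_timeout_secs)
        return false;              /* still waiting for slower sites */
    return participants &gt;= nvotes; /* timeout expired: a majority suffices */
}</pre>
      <p>With nsites 3, nvotes 2 and a 600 second full election timeout, three
participants can finish at once, two participants must wait out the timeout,
and a single participant can never finish.</p>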
      <p>To add a database environment to the replication group with the intent
of it becoming the master, first add it as a client.  Since it may be
out-of-date with respect to the current master, allow it to update
itself from the current master.  Then, shut the current master down.
Presumably, the added client will win the subsequent election.  If the
client does not win the election, it is likely that it was not given
sufficient time to update itself with respect to the current master.</p>
      <p>If a client is unable to find a master or win an election, it typically means that
the network has been partitioned and there are not enough environments
participating in the election for one of the participants to win.
In this case, the application should repeatedly call <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>
and <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>, alternating between attempting to discover an
existing master, and holding an election to declare a new one.  In
desperate circumstances, an application could simply declare itself the
master by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>, or by reducing the number of
participants required to win an election until the election is won.
Neither of these solutions is recommended: in the case of a network
partition, either of these choices can result in there being two masters
in one replication group, and the databases in the environment might
irretrievably diverge as they are modified in different ways by the
masters.</p>
      <p>Note that this presents a special problem for a replication group
consisting of only two environments.  If a master site fails, the
remaining client can never constitute a majority of sites in the group.
If the client application can reach a remote network site, or some other
external tie-breaker, it may be able to determine whether it is safe
to declare itself master.  Otherwise it must choose between providing
availability of a writable master (at the risk of duplicate masters),
or strict protection against duplicate masters (but no master when a
failure occurs).  Replication Manager offers this choice via the
<a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a> method.  Base API applications can accomplish
this by judicious setting of the nvotes and nsites parameters to the
<a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method.</p>
      <p>It is possible for a less-preferred database environment to win an
election if a number of systems crash at the same time.  Because an
election winner is declared as soon as enough environments participate
in the election, the environment on a slow-booting but well-connected
machine might lose to an environment on a badly connected but faster-booting
machine.  In the case of a number of environments crashing at
the same time (for example, a set of replicated servers in a single
machine room), applications should bring the database environments on
line as clients initially (which will allow them to process read queries
immediately), and then hold an election after sufficient time has passed
for the slower booting machines to catch up.</p>
      <p>If, for any reason, a less-preferred database environment becomes the
master, it is possible to switch masters in a replicated environment.
For example, suppose the preferred master crashes and one of the replication
group clients becomes the group master.  In order to restore the
preferred master to master status, take the following steps:</p>
      <div class="orderedlist">
        <ol type="1">
          <li>The preferred master should reboot and re-join the replication group
as a client.</li>
          <li>Once the preferred master has caught up with the replication group, the
application on the current master should complete all active transactions
and reconfigure itself as a client using the <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a> method.</li>
          <li>Then, either the former master (now a client) or the preferred master should call for an election using
the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method.</li>
        </ol>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="rep_mgrmulti.html">Prev</a>&#160;</td>
          <td width="20%" align="center">
            <a accesskey="u" href="rep.html">Up</a>
          </td>
          <td width="40%" align="right">&#160;<a accesskey="n" href="rep_mastersync.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Running Replication Manager in multiple processes&#160;</td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top">&#160;Synchronizing with a master</td>
        </tr>
      </table>
    </div>
  </body>
</html>