<!--$Id: lease.so,v 1.6 2007/11/27 19:36:55 sue Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Master Leases</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Berkeley DB Replication</dl></b></td>
<td align=right><a href="/rep/trans.html"><img src="/images/prev.gif" alt="Prev"></a><a href="/toc.html"><img src="/images/ref.gif" alt="Ref"></a><a href="/rep/clock_skew.html"><img src="/images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Master Leases</b></p>
<p>Some applications have strict requirements about the consistency
of data read on a master site.  Berkeley DB provides a mechanism
called master leases to meet those requirements.
Without master leases, unfortunate scheduling can sometimes cause
Berkeley DB to return old data to an application even though newer
data is available, as the following sequence illustrates:</p>
<ol>
<p><li><b>Application on master site</b>: Read data item
<i>foo</i> via a Berkeley DB <a href="/api_c/db_get.html">DB-&gt;get</a> or <a href="/api_c/dbc_get.html">DBcursor-&gt;get</a> call.
<p><li><b>Application on master site</b>: Sleep, get descheduled, etc.
<p><li><b>System</b>: Master changes role, becomes a client.
<p><li><b>System</b>: New site is elected master.
<p><li><b>System</b>: New master modifies data item <i>foo</i>.
<p><li><b>Application</b>: Berkeley DB returns old data for <i>foo</i>
to the application.
</ol>
<p>By using master leases, Berkeley DB can provide guarantees about the
consistency of data read on a master site.  The master site
can be considered a recognized authority for the data and
consequently can provide authoritative reads.  Clients grant master
leases to a master site.  By doing so, clients acknowledge
the right of that site to retain the role of master
for a period of time.
During that period, clients cannot elect a new
master, become master, or grant their lease to another site.</p>
<p>By holding a collection of granted leases, a master site can
guarantee to the application that the data returned is the
current, authoritative value.  As a master performs operations,
it continually requests updated grants from the clients.
When a read operation is requested, the master ensures
that it holds a valid collection of lease grants from clients
before returning data to the application.  By holding leases,
Berkeley DB provides several guarantees to the application:</p>
<ol>
<p><li>Authoritative reads: A guarantee that the data being read by the
application is the current value.
<p><li>Durability from rollback: A guarantee that the data being written or read by the
application is permanent across a majority of client sites and will
never be rolled back.
<p>The rollback guarantee also depends on the <a href="/api_c/env_set_flags.html#DB_TXN_NOSYNC">DB_TXN_NOSYNC</a> flag.
The guarantee holds as long as the entire replication group does
not fail while clients have granted leases
but are still holding the updates only in their cache.
The application must weigh the performance impact of synchronous
transactions against the risk of total replication group failure.
If clients grant a lease while holding updated data in cache
and total failure occurs, then the data is no longer present
on the clients, and rollback can occur if the master also crashes.</p>
<p>The guarantee that data will not be rolled back applies only
to data successfully committed on a master.
Data read on a client, or read while ignoring leases,
can be rolled back.</p>
<p><li>Freshness: A guarantee that the data being read by the application
on the <i>master</i> is up-to-date and has not been
modified or removed during the read.
<p>Only the master has read authority.  Read operations on a client
always ignore leases and consequently can return stale data.</p>
<p><li>Master viability: A guarantee that a current master with valid
leases cannot encounter a duplicate master situation.
<p>Leases remove the possibility of a duplicate master situation that
forces the current master to downgrade to a client.  However, it is
still possible that old masters with expired leases can discover a later
master and return <a href="/api_c/rep_message.html#DB_REP_DUPMASTER">DB_REP_DUPMASTER</a> to the application.</p>
</ol>
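<p>The lease check that precedes a read on the master can be pictured
with a small model.  The following is only a sketch under stated
assumptions, not Berkeley DB's implementation: the
<i>lease_grant</i> structure and both functions are hypothetical names,
times are plain microsecond counters, and the master is assumed to count
itself toward the majority.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one client's lease grant; not a Berkeley DB type. */
struct lease_grant {
	int      granted;       /* nonzero once the client has granted a lease */
	uint64_t expires_usec;  /* time at which the grant expires */
};

/* Count the grants that are still valid at time now_usec. */
size_t
valid_grants(const struct lease_grant *grants, size_t nclients,
    uint64_t now_usec)
{
	size_t i, count = 0;

	for (i = 0; i < nclients; i++)
		if (grants[i].granted && now_usec < grants[i].expires_usec)
			count++;
	return (count);
}

/*
 * Return 1 if the master may treat a read as authoritative.
 * Assumption: nsites includes the master, and the master's own copy
 * counts toward the majority, so (valid grants + 1) must exceed nsites / 2.
 */
int
read_is_authoritative(const struct lease_grant *grants, size_t nclients,
    size_t nsites, uint64_t now_usec)
{
	return ((valid_grants(grants, nclients, now_usec) + 1) * 2 > nsites);
}
```

<p>In a three-site group, for example, one unexpired client grant is
enough in this model, while zero unexpired grants is not.</p>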
<p>There are several requirements of the application using leases:</p>
<ol>
<p><li>Replication Manager applications must configure a majority (or larger)
acknowledgement policy via the <a href="/api_c/repmgr_ack_policy.html">DB_ENV-&gt;repmgr_set_ack_policy</a> method.  Base API
users must implement and enforce such a policy on their own.
<p><li>Base API users must return an error from the send callback function when
the majority acknowledgement policy is not met for permanent records
marked with <a href="/api_c/rep_transport.html#DB_REP_PERMANENT">DB_REP_PERMANENT</a>.  Note that the Replication Manager
automatically fulfills this requirement.
<p><li>Applications must set the number of sites in the group using the
<a href="/api_c/rep_nsites.html">DB_ENV-&gt;rep_set_nsites</a> method before starting replication and cannot
change it during operation.
<p><li>Using leases in a replication group is all or none.  Behavior is
undefined when some sites configure leases and others do not.
Use the <a href="/api_c/rep_config.html">DB_ENV-&gt;rep_set_config</a> method to turn on leases.
<p><li>The configured lease timeout value must be the same on all sites
in a replication group, set via the <a href="/api_c/rep_timeout.html">DB_ENV-&gt;rep_set_timeout</a> method.
<p><li>The configured clock_scale_factor value must be the same on all sites
in a replication group.  This value defaults to no skew, but can
be set via the <a href="/api_c/rep_clockskew.html">DB_ENV-&gt;rep_set_clockskew</a> method.
<p><li>Applications that care about read guarantees must perform all read
operations on the master.  Reading on a client does not guarantee
freshness.
<p><li>The application must use elections to choose a master site.  It must
never simply declare a master without having won an election (as is
allowed without Master Leases).
</ol>
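<p>Several of the requirements above are consistency conditions that
an application could verify for itself before starting replication.
The sketch below models two of them in plain C without calling
Berkeley DB.  The <i>site_lease_config</i> structure and both function
names are hypothetical, the field layout (a microsecond timeout and a
fast/slow clock pair) is an assumption, and counting the master's own
copy toward the acknowledgement majority is also an assumption.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-site summary of lease settings; not a Berkeley DB type. */
struct site_lease_config {
	int      leases_enabled;      /* turned on via DB_ENV->rep_set_config */
	uint32_t lease_timeout_usec;  /* set via DB_ENV->rep_set_timeout */
	uint32_t fast_clock;          /* skew values, DB_ENV->rep_set_clockskew */
	uint32_t slow_clock;
	uint32_t nsites;              /* set via DB_ENV->rep_set_nsites */
};

/*
 * Return 1 if every site has leases enabled and all sites agree on the
 * lease timeout, the clock skew values, and the group size.
 */
int
lease_config_consistent(const struct site_lease_config *sites, size_t n)
{
	size_t i;

	if (n == 0)
		return (0);
	for (i = 0; i < n; i++) {
		if (!sites[i].leases_enabled || sites[i].nsites == 0)
			return (0);
		if (sites[i].lease_timeout_usec != sites[0].lease_timeout_usec)
			return (0);
		if (sites[i].fast_clock != sites[0].fast_clock ||
		    sites[i].slow_clock != sites[0].slow_clock)
			return (0);
		if (sites[i].nsites != sites[0].nsites)
			return (0);
	}
	return (1);
}

/*
 * Model of the result a Base API send callback should report:
 * 0 on success, -1 when a permanent record was not acknowledged by a
 * majority.  Assumption: the master's own copy counts toward the majority.
 */
int
send_status(int is_permanent, size_t client_acks, size_t nsites)
{
	if (!is_permanent)
		return (0);
	return (((client_acks + 1) * 2 > nsites) ? 0 : -1);
}
```

<p>A real send callback would derive <i>is_permanent</i> from the
DB_REP_PERMANENT flag it is passed and <i>client_acks</i> from its own
transport layer.</p>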
<p>Master leases are based on timeouts.  Berkeley DB assumes that time
always runs forward.  Changing the system clock on
either a client or a master site while leases are in use voids all
guarantees and can result in undefined behavior.  See the
<a href="/api_c/rep_timeout.html">DB_ENV-&gt;rep_set_timeout</a> method for more information.</p>
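<p>To see why a clock change is dangerous, consider a minimal model
of a timeout-based lease.  The function name and the use of signed
microsecond counters are assumptions made for illustration; this is not
how Berkeley DB stores or evaluates leases.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if a lease granted at grant_usec with the given timeout is
 * still valid when the local clock reads now_usec.
 */
int
lease_valid(int64_t grant_usec, int64_t timeout_usec, int64_t now_usec)
{
	return (now_usec < grant_usec + timeout_usec);
}
```

<p>With a grant at 10 seconds and a 5 second timeout, the lease is
expired when the clock reads 16 seconds.  If the clock is set back by
3 seconds at that moment, it reads 13 seconds and the same expired lease
appears valid again, which is exactly the kind of situation the
time-runs-forward assumption rules out.</p>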
<p>Read operations on a master that should not be subject to
leases can pass the <a href="/api_c/db_get.html#DB_IGNORE_LEASE">DB_IGNORE_LEASE</a> flag to the
<a href="/api_c/db_get.html">DB-&gt;get</a> method or the <a href="/api_c/dbc_get.html">DBcursor-&gt;get</a> method.  Read
operations on a client always ignore leases.</p>
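<p>The cases described above can be summarized as a small decision
function.  This is a sketch only: the enum and function names are
hypothetical, and the <i>READ_REFUSED</i> outcome is an assumption, since
this page does not state what a master returns when it cannot confirm
its leases.</p>

```c
#include <assert.h>

/* Hypothetical outcomes of a read; not Berkeley DB return codes. */
enum read_outcome {
	READ_AUTHORITATIVE,  /* master read with valid leases */
	READ_UNCHECKED,      /* leases ignored; data may be stale */
	READ_REFUSED         /* assumption: master cannot confirm its leases */
};

/*
 * Classify a read.  Client reads and reads that pass the ignore-lease
 * flag skip the lease check entirely.
 */
enum read_outcome
classify_read(int is_master, int ignore_lease, int leases_valid)
{
	if (!is_master || ignore_lease)
		return (READ_UNCHECKED);
	return (leases_valid ? READ_AUTHORITATIVE : READ_REFUSED);
}
```

<p>Only the first outcome carries the guarantees listed earlier on
this page; an unchecked read trades those guarantees for not being
subject to the lease check.</p>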
<p>Clients are forbidden from participating in elections while
they have an outstanding lease granted to a master.
Therefore, if the <a href="/api_c/rep_elect.html">DB_ENV-&gt;rep_elect</a> method is called on a client, Berkeley DB
blocks until that client's lease grant expires before participating in
any election.  While it waits, the client attempts to
contact the current master.  If the client finds a current
master, it returns from the <a href="/api_c/rep_elect.html">DB_ENV-&gt;rep_elect</a> method.
When leases are configured and no
lease has yet been granted (on start-up), clients
must wait a full lease timeout before participating in
an election.</p>
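<p>The waiting rule can be expressed as a short calculation.  The
function below is a hypothetical model, not part of the Berkeley DB API.
It assumes microsecond counters and, for the start-up case, measures the
full lease timeout from the moment of the call.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how long a client must wait before it may take part in an
 * election.  If no lease has ever been granted, wait a full lease
 * timeout; otherwise wait until the outstanding grant expires.
 */
uint64_t
election_wait_usec(int ever_granted, uint64_t grant_expires_usec,
    uint64_t lease_timeout_usec, uint64_t now_usec)
{
	if (!ever_granted)
		return (lease_timeout_usec);
	if (now_usec >= grant_expires_usec)
		return (0);
	return (grant_expires_usec - now_usec);
}
```

<p>This model leaves out the early return described above, where the
client stops waiting because it has found a current master.</p>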
<table width="100%"><tr><td><br></td><td align=right><a href="/rep/trans.html"><img src="/images/prev.gif" alt="Prev"></a><a href="/toc.html"><img src="/images/ref.gif" alt="Ref"></a><a href="/rep/clock_skew.html"><img src="/images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>