<!--$Id: build.so,v 10.12 2007/09/26 15:11:32 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Building a Global Transaction Manager</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<a name="2"><!--meow--></a>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Distributed Transactions</dl></b></td>
<td align=right><a href="../xa/intro.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../xa/xa_intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Building a Global Transaction Manager</b></p>
<p>Managing distributed transactions and using the two-phase commit
protocol of Berkeley DB from an application requires that the application provide
the functionality of a global transaction manager (GTM).  The GTM is
responsible for the following:</p>
<p><ul type=disc>
<li>Communicating with the multiple environments (potentially on separate
systems).
<li>Managing the global transaction ID name space.
<li>Maintaining state information about each distributed transaction.
<li>Recovering from failures of individual environments.
<li>Recovering the global transaction state after failure of the global
transaction manager.
</ul>
<b>Communicating with multiple Berkeley DB environments</b>
<p>Two-phase commit is required if an application wants to
transaction-protect Berkeley DB calls across multiple environments.  If the environments
reside on the same machine, the application can communicate with each
environment through its own address space with no additional complexity.
If the environments reside on separate machines, the application can
either use the Berkeley DB RPC server to manage the remote environments or
use its own messaging capability, translating messages on the remote
machine into calls into the Berkeley DB library (including the recovery
calls).  For some applications, it might be sufficient to use Tcl's
remote invocation to remote copies of the tclsh utility into which the
Berkeley DB library has been dynamically loaded.</p>
<b>Managing the Global Transaction ID (GID) name space</b>
<p>A global transaction is a transaction that spans multiple environments.
Each global transaction must have a unique transaction ID.  This unique
ID is the global transaction ID (GID).  In Berkeley DB, global transaction
IDs must fit within the confines of an array of size
<a href="../../api_c/txn_prepare.html#DB_XIDDATASIZE">DB_XIDDATASIZE</a> (currently 128 bytes).  It is the responsibility of the
global transaction manager to assign GIDs, guarantee their uniqueness,
and manage the mapping of local transactions to GIDs.  That is, for each
GID, the GTM should know which local transaction managers participated.
The Berkeley DB logging system or a Berkeley DB table could be used to record this
information.</p>
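<p>As an illustration of GID assignment, the following is a minimal sketch
in C.  It is not a Berkeley DB interface: the names <b>gid_allocator</b>,
<b>gid_next</b> and <b>GID_SIZE</b> are invented for this example, the
GID layout (an 8-byte GTM identifier followed by an 8-byte sequence number)
is an assumption, and GID_SIZE merely stands in for DB_XIDDATASIZE, which a
real application would take from the Berkeley DB header file.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors DB_XIDDATASIZE (128 bytes according to the text).
 * A real application would use the constant from db.h instead. */
#define GID_SIZE 128

/* Hypothetical allocator state, one per GTM instance. */
typedef struct {
    uint64_t gtm_id;    /* identifier unique to this GTM */
    uint64_t next_seq;  /* next sequence number; persisting it is not shown */
} gid_allocator;

/* Store a 64-bit value in big-endian byte order. */
static void put_u64_be(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        dst[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Fill gid with the next unique ID: bytes 0-7 hold the GTM identifier,
 * bytes 8-15 hold the sequence number, and the remaining bytes are zero. */
void gid_next(gid_allocator *a, uint8_t gid[GID_SIZE])
{
    assert(a != NULL && gid != NULL);
    memset(gid, 0, GID_SIZE);
    put_u64_be(gid, a->gtm_id);
    put_u64_be(gid + 8, a->next_seq++);
}
```

<p>Under these assumptions, uniqueness across GTM restarts depends on the
GTM saving the sequence number (for example, in a Berkeley DB table) before
handing out a GID, and uniqueness across GTMs depends on each GTM having a
distinct identifier.</p>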
<b>Maintaining state for each distributed transaction</b>
<p>In addition to knowing which local environments participate in each
global transaction, the GTM must also know the state of each active
global transaction.  As soon as a transaction becomes distributed (that
is, a second environment participates), the GTM must record the
existence of the global transaction and all participants (whether this
must reside on stable storage or not depends on the exact configuration
of the system).  As new environments participate, the GTM must keep this
information up to date.</p>
<p>When the GTM is ready to begin commit processing, it should issue
<a href="../../api_c/txn_prepare.html">DB_TXN-&gt;prepare</a> calls to each participating environment, indicating
the GID of the global transaction.  Once all the participants have
successfully prepared, the GTM must record that the global
transaction will be committed.  This record should go to stable
storage.  Once the record is written to stable storage, the GTM can send
<a href="../../api_c/txn_commit.html">DB_TXN-&gt;commit</a> requests to each participating environment.  Once
all environments have successfully completed the commit, the GTM can
either record the successful commit or "forget" the global
transaction.</p>
<p>If nested transactions are used (that is, the <b>parent</b> parameter
is specified to <a href="../../api_c/txn_begin.html">DB_ENV-&gt;txn_begin</a>), no <a href="../../api_c/txn_prepare.html">DB_TXN-&gt;prepare</a> call should
be made on behalf of any child transaction.  Only the ultimate parent
should ever issue a <a href="../../api_c/txn_prepare.html">DB_TXN-&gt;prepare</a>.
</p>
<p>Should any participant fail to prepare, then the GTM must abort the
global transaction.  The fact that the transaction is going to be
aborted should be written to stable storage.  Once written, the GTM can
then issue <a href="../../api_c/txn_abort.html">DB_TXN-&gt;abort</a> requests to each environment.  When all
aborts have returned successfully, the GTM can either record the
successful abort or "forget" the global transaction.</p>
<p>In summary, for each transaction, the GTM must maintain the following:</p>
<p><ul type=disc>
<li>A list of participating environments
<li>The current state of each transaction (pre-prepare, preparing,
committing, aborting, done)
</ul>
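<p>The commit processing described above can be sketched as a small state
machine in C.  This is an illustration under stated assumptions, not a
Berkeley DB interface: the <b>participant</b> and <b>global_txn</b> types,
the <b>gt_resolve</b> function and the <b>log_state</b> hook are all
hypothetical.  In a real GTM the three participant callbacks would wrap
DB_TXN-&gt;prepare, DB_TXN-&gt;commit and DB_TXN-&gt;abort, and the hook would
write the state to stable storage.  The <b>stub_</b> functions exist only so
that the sketch can be tried without a database.</p>

```c
#include <assert.h>
#include <stddef.h>

/* The per-transaction states listed in the text. */
typedef enum {
    GT_PRE_PREPARE, GT_PREPARING, GT_COMMITTING, GT_ABORTING, GT_DONE
} gt_state;

/* Hypothetical participant interface; each callback returns 0 on success. */
typedef struct {
    int (*prepare)(void *ctx);
    int (*commit)(void *ctx);
    int (*abort_txn)(void *ctx);
    void *ctx;
} participant;

typedef struct {
    gt_state state;
    participant *parts;
    size_t nparts;
    /* Hypothetical hook: record the state durably; returns 0 on success. */
    int (*log_state)(gt_state s, void *log_ctx);
    void *log_ctx;
} global_txn;

/* Record the new state first, and only then adopt it in memory. */
static int set_state(global_txn *txn, gt_state s)
{
    if (txn->log_state(s, txn->log_ctx) != 0)
        return -1;
    txn->state = s;
    return 0;
}

/* Returns 1 if the global transaction committed, 0 if it aborted, and -1
 * on error, in which case the last recorded state is left for recovery. */
int gt_resolve(global_txn *txn)
{
    size_t i;
    int all_prepared = 1;

    assert(txn != NULL && txn->log_state != NULL);
    if (set_state(txn, GT_PREPARING) != 0)
        return -1;
    for (i = 0; i < txn->nparts; i++) {
        if (txn->parts[i].prepare(txn->parts[i].ctx) != 0) {
            all_prepared = 0;   /* one failed prepare aborts everything */
            break;
        }
    }
    /* The decision is recorded before any commit or abort is sent. */
    if (set_state(txn, all_prepared ? GT_COMMITTING : GT_ABORTING) != 0)
        return -1;
    for (i = 0; i < txn->nparts; i++) {
        participant *p = &txn->parts[i];
        int rc = all_prepared ? p->commit(p->ctx) : p->abort_txn(p->ctx);
        if (rc != 0)
            return -1;
    }
    txn->state = GT_DONE;       /* the GTM may now "forget" the transaction */
    return all_prepared;
}

/* Stand-in callbacks for experimentation; ctx points to an int call counter. */
int stub_ok(void *ctx)   { ++*(int *)ctx; return 0; }
int stub_fail(void *ctx) { ++*(int *)ctx; return 1; }
int stub_log(gt_state s, void *log_ctx) { *(gt_state *)log_ctx = s; return 0; }
```

<p>The sketch records the decision before acting on it, so that a GTM that
fails partway through the second loop can find the committing or aborting
state in its own store and finish the job during recovery.</p>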
<b>Recovering from the failure of a single environment</b>
<p>If a single environment fails, there is no need to bring down or recover
other environments (the only exception to this is if all environments
are managed in the same application address space and there is a risk
that the failure of the environment corrupted other environments).  Instead,
once the failing environment comes back up, it should be recovered (that
is, conventional recovery should be run, either via <a href="../../utility/db_recover.html">db_recover</a> or by specifying the
<a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> flag to <a href="../../api_c/env_open.html">DB_ENV-&gt;open</a>).  If the
<a href="../../utility/db_recover.html">db_recover</a> utility is used, then the -e option must be
specified.  In this case, the application will almost certainly want to
specify environmental parameters via a <a href="../../ref/env/db_config.html#DB_CONFIG">DB_CONFIG</a> file in the
environment's home directory, so that <a href="../../utility/db_recover.html">db_recover</a> can create an
appropriately configured environment.  If the <a href="../../utility/db_recover.html">db_recover</a> utility
is not used, then <a href="../../api_c/env_open.html#DB_PRIVATE">DB_PRIVATE</a> should not be specified, unless all
processing, including recovery, calls to <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a>, and calls
to finish prepared but not yet completed transactions, takes place using
the same database environment handle.  The GTM should then issue a
<a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> call to the environment.  This call will return a
list of prepared but not yet committed or aborted transactions.  For
each transaction, the GTM should look up the GID in its local store to
determine if the transaction should commit or abort.</p>
<p>If the GTM is running in a system with multiple GTMs, it is possible
that some of the transactions returned via <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> do not
belong to the current GTM.  The GTM should detect this and call
<a href="../../api_c/txn_discard.html">DB_TXN-&gt;discard</a> on each such transaction handle.  Furthermore, it
is important to note that the environment does not retain information about
which GTM has issued <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> operations.  Therefore, each
GTM should issue all its <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> calls before another GTM
issues its calls.  If the calls are interleaved, each GTM may not get
a complete and consistent set of transactions.  The simplest way to
enforce this is for each GTM to make sure it can receive all its
outstanding transactions in a single <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> call.  The
maximum number of possible outstanding transactions is roughly the
maximum number of active transactions in the environment (a value that
can be obtained using the <a href="../../api_c/txn_stat.html">DB_ENV-&gt;txn_stat</a> method or the <a href="../../utility/db_stat.html">db_stat</a>
utility).  To simplify this procedure, the caller should allocate an
array large enough to be certain to hold the list of transactions (for
example, allocate an array able to hold three times the maximum number
of transactions).  If that is not possible, callers should check that the
array was not completely filled in when <a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> returns.
If the array was completely filled in, each transaction should be
explicitly discarded, and the call repeated with a larger array.</p>
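<p>The retry procedure in the preceding paragraph can be sketched in C as
follows.  To keep the example self-contained it does not call Berkeley DB:
<b>prep_txn</b> stands in for an entry in the list returned by
DB_ENV-&gt;txn_recover, and the <b>recover</b> and <b>discard</b> callbacks
are hypothetical wrappers around DB_ENV-&gt;txn_recover (restarted from the
beginning of the list on every call) and DB_TXN-&gt;discard.  The
<b>fake_</b> functions simulate an environment holding a fixed number of
prepared transactions and are present only for experimentation.</p>

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Stand-in for one entry of the list of prepared transactions. */
typedef struct { int id; } prep_txn;

/* Hypothetical wrappers; each callback returns 0 on success. */
typedef struct {
    /* Fill list with at most cap entries, starting from the beginning of
     * the environment's list, and store the number written in *nret. */
    int (*recover)(void *env, prep_txn *list, long cap, long *nret);
    /* Release one transaction handle without resolving the transaction. */
    int (*discard)(void *env, prep_txn *t);
    void *env;
} recover_ops;

/* Fetch every prepared transaction in a single call.  On success returns 0
 * and hands back a malloc'ed array that the caller must free. */
int recover_all(const recover_ops *ops, long initial_cap,
                prep_txn **out, long *nout)
{
    long cap = initial_cap > 0 ? initial_cap : 1;

    assert(ops != NULL && out != NULL && nout != NULL);
    for (;;) {
        prep_txn *list = malloc((size_t)cap * sizeof *list);
        long n = 0, i;

        if (list == NULL)
            return -1;
        if (ops->recover(ops->env, list, cap, &n) != 0) {
            free(list);
            return -1;
        }
        if (n < cap) {          /* array not full: the list is complete */
            *out = list;
            *nout = n;
            return 0;
        }
        /* Array completely filled: discard everything and retry larger. */
        for (i = 0; i < n; i++)
            (void)ops->discard(ops->env, &list[i]);
        free(list);
        if (cap > LONG_MAX / 2)
            return -1;
        cap *= 2;
    }
}

/* Simulated environment, for experimentation only. */
typedef struct { long total; long discarded; } fake_env;

int fake_recover(void *env, prep_txn *list, long cap, long *nret)
{
    fake_env *e = env;
    long i, n = e->total < cap ? e->total : cap;

    for (i = 0; i < n; i++)
        list[i].id = (int)i;
    *nret = n;
    return 0;
}

int fake_discard(void *env, prep_txn *t)
{
    (void)t;
    ((fake_env *)env)->discarded++;
    return 0;
}
```

<p>Doubling the array is a choice made for this sketch; starting with an
array several times larger than the maximum number of active transactions,
as suggested above, avoids the retries in most cases.</p>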
<p>The newly recovered environment will forbid any new transactions from
being started until the prepared but not yet committed/aborted
transactions have been resolved.  In the multiple GTM case, this means
that all GTMs must recover before any GTM can begin issuing new transactions.</p>
<p>Because Berkeley DB flushes both commit and abort records to disk for
two-phase transactions, once the global transaction has either committed
or aborted, no action will be necessary in any environment.  If local
environments are running with the <a href="../../api_c/env_set_flags.html#DB_TXN_WRITE_NOSYNC">DB_TXN_WRITE_NOSYNC</a> or
<a href="../../api_c/env_set_flags.html#DB_TXN_NOSYNC">DB_TXN_NOSYNC</a> options (that is, they are not writing and/or flushing
the log synchronously at commit time), then it is possible that a commit
or abort operation may not have been written in the environment.  In
this case, the GTM must always have a record of completed transactions
to determine if prepared transactions should be committed or aborted.</p>
<b>Recovering from GTM failure</b>
<p>If the GTM fails, it must first recover its local state.  Assuming the
GTM uses Berkeley DB tables to maintain state, it should run
<a href="../../utility/db_recover.html">db_recover</a> (or the <a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> option to
<a href="../../api_c/env_open.html">DB_ENV-&gt;open</a>) upon startup.  Once the GTM is back up and running,
it needs to review all its outstanding global transactions, that is, all
transactions that are recorded but not yet committed or aborted.</p>
<p>Any global transactions which have not yet reached the prepare phase
should be aborted.  If these transactions were on remote systems, the
remote systems should eventually time them out and abort them.  If these
transactions are on the local system, we assume they crashed and were
aborted as part of GTM startup.</p>
<p>The GTM must then identify all environments which need to have their
<a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> methods called.  This includes all environments that
participated in any transaction that is in the preparing, aborting, or
committing state.  For each environment, the GTM should issue a
<a href="../../api_c/txn_recover.html">DB_ENV-&gt;txn_recover</a> call.  Once each environment has responded, the GTM
can determine the fate of each transaction.  The correct behavior
depends on the state of the global transaction, as follows:</p>
<br>
<b>preparing</b><ul compact><li>if all participating environments return the transaction in the prepared
but not yet committed/aborted state, then the GTM should commit the
transaction.  If any participating environment fails to return it, then
the GTM should issue an abort to all environments that did return it.</ul>
<b>committing</b><ul compact><li>the GTM should send a commit to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.</ul>
<b>aborting</b><ul compact><li>the GTM should send an abort to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.</ul>
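<p>The three rules above can be expressed as a single decision function.
The following C sketch is illustrative only: the type and function names
are invented, and the <b>returned</b> array is assumed to have been filled
in by the GTM after it matched the GIDs returned by each environment's
DB_ENV-&gt;txn_recover call against its own records.</p>

```c
#include <assert.h>
#include <stddef.h>

/* State the GTM had recorded for the global transaction before it failed. */
typedef enum { REC_PREPARING, REC_COMMITTING, REC_ABORTING } logged_state;

typedef enum { ACT_COMMIT, ACT_ABORT } recovery_action;

/* returned[i] is nonzero if participating environment i listed this GID
 * among its prepared but not yet committed/aborted transactions.  The
 * chosen action is sent only to the environments that did return it. */
recovery_action decide_recovery(logged_state s, const int *returned,
                                size_t nparts)
{
    size_t i;

    assert(returned != NULL || nparts == 0);
    switch (s) {
    case REC_COMMITTING:
        return ACT_COMMIT;
    case REC_ABORTING:
        return ACT_ABORT;
    case REC_PREPARING:
    default:
        /* Commit only if every participant still holds it prepared. */
        for (i = 0; i < nparts; i++)
            if (!returned[i])
                return ACT_ABORT;
        return ACT_COMMIT;
    }
}
```

<p>For example, a transaction recorded as preparing with three participants,
of which only two returned it, is aborted in those two environments.</p>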
<br>
<table width="100%"><tr><td><br></td><td align=right><a href="../xa/intro.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../xa/xa_intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>