<!--$Id: build.so,v 10.12 2007/09/26 15:11:32 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle. All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Building a Global Transaction Manager</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<a name="2"><!--meow--></a>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Distributed Transactions</dl></b></td>
<td align=right><a href="../xa/intro.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../xa/xa_intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Building a Global Transaction Manager</b></p>
<p>Managing distributed transactions and using the two-phase commit
protocol of Berkeley DB from an application requires that the application provide
the functionality of a global transaction manager (GTM). The GTM is
responsible for the following:</p>
<p><ul type=disc>
<li>Communicating with the multiple environments (potentially on separate
systems).
<li>Managing the global transaction ID name space.
<li>Maintaining state information about each distributed transaction.
<li>Recovering from failures of individual environments.
<li>Recovering the global transaction state after failure of the global
transaction manager.
</ul>
<b>Communicating with multiple Berkeley DB environments</b>
<p>Two-phase commit is required if an application wants to transaction-protect
Berkeley DB calls across multiple environments.
If the environments
reside on the same machine, the application can communicate with each
environment through its own address space with no additional complexity.
If the environments reside on separate machines, the application can
either use the Berkeley DB RPC server to manage the remote environments or
use its own messaging capability, translating messages on the remote
machine into calls into the Berkeley DB library (including the recovery
calls). For some applications, it might be sufficient to use Tcl's
remote invocation to remote copies of the tclsh utility into which the
Berkeley DB library has been dynamically loaded.</p>
<b>Managing the Global Transaction ID (GID) name space</b>
<p>A global transaction is a transaction that spans multiple environments.
Each global transaction must have a unique transaction ID. This unique
ID is the global transaction ID (GID). In Berkeley DB, global transaction
IDs must be represented within the confines of an array of <a href="../../api_c/txn_prepare.html#DB_XIDDATASIZE">DB_XIDDATASIZE</a>
bytes (currently 128 bytes). It is the responsibility of the
global transaction manager to assign GIDs, guarantee their uniqueness,
and manage the mapping of local transactions to GIDs. That is, for each
GID, the GTM should know which local transaction managers participated.
The Berkeley DB logging system or a Berkeley DB table could be used to record this
information.</p>
<b>Maintaining state for each distributed transaction</b>
<p>In addition to knowing which local environments participate in each
global transaction, the GTM must also know the state of each active
global transaction. As soon as a transaction becomes distributed (that
is, a second environment participates), the GTM must record the
existence of the global transaction and all participants (whether this
must reside on stable storage or not depends on the exact configuration
of the system).
As new environments participate, the GTM must keep this
information up to date.</p>
<p>When the GTM is ready to begin commit processing, it should issue
<a href="../../api_c/txn_prepare.html">DB_TXN->prepare</a> calls to each participating environment, indicating
the GID of the global transaction. Once all the participants have
successfully prepared, the GTM must record that the global
transaction will be committed. This record should go to stable
storage. Once the record is written to stable storage, the GTM can send
<a href="../../api_c/txn_commit.html">DB_TXN->commit</a> requests to each participating environment. Once
all environments have successfully completed the commit, the GTM can
either record the successful commit or "forget" the global
transaction.</p>
<p>If nested transactions are used (that is, the <b>parent</b> parameter
is specified to <a href="../../api_c/txn_begin.html">DB_ENV->txn_begin</a>), no <a href="../../api_c/txn_prepare.html">DB_TXN->prepare</a> call should
be made on behalf of any child transaction. Only the ultimate parent
should ever issue a <a href="../../api_c/txn_prepare.html">DB_TXN->prepare</a>.
</p>
<p>Should any participant fail to prepare, the GTM must abort the
global transaction. The fact that the transaction is going to be
aborted should be written to stable storage. Once written, the GTM can
then issue <a href="../../api_c/txn_abort.html">DB_TXN->abort</a> requests to each environment.
When all
aborts have returned successfully, the GTM can either record the
successful abort or "forget" the global transaction.</p>
<p>In summary, for each transaction, the GTM must maintain the following:</p>
<p><ul type=disc>
<li>A list of participating environments
<li>The current state of the transaction (pre-prepare, preparing,
committing, aborting, done)
</ul>
<b>Recovering from the failure of a single environment</b>
<p>If a single environment fails, there is no need to bring down or recover
other environments (the only exception to this is if all environments
are managed in the same application address space and there is a risk that
the failure of the environment corrupted other environments). Instead,
once the failing environment comes back up, it should be recovered (that
is, conventional recovery should be run, via <a href="../../utility/db_recover.html">db_recover</a> or by specifying the
<a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> flag to <a href="../../api_c/env_open.html">DB_ENV->open</a>). If the
<a href="../../utility/db_recover.html">db_recover</a> utility is used, then the -e option must be
specified. In this case, the application will almost certainly want to
specify environmental parameters via a <a href="../../ref/env/db_config.html#DB_CONFIG">DB_CONFIG</a> file in the
environment's home directory, so that <a href="../../utility/db_recover.html">db_recover</a> can create an
appropriately configured environment. If the <a href="../../utility/db_recover.html">db_recover</a> utility
is not used, then <a href="../../api_c/env_open.html#DB_PRIVATE">DB_PRIVATE</a> should not be specified, unless all
processing, including recovery, calls to <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a>, and calls
to finish prepared but not yet completed transactions, takes place using
the same database environment handle.
The GTM should then issue a
<a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> call to the environment. This call will return a
list of prepared but not yet committed or aborted transactions. For
each transaction, the GTM should look up the GID in its local store to
determine if the transaction should commit or abort.</p>
<p>If the GTM is running in a system with multiple GTMs, it is possible
that some of the transactions returned via <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> do not
belong to the current environment. The GTM should detect this and call
<a href="../../api_c/txn_discard.html">DB_TXN->discard</a> on each such transaction handle. Furthermore, it
is important to note that the environment does not retain information about
which GTM has issued <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> operations. Therefore, each
GTM should issue all its <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> calls before another GTM
issues its calls. If the calls are interleaved, each GTM may not get
a complete and consistent set of transactions. The simplest way to
enforce this is for each GTM to make sure it can receive all its
outstanding transactions in a single <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> call. The
maximum number of possible outstanding transactions is roughly the
maximum number of active transactions in the environment (a value that
can be obtained using the <a href="../../api_c/txn_stat.html">DB_ENV->txn_stat</a> method or the <a href="../../utility/db_stat.html">db_stat</a>
utility). To simplify this procedure, the caller should allocate an
array large enough to be certain to hold the list of transactions (for
example, allocate an array able to hold three times the maximum number
of transactions).
If that's not possible, callers should check that the
array was not completely filled in when <a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> returns.
If the array was completely filled in, each transaction should be
explicitly discarded, and the call repeated with a larger array.</p>
<p>The newly recovered environment will forbid any new transactions from
being started until the prepared but not yet committed/aborted
transactions have been resolved. In the multiple GTM case, this means
that all GTMs must recover before any GTM can begin issuing new transactions.</p>
<p>Because Berkeley DB flushes both commit and abort records to disk for
two-phase transactions, once the global transaction has either committed
or aborted, no action will be necessary in any environment. If local
environments are running with the <a href="../../api_c/env_set_flags.html#DB_TXN_WRITE_NOSYNC">DB_TXN_WRITE_NOSYNC</a> or
<a href="../../api_c/env_set_flags.html#DB_TXN_NOSYNC">DB_TXN_NOSYNC</a> options (that is, they are not writing and/or flushing
the log synchronously at commit time), then it is possible that a commit
or abort operation may not have been written in the environment. In
this case, the GTM must always have a record of completed transactions
to determine if prepared transactions should be committed or aborted.</p>
<b>Recovering from GTM failure</b>
<p>If the GTM fails, it must first recover its local state. Assuming the
GTM uses Berkeley DB tables to maintain state, it should run
<a href="../../utility/db_recover.html">db_recover</a> (or the <a href="../../api_c/env_open.html#DB_RECOVER">DB_RECOVER</a> option to
<a href="../../api_c/env_open.html">DB_ENV->open</a>) upon startup.
Once the GTM is back up and running,
it needs to review all its outstanding global transactions, that is, all
transactions that are recorded but not yet committed or aborted.</p>
<p>Any global transactions that have not yet reached the prepare phase
should be aborted. If these transactions were on remote systems, the
remote systems should eventually time them out and abort them. If these
transactions are on the local system, we assume they crashed and were
aborted as part of GTM startup.</p>
<p>The GTM must then identify all environments that need to have their
<a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> methods called. This includes all environments that
participated in any transaction that is in the preparing, aborting, or
committing state. For each environment, the GTM should issue a
<a href="../../api_c/txn_recover.html">DB_ENV->txn_recover</a> call. Once each environment has responded, the GTM
can determine the fate of each transaction. The correct behavior
depends on the state of the global transaction, according to
the table below.</p>
<br>
<b>preparing</b><ul compact><li>if all participating environments return the transaction in the prepared
but not yet committed/aborted state, then the GTM should commit the
transaction.
If any participating environment fails to return it, then
the GTM should issue an abort to all environments that did return it.</ul>
<b>committing</b><ul compact><li>the GTM should send a commit to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.</ul>
<b>aborting</b><ul compact><li>the GTM should send an abort to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.</ul>
<br>
<table width="100%"><tr><td><br></td><td align=right><a href="../xa/intro.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../xa/xa_intro.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font>
</body>
</html>