<!--$Id: archival.so,v 10.56 2005/02/10 20:02:41 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Database and log file archival</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<a name="2"><!--meow--></a><a name="3"><!--meow--></a>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Berkeley DB Transactional Data Store Applications</dl></b></td>
<td align=right><a href="../transapp/checkpoint.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/logfile.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Database and log file archival</b></p>
<p>The third component of the administrative infrastructure, archival for
catastrophic recovery, concerns the recoverability of the database in
the face of catastrophic failure.  Recovery after catastrophic failure
is intended to minimize data loss when physical hardware has been
destroyed -- for example, loss of a disk that contains databases or log
files.  Although the application may still experience data loss in this
case, it is possible to minimize it.</p>
<p>First, you may want to periodically create snapshots (that is, backups)
of your databases to make it possible to recover from catastrophic
failure.
These snapshots are either a standard backup, which creates a
consistent picture of the databases as of a single instant in time; or
an on-line backup (also known as a <i>hot</i> backup), which creates
a consistent picture of the databases as of an unspecified instant
during the period of time when the snapshot was made.  The advantage of
a hot backup is that applications may continue to read and write the
databases while the snapshot is being taken.  The disadvantage of a hot
backup is that more information must be archived, and recovery based on
a hot backup is to an unspecified time between the start of the backup
and when the backup is completed.</p>
<p>Second, after taking a snapshot, you should periodically archive the
log files being created in the environment.  It is often helpful to
think of database archival in terms of full and incremental filesystem
backups.  A snapshot is a full backup, whereas the periodic archival of
the current log files is an incremental backup.  For example, it might
be reasonable to take a full snapshot of a database environment weekly
or monthly, and archive additional log files daily.  Using both the
snapshot and the log files, a catastrophic crash at any time can be
recovered to the time of the most recent log archival, which may be
long after the original snapshot.</p>
<p>To create a standard backup of your database that can be used to recover
from catastrophic failure, take the following steps:</p>
<ol>
<p><li>Commit or abort all ongoing transactions.
<p><li>Stop writing to your databases until the backup has completed.  Read-only
operations are permitted, but no write operations and no filesystem
operations may be performed (for example, the <a href="../../api_c/env_remove.html">DB_ENV->remove</a> and
<a href="../../api_c/db_open.html">DB->open</a> methods may not be called).
<p><li>Force an environment checkpoint (see <a href="../../utility/db_checkpoint.html">db_checkpoint</a> for more
information).
<p><li>Run <a href="../../utility/db_archive.html">db_archive</a> <b>-s</b> to identify all the database data
files, and copy them to a backup device such as CD-ROM, alternate disk,
or tape.
<p>If the database files are stored in a separate directory from the other
Berkeley DB files, it may be simpler to archive the directory itself instead
of the individual files (see <a href="../../api_c/env_set_data_dir.html">DB_ENV->set_data_dir</a> for additional
information).  <b>Note: if any of the database files did not have
an open <a href="../../api_c/db_class.html">DB</a> handle during the lifetime of the current log files,
<a href="../../utility/db_archive.html">db_archive</a> will not list them in its output!</b>  This is another
reason it may be simpler to use a separate database file directory and
archive the entire directory instead of archiving only the files listed
by <a href="../../utility/db_archive.html">db_archive</a>.</p>
<p><li>Run <a href="../../utility/db_archive.html">db_archive</a> <b>-l</b> to identify all the log files,
and copy the last one (that is, the one with the highest number) to a
backup device such as CD-ROM, alternate disk, or tape.
</ol>
<a name="4"><!--meow--></a>
<p>To create a <i>hot</i> backup of your database that can be used to
recover from catastrophic failure, take the following steps:</p>
<ol>
<p><li>Archive your databases, as described in step #4 of the standard
backup procedure above.  You do not have to halt ongoing transactions or
force a checkpoint.  Because this is a hot backup and the databases may be
modified during the copy, the utility you use to copy the databases must
read database pages atomically (as described by <a href="../../ref/transapp/reclimit.html">Berkeley DB recoverability</a>).
<p><li>Archive <b>all</b> of the log files.
The order of these two operations
matters: the database files must be archived <b>before</b> the
log files.  This means that if the database files and log files are in
the same directory, you cannot simply archive the directory; you must
make sure that the correct order of archival is maintained.
<p>To archive your log files, run the <a href="../../utility/db_archive.html">db_archive</a> utility using
the <b>-l</b> option to identify all the database log files, and
copy them to your backup media.  If the database log files are stored
in a separate directory from the other database files, it may be simpler
to archive the directory itself instead of the individual files (see
the <a href="../../api_c/env_set_lg_dir.html">DB_ENV->set_lg_dir</a> method for more information).</p>
</ol>
<p>To minimize the archival space needed for log files when doing a hot
backup, run db_archive to identify those log files that are no longer in use.
Log files that are no longer in use do not need to be included when creating
a hot backup, and you can discard them or move them aside for use with
previous backups (whichever is appropriate) before beginning the hot
backup.</p>
<p>After completing one of these two sets of steps, the database
environment can be recovered from catastrophic failure (see
<a href="recovery.html">Recovery procedures</a> for more information).</p>
<p>For an example of a hot backup implementation in the Berkeley DB distribution,
see the source code for the <a href="../../utility/db_hotbackup.html">db_hotbackup</a> utility.</p>
<p>To update either a hot or cold (standard) backup so that recovery from
catastrophic failure is possible to a new point in time, repeat step #2 under
the hot backup instructions and archive <b>all</b> of the log files in the
database environment.  Each time both the database and log files are
copied to backup media, you may discard all previous database snapshots
and saved log files.
Archiving additional log files does not allow you
to discard either previous database snapshots or log files.  Generally,
updating a backup must be integrated with the application's log file
removal procedures.</p>
<p>The time to restore from catastrophic failure is a function of the
number of log records that have been written since the snapshot was
originally created.  Perhaps more importantly, the more separate pieces
of backup media you use, the more likely it is that you will have a
problem reading from one of them.  For these reasons, it is often best
to make snapshots on a regular basis.</p>
<p><b>Obviously, the reliability of your archive media will affect the safety
of your data.  For archival safety, ensure that you have multiple copies
of your database backups, verify that your archival media is error-free
and readable, and store copies of your backups offsite!</b></p>
<p>The functionality provided by the <a href="../../utility/db_archive.html">db_archive</a> utility is also
available directly from the Berkeley DB library.  The following code fragment
prints out a list of log and database files that need to be archived:</p>
<blockquote><pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;db.h&gt;
<p>
void
log_archlist(DB_ENV *dbenv)
{
    int ret;
    char **begin, **list;
<p>
    /* Get the list of database files. */
    if ((ret = dbenv->log_archive(dbenv,
        &list, DB_ARCH_ABS | DB_ARCH_DATA)) != 0) {
        dbenv->err(dbenv, ret, "DB_ENV->log_archive: DB_ARCH_DATA");
        exit (1);
    }
    if (list != NULL) {
        for (begin = list; *list != NULL; ++list)
            printf("database file: %s\n", *list);
        /* The list is a single allocation: free only the base pointer. */
        free (begin);
    }
<p>
    /* Get the list of log files. */
    if ((ret = dbenv->log_archive(dbenv,
        &list, DB_ARCH_ABS | DB_ARCH_LOG)) != 0) {
        dbenv->err(dbenv, ret, "DB_ENV->log_archive: DB_ARCH_LOG");
        exit (1);
    }
    if (list != NULL) {
        for (begin = list; *list != NULL; ++list)
            printf("log file: %s\n", *list);
        free (begin);
    }
}</pre></blockquote>
<table width="100%"><tr><td><br></td><td align=right><a href="../transapp/checkpoint.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../transapp/logfile.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>