<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>General access method configuration</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am_conf.html" title="Chapter 2. Access Method Configuration" />
    <link rel="prev" href="am_conf_logrec.html" title="Logical record numbers" />
    <link rel="next" href="bt_conf.html" title="Btree access method specific configuration" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">General access method configuration</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 2.
            Access Method Configuration
          </th>
          <td width="20%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="general_am_conf"></a>General access method configuration</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_pagesize">Selecting a page size</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_cachesize">Selecting a cache size</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_byteorder">Selecting a byte order</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_dup">Duplicate data items</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="general_am_conf.html#am_conf_malloc">Non-local memory allocation</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        A number of configuration tasks are common to all
        access methods. They are described in the following sections.
</p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_pagesize"></a>Selecting a page size</h3>
            </div>
          </div>
        </div>
        <p>The size of the pages used in the underlying database can be specified by
calling the <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB->set_pagesize()</a> method. The page size must be a power of two,
with a minimum of 512 bytes and a maximum of 64KB. If
no page size is specified by the application, a page size is selected
based on the underlying filesystem I/O block size.
(A page size selected
in this way has a lower limit of 512 bytes and an upper limit of 16KB.)</p>
        <p>There are several issues to consider when selecting a page size: overflow
record sizes, locking, I/O efficiency, and recoverability.</p>
        <p>First, the page size implicitly sets the size threshold for overflow records.
Overflow records are key or data items that are too large to fit on a
normal database page, and are therefore stored in
overflow pages. Overflow pages exist outside of the normal
database structure, so there is often a significant
performance penalty associated with retrieving or modifying overflow
records. Selecting a page size so small that it forces the
creation of large numbers of overflow pages can seriously degrade the
performance of an application.</p>
        <p>Second, in the Btree, Hash and Recno access methods, the finest-grained
lock that Berkeley DB acquires is for a page. (The Queue access method
generally acquires record-level locks rather than page-level locks.)
Selecting a page size so large that threads or
processes wait because other threads of control are accessing or
modifying records on the same page can degrade the performance of your
application.</p>
        <p>Third, the page size specifies the granularity of I/O from the database
to the operating system. Berkeley DB hands page-sized units of data to
the operating system to be scheduled for reading from or writing to the
disk. Many operating systems have an internal <span class="bold"><strong>block
size</strong></span> that is used as the granularity of I/O from the operating system
to the disk.
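</p>
        <p>The page-size rules given at the start of this section can be summarized in a few lines of C. This is a minimal sketch under stated assumptions, not Berkeley DB source: valid_pagesize() and default_pagesize() are hypothetical names, and rounding a non-power-of-two block size down to a power of two is our own assumption, because this section only documents the 512-byte and 16KB limits on an automatically selected size.</p>

```c
#include <stdbool.h>

/* Limits stated in this section. */
#define MIN_PAGESIZE         512u    /* smallest legal page size */
#define MAX_PAGESIZE         65536u  /* largest legal page size (64KB) */
#define MAX_DEFAULT_PAGESIZE 16384u  /* cap on an automatically chosen size (16KB) */

/* True if n is a nonzero power of two. */
static bool is_power_of_two(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical check mirroring the documented DB->set_pagesize() limits. */
bool valid_pagesize(unsigned long size)
{
    return size >= MIN_PAGESIZE && size <= MAX_PAGESIZE && is_power_of_two(size);
}

/*
 * Hypothetical default: clamp the filesystem I/O block size to
 * [512, 16KB]. Rounding down to a power of two is an assumption of
 * this sketch, not documented Berkeley DB behavior.
 */
unsigned long default_pagesize(unsigned long fs_blksize)
{
    unsigned long size = fs_blksize;

    if (size < MIN_PAGESIZE)
        size = MIN_PAGESIZE;
    if (size > MAX_DEFAULT_PAGESIZE)
        size = MAX_DEFAULT_PAGESIZE;
    /* Clear the lowest set bit until only the highest one remains. */
    while (!is_power_of_two(size))
        size &= size - 1;
    return size;
}
```

        <p>For example, a filesystem block size of 4096 bytes yields a 4096-byte page, while a 64KB block size is capped at 16384 bytes. On POSIX systems, the st_blksize field of struct stat is one place an application can read a preferred I/O block size; whether Berkeley DB consults that particular field is not stated here.</p>
        <p>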
Generally, it is more efficient for Berkeley DB to write
filesystem-sized blocks to the operating system and for the operating
system to write those same blocks to the disk.</p>
        <p>Selecting a database page size smaller than the filesystem block size
may cause the operating system to coalesce or otherwise manipulate Berkeley DB
pages, which can hurt the performance of your application. When the page
size is smaller than the filesystem block size and a page written by
Berkeley DB is not found in the operating system's cache, the operating system
may be forced to read a block from the disk, copy the page into the
block it read, and then write the block back to disk, rather than simply
writing the page to disk. Additionally, because the operating system is
reading more data into its buffer cache than is strictly necessary to
satisfy each Berkeley DB page request, the operating system buffer cache
may waste memory.</p>
        <p>Conversely, selecting a page size larger than the filesystem block
size may cause the operating system to read more data than necessary.
On some systems, reading filesystem blocks sequentially may cause the
operating system to begin performing read-ahead.
If requesting a single
database page implies reading enough filesystem blocks to satisfy the
operating system's criteria for read-ahead, the operating system may do
more I/O than is required.</p>
        <p>Fourth, when using the Berkeley DB Transactional Data Store product, the page size may affect the errors
from which your database can recover. See
<a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a> for more
information.</p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_cachesize"></a>Selecting a cache size</h3>
            </div>
          </div>
        </div>
        <p>The size of the cache used for the underlying database can be specified
by calling the <a href="../api_reference/C/dbset_cachesize.html" class="olink">DB->set_cachesize()</a> method.
Choosing a cache size is, unfortunately, an art. Your cache must be at
least large enough for your working set, plus some headroom for unexpected
situations.</p>
        <p>When using the Btree access method, you must have a cache big enough for
the minimum working set of a single access. This includes a root
page, one or more internal pages (depending on the depth of your tree),
and a leaf page. If your cache is any smaller than that, each new page
will force out the least-recently-used page, and Berkeley DB will have to re-read the
root page of the tree on every database request.</p>
        <p>If your keys are of moderate size (a few tens of bytes) and your pages
are on the order of 4KB to 8KB, most Btree applications will have only
three levels.
For example, using 20-byte keys with 20 bytes of data
associated with each key, an 8KB page can hold roughly 400 keys (or 200
key/data pairs), so a fully populated three-level Btree will hold 32
million key/data pairs, and a tree with only a 50% page-fill factor will
still hold 16 million key/data pairs. We rarely expect trees to exceed
five levels, although Berkeley DB supports trees of up to 255 levels.</p>
        <p>The rule of thumb is that cache is good, and more cache is better.
Generally, applications benefit from increasing the cache size up to a
point, beyond which performance stops improving as the cache size
increases. When this point is reached, one of two things has happened:
either the cache is large enough that the application almost never
has to retrieve information from disk, or the application is doing
truly random accesses, so increasing the size of the cache does not
significantly increase the odds of finding the next requested item
in the cache. The latter is fairly rare; almost all applications show
some form of locality of reference.</p>
        <p>That said, it is important not to increase your cache size beyond the
capabilities of your system, as doing so reduces performance.
Under many operating systems, tying down too much virtual memory will cause
your memory, and potentially your program, to be swapped out. This is
especially likely on systems without a unified OS buffer cache and virtual
memory system, where the buffer cache is allocated at boot time and so
cannot be adjusted in response to application requests for large amounts of
virtual memory.</p>
        <p>For example, even if accesses are truly random within a Btree, your
access pattern will favor internal pages over leaf pages, so your cache
should be large enough to hold all internal pages.
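</p>
        <p>The three-level Btree arithmetic earlier in this section, and the advice to keep every internal page cached, can be checked with a small calculation. The sketch assumes an idealized, perfectly balanced tree in which every internal page has the same fan-out and every leaf page holds the same number of key/data pairs; real trees are less regular, and the function names are hypothetical.</p>

```c
#include <stdint.h>

/*
 * Key/data pairs held by a perfectly balanced Btree with `levels` levels,
 * assuming each internal page has `fanout` children and each leaf page
 * holds `pairs_per_leaf` pairs.
 */
uint64_t btree_capacity(uint64_t fanout, uint64_t pairs_per_leaf, unsigned levels)
{
    uint64_t leaves = 1;

    /* Each level above the leaves multiplies the number of leaf pages. */
    for (unsigned i = 1; i < levels; i++)
        leaves *= fanout;
    return leaves * pairs_per_leaf;
}

/* Number of non-leaf pages (root plus internal pages) in the same model. */
uint64_t btree_internal_pages(uint64_t fanout, unsigned levels)
{
    uint64_t total = 0, at_level = 1;

    /* Count every level except the leaf level. */
    for (unsigned i = 1; i < levels; i++) {
        total += at_level;
        at_level *= fanout;
    }
    return total;
}
```

        <p>With the figures from the earlier example (roughly 400 keys per internal page and 200 key/data pairs per leaf), btree_capacity(400, 200, 3) returns 32,000,000 and btree_internal_pages(400, 3) returns 401. At 8KB per page, those 401 internal pages occupy 3,284,992 bytes, about 3.3MB, ignoring per-page cache overhead. Halving only the leaf fill, btree_capacity(400, 100, 3), gives 16,000,000, which matches the 16 million figure quoted above.</p>
        <p>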
In the steady state,
with all internal pages cached, each operation requires at most one I/O, to retrieve the appropriate
leaf page.</p>
        <p>You can use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat utility</a> to monitor the effectiveness of
your cache. The following is an excerpt of that
utility's output with the <span class="bold"><strong>-m</strong></span> option:</p>
        <pre class="programlisting">prompt: db_stat -m
131072  Cache size (128K).
4273    Requested pages found in the cache (97%).
134     Requested pages not found in the cache.
18      Pages created in the cache.
116     Pages read into the cache.
93      Pages written from the cache to the backing file.
5       Clean pages forced from the cache.
13      Dirty pages forced from the cache.
0       Dirty buffers written by trickle-sync thread.
130     Current clean buffer count.
4       Current dirty buffer count.
</pre>
        <p>These statistics show that 4,273 requested pages were found in the cache
and 134 were not, which matches the 116 pages read into the cache plus the
18 pages created in it. Overall, 4,273 of 4,407 requests were satisfied
from the cache, a hit rate of about 97%, so the cache is working well. The
<a href="../api_reference/C/db_stat.html" class="olink">db_stat utility</a> reports these statistics both for the cache
as a whole and separately for each file in the cache.</p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_byteorder"></a>Selecting a byte order</h3>
            </div>
          </div>
        </div>
        <p>Berkeley DB database files can be created in either little- or
big-endian format. The byte order used for the underlying database
is specified by calling the <a href="../api_reference/C/dbset_lorder.html" class="olink">DB->set_lorder()</a> method.
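</p>
        <p>To make "native format" concrete: the same 32-bit value is laid out with its bytes in opposite orders on little- and big-endian machines, and converting a value between the two amounts to reversing its bytes. The sketch below uses plain C with hypothetical helper names; it is not Berkeley DB's internal page-conversion code. Because Berkeley DB does not convert application data, the last two helpers show one way an application might store integers in a fixed, big-endian layout itself.</p>

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value (little- to big-endian or back). */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Write v most significant byte first, whatever the host byte order. */
void encode_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read back a value written by encode_be32(). */
uint32_t decode_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

        <p>A side effect of the big-endian encoding is that comparing two encoded values byte by byte, as a lexicographic comparison does, orders unsigned integers numerically.</p>
        <p>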
If no order
is selected, the native format of the machine on which the database is
created is used.</p>
        <p>Berkeley DB databases are architecture independent, and a database in any format can
be used on a machine with a different native format. In this case,
each page that is read into or written from the cache must be converted
to or from the host format, so databases with non-native formats
incur a performance penalty for the run-time conversion.</p>
        <p>
          <span class="bold">
            <strong>It is important to note that the Berkeley DB access methods do no data
conversion for application-specified data. Key/data pairs written on a
little-endian format architecture will be returned to the application
exactly as they were written when retrieved on a big-endian format
architecture.</strong>
          </span>
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_dup"></a>Duplicate data items</h3>
            </div>
          </div>
        </div>
        <p>The Btree and Hash access methods support the creation of multiple data
items for a single key item. By default, multiple data items are not
permitted, and each database store operation will overwrite any previous
data item for that key. To configure Berkeley DB for duplicate data items,
call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag. Only one
copy of the key will be stored for each set of duplicate data items.
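</p>
        <p>The difference between the default behavior and DB_DUP can be modeled with a toy in-memory structure. This is purely illustrative and hypothetical: dup_set and dup_set_put() are not Berkeley DB types or functions, and the fixed-size arrays are a simplification. The model stores a single copy of the key, overwrites the data item when duplicates are not allowed, and appends new items in insertion order when they are.</p>

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DUPS 8
#define MAX_LEN  32

/* Toy model of one key and its data items; not a Berkeley DB structure. */
struct dup_set {
    char key[MAX_LEN];              /* the key is stored once */
    char data[MAX_DUPS][MAX_LEN];   /* data items, in stored order */
    int  count;                     /* number of data items */
};

/*
 * Store `value` under the set's key. Without duplicates (like a database
 * not configured with DB_DUP), the single data item is overwritten; with
 * them, the new item is appended after existing ones, mirroring the
 * default unsorted ordering. Returns false if the toy array is full.
 */
bool dup_set_put(struct dup_set *s, const char *value, bool allow_dups)
{
    int slot = allow_dups ? s->count : 0;

    if (allow_dups && s->count == MAX_DUPS)
        return false;
    strncpy(s->data[slot], value, MAX_LEN - 1);
    s->data[slot][MAX_LEN - 1] = '\0';   /* guarantee termination */
    s->count = allow_dups ? s->count + 1 : 1;
    return true;
}
```

        <p>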
If the Btree key comparison routine reports that two keys
compare equal, it is undefined which of the two keys will be stored
and returned by future database operations.</p>
        <p>By default, Berkeley DB stores duplicates in the order in which they were added;
that is, each new duplicate data item is stored after any
existing data items. This default behavior can be overridden by using
the <a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a> method and one of the <a href="../api_reference/C/dbcput.html#put_DB_AFTER" class="olink">DB_AFTER</a>, <a href="../api_reference/C/dbcput.html#put_DB_BEFORE" class="olink">DB_BEFORE</a>, <a href="../api_reference/C/dbcput.html#put_DB_KEYFIRST" class="olink">DB_KEYFIRST</a> or <a href="../api_reference/C/dbcput.html#put_DB_KEYLAST" class="olink">DB_KEYLAST</a> flags.
Alternatively, Berkeley DB may be configured to sort duplicate data items.</p>
        <p>When stepping through the database sequentially, duplicate data items are
returned individually, as key/data pairs, where the key item only
changes after the last duplicate data item has been returned.
Duplicate data items cannot be individually accessed using the
<a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> method, because it always returns the first of the duplicate data
items. Duplicate data items should be retrieved using a Berkeley DB cursor
interface such as the <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> method.</p>
        <p>A flag permits applications to request the next data
item only if it <span class="bold"><strong>is</strong></span> a duplicate data item of the current entry;
see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_DUP" class="olink">DB_NEXT_DUP</a> for more information.
Similar flags
permit applications to request the next (or previous) data item only if it
<span class="bold"><strong>is not</strong></span> a duplicate data item of the current entry; see
<a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_NODUP" class="olink">DB_NEXT_NODUP</a> and <a href="../api_reference/C/dbcget.html#dbcget_DB_PREV_NODUP" class="olink">DB_PREV_NODUP</a> for more information.</p>
        <p>It is also possible to maintain duplicate records in sorted order. Sorting
duplicates significantly improves performance when searching them
and when performing equality joins, both common operations when using secondary
indices. To configure Berkeley DB to sort duplicate data items, the application
must call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag (in
addition to the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag). Optionally, a custom comparison
function may be specified using the <a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB->set_dup_compare()</a> method.
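</p>
        <p>A comparison routine takes two byte strings and returns a negative, zero, or positive value. The sketch below implements lexicographic ordering in which a shorter item that is a prefix of a longer one collates first. It deliberately uses a standalone pointer-and-length signature (lex_compare() is a hypothetical name) rather than the callback prototype that DB->set_dup_compare() expects; take that prototype from the DB->set_dup_compare() reference for your release.</p>

```c
#include <stddef.h>
#include <string.h>

/*
 * Lexicographic byte comparison: compare the common prefix with memcmp(),
 * which compares bytes as unsigned char; if the prefixes are equal, the
 * shorter item collates first. Returns <0, 0, or >0, like memcmp().
 */
int lex_compare(const void *a, size_t alen, const void *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int cmp = n > 0 ? memcmp(a, b, n) : 0;

    if (cmp != 0)
        return cmp;
    return (alen > blen) - (alen < blen);   /* shorter sorts before longer */
}
```

        <p>Under this ordering, the items "ab", "abc" and "b" sort in exactly that order.</p>
        <p>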
If the
<a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag is given but no comparison routine is specified,
Berkeley DB defaults to the same lexicographical sorting used for Btree
keys, with shorter items collating before longer items.</p>
        <p>If the duplicate data items are unsorted, applications may store identical
duplicate data items, or, for those who just like the way it sounds,
<span class="emphasis"><em>duplicate duplicates</em></span>.</p>
        <p><span class="bold"><strong>In this release it is an error to attempt to store identical
duplicate data items when duplicates are being stored in sorted order.</strong></span>
This restriction is expected to be lifted in a future release. A flag
permits applications to disallow storing identical duplicate data
items when the database has been configured for sorted duplicates; see
<a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> for more information.
Applications not wanting to
permit duplicate duplicates in databases configured for sorted
duplicates should begin using the <a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> flag immediately.</p>
        <p>For further information on how searching and insertion behave in the
presence of duplicates (sorted or not), see the <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a>, <a href="../api_reference/C/dbput.html" class="olink">DB->put()</a>, <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> and
<a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a> documentation.</p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_conf_malloc"></a>Non-local memory allocation</h3>
            </div>
          </div>
        </div>
        <p>Berkeley DB allocates memory for returning key/data pairs and statistical
information; that memory becomes the responsibility of the application.
Conversely, in some interfaces the application allocates memory
that becomes the responsibility of Berkeley DB.</p>
        <p>On systems that may have multiple library versions of the
standard allocation routines (notably Windows NT), transferring memory
between the library and the application will fail because the Berkeley DB
library allocates memory from a different heap than the one the application
uses to free it, or vice versa.
To avoid this problem, the
<a href="../api_reference/C/envset_alloc.html" class="olink">DB_ENV->set_alloc()</a> and <a href="../api_reference/C/dbset_alloc.html" class="olink">DB->set_alloc()</a> methods can be used to
give Berkeley DB references to the application's allocation routines.</p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am_conf.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Logical record numbers </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Btree access method specific configuration</td>
        </tr>
      </table>
    </div>
  </body>
</html>