1<!--$Id: tune.so,v 10.10 2006/08/25 23:25:17 bostic Exp $-->
2<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
3<!--See the file LICENSE for redistribution information.-->
4<html>
5<head>
6<title>Berkeley DB Reference Guide: Access method tuning</title>
7<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
8<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
9</head>
10<body bgcolor=white>
11<a name="2"><!--meow--></a><a name="3"><!--meow--></a>
12<table width="100%"><tr valign=top>
13<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></b></td>
14<td align=right><a href="/am_misc/diskspace.html"><img src="/images/prev.gif" alt="Prev"></a><a href="/toc.html"><img src="/images/ref.gif" alt="Ref"></a><a href="/am_misc/faq.html"><img src="/images/next.gif" alt="Next"></a>
15</td></tr></table>
16<p align=center><b>Access method tuning</b></p>
17<p>There are a few different issues to consider when tuning the performance
18of Berkeley DB access method applications.</p>
19<br>
20<b>access method</b><ul compact><li>An application's choice of a database access method can significantly
21affect performance.  Applications using fixed-length records and integer
22keys are likely to get better performance from the Queue access method.
23Applications using variable-length records are likely to get better
24performance from the Btree access method, as it tends to be faster for
25most applications than either the Hash or Recno access methods.  Because
26the access method APIs are largely identical between the Berkeley DB access
27methods, it is easy for applications to benchmark the different access
28methods against each other.  See <a href="/ref/am_conf/select.html">Selecting an access method</a> for more information.</ul>
29<b>cache size</b><ul compact><li>The Berkeley DB database cache defaults to a fairly small size, and most
30applications concerned with performance will want to set it explicitly.
A cache that is too small will result in very poor performance.  The first
32step in tuning the cache size is to use the db_stat utility (or the
statistics returned by the <a href="/api_c/memp_stat.html">DB_ENV-&gt;memp_stat</a> method) to measure the
34effectiveness of the cache.  The goal is to maximize the cache's hit
35rate.  Typically, increasing the size of the cache until the hit rate
36reaches 100% or levels off will yield the best performance.  However,
37if your working set is sufficiently large, you will be limited by the
38system's available physical memory.  Depending on the virtual memory
39and file system buffering policies of your system, and the requirements
40of other applications, the maximum cache size will be some amount
41smaller than the size of physical memory.  If you find that
42<a href="/utility/db_stat.html">db_stat</a> shows that increasing the cache size improves your hit
43rate, but performance is not improving (or is getting worse), then it's
44likely you've hit other system limitations.  At this point, you should
45review the system's swapping/paging activity and limit the size of the
46cache to the maximum size possible without triggering paging activity.
47Finally, always remember to make your measurements under conditions as
48close as possible to the conditions your deployed application will run
49under, and to test your final choices under worst-case conditions.</ul>
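<p>The hit-rate measurement described above can be sketched as follows.
This is a minimal illustration, not Berkeley DB code: the cache_counts
structure and both function names are hypothetical, and the two counters
are assumed to be copied from whatever cache statistics the application
collects (for example, the hit and miss counts that db_stat displays).</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical container for two cache counters: pages found in the
 * cache and pages that had to be read from disk.  The names are
 * assumptions for this sketch, not Berkeley DB definitions.
 */
struct cache_counts {
	uint64_t hits;
	uint64_t misses;
};

/* Return the hit rate as a percentage in [0, 100]; 0 if no requests. */
double
cache_hit_percent(const struct cache_counts *c)
{
	uint64_t total;

	assert(c != NULL);
	total = c->hits + c->misses;
	if (total == 0)
		return (0.0);
	return (100.0 * (double)c->hits / (double)total);
}

/*
 * Return 1 if growing the cache still helped, that is, the hit rate
 * rose by more than min_gain percentage points between two
 * measurements.  Return 0 once the hit rate has leveled off.
 */
int
cache_growth_helped(double before_pct, double after_pct, double min_gain)
{
	return (after_pct - before_pct > min_gain);
}
```

<p>An application might take one measurement per candidate cache size,
under a realistic load, and stop growing the cache once
cache_growth_helped returns 0 or the system starts paging.</p>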
50<b>shared memory</b><ul compact><li>By default, Berkeley DB creates its database environment shared regions in
filesystem-backed memory.  Some systems do not distinguish between
regular filesystem pages and memory-mapped pages backed by the
filesystem when selecting dirty pages to be flushed back to disk.  For
54this reason, dirtying pages in the Berkeley DB cache may cause intense
55filesystem activity, typically when the filesystem sync thread or
56process is run.  In some cases, this can dramatically affect application
57throughput.  The workaround to this problem is to create the shared
58regions in system shared memory (<a href="/api_c/env_open.html#DB_SYSTEM_MEM">DB_SYSTEM_MEM</a>) or application
59private memory (<a href="/api_c/env_open.html#DB_PRIVATE">DB_PRIVATE</a>), or, in cases where this behavior
60is configurable, to turn off the operating system's flushing of
61memory-mapped pages.</ul>
62<b>large key/data items</b><ul compact><li>Storing large key/data items in a database can alter the performance
63characteristics of Btree, Hash and Recno databases.  The first parameter
64to consider is the database page size.  When a key/data item is too
65large to be placed on a database page, it is stored on "overflow" pages
66that are maintained outside of the normal database structure (typically,
67items that are larger than one-quarter of the page size are deemed to
68be too large).  Accessing these overflow pages requires at least one
69additional page reference over a normal access, so it is usually better
70to increase the page size than to create a database with a large number
71of overflow pages.  Use the <a href="/utility/db_stat.html">db_stat</a> utility (or the statistics
72returned by the <a href="/api_c/db_stat.html">DB-&gt;stat</a> method) to review the number of overflow
73pages in the database.
74<p>The second issue is using large key/data items instead of duplicate data
75items.  While this can offer performance gains to some applications
76(because it is possible to retrieve several data items in a single get
77call), once the key/data items are large enough to be pushed off-page,
78they will slow the application down.  Using duplicate data items is
79usually the better choice in the long run.</p></ul>
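<p>As a rough illustration of the one-quarter rule of thumb mentioned
above, the following sketch estimates whether an item of a given size
would be pushed to overflow pages, and finds the smallest page size that
would keep it on-page.  Both function names are hypothetical, and the
cutoff is only the typical case described in the text; the real
threshold depends on the access method and its configuration.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rule of thumb from the text: an item larger than one-quarter of the
 * page size is typically stored on overflow pages.
 */
int
likely_overflow(size_t item_size, size_t page_size)
{
	assert(page_size > 0);
	return (item_size > page_size / 4);
}

/*
 * Return the smallest power-of-two page size between 512 bytes and
 * 64KB (the range Berkeley DB accepts) that keeps an item of the given
 * size on-page under the one-quarter rule, or 0 if no page size in
 * that range is large enough.
 */
size_t
smallest_onpage_size(size_t item_size)
{
	size_t page_size;

	for (page_size = 512; page_size <= 65536; page_size *= 2)
		if (!likely_overflow(item_size, page_size))
			return (page_size);
	return (0);
}
```

<p>A return value of 0 suggests the items are too large to keep on-page
at any page size, in which case the overflow page counts reported by
db_stat are the measurement to watch.</p>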
80<br>
81<p>A common question when tuning Berkeley DB applications is scalability.  For
82example, people will ask why, when adding additional threads or
83processes to an application, the overall database throughput decreases,
84even when all of the operations are read-only queries.</p>
85<p>First, while read-only operations are logically concurrent, they still
86have to acquire mutexes on internal Berkeley DB data structures.  For example,
87when searching a linked list and looking for a database page, the linked
88list has to be locked against other threads of control attempting to add
89or remove pages from the linked list.  The more threads of control you
90add, the more contention there will be for those shared data structure
91resources.</p>
92<p>Second, once contention starts happening, applications will also start
93to see threads of control convoy behind locks (especially on
94architectures supporting only test-and-set spin mutexes, rather than
95blocking mutexes).  On test-and-set architectures, threads of control
96waiting for locks must attempt to acquire the mutex, sleep, check the
97mutex again, and so on.  Each failed check of the mutex and subsequent
98sleep wastes CPU and decreases the overall throughput of the system.</p>
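<p>The acquire, sleep, and re-check cycle can be sketched with a C11
atomic flag.  This is an illustration only and is not how Berkeley DB
implements its mutexes: the tas_mutex type, the function names, and the
one millisecond pause are all assumptions made for the example.</p>

```c
/* Request POSIX declarations (nanosleep) before any system header. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical spin mutex built on a C11 atomic flag. */
struct tas_mutex {
	atomic_flag held;
};

/* One test-and-set attempt: returns 1 if the mutex was acquired. */
int
tas_try_lock(struct tas_mutex *m)
{
	assert(m != NULL);
	return (!atomic_flag_test_and_set_explicit(&m->held,
	    memory_order_acquire));
}

/*
 * Acquire the mutex, sleeping between failed attempts.  Returns the
 * number of failed attempts; each one consumed CPU time and a sleep
 * without doing any useful work.
 */
unsigned long
tas_lock(struct tas_mutex *m)
{
	struct timespec pause = { 0, 1000000 };	/* 1 millisecond (assumed) */
	unsigned long failed = 0;

	while (!tas_try_lock(m)) {
		failed++;
		nanosleep(&pause, NULL);
	}
	return (failed);
}

/* Release the mutex so the next test-and-set attempt can succeed. */
void
tas_unlock(struct tas_mutex *m)
{
	assert(m != NULL);
	atomic_flag_clear_explicit(&m->held, memory_order_release);
}
```

<p>With a blocking mutex, a waiting thread is instead put to sleep by
the operating system and woken when the mutex is released, so the failed
attempts counted here do not occur.</p>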
<p>Third, every time a thread acquires a shared mutex, it has to shoot down
other references to that memory in every other CPU on the system, that is,
invalidate their cached copies.  Many modern snoopy cache architectures
have slow shoot-down characteristics.</p>
<p>Fourth, schedulers don't care what application-specific mutexes a thread
of control might hold when descheduling that thread.  If a thread of
104control is descheduled while holding a shared data structure mutex,
105other threads of control will be blocked until the scheduler decides to
106run the blocking thread of control again.  The more threads of control
107that are running, the smaller their quanta of CPU time, and the more
108likely they will be descheduled while holding a Berkeley DB mutex.</p>
<p>The effect of adding new threads of control on an application's
throughput is almost entirely dependent on the application's data access
pattern and the hardware.  In general, using operating systems that support
blocking mutexes will often make a tremendous difference, and limiting
threads of control to some small multiple of the number of CPUs is usually
the right choice to make.</p>
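<p>One way to apply the "small multiple of the number of CPUs" guideline
is sketched below.  The function names are hypothetical, the multiple is
left to the caller because the right value is application specific, and
_SC_NPROCESSORS_ONLN is a widely available but not universal sysconf
name, so the sketch falls back to one CPU when it is not reported.</p>

```c
#include <assert.h>
#include <unistd.h>

/* Number of online CPUs, or 1 if the system cannot report it. */
long
online_cpus(void)
{
	long n;

	n = sysconf(_SC_NPROCESSORS_ONLN);
	return (n > 0 ? n : 1);
}

/*
 * Cap the requested number of threads of control at
 * ncpus * multiple.  The multiple is a tuning knob chosen by the
 * caller, not a value prescribed by Berkeley DB.
 */
long
capped_threads(long requested, long ncpus, long multiple)
{
	long limit;

	assert(ncpus > 0 && multiple > 0);
	limit = ncpus * multiple;
	return (requested < limit ? requested : limit);
}
```

<p>For example, capped_threads(64, online_cpus(), 2) limits a request
for 64 worker threads to twice the number of online CPUs; the resulting
throughput should still be measured rather than assumed.</p>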
118<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
119</body>
120</html>
121