<!--$Id: diskspace.so,v 10.17 2002/08/09 13:43:47 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Disk space requirements</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<a name="2"><!--meow--></a>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></b></td>
<td align=right><a href="../am_misc/dbsizes.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../am_misc/tune.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>Disk space requirements</b></p>
<p>The total size of a database can be estimated from the size of the data
it will hold.  The following calculations first estimate how many bytes
are needed to hold a set of data, and then how many pages it will take to
store that data on disk.</p>
<p>Space freed by deleting key/data pairs from a Btree or Hash database is
never returned to the filesystem, although it is reused where possible.
This means that Btree and Hash database files only grow; they never
shrink.  If enough keys are deleted from a database that shrinking the
underlying file is desirable, create a new database and copy the
remaining records from the old one into it.</p>
<p>These are rough estimates at best. For example, they do not take into
account overflow records, filesystem metadata information, large sets
of duplicate data items (where the key is only stored once), or
real-life situations where the sizes of key and data items are wildly
variable, and the page-fill factor changes over time.</p>
<b>Btree</b>
<p>The formulas for the Btree access method are as follows:</p>
<blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor

bytes-of-data = n-records *
    (bytes-per-entry + page-overhead-for-two-entries)

n-pages-of-data = bytes-of-data / useful-bytes-per-page

total-bytes-on-disk = n-pages-of-data * page-size
</pre></blockquote>
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
that will actually hold application data.  It is computed as the number of
bytes on the page that are available to hold application data (the page
size less the page overhead), scaled by the fraction of the page that is
likely to contain data.  This correction is needed because the fraction
of a page that contains application data can vary from close to 50% after
a page split to almost 100% if the entries in the database were inserted
in sorted order.  As a result, the <b>page-fill-factor</b> can
drastically alter the amount of disk space required to hold any
particular data set.  The page-fill factor of any existing database can
be displayed using the
<a href="../../utility/db_stat.html">db_stat</a> utility.</p>
<p>The page-overhead for Btree databases is 26 bytes.  As an example, using
an 8K page size, with an 85% page-fill factor, there are 6941 bytes of
useful space on each page:</p>
<blockquote><pre>6941 = (8192 - 26) * .85</pre></blockquote>
<p>The total <b>bytes-of-data</b> is a simple calculation: it is the
number of key/data pairs multiplied by the combined size of the key and
data item in each pair, including the overhead required to store each
item on a page.  The overhead to store a key or data item on a Btree page
is 5 bytes.  So, it would take 1560000000 bytes, or roughly 1.45GB, of
total data to store 60,000,000 key/data pairs, assuming each key and each
data item is 8 bytes long:</p>
<blockquote><pre>1560000000 = 60000000 * ((8 + 5) * 2)</pre></blockquote>
<p>The total pages of data, <b>n-pages-of-data</b>, is the
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>.  In
the example, there are 224751 pages of data.</p>
<blockquote><pre>224751 = 1560000000 / 6941</pre></blockquote>
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
multiplied by the <b>page-size</b>.  In the example, the result is
1841160192 bytes, or roughly 1.71GB.</p>
<blockquote><pre>1841160192 = 224751 * 8192</pre></blockquote>
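<p>The Btree formulas and the worked example above can be collected into a
short calculation.  The following is a minimal Python sketch, not part of
Berkeley DB; the function and constant names are hypothetical, and the
26-byte page overhead and 5-byte per-item overhead are the figures quoted
above.  The page-fill factor is passed as an integer percentage so the
arithmetic stays exact, and the page count is rounded up, on the
assumption that a partly filled page still occupies a whole page on
disk.</p>

```python
# Illustrative sketch (not Berkeley DB code): Btree disk-space estimate using
# the figures quoted in this section. All names here are hypothetical.
BTREE_PAGE_OVERHEAD = 26   # bytes of header on each Btree page
BTREE_ITEM_OVERHEAD = 5    # bytes of overhead per key or per data item


def ceil_div(a, b):
    """Integer division rounded up: a partly filled page still uses a whole page."""
    return -(-a // b)


def btree_estimate(n_records, key_size, data_size, page_size=8192, fill_percent=85):
    """Return (useful_bytes_per_page, bytes_of_data, n_pages_of_data, total_bytes_on_disk)."""
    # Usable bytes per page, scaled by the expected page-fill factor (a percentage).
    useful = (page_size - BTREE_PAGE_OVERHEAD) * fill_percent // 100
    # Each record stores one key and one data item, each with its own overhead.
    bytes_of_data = n_records * (key_size + data_size + 2 * BTREE_ITEM_OVERHEAD)
    n_pages = ceil_div(bytes_of_data, useful)
    return useful, bytes_of_data, n_pages, n_pages * page_size


if __name__ == "__main__":
    useful, data, pages, total = btree_estimate(60_000_000, 8, 8)
    print("useful bytes per page:", useful)
    print("bytes of data:", data)
    print("pages of data:", pages)
    print("bytes on disk: %d (%.2fGB)" % (total, total / 2**30))
```

<p>With the example's inputs, the sketch reproduces the 6941 useful bytes
per page and 1560000000 bytes of data, but reports 224752 pages
(1841168384 bytes), one page more than the figure above, which rounds the
page count down.  Passing a fill percentage of 50, the worst case just
after page splits, raises the estimate to 382073 pages, which shows how
strongly the page-fill factor affects the result.</p>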
<b>Hash</b>
<p>The formulas for the Hash access method are as follows:</p>
<blockquote><pre>useful-bytes-per-page = (page-size - page-overhead)

bytes-of-data = n-records *
    (bytes-per-entry + page-overhead-for-two-entries)

n-pages-of-data = bytes-of-data / useful-bytes-per-page

total-bytes-on-disk = n-pages-of-data * page-size
</pre></blockquote>
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
that will actually hold application data.  For Hash databases it is simply
the number of bytes on the page that are available to hold application
data; no page-fill correction is applied.  If the application has
explicitly set a fill factor, however, pages will not necessarily be kept
full; for databases with a preset fill factor, see the calculation later
in this section.  The page-overhead for Hash databases is 26 bytes, and
the page-overhead-for-two-entries is 6 bytes.</p>
<p>As an example, using an 8K page size, there are 8166 bytes of useful space
on each page:</p>
<blockquote><pre>8166 = (8192 - 26)</pre></blockquote>
<p>The total <b>bytes-of-data</b> is a simple calculation: it is the number
of key/data pairs multiplied by the size of each pair, including the
overhead required to store that pair on a page, which for Hash databases
is 6 bytes per pair.  So, assuming 60,000,000 key/data pairs, where each
key and each data item is 8 bytes long, there are 1320000000 bytes, or
roughly 1.23GB, of total data:</p>
<blockquote><pre>1320000000 = 60000000 * (16 + 6)</pre></blockquote>
<p>The total pages of data, <b>n-pages-of-data</b>, is the
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>.  In
this example, there are 161646 pages of data.</p>
<blockquote><pre>161646 = 1320000000 / 8166</pre></blockquote>
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
multiplied by the <b>page-size</b>.  In the example, the result is
1324204032 bytes, or roughly 1.23GB.</p>
<blockquote><pre>1324204032 = 161646 * 8192</pre></blockquote>
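<p>The Hash calculation without a fill factor can be sketched the same way.
This is again an illustrative Python sketch with hypothetical names, using
the 26-byte page overhead and 6-byte per-pair overhead quoted above; it
rounds the page count up so that a final, partly filled page is
counted.</p>

```python
# Illustrative sketch (not Berkeley DB code): Hash disk-space estimate with no
# explicit fill factor, using the figures quoted in this section.
HASH_PAGE_OVERHEAD = 26  # bytes of header on each Hash page
HASH_PAIR_OVERHEAD = 6   # bytes of overhead per key/data pair


def hash_estimate(n_records, key_size, data_size, page_size=8192):
    """Return (useful_bytes_per_page, bytes_of_data, n_pages_of_data, total_bytes_on_disk)."""
    # No page-fill correction: everything after the page header is usable.
    useful = page_size - HASH_PAGE_OVERHEAD
    bytes_of_data = n_records * (key_size + data_size + HASH_PAIR_OVERHEAD)
    # Round up so a final, partly filled page is still counted.
    n_pages = -(-bytes_of_data // useful)
    return useful, bytes_of_data, n_pages, n_pages * page_size


if __name__ == "__main__":
    print(hash_estimate(60_000_000, 8, 8))
```

<p>For 60,000,000 pairs of 8-byte keys and data items, the sketch returns
8166 useful bytes per page, 1320000000 bytes of data, 161646 pages and
1324204032 bytes on disk, matching the example.</p>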
<p>Now, let's assume that the application specified a fill factor
explicitly.  The fill factor indicates the target number of items to place
on a single page.  A fill factor may reduce the utilization of each page,
but it can be useful in avoiding splits and preventing buckets from
becoming too large.  Using the estimates above, each key/data pair is 22
bytes (16 + 6), and there are 8166 useful bytes on a page (8192 - 26).
That means that, on average, 371 pairs fit on a page:</p>
<blockquote><pre>371 = 8166 / 22</pre></blockquote>
<p>However, let's assume that the application designer knows that although
most keys and data items are 8 bytes, they can sometimes be as large as 10
bytes, and that it is very important to avoid overflowing buckets and
splitting.  A pair of 10-byte items takes 26 bytes (20 + 6), so the
application might specify a fill factor of 314:</p>
<blockquote><pre>314 = 8166 / 26</pre></blockquote>
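<p>The choice of fill factor above comes down to how many whole pairs fit on
a page at the typical item size and at the largest expected item size.  A
minimal Python sketch of that arithmetic follows; the function name is
hypothetical, and the overheads are the Hash figures quoted above.</p>

```python
# Illustrative sketch: how many whole key/data pairs fit on one Hash page,
# using the 26-byte page overhead and 6-byte pair overhead quoted above.
HASH_PAGE_OVERHEAD = 26
HASH_PAIR_OVERHEAD = 6


def pairs_per_page(key_size, data_size, page_size=8192):
    """Number of whole pairs of the given key and data sizes that fit on a page."""
    useful = page_size - HASH_PAGE_OVERHEAD
    pair_bytes = key_size + data_size + HASH_PAIR_OVERHEAD
    return useful // pair_bytes


if __name__ == "__main__":
    typical = pairs_per_page(8, 8)    # usual 8-byte keys and data items
    largest = pairs_per_page(10, 10)  # occasional 10-byte keys and data items
    print("typical:", typical, "conservative fill factor:", largest)
```

<p>Sizing for the largest expected items gives 314 rather than 371, trading
some page utilization for headroom against overflow and splits.</p>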
<p>With a fill factor of 314, the formula for computing the number of pages
is</p>
<blockquote><pre>n-pages-of-data = n-records / pairs-per-page</pre></blockquote>
<p>which gives 191082 pages:</p>
<blockquote><pre>191082 = 60000000 / 314</pre></blockquote>
<p>At 191082 pages, the total database size would be 1565343744 bytes, or
roughly 1.46GB:</p>
<blockquote><pre>1565343744 = 191082 * 8192</pre></blockquote>
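<p>With a preset fill factor, the page count depends only on the number of
key/data pairs and the number of pairs placed on each page.  The following
illustrative Python sketch (hypothetical names) applies that formula;
unlike the hand calculation above, which rounds 191082.8 down, it rounds
up so that the last, partly filled page is counted.</p>

```python
# Illustrative sketch: Hash sizing when the application has set a fill factor,
# i.e. a target number of key/data pairs per page. Names are hypothetical.
def fill_factor_estimate(n_records, pairs_per_page, page_size=8192):
    """Return (n_pages_of_data, total_bytes_on_disk)."""
    # n-pages-of-data = n-records / pairs-per-page, rounded up.
    n_pages = -(-n_records // pairs_per_page)
    return n_pages, n_pages * page_size


if __name__ == "__main__":
    pages, total = fill_factor_estimate(60_000_000, 314)
    print(pages, "pages,", total, "bytes (%.2fGB)" % (total / 2**30))
```

<p>For 60,000,000 pairs and a fill factor of 314, the sketch returns 191083
pages and 1565351936 bytes, which is still roughly 1.46GB.</p>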
<p>There are a few additional caveats with respect to Hash databases.  This
discussion assumes that the hash function does a good job of evenly
distributing keys among hash buckets.  If the function does not do this,
you may find your table growing significantly larger than you expected.
Second, in order to support Hash databases coexisting with other
databases in a single file, pages within a Hash database are allocated in
power-of-two chunks.  That means that a Hash database with 65 buckets
will take up as much space as a Hash database with 128 buckets; each time
the Hash database grows beyond its current power-of-two number of
buckets, it allocates space for the next power-of-two number of buckets.
This space may be sparsely allocated in the filesystem, but the files will
appear to be their full size.  Finally, because of this need for
contiguous allocation, overflow pages and duplicate pages can be allocated
only at specific points in the file, and this too can lead to sparse hash
tables.</p>
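<p>The power-of-two allocation rule can be illustrated by computing how many
buckets' worth of space is reserved for a given number of buckets.  This is
a simplified Python sketch of the rule as described above, not Berkeley
DB's actual allocation code; the function name is hypothetical.</p>

```python
# Simplified model (not Berkeley DB code) of power-of-two Hash allocation.
def reserved_buckets(n_buckets):
    """Buckets' worth of space reserved for a Hash table with n_buckets buckets.

    Space is reserved in power-of-two chunks, so the result is the smallest
    power of two that is at least n_buckets.
    """
    if n_buckets < 1:
        raise ValueError("a Hash table has at least one bucket")
    # For n >= 1, (n - 1).bit_length() is the exponent of that power of two.
    return 2 ** (n_buckets - 1).bit_length()


if __name__ == "__main__":
    for n in (65, 128, 129):
        print(n, "buckets ->", reserved_buckets(n), "buckets reserved")
```

<p>In this model, both 65 and 128 buckets reserve space for 128 buckets,
while growing to 129 buckets doubles the reservation to 256.</p>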
<table width="100%"><tr><td><br></td><td align=right><a href="../am_misc/dbsizes.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../am_misc/tune.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>