1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>BTree Configuration</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
9    <link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
10    <link rel="up" href="dbconfig.html" title="Chapter��6.��Database Configuration" />
11    <link rel="previous" href="cachesize.html" title="Selecting the Cache Size" />
12  </head>
13  <body>
14    <div class="navheader">
15      <table width="100%" summary="Navigation header">
16        <tr>
17          <th colspan="3" align="center">BTree Configuration</th>
18        </tr>
19        <tr>
20          <td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td>
21          <th width="60%" align="center">Chapter��6.��Database Configuration</th>
22          <td width="20%" align="right">��</td>
23        </tr>
24      </table>
25      <hr />
26    </div>
27    <div class="sect1" lang="en" xml:lang="en">
28      <div class="titlepage">
29        <div>
30          <div>
31            <h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
32          </div>
33        </div>
34        <div></div>
35      </div>
36      <p>
37        In going through the previous chapters in this book, you may notice that
38        we touch on some topics that are specific to BTree, but we do not cover
39        those topics in any real detail. In this section, we will discuss
40        configuration issues that are unique to BTree.
41    </p>
42      <p>
43        Specifically, in this section we describe:      
44    </p>
45      <div class="itemizedlist">
46        <ul type="disc">
47          <li>
48            <p>
49                Allowing duplicate records.
50            </p>
51          </li>
52          <li>
53            <p>
54                Setting comparator callbacks.
55            </p>
56          </li>
57        </ul>
58      </div>
59      <div class="sect2" lang="en" xml:lang="en">
60        <div class="titlepage">
61          <div>
62            <div>
63              <h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
64            </div>
65          </div>
66          <div></div>
67        </div>
68        <p>
69            BTree databases can contain duplicate records. One record is
70            considered to be a duplicate of another when both records use keys
71            that compare as equal to one another.
72        </p>
73        <p>
74            By default, keys are compared using a lexicographical comparison,
75            with shorter keys collating higher than longer keys.
76            You can override this default using the
77                
78                <tt class="methodname">Db::set_bt_compare()</tt>
79                
80            method. See the next section for details.
81        </p>
82        <p>
83            By default, DB databases do not allow duplicate records. As a
84            result, any attempt to write a record that uses a key equal to a
85            previously existing record results in the previously existing record
86            being overwritten by the new record.
87        </p>
88        <p>
89            Allowing duplicate records is useful if you have a database that
90            contains records keyed by a commonly occurring piece of information.
91            It is frequently necessary to allow duplicate records for secondary
92            databases.
93         </p>
94        <p>
95            For example, suppose your primary database contained records related
96            to automobiles. You might in this case want to be able to find all
97            the automobiles in the database that are of a particular color, so
98            you would index on the color of the automobile. However, for any
99            given color there will probably be multiple automobiles. Since the
100            index is the secondary key, this means that multiple secondary
101            database records will share the same key, and so the secondary
102            database must support duplicate records.
103        </p>
104        <div class="sect3" lang="en" xml:lang="en">
105          <div class="titlepage">
106            <div>
107              <div>
108                <h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
109              </div>
110            </div>
111            <div></div>
112          </div>
113          <p>
114                Duplicate records can be stored in sorted or unsorted order. 
115                You can cause DB to automatically sort your duplicate
116                records by 
117                <span> 
118                    specifying the <tt class="literal">DB_DUPSORT</tt> flag at
119                    database creation time.
120                </span>
121                
122            </p>
123          <p>
124                If sorted duplicates are supported, then the 
125                <span>
126                    sorting function specified on 
127                    
128                    <tt class="methodname">Db::set_dup_compare()</tt>
129                </span>
130                
131                is used to determine the location of the duplicate record in its
132                duplicate set. If no such function is provided, then the default
133                lexicographical comparison is used.
134            </p>
135        </div>
136        <div class="sect3" lang="en" xml:lang="en">
137          <div class="titlepage">
138            <div>
139              <div>
140                <h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
141              </div>
142            </div>
143            <div></div>
144          </div>
145          <p>
146                For performance reasons, BTrees should always contain sorted
147                records. (BTrees containing unsorted entries must potentially 
148                spend a great deal more time locating an entry than does a BTree
149                that contains sorted entries).  That said, DB provides support 
150                for suppressing automatic sorting of duplicate records because it may be that
151                your application is inserting records that are already in a
152                sorted order.
153            </p>
154          <p>
155                That is, if the database is configured to support unsorted
156                duplicates, then the assumption is that your application
157                will manually perform the sorting. In this event,
158                expect to pay a significant performance penalty. Any time you
159                place records into the database in a sort order not know to
160                DB, you will pay a performance penalty
161            </p>
162          <p>
163                That said, this is how DB behaves when inserting records
164                into a database that supports non-sorted duplicates:
165            </p>
166          <div class="itemizedlist">
167            <ul type="disc">
168              <li>
169                <p>
170                    If your application simply adds a duplicate record using 
171                        
172                        <span><tt class="methodname">Db::put()</tt>,</span>
173                        
174                    then the record is inserted at the end of its sorted duplicate set.
175                </p>
176              </li>
177              <li>
178                <p>
179                    If a cursor is used to put the duplicate record to the database,
180                    then the new record is placed in the duplicate set according to the
181                    flags that are provided on the 
182                        
183                        <tt class="methodname">Dbc::put()</tt>
184                    method. The relevant flags are:
185                </p>
186                <div class="itemizedlist">
187                  <ul type="circle">
188                    <li>
189                      <p>
190                            <tt class="literal">DB_AFTER</tt>
191                            
192                        </p>
193                      <p>
194                        The data
195                        <span>
196                            provided on the call to
197                            
198                            <tt class="methodname">Dbc::put()</tt>
199                        </span>
200                        is placed into the database
201                        as a duplicate record. The key used for this operation is
202                        the key used for the record to which the cursor currently
203                        refers. Any key provided on the call 
204                        
205                        <span>
206                        to
207                            
208                            <tt class="methodname">Dbc::put()</tt>
209                        </span>
210
211                        is therefore ignored.
212                        </p>
213                      <p>
214                            The duplicate record is inserted into the database
215                            immediately after the cursor's current position in the
216                            database.
217                        </p>
218                      <p>
219                            This flag is ignored if sorted duplicates are supported for
220                            the database.
221                        </p>
222                    </li>
223                    <li>
224                      <p>
225                            <tt class="literal">DB_BEFORE</tt>
226                            
227                        </p>
228                      <p>
229                            Behaves the same as 
230                                <tt class="literal">DB_AFTER</tt>
231                                
232                            except that the new record is inserted immediately before 
233                            the cursor's current location in the database.
234                        </p>
235                    </li>
236                    <li>
237                      <p>
238                            <tt class="literal">DB_KEYFIRST</tt>
239                            
240                        </p>
241                      <p>
242                            If the key 
243                            <span>
244                                provided on the call to
245                                
246                                <tt class="methodname">Dbc::put()</tt>
247                            </span>
248                            already exists in the
249                            database, and the database is configured to use duplicates
250                            without sorting, then the new record is inserted as the first entry
251                            in the appropriate duplicates list.
252                        </p>
253                    </li>
254                    <li>
255                      <p>
256                            <tt class="literal">DB_KEYLAST</tt>
257                            
258                        </p>
259                      <p>
260                            Behaves identically to
261                                <tt class="literal">DB_KEYFIRST</tt>
262                                
263                            except that the new duplicate record is inserted as the last
264                            record in the duplicates list.
265                        </p>
266                    </li>
267                  </ul>
268                </div>
269              </li>
270            </ul>
271          </div>
272        </div>
273        <div class="sect3" lang="en" xml:lang="en">
274          <div class="titlepage">
275            <div>
276              <div>
277                <h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
278              </div>
279            </div>
280            <div></div>
281          </div>
282          <p>
283            Duplicates support can only be configured
284            at database creation time. You do this by specifying the appropriate
285            <span>
286                flags to
287                
288                <tt class="methodname">Db::set_flags()</tt>
289            </span>
290            
291            before the database is opened for the first time.
292        </p>
293          <p>
294            The 
295                <span>flags</span>
296                
297            that you can use are:
298        </p>
299          <div class="itemizedlist">
300            <ul type="disc">
301              <li>
302                <p>
303                    <tt class="literal">DB_DUP</tt>
304                    
305                </p>
306                <p>
307                    The database supports non-sorted duplicate records.
308                </p>
309              </li>
310              <li>
311                <p>
312                    <tt class="literal">DB_DUPSORT</tt>
313                    
314                </p>
315                <p>
316                    The database supports sorted duplicate records.
317                </p>
318              </li>
319            </ul>
320          </div>
321          <p>
322            The following code fragment illustrates how to configure a database
323            to support sorted duplicate records:
324        </p>
325          <a id="cxx_btree_dupsort"></a>
326          <pre class="programlisting">#include &lt;db_cxx.h&gt;
327...
328
329Db db(NULL, 0);
330const char *file_name = "myd.db";
331
332try {
333    // Configure the database for sorted duplicates
334    db.set_flags(DB_DUPSORT);
335
336    // Now open the database
337    db.open(NULL,       // Txn pointer
338            file_name,  // File name
339            NULL,       // Logical db name (unneeded)
340            DB_BTREE,   // Database type (using btree)
341            DB_CREATE,  // Open flags
342            0);         // File mode. Using defaults
343} catch(DbException &amp;e) {
344    db.err(e.get_errno(), "Database '%s' open failed.", file_name);
345} catch(std::exception &amp;e) {
346    db.errx("Error opening database: %s : %s\n", file_name, e.what());
347} 
348
349...
350
351try {
352    db.close(0);
353} catch(DbException &amp;e) {
354    db.err(e.get_errno(), "Database '%s' close failed.", file_name);
355} catch(std::exception &amp;e) {
356    db.errx("Error closing database: %s : %s\n", file_name, e.what());
357} 
358
359</pre>
360        </div>
361      </div>
362      <div class="sect2" lang="en" xml:lang="en">
363        <div class="titlepage">
364          <div>
365            <div>
366              <h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
367            </div>
368          </div>
369          <div></div>
370        </div>
371        <p>
372            By default, DB uses a lexicographical comparison function where
373            shorter records collate before longer records. For the majority of
374            cases, this comparison works well and you do not need to manage
375            it in any way. 
376         </p>
377        <p>
378            However, in some situations your application's performance can
379            benefit from setting a custom comparison routine. You can do this
380            either for database keys, or for the data if your
381            database supports sorted duplicate records.
382         </p>
383        <p>
384            Some of the reasons why you may want to provide a custom sorting
385            function are:
386         </p>
387        <div class="itemizedlist">
388          <ul type="disc">
389            <li>
390              <p>
391                    Your database is keyed using strings and you want to provide
392                    some sort of language-sensitive ordering to that data. Doing
393                    so can help increase the locality of reference that allows
394                    your database to perform at its best.
395                </p>
396            </li>
397            <li>
398              <p>
399                    You are using a little-endian system (such as x86) and you
400                    are using integers as your database's keys. Berkeley DB
401                    stores keys as byte strings and little-endian integers
402                    do not sort well when viewed as byte strings. There are
403                    several solutions to this problem, one being to provide a
404                    custom comparison function. See
405                    <a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a> 
406                    for more information.
407                </p>
408            </li>
409            <li>
410              <p>
411                    You you do not want the entire key to participate in the
412                    comparison, for whatever reason.  In 
413                    this case, you may want to provide a custom comparison
414                    function so that only the relevant bytes are examined.
415                </p>
416            </li>
417          </ul>
418        </div>
419        <div class="sect3" lang="en" xml:lang="en">
420          <div class="titlepage">
421            <div>
422              <div>
423                <h4 class="title"><a id="creatingComparisonFunctions"></a>
424                <span>Creating Comparison Functions</span>
425                
426            </h4>
427              </div>
428            </div>
429            <div></div>
430          </div>
431          <p>
432                You set a BTree's key
433                    <span>
434                        comparison function 
435                    </span>
436                    
437                using
438                    
439                    <span><tt class="methodname">Db::set_bt_compare()</tt>.</span>
440                    
441                You can also set a BTree's duplicate data comparison function using
442                    
443                    <span><tt class="methodname">Db::set_dup_compare()</tt>.</span>
444                    
445                
446            </p>
447          <p>
448            <span>
449            You cannot use these methods after the database has been opened.
450            Also, if 
451            </span>
452            
453            the database already exists when it is opened, the
454                    <span>
455                        function 
456                    </span>
457                    
458            provided to these methods must be the same as
459            that historically used to create the database or corruption can
460            occur.
461         </p>
462          <p>
463                The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method 
464                is a pointer to a function that has the following signature:
465            </p>
466          <pre class="programlisting">int (*function)(Db *db, const Dbt *key1, const Dbt *key2)</pre>
467          <p>
468                This function must return an integer value less than, equal to,
469                or greater than 0. If key1 is considered to be greater than
470                key2, then the function must return a value that is greater than
471                0. If the two are equal, then the function must return 0, and if
472                the first key is less than the second then the function must return
473                a negative value.
474            </p>
475          <p>
476                The function that you provide to <tt class="methodname">set_dup_compare()</tt> 
477                works in exactly the same way, except that the 
478                 
479                <tt class="literal">Dbt</tt> 
480                parameters hold record data items instead of keys.
481            </p>
482          <p>
483                For example, an example routine that is used to sort integer
484                keys in the database is:
485                
486            </p>
487          <a id="cxx_btree1"></a>
488          <pre class="programlisting">int
489compare_int(Db *dbp, const Dbt *a, const Dbt *b)
490{
491    int ai, bi;
492
493    // Returns: 
494    // &lt; 0 if a &lt; b 
495    // = 0 if a = b 
496    // &gt; 0 if a &gt; b 
497    memcpy(&amp;ai, a-&gt;get_data(), sizeof(int)); 
498    memcpy(&amp;bi, b-&gt;get_data(), sizeof(int)); 
499    return (ai - bi); 
500} </pre>
501          <p>
502            Note that the data must first be copied into memory that is
503            appropriately aligned, as Berkeley DB does not guarantee any kind of
504            alignment of the underlying data, including for comparison routines.
505            When writing comparison routines, remember that databases created on
506            machines of different architectures may have different integer byte
507            orders, for which your code may need to compensate.
508         </p>
509          <p>
510            To cause DB to use this comparison function:
511         </p>
512          <a id="cxx_btree2"></a>
513          <pre class="programlisting">#include &lt;db_cxx.h&gt;
514#include &lt;string.h&gt;
515
516...
517                                                                                                                                      
518Db db(NULL, 0);
519
520// Set up the btree comparison function for this database
521db.set_bt_compare(compare_int);
522
523// Database open call follows sometime after this.</pre>
524        </div>
525      </div>
526    </div>
527    <div class="navfooter">
528      <hr />
529      <table width="100%" summary="Navigation footer">
530        <tr>
531          <td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td>
532          <td width="20%" align="center">
533            <a accesskey="u" href="dbconfig.html">Up</a>
534          </td>
535          <td width="40%" align="right">��</td>
536        </tr>
537        <tr>
538          <td width="40%" align="left" valign="top">Selecting the Cache Size��</td>
539          <td width="20%" align="center">
540            <a accesskey="h" href="index.html">Home</a>
541          </td>
542          <td width="40%" align="right" valign="top">��</td>
543        </tr>
544      </table>
545    </div>
546  </body>
547</html>
548