1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>BTree Configuration</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
9    <link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
10    <link rel="up" href="dbconfig.html" title="Chapter��6.��Database Configuration" />
11    <link rel="previous" href="cachesize.html" title="Selecting the Cache Size" />
12  </head>
13  <body>
14    <div class="navheader">
15      <table width="100%" summary="Navigation header">
16        <tr>
17          <th colspan="3" align="center">BTree Configuration</th>
18        </tr>
19        <tr>
20          <td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td>
21          <th width="60%" align="center">Chapter��6.��Database Configuration</th>
22          <td width="20%" align="right">��</td>
23        </tr>
24      </table>
25      <hr />
26    </div>
27    <div class="sect1" lang="en" xml:lang="en">
28      <div class="titlepage">
29        <div>
30          <div>
31            <h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
32          </div>
33        </div>
34        <div></div>
35      </div>
36      <p>
37        In going through the previous chapters in this book, you may notice that
38        we touch on some topics that are specific to BTree, but we do not cover
39        those topics in any real detail. In this section, we will discuss
40        configuration issues that are unique to BTree.
41    </p>
42      <p>
43        Specifically, in this section we describe:      
44    </p>
45      <div class="itemizedlist">
46        <ul type="disc">
47          <li>
48            <p>
49                Allowing duplicate records.
50            </p>
51          </li>
52          <li>
53            <p>
54                Setting comparator callbacks.
55            </p>
56          </li>
57        </ul>
58      </div>
59      <div class="sect2" lang="en" xml:lang="en">
60        <div class="titlepage">
61          <div>
62            <div>
63              <h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
64            </div>
65          </div>
66          <div></div>
67        </div>
68        <p>
69            BTree databases can contain duplicate records. One record is
70            considered to be a duplicate of another when both records use keys
71            that compare as equal to one another.
72        </p>
73        <p>
74            By default, keys are compared using a lexicographical comparison,
75            with shorter keys collating higher than longer keys.
76            You can override this default using the
77                <tt class="methodname">DB-&gt;set_bt_compare()</tt>
78                
79                
80            method. See the next section for details.
81        </p>
82        <p>
83            By default, DB databases do not allow duplicate records. As a
84            result, any attempt to write a record that uses a key equal to a
85            previously existing record results in the previously existing record
86            being overwritten by the new record.
87        </p>
88        <p>
89            Allowing duplicate records is useful if you have a database that
90            contains records keyed by a commonly occurring piece of information.
91            It is frequently necessary to allow duplicate records for secondary
92            databases.
93         </p>
94        <p>
95            For example, suppose your primary database contained records related
96            to automobiles. You might in this case want to be able to find all
97            the automobiles in the database that are of a particular color, so
98            you would index on the color of the automobile. However, for any
99            given color there will probably be multiple automobiles. Since the
100            index is the secondary key, this means that multiple secondary
101            database records will share the same key, and so the secondary
102            database must support duplicate records.
103        </p>
104        <div class="sect3" lang="en" xml:lang="en">
105          <div class="titlepage">
106            <div>
107              <div>
108                <h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
109              </div>
110            </div>
111            <div></div>
112          </div>
113          <p>
114                Duplicate records can be stored in sorted or unsorted order. 
115                You can cause DB to automatically sort your duplicate
116                records by 
117                <span> 
118                    specifying the <tt class="literal">DB_DUPSORT</tt> flag at
119                    database creation time.
120                </span>
121                
122            </p>
123          <p>
124                If sorted duplicates are supported, then the 
125                <span>
126                    sorting function specified on 
127                    <tt class="methodname">DB-&gt;set_dup_compare()</tt>
128                    
129                </span>
130                
131                is used to determine the location of the duplicate record in its
132                duplicate set. If no such function is provided, then the default
133                lexicographical comparison is used.
134            </p>
135        </div>
136        <div class="sect3" lang="en" xml:lang="en">
137          <div class="titlepage">
138            <div>
139              <div>
140                <h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
141              </div>
142            </div>
143            <div></div>
144          </div>
145          <p>
146                For performance reasons, BTrees should always contain sorted
147                records. (BTrees containing unsorted entries must potentially 
148                spend a great deal more time locating an entry than does a BTree
149                that contains sorted entries).  That said, DB provides support 
150                for suppressing automatic sorting of duplicate records because it may be that
151                your application is inserting records that are already in a
152                sorted order.
153            </p>
154          <p>
155                That is, if the database is configured to support unsorted
156                duplicates, then the assumption is that your application
157                will manually perform the sorting. In this event,
158                expect to pay a significant performance penalty. Any time you
159                place records into the database in a sort order not know to
160                DB, you will pay a performance penalty
161            </p>
162          <p>
163                That said, this is how DB behaves when inserting records
164                into a database that supports non-sorted duplicates:
165            </p>
166          <div class="itemizedlist">
167            <ul type="disc">
168              <li>
169                <p>
170                    If your application simply adds a duplicate record using 
171                        <span><tt class="methodname">DB-&gt;put()</tt>,</span>
172                        
173                        
174                    then the record is inserted at the end of its sorted duplicate set.
175                </p>
176              </li>
177              <li>
178                <p>
179                    If a cursor is used to put the duplicate record to the database,
180                    then the new record is placed in the duplicate set according to the
181                    flags that are provided on the 
182                        <tt class="methodname">DBC-&gt;put()</tt>
183                        
184                    method. The relevant flags are:
185                </p>
186                <div class="itemizedlist">
187                  <ul type="circle">
188                    <li>
189                      <p>
190                            <tt class="literal">DB_AFTER</tt>
191                            
192                        </p>
193                      <p>
194                        The data
195                        <span>
196                            provided on the call to
197                            <tt class="methodname">DBC-&gt;put()</tt>
198                            
199                        </span>
200                        is placed into the database
201                        as a duplicate record. The key used for this operation is
202                        the key used for the record to which the cursor currently
203                        refers. Any key provided on the call 
204                        
205                        <span>
206                        to
207                            <tt class="methodname">DBC-&gt;put()</tt>
208                            
209                        </span>
210
211                        is therefore ignored.
212                        </p>
213                      <p>
214                            The duplicate record is inserted into the database
215                            immediately after the cursor's current position in the
216                            database.
217                        </p>
218                      <p>
219                            This flag is ignored if sorted duplicates are supported for
220                            the database.
221                        </p>
222                    </li>
223                    <li>
224                      <p>
225                            <tt class="literal">DB_BEFORE</tt>
226                            
227                        </p>
228                      <p>
229                            Behaves the same as 
230                                <tt class="literal">DB_AFTER</tt>
231                                
232                            except that the new record is inserted immediately before 
233                            the cursor's current location in the database.
234                        </p>
235                    </li>
236                    <li>
237                      <p>
238                            <tt class="literal">DB_KEYFIRST</tt>
239                            
240                        </p>
241                      <p>
242                            If the key 
243                            <span>
244                                provided on the call to
245                                <tt class="methodname">DBC-&gt;put()</tt>
246                                
247                            </span>
248                            already exists in the
249                            database, and the database is configured to use duplicates
250                            without sorting, then the new record is inserted as the first entry
251                            in the appropriate duplicates list.
252                        </p>
253                    </li>
254                    <li>
255                      <p>
256                            <tt class="literal">DB_KEYLAST</tt>
257                            
258                        </p>
259                      <p>
260                            Behaves identically to
261                                <tt class="literal">DB_KEYFIRST</tt>
262                                
263                            except that the new duplicate record is inserted as the last
264                            record in the duplicates list.
265                        </p>
266                    </li>
267                  </ul>
268                </div>
269              </li>
270            </ul>
271          </div>
272        </div>
273        <div class="sect3" lang="en" xml:lang="en">
274          <div class="titlepage">
275            <div>
276              <div>
277                <h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
278              </div>
279            </div>
280            <div></div>
281          </div>
282          <p>
283            Duplicates support can only be configured
284            at database creation time. You do this by specifying the appropriate
285            <span>
286                flags to
287                <tt class="methodname">DB-&gt;set_flags()</tt>
288                
289            </span>
290            
291            before the database is opened for the first time.
292        </p>
293          <p>
294            The 
295                <span>flags</span>
296                
297            that you can use are:
298        </p>
299          <div class="itemizedlist">
300            <ul type="disc">
301              <li>
302                <p>
303                    <tt class="literal">DB_DUP</tt>
304                    
305                </p>
306                <p>
307                    The database supports non-sorted duplicate records.
308                </p>
309              </li>
310              <li>
311                <p>
312                    <tt class="literal">DB_DUPSORT</tt>
313                    
314                </p>
315                <p>
316                    The database supports sorted duplicate records.
317                </p>
318              </li>
319            </ul>
320          </div>
321          <p>
322            The following code fragment illustrates how to configure a database
323            to support sorted duplicate records:
324        </p>
325          <a id="c_btree_dupsort"></a>
326          <pre class="programlisting">#include &lt;db.h&gt;
327...
328
329DB *dbp;
330FILE *error_file_pointer;
331int ret;
332char *program_name = "my_prog";
333char *file_name = "mydb.db";
334
335/* Variable assignments omitted for brevity */
336
337/* Initialize the DB handle */
338ret = db_create(&amp;dbp, NULL, 0);
339if (ret != 0) {
340    fprintf(error_file_pointer, "%s: %s\n", program_name,
341        db_strerror(ret));
342    return(ret);
343}
344
345/* Set up error handling for this database */
346dbp-&gt;set_errfile(dbp, error_file_pointer);
347dbp-&gt;set_errpfx(dbp, program_name);
348                                                                                                                                  
349/*
350 * Configure the database for sorted duplicates
351 */
352ret = dbp-&gt;set_flags(dbp, DB_DUPSORT);
353if (ret != 0) {
354    dbp-&gt;err(dbp, ret, "Attempt to set DUPSORT flag failed.");
355    dbp-&gt;close(dbp, 0);
356    return(ret);
357}
358                                                                                                                                  
359/* Now open the database */
360ret = dbp-&gt;open(dbp,        /* Pointer to the database */
361                NULL,       /* Txn pointer */
362                file_name,  /* File name */
363                NULL,       /* Logical db name (unneeded) */
364                DB_BTREE,   /* Database type (using btree) */
365                DB_CREATE,  /* Open flags */
366                0);         /* File mode. Using defaults */
367if (ret != 0) {
368    dbp-&gt;err(dbp, ret, "Database '%s' open failed.", file_name);
369    dbp-&gt;close(dbp, 0);
370    return(ret);
371} </pre>
372        </div>
373      </div>
374      <div class="sect2" lang="en" xml:lang="en">
375        <div class="titlepage">
376          <div>
377            <div>
378              <h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
379            </div>
380          </div>
381          <div></div>
382        </div>
383        <p>
384            By default, DB uses a lexicographical comparison function where
385            shorter records collate before longer records. For the majority of
386            cases, this comparison works well and you do not need to manage
387            it in any way. 
388         </p>
389        <p>
390            However, in some situations your application's performance can
391            benefit from setting a custom comparison routine. You can do this
392            either for database keys, or for the data if your
393            database supports sorted duplicate records.
394         </p>
395        <p>
396            Some of the reasons why you may want to provide a custom sorting
397            function are:
398         </p>
399        <div class="itemizedlist">
400          <ul type="disc">
401            <li>
402              <p>
403                    Your database is keyed using strings and you want to provide
404                    some sort of language-sensitive ordering to that data. Doing
405                    so can help increase the locality of reference that allows
406                    your database to perform at its best.
407                </p>
408            </li>
409            <li>
410              <p>
411                    You are using a little-endian system (such as x86) and you
412                    are using integers as your database's keys. Berkeley DB
413                    stores keys as byte strings and little-endian integers
414                    do not sort well when viewed as byte strings. There are
415                    several solutions to this problem, one being to provide a
416                    custom comparison function. See
417                    <a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a> 
418                    for more information.
419                </p>
420            </li>
421            <li>
422              <p>
423                    You you do not want the entire key to participate in the
424                    comparison, for whatever reason.  In 
425                    this case, you may want to provide a custom comparison
426                    function so that only the relevant bytes are examined.
427                </p>
428            </li>
429          </ul>
430        </div>
431        <div class="sect3" lang="en" xml:lang="en">
432          <div class="titlepage">
433            <div>
434              <div>
435                <h4 class="title"><a id="creatingComparisonFunctions"></a>
436                <span>Creating Comparison Functions</span>
437                
438            </h4>
439              </div>
440            </div>
441            <div></div>
442          </div>
443          <p>
444                You set a BTree's key
445                    <span>
446                        comparison function 
447                    </span>
448                    
449                using
450                    <span><tt class="methodname">DB-&gt;set_bt_compare()</tt>.</span>
451                    
452                    
453                You can also set a BTree's duplicate data comparison function using
454                    <span><tt class="methodname">DB-&gt;set_dup_compare()</tt>.</span>
455                    
456                    
457                
458            </p>
459          <p>
460            <span>
461            You cannot use these methods after the database has been opened.
462            Also, if 
463            </span>
464            
465            the database already exists when it is opened, the
466                    <span>
467                        function 
468                    </span>
469                    
470            provided to these methods must be the same as
471            that historically used to create the database or corruption can
472            occur.
473         </p>
474          <p>
475                The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method 
476                is a pointer to a function that has the following signature:
477            </p>
478          <pre class="programlisting">int (*function)(DB *db, const DBT *key1, const DBT *key2)</pre>
479          <p>
480                This function must return an integer value less than, equal to,
481                or greater than 0. If key1 is considered to be greater than
482                key2, then the function must return a value that is greater than
483                0. If the two are equal, then the function must return 0, and if
484                the first key is less than the second then the function must return
485                a negative value.
486            </p>
487          <p>
488                The function that you provide to <tt class="methodname">set_dup_compare()</tt> 
489                works in exactly the same way, except that the 
490                <tt class="literal">DBT</tt> 
491                 
492                parameters hold record data items instead of keys.
493            </p>
494          <p>
495                For example, an example routine that is used to sort integer
496                keys in the database is:
497                <span>
498                    
499                </span>
500            </p>
501          <a id="c_btree1"></a>
502          <pre class="programlisting">int
503compare_int(DB *dbp, const DBT *a, const DBT *b)
504{
505    int ai, bi;
506
507    /* 
508     * Returns: 
509     * &lt; 0 if a &lt; b 
510     * = 0 if a = b 
511     * &gt; 0 if a &gt; b 
512     */ 
513    memcpy(&amp;ai, a-&gt;data, sizeof(int)); 
514    memcpy(&amp;bi, b-&gt;data, sizeof(int)); 
515    return (ai - bi); 
516} </pre>
517          <p>
518            Note that the data must first be copied into memory that is
519            appropriately aligned, as Berkeley DB does not guarantee any kind of
520            alignment of the underlying data, including for comparison routines.
521            When writing comparison routines, remember that databases created on
522            machines of different architectures may have different integer byte
523            orders, for which your code may need to compensate.
524         </p>
525          <p>
526            To cause DB to use this comparison function:
527         </p>
528          <a id="c_btree2"></a>
529          <pre class="programlisting">#include &lt;db.h&gt;
530#include &lt;string.h&gt;
531
532...
533                                                                                                                                      
534DB *dbp;
535int ret;
536                                                                                                                                      
537/* Create a database */
538ret = db_create(&amp;dbp, NULL, 0);
539if (ret != 0) {
540        fprintf(stderr, "%s: %s\n", "my_program",
541          db_strerror(ret));
542        return(-1);
543}
544                                                                                                                                      
545/* Set up the btree comparison function for this database */
546dbp-&gt;set_bt_compare(dbp, compare_int);
547
548/* Database open call follows sometime after this. */ </pre>
549        </div>
550      </div>
551    </div>
552    <div class="navfooter">
553      <hr />
554      <table width="100%" summary="Navigation footer">
555        <tr>
556          <td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td>
557          <td width="20%" align="center">
558            <a accesskey="u" href="dbconfig.html">Up</a>
559          </td>
560          <td width="40%" align="right">��</td>
561        </tr>
562        <tr>
563          <td width="40%" align="left" valign="top">Selecting the Cache Size��</td>
564          <td width="20%" align="center">
565            <a accesskey="h" href="index.html">Home</a>
566          </td>
567          <td width="40%" align="right" valign="top">��</td>
568        </tr>
569      </table>
570    </div>
571  </body>
572</html>
573