1<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4  <head>
5    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6    <title>BTree Configuration</title>
7    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
8    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
9    <link rel="start" href="index.html" title="Getting Started with Berkeley DB" />
    <link rel="up" href="dbconfig.html" title="Chapter&#160;11.&#160;Database Configuration" />
11    <link rel="prev" href="cachesize.html" title="Selecting the Cache Size" />
12  </head>
13  <body>
14    <div class="navheader">
15      <table width="100%" summary="Navigation header">
16        <tr>
17          <th colspan="3" align="center">BTree Configuration</th>
18        </tr>
19        <tr>
          <td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>&#160;</td>
          <th width="60%" align="center">Chapter&#160;11.&#160;Database Configuration</th>
          <td width="20%" align="right">&#160;</td>
23        </tr>
24      </table>
25      <hr />
26    </div>
27    <div class="sect1" lang="en" xml:lang="en">
28      <div class="titlepage">
29        <div>
30          <div>
31            <h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2>
32          </div>
33        </div>
34      </div>
35      <div class="toc">
36        <dl>
37          <dt>
38            <span class="sect2">
39              <a href="btree.html#duplicateRecords">Allowing Duplicate Records</a>
40            </span>
41          </dt>
42          <dt>
43            <span class="sect2">
44              <a href="btree.html#comparators">Setting Comparison Functions</a>
45            </span>
46          </dt>
47        </dl>
48      </div>
49      <p>
        Earlier chapters of this book touch on some topics that are specific
        to BTree without covering them in any real detail. This section
        discusses configuration issues that are unique to BTree.
54    </p>
55      <p>
56        Specifically, in this section we describe:      
57    </p>
58      <div class="itemizedlist">
59        <ul type="disc">
60          <li>
61            <p>
62                Allowing duplicate records.
63            </p>
64          </li>
65          <li>
66            <p>
67                Setting comparator callbacks.
68            </p>
69          </li>
70        </ul>
71      </div>
72      <div class="sect2" lang="en" xml:lang="en">
73        <div class="titlepage">
74          <div>
75            <div>
76              <h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3>
77            </div>
78          </div>
79        </div>
80        <p>
81            BTree databases can contain duplicate records. One record is
82            considered to be a duplicate of another when both records use keys
83            that compare as equal to one another.
84        </p>
85        <p>
            By default, keys are compared using a lexicographical comparison,
            with shorter keys collating before longer keys.
88            You can override this default using the
89                
90                
91                <code class="methodname">DatabaseConfig.setBtreeComparator()</code>
92            method. See the next section for details.
93        </p>
94        <p>
95            By default, DB databases do not allow duplicate records. As a
96            result, any attempt to write a record that uses a key equal to a
97            previously existing record results in the previously existing record
98            being overwritten by the new record.
99        </p>
100        <p>
101            Allowing duplicate records is useful if you have a database that
102            contains records keyed by a commonly occurring piece of information.
103            It is frequently necessary to allow duplicate records for secondary
104            databases.
105         </p>
106        <p>
107            For example, suppose your primary database contained records related
108            to automobiles. You might in this case want to be able to find all
109            the automobiles in the database that are of a particular color, so
110            you would index on the color of the automobile. However, for any
111            given color there will probably be multiple automobiles. Since the
112            index is the secondary key, this means that multiple secondary
113            database records will share the same key, and so the secondary
114            database must support duplicate records.
115        </p>
116        <div class="sect3" lang="en" xml:lang="en">
117          <div class="titlepage">
118            <div>
119              <div>
120                <h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4>
121              </div>
122            </div>
123          </div>
124          <p>
125                Duplicate records can be stored in sorted or unsorted order. 
126                You can cause DB to automatically sort your duplicate
127                records by 
128                
129                <span> 
130                    setting <code class="methodname">DatabaseConfig.setSortedDuplicates()</code>
131                    to <code class="literal">true</code>. Note that this property must be
132                    set prior to database creation time and it cannot be changed
133                    afterwards.
134                </span>
135            </p>
136          <p>
137                If sorted duplicates are supported, then the 
138                
139                <span>
140                    <code class="classname">java.util.Comparator</code> implementation
141                    identified to
142                    <code class="methodname">DatabaseConfig.setDuplicateComparator()</code>
143                </span>
                is used to determine the location of the duplicate record in its
                duplicate set. If no such comparator is provided, then the default
                lexicographical comparison is used.
147            </p>
148        </div>
149        <div class="sect3" lang="en" xml:lang="en">
150          <div class="titlepage">
151            <div>
152              <div>
153                <h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4>
154              </div>
155            </div>
156          </div>
157          <p>
                For performance reasons, BTrees should always contain sorted
                records, because a BTree containing unsorted entries can
                spend a great deal more time locating an entry than a BTree
                that contains sorted entries. Even so, DB lets you suppress
                automatic sorting of duplicate records, because your
                application may be inserting records that are already in
                sorted order.
165            </p>
166          <p>
                That is, if the database is configured to support unsorted
                duplicates, then the assumption is that your application
                performs the sorting itself. Expect a significant
                performance penalty any time you place records into the
                database in a sort order not known to DB.
173            </p>
174          <p>
                DB behaves as follows when inserting records
                into a database that supports non-sorted duplicates:
177            </p>
178          <div class="itemizedlist">
179            <ul type="disc">
180              <li>
181                <p>
182                    If your application simply adds a duplicate record using 
183                        
184                        
185                        <span><code class="methodname">Database.put()</code>,</span>
                    then the record is inserted at the end of its duplicate set.
187                </p>
188              </li>
189              <li>
190                <p>
191                    If a cursor is used to put the duplicate record to the database,
192                    then the new record is placed in the duplicate set according to the
193                    actual method used to perform the put. The relevant methods
194                    are:
195                </p>
196                <div class="itemizedlist">
197                  <ul type="circle">
198                    <li>
199                      <p>
200                            
201                            <code class="methodname">Cursor.putAfter()</code>
202                        </p>
203                      <p>
                        The data is placed into the database
                        as a duplicate record. The key used for this operation is
                        the key used for the record to which the cursor currently
                        refers. Any key provided on the call
                        is therefore ignored.
214                        </p>
215                      <p>
216                            The duplicate record is inserted into the database
217                            immediately after the cursor's current position in the
218                            database.
219                        </p>
220                    </li>
221                    <li>
222                      <p>
223                            
224                            <code class="methodname">Cursor.putBefore()</code>
225                        </p>
226                      <p>
227                            Behaves the same as 
228                                
229                                <code class="methodname">Cursor.putAfter()</code>
230                            except that the new record is inserted immediately before 
231                            the cursor's current location in the database.
232                        </p>
233                    </li>
234                    <li>
235                      <p>
236                            
237                            <code class="methodname">Cursor.putKeyFirst()</code>
238                        </p>
239                      <p>
240                            If the key 
241                            
242                            already exists in the
243                            database, and the database is configured to use duplicates
244                            without sorting, then the new record is inserted as the first entry
245                            in the appropriate duplicates list.
246                        </p>
247                    </li>
248                    <li>
249                      <p>
250                            
251                            <code class="methodname">Cursor.putKeyLast()</code>
252                        </p>
253                      <p>
254                            Behaves identically to
255                                
256                                <code class="methodname">Cursor.putKeyFirst()</code>
257                            except that the new duplicate record is inserted as the last
258                            record in the duplicates list.
259                        </p>
260                    </li>
261                  </ul>
262                </div>
263              </li>
264            </ul>
265          </div>
266        </div>
267        <div class="sect3" lang="en" xml:lang="en">
268          <div class="titlepage">
269            <div>
270              <div>
271                <h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4>
272              </div>
273            </div>
274          </div>
275          <p>
276            Duplicates support can only be configured
277            at database creation time. You do this by specifying the appropriate
278            
279            <span>
280                <code class="classname">DatabaseConfig</code> method
281            </span>
282            before the database is opened for the first time.
283        </p>
284          <p>
285            The 
286                
287                <span>methods</span>
288            that you can use are:
289        </p>
290          <div class="itemizedlist">
291            <ul type="disc">
292              <li>
293                <p>
294                    
295                    <code class="methodname">DatabaseConfig.setUnsortedDuplicates()</code>
296                </p>
297                <p>
298                    The database supports non-sorted duplicate records.
299                </p>
300              </li>
301              <li>
302                <p>
303                    
304                    <code class="methodname">DatabaseConfig.setSortedDuplicates()</code>
305                </p>
306                <p>
307                    The database supports sorted duplicate records.
308                </p>
309              </li>
310            </ul>
311          </div>
312          <p>
313            The following code fragment illustrates how to configure a database
314            to support sorted duplicate records:
315        </p>
316          <a id="java_btree_dupsort"></a>
          <pre class="programlisting">package db.GettingStarted;

import java.io.FileNotFoundException;

import com.sleepycat.db.Database;
import com.sleepycat.db.DatabaseConfig;
import com.sleepycat.db.DatabaseException;
import com.sleepycat.db.DatabaseType;

...

Database myDb = null;

try {
    // Typical configuration settings
    DatabaseConfig myDbConfig = new DatabaseConfig();
    myDbConfig.setType(DatabaseType.BTREE);
    myDbConfig.setAllowCreate(true);

    // Configure for sorted duplicates
    myDbConfig.setSortedDuplicates(true);

    // Open the database
    myDb = new Database("mydb.db", null, myDbConfig);
} catch(DatabaseException dbe) {
    System.err.println("MyDbs: " + dbe.toString());
    System.exit(-1);
} catch(FileNotFoundException fnfe) {
    System.err.println("MyDbs: " + fnfe.toString());
    System.exit(-1);
} </pre>
348        </div>
349      </div>
350      <div class="sect2" lang="en" xml:lang="en">
351        <div class="titlepage">
352          <div>
353            <div>
354              <h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3>
355            </div>
356          </div>
357        </div>
358        <p>
359            By default, DB uses a lexicographical comparison function where
360            shorter records collate before longer records. For the majority of
361            cases, this comparison works well and you do not need to manage
362            it in any way. 
363         </p>
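        <p>
            As a rough illustration of what such an ordering looks like, the
            following sketch compares two byte arrays position by position
            and, when one array is a prefix of the other, sorts the shorter
            one first. It is not DB's actual implementation: the class and
            method names are hypothetical, and treating each byte as an
            unsigned value is an assumption made for this example.
        </p>
        <pre class="programlisting">public class DefaultOrderSketch {

    // Illustration only: a byte-wise lexicographical comparison in which
    // a shorter array sorts before a longer array that starts with it.
    public static int compareLexicographically(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i != common; i++) {
            // Assumption: bytes are compared as unsigned values (0 to 255).
            int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
            if (diff != 0) {
                return diff;
            }
        }
        // Every shared position is equal, so the shorter array sorts first.
        return a.length - b.length;
    }
} </pre>
        <p>
            Under this sketch, the two-byte key <code class="literal">{1, 2}</code>
            sorts before the three-byte key <code class="literal">{1, 2, 3}</code>,
            while <code class="literal">{2}</code> sorts after both because its
            first byte is larger.
        </p>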
364        <p>
365            However, in some situations your application's performance can
366            benefit from setting a custom comparison routine. You can do this
367            either for database keys, or for the data if your
368            database supports sorted duplicate records.
369         </p>
370        <p>
371            Some of the reasons why you may want to provide a custom sorting
372            function are:
373         </p>
374        <div class="itemizedlist">
375          <ul type="disc">
376            <li>
377              <p>
378                    Your database is keyed using strings and you want to provide
379                    some sort of language-sensitive ordering to that data. Doing
380                    so can help increase the locality of reference that allows
381                    your database to perform at its best.
382                </p>
383            </li>
384            <li>
385              <p>
386                    You are using a little-endian system (such as x86) and you
387                    are using integers as your database's keys. Berkeley DB
388                    stores keys as byte strings and little-endian integers
389                    do not sort well when viewed as byte strings. There are
390                    several solutions to this problem, one being to provide a
391                    custom comparison function. See
392                    <a class="ulink" href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a> 
393                    for more information.
394                </p>
395            </li>
396            <li>
397              <p>
                    You do not want the entire key to participate in the
                    comparison, for whatever reason. In
                    this case, you may want to provide a custom comparison
                    function so that only the relevant bytes are examined.
402                </p>
403            </li>
404          </ul>
405        </div>
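        <p>
            To make the little-endian case concrete, here is a minimal
            sketch of a comparator for integer keys. It assumes that every
            key is exactly four bytes long and holds a signed integer
            stored least-significant byte first; the class name
            <code class="classname">LittleEndianIntComparator</code> is
            hypothetical. The next section describes how a comparator
            receives its <code class="literal">byte</code> arrays and how to
            register it with the database.
        </p>
        <pre class="programlisting">import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Comparator;

public class LittleEndianIntComparator implements Comparator {

    public int compare(Object o1, Object o2) {
        int i1 = decode((byte[]) o1);
        int i2 = decode((byte[]) o2);
        // Integer.compare avoids the overflow that (i1 - i2) could cause.
        return Integer.compare(i1, i2);
    }

    // Assumption: the key is exactly four bytes, least-significant first.
    private static int decode(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }
} </pre>
        <p>
            With this comparator, the key for 1
            (<code class="literal">{1, 0, 0, 0}</code>) sorts before the key
            for 256 (<code class="literal">{0, 1, 0, 0}</code>), whereas a
            byte-by-byte comparison would order them the other way around.
        </p>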
406        <div class="sect3" lang="en" xml:lang="en">
407          <div class="titlepage">
408            <div>
409              <div>
410                <h4 class="title"><a id="creatingComparisonFunctions"></a>
411                
412                <span>Creating Java Comparators</span>
413            </h4>
414              </div>
415            </div>
416          </div>
417          <p>
418                You set a BTree's key
419                    
420                    <span>
421                        comparator
422                    </span>
423                using
424                    
425                    
426                    <span><code class="methodname">DatabaseConfig.setBtreeComparator()</code>.</span>
427                You can also set a BTree's duplicate data comparison function using
428                    
429                    
430                    <span><code class="methodname">DatabaseConfig.setDuplicateComparator()</code>.</span>
431                
432            </p>
433          <p>
434            
435            <span>
436                If
437            </span>
438            the database already exists when it is opened, the
439                    
440                    <span>
441                        comparator
442                    </span>
443            provided to these methods must be the same as
444            that historically used to create the database or corruption can
445            occur.
446         </p>
447          <p>
448      You override the default comparison function by providing a Java
449      <code class="classname">Comparator</code> class to the database.
450      The Java <code class="classname">Comparator</code> interface requires you to implement the
451      <code class="methodname">Comparator.compare()</code> method 
452      (see <a class="ulink" href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Comparator.html" target="_top">http://java.sun.com/j2se/1.4.2/docs/api/java/util/Comparator.html</a> for details). 
453      </p>
454          <p>
455        DB hands your <code class="methodname">Comparator.compare()</code> method
456        the <code class="literal">byte</code> arrays that you stored in the database. If
457        you know how your data is organized in the <code class="literal">byte</code>
458        array, then you can write a comparison routine that directly examines
459        the contents of the arrays.  Otherwise, you have to reconstruct your
460        original objects, and then perform the comparison.
461      </p>
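          <p>
            As a sketch of the first approach, the following comparator
            examines the arrays directly and lets only a leading portion of
            each key take part in the comparison. The layout is hypothetical:
            it assumes that the first four bytes of every key are significant
            and that anything after them can be ignored. The class name
            <code class="classname">PrefixComparator</code> and the
            <code class="literal">PREFIX_LENGTH</code> constant are
            likewise invented for this example.
          </p>
          <pre class="programlisting">import java.util.Comparator;

public class PrefixComparator implements Comparator {

    // Hypothetical layout: only the first four bytes of a key matter.
    private static final int PREFIX_LENGTH = 4;

    public int compare(Object o1, Object o2) {
        byte[] a = (byte[]) o1;
        byte[] b = (byte[]) o2;

        // Never look past the prefix, or past the end of a short key.
        int lenA = Math.min(a.length, PREFIX_LENGTH);
        int lenB = Math.min(b.length, PREFIX_LENGTH);
        int common = Math.min(lenA, lenB);

        for (int i = 0; i != common; i++) {
            // Compare the bytes as unsigned values.
            int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
            if (diff != 0) {
                return diff;
            }
        }
        // Equal so far: a key shorter than the prefix sorts first.
        return lenA - lenB;
    }
} </pre>
          <p>
            Keep in mind that two keys which differ only after the prefix
            compare as equal under this comparator, so DB treats their
            records as duplicates of one another.
          </p>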
462          <p>
            For example, suppose you want to perform Unicode lexical comparisons
            instead of UTF-8 byte-by-byte comparisons. Then you could provide a
            comparator that uses <code class="methodname">String.compareTo()</code>,
            which performs a Unicode comparison of two strings. Note that for
            single-byte Roman characters, Unicode comparison and UTF-8
            byte-by-byte comparisons are identical, so this is something you
            would only want to do if you were using multibyte Unicode characters
            with DB. In this case, your comparator would look like the
            following:
472      </p>
473          <a id="java_btree1"></a>
          <pre class="programlisting">package db.GettingStarted;

import java.nio.charset.StandardCharsets;
import java.util.Comparator;

public class MyDataComparator implements Comparator {

    public MyDataComparator() {}

    // DB passes the stored byte arrays to this method
    public int compare(Object d1, Object d2) {

        byte[] b1 = (byte[])d1;
        byte[] b2 = (byte[])d2;

        // Decode explicitly as UTF-8 rather than relying on the
        // platform default character set
        String s1 = new String(b1, StandardCharsets.UTF_8);
        String s2 = new String(b2, StandardCharsets.UTF_8);
        return s1.compareTo(s2);
    }
} </pre>
492          <p>
493        To use this comparator:
494    </p>
495          <a id="java_btree2"></a>
          <pre class="programlisting">package db.GettingStarted;

import java.io.FileNotFoundException;

import com.sleepycat.db.Database;
import com.sleepycat.db.DatabaseConfig;
import com.sleepycat.db.DatabaseException;
import com.sleepycat.db.DatabaseType;

...

Database myDatabase = null;
try {
    // Get the database configuration object
    DatabaseConfig myDbConfig = new DatabaseConfig();
    myDbConfig.setType(DatabaseType.BTREE);
    myDbConfig.setAllowCreate(true);

    // Configure for sorted duplicates so that the duplicate
    // comparator is used
    myDbConfig.setSortedDuplicates(true);

    // Set the duplicate comparator class
    MyDataComparator mdc = new MyDataComparator();
    myDbConfig.setDuplicateComparator(mdc);

    // Open the database that you will use to store your data
    myDatabase = new Database("myDb", null, myDbConfig);
} catch (DatabaseException dbe) {
    // Exception handling goes here
} catch (FileNotFoundException fnfe) {
    // Exception handling goes here
}</pre>
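          <p>
            The language-sensitive ordering mentioned at the start of this
            section could be sketched along the same lines using
            <code class="classname">java.text.Collator</code>. This is only
            an illustration: the class name
            <code class="classname">CollatingKeyComparator</code> is
            hypothetical, and the example assumes that keys are UTF-8
            encoded strings and that English collation rules are wanted.
          </p>
          <pre class="programlisting">import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

public class CollatingKeyComparator implements Comparator {

    // Assumption: English collation rules suit the stored keys.
    private final Collator collator = Collator.getInstance(Locale.ENGLISH);

    public int compare(Object o1, Object o2) {
        // Assumption: keys were stored as UTF-8 encoded strings.
        String s1 = new String((byte[]) o1, StandardCharsets.UTF_8);
        String s2 = new String((byte[]) o2, StandardCharsets.UTF_8);
        return collator.compare(s1, s2);
    }
} </pre>
          <p>
            Unlike <code class="methodname">String.compareTo()</code>, which
            places "Banana" before "apple" because uppercase letters have
            lower code values, this comparator places "apple" first. Because
            an existing database must always be opened with the same
            ordering that was used to create it, a comparator like this is
            only safe if the collation rules it relies on stay the same for
            the life of the database.
          </p>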
524        </div>
525      </div>
526    </div>
527    <div class="navfooter">
528      <hr />
529      <table width="100%" summary="Navigation footer">
530        <tr>
          <td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>&#160;</td>
532          <td width="20%" align="center">
533            <a accesskey="u" href="dbconfig.html">Up</a>
534          </td>
          <td width="40%" align="right">&#160;</td>
536        </tr>
537        <tr>
          <td width="40%" align="left" valign="top">Selecting the Cache Size&#160;</td>
539          <td width="20%" align="center">
540            <a accesskey="h" href="index.html">Home</a>
541          </td>
          <td width="40%" align="right" valign="top">&#160;</td>
543        </tr>
544      </table>
545    </div>
546  </body>
547</html>
548