1<?xml version="1.0" encoding="UTF-8" standalone="no"?> 2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 3<html xmlns="http://www.w3.org/1999/xhtml"> 4 <head> 5 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 6 <title>BTree Configuration</title> 7 <link rel="stylesheet" href="gettingStarted.css" type="text/css" /> 8 <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" /> 9 <link rel="home" href="index.html" title="Getting Started with Berkeley DB" /> 10 <link rel="up" href="dbconfig.html" title="Chapter��6.��Database Configuration" /> 11 <link rel="previous" href="cachesize.html" title="Selecting the Cache Size" /> 12 </head> 13 <body> 14 <div class="navheader"> 15 <table width="100%" summary="Navigation header"> 16 <tr> 17 <th colspan="3" align="center">BTree Configuration</th> 18 </tr> 19 <tr> 20 <td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td> 21 <th width="60%" align="center">Chapter��6.��Database Configuration</th> 22 <td width="20%" align="right">��</td> 23 </tr> 24 </table> 25 <hr /> 26 </div> 27 <div class="sect1" lang="en" xml:lang="en"> 28 <div class="titlepage"> 29 <div> 30 <div> 31 <h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2> 32 </div> 33 </div> 34 <div></div> 35 </div> 36 <p> 37 In going through the previous chapters in this book, you may notice that 38 we touch on some topics that are specific to BTree, but we do not cover 39 those topics in any real detail. In this section, we will discuss 40 configuration issues that are unique to BTree. 41 </p> 42 <p> 43 Specifically, in this section we describe: 44 </p> 45 <div class="itemizedlist"> 46 <ul type="disc"> 47 <li> 48 <p> 49 Allowing duplicate records. 50 </p> 51 </li> 52 <li> 53 <p> 54 Setting comparator callbacks. 55 </p> 56 </li> 57 </ul> 58 </div> 59 <div class="sect2" lang="en" xml:lang="en"> 60 <div class="titlepage"> 61 <div> 62 <div> 63 <h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3> 64 </div> 65 </div> 66 <div></div> 67 </div> 68 <p> 69 BTree databases can contain duplicate records. One record is 70 considered to be a duplicate of another when both records use keys 71 that compare as equal to one another. 72 </p> 73 <p> 74 By default, keys are compared using a lexicographical comparison, 75 with shorter keys collating higher than longer keys. 76 You can override this default using the 77 <tt class="methodname">DB->set_bt_compare()</tt> 78 79 80 method. See the next section for details. 81 </p> 82 <p> 83 By default, DB databases do not allow duplicate records. As a 84 result, any attempt to write a record that uses a key equal to a 85 previously existing record results in the previously existing record 86 being overwritten by the new record. 87 </p> 88 <p> 89 Allowing duplicate records is useful if you have a database that 90 contains records keyed by a commonly occurring piece of information. 91 It is frequently necessary to allow duplicate records for secondary 92 databases. 93 </p> 94 <p> 95 For example, suppose your primary database contained records related 96 to automobiles. You might in this case want to be able to find all 97 the automobiles in the database that are of a particular color, so 98 you would index on the color of the automobile. However, for any 99 given color there will probably be multiple automobiles. Since the 100 index is the secondary key, this means that multiple secondary 101 database records will share the same key, and so the secondary 102 database must support duplicate records. 103 </p> 104 <div class="sect3" lang="en" xml:lang="en"> 105 <div class="titlepage"> 106 <div> 107 <div> 108 <h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4> 109 </div> 110 </div> 111 <div></div> 112 </div> 113 <p> 114 Duplicate records can be stored in sorted or unsorted order. 115 You can cause DB to automatically sort your duplicate 116 records by 117 <span> 118 specifying the <tt class="literal">DB_DUPSORT</tt> flag at 119 database creation time. 120 </span> 121 122 </p> 123 <p> 124 If sorted duplicates are supported, then the 125 <span> 126 sorting function specified on 127 <tt class="methodname">DB->set_dup_compare()</tt> 128 129 </span> 130 131 is used to determine the location of the duplicate record in its 132 duplicate set. If no such function is provided, then the default 133 lexicographical comparison is used. 134 </p> 135 </div> 136 <div class="sect3" lang="en" xml:lang="en"> 137 <div class="titlepage"> 138 <div> 139 <div> 140 <h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4> 141 </div> 142 </div> 143 <div></div> 144 </div> 145 <p> 146 For performance reasons, BTrees should always contain sorted 147 records. (BTrees containing unsorted entries must potentially 148 spend a great deal more time locating an entry than does a BTree 149 that contains sorted entries). That said, DB provides support 150 for suppressing automatic sorting of duplicate records because it may be that 151 your application is inserting records that are already in a 152 sorted order. 153 </p> 154 <p> 155 That is, if the database is configured to support unsorted 156 duplicates, then the assumption is that your application 157 will manually perform the sorting. In this event, 158 expect to pay a significant performance penalty. Any time you 159 place records into the database in a sort order not know to 160 DB, you will pay a performance penalty 161 </p> 162 <p> 163 That said, this is how DB behaves when inserting records 164 into a database that supports non-sorted duplicates: 165 </p> 166 <div class="itemizedlist"> 167 <ul type="disc"> 168 <li> 169 <p> 170 If your application simply adds a duplicate record using 171 <span><tt class="methodname">DB->put()</tt>,</span> 172 173 174 then the record is inserted at the end of its sorted duplicate set. 175 </p> 176 </li> 177 <li> 178 <p> 179 If a cursor is used to put the duplicate record to the database, 180 then the new record is placed in the duplicate set according to the 181 flags that are provided on the 182 <tt class="methodname">DBC->put()</tt> 183 184 method. The relevant flags are: 185 </p> 186 <div class="itemizedlist"> 187 <ul type="circle"> 188 <li> 189 <p> 190 <tt class="literal">DB_AFTER</tt> 191 192 </p> 193 <p> 194 The data 195 <span> 196 provided on the call to 197 <tt class="methodname">DBC->put()</tt> 198 199 </span> 200 is placed into the database 201 as a duplicate record. The key used for this operation is 202 the key used for the record to which the cursor currently 203 refers. Any key provided on the call 204 205 <span> 206 to 207 <tt class="methodname">DBC->put()</tt> 208 209 </span> 210 211 is therefore ignored. 212 </p> 213 <p> 214 The duplicate record is inserted into the database 215 immediately after the cursor's current position in the 216 database. 217 </p> 218 <p> 219 This flag is ignored if sorted duplicates are supported for 220 the database. 221 </p> 222 </li> 223 <li> 224 <p> 225 <tt class="literal">DB_BEFORE</tt> 226 227 </p> 228 <p> 229 Behaves the same as 230 <tt class="literal">DB_AFTER</tt> 231 232 except that the new record is inserted immediately before 233 the cursor's current location in the database. 234 </p> 235 </li> 236 <li> 237 <p> 238 <tt class="literal">DB_KEYFIRST</tt> 239 240 </p> 241 <p> 242 If the key 243 <span> 244 provided on the call to 245 <tt class="methodname">DBC->put()</tt> 246 247 </span> 248 already exists in the 249 database, and the database is configured to use duplicates 250 without sorting, then the new record is inserted as the first entry 251 in the appropriate duplicates list. 252 </p> 253 </li> 254 <li> 255 <p> 256 <tt class="literal">DB_KEYLAST</tt> 257 258 </p> 259 <p> 260 Behaves identically to 261 <tt class="literal">DB_KEYFIRST</tt> 262 263 except that the new duplicate record is inserted as the last 264 record in the duplicates list. 265 </p> 266 </li> 267 </ul> 268 </div> 269 </li> 270 </ul> 271 </div> 272 </div> 273 <div class="sect3" lang="en" xml:lang="en"> 274 <div class="titlepage"> 275 <div> 276 <div> 277 <h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4> 278 </div> 279 </div> 280 <div></div> 281 </div> 282 <p> 283 Duplicates support can only be configured 284 at database creation time. You do this by specifying the appropriate 285 <span> 286 flags to 287 <tt class="methodname">DB->set_flags()</tt> 288 289 </span> 290 291 before the database is opened for the first time. 292 </p> 293 <p> 294 The 295 <span>flags</span> 296 297 that you can use are: 298 </p> 299 <div class="itemizedlist"> 300 <ul type="disc"> 301 <li> 302 <p> 303 <tt class="literal">DB_DUP</tt> 304 305 </p> 306 <p> 307 The database supports non-sorted duplicate records. 308 </p> 309 </li> 310 <li> 311 <p> 312 <tt class="literal">DB_DUPSORT</tt> 313 314 </p> 315 <p> 316 The database supports sorted duplicate records. 317 </p> 318 </li> 319 </ul> 320 </div> 321 <p> 322 The following code fragment illustrates how to configure a database 323 to support sorted duplicate records: 324 </p> 325 <a id="c_btree_dupsort"></a> 326 <pre class="programlisting">#include <db.h> 327... 328 329DB *dbp; 330FILE *error_file_pointer; 331int ret; 332char *program_name = "my_prog"; 333char *file_name = "mydb.db"; 334 335/* Variable assignments omitted for brevity */ 336 337/* Initialize the DB handle */ 338ret = db_create(&dbp, NULL, 0); 339if (ret != 0) { 340 fprintf(error_file_pointer, "%s: %s\n", program_name, 341 db_strerror(ret)); 342 return(ret); 343} 344 345/* Set up error handling for this database */ 346dbp->set_errfile(dbp, error_file_pointer); 347dbp->set_errpfx(dbp, program_name); 348 349/* 350 * Configure the database for sorted duplicates 351 */ 352ret = dbp->set_flags(dbp, DB_DUPSORT); 353if (ret != 0) { 354 dbp->err(dbp, ret, "Attempt to set DUPSORT flag failed."); 355 dbp->close(dbp, 0); 356 return(ret); 357} 358 359/* Now open the database */ 360ret = dbp->open(dbp, /* Pointer to the database */ 361 NULL, /* Txn pointer */ 362 file_name, /* File name */ 363 NULL, /* Logical db name (unneeded) */ 364 DB_BTREE, /* Database type (using btree) */ 365 DB_CREATE, /* Open flags */ 366 0); /* File mode. Using defaults */ 367if (ret != 0) { 368 dbp->err(dbp, ret, "Database '%s' open failed.", file_name); 369 dbp->close(dbp, 0); 370 return(ret); 371} </pre> 372 </div> 373 </div> 374 <div class="sect2" lang="en" xml:lang="en"> 375 <div class="titlepage"> 376 <div> 377 <div> 378 <h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3> 379 </div> 380 </div> 381 <div></div> 382 </div> 383 <p> 384 By default, DB uses a lexicographical comparison function where 385 shorter records collate before longer records. For the majority of 386 cases, this comparison works well and you do not need to manage 387 it in any way. 388 </p> 389 <p> 390 However, in some situations your application's performance can 391 benefit from setting a custom comparison routine. You can do this 392 either for database keys, or for the data if your 393 database supports sorted duplicate records. 394 </p> 395 <p> 396 Some of the reasons why you may want to provide a custom sorting 397 function are: 398 </p> 399 <div class="itemizedlist"> 400 <ul type="disc"> 401 <li> 402 <p> 403 Your database is keyed using strings and you want to provide 404 some sort of language-sensitive ordering to that data. Doing 405 so can help increase the locality of reference that allows 406 your database to perform at its best. 407 </p> 408 </li> 409 <li> 410 <p> 411 You are using a little-endian system (such as x86) and you 412 are using integers as your database's keys. Berkeley DB 413 stores keys as byte strings and little-endian integers 414 do not sort well when viewed as byte strings. There are 415 several solutions to this problem, one being to provide a 416 custom comparison function. See 417 <a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a> 418 for more information. 419 </p> 420 </li> 421 <li> 422 <p> 423 You you do not want the entire key to participate in the 424 comparison, for whatever reason. In 425 this case, you may want to provide a custom comparison 426 function so that only the relevant bytes are examined. 427 </p> 428 </li> 429 </ul> 430 </div> 431 <div class="sect3" lang="en" xml:lang="en"> 432 <div class="titlepage"> 433 <div> 434 <div> 435 <h4 class="title"><a id="creatingComparisonFunctions"></a> 436 <span>Creating Comparison Functions</span> 437 438 </h4> 439 </div> 440 </div> 441 <div></div> 442 </div> 443 <p> 444 You set a BTree's key 445 <span> 446 comparison function 447 </span> 448 449 using 450 <span><tt class="methodname">DB->set_bt_compare()</tt>.</span> 451 452 453 You can also set a BTree's duplicate data comparison function using 454 <span><tt class="methodname">DB->set_dup_compare()</tt>.</span> 455 456 457 458 </p> 459 <p> 460 <span> 461 You cannot use these methods after the database has been opened. 462 Also, if 463 </span> 464 465 the database already exists when it is opened, the 466 <span> 467 function 468 </span> 469 470 provided to these methods must be the same as 471 that historically used to create the database or corruption can 472 occur. 473 </p> 474 <p> 475 The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method 476 is a pointer to a function that has the following signature: 477 </p> 478 <pre class="programlisting">int (*function)(DB *db, const DBT *key1, const DBT *key2)</pre> 479 <p> 480 This function must return an integer value less than, equal to, 481 or greater than 0. If key1 is considered to be greater than 482 key2, then the function must return a value that is greater than 483 0. If the two are equal, then the function must return 0, and if 484 the first key is less than the second then the function must return 485 a negative value. 486 </p> 487 <p> 488 The function that you provide to <tt class="methodname">set_dup_compare()</tt> 489 works in exactly the same way, except that the 490 <tt class="literal">DBT</tt> 491 492 parameters hold record data items instead of keys. 493 </p> 494 <p> 495 For example, an example routine that is used to sort integer 496 keys in the database is: 497 <span> 498 499 </span> 500 </p> 501 <a id="c_btree1"></a> 502 <pre class="programlisting">int 503compare_int(DB *dbp, const DBT *a, const DBT *b) 504{ 505 int ai, bi; 506 507 /* 508 * Returns: 509 * < 0 if a < b 510 * = 0 if a = b 511 * > 0 if a > b 512 */ 513 memcpy(&ai, a->data, sizeof(int)); 514 memcpy(&bi, b->data, sizeof(int)); 515 return (ai - bi); 516} </pre> 517 <p> 518 Note that the data must first be copied into memory that is 519 appropriately aligned, as Berkeley DB does not guarantee any kind of 520 alignment of the underlying data, including for comparison routines. 521 When writing comparison routines, remember that databases created on 522 machines of different architectures may have different integer byte 523 orders, for which your code may need to compensate. 524 </p> 525 <p> 526 To cause DB to use this comparison function: 527 </p> 528 <a id="c_btree2"></a> 529 <pre class="programlisting">#include <db.h> 530#include <string.h> 531 532... 533 534DB *dbp; 535int ret; 536 537/* Create a database */ 538ret = db_create(&dbp, NULL, 0); 539if (ret != 0) { 540 fprintf(stderr, "%s: %s\n", "my_program", 541 db_strerror(ret)); 542 return(-1); 543} 544 545/* Set up the btree comparison function for this database */ 546dbp->set_bt_compare(dbp, compare_int); 547 548/* Database open call follows sometime after this. */ </pre> 549 </div> 550 </div> 551 </div> 552 <div class="navfooter"> 553 <hr /> 554 <table width="100%" summary="Navigation footer"> 555 <tr> 556 <td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td> 557 <td width="20%" align="center"> 558 <a accesskey="u" href="dbconfig.html">Up</a> 559 </td> 560 <td width="40%" align="right">��</td> 561 </tr> 562 <tr> 563 <td width="40%" align="left" valign="top">Selecting the Cache Size��</td> 564 <td width="20%" align="center"> 565 <a accesskey="h" href="index.html">Home</a> 566 </td> 567 <td width="40%" align="right" valign="top">��</td> 568 </tr> 569 </table> 570 </div> 571 </body> 572</html> 573