1<?xml version="1.0" encoding="UTF-8" standalone="no"?> 2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 3<html xmlns="http://www.w3.org/1999/xhtml"> 4 <head> 5 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 6 <title>BTree Configuration</title> 7 <link rel="stylesheet" href="gettingStarted.css" type="text/css" /> 8 <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" /> 9 <link rel="home" href="index.html" title="Getting Started with Berkeley DB" /> 10 <link rel="up" href="dbconfig.html" title="Chapter��6.��Database Configuration" /> 11 <link rel="previous" href="cachesize.html" title="Selecting the Cache Size" /> 12 </head> 13 <body> 14 <div class="navheader"> 15 <table width="100%" summary="Navigation header"> 16 <tr> 17 <th colspan="3" align="center">BTree Configuration</th> 18 </tr> 19 <tr> 20 <td width="20%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td> 21 <th width="60%" align="center">Chapter��6.��Database Configuration</th> 22 <td width="20%" align="right">��</td> 23 </tr> 24 </table> 25 <hr /> 26 </div> 27 <div class="sect1" lang="en" xml:lang="en"> 28 <div class="titlepage"> 29 <div> 30 <div> 31 <h2 class="title" style="clear: both"><a id="btree"></a>BTree Configuration</h2> 32 </div> 33 </div> 34 <div></div> 35 </div> 36 <p> 37 In going through the previous chapters in this book, you may notice that 38 we touch on some topics that are specific to BTree, but we do not cover 39 those topics in any real detail. In this section, we will discuss 40 configuration issues that are unique to BTree. 41 </p> 42 <p> 43 Specifically, in this section we describe: 44 </p> 45 <div class="itemizedlist"> 46 <ul type="disc"> 47 <li> 48 <p> 49 Allowing duplicate records. 50 </p> 51 </li> 52 <li> 53 <p> 54 Setting comparator callbacks. 55 </p> 56 </li> 57 </ul> 58 </div> 59 <div class="sect2" lang="en" xml:lang="en"> 60 <div class="titlepage"> 61 <div> 62 <div> 63 <h3 class="title"><a id="duplicateRecords"></a>Allowing Duplicate Records</h3> 64 </div> 65 </div> 66 <div></div> 67 </div> 68 <p> 69 BTree databases can contain duplicate records. One record is 70 considered to be a duplicate of another when both records use keys 71 that compare as equal to one another. 72 </p> 73 <p> 74 By default, keys are compared using a lexicographical comparison, 75 with shorter keys collating higher than longer keys. 76 You can override this default using the 77 78 <tt class="methodname">Db::set_bt_compare()</tt> 79 80 method. See the next section for details. 81 </p> 82 <p> 83 By default, DB databases do not allow duplicate records. As a 84 result, any attempt to write a record that uses a key equal to a 85 previously existing record results in the previously existing record 86 being overwritten by the new record. 87 </p> 88 <p> 89 Allowing duplicate records is useful if you have a database that 90 contains records keyed by a commonly occurring piece of information. 91 It is frequently necessary to allow duplicate records for secondary 92 databases. 93 </p> 94 <p> 95 For example, suppose your primary database contained records related 96 to automobiles. You might in this case want to be able to find all 97 the automobiles in the database that are of a particular color, so 98 you would index on the color of the automobile. However, for any 99 given color there will probably be multiple automobiles. Since the 100 index is the secondary key, this means that multiple secondary 101 database records will share the same key, and so the secondary 102 database must support duplicate records. 103 </p> 104 <div class="sect3" lang="en" xml:lang="en"> 105 <div class="titlepage"> 106 <div> 107 <div> 108 <h4 class="title"><a id="sorteddups"></a>Sorted Duplicates</h4> 109 </div> 110 </div> 111 <div></div> 112 </div> 113 <p> 114 Duplicate records can be stored in sorted or unsorted order. 115 You can cause DB to automatically sort your duplicate 116 records by 117 <span> 118 specifying the <tt class="literal">DB_DUPSORT</tt> flag at 119 database creation time. 120 </span> 121 122 </p> 123 <p> 124 If sorted duplicates are supported, then the 125 <span> 126 sorting function specified on 127 128 <tt class="methodname">Db::set_dup_compare()</tt> 129 </span> 130 131 is used to determine the location of the duplicate record in its 132 duplicate set. If no such function is provided, then the default 133 lexicographical comparison is used. 134 </p> 135 </div> 136 <div class="sect3" lang="en" xml:lang="en"> 137 <div class="titlepage"> 138 <div> 139 <div> 140 <h4 class="title"><a id="nosorteddups"></a>Unsorted Duplicates</h4> 141 </div> 142 </div> 143 <div></div> 144 </div> 145 <p> 146 For performance reasons, BTrees should always contain sorted 147 records. (BTrees containing unsorted entries must potentially 148 spend a great deal more time locating an entry than does a BTree 149 that contains sorted entries). That said, DB provides support 150 for suppressing automatic sorting of duplicate records because it may be that 151 your application is inserting records that are already in a 152 sorted order. 153 </p> 154 <p> 155 That is, if the database is configured to support unsorted 156 duplicates, then the assumption is that your application 157 will manually perform the sorting. In this event, 158 expect to pay a significant performance penalty. Any time you 159 place records into the database in a sort order not know to 160 DB, you will pay a performance penalty 161 </p> 162 <p> 163 That said, this is how DB behaves when inserting records 164 into a database that supports non-sorted duplicates: 165 </p> 166 <div class="itemizedlist"> 167 <ul type="disc"> 168 <li> 169 <p> 170 If your application simply adds a duplicate record using 171 172 <span><tt class="methodname">Db::put()</tt>,</span> 173 174 then the record is inserted at the end of its sorted duplicate set. 175 </p> 176 </li> 177 <li> 178 <p> 179 If a cursor is used to put the duplicate record to the database, 180 then the new record is placed in the duplicate set according to the 181 flags that are provided on the 182 183 <tt class="methodname">Dbc::put()</tt> 184 method. The relevant flags are: 185 </p> 186 <div class="itemizedlist"> 187 <ul type="circle"> 188 <li> 189 <p> 190 <tt class="literal">DB_AFTER</tt> 191 192 </p> 193 <p> 194 The data 195 <span> 196 provided on the call to 197 198 <tt class="methodname">Dbc::put()</tt> 199 </span> 200 is placed into the database 201 as a duplicate record. The key used for this operation is 202 the key used for the record to which the cursor currently 203 refers. Any key provided on the call 204 205 <span> 206 to 207 208 <tt class="methodname">Dbc::put()</tt> 209 </span> 210 211 is therefore ignored. 212 </p> 213 <p> 214 The duplicate record is inserted into the database 215 immediately after the cursor's current position in the 216 database. 217 </p> 218 <p> 219 This flag is ignored if sorted duplicates are supported for 220 the database. 221 </p> 222 </li> 223 <li> 224 <p> 225 <tt class="literal">DB_BEFORE</tt> 226 227 </p> 228 <p> 229 Behaves the same as 230 <tt class="literal">DB_AFTER</tt> 231 232 except that the new record is inserted immediately before 233 the cursor's current location in the database. 234 </p> 235 </li> 236 <li> 237 <p> 238 <tt class="literal">DB_KEYFIRST</tt> 239 240 </p> 241 <p> 242 If the key 243 <span> 244 provided on the call to 245 246 <tt class="methodname">Dbc::put()</tt> 247 </span> 248 already exists in the 249 database, and the database is configured to use duplicates 250 without sorting, then the new record is inserted as the first entry 251 in the appropriate duplicates list. 252 </p> 253 </li> 254 <li> 255 <p> 256 <tt class="literal">DB_KEYLAST</tt> 257 258 </p> 259 <p> 260 Behaves identically to 261 <tt class="literal">DB_KEYFIRST</tt> 262 263 except that the new duplicate record is inserted as the last 264 record in the duplicates list. 265 </p> 266 </li> 267 </ul> 268 </div> 269 </li> 270 </ul> 271 </div> 272 </div> 273 <div class="sect3" lang="en" xml:lang="en"> 274 <div class="titlepage"> 275 <div> 276 <div> 277 <h4 class="title"><a id="specifyingDups"></a>Configuring a Database to Support Duplicates</h4> 278 </div> 279 </div> 280 <div></div> 281 </div> 282 <p> 283 Duplicates support can only be configured 284 at database creation time. You do this by specifying the appropriate 285 <span> 286 flags to 287 288 <tt class="methodname">Db::set_flags()</tt> 289 </span> 290 291 before the database is opened for the first time. 292 </p> 293 <p> 294 The 295 <span>flags</span> 296 297 that you can use are: 298 </p> 299 <div class="itemizedlist"> 300 <ul type="disc"> 301 <li> 302 <p> 303 <tt class="literal">DB_DUP</tt> 304 305 </p> 306 <p> 307 The database supports non-sorted duplicate records. 308 </p> 309 </li> 310 <li> 311 <p> 312 <tt class="literal">DB_DUPSORT</tt> 313 314 </p> 315 <p> 316 The database supports sorted duplicate records. 317 </p> 318 </li> 319 </ul> 320 </div> 321 <p> 322 The following code fragment illustrates how to configure a database 323 to support sorted duplicate records: 324 </p> 325 <a id="cxx_btree_dupsort"></a> 326 <pre class="programlisting">#include <db_cxx.h> 327... 328 329Db db(NULL, 0); 330const char *file_name = "myd.db"; 331 332try { 333 // Configure the database for sorted duplicates 334 db.set_flags(DB_DUPSORT); 335 336 // Now open the database 337 db.open(NULL, // Txn pointer 338 file_name, // File name 339 NULL, // Logical db name (unneeded) 340 DB_BTREE, // Database type (using btree) 341 DB_CREATE, // Open flags 342 0); // File mode. Using defaults 343} catch(DbException &e) { 344 db.err(e.get_errno(), "Database '%s' open failed.", file_name); 345} catch(std::exception &e) { 346 db.errx("Error opening database: %s : %s\n", file_name, e.what()); 347} 348 349... 350 351try { 352 db.close(0); 353} catch(DbException &e) { 354 db.err(e.get_errno(), "Database '%s' close failed.", file_name); 355} catch(std::exception &e) { 356 db.errx("Error closing database: %s : %s\n", file_name, e.what()); 357} 358 359</pre> 360 </div> 361 </div> 362 <div class="sect2" lang="en" xml:lang="en"> 363 <div class="titlepage"> 364 <div> 365 <div> 366 <h3 class="title"><a id="comparators"></a>Setting Comparison Functions</h3> 367 </div> 368 </div> 369 <div></div> 370 </div> 371 <p> 372 By default, DB uses a lexicographical comparison function where 373 shorter records collate before longer records. For the majority of 374 cases, this comparison works well and you do not need to manage 375 it in any way. 376 </p> 377 <p> 378 However, in some situations your application's performance can 379 benefit from setting a custom comparison routine. You can do this 380 either for database keys, or for the data if your 381 database supports sorted duplicate records. 382 </p> 383 <p> 384 Some of the reasons why you may want to provide a custom sorting 385 function are: 386 </p> 387 <div class="itemizedlist"> 388 <ul type="disc"> 389 <li> 390 <p> 391 Your database is keyed using strings and you want to provide 392 some sort of language-sensitive ordering to that data. Doing 393 so can help increase the locality of reference that allows 394 your database to perform at its best. 395 </p> 396 </li> 397 <li> 398 <p> 399 You are using a little-endian system (such as x86) and you 400 are using integers as your database's keys. Berkeley DB 401 stores keys as byte strings and little-endian integers 402 do not sort well when viewed as byte strings. There are 403 several solutions to this problem, one being to provide a 404 custom comparison function. See 405 <a href="http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html" target="_top">http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_misc/faq.html</a> 406 for more information. 407 </p> 408 </li> 409 <li> 410 <p> 411 You you do not want the entire key to participate in the 412 comparison, for whatever reason. In 413 this case, you may want to provide a custom comparison 414 function so that only the relevant bytes are examined. 415 </p> 416 </li> 417 </ul> 418 </div> 419 <div class="sect3" lang="en" xml:lang="en"> 420 <div class="titlepage"> 421 <div> 422 <div> 423 <h4 class="title"><a id="creatingComparisonFunctions"></a> 424 <span>Creating Comparison Functions</span> 425 426 </h4> 427 </div> 428 </div> 429 <div></div> 430 </div> 431 <p> 432 You set a BTree's key 433 <span> 434 comparison function 435 </span> 436 437 using 438 439 <span><tt class="methodname">Db::set_bt_compare()</tt>.</span> 440 441 You can also set a BTree's duplicate data comparison function using 442 443 <span><tt class="methodname">Db::set_dup_compare()</tt>.</span> 444 445 446 </p> 447 <p> 448 <span> 449 You cannot use these methods after the database has been opened. 450 Also, if 451 </span> 452 453 the database already exists when it is opened, the 454 <span> 455 function 456 </span> 457 458 provided to these methods must be the same as 459 that historically used to create the database or corruption can 460 occur. 461 </p> 462 <p> 463 The value that you provide to the <tt class="methodname">set_bt_compare()</tt> method 464 is a pointer to a function that has the following signature: 465 </p> 466 <pre class="programlisting">int (*function)(Db *db, const Dbt *key1, const Dbt *key2)</pre> 467 <p> 468 This function must return an integer value less than, equal to, 469 or greater than 0. If key1 is considered to be greater than 470 key2, then the function must return a value that is greater than 471 0. If the two are equal, then the function must return 0, and if 472 the first key is less than the second then the function must return 473 a negative value. 474 </p> 475 <p> 476 The function that you provide to <tt class="methodname">set_dup_compare()</tt> 477 works in exactly the same way, except that the 478 479 <tt class="literal">Dbt</tt> 480 parameters hold record data items instead of keys. 481 </p> 482 <p> 483 For example, an example routine that is used to sort integer 484 keys in the database is: 485 486 </p> 487 <a id="cxx_btree1"></a> 488 <pre class="programlisting">int 489compare_int(Db *dbp, const Dbt *a, const Dbt *b) 490{ 491 int ai, bi; 492 493 // Returns: 494 // < 0 if a < b 495 // = 0 if a = b 496 // > 0 if a > b 497 memcpy(&ai, a->get_data(), sizeof(int)); 498 memcpy(&bi, b->get_data(), sizeof(int)); 499 return (ai - bi); 500} </pre> 501 <p> 502 Note that the data must first be copied into memory that is 503 appropriately aligned, as Berkeley DB does not guarantee any kind of 504 alignment of the underlying data, including for comparison routines. 505 When writing comparison routines, remember that databases created on 506 machines of different architectures may have different integer byte 507 orders, for which your code may need to compensate. 508 </p> 509 <p> 510 To cause DB to use this comparison function: 511 </p> 512 <a id="cxx_btree2"></a> 513 <pre class="programlisting">#include <db_cxx.h> 514#include <string.h> 515 516... 517 518Db db(NULL, 0); 519 520// Set up the btree comparison function for this database 521db.set_bt_compare(compare_int); 522 523// Database open call follows sometime after this.</pre> 524 </div> 525 </div> 526 </div> 527 <div class="navfooter"> 528 <hr /> 529 <table width="100%" summary="Navigation footer"> 530 <tr> 531 <td width="40%" align="left"><a accesskey="p" href="cachesize.html">Prev</a>��</td> 532 <td width="20%" align="center"> 533 <a accesskey="u" href="dbconfig.html">Up</a> 534 </td> 535 <td width="40%" align="right">��</td> 536 </tr> 537 <tr> 538 <td width="40%" align="left" valign="top">Selecting the Cache Size��</td> 539 <td width="20%" align="center"> 540 <a accesskey="h" href="index.html">Home</a> 541 </td> 542 <td width="40%" align="right" valign="top">��</td> 543 </tr> 544 </table> 545 </div> 546 </body> 547</html> 548