<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Partitioning databases</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am.html" title="Chapter 3. Access Method Operations" />
    <link rel="prev" href="am_opensub.html" title="Opening multiple databases in a single file" />
    <link rel="next" href="am_get.html" title="Retrieving records" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Partitioning databases</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 3.
                Access Method Operations
              </th>
          <td width="20%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="am_partition"></a>Partitioning databases</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#am_partition_keys">Specifying partition keys</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#am_partition_function">Partitioning callback</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#partition_file_placement">Placing partition files</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        You can improve concurrency on your database reads and writes by
        splitting a single database into multiple databases. This helps
        avoid contention for internal database pages, and it lets you
        spread your data across multiple disks, which can improve disk I/O.
      </p>
      <p>
        You can do this manually by creating and using more than one
        database for your data, but DB can partition your database for
        you. When you use DB's built-in partitioning feature, you access
        your data exactly as if you were using a single database; DB
        determines which partition holds a particular record.
      </p>
      <p>
        Only the Btree and Hash access methods are supported for partitioned
        databases.
      </p>
      <p>
        To partition a database, call
        <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> before you open the database for the first time.
        You can specify the directory in which each partition is stored by
        using the <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> method.
      </p>
      <p>
        Once you have partitioned a database, you cannot change your
        partitioning scheme.
      </p>
      <p>
        There are two ways to indicate which key/data pairs go on which
        partition. The first is to specify an array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s that indicate
        the minimum key value for a given partition. The second is to provide
        a callback that returns the number of the partition on which a specified
        key is placed.
      </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_partition_keys"></a>Specifying partition keys</h3>
            </div>
          </div>
        </div>
        <p>
          For simple cases, you can partition your database by providing
          an array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s, each element of which provides the minimum
          key value to be placed on a partition. The array must contain one
          fewer element than the number of partitions. The first element of
          the array indicates the minimum key value for the second partition
          in your database. Keys that are less than the first key in this
          array are placed on the first partition (partition 0).
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
            You can use partition keys only if you are using the Btree
            access method.
          </p>
        </div>
        <p>
          For example, suppose you have a database of fruit and you want
          three partitions for it. Then you need a <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array
          of size two. The first element in this array indicates the
          minimum key value placed on partition 1.
          The second
          element in this array indicates the minimum key value placed on
          partition 2. Keys that compare less than the first <a href="../api_reference/C/dbt.html" class="olink">DBT</a> in the
          array are placed on partition 0.
        </p>
        <p>
          All comparisons are performed according to the lexicographic
          comparison used by your platform.
        </p>
        <p>
          Specifically, suppose you want all fruits whose names begin
          with:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                'a' - 'f' to go on partition 0
              </p>
            </li>
            <li>
              <p>
                'g' - 'p' to go on partition 1
              </p>
            </li>
            <li>
              <p>
                'q' - 'z' to go on partition 2
              </p>
            </li>
          </ul>
        </div>
        <p>
          You can accomplish this with the following code
          fragment:
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
            The <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> partition callback parameter must
            be <code class="literal">NULL</code> if you are using an array of
            <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s to partition your database.
          </p>
        </div>
        <pre class="programlisting">    DB *dbp = NULL;
    DB_ENV *envp = NULL;
    DBT partKeys[2];
    u_int32_t db_flags;
    const char *file_name = "mydb.db";
    int ret;

...

    /* Skipping environment open to shorten this example */
...

    /* Initialize the DB handle */
    ret = db_create(&dbp, envp, 0);
    if (ret != 0) {
        fprintf(stderr, "%s\n", db_strerror(ret));
        return (EXIT_FAILURE);
    }

    /* Set up the partition keys: "g" starts partition 1, "q" starts partition 2 */
    memset(&partKeys[0], 0, sizeof(DBT));
    partKeys[0].data = "g";
    partKeys[0].size = sizeof("g") - 1;

    memset(&partKeys[1], 0, sizeof(DBT));
    partKeys[1].data = "q";
    partKeys[1].size = sizeof("q") - 1;

    /* Three partitions; a key array is supplied, so the callback is NULL */
    ret = dbp->set_partition(dbp, 3, partKeys, NULL);
    if (ret != 0) {
        dbp->err(dbp, ret, "DB->set_partition");
        (void)dbp->close(dbp, 0);
        return (EXIT_FAILURE);
    }

    /* Now open the database */
    db_flags = DB_CREATE;       /* Allow database creation */

    ret = dbp->open(dbp,        /* Pointer to the database */
                    NULL,       /* Txn pointer */
                    file_name,  /* File name */
                    NULL,       /* Logical db name */
                    DB_BTREE,   /* Database type (using btree) */
                    db_flags,   /* Open flags */
                    0);         /* File mode. Using defaults */
    if (ret != 0) {
        dbp->err(dbp, ret, "Database '%s' open failed",
            file_name);
        (void)dbp->close(dbp, 0);
        return (EXIT_FAILURE);
    }</pre>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_partition_function"></a>Partitioning callback</h3>
            </div>
          </div>
        </div>
        <p>
          In some cases, a simple lexicographic comparison of key data
          is not sufficient to support a partitioning scheme. In those
          situations, write a partitioning function. This function accepts
          a pointer to the <a href="../api_reference/C/db.html" class="olink">DB</a> handle and the <a href="../api_reference/C/dbt.html" class="olink">DBT</a> holding the key, and
          it returns the number of the partition on which the key
          belongs.
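        </p>
        <p>
          As a minimal sketch of a scheme that boundary keys cannot express,
          the following helpers spread keys across partitions by hashing the
          key bytes with the 32-bit FNV-1a hash. FNV-1a is chosen here only
          for illustration; DB does not prescribe any particular hash. The
          names <code class="literal">key_hash</code> and
          <code class="literal">effective_partition</code> are hypothetical, and the
          code assumes a 32-bit <code class="literal">unsigned int</code> (DB's own
          callback signature uses <code class="literal">u_int32_t</code>).
        </p>
        <pre class="programlisting">/*
 * Hypothetical helper: 32-bit FNV-1a hash over the key bytes.
 * Assumes unsigned int is 32 bits wide; unsigned overflow wraps.
 */
unsigned int key_hash(const void *data, unsigned int size)
{
    const unsigned char *p = (const unsigned char *)data;
    unsigned int h = 2166136261u;   /* FNV-1a 32-bit offset basis */
    unsigned int i;

    for (i = 0; i != size; i++) {
        h ^= p[i];                  /* mix in one key byte */
        h *= 16777619u;             /* multiply by the FNV 32-bit prime */
    }
    return (h);
}

/*
 * Hypothetical helper modeling the documented rule: the key goes on
 * partition (returned value) mod (number of partitions).
 * nparts must be nonzero.
 */
unsigned int effective_partition(unsigned int returned, unsigned int nparts)
{
    return (returned % nparts);
}</pre>
        <p>
          Inside a callback such as <code class="literal">db_partition_fn</code>, you
          might return <code class="literal">key_hash(key->data, key->size)</code>;
          <code class="literal">effective_partition</code> only models the reduction
          DB applies to the returned value. A fixed, unseeded hash like this one
          produces the same value for the same key in every process, which
          matters because the same partitioning function must be used every time
          the database is opened.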
        </p>
        <p>
          Note that <a href="../api_reference/C/db.html" class="olink">DB</a> actually places the key on the partition
          calculated by:
        </p>
        <pre class="programlisting">returned_partition modulo number_of_partitions</pre>
        <p>
          Also, if you use a partitioning function when you create your
          database, you must use the same partitioning function every time
          you open that database in the future.
        </p>
        <p>
          The following code fragment illustrates a partition callback:
        </p>
        <pre class="programlisting">u_int32_t db_partition_fn(DB *db, DBT *key) {
    char *key_data;
    u_int32_t ret_number;

    /*
     * Obtain your key data, unpacking it as necessary.
     * Here, we do the very simple thing just for illustrative purposes.
     */
    key_data = (char *)key->data;

    /*
     * Here you would perform whatever comparison you require to
     * determine which partition the key belongs on. DB places the key on
     *
     *     ret_number mod number_of_partitions
     *
     * so returning either 0 or the number of partitions in the database
     * places the key on the first partition (partition 0).
     */
    ret_number = 0;

    return (ret_number);
}</pre>
        <p>
          To use your partition callback, pass it to the
          <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> method, as illustrated by the following
          code fragment.
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
            The <a href="../api_reference/C/dbset_partition.html" class="olink">DB->set_partition()</a> <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array parameter must
            be <code class="literal">NULL</code> if you are using a partition
            callback to partition your database.
          </p>
        </div>
        <pre class="programlisting">    DB *dbp = NULL;
    DB_ENV *envp = NULL;
    u_int32_t db_flags;
    const char *file_name = "mydb.db";
    int ret;

...

    /* Skipping environment open to shorten this example */
...

    /* Initialize the DB handle */
    ret = db_create(&dbp, envp, 0);
    if (ret != 0) {
        fprintf(stderr, "%s\n", db_strerror(ret));
        return (EXIT_FAILURE);
    }

    /* Three partitions; a callback is supplied, so the key array is NULL */
    ret = dbp->set_partition(dbp, 3, NULL, db_partition_fn);
    if (ret != 0) {
        dbp->err(dbp, ret, "DB->set_partition");
        (void)dbp->close(dbp, 0);
        return (EXIT_FAILURE);
    }

    /* Now open the database */
    db_flags = DB_CREATE;       /* Allow database creation */

    ret = dbp->open(dbp,        /* Pointer to the database */
                    NULL,       /* Txn pointer */
                    file_name,  /* File name */
                    NULL,       /* Logical db name */
                    DB_BTREE,   /* Database type (using btree) */
                    db_flags,   /* Open flags */
                    0);         /* File mode. Using defaults */
    if (ret != 0) {
        dbp->err(dbp, ret, "Database '%s' open failed",
            file_name);
        (void)dbp->close(dbp, 0);
        return (EXIT_FAILURE);
    }</pre>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="partition_file_placement"></a>Placing partition files</h3>
            </div>
          </div>
        </div>
        <p>
          When you partition a database, a database file is created on
          disk in the same way as if you were not partitioning the
          database. That is, this file uses the name you provide to the
          <a href="../api_reference/C/dbopen.html" class="olink">DB->open()</a> <code class="literal">file</code> parameter.
        </p>
        <p>
          However, DB then also creates a series of database files on
          disk, one for each partition that you want to use. The names of
          these partition files are derived from the database file name
          and are numbered sequentially.
          So if you create a database
          named <code class="filename">mydb.db</code> and you create 3 partitions
          for it, you will see the following database files on disk:
        </p>
        <pre class="programlisting">    mydb.db
    __dbp.mydb.db.000
    __dbp.mydb.db.001
    __dbp.mydb.db.002</pre>
        <p>
          All of the database's contents go into the numbered database
          files. You can place these files in different
          directories (and, hence, different disk partitions or even
          disks) by using the <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> method.
        </p>
        <p>
          <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> takes a NULL-terminated array of
          strings, each of which should name an existing
          filesystem directory.
        </p>
        <p>
          If you are using an environment, the directories specified
          using <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> must also be included in the
          environment list specified by <a href="../api_reference/C/envadd_data_dir.html" class="olink">DB_ENV->add_data_dir()</a>.
        </p>
        <p>
          If you are not using an environment, the directories
          specified to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> can be either complete
          paths to currently existing directories, or paths relative to
          the application's current working directory.
        </p>
        <p>
          Ideally, you should provide <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> with an array
          that has one directory for each partition you are creating for
          your database.
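        </p>
        <p>
          The following is a minimal model of where each partition file ends
          up, not DB's own code; DB computes these names and locations itself.
          It assumes that partition <span class="emphasis"><em>i</em></span> is placed in
          directory <span class="emphasis"><em>i</em></span> modulo the length of the
          directory array, which covers both the one-directory-per-partition
          case and the round-robin case described here, and that each file is
          named <code class="literal">__dbp.</code> followed by the database file
          name, a dot, and a three-digit, zero-padded partition number, as in
          the listing above. The helpers <code class="literal">partition_dir_index</code>
          and <code class="literal">partition_file_name</code> are hypothetical.
        </p>
        <pre class="programlisting">/*
 * Hypothetical model: partition i is placed in dirs[i % ndirs].
 * ndirs must be nonzero.
 */
unsigned int partition_dir_index(unsigned int part, unsigned int ndirs)
{
    return (part % ndirs);      /* wraps round-robin when ndirs is short */
}

/*
 * Hypothetical model of the partition file name, inferred from the
 * listing above: "__dbp." + file + "." + three digits.
 * Returns 0 on success, -1 if the name does not fit in out or if part
 * needs more than three digits.
 */
int partition_file_name(char *out, unsigned int outlen,
    const char *file, unsigned int part)
{
    const char *prefix = "__dbp.";
    unsigned int flen = 0, n = 0, i;

    while (file[flen] != '\0')
        flen++;
    /* prefix (6) + file name + "." (1) + 3 digits + NUL (1) */
    if (part > 999 || 6 + flen + 1 + 3 + 1 > outlen)
        return (-1);
    for (i = 0; prefix[i] != '\0'; i++)
        out[n++] = prefix[i];
    for (i = 0; i != flen; i++)
        out[n++] = file[i];
    out[n++] = '.';
    out[n++] = (char)('0' + part / 100);       /* hundreds digit */
    out[n++] = (char)('0' + part / 10 % 10);   /* tens digit */
    out[n++] = (char)('0' + part % 10);        /* units digit */
    out[n] = '\0';
    return (0);
}</pre>
        <p>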
          Partition files are placed
          according to the order of the directories in the
          array: partition 0 is placed in directory_array[0], partition 1
          in directory_array[1], and so forth. However, if you provide an
          array with fewer directories than there are database
          partitions, the directories are used in a round-robin fashion.
        </p>
        <p>
          You must call <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> before you create your
          database, and before every subsequent open of the database.
          The array provided to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB->set_partition_dirs()</a> must not
          change after the database has been created.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Opening multiple databases in a single file </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Retrieving records</td>
        </tr>
      </table>
    </div>
  </body>
</html>