1<!--$Id: join.so,v 10.31 2004/01/21 20:39:56 bostic Exp $--> 2<!--Copyright (c) 1997,2008 Oracle. All rights reserved.--> 3<!--See the file LICENSE for redistribution information.--> 4<html> 5<head> 6<title>Berkeley DB Reference Guide: Equality Join</title> 7<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit."> 8<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++"> 9</head> 10<body bgcolor=white> 11<a name="2"><!--meow--></a><a name="3"><!--meow--></a><a name="4"><!--meow--></a> 12<table width="100%"><tr valign=top> 13<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></b></td> 14<td align=right><a href="/am/curdup.html"><img src="/images/prev.gif" alt="Prev"></a><a href="/toc.html"><img src="/images/ref.gif" alt="Ref"></a><a href="/am/count.html"><img src="/images/next.gif" alt="Next"></a> 15</td></tr></table> 16<p align=center><b>Equality Join</b></p> 17<p>Berkeley DB supports "equality" (also known as "natural"), joins on secondary 18indices. An equality join is a method of retrieving data from a primary 19database using criteria stored in a set of secondary indices. It 20requires the data be organized as a primary database which contains the 21primary key and primary data field, and a set of secondary indices. 22Each of the secondary indices is indexed by a different secondary key, 23and, for each key in a secondary index, there is a set of duplicate data 24items that match the primary keys in the primary database.</p> 25<p>For example, let's assume the need for an application that will return 26the names of stores in which one can buy fruit of a given color. We 27would first construct a primary database that lists types of fruit as 28the key item, and the store where you can buy them as the data item:</p> 29<blockquote><table border=1> 30<tr><th>Primary key:</th><th>Primary data:</th></tr> 31<tr> <td align=left>apple</td> <td align=left>Convenience Store</td> </tr> 32<tr> <td align=left>blueberry</td> <td align=left>Farmer's Market</td> </tr> 33<tr> <td align=left>peach</td> <td align=left>Shopway</td> </tr> 34<tr> <td align=left>pear</td> <td align=left>Farmer's Market</td> </tr> 35<tr> <td align=left>raspberry</td> <td align=left>Shopway</td> </tr> 36<tr> <td align=left>strawberry</td> <td align=left>Farmer's Market</td> </tr> 37</table></blockquote> 38<p>We would then create a secondary index with the key <b>color</b>, and, 39as the data items, the names of fruits of different colors.</p> 40<blockquote><table border=1> 41<tr><th>Secondary key:</th><th>Secondary data:</th></tr> 42<tr> <td align=left>blue</td> <td align=left>blueberry</td> </tr> 43<tr> <td align=left>red</td> <td align=left>apple</td> </tr> 44<tr> <td align=left>red</td> <td align=left>raspberry</td> </tr> 45<tr> <td align=left>red</td> <td align=left>strawberry</td> </tr> 46<tr> <td align=left>yellow</td> <td align=left>peach</td> </tr> 47<tr> <td align=left>yellow</td> <td align=left>pear</td> </tr> 48</table></blockquote> 49<p>This secondary index would allow an application to look up a color, and 50then use the data items to look up the stores where the colored fruit 51could be purchased. For example, by first looking up <b>blue</b>, 52the data item <b>blueberry</b> could be used as the lookup key in the 53primary database, returning <b>Farmer's Market</b>.</p> 54<p>Your data must be organized in the following manner in order to use the 55<a href="/api_c/db_join.html">DB->join</a> method:</p> 56<ol> 57<p><li>The actual data should be stored in the database represented by the 58<a href="/api_c/db_class.html">DB</a> object used to invoke this method. Generally, this 59<a href="/api_c/db_class.html">DB</a> object is called the <i>primary</i>. 60<p><li>Secondary indices should be stored in separate databases, whose keys 61are the values of the secondary indices and whose data items are the 62primary keys corresponding to the records having the designated 63secondary key value. It is acceptable (and expected) that there may be 64duplicate entries in the secondary indices. 65<p>These duplicate entries should be sorted for performance reasons, although 66it is not required. For more information see the <a href="/api_c/db_set_flags.html#DB_DUPSORT">DB_DUPSORT</a> flag 67to the <a href="/api_c/db_set_flags.html">DB->set_flags</a> method.</p> 68</ol> 69<p>What the <a href="/api_c/db_join.html">DB->join</a> method does is review a list of secondary keys, and, 70when it finds a data item that appears as a data item for all of the 71secondary keys, it uses that data item as a lookup into the primary 72database, and returns the associated data item.</p> 73<p>If there were another secondary index that had as its key the <b>cost</b> 74of the fruit, a similar lookup could be done on stores where inexpensive 75fruit could be purchased:</p> 76<blockquote><table border=1> 77<tr><th>Secondary key:</th><th>Secondary data:</th></tr> 78<tr> <td align=left>expensive</td> <td align=left>blueberry</td> </tr> 79<tr> <td align=left>expensive</td> <td align=left>peach</td> </tr> 80<tr> <td align=left>expensive</td> <td align=left>pear</td> </tr> 81<tr> <td align=left>expensive</td> <td align=left>strawberry</td> </tr> 82<tr> <td align=left>inexpensive</td> <td align=left>apple</td> </tr> 83<tr> <td align=left>inexpensive</td> <td align=left>pear</td> </tr> 84<tr> <td align=left>inexpensive</td> <td align=left>raspberry</td> </tr> 85</table></blockquote> 86<p>The <a href="/api_c/db_join.html">DB->join</a> method provides equality join functionality. While not 87strictly cursor functionality, in that it is not a method off a cursor 88handle, it is more closely related to the cursor operations than to the 89standard <a href="/api_c/db_class.html">DB</a> operations.</p> 90<p>It is also possible to do lookups based on multiple criteria in a single 91operation. For example, it is possible to look up fruits that are both 92red and expensive in a single operation. If the same fruit appeared as 93a data item in both the color and expense indices, then that fruit name 94would be used as the key for retrieval from the primary index, and would 95then return the store where expensive, red fruit could be purchased.</p> 96<b>Example</b> 97<p>Consider the following three databases:</p> 98<br> 99<b>personnel</b><ul compact><li><p><ul type=disc> 100<li>key = SSN 101<li>data = record containing name, address, phone number, job title 102</ul></ul> 103<b>lastname</b><ul compact><li><p><ul type=disc> 104<li>key = lastname 105<li>data = SSN 106</ul></ul> 107<b>jobs</b><ul compact><li><p><ul type=disc> 108<li>key = job title 109<li>data = SSN 110</ul></ul> 111<br> 112<p>Consider the following query:</p> 113<blockquote><pre>Return the personnel records of all people named smith with the job 114title manager.</pre></blockquote> 115<p>This query finds are all the records in the primary database (personnel) 116for whom the criteria <b>lastname=smith and job title=manager</b> is 117true.</p> 118<p>Assume that all databases have been properly opened and have the 119handles: pers_db, name_db, job_db. We also assume that we have an 120active transaction to which the handle txn refers.</p> 121<blockquote><pre>DBC *name_curs, *job_curs, *join_curs; 122DBC *carray[3]; 123DBT key, data; 124int ret, tret; 125<p> 126name_curs = NULL; 127job_curs = NULL; 128memset(&key, 0, sizeof(key)); 129memset(&data, 0, sizeof(data)); 130<p> 131if ((ret = 132 name_db->cursor(name_db, txn, &name_curs, 0)) != 0) 133 goto err; 134key.data = "smith"; 135key.size = sizeof("smith"); 136if ((ret = 137 name_curs->c_get(name_curs, &key, &data, DB_SET)) != 0) 138 goto err; 139<p> 140if ((ret = job_db->cursor(job_db, txn, &job_curs, 0)) != 0) 141 goto err; 142key.data = "manager"; 143key.size = sizeof("manager"); 144if ((ret = 145 job_curs->c_get(job_curs, &key, &data, DB_SET)) != 0) 146 goto err; 147<p> 148carray[0] = name_curs; 149carray[1] = job_curs; 150carray[2] = NULL; 151<p> 152if ((ret = 153 pers_db->join(pers_db, carray, &join_curs, 0)) != 0) 154 goto err; 155while ((ret = 156 join_curs->c_get(join_curs, &key, &data, 0)) == 0) { 157 /* Process record returned in key/data. */ 158} 159<p> 160/* 161 * If we exited the loop because we ran out of records, 162 * then it has completed successfully. 163 */ 164if (ret == DB_NOTFOUND) 165 ret = 0; 166<p> 167err: 168if (join_curs != NULL && 169 (tret = join_curs->c_close(join_curs)) != 0 && ret == 0) 170 ret = tret; 171if (name_curs != NULL && 172 (tret = name_curs->c_close(name_curs)) != 0 && ret == 0) 173 ret = tret; 174if (job_curs != NULL && 175 (tret = job_curs->c_close(job_curs)) != 0 && ret == 0) 176 ret = tret; 177<p> 178return (ret); 179</pre></blockquote> 180<p>The name cursor is positioned at the beginning of the duplicate list 181for <b>smith</b> and the job cursor is placed at the beginning of 182the duplicate list for <b>manager</b>. The join cursor is returned 183from the join method. This code then loops over the join cursor getting 184the personnel records of each one until there are no more.</p> 185<table width="100%"><tr><td><br></td><td align=right><a href="/am/curdup.html"><img src="/images/prev.gif" alt="Prev"></a><a href="/toc.html"><img src="/images/ref.gif" alt="Ref"></a><a href="/am/count.html"><img src="/images/next.gif" alt="Next"></a> 186</td></tr></table> 187<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font> 188</body> 189</html> 190