1<!--$Id: second.so,v 10.12 2004/09/15 19:40:07 bostic Exp $-->
2<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
3<!--See the file LICENSE for redistribution information.-->
4<html>
5<head>
6<title>Berkeley DB Reference Guide: Secondary indices</title>
7<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
8<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
9</head>
10<body bgcolor=white>
11<a name="2"><!--meow--></a><a name="3"><!--meow--></a>
12<table width="100%"><tr valign=top>
13<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></b></td>
14<td align=right><a href="../am/close.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../am/cursor.html"><img src="../../images/next.gif" alt="Next"></a>
15</td></tr></table>
16<p align=center><b>Secondary indices</b></p>
17<p>A secondary index, put simply, is a way to efficiently access records
18in a database (the primary) by means of some piece of information other
19than the usual (primary) key.  In Berkeley DB, this index is simply another
20database whose keys are these pieces of information (the secondary
21keys), and whose data are the primary keys.  Secondary indices can be
22created manually by the application; there is no disadvantage, other
23than complexity, to doing so.  However, when the secondary key can be
24mechanically derived from the primary key and datum that it points to,
25as is frequently the case, Berkeley DB can automatically and transparently
26manage secondary indices.</p>
27<p>As an example of how secondary indices might be used, consider a
28database containing a list of students at a college, each of whom has
29a unique student ID number.  A typical database would use the student
30ID number as the key; however, one might also reasonably want to be
31able to look up students by last name.  To do this, one would construct
32a secondary index in which the secondary key was this last name.</p>
33<p>In SQL, this would be done by executing something like the following:</p>
34<blockquote><pre>CREATE TABLE students(student_id CHAR(4) NOT NULL,
35	lastname CHAR(15), firstname CHAR(15), PRIMARY KEY(student_id));
36CREATE INDEX lname ON students(lastname);</pre></blockquote>
37<p>In Berkeley DB, this would work as follows (a
38<a href="second.javas">Java API example is also available</a>):</p>
39<pre><blockquote>struct student_record {
40	char student_id[4];
41	char last_name[15];
42	char first_name[15];
43};
44<p>
45void
46second()
47{
48	DB *dbp, *sdbp;
49	int ret;
50	<p>
51	/* Open/create primary */
52	if ((ret = db_create(&dbp, dbenv, 0)) != 0)
53		handle_error(ret);
54	if ((ret = dbp-&gt;open(dbp, NULL,
55	    "students.db", NULL, DB_BTREE, DB_CREATE, 0600)) != 0)
56		handle_error(ret);
57	<p>
58	/*
59	 * Open/create secondary.  Note that it supports duplicate data
60	 * items, since last names might not be unique.
61	 */
62	if ((ret = db_create(&sdbp, dbenv, 0)) != 0)
63		handle_error(ret);
64	if ((ret = sdbp-&gt;set_flags(sdbp, DB_DUP | DB_DUPSORT)) != 0)
65		handle_error(ret);
66	if ((ret = sdbp-&gt;open(sdbp, NULL,
67	    "lastname.db", NULL, DB_BTREE, DB_CREATE, 0600)) != 0)
68		handle_error(ret);
69	<p>
70	/* Associate the secondary with the primary. */
71	if ((ret = dbp-&gt;associate(dbp, NULL, sdbp, getname, 0)) != 0)
72		handle_error(ret);
73}
74<p>
75/*
76 * getname -- extracts a secondary key (the last name) from a primary
77 * 	key/data pair
78 */
79int
80getname(secondary, pkey, pdata, skey)
81	DB *secondary;
82	const DBT *pkey, *pdata;
83	DBT *skey;
84{
85	/*
86	 * Since the secondary key is a simple structure member of the
87	 * record, we don't have to do anything fancy to return it.  If
88	 * we have composite keys that need to be constructed from the
89	 * record, rather than simply pointing into it, then the user's
90	 * function might need to allocate space and copy data.  In
91	 * this case, the DB_DBT_APPMALLOC flag should be set in the
92	 * secondary key DBT.
93	 */
94	memset(skey, 0, sizeof(DBT));
95	skey-&gt;data = ((struct student_record *)pdata-&gt;data)-&gt;last_name;
96	skey-&gt;size = sizeof((struct student_record *)pdata-&gt;data)-&gt;last_name;
97	return (0);
98}</blockquote></pre>
99<p>From the application's perspective, putting things into the database
100works exactly as it does without a secondary index;  one can simply
101insert records into the primary database.  In SQL one would do the
102following:</p>
103<blockquote><pre>INSERT INTO student
104    VALUES ("WC42", "Churchill      ", "Winston        ");</pre></blockquote>
105<p>and in Berkeley DB, one does:</p>
106<blockquote><pre>struct student_record s;
107DBT data, key;
108<p>
109memset(&key, 0, sizeof(DBT));
110memset(&data, 0, sizeof(DBT));
111memset(&s, 0, sizeof(struct student_record));
112key.data = "WC42";
113key.size = 4;
114memcpy(&s.student_id, "WC42", sizeof(s.student_id));
115memcpy(&s.last_name, "Churchill      ", sizeof(s.last_name));
116memcpy(&s.first_name, "Winston        ", sizeof(s.first_name));
117data.data = &s;
118data.size = sizeof(s);
119if ((ret = dbp-&gt;put(dbp, txn, &key, &data, 0)) != 0)
120	handle_error(ret);</pre></blockquote>
121<p>Internally, a record with secondary key "Churchill" is inserted into
122the secondary database (in addition to the insertion of "WC42" into the
123primary, of course).</p>
124<p>Deletes are similar.  The SQL clause:</p>
125<blockquote><pre>DELETE FROM student WHERE (student_id = "WC42");</pre></blockquote>
126<p>looks like:</p>
127<blockquote><pre>DBT key;
128<p>
129memset(&key, 0, sizeof(DBT));
130key.data = "WC42";
131key.size = 4;
132if ((ret = dbp-&gt;del(dbp, txn, &key, 0)) != 0)
133	handle_error(ret);</pre></blockquote>
134<p>Deletes can also be performed on the secondary index directly; a delete
135done this way will delete the "real" record in the primary as well.  If
136the secondary supports duplicates and there are duplicate occurrences of
137the secondary key, then all records with that secondary key are removed
138from both the secondary index and the primary database. In
139SQL:</p>
140<blockquote><pre>DELETE FROM lname WHERE (lastname = "Churchill      ");</pre></blockquote>
141<p>In Berkeley DB:</p>
142<blockquote><pre>DBT skey;
143<p>
144memset(&skey, 0, sizeof(DBT));
145skey.data = "Churchill      ";
146skey.size = 15;
147if ((ret = sdbp-&gt;del(sdbp, txn, &skey, 0)) != 0)
148	handle_error(ret);</pre></blockquote>
149<p>Gets on a secondary automatically return the primary datum.  If
150<a href="../../api_c/db_get.html">DB-&gt;pget</a> or <a href="../../api_c/dbc_get.html">DBcursor-&gt;pget</a> is used in lieu of <a href="../../api_c/db_get.html">DB-&gt;get</a>
151or <a href="../../api_c/dbc_get.html">DBcursor-&gt;get</a>, the primary key is returned as well.  Thus, the
152equivalent of:</p>
153<blockquote><pre>SELECT * from lname WHERE (lastname = "Churchill      ");</pre></blockquote>
154<p>would be:</p>
155<blockquote><pre>DBT data, pkey, skey;
156<p>
157memset(&skey, 0, sizeof(DBT));
158memset(&pkey, 0, sizeof(DBT));
159memset(&data, 0, sizeof(DBT));
160skey.data = "Churchill      ";
161skey.size = 15;
162if ((ret = sdbp-&gt;pget(sdbp, txn, &skey, &pkey, &data, 0)) != 0)
163	handle_error(ret);
164/*
165 * Now pkey contains "WC42" and data contains Winston's record.
166 */</pre></blockquote>
167<p>To create a secondary index to a Berkeley DB database, open the database that
168is to become a secondary index normally, then pass it as the "secondary"
169argument to the <a href="../../api_c/db_associate.html">DB-&gt;associate</a> method for some primary database.</p>
170<p>After a <a href="../../api_c/db_associate.html">DB-&gt;associate</a> call is made, the secondary indices become
171alternate interfaces to the primary database.  All updates to the
172primary will be automatically reflected in each secondary index that has
173been associated with it.  All get operations using the <a href="../../api_c/db_get.html">DB-&gt;get</a>
174or <a href="../../api_c/dbc_get.html">DBcursor-&gt;get</a> methods on the secondary index return the primary datum
175associated with the specified (or otherwise current, in the case of
176cursor operations) secondary key.  The <a href="../../api_c/db_get.html">DB-&gt;pget</a> and
177<a href="../../api_c/dbc_get.html">DBcursor-&gt;pget</a> methods also become usable; these behave just like
178<a href="../../api_c/db_get.html">DB-&gt;get</a> and <a href="../../api_c/dbc_get.html">DBcursor-&gt;get</a>, but return the primary key in
179addition to the primary datum, for those applications that need it as
180well.</p>
181<p>Cursor get operations on a secondary index perform as expected; although
182the data returned will by default be those of the primary database, a
183position in the secondary index is maintained normally, and records will
184appear in the order determined by the secondary key and the comparison
185function or other structure of the secondary database.</p>
186<p>Delete operations on a secondary index delete the item from the primary
187database and all relevant secondaries, including the current one.</p>
188<p>Put operations of any kind are forbidden on secondary indices, as there
189is no way to specify a primary key for a newly put item.  Instead, the
190application should use the <a href="../../api_c/dbc_put.html">DBcursor-&gt;put</a> or <a href="../../api_c/db_put.html">DB-&gt;put</a> methods
191on the primary database.</p>
192<p>Any number of secondary indices may be associated with a given primary
193database, up to limitations on available memory and the number of open
194file descriptors.</p>
195<p>Note that although Berkeley DB guarantees that updates made using any
196<a href="../../api_c/db_class.html">DB</a> handle with an associated secondary will be reflected in the
197that secondary, associating each primary handle with all the appropriate
198secondaries is the responsibility of the application and is not enforced
199by Berkeley DB.  It is generally unsafe, but not forbidden by Berkeley DB, to modify
200a database that has secondary indices without having those indices open
201and associated.  Similarly, it is generally unsafe, but not forbidden,
202to modify a secondary index directly.  Applications that violate these
203rules face the possibility of outdated or incorrect results if the
204secondary indices are later used.</p>
205<p>If a secondary index becomes outdated for any reason, it should be
206discarded using the <a href="../../api_c/db_remove.html">DB-&gt;remove</a> method and a new one created
207using the <a href="../../api_c/db_associate.html">DB-&gt;associate</a> method.  If a secondary index is no
208longer needed, all of its handles should be closed using the
209<a href="../../api_c/db_close.html">DB-&gt;close</a> method, and then the database should be removed using
210a new database handle and the <a href="../../api_c/db_remove.html">DB-&gt;remove</a> method.</p>
211<p>Closing a primary database handle automatically dis-associates all
212secondary database handles associated with it.</p>
213<table width="100%"><tr><td><br></td><td align=right><a href="../am/close.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../am/cursor.html"><img src="../../images/next.gif" alt="Next"></a>
214</td></tr></table>
215<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
216</body>
217</html>
218