<!--$Id: dbis.so,v 10.11 2006/09/19 16:21:42 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle. All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: What is Berkeley DB?</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></b></td>
<td align=right><a href="../intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>What is Berkeley DB?</b></p>
<p>So far, we've discussed database systems in general terms. It's time
now to consider Berkeley DB in particular and see how it fits into the
framework we have introduced. The key question is, what kinds of
applications should use Berkeley DB?</p>
<p>Berkeley DB is an Open Source embedded database library that provides
scalable, high-performance, transaction-protected data management
services to applications. Berkeley DB provides a simple function-call API for
data access and management.</p>
<p>By "Open Source," we mean Berkeley DB is distributed under a license that
conforms to the <a href="http://www.opensource.org/osd.html">Open
Source Definition</a>. This license guarantees Berkeley DB is freely available
for use and redistribution in other Open Source applications. Oracle
Corporation sells commercial licenses allowing the redistribution of
Berkeley DB in proprietary applications.
In all cases, the complete source
code for Berkeley DB is freely available for download and use.</p>
<p>Berkeley DB is "embedded" because it links directly into the application. It
runs in the same address space as the application. As a result, no
inter-process communication, either over the network or between
processes on the same machine, is required for database operations.
Berkeley DB provides a simple function-call API for a number of programming
languages, including C, C++, Java, Perl, Tcl, Python, and PHP. All
database operations happen inside the library. Multiple processes, or
multiple threads in a single process, can all use the database at the
same time, as each uses the Berkeley DB library. Low-level services like
locking, transaction logging, shared buffer management, memory
management, and so on are all handled transparently by the library.</p>
<p>The Berkeley DB library is extremely portable. It runs under almost all UNIX
and Linux variants, Windows, and a number of embedded real-time
operating systems. It runs on both 32-bit and 64-bit systems. It has
been deployed on high-end Internet servers, desktop machines, palmtop
computers, and set-top boxes, in network switches, and elsewhere.
Once Berkeley DB is linked into the application, the end user generally does
not know that there's a database present at all.</p>
<p>Berkeley DB is scalable in a number of respects. The database library itself
is quite compact (under 300 kilobytes of text space on common
architectures), but it can manage databases up to 256 terabytes in size.
It also supports high concurrency, with thousands of users operating on
the same database at the same time.
Berkeley DB is small enough to run in
tightly constrained embedded systems, but can take advantage of
gigabytes of memory and terabytes of disk on high-end server machines.</p>
<p>Berkeley DB generally outperforms relational and object-oriented database
systems in embedded applications for two reasons. First, because
the library runs in the same address space, no inter-process
communication is required for database operations. The cost of
communicating between processes on a single machine, or among machines
on a network, is much higher than the cost of making a function call.
Second, because Berkeley DB uses a simple function-call interface for all
operations, there is no query language to parse, and no execution plan
to produce.</p>
<b>Data Access Services</b>
<p>Berkeley DB applications can choose the storage structure that best suits the
application. Berkeley DB supports hash tables, Btrees, simple
record-number-based storage, and persistent queues. Programmers can
create tables using any of these storage structures, and can mix
operations on different kinds of tables in a single application.</p>
<p>Hash tables are generally good for very large databases that need
predictable search and update times for random-access records. Hash
tables allow users to ask, "Does this key exist?" or to fetch a record
with a known key. Hash tables do not allow users to ask for records
with keys that are close to a known key.</p>
<p>Btrees are better for range-based searches, as when the application
needs to find all records with keys between some starting and ending
value. Btrees also do a better job of exploiting <i>locality
of reference</i>. If the application is likely to touch keys near each
other at the same time, Btrees work well.
The tree structure keeps
keys that are close together near one another in storage, so fetching
nearby values usually doesn't require a disk access.</p>
<p>Record-number-based storage is natural for applications that need to
store and fetch records, but that do not have a simple way to generate
keys of their own. In a record number table, the record number is the
key for the record. Berkeley DB will generate these record numbers
automatically.</p>
<p>Queues are well suited for applications that create records, and then
must deal with those records in creation order. A good example is an
on-line purchasing system. Orders can enter the system at any time,
but should generally be filled in the order in which they were placed.</p>
<b>Data Management Services</b>
<p>Berkeley DB offers important data management services, including concurrency,
transactions, and recovery. All of these services work on all of the
storage structures.</p>
<p>Many users can work on the same database concurrently. Berkeley DB handles
locking transparently, ensuring that two users working on the same
record do not interfere with one another.</p>
<p>By default, the library provides strict ACID transaction semantics.
However, applications are allowed to relax the isolation guarantees
the database system makes.</p>
<p>Multiple operations can be grouped into a single transaction, and can
be committed or rolled back atomically. Berkeley DB uses a technique called
<i>two-phase locking</i> to be sure that concurrent transactions
are isolated from one another, and a technique called
<i>write-ahead logging</i> to guarantee that committed changes
survive application, system, or hardware failures.</p>
<p>When an application starts up, it can ask Berkeley DB to run recovery.
Recovery restores the database to a clean state, with all committed
changes present, even after a crash.
The database is guaranteed to be
consistent and all committed changes are guaranteed to be present when
recovery completes.</p>
<p>An application can specify, when it starts up, which data management
services it will use. Some applications need fast, single-user,
non-transactional Btree data storage. In that case, the application can
disable the locking and transaction systems, and will not incur the
overhead of locking or logging. If an application needs to support
multiple concurrent users, but doesn't need transactions, it can turn
on locking without transactions. Applications that need concurrent,
transaction-protected database access can enable all of the
subsystems.</p>
<p>In all these cases, the application uses the same function-call API to
fetch and update records.</p>
<b>Design</b>
<p>Berkeley DB was designed to provide industrial-strength database services to
application developers, without requiring them to become database
experts. It is a classic C-library style <i>toolkit</i>, providing
a broad base of functionality to application writers. Berkeley DB was designed
by programmers, for programmers: its modular design surfaces simple,
orthogonal interfaces to core services, and it provides mechanism (for
example, good thread support) without imposing policy (for example, the
use of threads is not required). Just as importantly, Berkeley DB allows
developers to balance performance against the need for crash recovery
and concurrent use. An application can use the storage structure that
provides the fastest access to its data and can request only the degree
of logging and locking that it needs.</p>
<p>Because of the tool-based approach and separate interfaces for each
Berkeley DB subsystem, you can support a complete transaction environment for
other system operations.
Berkeley DB even allows you to wrap transactions
around the standard UNIX file read and write operations! Further, Berkeley DB
was designed to interact correctly with the native system's toolset, a
feature no other database package offers. For example, Berkeley DB supports
hot backups (database backups while the database is in use) using
standard UNIX system utilities such as dump, tar, cpio, pax, or
even cp.</p>
<p>Finally, because scripting language interfaces are available for Berkeley DB
(notably Tcl and Perl), application writers can build incredibly powerful
database engines with little effort. You can build transaction-protected
database applications using your favorite scripting languages, an
increasingly important feature in a world using CGI scripts to deliver
HTML.</p>
<table width="100%"><tr><td><br></td><td align=right><a href="../intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font>
</body>
</html>