1<!--$Id: terrain.so,v 10.4 2001/05/05 01:49:26 bostic Exp $--> 2<!--Copyright (c) 1997,2008 Oracle. All rights reserved.--> 3<!--See the file LICENSE for redistribution information.--> 4<html> 5<head> 6<title>Berkeley DB Reference Guide: Mapping the terrain: theory and practice</title> 7<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit."> 8<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++"> 9</head> 10<body bgcolor=white> 11<table width="100%"><tr valign=top> 12<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></b></td> 13<td align=right><a href="../intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a> 14</td></tr></table> 15<p align=center><b>Mapping the terrain: theory and practice</b></p> 16<p>The first step in selecting a database system is figuring out what the 17choices are. Decades of research and real-world deployment have produced 18countless systems. We need to organize them somehow to reduce the number 19of options.</p> 20<p>One obvious way to group systems is to use the common labels that 21vendors apply to them. The buzzwords here include "network," 22"relational," "object-oriented," and "embedded," with some 23cross-fertilization like "object-relational" and "embedded network". 24Understanding the buzzwords is important. Each has some grounding in 25theory, but has also evolved into a practical label for categorizing 26systems that work in a certain way.</p> 27<p>All database systems, regardless of the buzzwords that apply to them, 28provide a few common services. All of them store data, for example. 29We'll begin by exploring the common services that all systems provide, 30and then examine the differences among the different kinds of systems.</p> 31<b>Data access and data management</b> 32<p>Fundamentally, database systems provide two services.</p> 33<p>The first service is <i>data access</i>. Data access means adding 34new data to the database (inserting), finding data of interest 35(searching), changing data already stored (updating), and removing data 36from the database (deleting). All databases provide these services. How 37they work varies from category to category, and depends on the record 38structure that the database supports.</p> 39<p>Each record in a database is a collection of values. For example, the 40record for a Web site customer might include a name, email address, 41shipping address, and payment information. Records are usually stored 42in tables. Each table holds records of the same kind. For example, the 43<b>customer</b> table at an e-commerce Web site might store the 44customer records for every person who shopped at the site. Often, 45database records have a different structure from the structures or 46instances supported by the programming language in which an application 47is written. As a result, working with records can mean:</p> 48<p><ul type=disc> 49<li>using database operations like searches and updates on records; and 50<li>converting between programming language structures and database record 51types in the application. 52</ul> 53<p>The second service is <i>data management</i>. Data management is 54more complicated than data access. Providing good data management 55services is the hard part of building a database system. When you 56choose a database system to use in an application you build, making sure 57it supports the data management services you need is critical.</p> 58<p>Data management services include allowing multiple users to work on the 59database simultaneously (concurrency), allowing multiple records to be 60changed instantaneously (transactions), and surviving application and 61system crashes (recovery). Different database systems offer different 62data management services. Data management services are entirely 63independent of the data access services listed above. For example, 64nothing about relational database theory requires that the system 65support transactions, but most commercial relational systems do.</p> 66<p>Concurrency means that multiple users can operate on the database at 67the same time. Support for concurrency ranges from none (single-user 68access only) to complete (many readers and writers working 69simultaneously).</p> 70<p>Transactions permit users to make multiple changes appear at once. For 71example, a transfer of funds between bank accounts needs to be a 72transaction because the balance in one account is reduced and the 73balance in the other increases. If the reduction happened before the 74increase, than a poorly-timed system crash could leave the customer 75poorer; if the bank used the opposite order, then the same system crash 76could make the customer richer. Obviously, both the customer and the 77bank are best served if both operations happen at the same instant.</p> 78<p>Transactions have well-defined properties in database systems. They are 79<i>atomic</i>, so that the changes happen all at once or not at all. 80They are <i>consistent</i>, so that the database is in a legal state 81when the transaction begins and when it ends. They are typically 82<i>isolated</i>, which means that any other users in the database 83cannot interfere with them while they are in progress. And they are 84<i>durable</i>, so that if the system or application crashes after 85a transaction finishes, the changes are not lost. Together, the 86properties of <i>atomicity</i>, <i>consistency</i>, 87<i>isolation</i>, and <i>durability</i> are known as the ACID 88properties.</p> 89<p>As is the case for concurrency, support for transactions varies among 90databases. Some offer atomicity without making guarantees about 91durability. Some ignore isolatability, especially in single-user 92systems; there's no need to isolate other users from the effects of 93changes when there are no other users.</p> 94<p>Another important data management service is recovery. Strictly 95speaking, recovery is a procedure that the system carries out when it 96starts up. The purpose of recovery is to guarantee that the database is 97complete and usable. This is most important after a system or 98application crash, when the database may have been damaged. The recovery 99process guarantees that the internal structure of the database is good. 100Recovery usually means that any completed transactions are checked, and 101any lost changes are reapplied to the database. At the end of the 102recovery process, applications can use the database as if there had been 103no interruption in service.</p> 104<p>Finally, there are a number of data management services that permit 105copying of data. For example, most database systems are able to import 106data from other sources, and to export it for use elsewhere. Also, most 107systems provide some way to back up databases and to restore in the 108event of a system failure that damages the database. Many commercial 109systems allow <i>hot backups</i>, so that users can back up 110databases while they are in use. Many applications must run without 111interruption, and cannot be shut down for backups.</p> 112<p>A particular database system may provide other data management services. 113Some provide browsers that show database structure and contents. Some 114include tools that enforce data integrity rules, such as the rule that 115no employee can have a negative salary. These data management services 116are not common to all systems, however. Concurrency, recovery, and 117transactions are the data management services that most database vendors 118support.</p> 119<p>Deciding what kind of database to use means understanding the data 120access and data management services that your application needs. Berkeley DB 121is an embedded database that supports fairly simple data access with a 122rich set of data management services. To highlight its strengths and 123weaknesses, we can compare it to other database system categories.</p> 124<b>Relational databases</b> 125<p>Relational databases are probably the best-known database variant, 126because of the success of companies like Oracle. Relational databases 127are based on the mathematical field of set theory. The term "relation" 128is really just a synonym for "set" -- a relation is just a set of 129records or, in our terminology, a table. One of the main innovations in 130early relational systems was to insulate the programmer from the 131physical organization of the database. Rather than walking through 132arrays of records or traversing pointers, programmers make statements 133about tables in a high-level language, and the system executes those 134statements.</p> 135<p>Relational databases operate on <i>tuples</i>, or records, composed 136of values of several different data types, including integers, character 137strings, and others. Operations include searching for records whose 138values satisfy some criteria, updating records, and so on.</p> 139<p>Virtually all relational databases use the Structured Query Language, 140or SQL. This language permits people and computer programs to work with 141the database by writing simple statements. The database engine reads 142those statements and determines how to satisfy them on the tables in 143the database.</p> 144<p>SQL is the main practical advantage of relational database systems. 145Rather than writing a computer program to find records of interest, the 146relational system user can just type a query in a simple syntax, and 147let the engine do the work. This gives users enormous flexibility; they 148do not need to decide in advance what kind of searches they want to do, 149and they do not need expensive programmers to find the data they need. 150Learning SQL requires some effort, but it's much simpler than a 151full-blown high-level programming language for most purposes. And there 152are a lot of programmers who have already learned SQL.</p> 153<b>Object-oriented databases</b> 154<p>Object-oriented databases are less common than relational systems, but 155are still fairly widespread. Most object-oriented databases were 156originally conceived as persistent storage systems closely wedded to 157particular high-level programming languages like C++. With the spread 158of Java, most now support more than one programming language, but 159object-oriented database systems fundamentally provide the same class 160and method abstractions as do object-oriented programming languages.</p> 161<p>Many object-oriented systems allow applications to operate on objects 162uniformly, whether they are in memory or on disk. These systems create 163the illusion that all objects are in memory all the time. The advantage 164to object-oriented programmers who simply want object storage and 165retrieval is clear. They need never be aware of whether an object is in 166memory or not. The application simply uses objects, and the database 167system moves them between disk and memory transparently. All of the 168operations on an object, and all its behavior, are determined by the 169programming language.</p> 170<p>Object-oriented databases aren't nearly as widely deployed as relational 171systems. In order to attract developers who understand relational 172systems, many of the object-oriented systems have added support for 173query languages very much like SQL. In practice, though, object-oriented 174databases are mostly used for persistent storage of objects in C++ and 175Java programs.</p> 176<b>Network databases</b> 177<p>The "network model" is a fairly old technique for managing and 178navigating application data. Network databases are designed to make 179pointer traversal very fast. Every record stored in a network database 180is allowed to contain pointers to other records. These pointers are 181generally physical addresses, so fetching the record to which it refers 182just means reading it from disk by its disk address.</p> 183<p>Network database systems generally permit records to contain integers, 184floating point numbers, and character strings, as well as references to 185other records. An application can search for records of interest. After 186retrieving a record, the application can fetch any record to which it 187refers, quickly.</p> 188<p>Pointer traversal is fast because most network systems use physical disk 189addresses as pointers. When the application wants to fetch a record, 190the database system uses the address to fetch exactly the right string 191of bytes from the disk. This requires only a single disk access in all 192cases. Other systems, by contrast, often must do more than one disk read 193to find a particular record.</p> 194<p>The key advantage of the network model is also its main drawback. The 195fact that pointer traversal is so fast means that applications that do 196it will run well. On the other hand, storing pointers all over the 197database makes it very hard to reorganize the database. In effect, once 198you store a pointer to a record, it is difficult to move that record 199elsewhere. Some network databases handle this by leaving forwarding 200pointers behind, but this defeats the speed advantage of doing a single 201disk access in the first place. Other network databases find, and fix, 202all the pointers to a record when it moves, but this makes 203reorganization very expensive. Reorganization is often necessary in 204databases, since adding and deleting records over time will consume 205space that cannot be reclaimed without reorganizing. Without periodic 206reorganization to compact network databases, they can end up with a 207considerable amount of wasted space.</p> 208<b>Clients and servers</b> 209<p>Database vendors have two choices for system architecture. They can 210build a server to which remote clients connect, and do all the database 211management inside the server. Alternatively, they can provide a module 212that links directly into the application, and does all database 213management locally. In either case, the application developer needs 214some way of communicating with the database (generally, an Application 215Programming Interface (API) that does work in the process or that 216communicates with a server to get work done).</p> 217<p>Almost all commercial database products are implemented as servers, and 218applications connect to them as clients. Servers have several features 219that make them attractive.</p> 220<p>First, because all of the data is managed by a separate process, and 221possibly on a separate machine, it's easy to isolate the database server 222from bugs and crashes in the application.</p> 223<p>Second, because some database products (particularly relational engines) 224are quite large, splitting them off as separate server processes keeps 225applications small, which uses less disk space and memory. Relational 226engines include code to parse SQL statements, to analyze them and 227produce plans for execution, to optimize the plans, and to execute 228them.</p> 229<p>Finally, by storing all the data in one place and managing it with a 230single server, it's easier for organizations to back up, protect, and 231set policies on their databases. The enterprise databases for large 232companies often have several full-time administrators caring for them, 233making certain that applications run quickly, granting and denying 234access to users, and making backups.</p> 235<p>However, centralized administration can be a disadvantage in some cases. 236In particular, if a programmer wants to build an application that uses 237a database for storage of important information, then shipping and 238supporting the application is much harder. The end user needs to install 239and administer a separate database server, and the programmer must 240support not just one product, but two. Adding a server process to the 241application creates new opportunity for installation mistakes and 242run-time problems.</p> 243<table width="100%"><tr><td><br></td><td align=right><a href="../intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a> 244</td></tr></table> 245<p><font size=1>Copyright (c) 1996,2008 Oracle. All rights reserved.</font> 246</body> 247</html> 248