<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="GENERATOR" content="Mozilla/4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) [Netscape]">
</head>
<body>

<center>
<h1>
Client/Server Interface for Berkeley DB</h1></center>

<center><i>Susan LoVerso</i>
<br><i>Rev 1.3</i>
<br><i>1999 Nov 29</i></center>

<p>We provide an interface allowing client/server access to Berkeley DB.
Our goal is to provide a client and server library that lets users separate
the functionality of their applications yet still have access to the full
benefits of Berkeley DB. We also want the interface to be seamless,
requiring minimal modification to existing applications.
<p>The client/server interface for Berkeley DB can be broken up into several
layers. At the lowest level there is the transport mechanism that sends
the messages over the network. Above that layer is the messaging
layer, which interprets what comes over the wire and bundles/unbundles message
contents. The next layer is Berkeley DB itself.
<p>The transport layer uses ONC RPC (RFC 1831) and XDR (RFC 1832).
We declare our message types and the operations supported by our program, and
the RPC library and utilities pretty much take care of the rest.
The
<i>rpcgen</i> program generates all of the low-level code needed.
We need to define both sides of the RPC.
<br>
<h2>
<a NAME="DB Modifications"></a>DB Modifications</h2>
To achieve the goal of a seamless interface, it is necessary to impose
a constraint on the application. That constraint is simply that all database
access must be done through an open environment. That is, this model
does not support standalone databases. The reason for this constraint
is so that we have an environment structure internally to store our connection
to the server.
Imposing this constraint means that we can provide
the seamless interface just by adding a single environment method: <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>.
<p>The planned interface for this method is:
<pre>DBENV->set_rpc_server(dbenv,       /* DB_ENV structure */
                      hostname,    /* Host of server */
                      cl_timeout,  /* Client timeout (sec) */
                      srv_timeout, /* Server timeout (sec) */
                      flags);      /* Flags: unused */</pre>
This new method takes the hostname of the server, then establishes our connection
and an environment on the server. If a server timeout is specified,
then we send that to the server as well (and the server may or may not
choose to use that value). This timeout is how long the server will
allow the environment to remain idle before declaring it dead and releasing
its resources on the server. The pointer to the connection is stored
on the client in the DBENV structure and is used by all other methods to
figure out with whom to communicate. If a client timeout is specified,
it indicates how long the client is willing to wait for a reply from the
server. If either timeout value is 0, then the default is used. The flags
argument is currently unused. It exists as a placeholder, and it could later
be used for specifying the authentication desired (were
we to provide an authentication scheme at some point) or for other uses not
thought of yet.
<p>This client code is part of the monolithic DB library. The user
accesses the client functions via a new flag to <a href="../docs/api_c/db_env_create.html">db_env_create()</a>.
That flag is DB_CLIENT. By using this flag the user indicates they
want to have the client methods rather than the standard methods for the
environment. Having set this flag, the user must also connect
to the server via the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
method.
<p>We need two new fields in the <i>DB_ENV</i> structure.
One is
the socket descriptor used to communicate with the server; the other is
the client identifier the server gives to us. The <i>DB</i> and
<i>DBC</i> structures only need one additional field, the client identifier. The
<i>DB_TXN</i>
structure does not need modification; we are overloading the <i>txn_id</i>
field.
<h2>
Issues</h2>
We need to figure out what to do in case of client and server crashes.
Both the client library and the server program are stateful. They
both consume local resources during the lifetime of the connection.
Should one end drop that connection, the other side needs to release those
resources.
<p>If the server crashes, then the client will get an error back.
I have chosen to implement time-outs on the client side, using a default
or allowing the application to specify one through the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
method. Either the current operation will time out waiting for the
reply or the next operation called will time out (or get back some other
kind of error regarding the server's non-existence). In any case,
if the client application gets back such an error, it should abort any
open transactions locally, close any databases, and close its environment.
It may then decide to retry connecting to the server periodically or whenever
it comes back. If the last operation a client did was a transaction
commit that did not return or timed out from the server, the client cannot
determine whether the transaction was committed, but it must release the
local transaction resources. Once the server is back up, recovery must
be run on the server. If the transaction commit completed on
the server before the crash, then the operation is redone; if the transaction
commit did not get to the server, the pieces of the transaction are undone
during recovery. The client can then re-establish its connection and begin
again. This is effectively like beginning over.
The client
cannot use IDs from its previous connection to the server. However,
if recovery is run, then consistency is assured.
<p>If the client crashes, the server needs to somehow figure this out.
The server is just sitting there waiting for a request to come in.
A server must be able to time out a client. Similar to ftpd, if a
connection is idle for N seconds, then the server decides the client is
dead and releases that client's resources, aborting any open transactions
and closing any open databases and environments. The server timing
out a client is not a trivial issue, however. The generated function
for the server just calls <i>svc_run()</i>. The server code I write
contains procedures to do specific things. We do not have access
to the code calling <i>select()</i>. Timing out the select is not
good enough even if we could do so. We want to time out idle environments,
not simply cause a time-out if the server is idle for a while. See the
discussion of the <a href="#The Server Program">server program</a> for
a description of how we accomplish this.
<p>Since rpcgen generates the main() function of the server, I do not yet
know how we are going to make the server multi-threaded or multi-process
without changing the generated code. The RPC book indicates that
the only way to accomplish this is through modifying the generated code
in the server. <b>For the moment we will ignore this issue while
we get the core server working, as it is only a performance issue.</b>
<p>We do not do any security or authentication. Someone could get
the code and modify it to spoof messages, trick the server, etc.
RPC has some amount of authentication built into it. I haven't yet
looked into it enough to know if we want to use it or just point a user at
it. The changes to the client code and
to our server procs are fairly minor.
We would have to add code to
a <i>sed</i> script or <i>awk</i> script to change the generated server
code (yet again) in the dispatch routine to perform authentication.
<p>We will need to get an official program number from Sun. We can
get this by sending mail to <i>rpc@sun.com</i>, and presumably at some point
they will send us back a program number that we will encode into our XDR
description file. Until we release this we can use a program number
in the "user defined" number space.
<br>
<h2>
<a NAME="The Server Program"></a>The Server Program</h2>
The server is a standalone program that the user builds and runs, probably
as a daemon-like process. This program is linked against the Berkeley
DB library and the RPC library (which is part of the C library on my FreeBSD
machine; others may need <i>-lrpclib</i>). The server is basically
a slave to the client process. All messages from the client are
synchronous and two-way. The server handles messages one at a time,
and sends a reply back before getting another message. There are
no asynchronous messages generated by the server to the client.
<p>We have made a choice to modify the generated code for the server.
The changes will be minimal, generally calling functions we write that
are in other source files. The first change is adding a call to our
time-out function as described below. The second change is changing
the name of the generated <i>main()</i> function to <i>__dbsrv_main()</i>,
and adding our own <i>main()</i> function so that we can parse options
and set up other initialization we require. I have a <i>sed</i> script
that is run from the distribution scripts that massages the generated code
to make these minor changes.
<p>Primarily the code needed for the server is the collection of the specified
RPC functions.
Each function receives the structure indicated, and
our code takes out what it needs and passes the information into DB itself.
The server needs to maintain a translation table for identifiers that we
pass back to the client for the environment, transaction and database handles.
<p>The table that the server maintains, assuming one client per server
process/thread, should contain: the handle to the environment, database
or transaction; a link to maintain parent/child relationships between transactions,
or between databases and cursors; this handle's identifier; a type, so that we can
return an error if the client passes us a bad id for this call; and a link to this
handle's environment entry (for time-out/activity purposes). The
table also contains, in entries used by environments, a time-out value and an
activity time stamp. Their use is described below for timing out idle
clients.
<p>Here is how we time out clients in the server. We have to modify
the generated server code, but only to add one line in the dispatch
function to run the time-out function. The call is made right before
the return of the dispatch function, after the reply is sent to the client,
so that clients aren't kept waiting for server bookkeeping activities.
This time-out function then runs every time the server processes a request.
In the time-out function we maintain a time-out hint: the earliest time
at which any open environment will time out. If the current time is less than the hint,
we know we do not need to run through the list of open handles. If
the hint has expired, then we go through the list of open environment handles,
and if they are past their expiration, we close them and clean up.
If they are not, we set up the hint for the next time.
<p>Each entry in the open handle table has a pointer back to its environment's
entry. Every operation within this environment can then update the
single environment activity record.
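<p>The time-out pass described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: the <i>env_entry</i> structure,
the <i>timeout_hint</i> variable and the <i>run_timeouts()</i> function are
hypothetical names, the table is a flat array, and "closing" an environment
merely clears a flag where the real server would abort transactions and close
databases.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified table entry for one open environment. */
struct env_entry {
	time_t last_activity;	/* Updated by every operation in this env. */
	time_t timeout;		/* Idle time allowed, in seconds. */
	int    is_open;		/* Nonzero while the environment is open. */
};

/* Earliest known expiration time; 0 means no hint has been computed. */
time_t timeout_hint = 0;

/*
 * Called once per request, after the reply has been sent.
 * Returns the number of environments that were timed out.
 */
int
run_timeouts(struct env_entry *envs, size_t n, time_t now)
{
	size_t i;
	int closed = 0;
	time_t expires, next = 0;

	assert(envs != NULL || n == 0);

	/* Hint not yet expired: skip the scan of open handles. */
	if (timeout_hint != 0 && now < timeout_hint)
		return (0);

	for (i = 0; i < n; i++) {
		if (!envs[i].is_open)
			continue;
		expires = envs[i].last_activity + envs[i].timeout;
		if (expires <= now) {
			/* A real server would release resources here. */
			envs[i].is_open = 0;
			closed++;
		} else if (next == 0 || expires < next)
			next = expires;	/* Candidate for the next hint. */
	}
	timeout_hint = next;
	return (closed);
}
```

<p>Because the scan is skipped whenever the hint has not expired, the common
case costs a single comparison per request.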
<p>Every environment can have a
different time-out. The <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server
</a>call
takes a server time-out value. If this value is 0 then a default
(currently 5 minutes) is used. This time-out value is only a hint
to the server. It may choose to disregard this value or set the time-out
based on its own implementation.
<p>For completeness, the flaws of this time-out implementation should be
pointed out. First, it is possible that a client could crash with
open handles, and no other requests come in to the server. In that case
the time-out function never gets run and those resources are not released
(until a request does come in). Second, this time-out is not exact.
The time-out function relies on its hint: after it computes a hint on one run,
an environment with an earlier time-out might be created before that hint expires.
This simply yields a handle that doesn't get released until the
original hint expires. To illustrate, consider that at the time that
the time-out function is run, the earliest time-out is 5 minutes in the
future. Soon after, a new environment is opened that has a time-out
of 1 minute. If this environment becomes idle (and other operations
are going on), the time-out function will not release that environment
until the original 5 minute hint expires. This is not a problem since
the resources will eventually be released.
<p>On a similar note, if a client crashes during an RPC, our reply generates
a SIGPIPE, and our server crashes unless we catch it. Using <i>signal(SIGPIPE,
SIG_IGN)</i> we can ignore it, and the server will go on. This is
a call in the <i>main()</i> function that we write. Eventually
this client's handles would be timed out as described above. We need
this only for the unfortunate window of a client crashing during the RPC.
<p>The options below are primarily for control of the program itself.
Details relating to databases and environments should be passed from the
client to the server, since the server can serve many clients, many environments
and many databases. Therefore it makes more sense for the client
to set the cache size of its own environment, rather than setting a default
cache size on the server that applies as a blanket to any environment it
may be called upon to open. Options are:
<ul>
<li>
<b>-t</b> to set the default time-out given to an environment.</li>

<li>
<b>-T</b> to set the maximum time-out allowed for the server.</li>

<li>
<b>-L</b> to log the execution of the server process to a specified file.</li>

<li>
<b>-v</b> to run in verbose mode.</li>

<li>
<b>-M</b> to specify the maximum number of outstanding child server
processes/threads we can have at any given time. The default is 10.
<b>[We
are not yet doing multiple threads/processes.]</b></li>
</ul>

<h2>
The Client Code</h2>
The client code contains all of the supported functions and methods used
in this model. There are several methods in the <i>__db_env</i>
and
<i>__db</i>
structures that currently do not apply, such as the callbacks. Those
fields that are not applicable to the client model point to NULL to notify
the user of their error. Some method functions remain unchanged
as well, such as the error calls.
<p>The client code contains each method function that goes along with the
<a href="#Remote Procedure Calls">RPC
calls</a> described elsewhere. The client library also contains its
own version of <a href="../docs/api_c/env_create.html">db_env_create()</a>,
which does not result in any messages going over to the server (since we
do not yet know what server we are talking to). This function sets
up the pointers to the correct client functions.
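<p>As a rough illustration of how a create function can select client methods
and leave inapplicable ones NULL, consider the sketch below. All names in it
(<i>ex_env</i>, <i>EX_CLIENT</i>, <i>ex_env_create()</i> and the method
functions) are hypothetical stand-ins and do not reflect the actual layout of
the <i>__db_env</i> structure or the value of DB_CLIENT. The "RPC" is reduced
to a counter so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define	EX_CLIENT	0x1	/* Hypothetical stand-in for DB_CLIENT. */

struct ex_env;
typedef int (*ex_open_fn)(struct ex_env *, const char *);
typedef int (*ex_callback_fn)(struct ex_env *, void (*)(int));

/* Hypothetical environment handle holding its method pointers. */
struct ex_env {
	ex_open_fn     open;		/* Has a client version. */
	ex_callback_fn set_callback;	/* No client version. */
	int            rpc_calls;	/* Counts simulated RPCs. */
};

/* Standard (local) methods. */
int
ex_local_open(struct ex_env *env, const char *home)
{
	(void)env;
	(void)home;
	return (0);
}

int
ex_local_set_callback(struct ex_env *env, void (*fn)(int))
{
	(void)env;
	(void)fn;
	return (0);
}

/* Client method: a real one would marshal and send a request. */
int
ex_client_open(struct ex_env *env, const char *home)
{
	(void)home;
	env->rpc_calls++;
	return (0);
}

/* Sets up the method pointers; sends nothing to any server. */
int
ex_env_create(struct ex_env *env, unsigned int flags)
{
	assert(env != NULL);
	env->rpc_calls = 0;
	if (flags & EX_CLIENT) {
		env->open = ex_client_open;
		env->set_callback = NULL;	/* Not applicable. */
	} else {
		env->open = ex_local_open;
		env->set_callback = ex_local_set_callback;
	}
	return (0);
}
```

<p>An application calling through the handle uses the same code in both modes;
only the flag given at creation time differs.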
<p>All of the method functions that handle the messaging have a basic flow
similar to this:
<ul>
<li>
Local arg parsing that may be needed</li>

<li>
Marshalling the message header and the arguments we need to send to the
server</li>

<li>
Sending the message</li>

<li>
Receiving a reply</li>

<li>
Unmarshalling the reply</li>

<li>
Local results processing that may be needed</li>
</ul>

<h2>
Generated Code</h2>
Almost all of the code is generated from a source file describing the interface
and an <i>awk</i> script. This awk script generates six (6)
files for us. It also modifies one. The files are:
<ol>
<li>
Client file - The generated C source file containing the client code.</li>

<li>
Client template file - The generated C template source file containing interfaces
for handling client-local issues such as resource allocation, but with
an interface consistent with the generated client code.</li>

<li>
Server file - The generated C source file containing the server code.</li>

<li>
Server template file - The generated C template source file containing interfaces
for handling server-local issues such as resource allocation and calling into
the DB library, but with an interface consistent with the generated server code.</li>

<li>
XDR file - The generated XDR message description file.</li>

<li>
Server sed file - A sed script that contains commands to apply to the server
procedure file (i.e. the real source file that the server template file
becomes) so that minor interface changes can be consistently and easily
applied to the real code.</li>

<li>
Server procedure file - This is the file that is modified by the generated
sed script.
It originated from the server template file.</li>
</ol>
The awk script reads a source file, <i>db_server/rpc.src</i>, that describes
each operation, what sorts of arguments it takes and what it returns
from the server. The syntax of the source file describes the interface
to that operation. There are four (4) parts to the syntax:
<ol>
<li>
<b>BEGIN</b> <b><i>function version# codetype</i></b> - begins a new functional
interface for the given <b><i>function</i></b>. Each function has
a <b><i>version number</i></b>; currently all of them are at version number
one (1). The <b><i>code type</i></b> indicates to the awk script
what kind of code to generate. The choices are:</li>

<ul>
<li>
<b>CODE </b>- Generate all code, and return a status value. If specified,
the client code will simply return the status to the user upon completion
of the RPC call.</li>

<li>
<b>RETCODE </b>- Generate all code and call a return function in the client
template file to deal with client issues or with other returned items.
If specified, the generated client code will call a function of the form
<i>__dbcl_&lt;name&gt;_ret()</i>,
where
&lt;name&gt; is replaced with the function name given here. This function
is placed in the template file because this indicates that something special
must occur on return. The arguments to this function are the same
as those for the client function, with the addition of the reply message
structure.</li>

<li>
<b>NOCLNTCODE - </b>Generate XDR and server code, but no corresponding
client code. (This is used for functions that are not named the same thing
on both sides. The only uses of this at the moment are db_env_create
and db_create. The environment create call to the server is actually
made from the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
method.
The db_create code exists elsewhere in the library and we
modify that code for the client call.)</li>
</ul>

<li>
<b>ARG <i>RPC-type C-type varname [list-type]</i></b> - each line of this form
describes an argument to the function. The argument is called <b><i>varname</i></b>.
The <b><i>C-type</i></b> given is what it should look like in the generated C code,
such as <b>DB *, u_int32_t, const char *</b>. The
<b><i>RPC-type</i></b>
is an indication of how the RPC request message should be constructed.
The RPC-types allowed are described below.</li>

<li>
<b>RET <i>RPC-type C-type varname [list-type]</i></b> - each line of this form
describes what the server should return from this procedure call (in addition
to a status, which is always returned and should not be specified).
The returned item is called <b><i>varname</i></b>. The <b><i>C-type</i></b>
given is what it should look like in the generated C code, such as <b>DB
*, u_int32_t, const char *</b>. The <b><i>RPC-type</i></b> is an
indication of how the RPC reply message should be constructed.
The RPC-types are described below.</li>

<li>
<b>END </b>- End the description of this function. When
the awk script encounters the <b>END</b> tag, it has all
the information it needs to construct the generated code for this function.</li>
</ol>
The <b><i>RPC-type</i></b> must be one of the following:
<ul>
<li>
<b>IGNORE </b>- This argument is not passed to the server and should be
ignored when constructing the XDR code.
<b>Only allowed for an ARG
specification.</b></li>

<li>
<b>STRING</b> - This argument is a string.</li>

<li>
<b>INT </b>- This argument is an integer of some sort.</li>

<li>
<b>DBT </b>- This argument is a DBT, resulting in its decomposition into
the request message.</li>

<li>
<b>LIST</b> - This argument is an opaque list passed to the server (NULL-terminated).
If an argument of this type is given, it must have a <b><i>list-type</i></b>
specified that is one of:</li>

<ul>
<li>
<b>STRING</b></li>

<li>
<b>INT</b></li>

<li>
<b>ID</b>.</li>
</ul>

<li>
<b>ID</b> - This argument is an identifier.</li>
</ul>
So, for example, the source for the DB->join RPC call looks like:
<pre>BEGIN   dbjoin          1       RETCODE
ARG     ID      DB *            dbp
ARG     LIST    DBC **          curs    ID
ARG     IGNORE  DBC **          dbcpp
ARG     INT     u_int32_t       flags
RET     ID      long            dbcid
END</pre>
Our first line tells us we are writing the dbjoin function. It requires
special code on the client, so we indicate that with RETCODE.
This method takes four arguments. For the RPC request we need the
database ID from the dbp, we construct a NULL-terminated list of IDs for
the cursor list, we ignore the argument used to return the cursor handle to
the user, and we pass along the flags. On the return, the reply contains
a status, by default, and additionally it contains the ID of the newly
created cursor.
<h2>
Building and Installing</h2>
I need to verify with Don Anderson, but I believe we should just build
the server program, just like we do for db_stat, db_checkpoint, etc.
Basically it can be treated as a utility program from the building and
installation perspective.
<p>As mentioned early on, in the section on <a href="#DB Modifications">DB
Modifications</a>, we have a single library, and the user accesses
the client portion by passing a flag to <a href="../docs/api_c/env_create.html">db_env_create()</a>.
The Makefile is modified to include the new files.
<p>Testing is performed in two ways. First, I have a new example program
that should become part of the example directory. It is basically
a merging of ex_access.c and ex_env.c. This example is adequate to
test basic functionality, as it just does database put/get calls and
the appropriate open and close calls. However, in order to test the full
set of functions, a more generalized scheme is required. For the moment,
I am going to modify the Tcl interface to accept the server information.
Nothing else should need to change in Tcl. Then we can either write
our own test modules or use a subset of the existing ones to test functionality
on a regular basis.
</body>
</html>