Berkeley DB's Java API
$Id: README,v 12.2 2006/08/24 14:46:10 bostic Exp $

Berkeley DB's Java API is now generated with SWIG
(http://www.swig.org). This document describes how SWIG is used: what
we trust it to do, and what we needed to work around.


Overview
========

SWIG is a tool that generates wrappers around native (C/C++) APIs for
various languages (mainly scripting languages), including Java.

By default, SWIG creates an API in the target language that exactly
replicates the native API (for example, each pointer type in the API
is wrapped as a distinct type in the language). Although this
simplifies the wrapper layer (type translation is trivial), it usually
doesn't result in a natural API in the target language.

A further constraint for Berkeley DB's Java API was backwards
compatibility. The original hand-coded Java API is in widespread use,
and it embodies many design decisions about how native types should be
represented in Java. For example, callback functions are represented
by Java interfaces that applications using Berkeley DB can implement.
The SWIG implementation was required to maintain backwards
compatibility for those applications.


Running SWIG
============

The simplest way to use SWIG is to run it with a C include file as
input. SWIG parses the file and generates wrapper code for the target
language. For Java, this includes a Java class for each C struct and
a C source file containing the Java Native Interface (JNI) function
calls for each native method.

The s_swig shell script in db/dist runs SWIG, and then post-processes
each Java source file with the sed commands in
libdb_java/java-post.sed. The Java sources are placed in
java/src/com/sleepycat/db, and the native wrapper code is in a single
file, libdb_java/db_java_wrap.c.

The post-processing step modifies code in ways that are difficult with
SWIG (given my current level of knowledge).
This includes changing some access modifiers to hide some of the
implementation methods, selectively adding "throws" clauses to
methods, and adding calls to "initialize" methods in Db and DbEnv
after they are constructed (more below on what these calls do).

In addition to the source code generated by SWIG, some of the Java
classes are written by hand, and the constants and the code that fills
statistics structures are generated by the script dist/s_java. The
native statistics code is in libdb_java/java_stat_auto.c, and is
compiled into the db_java_wrap object file with a #include directive.
This allows most functions in that object to be static, which
encourages compiler inlining and reduces the number of symbols we
export.


The Implementation
==================

For the reasons mentioned above, Berkeley DB requires a more
sophisticated mapping between the native API and Java, so additional
SWIG directives are added to the input. In particular:

* The general intention is for db.i to contain the full DB API (just
  like db.h). As much as possible, this file is kept Java-independent
  so that it can be updated easily when the API changes. SWIG doesn't
  have any built-in rules for handling function pointers in a struct,
  so each DB method must be added in a SWIG "%extend" block that
  includes the method signature and a call to the method.

  * SWIG's automatically generated function names happen to collide
    with Berkeley DB's naming convention. For example, in a SWIG class
    called __db, a method called "open" would result in a wrapper
    function called "__db_open", which already exists in DB. This is
    another reason why making these wrapper functions static is
    important.

* The main Java support starts in db_java.i. This file contains all
  Java code that is explicitly inserted into the generated classes,
  and is responsible for defining object lifecycles (handling
  allocation and cleanup).

  * Methods that need to be wrapped for special handling in Java code
    are renamed with a trailing zero (e.g., close becomes close0).
    This is invisible to applications.

  * Most wrapped DB classes have methods whose calls imply the cleanup
    of any native resources associated with the Java object (for
    example, Db.close or DbTxn.abort). These methods are wrapped so
    that if the object is accessed after its native part has been
    destroyed, an exception is thrown instead of a native crash that
    takes down the JVM.

  * Db and DbEnv initialization is more complex: a global reference is
    stored in the corresponding struct so that native code can
    efficiently map back to Java code. In addition, if a Db is
    created without an environment (i.e., in a private environment),
    the initialization wraps the internal DbEnv to simplify handling
    of the various Db methods that just call the corresponding DbEnv
    method (like err, errx, etc.). The global references must be
    released before the DB and DB_ENV handles are closed; otherwise
    they leak and the Java objects can never be garbage collected.

  * DbLock and DbLsn have no such cleanup methods. For these classes,
    a finalize method does the appropriate cleanup. No other classes
    have finalize methods (in particular, the Dbt class is now
    implemented entirely in Java, so no finalization is necessary).

* Overall initialization code, including the System.loadLibrary call,
  is in java_util.i. This includes looking up all class, field and
  method handles once, so that execution is not slowed down by
  repeated runtime type queries.

* Exception handling is in java_except.i. The main non-obvious design
  choice was to create a db_ret_t type for methods that return an
  error code as an int in the C API, but return void in the Java API
  (and throw exceptions on error).

  * The only other odd case with exceptions is DbMemoryException. It
    is thrown as normal when a call returns ENOMEM, but there is
    special handling for the case where a Dbt with DB_DBT_USERMEM is
    not big enough to hold a result: in this case, the Dbt handling
    code calls the method update_dbt on the exception that is about to
    be thrown, to register the failed Dbt in the exception.

* Statistics handling is in java_stat.i, which mainly just hooks into
  the automatically generated code in java_stat_auto.c.

* Callbacks: the general approach is that Db and DbEnv maintain
  references to the objects that handle each callback, and have a
  helper method for each call. This is primarily to simplify the
  native code, and it also performs better than more complex native
  code would.

  * Compared with the old hand-coded API, the new implementation is
    more careful about calling DeleteLocalRef on objects created for
    callbacks. This is particularly important for callbacks like
    bt_compare, which may be called repeatedly from native code.
    Without the DeleteLocalRef calls, the Java objects that are
    created cannot be collected until the original call returns.

* Most of the rest of the code is in java_typemaps.i. A typemap is a
  rule describing how a native type is mapped onto a Java type for
  parameters and return values. The typemaps handle most of the
  complexity of creating exactly the Java API we want.

  * One of the main areas of complexity is Dbt handling. The approach
    taken is to accept whatever data is passed in by the application,
    pass that to native code, and reflect any changes to the native
    DBT back into the Java object. In other words, the Dbt typemaps
    don't replicate DB's rules about whether Dbts will be modified or
    not; they just pass the data through.

  * As noted above, when a Dbt is "released" (i.e., no longer needed
    in native code), one of the checks is whether a DbMemoryException
    is pending, and if so, whether this Dbt might be the cause. In
    that case, the Dbt is added to the exception via the "update_dbt"
    method.

* Constant handling has been simplified by making DbConstants an
  interface. This allows the Db class to inherit the constants, and
  allows javac to inline most of them.

  * The danger here is that an application may be compiled against one
    version of db.jar but run against another, in which case the
    inlined values are those of the compile-time version. This danger
    existed previously, but was partly ameliorated by a separation of
    constants into "case" and "non-case" constants (the non-case
    constants were arranged so they could not be inlined). The only
    complete solution to this problem is for applications to check the
    version returned by DbEnv.get_version* against the Db.DB_VERSION*
    constants.


Application-visible changes
===========================

* The new API is around 5x faster for many operations.

* Some internal methods and constructors that were previously public
  have been hidden or removed.

* A few inconsistent methods have been cleaned up (e.g., Db.close now
  returns void; it used to return an int that was always zero). The
  synchronized attribute has been toggled on some methods, in an
  attempt to prevent multi-threaded applications from shooting
  themselves in the foot by calling close() or similar methods
  concurrently from multiple threads.
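The close() changes listed above, together with the use-after-close wrapping described under "The Implementation", can be sketched in pure Java. This is a minimal sketch under stated assumptions, not the generated Berkeley DB code: HandleSketch, close0, checkOpen and get are hypothetical names, a plain long field stands in for the native pointer so the example runs without JNI, and IllegalStateException is assumed as the error type.

```java
// Hypothetical sketch: HandleSketch, close0, checkOpen and get are
// illustrative names, not Berkeley DB's generated classes or methods.
class HandleSketch {
    // Stand-in for the native pointer a SWIG proxy stores;
    // 0 means the native part has been destroyed.
    private long cPtr;

    HandleSketch(long cPtr) {
        this.cPtr = cPtr;
    }

    // Stand-in for the renamed wrapper (close becomes close0). Real code
    // would call into JNI here and free the underlying C handle.
    private void close0() {
    }

    // Throw a Java exception instead of dereferencing a freed pointer.
    // IllegalStateException is an assumption made for this sketch.
    private void checkOpen() {
        if (cPtr == 0) {
            throw new IllegalStateException("handle already closed");
        }
    }

    // synchronized: two threads cannot both free the native handle.
    // void: no always-zero int return value.
    public synchronized void close() {
        checkOpen();
        close0();
        cPtr = 0;
    }

    public synchronized boolean isOpen() {
        return cPtr != 0;
    }

    // Placeholder for a method that would normally call native code.
    public synchronized String get(String key) {
        checkOpen();
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        HandleSketch h = new HandleSketch(42L);
        System.out.println(h.get("k")); // prints "value-for-k"
        h.close();
        try {
            h.get("k");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Making close() synchronized only serializes calls on the same object; other methods that could race with close() are protected only if they are synchronized too, which is why get is also synchronized in this sketch.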