Berkeley DB's Java API
$Id: README,v 12.2 2006/08/24 14:46:10 bostic Exp $

Berkeley DB's Java API is now generated with SWIG
(http://www.swig.org).  This document describes how SWIG is used:
what we trust it to do, and what we needed to work around.


Overview
========

SWIG is a tool that generates wrappers around native (C/C++) APIs for
various languages (mainly scripting languages) including Java.

By default, SWIG creates an API in the target language that exactly
replicates the native API (for example, each pointer type in the API
is wrapped as a distinct type in the language).  Although this
simplifies the wrapper layer (type translation is trivial), it usually
doesn't result in a natural API in the target language.

A further constraint for Berkeley DB's Java API was backwards
compatibility.  The original hand-coded Java API is in widespread use,
and included many design decisions about how native types should be
represented in Java.  As an example, callback functions are
represented by Java interfaces that applications using Berkeley DB
could implement.  The SWIG implementation was required to maintain
backwards compatibility for those applications.


Running SWIG
============

The simplest use of SWIG is to run it with a C include file as
input.  SWIG parses the file and generates wrapper code for the target
language.  For Java, this includes a Java class for each C struct and
a C source file containing the Java Native Interface (JNI) function
calls for each native method.

The s_swig shell script in db/dist runs SWIG, and then post-processes
each Java source file with the sed commands in
libdb_java/java-post.sed.  The Java sources are placed in
java/src/com/sleepycat/db, and the native wrapper code is in a single
file in libdb_java/db_java_wrap.c.

The post-processing step modifies the code in ways that are difficult
to achieve with SWIG (given my current level of knowledge).  This
includes changing some access modifiers to hide some of the
implementation methods, selectively adding "throws" clauses to
methods, and adding calls to "initialize" methods in Db and DbEnv
after they are constructed (more below on what these calls do).
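
As a rough illustration of the kind of rewrite this step performs,
the sketch below applies two such edits to one line of generated
source.  It is written in Java rather than sed, and the patterns,
the method name and the exception name are assumptions made for the
example; the actual rules are the ones in libdb_java/java-post.sed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: mimics the style of edit made by the sed
// post-processing step, not the actual java-post.sed rules.
final class PostProcessSketch {
    private PostProcessSketch() {}

    // Drop "public" from the declaration of the named method so that
    // it becomes package-private.
    static String hideMethod(String line, String method) {
        String decl = "^(\\s*)public\\s+(.*\\b"
            + Pattern.quote(method) + "\\s*\\()";
        return line.replaceFirst(decl, "$1/* package */ $2");
    }

    // Append a throws clause to the named method's declaration,
    // unless the line already has one.
    static String addThrows(String line, String method, String exc) {
        boolean declares =
            line.matches(".*\\b" + Pattern.quote(method) + "\\s*\\(.*");
        if (!declares || line.contains(" throws ")) {
            return line;
        }
        return line.replaceFirst("\\)\\s*\\{\\s*$",
            Matcher.quoteReplacement(") throws " + exc + " {"));
    }

    public static void main(String[] args) {
        // Hypothetical generated line, rewritten in two steps.
        String line = "  public void close0() {";
        line = hideMethod(line, "close0");
        line = addThrows(line, "close0", "DbException");
        System.out.println(line);
    }
}
```

Lines that do not declare the named method pass through unchanged,
which matches the "selective" nature of the edits described above.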

In addition to the source code generated by SWIG, some of the Java
classes are written by hand, and constants and code to fill statistics
structures are generated by the script dist/s_java.  The native
statistics code is in libdb_java/java_stat_auto.c, and is compiled
into the db_java_wrap object file with a #include directive.  This
allows most functions in that object to be static, which encourages
compiler inlining and reduces the number of symbols we export.


The Implementation
==================

For the reasons mentioned above, Berkeley DB requires a more
sophisticated mapping between the native API and Java, so additional
SWIG directives are added to the input.  In particular:

* The general intention is for db.i to contain the full DB API (just
  like db.h).  As much as possible, this file is kept Java independent
  so that it can be updated easily when the API changes.  SWIG doesn't
  have any builtin rules for how to handle function pointers in a
  struct, so each DB method must be added in a SWIG "%extend" block
  which includes the method signature and a call to the method.

  * SWIG's automatically generated function names happen to collide
    with Berkeley DB's naming convention.  For example, in a SWIG class
    called __db, a method called "open" would result in a wrapper
    function called "__db_open", which already exists in DB.  This is
    another reason why making these static functions is important.

* The main Java support starts in db_java.i - this file includes all
  Java code that is explicitly inserted into the generated classes,
  and is responsible for defining object lifecycles (handling
  allocation and cleanup).

  * Methods that need to be wrapped for special handling in Java code
    are renamed with a trailing zero (e.g., close becomes close0).
    This is invisible to applications.

  * Most DB classes that are wrapped have method calls that imply the
    cleanup of any native resources associated with the Java object
    (for example, Db.close or DbTxn.abort).  These methods are wrapped
    so that if the object is accessed after the native part has been
    destroyed, an exception is thrown rather than a trap that crashes
    the JVM.

  * Db and DbEnv initialization is more complex: a global reference is
    stored in the corresponding struct so that native code can
    efficiently map back to Java code.  In addition, if a Db is
    created without an environment (i.e., in a private environment),
    the initialization wraps the internal DbEnv to simplify handling
    of various Db methods that just call the corresponding DbEnv
    method (like err, errx, etc.).  It is important that the global
    references are cleaned up before the DB and DB_ENV handles are
    closed, so the Java objects can be garbage collected.

  * DbLock and DbLsn have no method that implies cleanup of their
    native resources.  Instead, each has a finalize method that does
    the appropriate cleanup.  No other classes have finalize methods
    (in particular, the Dbt class is now implemented entirely in
    Java, so no finalization is necessary).

* Overall initialization code, including the System.loadLibrary call,
  is in java_util.i.  This includes looking up all class, field and
  method handles once so that execution is not slowed down by repeated
  runtime type queries.

* Exception handling is in java_except.i.  The main non-obvious design
  choice was to create a db_ret_t type for methods that return an
  error code as an int in the C API, but return void in the Java API
  (and throw exceptions on error).

  * The only other odd case with exceptions is DbMemoryException -
    this is thrown as normal when a call returns ENOMEM, but there is
    special handling for the case where a Dbt with DB_DBT_USERMEM is
    not big enough to handle a result: in this case, the Dbt handling
    code calls the method update_dbt on the exception that is about to
    be thrown to register the failed Dbt in the exception.

* Statistics handling is in java_stat.i - this mainly just hooks into
  the automatically-generated code in java_stat_auto.c.

* Callbacks: the general approach is that Db and DbEnv maintain
  references to the objects that handle each callback, and have a
  helper method for each call.  This is primarily to simplify the
  native code, and performs better than more complex native code.

  * One difference with the new approach is that the implementation is
    more careful about calling DeleteLocalRef on objects created for
    callbacks.  This is particularly important for callbacks like
    bt_compare, which may be called repeatedly from native code.
    Without the DeleteLocalRef calls, the Java objects that are
    created cannot be collected until the original call returns.

* Most of the rest of the code is in java_typemaps.i.  A typemap is a
  rule describing how a native type is mapped onto a Java type for
  parameters and return values.  These handle most of the complexity
  of creating exactly the Java API we want.

  * One of the main areas of complexity is Dbt handling.  The approach
    taken is to accept whatever data is passed in by the application,
    pass that to native code, and reflect any changes to the native
    DBT back into the Java object.  In other words, the Dbt typemaps
    don't replicate DB's rules about whether Dbts will be modified or
    not - they just pass the data through.

  * As noted above, when a Dbt is "released" (i.e., no longer needed
    in native code), one of the checks is whether a DbMemoryException
    is pending, and if so, whether this Dbt might be the cause.  In
    that case, the Dbt is added to the exception via the "update_dbt"
    method.

* Constant handling has been simplified by making DbConstants an
  interface.  This allows the Db class to inherit the constants, and
  most can be inlined by javac.

  * The danger here is if applications are compiled against one
    version of db.jar, but run against another.  This danger existed
    previously, but was partly ameliorated by a separation of
    constants into "case" and "non-case" constants (the non-case
    constants were arranged so they could not be inlined).  The only
    complete solution to this problem is for applications to check the
    version returned by DbEnv.get_version* versus the Db.DB_VERSION*
    constants.
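
The version check suggested in the last point can be sketched as
follows.  The compiled-in values would come from the Db.DB_VERSION*
constants and the runtime values from DbEnv.get_version*; here both
are passed in as plain ints so that the example stands alone, and the
class and method names are hypothetical.

```java
// Hypothetical helper, not part of the Berkeley DB API.
final class VersionCheckSketch {
    private VersionCheckSketch() {}

    // Returns null when the versions agree, otherwise a message
    // describing the mismatch.
    static String mismatch(int compiledMajor, int compiledMinor,
                           int runtimeMajor, int runtimeMinor) {
        if (compiledMajor == runtimeMajor
                && compiledMinor == runtimeMinor) {
            return null;
        }
        return "compiled against " + compiledMajor + "." + compiledMinor
            + ", running against " + runtimeMajor + "." + runtimeMinor;
    }

    // Fail at startup rather than misbehave later on.
    static void require(int compiledMajor, int compiledMinor,
                        int runtimeMajor, int runtimeMinor) {
        String problem = mismatch(compiledMajor, compiledMinor,
                                  runtimeMajor, runtimeMinor);
        if (problem != null) {
            throw new IllegalStateException(problem);
        }
    }

    public static void main(String[] args) {
        // Stand-in numbers: a real application would pass the inlined
        // constants and the values reported by the loaded library.
        require(4, 5, 4, 5);
        System.out.println("versions match");
    }
}
```

Because javac inlines the constants into the application's class
files, the first pair of arguments reflects the db.jar used at compile
time, which is what makes the comparison meaningful.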


Application-visible changes
===========================

* The new API is around 5x faster for many operations.

* Some internal methods and constructors that were previously public
  have been hidden or removed.

* A few methods that were inconsistent have been cleaned up (e.g.,
  Db.close now returns void; it previously returned an int that was
  always zero).  The synchronized attribute has been toggled on some
  methods - this is an attempt to prevent multi-threaded applications
  from shooting themselves in the foot by calling close() or similar
  methods concurrently from multiple threads.
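
The close0 wrapping and the synchronized close() described above fit
together roughly as in this sketch.  SketchHandle is a hypothetical
stand-in, not the generated Db class: the native pointer is faked
with a long, close0 does nothing, and the choice of
IllegalStateException is an assumption.

```java
// Hypothetical stand-in for a generated wrapper class.
class SketchHandle {
    // Non-zero while the native object exists (faked here).
    private long nativePtr = 1L;

    // Stand-in for the generated native method, renamed with a
    // trailing zero; real code would call through JNI.
    private void close0() {
    }

    // Synchronized so that concurrent close() calls cannot both
    // reach the native layer.
    public synchronized void close() {
        checkOpen();
        try {
            close0();
        } finally {
            nativePtr = 0L; // mark the native part as destroyed
        }
    }

    public synchronized boolean isOpen() {
        return nativePtr != 0L;
    }

    // Any other wrapped method would begin with the same check.
    public synchronized void put() {
        checkOpen();
    }

    private void checkOpen() {
        if (nativePtr == 0L) {
            throw new IllegalStateException("call on closed handle");
        }
    }

    public static void main(String[] args) {
        SketchHandle h = new SketchHandle();
        h.put();
        h.close();
        try {
            h.put(); // use after close: an exception, not a crash
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A second close() on the same handle fails the same check, so the
native close is reached at most once.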