Notes: 2001-09-24
-----------------

This "description" (if one chooses to call it that) needed some major updating
so here goes. This update addresses a change being made at the same time to
OpenSSL, and it pretty much completely restructures the underlying mechanics of
the "ENGINE" code. So it serves a double purpose of being an "ENGINE internals
for masochists" document *and* a rather extensive commit log message. (I'd get
lynched for sticking all this in CHANGES or the commit mails :-).

ENGINE_TABLE underlies this restructuring, as described in the internal header
"eng_int.h", implemented in eng_table.c, and used in each of the "class" files;
tb_rsa.c, tb_dsa.c, etc.

However, "EVP_CIPHER" underlies the motivation and design of ENGINE_TABLE so
I'll mention a bit about that first. EVP_CIPHER (and most of this applies
equally to EVP_MD for digests) is both a "method" and an algorithm/mode
identifier that, in the current API, "lingers". These cipher description +
implementation structures can be defined or obtained directly by applications,
or can be loaded "en masse" into EVP storage so that they can be catalogued and
searched in various ways, ie. two ways of encrypting with the "des_cbc"
algorithm/mode pair are;

(i) directly;
     const EVP_CIPHER *cipher = EVP_des_cbc();
     EVP_EncryptInit(&ctx, cipher, key, iv);
     [ ... use EVP_EncryptUpdate() and EVP_EncryptFinal() ...]

(ii) indirectly;
     OpenSSL_add_all_ciphers();
     cipher = EVP_get_cipherbyname("des-cbc");
     EVP_EncryptInit(&ctx, cipher, key, iv);
     [ ... etc ... ]

The latter is more generally used because it also allows ciphers/digests to be
looked up based on other identifiers which can be useful for automatic cipher
selection, eg. in SSL/TLS, or by user-controllable configuration.

The important point about this is that EVP_CIPHER definitions and structures are
passed around with impunity and there is no safe way, without requiring massive
rewrites of many applications, to assume that EVP_CIPHERs can be reference
counted. Once an EVP_CIPHER is exposed to the caller, neither it nor anything it
comes from can "safely" be destroyed. Unless of course the way of getting to
such ciphers is via entirely distinct API calls that didn't exist before.
However, existing API usage cannot be made to understand when an EVP_CIPHER
pointer, that has been passed to the caller, is no longer being used.

The other problem with the existing API w.r.t. hooking EVP_CIPHER support
into ENGINE is storage - the OBJ_NAME-based storage used by EVP to register
ciphers simultaneously registers cipher *types* and cipher *implementations* -
they are effectively the same thing, an "EVP_CIPHER" pointer. The problem with
hooking in ENGINEs is that multiple ENGINEs may implement the same ciphers. The
solution is necessarily that ENGINE-provided ciphers simply are not registered,
stored, or exposed to the caller in the same manner as existing ciphers. This is
especially necessary considering the fact ENGINE uses reference counts to allow
for cleanup, modularity, and DSO support - yet EVP_CIPHERs, as exposed to
callers in the current API, support no such controls.

Another sticking point for integrating cipher support into ENGINE is linkage.
Already there is a problem with the way ENGINE supports RSA, DSA, etc whereby
they are available *because* they're part of a giant ENGINE called "openssl".
Ie. all implementations *have* to come from an ENGINE, but we get round that by
having a giant ENGINE with all the software support encapsulated. This creates
linker hassles if nothing else - linking a 1-line application that calls 2 basic
RSA functions (eg. "RSA_free(RSA_new());") will result in large quantities of
ENGINE code being linked in *and* because of that DSA, DH, and RAND also. If we
continue with this approach for EVP_CIPHER support (even if it *was* possible)
we would lose our ability to link selectively by selectively loading certain
implementations of certain functionality. Touching any part of any kind of
crypto would result in massive static linkage of everything else. So the
solution is to change the way ENGINE feeds existing "classes", ie. how the
hooking to ENGINE works from RSA, DSA, DH, RAND, as well as adding new hooking
for EVP_CIPHER, and EVP_MD.

The way this is now being done is by mostly reverting to how things used to
work prior to ENGINE :-). Ie. RSA now has a "RSA_METHOD" pointer again - this
was previously replaced by an "ENGINE" pointer and all RSA code that required
the RSA_METHOD would call ENGINE_get_RSA() each time on its ENGINE handle to
temporarily get and use the ENGINE's RSA implementation. Apart from being more
efficient, switching back to each RSA having an RSA_METHOD pointer also allows
us to conceivably operate with *no* ENGINE. As we'll see, this removes any need
for a fallback ENGINE that encapsulates default implementations - we can simply
have our RSA structure pointing its RSA_METHOD pointer to the software
implementation and have its ENGINE pointer set to NULL.

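To make the shape of that concrete, here is a toy model in plain C. It is only
a sketch: every toy_* name is invented for illustration and none of it is the
real OpenSSL API or its structures. Each key stores its method pointer once, so
no per-operation lookup through an ENGINE is needed, and a NULL engine pointer
simply means the software implementation is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only: every toy_* name is hypothetical, not OpenSSL API. */
typedef struct toy_rsa_method {
    const char *name;
    int (*sign)(int input);          /* placeholder for a real RSA operation */
} toy_rsa_method;

typedef struct toy_engine {
    const char *id;
    const toy_rsa_method *rsa_meth;  /* NULL if the engine offers no RSA */
} toy_engine;

typedef struct toy_rsa {
    const toy_rsa_method *meth;      /* used directly on every operation */
    toy_engine *engine;              /* NULL means no ENGINE is involved */
} toy_rsa;

static int toy_soft_sign(int input) { return input + 1; }
static int toy_hw_sign(int input)   { return input + 1000; }

const toy_rsa_method toy_soft_rsa = { "software", toy_soft_sign };
const toy_rsa_method toy_hw_rsa   = { "hardware", toy_hw_sign };

/* Bind the key to the engine's method if there is one, else to software. */
void toy_rsa_init(toy_rsa *rsa, toy_engine *e)
{
    assert(rsa != NULL);
    if (e != NULL && e->rsa_meth != NULL) {
        rsa->meth = e->rsa_meth;
        rsa->engine = e;
    } else {
        rsa->meth = &toy_soft_rsa;
        rsa->engine = NULL;
    }
}

/* No lookup through the engine here: the method pointer is already set. */
int toy_rsa_sign(const toy_rsa *rsa, int input)
{
    assert(rsa != NULL && rsa->meth != NULL);
    return rsa->meth->sign(input);
}
```

Calling toy_rsa_init() with a NULL engine leaves the key on toy_soft_rsa with
no engine attached; passing an engine that carries toy_hw_rsa routes
toy_rsa_sign() through that method instead.
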
A look at the EVP_CIPHER hooking is most explanatory; the RSA, DSA (etc) cases
turn out to be degenerate forms of the same thing. The EVP storage of ciphers,
and the existing EVP API functions that return "software" implementations and
descriptions remain untouched. However, the storage takes more meaning in terms
of "cipher description" and less meaning in terms of "implementation". When an
EVP_CIPHER_CTX is actually initialised with an EVP_CIPHER method and is about to
begin en/decryption, the hooking to ENGINE comes into play. What happens is that
cipher-specific ENGINE code is asked for an ENGINE pointer (a functional
reference) for any ENGINE that is registered to perform the algo/mode that the
provided EVP_CIPHER structure represents. Under normal circumstances, that
ENGINE code will return NULL because no ENGINEs will have had any cipher
implementations *registered*. As such, a NULL ENGINE pointer is stored in the
EVP_CIPHER_CTX context, and the EVP_CIPHER structure is left hooked into the
context and so is used as the implementation. Pretty much how things work now
except we'd have a redundant ENGINE pointer set to NULL and doing nothing.

Conversely, if an ENGINE *has* been registered to perform the algorithm/mode
combination represented by the provided EVP_CIPHER, then a functional reference
to that ENGINE will be returned to the EVP_CIPHER_CTX during initialisation.
That functional reference will be stored in the context (and released on
cleanup) - and having that reference provides a *safe* way to use an EVP_CIPHER
definition that is private to the ENGINE. Ie. the EVP_CIPHER provided by the
application will actually be replaced by an EVP_CIPHER from the registered
ENGINE - it will support the same algorithm/mode as the original but will be a
completely different implementation. Because this EVP_CIPHER isn't stored in the
EVP storage, nor is it returned to applications from traditional API functions,
there is no associated problem with it not having reference counts. And of
course, when one of these "private" cipher implementations is hooked into
EVP_CIPHER_CTX, it is done whilst the EVP_CIPHER_CTX holds a functional
reference to the ENGINE that owns it, thus the use of the ENGINE's EVP_CIPHER is
safe.

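The two cases above (no ENGINE registered, and an ENGINE registered for the
cipher's nid) can be sketched with a toy model. This is an illustration under
loud assumptions: the toy_* names, the single-slot "registry" and the nid
value used by callers are all made up, and the real logic lives in the EVP and
tb_cipher.c code rather than here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the OpenSSL structures. */
typedef struct toy_cipher {
    int nid;                    /* identifies the algorithm/mode */
    const char *impl;           /* who implements it */
} toy_cipher;

typedef struct toy_engine {
    int funct_ref;              /* functional reference count */
    const toy_cipher *cipher;   /* implementation private to the engine */
} toy_engine;

typedef struct toy_cipher_ctx {
    const toy_cipher *cipher;   /* implementation actually in use */
    toy_engine *engine;         /* NULL unless an engine was hooked in */
} toy_cipher_ctx;

/* A one-slot "table"; NULL means nothing has been registered. */
static toy_engine *toy_registered = NULL;

void toy_register_cipher_engine(toy_engine *e) { toy_registered = e; }

/* Return a functional reference to an engine for this nid, or NULL. */
toy_engine *toy_get_cipher_engine(int nid)
{
    toy_engine *e = toy_registered;
    if (e == NULL || e->cipher == NULL || e->cipher->nid != nid)
        return NULL;
    e->funct_ref++;
    return e;
}

/* If an engine is found, its private cipher replaces the caller's one. */
void toy_ctx_init(toy_cipher_ctx *ctx, const toy_cipher *c)
{
    toy_engine *e;
    assert(ctx != NULL && c != NULL);
    e = toy_get_cipher_engine(c->nid);
    ctx->engine = e;
    ctx->cipher = (e != NULL) ? e->cipher : c;
}

/* Releasing the context releases the functional reference it held. */
void toy_ctx_cleanup(toy_cipher_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->engine != NULL) {
        ctx->engine->funct_ref--;
        ctx->engine = NULL;
    }
    ctx->cipher = NULL;
}
```

The point of the sketch is the ownership rule: the engine's private cipher is
only ever reachable through a context that holds a functional reference, so
the cipher itself needs no reference count.
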
The "cipher-specific ENGINE code" I mentioned is implemented in tb_cipher.c but
in essence it is simply an instantiation of "ENGINE_TABLE" code for use by
EVP_CIPHER code. tb_digest.c is virtually identical but, of course, it is for
use by EVP_MD code. Ditto for tb_rsa.c, tb_dsa.c, etc. These instantiations of
ENGINE_TABLE essentially provide linker-separation of the classes so that even
if ENGINEs implement *all* possible algorithms, an application using only
EVP_CIPHER code will link at most code relating to EVP_CIPHER, tb_cipher.c, core
ENGINE code that is independent of class, and of course the ENGINE
implementation that the application loaded. It will *not* however link any
class-specific ENGINE code for digests, RSA, etc nor will it bleed over into
other APIs, such as the RSA/DSA/etc library code.

ENGINE_TABLE is a little more complicated than may seem necessary but this is
mostly to avoid a lot of "init()"-thrashing on ENGINEs (that may have to load
DSOs, and other expensive setup that shouldn't be thrashed unnecessarily) *and*
to duplicate "default" behaviour. Basically an ENGINE_TABLE instantiation, for
example tb_cipher.c, implements a hash-table keyed by integer "nid" values.
These nids provide the uniqueness of an algorithm/mode - and each nid will hash
to a potentially NULL "ENGINE_PILE". An ENGINE_PILE is essentially a list of
pointers to ENGINEs that implement that particular 'nid'. Each "pile" uses some
caching tricks such that requests on that 'nid' will be cached and all future
requests will return immediately (well, at least with minimal operation) unless
a change is made to the pile, eg. perhaps an ENGINE was unloaded. The reason is
that an application could have support for 10 ENGINEs statically linked
in, and the machine in question may not have any of the hardware those 10
ENGINEs support. If each of those ENGINEs has a "des_cbc" implementation, we
want to avoid every EVP_CIPHER_CTX setup from trying (and failing) to initialise
each of those 10 ENGINEs. Instead, the first such request will try to do that
and will either return (and cache) a NULL ENGINE pointer or will return a
functional reference to the first that successfully initialised. In the latter
case it will also cache an extra functional reference to the ENGINE as a
"default" for that 'nid'. The caching is acknowledged by a 'uptodate' variable
that is unset only if un/registration takes place on that pile. Ie. if
implementations of "des_cbc" are added or removed. This behaviour can be
tweaked; the ENGINE_TABLE_FLAG_NOINIT value can be passed to
ENGINE_set_table_flags(), in which case the only ENGINEs that tb_cipher.c will
try to initialise from the "pile" will be those that are already initialised
(ie. it's simply an increment of the functional reference count, and no real
"initialisation" will take place).

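A much simplified toy version of one "pile" and its cache might look like the
following. Everything here is an assumption made for illustration: the toy_*
names are invented, the pile is a fixed array rather than whatever eng_table.c
really uses, and the NOINIT behaviour is modelled as a plain function argument
instead of a table-wide flag. It only aims to show why the expensive init()
attempts happen once per change to the pile rather than once per request.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PILE_MAX 10

typedef struct toy_engine {
    int hw_present;   /* would a real init() succeed on this machine? */
    int init_calls;   /* number of expensive init attempts made */
    int funct_ref;    /* functional reference count */
} toy_engine;

typedef struct toy_pile {
    toy_engine *engines[TOY_PILE_MAX];
    int count;
    toy_engine *funct;   /* cached default; holds its own reference */
    int uptodate;        /* cleared whenever the pile changes */
} toy_pile;

/* Already-initialised engines just gain a reference; others try init. */
static int toy_engine_init(toy_engine *e)
{
    if (e->funct_ref == 0) {
        e->init_calls++;
        if (!e->hw_present)
            return 0;
    }
    e->funct_ref++;
    return 1;
}

int toy_pile_register(toy_pile *p, toy_engine *e)
{
    assert(p != NULL && e != NULL);
    if (p->count >= TOY_PILE_MAX)
        return 0;
    p->engines[p->count++] = e;
    p->uptodate = 0;               /* force a rescan on the next request */
    return 1;
}

/* Return a functional reference for the caller, or NULL if none works. */
toy_engine *toy_pile_select(toy_pile *p, int noinit)
{
    int i;
    assert(p != NULL);
    if (!p->uptodate) {
        if (p->funct != NULL) {    /* drop the stale cached default */
            p->funct->funct_ref--;
            p->funct = NULL;
        }
        for (i = 0; i < p->count && p->funct == NULL; i++) {
            toy_engine *e = p->engines[i];
            if (noinit && e->funct_ref == 0)
                continue;          /* only use already-initialised ones */
            if (toy_engine_init(e))
                p->funct = e;      /* this reference belongs to the cache */
        }
        p->uptodate = 1;           /* a NULL result is cached as well */
    }
    if (p->funct != NULL)
        p->funct->funct_ref++;     /* this reference belongs to the caller */
    return p->funct;
}
```

With one engine whose hardware is absent and one whose hardware is present in
the pile, the first toy_pile_select() tries each once; later calls return the
cached default without touching init_calls until something is registered
again.
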
RSA, DSA, DH, and RAND all have their own ENGINE_TABLE code as well, and the
difference is that they all use an implicit 'nid' of 1. Whereas EVP_CIPHERs are
actually qualitatively different depending on 'nid' (the "des_cbc" EVP_CIPHER is
not an interoperable implementation of "aes_256_cbc"), RSA_METHODs are
necessarily interoperable and don't have different flavours, only different
implementations. In other words, the ENGINE_TABLE for RSA will either be empty,
or will have a single ENGINE_PILE hashed to by the 'nid' 1 and that pile
represents ENGINEs that implement the single "type" of RSA there is.

Cleanup - the registration and unregistration may pose questions about how
cleanup works with the ENGINE_PILE doing all this caching nonsense (ie. when the
application or EVP_CIPHER code releases its last reference to an ENGINE, the
ENGINE_PILE code may still have references and thus those ENGINEs will stay
hooked in forever). The way this is handled is via "unregistration". With these
new ENGINE changes, an abstract ENGINE can be loaded and initialised, but that
is an algorithm-agnostic process. Even if initialised, it will not have
registered any of its implementations (to do so would link all class "table"
code despite the fact the application may use only ciphers, for example). This
is deliberately a distinct step. Moreover, registration and unregistration have
nothing to do with whether an ENGINE is *functional* or not (ie. you can even
register an ENGINE and its implementations without it being operational, you may
not even have the drivers to make it operate). What actually happens with
respect to cleanup is managed inside eng_lib.c with the "engine_cleanup_***"
functions. These functions are internal-only and each part of ENGINE code that
could require cleanup will, upon performing its first allocation, register a
callback with the "engine_cleanup" code. The other part of this that makes it
tick is that the ENGINE_TABLE instantiations (tb_***.c) use NULL as their
initialised state. So if RSA code asks for an ENGINE and no ENGINE has
registered an implementation, the code will simply return NULL and the tb_rsa.c
state will be unchanged. Thus, no cleanup is required unless registration takes
place. ENGINE_cleanup() will simply iterate across a list of registered cleanup
callbacks calling each in turn, and will then internally delete its own storage
(a STACK). When a cleanup callback is next registered (eg. if the cleanup() is
part of a graceful restart and the application wants to clean up all state then
start again), the internal STACK storage will be freshly allocated. This is much
the same as the situation in the ENGINE_TABLE instantiations ... NULL is the
initialised state, so only modification operations (not queries) will cause that
code to have to register a cleanup.

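The "NULL is the initialised state" idea and the callback list can be modelled
in a few lines. Again this is a sketch with invented toy_* names, not the
eng_lib.c code: a growable array stands in for the STACK, and a single int
stands in for a class table. Queries never allocate, the first modification
registers the cleanup callback, and the global cleanup returns everything to
the NULL state so a restart begins from scratch.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_cleanup_cb)(void);

/* NULL is the initialised state: nothing exists until first needed. */
static toy_cleanup_cb *toy_stack = NULL;
static size_t toy_stack_len = 0;

size_t toy_cleanup_pending(void) { return toy_stack_len; }

int toy_cleanup_add(toy_cleanup_cb cb)
{
    toy_cleanup_cb *grown =
        realloc(toy_stack, (toy_stack_len + 1) * sizeof(toy_cleanup_cb));
    if (grown == NULL)
        return 0;
    toy_stack = grown;
    toy_stack[toy_stack_len++] = cb;
    return 1;
}

/* Call every registered callback, then delete our own storage too. */
void toy_cleanup(void)
{
    size_t i;
    for (i = 0; i < toy_stack_len; i++)
        toy_stack[i]();
    free(toy_stack);
    toy_stack = NULL;
    toy_stack_len = 0;
}

/* A stand-in for one class table (think tb_rsa.c), NULL until used. */
static int *toy_rsa_table = NULL;
int toy_table_cleanups_run = 0;

static void toy_rsa_table_cleanup(void)
{
    free(toy_rsa_table);
    toy_rsa_table = NULL;
    toy_table_cleanups_run++;
}

/* Query: never allocates, so it never needs a cleanup registered. */
int toy_rsa_table_query(void)
{
    return (toy_rsa_table != NULL) ? *toy_rsa_table : 0;
}

/* Modification: the first allocation also registers the cleanup. */
int toy_rsa_table_register(int engine_id)
{
    if (toy_rsa_table == NULL) {
        toy_rsa_table = malloc(sizeof(*toy_rsa_table));
        if (toy_rsa_table == NULL)
            return 0;
        if (!toy_cleanup_add(toy_rsa_table_cleanup)) {
            free(toy_rsa_table);
            toy_rsa_table = NULL;
            return 0;
        }
    }
    *toy_rsa_table = engine_id;
    return 1;
}
```

Because both the table and the callback list fall back to NULL, calling
toy_cleanup() and then registering again behaves exactly like a first use.
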
What else? The bignum callbacks and associated ENGINE functions have been
removed for two obvious reasons; (i) there was no way to generalise them to the
mechanism now used by RSA/DSA/..., because there's no such thing as a BIGNUM
method, and (ii) because of (i), there was no meaningful way for library or
application code to automatically hook and use ENGINE supplied bignum functions
anyway. Also, ENGINE_cpy() has been removed (although an internal-only version
exists) - the idea of providing an ENGINE_cpy() function probably wasn't a good
one and now certainly doesn't make sense in any generalised way. Some of the
RSA, DSA, DH, and RAND functions that were fiddled during the original ENGINE
changes have now, as a consequence, been reverted. This is because the
hooking of ENGINE is now automatic (and passive, it can internally use a NULL
ENGINE pointer to simply ignore ENGINE from then on).

Hell, that should be enough for now ... comments welcome: geoff@openssl.org