OpenSSL ASN1 Revision
=====================

This document describes some of the issues relating to the new ASN1 code.

Previous OpenSSL ASN1 problems
==============================

OK, why did the OpenSSL ASN1 code need revising in the first place? Well,
there are lots of reasons, some of which are listed below...

1. The code is difficult to read and write. For every single ASN1 structure
(e.g. SEQUENCE) four functions need to be written for new, free, encode and
decode operations. This is a very painful and error-prone operation. Very few
people have ever written any OpenSSL ASN1 and those that have usually wish
they hadn't.

2. Partly because of 1. the code is bloated and takes up a disproportionate
amount of space. The SEQUENCE encoder is particularly bad: it essentially
contains two copies of the same operation, one to compute the SEQUENCE length
and the other to encode it.

3. The code is memory based: that is, it expects to be able to read the whole
structure from memory. This is fine for small structures but if you have a
(say) 1GB PKCS#7 signedData structure it isn't such a good idea...

4. The code for the ASN1 IMPLICIT tag is evil. It is handled by temporarily
changing the tag to the expected one, attempting to read it, then changing it
back again. This means that decode buffers have to be writable even though they
are ultimately unchanged. This gets in the way of constification.

5. The handling of EXPLICIT isn't much better. It adds a chunk of code into
the decoder and encoder for every EXPLICIT tag.

6. APPLICATION and PRIVATE tags aren't supported at all.

7. Even IMPLICIT isn't complete: there is no support for implicitly tagged
types that are not OPTIONAL.

8. Much of the code assumes that a tag will fit in a single octet. This is
only true if the tag is 30 or less (mercifully tags over 30 are rare).

9. The ASN1 CHOICE type has to be largely handled manually: there aren't any
macros that properly support it.

10. Encoders have no concept of OPTIONAL and have no error checking. If the
passed structure contains a NULL in a mandatory field it will not be encoded,
resulting in an invalid structure.

11. It is tricky to add ASN1 encoders and decoders to external applications.

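To illustrate point 8, here is a minimal sketch of how the identifier octets
grow once a tag number exceeds 30. The helper asn1_put_identifier() is a
made-up name for this example and is not part of OpenSSL; the layout it
produces follows the usual BER/DER rules (low five bits all set, then the tag
number in base 128 with a continuation bit).

```c
#include <stddef.h>

/*
 * Sketch only: asn1_put_identifier() is a hypothetical helper, not an
 * OpenSSL function. It writes the identifier octets for a tag into 'out'
 * (if 'out' is not NULL) and returns the number of octets needed.
 * 'cls_and_form' carries the class bits (0xC0) and constructed bit (0x20).
 */
static size_t asn1_put_identifier(unsigned char *out,
                                  unsigned char cls_and_form,
                                  unsigned long tag)
{
    size_t n = 1, i;
    unsigned long t;

    if (tag <= 30) {
        /* Low tag number form: everything fits in one octet. */
        if (out != NULL)
            out[0] = (unsigned char)((cls_and_form & 0xE0) | tag);
        return 1;
    }

    /* High tag number form: count the base 128 digits of the tag. */
    for (t = tag; t != 0; t >>= 7)
        n++;

    if (out != NULL) {
        /* First octet: class/form bits plus the 0x1F marker. */
        out[0] = (unsigned char)((cls_and_form & 0xE0) | 0x1F);
        /* Fill the digits from the last octet backwards. */
        for (i = n - 1, t = tag; i >= 1; i--, t >>= 7) {
            out[i] = (unsigned char)(t & 0x7F);
            if (i != n - 1)
                out[i] |= 0x80;   /* continuation bit on all but the last */
        }
    }
    return n;
}
```

Code that reads or writes exactly one octet for the tag handles only the
first branch, which is why tags above 30 trip it up.
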
Template model
==============

One of the major problems with revision is the sheer volume of the ASN1 code.
Attempts to change (for example) the IMPLICIT behaviour would result in a
modification of *every* single decode function.

I decided to adopt a template based approach. I'm using the term 'template'
in a manner similar to SNACC templates: it has nothing to do with C++
templates.

A template is a description of an ASN1 module as several constant C structures.
It describes in a machine readable way exactly how the ASN1 structure should
behave. If this template contains enough detail then it is possible to write
versions of new, free, encode, decode (and possibly other operations) that
operate on templates.

Instead of having to write code to handle each operation, only a single
template needs to be written. If new operations are needed (such as a 'print'
operation) only a single new template based function needs to be written,
which will then automatically handle all existing templates.

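The idea can be sketched in a few lines of plain C. Everything below
(FIELD_TMPL, STRUCT_TMPL, the tmpl_* functions and the DEMO structure) is
invented for this illustration and is much simpler than the real OpenSSL
template structures: every field is assumed to be a 'long *' where NULL means
absent. The point is only that new, free and a completeness check are each
written once and driven by constant data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the OpenSSL template API. */
#define TMPL_OPTIONAL 0x1

typedef struct {
    const char *name;   /* field name, for diagnostics */
    size_t offset;      /* offset of a 'long *' member in the C struct */
    unsigned flags;     /* TMPL_OPTIONAL or 0 */
} FIELD_TMPL;

typedef struct {
    const FIELD_TMPL *fields;
    size_t nfields;
    size_t size;        /* sizeof the C struct being described */
} STRUCT_TMPL;

static long **field_ptr(void *obj, const FIELD_TMPL *f)
{
    return (long **)((char *)obj + f->offset);
}

/* Generic 'new': allocate the struct and mark every field absent. */
static void *tmpl_new(const STRUCT_TMPL *t)
{
    void *obj = malloc(t->size);
    size_t i;

    if (obj == NULL)
        return NULL;
    for (i = 0; i < t->nfields; i++)
        *field_ptr(obj, &t->fields[i]) = NULL;
    return obj;
}

/* Generic 'free': release every field, then the struct itself. */
static void tmpl_free(const STRUCT_TMPL *t, void *obj)
{
    size_t i;

    if (obj == NULL)
        return;
    for (i = 0; i < t->nfields; i++)
        free(*field_ptr(obj, &t->fields[i]));
    free(obj);
}

/* Generic check: 1 if every mandatory field is present, else 0. */
static int tmpl_is_complete(const STRUCT_TMPL *t, void *obj)
{
    size_t i;

    for (i = 0; i < t->nfields; i++) {
        const FIELD_TMPL *f = &t->fields[i];
        if (!(f->flags & TMPL_OPTIONAL) && *field_ptr(obj, f) == NULL)
            return 0;
    }
    return 1;
}

/* One structure, described purely as constant data. */
typedef struct {
    long *version;
    long *serial;
    long *note;         /* OPTIONAL */
} DEMO;

static const FIELD_TMPL demo_fields[] = {
    { "version", offsetof(DEMO, version), 0 },
    { "serial",  offsetof(DEMO, serial),  0 },
    { "note",    offsetof(DEMO, note),    TMPL_OPTIONAL },
};

static const STRUCT_TMPL demo_tmpl = {
    demo_fields, sizeof(demo_fields) / sizeof(demo_fields[0]), sizeof(DEMO)
};
```

Adding a second structure means adding another constant table, not another
set of functions. A check like tmpl_is_complete() is also the sort of thing
an encoder could use to refuse a NULL in a mandatory field.
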
Plans for revision
==================

The revision will consist of the following steps. Other than the first two
these can be handled in any order.

o Design and write template new, free, encode and decode operations, initially
memory based. *DONE*

o Convert existing ASN1 code to template form. *IN PROGRESS*

o Convert an existing ASN1 compiler (probably SNACC) to output templates
in OpenSSL form.

o Add support for BIO based ASN1 encoders and decoders to handle large
structures, initially blocking I/O.

o Add support for non blocking I/O: this is quite a bit harder than blocking
I/O.

o Add new ASN1 structures, such as OCSP, CRMF, S/MIME v3 (CMS), attribute
certificates etc.

Description of major changes
============================

The BOOLEAN type now takes three values. 0xff is TRUE, 0 is FALSE and -1 is
absent. The meaning of absent depends on the context. If for example the
boolean type is DEFAULT FALSE (as in the case of the critical flag for
certificate extensions) then -1 is FALSE; if DEFAULT TRUE then -1 is TRUE.
Usually the value will only ever be read via an API which will hide this from
an application.

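As a rough sketch of what such an API has to do, the three stored values can
be folded back into a plain C truth value like this. asn1_bool_resolve() is a
name made up for this example and is not an OpenSSL function.

```c
/*
 * Hypothetical helper, not part of OpenSSL. 'value' is the stored
 * BOOLEAN (0xff, 0 or -1) and 'default_is_true' says whether the ASN1
 * definition is DEFAULT TRUE. Returns 1 for TRUE and 0 for FALSE.
 */
static int asn1_bool_resolve(int value, int default_is_true)
{
    if (value == -1)            /* absent: fall back to the DEFAULT */
        return default_is_true ? 1 : 0;
    return value != 0;          /* 0xff is TRUE, 0 is FALSE */
}
```

For the critical flag of a certificate extension (DEFAULT FALSE) a stored -1
therefore reads back as FALSE.
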
There is an evil bug in the old ASN1 code that mishandles OPTIONAL with
SEQUENCE OF or SET OF. These are both implemented as a STACK structure. The
old code would omit the structure if the STACK was NULL (which is fine) or if
it had zero elements (which is NOT OK). This causes problems because an empty
SEQUENCE OF or SET OF will result in an empty STACK when it is decoded, but
when it is encoded it will be omitted, resulting in different encodings. The
new code only omits the encoding if the STACK is NULL; if it contains zero
elements it is encoded as empty.

There is an additional problem though: because an empty STACK was omitted,
sometimes the corresponding *_new() function would initialize the STACK to
empty so an application could immediately use it. With the new code the STACK
is initially NULL, so this won't work: a new STACK has to be allocated first.
One instance of this is the X509_CRL list of revoked certificates: a helper
function X509_CRL_add0_revoked() has been added for this purpose.

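The difference between the two behaviours can be shown with a toy model.
DEMO_STACK and both functions are invented for this sketch; they only compute
how many octets an OPTIONAL SEQUENCE OF field would contribute, assuming every
element encodes to elem_len octets and the total stays below 128 so the short
length form applies.

```c
#include <stddef.h>

/* Stand-in for an OpenSSL STACK: only the element count matters here. */
typedef struct {
    size_t num;
} DEMO_STACK;

/* Old behaviour: NULL and empty are both (wrongly) treated as absent. */
static size_t old_optional_seq_of_len(const DEMO_STACK *sk, size_t elem_len)
{
    if (sk == NULL || sk->num == 0)
        return 0;
    return 2 + sk->num * elem_len;      /* tag + length + contents */
}

/* New behaviour: only NULL is absent; empty encodes as tag + zero length. */
static size_t new_optional_seq_of_len(const DEMO_STACK *sk, size_t elem_len)
{
    if (sk == NULL)
        return 0;
    return 2 + sk->num * elem_len;      /* tag + length + contents */
}
```

An empty stack gives 0 octets under the old rule and 2 octets (30 00 for a
SEQUENCE OF) under the new one, which is what lets a decoded empty list
re-encode to the same bytes.
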
The X509_ATTRIBUTE structure used to have an element called 'set' which took
the value 1 if the attribute value was a SET OF or 0 if it was a single value.
Due to the behaviour of CHOICE in the new code this has been changed to a field
called 'single' which is 0 for a SET OF and 1 for a single value. The old field
has been deleted to deliberately break source compatibility. Since this
structure is normally accessed via higher level functions this shouldn't break
too much.

The X509_REQ_INFO certificate request info structure no longer has a field
called 'req_kludge'. This used to be set to 1 if the attributes field was
(incorrectly) omitted. You can check to see if the field is omitted now by
checking if the attributes field is NULL. Similarly if you need to omit
the field then free attributes and set it to NULL.

The top level 'detached' field in the PKCS7 structure is no longer set when
a PKCS#7 structure is read in. PKCS7_is_detached() should be called instead.
The behaviour of PKCS7_get_detached() is unaffected.

The values of 'type' in the GENERAL_NAME structure have changed. This is
because the old code used the ASN1 initial octet as the selector. The new
code uses the index in the ASN1_CHOICE template.

The DIST_POINT_NAME structure has changed to be a true CHOICE type.

typedef struct DIST_POINT_NAME_st {
	int type;
	union {
		STACK_OF(GENERAL_NAME) *fullname;
		STACK_OF(X509_NAME_ENTRY) *relativename;
	} name;
} DIST_POINT_NAME;

This means that name.fullname or name.relativename should be set
and type reflects the option. That is if name.fullname is set then
type is 0 and if name.relativename is set type is 1.

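A small self-contained sketch of keeping the selector and the union in step
is shown below. The DEMO_* types and demo_dpn_* functions are stand-ins
invented for this example so that it compiles without OpenSSL; only the
type/union layout mirrors the structure above.

```c
#include <stddef.h>

/* Stand-ins for STACK_OF(GENERAL_NAME) and STACK_OF(X509_NAME_ENTRY). */
typedef struct { int unused; } DEMO_GENERAL_NAMES;
typedef struct { int unused; } DEMO_RDN;

typedef struct {
    int type;   /* 0: name.fullname is set, 1: name.relativename is set */
    union {
        DEMO_GENERAL_NAMES *fullname;
        DEMO_RDN *relativename;
    } name;
} DEMO_DIST_POINT_NAME;

/* Always set the selector together with the matching union member. */
static void demo_dpn_set_fullname(DEMO_DIST_POINT_NAME *dpn,
                                  DEMO_GENERAL_NAMES *gens)
{
    dpn->type = 0;
    dpn->name.fullname = gens;
}

static void demo_dpn_set_relativename(DEMO_DIST_POINT_NAME *dpn,
                                      DEMO_RDN *rdn)
{
    dpn->type = 1;
    dpn->name.relativename = rdn;
}

/* Only read a union member after checking the selector. */
static DEMO_GENERAL_NAMES *demo_dpn_get_fullname(const DEMO_DIST_POINT_NAME *dpn)
{
    return dpn->type == 0 ? dpn->name.fullname : NULL;
}
```

Reading name.fullname when type is 1 would reinterpret the other pointer, so
the selector check in the getter is the important part.
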
With the old code using the i2d functions would typically involve:

unsigned char *buf, *p;
int len;

/* Find length of encoding */
len = i2d_SOMETHING(x, NULL);
if (len <= 0) {
	/* Encoding error */
}
/* Allocate buffer */
buf = OPENSSL_malloc(len);
if (buf == NULL) {
	/* Malloc error */
}
/*
 * Use a temp variable because p gets updated to point to the end of
 * the encoding.
 */
p = buf;
i2d_SOMETHING(x, &p);

Using the new i2d you can also do:

unsigned char *buf = NULL;
int len;

len = i2d_SOMETHING(x, &buf);
if (len < 0) {
	/* Error: encoding or allocation failed */
}

and it will automatically allocate and populate a buffer with the
encoding. After this call 'buf' will point to the start of the
encoding which is len bytes long. The caller is responsible for
releasing the buffer with OPENSSL_free().

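The three calling conventions can be seen side by side in a toy encoder.
TOY and i2d_TOY() are invented for this sketch and are not OpenSSL types or
functions; the object is encoded as an OCTET STRING of at most 127 octets, and
plain malloc() stands in for OPENSSL_malloc() so that the example builds
without OpenSSL.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy object: encoded as an OCTET STRING (tag 0x04), short length form. */
typedef struct {
    const unsigned char *data;
    size_t len;
} TOY;

/* Hypothetical i2d-style function following the conventions above. */
static int i2d_TOY(const TOY *x, unsigned char **pp)
{
    unsigned char *p;
    int total;

    if (x == NULL || x->len > 127)
        return -1;
    total = 2 + (int)x->len;

    if (pp == NULL)
        return total;           /* length query only, nothing written */

    if (*pp == NULL) {
        /* New behaviour: allocate, and leave *pp at the start. */
        p = malloc((size_t)total);
        if (p == NULL)
            return -1;
        *pp = p;
    } else {
        /* Old behaviour: caller's buffer, *pp moves past the encoding. */
        p = *pp;
        *pp = p + total;
    }

    p[0] = 0x04;                        /* OCTET STRING tag */
    p[1] = (unsigned char)x->len;       /* short form length */
    if (x->len > 0)
        memcpy(p + 2, x->data, x->len);
    return total;
}
```

Note the asymmetry: with a caller supplied buffer the pointer ends up after
the encoding, while with an allocated buffer it stays at the start, so no
temporary pointer is needed in the second case.
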