<HTML>
<HEAD><TITLE>APR Canonical Filenames</TITLE></HEAD>
<BODY>
<h1>APR Canonical Filename</h1>

<h2>Requirements</h2>

<p>APR porters need to address the underlying discrepancies between
file systems.  To achieve a reasonable degree of security, the
program depending upon APR needs to know that two paths may be
compared, and that a mismatch is guaranteed to reflect that the
two paths do not refer to the same resource.</p>

<p>The first discrepancy is in volume roots.  Unix and pure derivatives
have only one root path, "/".  Win32 and OS2 share root paths of
the form "D:/", where D: is the volume designation.  However, this can
also be specified as "//./D:/", indicating the D: volume of
'this' machine.  Win32 and OS2 may also employ a UNC root path
of the form "//server/share/", where share is a share-point of the
specified network server.  Finally, NetWare root paths are of the
form "server/volume:/", or the simpler "volume:/" syntax for 'this'
machine.  All these non-Unix file systems accept volume:path,
without a slash following the colon, as a path relative to the
current working directory.  APR will treat this form as ambiguous, that
is, neither an absolute nor a relative path per se.</p>
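
<p>The root forms above can be told apart by their leading characters.
The following is a minimal sketch, not APR code: the function name
classify_root and the category strings it returns are hypothetical, and
it ignores the host platform (on Unix, "D:/" is simply a relative path).</p>

<pre>
```python
import re

# Hypothetical classifier for the root forms described in the text.
# The category names are illustrative and not part of any APR API.
def classify_root(path: str) -> str:
    p = path.replace("\\", "/")  # treat both separators alike in this sketch
    if re.match(r"//\./[^/:]+:/", p):
        return "local-volume"     # "//./D:/", a volume of 'this' machine
    if re.match(r"//[^/.][^/]*/[^/]+/", p):
        return "unc"              # "//server/share/"
    if p.startswith("//"):
        return "incomplete-unc"   # "//" seen, but no complete UNC root
    if re.match(r"[^/:]+/[^/:]+:/", p):
        return "netware-server"   # "server/volume:/"
    if re.match(r"[^/:]+:/", p):
        return "volume"           # "D:/" or NetWare "volume:/"
    if re.match(r"[^/:]+:", p):
        return "ambiguous"        # "volume:path", neither absolute nor relative
    if p.startswith("/"):
        return "unix-root"
    return "relative"


for example in ["/", "D:/", "//./D:/", "//server/share/",
                "server/volume:/", "D:file.txt", "docs/file.txt"]:
    print(example, classify_root(example))
```
</pre>

<p>The order of the tests matters: the "//./" and UNC forms must be
checked before the single-slash and volume forms, since they share a
prefix with them.</p>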

<p>The second discrepancy is in the meaning of the 'this' directory.
In general, 'this' must be eliminated from the path where it occurs.
The forms "path/./" and "path/" are both aliases of path, as is the
empty segment in "path//".  However, discarding empty segments
isn't file system independent, since the double slash "//" has
a special meaning on OS2 and Win32 at the start of the path name,
and is invalid on those platforms before the "//server/share/" UNC
root path is completed.  Finally, as noted above, "//./volume/" is
legal root syntax on WinNT, and perhaps others.</p>

<p>The third discrepancy is in the context of the 'parent' directory.
When "parent/path/.." occurs, the path must be unwound to "parent".
It's also critical to simply truncate leading "/../" paths to "/",
since the parent of the root is root.  This gets tricky on the
Win32 and OS2 platforms, since the ".." element is invalid before
the "//server/share/" is complete, and the "//server/share/../"
sequence resolves to the complete UNC root "//server/share/".  In relative
paths, leading ".." elements are significant until they are merged
with an absolute path.  The relative form must retain ".."
segments only as leading segments, to be resolved once merged with another
relative or an absolute path.</p>

<p>The fourth discrepancy occurs with acceptance of alternate character
codes for the same element.  Path separators are not retained within
the APR canonical forms.  The OS filesystem and APR (slashed) forms
can both be returned as strings, to be used in the proper context.
Win32 and Netware accept both slashes and backslashes as the
same path separator symbol, whereas Unix strictly accepts slashes.
While the APR form of the name strictly uses slashes, always consider
that there could be a platform that actually accepts slashes as a
character within a segment name.</p>

<p>The fifth and worst discrepancy plagues Win32, OS2, Netware, and some
filesystems mounted in Unix.  Case insensitivity can permit the same
file to slip through in both its proper case and alternate cases.
Simply changing the case is insufficient for any character set beyond
ASCII, since various dialectic forms of characters suffer from one-to-many
or many-to-one translations.  An example would be u-umlaut, which
might be accepted as a single character u-umlaut, a two character
sequence u and the zero-width umlaut, the upper case form of the same,
or perhaps even a capital U alone.  This can be handled in different
ways depending on the purposes of the APR based program, but the one
requirement is that the path must be absolute in order to resolve these
ambiguities.  Methods employed include comparison of device and inode
file identifiers, which is a fairly fast operation, or querying the OS
for the true form of the name, which can be much slower.  Only the
acknowledgement of the file names by the OS can validate the equality
of two different cases of the same filename.</p>
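
<p>The u-umlaut case can be reproduced with ordinary Unicode strings.
This sketch only shows why a plain case change cannot decide equality.
The helper nfc_equal is hypothetical, and Unicode NFC normalization is
an assumption of the example rather than something APR is stated to
perform; each file system applies its own rules.</p>

<pre>
```python
import unicodedata

SINGLE = "\u00fc"      # u-umlaut as one precomposed character
COMBINED = "u\u0308"   # plain u followed by the combining (zero-width) umlaut


def nfc_equal(a: str, b: str) -> bool:
    """Compare two names after NFC normalization (an assumption of this sketch)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(SINGLE == COMBINED)                  # compares the raw strings
print(nfc_equal(SINGLE, COMBINED))         # compares the composed forms
print(SINGLE.upper() == COMBINED.upper())  # compares the upper-cased forms
print("\u00df".upper())                    # a one-to-many case translation
```
</pre>

<p>Even with normalization, only the file system itself knows which of
these spellings it treats as the same name, which is why the text
defers to the OS for the final answer.</p>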

<p>The sixth discrepancy, illegal or insignificant characters, is especially
significant in non-Unix file systems.  Trailing periods are accepted
but never stored, therefore trailing periods must be ignored for any
form of comparison.  All OSes also have certain expectations of which
characters are illegal (or undesirable due to confusion).</p>

<p>A final warning: canonical functions don't transform or resolve case
or character ambiguity issues until the path is resolved into an absolute
path.  The relative canonical path, while useful for URL
or similar identifiers, cannot be used for testing or comparison of file
system objects.</p>

<hr>

<h2>Canonical API</h2>

<p>Functions to manipulate the apr_canon_file_t (an opaque type) include:</p>

<ul>
<li>Create canon_file_t (from char* path and canon_file_t parent path)
<li>Merge canon_file_t (from path and parent, both canon_file_t)
<li>Get char* path of all or some segments
<li>Get path flags of IsRelative, IsVirtualRoot, and IsAbsolute
<li>Compare two canon_file_t structures for file equality
</ul>

<p>The path is corrected to the file system case only if it is in absolute
form.  The apr_canon_file_t should be preserved as long as possible and
used as the parent to create child entries, to reduce the number of expensive
stat and case canonicalization calls to the OS.</p>

<p>The comparison operation allows APR to postpone correction
of case by simply relying upon the device and inode for equivalence.  The
stat implementation can establish that two files are the same even when their
strings are not equivalent, which eliminates the need for the operating
system to return the proper form of the name.</p>
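
<p>The device and inode comparison can be sketched with the standard
stat call.  The names same_file and demo are hypothetical, and the
example assumes a platform where stat reports meaningful device and
inode values.  Python's own os.path.samefile performs an equivalent
check.</p>

<pre>
```python
import os
import tempfile


def same_file(path_a: str, path_b: str) -> bool:
    """Compare two existing paths by device and inode rather than by string."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo() -> tuple:
    """Create two files and report string and stat comparisons."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.txt")
        other = os.path.join(tmp, "other.txt")
        for name in (target, other):
            with open(name, "w") as handle:
                handle.write("x")
        # A second spelling of the same file: the strings differ, the file does not.
        alias = os.path.join(tmp, ".", "data.txt")
        return (target == alias,
                same_file(target, alias),
                same_file(target, other))


print(demo())
```
</pre>

<p>The comparison needs both paths to exist, so it suits absolute,
resolvable paths and not relative canonical paths used as identifiers.</p>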

<p>In any case, returning the char* path, with a flag to request the proper
case, forces the OS calls to resolve the true names of each segment.  Where
there is a penalty for this operation and the stat device and inode test
is faster, case correction is postponed until the char* result is requested.
On platforms that identify the inode, device, or proper name interchangeably
with no penalties, case correction may occur when the name is initially processed.</p>

<hr>

<h2>Unix Example</h2>

<p>First the simplest case:</p>

<pre>
Parse Canonical Name
accepts parent path as canonical_t
        this path as string

Split this path Segments on '/'

For each of this path Segments
  If first Segment
    If this Segment is Empty ([nothing]/)
      Append this Root Segment (don't merge)
      Continue to next Segment
    Else is relative
      Append parent Segments (to merge)
      Continue with this Segment
  If Segment is '.' or empty (2 slashes)
    Discard this Segment
    Continue with next Segment
  If Segment is '..'
    If no previous Segment or previous Segment is '..'
      Append this Segment
      Continue with next Segment
    If previous Segment and previous is not Root Segment
      Discard previous Segment
    Discard this Segment
    Continue with next Segment
  Append this Relative Segment
  Continue with next Segment
</pre>
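
<p>The steps above translate fairly directly into code.  The following
is a minimal sketch under stated assumptions, not the APR implementation:
the canonical form is modelled as a plain list of segments, the names
ROOT, parse_canonical and to_string are hypothetical, and an empty string
in the first position stands for the Unix root.</p>

<pre>
```python
ROOT = ""  # marker segment: the empty first segment of "/path" is the root


def parse_canonical(parent: list, path: str) -> list:
    """Merge `path` onto the canonical segment list `parent` (Unix rules)."""
    segments = []
    for index, segment in enumerate(path.split("/")):
        if index == 0:
            if path.startswith("/"):
                segments.append(ROOT)     # absolute: start at root, don't merge
                continue
            segments.extend(parent)       # relative: merge onto the parent
        if segment in (".", ""):
            continue                      # 'this' directory or doubled slash
        if segment == "..":
            if not segments or segments[-1] == "..":
                segments.append(segment)  # keep leading '..' of a relative path
            elif segments[-1] != ROOT:
                segments.pop()            # unwind "parent/path/.." to "parent"
            continue                      # at the root, '..' is simply dropped
        segments.append(segment)
    return segments


def to_string(segments: list) -> str:
    """Render a segment list in the slashed form."""
    if segments and segments[0] == ROOT:
        return "/" + "/".join(segments[1:])
    return "/".join(segments)


print(to_string(parse_canonical([], "/usr/./lib//x/..")))
print(to_string(parse_canonical(["", "srv", "www"], "../logs")))
print(to_string(parse_canonical([], "../../a/b/..")))
```
</pre>

<p>Passing an existing canonical list as the parent mirrors the advice
to keep the parent around and build child entries from it.  Leading ".."
segments survive in the relative result until a later merge supplies
something for them to unwind.</p>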

</BODY>
</HTML>