<HTML>
<HEAD><TITLE>APR Canonical Filenames</TITLE></HEAD>
<BODY>
<h1>APR Canonical Filename</h1>

<h2>Requirements</h2>

<p>APR porters need to address the underlying discrepancies between
file systems.  To achieve a reasonable degree of security, the
program depending upon APR needs to know that two paths may be
compared, and that a mismatch is guaranteed to reflect that the
two paths do not return the same resource.</p>

<p>The first discrepancy is in volume roots.  Unix and pure derivatives
have only one root path, "/".  Win32 and OS2 share root paths of
the form "D:/", where D: is the volume designation.  However, this can
be specified as "//./D:/" as well, indicating the D: volume of
'this' machine.  Win32 and OS2 also may employ a UNC root path,
of the form "//server/share/" where share is a share-point of the
specified network server.  Finally, NetWare root paths are of the
form "server/volume:/", or the simpler "volume:/" syntax for 'this'
machine.  All these non-Unix file systems accept volume:path,
without a slash following the colon, as a path relative to the
current working directory of that volume.  APR will treat this form
as ambiguous, that is, neither an absolute nor a relative path per se.</p>

<p>The second discrepancy is in the meaning of the 'this' directory.
In general, 'this' must be eliminated from the path where it occurs.
The syntax "path/./" and "path/" are both aliases to path.  However,
this isn't file system independent, since the double slash "//" has
a special meaning on OS2 and Win32 at the start of the path name,
and is invalid on those platforms before the "//server/share/" UNC
root path is completed.  Finally, as noted above, "//volume/" is
legal root syntax on WinNT, and perhaps others.</p>

<p>The third discrepancy is in the context of the 'parent' directory.
When "parent/path/.." occurs, the path must be unwound to "parent".
It's also critical to simply truncate leading "/../" paths to "/",
since the parent of the root is root.  This gets tricky on the
Win32 and OS2 platforms, since the ".." element is invalid before
the "//server/share/" is complete, and the "//server/share/../"
sequence is the complete UNC root "//server/share/".  In relative
paths, leading ".." elements are significant, until they are merged
with an absolute path.  The relative form must only retain the ".."
segments as leading segments, to be resolved once merged to another
relative or an absolute path.</p>

<p>The fourth discrepancy occurs with acceptance of alternate character
codes for the same element.  Path separators are not retained within
the APR canonical forms.  The OS filesystem and APR (slashed) forms
can both be returned as strings, to be used in the proper context.
OS2, Win32 and Netware all accept slashes and backslashes as the
same path separator symbol, whereas Unix strictly accepts slashes.
While the APR form of the name strictly uses slashes, always consider
that there could be a platform that actually accepts slashes as a
character within a segment name.</p>

<p>The fifth and worst discrepancy plagues Win32, OS2, Netware, and some
filesystems mounted in Unix.  Case insensitivity can permit the same
file to slip through in both its proper case and alternate cases.
Simply changing the case is insufficient for any character set beyond
ASCII, since various diacritic forms of characters suffer from
one-to-many or many-to-one translations.  An example would be u-umlaut,
which might be accepted as a single character u-umlaut, a two character
sequence u and the zero-width umlaut, the upper case form of the same,
or perhaps even a capital U alone.  This can be handled in different
ways depending on the purposes of the APR based program, but the one
requirement is that the path must be absolute in order to resolve these
ambiguities.
Methods employed include comparison of device and inode
file identifiers, which is a fairly fast operation, or querying the OS
for the true form of the name, which can be much slower.  Only the
acknowledgement of the file names by the OS can validate the equality
of two different cases of the same filename.</p>

<p>The sixth discrepancy, illegal or insignificant characters, is especially
significant in non-Unix file systems.  Trailing periods are accepted
but never stored, therefore trailing periods must be ignored for any
form of comparison.  And all OSes have certain expectations of what
characters are illegal (or undesirable due to confusion).</p>

<p>A final warning: canonical functions don't transform or resolve case
or character ambiguity issues until they are resolved into an absolute
path.  The relative canonical path, while useful for URL
or similar identifiers, cannot be used for testing or comparison of file
system objects.</p>

<hr>

<h2>Canonical API</h2>

Functions to manipulate the apr_canon_file_t (an opaque type) include:

<ul>
<li>Create canon_file_t (from char* path and canon_file_t parent path)
<li>Merge canon_file_t (from path and parent, both canon_file_t)
<li>Get char* path of all or some segments
<li>Get path flags of IsRelative, IsVirtualRoot, and IsAbsolute
<li>Compare two canon_file_t structures for file equality
</ul>

<p>The path is corrected to the file system case only if it is in absolute
form.  The apr_canon_file_t should be preserved as long as possible and
used as the parent to create child entries to reduce the number of expensive
stat and case canonicalization calls to the OS.</p>

<p>The comparison operation provides that the APR can postpone correction
of case by simply relying upon the device and inode for equivalence.
The stat implementation can establish that two files are the same even
when their strings are not equivalent, and eliminates the need for the
operating system to return the proper form of the name.</p>

<p>In any case, returning the char* path, with a flag to request the proper
case, forces the OS calls to resolve the true names of each segment.  Where
there is a penalty for this operation and the stat device and inode test
is faster, case correction is postponed until the char* result is requested.
On platforms that identify the inode, device, or proper name interchangeably
with no penalties, this may occur when the name is initially processed.</p>

<hr>

<h2>Unix Example</h2>

<p>First the simplest case:</p>

<pre>
Parse Canonical Name
accepts parent path as canonical_t
        this path as string

Split this path Segments on '/'

For each of this path Segments
  If first Segment
    If this Segment is Empty ([nothing]/)
      Append this Root Segment (don't merge)
      Continue to next Segment
    Else is relative
      Append parent Segments (to merge)
      Continue with this Segment
  If Segment is '.' or empty (2 slashes)
    Discard this Segment
    Continue with next Segment
  If Segment is '..'
    If no previous Segment or previous Segment is '..'
      Append this Segment
      Continue with next Segment
    If previous Segment and previous is not Root Segment
      Discard previous Segment
    Discard this Segment
    Continue with next Segment
  Append this Relative Segment
  Continue with next Segment
</pre>

</BODY>
</HTML>