<HTML>
<HEAD><TITLE>APR Canonical Filenames</TITLE></HEAD>
<BODY>
<h1>APR Canonical Filenames</h1>

<h2>Requirements</h2>

<p>APR porters need to address the underlying discrepancies between
file systems. To achieve a reasonable degree of security, a program
that depends on APR needs to know that two paths can be compared, and
that a mismatch is guaranteed to mean that the two paths do not refer
to the same resource.</p>

<p>The first discrepancy is in volume roots. Unix and its pure derivatives
have only one root path, "/". Win32 and OS2 both use root paths of
the form "D:/", where D: is the volume designation. However, this can
also be specified as "//./D:/", indicating the D: volume of the
'this' machine. Win32 and OS2 may also employ a UNC root path
of the form "//server/share/", where share is a share point on the
specified network server. Finally, NetWare root paths take the
form "server/volume:/", or the simpler "volume:/" syntax for the 'this'
machine. All of these non-Unix file systems accept volume:path,
without a slash following the colon, as a path relative to the
current working directory. APR treats such a path as ambiguous, that
is, as neither an absolute nor a relative path per se.</p>

<p>The second discrepancy is in the meaning of the 'this' directory.
In general, 'this' must be eliminated from the path wherever it occurs:
the forms "path/./" and "path//" are both aliases for path.
However, eliminating "." and empty segments is not file-system
independent. The double slash "//" has a special meaning at the
start of a path name on OS2 and Win32, and is invalid on those platforms
before the "//server/share/" UNC root is complete. Finally, as noted
above, "//./D:/" is legal root syntax on WinNT, and perhaps on other
platforms.</p>

<p>The third discrepancy is in the context of the 'parent' directory.
When "parent/path/.." occurs, the path must be unwound to "parent".
It is also critical to simply truncate a leading "/../" to "/",
since the parent of the root is the root. This gets tricky on the
Win32 and OS2 platforms. There, the ".." element is invalid before
the "//server/share/" root is complete, and the sequence "//server/share/../"
resolves to the complete UNC root "//server/share/". In relative
paths, leading ".." elements are significant until they are merged
with an absolute path. The relative form must retain ".." segments
only as leading segments, to be resolved once merged with another
relative or an absolute path.</p>

<p>The fourth discrepancy is the acceptance of alternate character
codes for the same element. Path separators are not retained within
the APR canonical forms. Both the OS file system form and the APR
(slash-separated) form can be returned as strings, to be used in the
proper context. Win32 and NetWare accept slashes and backslashes as
the same path separator symbol, whereas Unix strictly accepts only slashes.
While the APR form of the name strictly uses slashes, always consider
that there could be a platform that accepts a slash as an ordinary
character within a segment name.</p>

<p>The fifth and worst discrepancy plagues Win32, OS2, NetWare, and some
file systems mounted on Unix. Case insensitivity can let the same
file slip through in both its proper case and alternate cases.
Simply changing the case is insufficient for any character set beyond
ASCII, since various accented forms of characters suffer from one-to-many
or many-to-one translations. For example, u-umlaut might be accepted
as the single character u-umlaut, as the two-character sequence of u and
a zero-width umlaut, as the upper-case form of either, or perhaps even
as a capital U alone. This can be handled in different ways depending
on the purpose of the APR-based program, but the one requirement is that
the path must be absolute in order to resolve these ambiguities. Methods
employed include comparing the device and inode numbers that uniquely
identify a file, which is a fairly fast operation. Another is querying the OS for
the true form of the name, which can be much slower. Only the OS's
acknowledgement of the file names can validate that two differently
cased names refer to the same file.</p>

<p>The sixth discrepancy, illegal or insignificant characters, is especially
significant in non-Unix file systems. Trailing periods are accepted
but never stored, so trailing periods must be ignored in any
form of comparison. In addition, every OS has its own expectations of
which characters are illegal (or undesirable because they cause confusion).</p>

<p>A final warning: canonical functions don't transform or resolve case
or character ambiguity issues until the path is resolved into an absolute
path.
The relative canonical path, while useful for URLs
or similar identifiers, cannot be used for testing or comparison of file
system objects.</p>

<hr>

<h2>Canonical API</h2>

<p>Functions to manipulate the apr_canon_file_t (an opaque type) include:</p>

<ul>
<li>Create canon_file_t (from char* path and canon_file_t parent path)
<li>Merge canon_file_t (from path and parent, both canon_file_t)
<li>Get char* path of all or some segments
<li>Get path flags of IsRelative, IsVirtualRoot, and IsAbsolute
<li>Compare two canon_file_t structures for file equality
</ul>

<p>The path is corrected to the file system case only if it is in absolute
form. The apr_canon_file_t should be preserved as long as possible and
used as the parent to create child entries. This reduces the number of expensive
stat and case canonicalization calls to the OS.</p>

<p>The comparison operation allows APR to postpone correction of case
by simply relying upon the device and inode for equivalence. The stat
implementation can establish that two files are the same even when their
strings are not equivalent. This eliminates the need for the operating
system to return the proper form of the name.</p>

<p>In any case, returning the char* path with a flag requesting the proper
case forces OS calls that resolve the true name of each segment. Where
this operation carries a penalty and the stat device and inode test
is faster, case correction is postponed until the char* result is requested.
On platforms that identify the inode, device, or proper name interchangeably
with no penalty, case correction may instead occur when the name is first
processed.</p>

<hr>

<h2>Unix Example</h2>

<p>First, the simplest case:</p>

<pre>
Parse Canonical Name
  accepts parent path as canonical_t
          this path as string

Split this path into Segments on '/'

For each of this path's Segments
  If first Segment
    If this Segment is Empty ([nothing]/)
      Append this Root Segment (don't merge)
      Continue with next Segment
    Else this path is relative
      Append parent Segments (to merge)
      Continue with this Segment
  If Segment is '.' or empty (2 slashes)
    Discard this Segment
    Continue with next Segment
  If Segment is '..'
    If no previous Segment or previous Segment is '..'
      Append this Segment
      Continue with next Segment
    If previous Segment is not the Root Segment
      Discard previous Segment
      Discard this Segment
      Continue with next Segment
    Else (previous Segment is the Root Segment)
      Discard this Segment (the parent of root is root)
      Continue with next Segment
  Append this Relative Segment
  Continue with next Segment
</pre>

</BODY>
</HTML>