<HTML>
<HEAD><TITLE>APR Canonical Filenames</TITLE></HEAD>
<BODY>
<h1>APR Canonical Filename</h1>

<h2>Requirements</h2>

<p>APR porters need to address the underlying discrepancies between
file systems. To achieve a reasonable degree of security, the
program depending upon APR needs to know that two paths may be
compared, and that a mismatch is guaranteed to reflect that the
two paths do not return the same resource.</p>

<p>The first discrepancy is in volume roots. Unix and pure derivatives
have only one root path, "/". Win32 and OS2 share root paths of
the form "D:/", where D: is the volume designation. However, this can
also be specified as "//./D:/", indicating the D: volume of
'this' machine. Win32 and OS2 may also employ a UNC root path
of the form "//server/share/", where share is a share-point of the
specified network server. Finally, NetWare root paths are of the
form "server/volume:/", or the simpler "volume:/" syntax for 'this'
machine. All these non-Unix file systems accept volume:path,
without a slash following the colon, as a path relative to the
current working directory, which APR will treat as ambiguous, that
is, neither an absolute nor a relative path per se.</p>

<p>The second discrepancy is in the meaning of the 'this' directory.
In general, 'this' must be eliminated from the path where it occurs.
The syntaxes "path/./" and "path/" are both aliases for path.
However,
this isn't file system independent, since the double slash "//" has
a special meaning on OS2 and Win32 at the start of the path name,
and is invalid on those platforms before the "//server/share/" UNC
root path is completed. Finally, as noted above, "//./volume/" is
legal root syntax on WinNT, and perhaps others.</p>

<p>The third discrepancy is in the context of the 'parent' directory.
When "parent/path/.." occurs, the path must be unwound to "parent".
It's also critical to simply truncate leading "/../" paths to "/",
since the parent of the root is root. This gets tricky on the
Win32 and OS2 platforms, since the ".." element is invalid before
the "//server/share/" is complete, and the "//server/share/../"
sequence resolves to the complete UNC root "//server/share/". In relative
paths, leading ".." elements are significant until they are merged
with an absolute path. The relative form must retain ".."
segments only as leading segments, to be resolved once merged with another
relative or an absolute path.</p>

<p>The fourth discrepancy occurs with acceptance of alternate character
codes for the same element. Path separators are not retained within
the APR canonical forms. The OS filesystem and APR (slashed) forms
can both be returned as strings, to be used in the proper context.
Win32 and NetWare accept slashes and backslashes as the
same path separator symbol, whereas Unix accepts only slashes.
While the APR form of the name strictly uses slashes, always consider
that there could be a platform that actually accepts slashes as a
character within a segment name.</p>

<p>The fifth and worst discrepancy plagues Win32, OS2, NetWare, and some
filesystems mounted in Unix. Case insensitivity can permit the same
file to slip through in both its proper case and alternate cases.
Simply changing the case is insufficient for any character set beyond
ASCII, since various dialectic forms of characters suffer from one-to-many
or many-to-one translations. An example would be u-umlaut, which
might be accepted as a single character u-umlaut, a two-character
sequence of u and the zero-width umlaut, the upper case form of the same,
or perhaps even a capital U alone. This can be handled in different
ways depending on the purposes of the APR-based program, but the one
requirement is that the path must be absolute in order to resolve these
ambiguities. Methods employed include comparison of device and inode
file unique identifiers, which is a fairly fast operation, or querying the OS
for the true form of the name, which can be much slower. Only the
acknowledgement of the file names by the OS can validate the equality
of two different cases of the same filename.</p>

<p>The sixth discrepancy, illegal or insignificant characters, is especially
significant in non-Unix file systems. Trailing periods are accepted
but never stored; therefore, trailing periods must be ignored for any
form of comparison.
All operating systems also have certain expectations about which
characters are illegal (or undesirable because they cause confusion).</p>

<p>A final warning: canonical functions don't transform or resolve case
or character ambiguity issues until the path is resolved into an absolute
path. The relative canonical path, while useful for URL
or similar identifiers, cannot be used for testing or comparison of file
system objects.</p>

<hr>

<h2>Canonical API</h2>

<p>Functions to manipulate the apr_canon_file_t (an opaque type) include:</p>

<ul>
<li>Create canon_file_t (from char* path and canon_file_t parent path)
<li>Merge canon_file_t (from path and parent, both canon_file_t)
<li>Get char* path of all or some segments
<li>Get path flags of IsRelative, IsVirtualRoot, and IsAbsolute
<li>Compare two canon_file_t structures for file equality
</ul>

<p>The path is corrected to the file system case only if it is in absolute
form. The apr_canon_file_t should be preserved as long as possible and
used as the parent to create child entries, to reduce the number of expensive
stat and case canonicalization calls to the OS.</p>

<p>The comparison operation allows APR to postpone correction
of case by simply relying upon the device and inode for equivalence.
The
stat implementation can establish that two files are the same even when their
strings are not equivalent, which eliminates the need for the operating
system to return the proper form of the name.</p>

<p>In any case, returning the char* path, with a flag to request the proper
case, forces the OS calls to resolve the true names of each segment. Where
there is a penalty for this operation and the stat device and inode test
is faster, case correction is postponed until the char* result is requested.
On platforms that identify the inode, device, or proper name interchangeably
with no penalties, this may occur when the name is initially processed.</p>

<hr>

<h2>Unix Example</h2>

<p>First the simplest case:</p>

<pre>
Parse Canonical Name
accepts parent path as canonical_t
        this path as string

Split this path Segments on '/'

For each of this path Segments
  If first Segment
    If this Segment is Empty ([nothing]/)
      Append this Root Segment (don't merge)
      Continue to next Segment
    Else is relative
      Append parent Segments (to merge)
      Continue with this Segment
  If Segment is '.' or empty (2 slashes)
    Discard this Segment
    Continue with next Segment
  If Segment is '..'
    If no previous Segment or previous Segment is '..'
      Append this Segment
      Continue with next Segment
    If previous Segment and previous is not Root Segment
      Discard previous Segment
    Discard this Segment
    Continue with next Segment
  Append this Relative Segment
  Continue with next Segment
</pre>

</BODY>
</HTML>