structure revision 256281
1This file describes the design, layouts, and file formats of a
2libsvn_fs_fs repository.
3
4Design
5------
6
7In FSFS, each committed revision is represented as an immutable file
8containing the new node-revisions, contents, and changed-path
9information for the revision, plus a second, changeable file
10containing the revision properties.
11
12In contrast to the BDB back end, the contents of recent revision of
13files are stored as deltas against earlier revisions, instead of the
14other way around.  This is less efficient for common-case checkouts,
15but brings greater simplicity and robustness, as well as the
16flexibility to make commits work without write access to existing
17revisions.  Skip-deltas and delta combination mitigate the checkout
18cost.
19
20In-progress transactions are represented with a prototype rev file
21containing only the new text representations of files (appended to as
22changed file contents come in), along with a separate file for each
23node-revision, directory representation, or property representation
24which has been changed or added in the transaction.  During the final
25stage of the commit, these separate files are marshalled onto the end
26of the prototype rev file to form the immutable revision file.
27
28Layout of the FS directory
29--------------------------
30
31The layout of the FS directory (the "db" subdirectory of the
32repository) is:
33
34  revs/               Subdirectory containing revs
35    <shard>/          Shard directory, if sharding is in use (see below)
36      <revnum>        File containing rev <revnum>
37    <shard>.pack/     Pack directory, if the repo has been packed (see below)
38      pack            Pack file, if the repository has been packed (see below)
39      manifest        Pack manifest file, if a pack file exists (see below)
40  revprops/           Subdirectory containing rev-props
41    <shard>/          Shard directory, if sharding is in use (see below)
42      <revnum>        File containing rev-props for <revnum>
43    <shard>.pack/     Pack directory, if the repo has been packed (see below)
44      <rev>.<count>   Pack file, if the repository has been packed (see below)
45      manifest        Pack manifest file, if a pack file exists (see below)
46    revprops.db       SQLite database of the packed revision properties
47  transactions/       Subdirectory containing transactions
48    <txnid>.txn/      Directory containing transaction <txnid>
49  txn-protorevs/      Subdirectory containing transaction proto-revision files
50    <txnid>.rev       Proto-revision file for transaction <txnid>
51    <txnid>.rev-lock  Write lock for proto-rev file
52  txn-current         File containing the next transaction key
53  locks/              Subdirectory containing locks
54    <partial-digest>/ Subdirectory named for first 3 letters of an MD5 digest
55      <digest>        File containing locks/children for path with <digest>
56  node-origins/       Lazy cache of origin noderevs for nodes
57    <partial-nodeid>  File containing noderev ID of origins of nodes
58  current             File specifying current revision and next node/copy id
59  fs-type             File identifying this filesystem as an FSFS filesystem
60  write-lock          Empty file, locked to serialise writers
61  txn-current-lock    Empty file, locked to serialise 'txn-current'
62  uuid                File containing the UUID of the repository
63  format              File containing the format number of this filesystem
64  fsfs.conf           Configuration file
65  min-unpacked-rev    File containing the oldest revision not in a pack file
66  min-unpacked-revprop File containing the oldest revision of unpacked revprop
67  rep-cache.db        SQLite database mapping rep checksums to locations
68
69Files in the revprops directory are in the hash dump format used by
70svn_hash_write.
71
72The format of the "current" file is:
73
74 * Format 3 and above: a single line of the form
75   "<youngest-revision>\n" giving the youngest revision for the
76   repository.
77
78 * Format 2 and below: a single line of the form "<youngest-revision>
79   <next-node-id> <next-copy-id>\n" giving the youngest revision, the
80   next unique node-ID, and the next unique copy-ID for the
81   repository.
82
83The "write-lock" file is an empty file which is locked before the
84final stage of a commit and unlocked after the new "current" file has
85been moved into place to indicate that a new revision is present.  It
86is also locked during a revprop propchange while the revprop file is
87read in, mutated, and written out again.  Note that readers are never
88blocked by any operation - writers must ensure that the filesystem is
89always in a consistent state.
90
91The "txn-current" file is a file with a single line of text that
92contains only a base-36 number.  The current value will be used in the
93next transaction name, along with the revision number the transaction
94is based on.  This sequence number ensures that transaction names are
95not reused, even if the transaction is aborted and a new transaction
96based on the same revision is begun.  The only operation that FSFS
97performs on this file is "get and increment"; the "txn-current-lock"
98file is locked during this operation.
99
100"fsfs.conf" is a configuration file in the standard Subversion/Python
101config format.  It is automatically generated when you create a new
102repository; read the generated file for details on what it controls.
103
104When representation sharing is enabled, the filesystem tracks
105representation checksum and location mappings using a SQLite database in
106"rep-cache.db".  The database has a single table, which stores the sha1
107hash text as the primary key, mapped to the representation revision, offset,
108size and expanded size.  This file is only consulted during writes and never
109during reads.  Consequently, it is not required, and may be removed at an
110abritrary time, with the subsequent loss of rep-sharing capabilities for
111revisions written thereafter.
112
113Filesystem formats
114------------------
115
116The "format" file defines what features are permitted within the
117filesystem, and indicates changes that are not backward-compatible.
118It serves the same purpose as the repository file of the same name.
119
120The filesystem format file was introduced in Subversion 1.2, and so
121will not be present if the repository was created with an older
122version of Subversion.  An absent format file should be interpreted as
123indicating a format 1 filesystem.
124
125The format file is a single line of the form "<format number>\n",
126followed by any number of lines specifying 'format options' -
127additional information about the filesystem's format.  Each format
128option line is of the form "<option>\n" or "<option> <parameters>\n".
129
130Clients should raise an error if they encounter an option not
131permitted by the format number in use.
132
133The formats are:
134
135  Format 1, understood by Subversion 1.1+
136  Format 2, understood by Subversion 1.4+
137  Format 3, understood by Subversion 1.5+
138  Format 4, understood by Subversion 1.6+
139  Format 5, understood by Subversion 1.7-dev, never released
140  Format 6, understood by Subversion 1.8
141
142The differences between the formats are:
143
144Delta representation in revision files
145  Format 1: svndiff0 only
146  Formats 2+: svndiff0 or svndiff1
147
148Format options
149  Formats 1-2: none permitted
150  Format 3+:   "layout" option
151
152Transaction name reuse
153  Formats 1-2: transaction names may be reused
154  Format 3+:   transaction names generated using txn-current file
155
156Location of proto-rev file and its lock
157  Formats 1-2: transactions/<txnid>/rev and
158    transactions/<txnid>/rev-lock.
159  Format 3+:   txn-protorevs/<txnid>.rev and
160    txn-protorevs/<txnid>.rev-lock.
161
162Node-ID and copy-ID generation
163  Formats 1-2: Node-IDs and copy-IDs are guaranteed to form a
164    monotonically increasing base36 sequence using the "current"
165    file.
166  Format 3+:   Node-IDs and copy-IDs use the new revision number to
167    ensure uniqueness and the "current" file just contains the
168    youngest revision.
169
170Mergeinfo metadata:
171  Format 1-2: minfo-here and minfo-count node-revision fields are not
172    stored.  svn_fs_get_mergeinfo returns an error.
173  Format 3+:  minfo-here and minfo-count node-revision fields are
174    maintained.  svn_fs_get_mergeinfo works.
175
176Revision changed paths list:
177  Format 1-3: Does not contain the node's kind.
178  Format 4+:  Contains the node's kind.
179
180Shard packing:
181  Format 4:   Applied to revision data only.
182  Format 5:   Revprops would be packed independently of revision data.
183  Format 6+:  Applied equally to revision data and revprop data
184    (i.e. same min packed revision)
185
186# Incomplete list.  See SVN_FS_FS__MIN_*_FORMAT
187
188
189Filesystem format options
190-------------------------
191
192Currently, the only recognised format option is "layout", which
193specifies the paths that will be used to store the revision files and
194revision property files.
195
196The "layout" option is followed by the name of the filesystem layout
197and any required parameters.  The default layout, if no "layout"
198keyword is specified, is the 'linear' layout.
199
200The known layouts, and the parameters they require, are as follows:
201
202"linear"
203  Revision files and rev-prop files are named after the revision they
204  represent, and are placed directly in the revs/ and revprops/
205  directories.  r1234 will be represented by the revision file
206  revs/1234 and the rev-prop file revprops/1234.
207
208"sharded <max-files-per-directory>"
209  Revision files and rev-prop files are named after the revision they
210  represent, and are placed in a subdirectory of the revs/ and
211  revprops/ directories named according to the 'shard' they belong to.
212
213  Shards are numbered from zero and contain between one and the
214  maximum number of files per directory specified in the layout's
215  parameters.
216
217  For the "sharded 1000" layout, r1234 will be represented by the
218  revision file revs/1/1234 and rev-prop file revprops/1/1234.  The
219  revs/0/ directory will contain revisions 0-999, revs/1/ will contain
220  1000-1999, and so on.
221
222Packing revisions
223-----------------
224
225A filesystem can optionally be "packed" to conserve space on disk.  The
226packing process concatenates all the revision files in each full shard to
227create pack files.  A manifest file is also created for each shard which
228records the indexes of the corresponding revision files in the pack file.
229In addition, the original shard is removed, and reads are redirected to the
230pack file.
231
232The manifest file consists of a list of offsets, one for each revision in the
233pack file.  The offsets are stored as ASCII decimal, and separated by a newline
234character.
235
236Packing revision properties (format 5: SQLite)
237---------------------------
238
239This was supported by 1.7-dev builds but never included in a blessed release.
240
241See r1143829 of this file:
242http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/structure?view=markup&pathrev=1143829
243
244
245Packing revision properties (format 6+)
246---------------------------
247
248Similarly to the revision data, packing will concatenate multiple
249revprops into a single file.  Since they are mutable data, we put an
250upper limit to the size of these files:  We will concatenate the data
251up to the limit and then use a new file for the following revisions.
252
253The limit can be set and changed at will in the configuration file. 
254It is 64kB by default.  Because a pack file must contain at least one
255complete property list, files containing just one revision may exceed
256that limit.
257
258Furthermore, pack files can be compressed which saves about 75% of
259disk space.  A configuration file flag enables the compression; it is
260off by default and may be switched on and off at will.  The pack size
261limit is always applied to the uncompressed data.  For this reason,
262the default is 256kB while compression has been enabled.
263
264Files are named after their start revision as "<rev>.<counter>" where
265counter will be increased whenever we rewrite a pack file due to a
266revprop change.  The manifest file contains the list of pack file
267names, one line for each revision.
268
269Many tools track repository global data in revision properties at 
270revision 0.  To minimize I/O overhead for those applications,  we
271will never pack that revision, i.e. its data is always being kept
272in revprops/0/0.
273
274Pack file format
275
276  Top level: <packed container>
277
278  We always apply data compression to the pack file - using the
279  SVN_DELTA_COMPRESSION_LEVEL_NONE level if compression is disabled.
280  (Note that compression at SVN_DELTA_COMPRESSION_LEVEL_NONE is not
281  a no-op stream transformation although most of the data will remain
282  human readable.)
283
284  container := header '\n' (revprops)+
285  header    := start_rev '\n' rev_count '\n' (size '\n')+
286
287  All numbers in the header are given as ASCII decimals.  rev_count
288  is the number of revisions packed into this container.  There must
289  be exactly as many "size" and serialized "revprops".  The "size"
290  values in the list are the length in bytes of the serialized
291  revprops of the respective revision.
292
293Writing to packed revprops
294
295  The old pack file is being read and the new revprops serialized.
296  If they fit into the same pack file, a temp file with the new
297  content gets written and moved into place just like an non-packed
298  revprop file would. No name change or manifest update required.
299
300  If they don't fit into the same pack file,  i.e. exceed the pack
301  size limit,  the pack will be split into 2 or 3 new packs just
302  before and / or after the modified revision.
303
304  In the current implementation, they will never be merged again.
305  To minimize fragmentation, the initial packing process will only
306  use about 90% of the limit, i.e. leave some room for growth.
307
308  When a pack file gets split, its counter is being increased
309  creating a new file and leaving the old content in place and
310  available for concurrent readers.  Only after the new manifest
311  file got moved into place, will the old pack files be deleted. 
312
313  Write access to revprops is being serialized by the global
314  filesystem write lock.  We only need to build a few retries into
315  the reader code to gracefully handle manifest changes and pack
316  file deletions.
317
318
319Node-revision IDs
320-----------------
321
322A node-rev ID consists of the following three fields:
323
324    node_revision_id ::= node_id '.' copy_id '.' txn_id
325
326At this level, the form of the ID is the same as for BDB - see the
327section called "ID's" in <../libsvn_fs_base/notes/structure>.
328
329In order to support efficient lookup of node-revisions by their IDs
330and to simplify the allocation of fresh node-IDs during a transaction,
331we treat the fields of a node-rev ID in new and interesting ways.
332
333Within a new transaction:
334
335  New node-revision IDs assigned within a transaction have a txn-id
336  field of the form "t<txnid>".
337
338  When a new node-id or copy-id is assigned in a transaction, the ID
339  used is a "_" followed by a base36 number unique to the transaction.
340
341Within a revision:
342
343  Within a revision file, node-revs have a txn-id field of the form
344  "r<rev>/<offset>", to support easy lookup. The <offset> is the (ASCII
345  decimal) number of bytes from the start of the revision file to the
346  start of the node-rev.
347
348  During the final phase of a commit, node-revision IDs are rewritten
349  to have repository-wide unique node-ID and copy-ID fields, and to have
350  "r<rev>/<offset>" txn-id fields.
351
352  In Format 3 and above, this uniqueness is done by changing a temporary
353  id of "_<base36>" to "<base36>-<rev>".  Note that this means that the
354  originating revision of a line of history or a copy can be determined
355  by looking at the node ID.
356
357  In Format 2 and below, the "current" file contains global base36
358  node-ID and copy-ID counters; during the commit, the counter value is
359  added to the transaction-specific base36 ID, and the value in
360  "current" is adjusted.
361
362  (It is legal for Format 3 repositories to contain Format 2-style IDs;
363  this just prevents I/O-less node-origin-rev lookup for those nodes.)
364
365The temporary assignment of node-ID and copy-ID fields has
366implications for svn_fs_compare_ids and svn_fs_check_related.  The ID
367_1.0.t1 is not related to the ID _1.0.t2 even though they have the
368same node-ID, because temporary node-IDs are restricted in scope to
369the transactions they belong to.
370
371There is a lazily created cache mapping from node-IDs to the full
372node-revision ID where they are created.  This is in the node-origins
373directory; the file name is the node-ID without its last character (or
374"0" for single-character node IDs) and the contents is a serialized
375hash mapping from node-ID to node-revision ID.  This cache is only
376used for node-IDs of the pre-Format 3 style.
377
378Copy-IDs and copy roots
379-----------------------
380
381Copy-IDs are assigned in the same manner as they are in the BDB
382implementation:
383
384  * A node-rev resulting from a creation operation (with no copy
385    history) receives the copy-ID of its parent directory.
386
387  * A node-rev resulting from a copy operation receives a fresh
388    copy-ID, as one would expect.
389
390  * A node-rev resulting from a modification operation receives a
391    copy-ID depending on whether its predecessor derives from a
392    copy operation or whether it derives from a creation operation
393    with no intervening copies:
394
395      - If the predecessor does not derive from a copy, the new
396        node-rev receives the copy-ID of its parent directory.  If the
397        node-rev is being modified through its created-path, this will
398        be the same copy-ID as the predecessor node-rev has; however,
399        if the node-rev is being modified through a copied ancestor
400        directory (i.e. we are performing a "lazy copy"), this will be
401        a different copy-ID.
402
403      - If the predecessor derives from a copy and the node-rev is
404        being modified through its created-path, the new node-rev
405        receives the copy-ID of the predecessor.
406
407      - If the predecessor derives from a copy and the node-rev is not
408        being modified through its created path, the new node-rev
409        receives a fresh copy-ID.  This is called a "soft copy"
410        operation, as distinct from a "true copy" operation which was
411        actually requested through the svn_fs interface.  Soft copies
412        exist to ensure that the same <node-ID,copy-ID> pair is not
413        used twice within a transaction.
414
415Unlike the BDB implementation, we do not have a "copies" table.
416Instead, each node-revision record contains a "copyroot" field
417identifying the node-rev resulting from the true copy operation most
418proximal to the node-rev.  If the node-rev does not itself derive from
419a copy operation, then the copyroot field identifies the copy of an
420ancestor directory; if no ancestor directories derive from a copy
421operation, then the copyroot field identifies the root directory of
422rev 0.
423
424Revision file format
425--------------------
426
427A revision file contains a concatenation of various kinds of data:
428
429  * Text and property representations
430  * Node-revisions
431  * The changed-path data
432  * Two offsets at the very end
433
434A representation begins with a line containing either "PLAIN\n" or
435"DELTA\n" or "DELTA <rev> <offset> <length>\n", where <rev>, <offset>,
436and <length> give the location of the delta base of the representation
437and the amount of data it contains (not counting the header or
438trailer).  If no base location is given for a delta, the base is the
439empty stream.  After the initial line comes raw svndiff data, followed
440by a cosmetic trailer "ENDREP\n".
441
442If the representation is for the text contents of a directory node,
443the expanded contents are in hash dump format mapping entry names to
444"<type> <id>" pairs, where <type> is "file" or "dir" and <id> gives
445the ID of the child node-rev.
446
447If a representation is for a property list, the expanded contents are
448in the form of a dumped hash map mapping property names to property
449values.
450
451The marshalling syntax for node-revs is a series of fields terminated
452by a blank line.  Fields have the syntax "<name>: <value>\n", where
453<name> is a symbolic field name (each symbolic name is used only once
454in a given node-rev) and <value> is the value data.  Unrecognized
455fields are ignored, for extensibility.  The following fields are
456defined:
457
458  id        The ID of the node-rev
459  type      "file" or "dir"
460  pred      The ID of the predecessor node-rev
461  count     Count of node-revs since the base of the node
462  text      "<rev> <offset> <length> <size> <digest>" for text rep
463  props     "<rev> <offset> <length> <size> <digest>" for props rep
464            <rev> and <offset> give location of rep
465            <length> gives length of rep, sans header and trailer
466            <size> gives size of expanded rep; may be 0 if equal
467             to the length
468            <digest> gives hex MD5 digest of expanded rep
469            ### in formats >=4, also present:
470            <sha1-digest> gives hex SHA1 digest of expanded rep
471            <uniquifier> see representation_t->uniquifier in fs.h
472  cpath     FS pathname node was created at
473  copyfrom  "<rev> <path>" of copyfrom data
474  copyroot  "<rev> <created-path>" of the root of this copy
475  minfo-cnt The number of nodes under (and including) this node
476             which have svn:mergeinfo.
477  minfo-here Exists if this node itself has svn:mergeinfo.
478
479The predecessor of a node-rev crosses both soft and true copies;
480together with the count field, it allows efficient determination of
481the base for skip-deltas.  The first node-rev of a node contains no
482"pred" field.  A node-revision with no properties may omit the "props"
483field.  A node-revision with no contents (a zero-length file or an
484empty directory) may omit the "text" field.  In a node-revision
485resulting from a true copy operation, the "copyfrom" field gives the
486copyfrom data.  The "copyroot" field identifies the root node-revision
487of the copy; it may be omitted if the node-rev is its own copy root
488(as is the case for node-revs with copy history, and for the root node
489of revision 0).  Copy roots are identified by revision and
490created-path, not by node-rev ID, because a copy root may be a
491node-rev which exists later on within the same revision file, meaning
492its offset is not yet known.
493
494The changed-path data is represented as a series of changed-path
495items, each consisting of two lines.  The first line has the format
496"<id> <action> <text-mod> <prop-mod> <path>\n", where <id> is the
497node-rev ID of the new node-rev, <action> is "add", "delete",
498"replace", or "modify", <text-mod> and <prop-mod> are "true" or
499"false" indicating whether the text and/or properties changed, and
500<path> is the changed pathname.  For deletes, <id> is the node-rev ID
501of the deleted node-rev, and <text-mod> and <prop-mod> are always
502"false".  The second line has the format "<rev> <path>\n" containing
503the node-rev's copyfrom information if it has any; if it does not, the
504second line is blank.
505
506Starting with FS format 4, <action> may contain the kind ("file" or
507"dir") of the node, after a hyphen; for example, an added directory
508may be represented as "add-dir".
509
510At the very end of a rev file is a pair of lines containing
511"\n<root-offset> <cp-offset>\n", where <root-offset> is the offset of
512the root directory node revision and <cp-offset> is the offset of the
513changed-path data.
514
515All numbers in the rev file format are unsigned and are represented as
516ASCII decimal.
517
518Transaction layout
519------------------
520
521A transaction directory has the following layout:
522
523  props                      Transaction props
524  next-ids                   Next temporary node-ID and copy-ID
525  changes                    Changed-path information so far
526  node.<nid>.<cid>           New node-rev data for node
527  node.<nid>.<cid>.props     Props for new node-rev, if changed
528  node.<nid>.<cid>.children  Directory contents for node-rev
529  <sha1>                     Text representation of that sha1
530
531In FS formats 1 and 2, it also contains:
532
533  rev                        Prototype rev file with new text reps
534  rev-lock                   Lockfile for writing to the above
535
536In newer formats, these files are in the txn-protorevs/ directory.
537
538The prototype rev file is used to store the text representations as
539they are received from the client.  To ensure that only one client is
540writing to the file at a given time, the "rev-lock" file is locked for
541the duration of each write.
542
543The two kinds of props files are all in hash dump format.  The "props"
544file will always be present.  The "node.<nid>.<cid>.props" file will
545only be present if the node-rev properties have been changed.
546
547The <sha1> files have been introduced in FS format 6. Their content
548is that of text rep references: "<rev> <offset> <length> <size> <digest>"
549They will be written for text reps in the current transaction and be
550used to eliminate duplicate reps within that transaction.
551
552The "next-ids" file contains a single line "<next-temp-node-id>
553<next-temp-copy-id>\n" giving the next temporary node-ID and copy-ID
554assignments (without the leading underscores).  The next node-ID is
555also used as a uniquifier for representations which may share the same
556underlying rep.
557
558The "children" file for a node-revision begins with a copy of the hash
559dump representation of the directory entries from the old node-rev (or
560a dump of the empty hash for new directories), and then an incremental
561hash dump entry for each change made to the directory.
562
563The "changes" file contains changed-path entries in the same form as
564the changed-path entries in a rev file, except that <id> and <action>
565may both be "reset" (in which case <text-mod> and <prop-mod> are both
566always "false") to indicate that all changes to a path should be
567considered undone.  Reset entries are only used during the final merge
568phase of a transaction.  Actions in the "changes" file always contain
569a node kind, even if the FS format is older than format 4.
570
571The node-rev files have the same format as node-revs in a revision
572file, except that the "text" and "props" fields are augmented as
573follows:
574
575  * The "props" field may have the value "-1" if properties have
576    been changed and are contained in a "props" file within the
577    node-rev subdirectory.
578
579  * For directory node-revs, the "text" field may have the value
580    "-1" if entries have been changed and are contained in a
581    "contents" file in the node-rev subdirectory.
582
583  * For the directory node-rev representing the root of the
584    transaction, the "is-fresh-txn-root" field indicates that it has
585    not been made mutable yet (see Issue #2608).
586
587  * For file node-revs, the "text" field may have the value "-1
588    <offset> <length> <size> <digest>" if the text representation is
589    within the prototype rev file.
590
591  * The "copyroot" field may have the value "-1 <created-path>" if the
592    copy root of the node-rev is part of the transaction in process.
593
594Locks layout
595------------
596
597Locks in FSFS are stored in serialized hash format in files whose
598names are MD5 digests of the FS path which the lock is associated
599with.  For the purposes of keeping directory inode usage down, these
600digest files live in subdirectories of the main lock directory whose
601names are the first 3 characters of the digest filename.
602
603Also stored in the digest file for a given FS path are pointers to
604other digest files which contain information associated with other FS
605paths that are beneath our path (an immediate child thereof, or a
606grandchild, or a great-grandchild, ...).
607
608To answer the question, "Does path FOO have a lock associated with
609it?", one need only generate the MD5 digest of FOO's
610absolute-in-the-FS path (say, 3b1b011fed614a263986b5c4869604e8), look
611for a file located like so:
612
613   /path/to/repos/locks/3b1/3b1b011fed614a263986b5c4869604e8
614
615And then see if that file contains lock information.
616
617To inquire about locks on children of the path FOO, you would
618reference the same path as above, but look for a list of children in
619that file (instead of lock information).  Children are listed as MD5
620digests, too, so you would simply iterate over those digests and
621consult the files they reference for lock information.
622