It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool.  But as far as I know no such
standard exists.  Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution.  Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format.  However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision.  The answer is that the head revision contains the
contents of that revision and every other revision contains a set of
edits to produce that revision ("a" and "d" lines).  The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats.  If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.
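
As a rough illustration of the "a" and "d" lines, here is a minimal
Python sketch that applies one such edit script to a list of lines.
It is not the code from RCS_deltas; the name apply_rcs_edits is made
up for this example.  It assumes that line numbers count from 1 and
refer to the text being edited, and that commands appear in
increasing line order, as in the "RCS output format" of GNU diff.

```python
def apply_rcs_edits(base_lines, script_lines):
    """Apply an RCS-format edit script to base_lines; return the new lines."""
    result = []
    pos = 0  # 0-based index of the next line of base_lines not yet handled
    i = 0
    while i < len(script_lines):
        command = script_lines[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # "dSTART COUNT": drop COUNT lines beginning at line START.
            result.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # "aSTART COUNT": the next COUNT script lines go after line START.
            result.extend(base_lines[pos:start])
            pos = max(pos, start)
            result.extend(script_lines[i:i + count])
            i += count
        else:
            raise ValueError("unknown edit command: " + command)
    # Everything after the last edit is copied unchanged.
    result.extend(base_lines[pos:])
    return result
```

A changed line shows up as a "d" command followed by an "a" command
at the same line number, so the script "d2 1", "a2 1", "TWO" replaces
the second line of the base text with "TWO".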

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions.  The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 ----------------->  1.2
     \---> 1.1.2.1           \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

  admin:  head 1.2;
  deltas:
    1.2 branches 1.2.2.1; next 1.1;
    1.1 branches 1.1.2.1; next;
    1.1.2.1 branches; next;
    1.2.2.1 branches; next;
  deltatexts:
    1.2
    1.2.2.1
    1.1
    1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, or whether programs reading it should be prepared
to encounter the revisions in any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches.  See cvs.texinfo
for more information on them.  CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5.  We won't try to document extensions not used by CVS in any
detail, but we will briefly list them.  Each occurrence of a newphrase
begins with an identifier, which is what we list here.  Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers.  Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations.  However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

   Identifier   Used by
   ----------   -------
   namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
                (a modified RCS 5.7--not sure it has any other name).
   dead         A set of RCS patches developed by Rich Pixley at
                Cygnus about 1992.  These were for CVS, and predated
                the current CVS death support, which uses a state "dead"
                rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26.  The following new keywords are
defined when PreservePermissions=yes:

   owner
   group
   permissions
   special
   symlink
   hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file.  The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file.  The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number.  The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files.  The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order.  Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings.  For example:

	hardlinks	README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file.  That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution.  See also the "Keyword
substitution" chapter of cvs.texinfo.  The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS.  The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page.  This distinction is important for
    multibyte characters.  For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters.  Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character.  It is an issue only for multibyte encodings
    like JIS and BIG5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file.  Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding.  This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.
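
To make the byte-oriented view concrete, here is a small Python
sketch of `@' doubling done purely on bytes.  The names rcs_quote and
rcs_unquote are invented for this example and are not taken from RCS
or CVS.  The bytes 0xA4 0x40 stand in for a two-byte character whose
second byte happens to be `@'; which character that is depends on the
encoding and does not matter to the code.

```python
def rcs_quote(data: bytes) -> bytes:
    """Wrap raw bytes as an RCS string, doubling every '@' byte."""
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Undo rcs_quote: strip the delimiters and collapse doubled '@' bytes."""
    if len(quoted) < 2 or not (quoted.startswith(b"@") and quoted.endswith(b"@")):
        raise ValueError("not an RCS string")
    return quoted[1:-1].replace(b"@@", b"@")


# A two-byte sequence whose second byte is 0x40, i.e. the '@' byte.
multibyte = b"\xa4\x40"
stored = rcs_quote(multibyte)
# The stored form has the '@' byte doubled in the middle of the character,
# so it no longer decodes cleanly in the original encoding, but unquoting
# gives back exactly the bytes that were checked in.
assert rcs_unquote(stored) == multibyte
```

Because both functions work on bytes and never decode the text, they
behave the same whatever encoding the checked-in file uses.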

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch.  While more detailed analyses might be worth doing, we will
note:

    * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

    * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

	I just did a quick test with a small file (on a Sun Ultra 1/170E
	running Solaris 5.5.1), with 1000 revisions on the main branch and
	1000 revisions on a branch that forked at the root (i.e., RCS revisions
	1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
	1.1.1.1000).  It took about 0.15 seconds real time to check in the
	first revision, and about 0.6 seconds to check in and 0.3 seconds to
	retrieve revision 1.1.1.1000 (the worst case).

    * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).
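
The walk from the head of the trunk to a branch revision can be
sketched in Python as follows.  This is only an illustration: the
function name delta_path and the dictionary layout are assumptions
made for the example, not structures used by RCS or CVS.  The sample
data is the 1.1/1.2 example given earlier in this file.

```python
def delta_path(head, deltas, target):
    """List the revisions whose deltatexts are applied, in order, to build target.

    deltas maps a revision number to a dict with the keys "branches" (a
    list of revision numbers) and "next" (a revision number or None).
    """
    parts = target.split(".")
    path = [head]

    def follow_next(rev, goal):
        # Walk the "next" chain from rev until goal is reached.
        while rev != goal:
            rev = deltas[rev]["next"]
            if rev is None:
                raise ValueError("no path to revision " + goal)
            path.append(rev)
        return rev

    # On the trunk, "next" leads from the head toward older revisions.
    rev = follow_next(head, ".".join(parts[:2]))

    # Each further pair of components names a branch and a revision on it.
    for depth in range(4, len(parts) + 1, 2):
        branch_prefix = ".".join(parts[:depth - 1]) + "."
        starts = [b for b in deltas[rev]["branches"] if b.startswith(branch_prefix)]
        if not starts:
            raise ValueError("no such branch for revision " + target)
        # On a branch, "next" leads from the branchpoint toward newer revisions.
        rev = starts[0]
        path.append(rev)
        rev = follow_next(rev, ".".join(parts[:depth]))
    return path


EXAMPLE = {
    "1.2": {"branches": ["1.2.2.1"], "next": "1.1"},
    "1.1": {"branches": ["1.1.2.1"], "next": None},
    "1.1.2.1": {"branches": [], "next": None},
    "1.2.2.1": {"branches": [], "next": None},
}
```

With this data, building 1.1.2.1 means starting from the full text of
1.2, applying the edits stored for 1.1, and then the edits stored for
1.1.2.1.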

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in	1995/06/05 08:28:35	5.6
+++ rcsfile.5in	1996/12/09 17:31:44	5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP	::=	{\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP	::=	any visible graphic character except \f2special\fP
+\f2idchar\fP	::=	any visible graphic character,
+		except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP	::=	\f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta