It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool. But as far as I know no such
standard exists. Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution. Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format. However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision. The answer is that the head revision contains the
contents of that revision and every other revision contains a set of
edits to produce that revision ("a" and "d" lines). The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats. If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions. The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 -----------------> 1.2
     \---> 1.1.2.1          \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

    admin: head 1.2;
    deltas:
      1.2 branches 1.2.2.1; next 1.1;
      1.1 branches 1.1.2.1; next;
      1.1.2.1 branches; next;
      1.2.2.1 branches; next;
    deltatexts:
      1.2
      1.2.2.1
      1.1
      1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and to what extent programs reading it should
expect to encounter any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches. See cvs.texinfo
for more information on them. CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5. We won't try to document extensions not used by CVS in any
detail, but we will briefly list them.
Each occurrence of a newphrase begins with an identifier, which is
what we list here. Future designers of extensions are strongly
encouraged to pick non-conflicting identifiers. Note that newphrase
occurs in several places in the RCS grammar, and a given extension may
not be legal in all locations. However, it seems better to reserve a
particular identifier for all locations, to avoid confusion and
complicated rules.

    Identifier     Used by
    ----------     -------
    namespace      RCS library done at Silicon Graphics Inc. (SGI) in 1996
                   (a modified RCS 5.7--not sure it has any other name).
    dead           A set of RCS patches developed by Rich Pixley at
                   Cygnus about 1992. These were for CVS, and predated
                   the current CVS death support, which uses a state "dead"
                   rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26. The following new keywords are
defined when PreservePermissions=yes:

    owner
    group
    permissions
    special
    symlink
    hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file. The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file. The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number.
The `symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files. The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order. Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings. For example:

    hardlinks   README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file. That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution. See also the "Keyword
substitution" chapter of cvs.texinfo. The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS. The comment leader is not used by RCS 5.7 or current
versions of CVS.
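
As an aside on the RCS strings used in the hardlinks example above
(the text between `@' delimiters): inside such a string a literal `@'
is written as `@@'. Here is a minimal Python sketch of quoting and
unquoting. The names rcs_quote and rcs_unquote are made up for this
illustration and are not taken from RCS or CVS; the functions work on
bytes rather than characters.

```python
def rcs_quote(data: bytes) -> bytes:
    """Wrap raw bytes in an RCS string: @...@ with every inner @ doubled."""
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Reverse rcs_quote; reject input that is not one well-formed string."""
    if len(quoted) < 2 or quoted[:1] != b"@" or quoted[-1:] != b"@":
        raise ValueError("RCS string must start and end with @")
    body = quoted[1:-1]
    # After removing doubled @@ pairs, any @ left over was not escaped.
    if b"@" in body.replace(b"@@", b""):
        raise ValueError("unescaped @ inside RCS string")
    return body.replace(b"@@", b"@")


if __name__ == "__main__":
    for name in (b"install.txt", b"Installation Notes", b"user@host.txt"):
        print(rcs_quote(name))
```

A filename such as README, which contains no special characters, can
appear bare in the hardlinks field; the sketch always quotes, which is
the conservative choice.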

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page. This distinction is important for
    multibyte characters. For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters. Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character. It is an issue only for multibyte encodings
    like JIS and Big5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file. Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding. This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch. While more detailed analyses might be worth doing, we will
note:

  * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

  * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

    I just did a quick test with a small file (on a Sun Ultra 1/170E
    running Solaris 5.5.1), with 1000 revisions on the main branch and
    1000 revisions on a branch that forked at the root (i.e., RCS revisions
    1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
    1.1.1.1000). It took about 0.15 seconds real time to check in the
    first revision, and about 0.6 seconds to check in and 0.3 seconds to
    retrieve revision 1.1.1.1000 (the worst case).

  * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).
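
As a rough illustration of the delta application described above, here
is a minimal Python sketch. It is not CVS's RCS_deltas; the names
apply_rcs_edits and checkout are made up for this example, and lines
are held without their trailing newlines for simplicity. In the edit
format, "aN M" adds the next M script lines after line N of the input,
and "dN M" deletes M lines starting at line N. Line numbers always
refer to the input text, not to the partly edited result.

```python
def apply_rcs_edits(lines, script):
    """Apply one RCS-format edit script ("a" and "d" commands) to lines."""
    result = []
    pos = 0  # 0-based index of the next input line not yet consumed
    i = 0
    while i < len(script):
        command = script[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # Copy the untouched lines before `start`, then skip `count`.
            result.extend(lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # Copy input through line `start`, then take `count` new lines
            # from the script itself.
            result.extend(lines[pos:start])
            pos = max(pos, start)
            result.extend(script[i:i + count])
            i += count
        else:
            raise ValueError("unknown edit command: %r" % command)
    result.extend(lines[pos:])
    return result


def checkout(head_text, delta_chain):
    """Start from the head text and apply each delta in the chain in order."""
    text = list(head_text)
    for script in delta_chain:
        text = apply_rcs_edits(text, script)
    return text


if __name__ == "__main__":
    head_1_2 = ["alpha", "beta", "gamma"]   # full text stored for the head
    to_1_1 = ["d2 1"]                       # trunk delta: 1.2 -> 1.1
    to_1_1_2_1 = ["a2 1", "delta"]          # branch delta: 1.1 -> 1.1.2.1
    print(checkout(head_1_2, [to_1_1, to_1_1_2_1]))
```

The example walks from a trunk head 1.2 back to 1.1 and then forward
onto branch revision 1.1.2.1, which is the path from the head of the
trunk to the branchpoint and then out along the branch.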

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in 1995/06/05 08:28:35 5.6
+++ rcsfile.5in 1996/12/09 17:31:44 5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP ::= {\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP ::= any visible graphic character except \f2special\fP
+\f2idchar\fP ::= any visible graphic character,
+ except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP ::= \f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta
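
(End of diff.)

The date rule stated in the second hunk of the diff above can be
sketched in Python as follows. This assumes the Y.mm.dd.hh.mm.ss
layout from rcsfile.5; the name parse_rcs_date is made up for this
illustration and is not part of RCS or CVS.

```python
from datetime import datetime, timezone


def parse_rcs_date(text: str) -> datetime:
    """Parse an RCS date of the form Y.mm.dd.hh.mm.ss into a UTC datetime."""
    fields = text.split(".")
    if len(fields) != 6:
        raise ValueError("expected six dot-separated fields: %r" % text)
    year_text = fields[0]
    year = int(year_text)
    if len(year_text) == 2:
        # Exactly two digits: last two digits of a year in 1900-1999.
        year += 1900
    month, day, hour, minute, second = (int(field) for field in fields[1:])
    # datetime() rejects second == 60, matching the 00-59 range above.
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_rcs_date("96.12.09.17.31.44").isoformat())
    print(parse_rcs_date("2000.01.02.03.04.05").isoformat())
```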