
          The Subversion Project:  Building a Better CVS
          ==============================================

              Ben Collins-Sussman <sussman@collab.net>

                      Written in August 2001
              Published in Linux Journal, January 2002

Abstract
--------

This article discusses the history, goals, features and design of
Subversion (http://subversion.tigris.org), an open-source project that
aims to produce a compelling replacement for CVS.


Introduction
------------

If you work on any kind of open-source project, you've probably worked
with CVS.  You probably remember the first time you learned to do an
anonymous checkout of a source tree over the net -- or your first
commit, or the first time you looked at a CVS diff.  And then the
fateful day came: you asked your friend how to rename a file.

"You can't", was the reply.

What?  What do you mean?

"Well, you can delete the file from the repository and then re-add it
under a new name."

Yes, but then nobody would know it had been renamed...

"Let's call the CVS administrator.  She can hand-edit the repository's
RCS files for us and possibly make things work."

What?

"And by the way, don't try to delete a directory either."

You rolled your eyes and groaned.  How could such simple tasks be
difficult?


The Legacy of CVS
-----------------

No doubt about it, CVS has evolved into the standard Software
Configuration Management (SCM) system of the open-source community.
And rightly so!  CVS itself is Free software, and its wonderful
"non-locking" development model -- whereby dozens of far-flung
programmers collaborate -- fits the open-source world very well.  In
fact, one might argue that without CVS, sites like Freshmeat or
SourceForge would never have flourished as they do now.  CVS and its
semi-chaotic development model have become an essential part of
open-source culture.

So what's wrong with CVS?

Because it uses the RCS storage system under the hood, CVS can only
track file contents, not tree structures.  As a result, the user has
no way to copy, move, or rename items without losing history.  Tree
rearrangements are always ugly server-side tweaks.

The RCS back-end cannot store binary files efficiently, and branching
and tagging operations can grow to be very slow.  CVS also uses the
network inefficiently; many users are annoyed by long waits, because
file differences are sent in only one direction (from server to client,
but not from client to server), and binary files are always
transmitted in their entirety.

From a developer's standpoint, the CVS codebase is the result of
layers upon layers of historical "hacks".  (Remember that CVS began
life as a collection of shell scripts to drive RCS.)  This makes the
code difficult to understand, maintain, or extend.  For example, CVS's
networking ability was essentially "stapled on"; it was never
designed to be a native client-server system.

Rectifying CVS's problems is a huge task -- and we've listed only a
few of the many common complaints here.


Enter Subversion
----------------

In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company
for commercially supporting and improving CVS.  Cyclic made the first
public release of a network-enabled CVS (contributed by Cygnus
software.)  In 1999, Karl Fogel published a book about CVS and the
open-source development model it enables (cvsbook.red-bean.com).  Karl
and Jim had long talked about writing a replacement for CVS; Jim had
even drafted a new, theoretical repository design.  Finally, in
February of 2000, Brian Behlendorf of CollabNet (www.collab.net)
offered Karl a full-time job to write a CVS replacement.  Karl
gathered a team together and work began in May.

The team settled on a few simple goals.  Subversion would be designed
as a functional replacement for CVS: it would do everything that CVS
does, preserving the same development model while fixing the flaws in
CVS's (lack of) design.  Existing CVS users would be the target
audience; any CVS user should be able to start using Subversion with
little effort.  Any other SCM "bonus features" were deemed to be of
secondary importance (at least before a 1.0 release.)

At the time of writing, the original team has been coding for a little
over a year, and we have a number of excellent volunteer contributors.
(Subversion, like CVS, is an open-source project!)


Subversion's Features
---------------------

Here's a quick run-down of some of the reasons you should be excited
about Subversion:

  * Real copies and renames.  The Subversion repository doesn't use
    RCS files at all; instead, it implements a 'virtual' versioned
    filesystem that tracks tree structures over time (described
    below).  Files *and* directories are versioned.  At last, there
    are real client-side `mv' and `cp' commands that behave just as
    you would expect.

  * Atomic commits.  A commit either goes into the repository
    completely, or not at all.

  * Advanced network layer.  The Subversion network server is Apache,
    and client and server speak WebDAV(2) to one another.  (See the
    'design' section below.)

  * Faster network access.  A binary diffing algorithm is used to
    store and transmit deltas in *both* directions, regardless of
    whether a file is of text or binary type.

  * Filesystem "properties".  Each file or directory has an invisible
    hashtable attached.  You can invent and store any arbitrary
    key/value pairs you wish: owner, perms, icons, app-creator,
    mime-type, personal notes, etc.  This is a general-purpose feature
    for users.  Properties are versioned, just like file contents.
    And some properties are auto-detected, like the mime-type of a
    file (no more remembering to use the '-kb' switch!)

  * Extensible and hackable.  Subversion has no historical baggage; it
    was designed and then implemented as a collection of shared C
    libraries with well-defined APIs.  This makes Subversion extremely
    maintainable and usable by other applications and languages.

  * Easy migration.  The Subversion command-line client is very
    similar to CVS; the development model is the same, so CVS users
    should have little trouble making the switch.  Development of a
    'cvs2svn' repository converter is in progress.

  * It's Free.  Subversion is released under an Apache/BSD-style
    open-source license.
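
The "binary diffing" bullet is easier to picture with a concrete
delta.  The Python sketch below only illustrates the general
copy/insert idea; it is not the algorithm or wire format Subversion
actually uses, and the names `make_delta' and `apply_delta' and the
tiny block size are assumptions chosen for readability.  The new
version of a file is described as a list of "copy these bytes from the
old version" and "insert these literal bytes" operations.  Because it
never looks at line endings or encodings, text and binary files are
handled identically, and the same code could run on either side of
the connection.

```python
# Illustrative copy/insert binary delta; NOT Subversion's real algorithm.
BLOCK = 4  # hypothetical block size, kept tiny so the idea is easy to follow


def make_delta(source: bytes, target: bytes, block: int = BLOCK) -> list:
    """Describe `target` as ops against `source`:
    ("copy", offset, length) or ("insert", literal_bytes)."""
    # Remember the first offset at which each block-sized chunk of source occurs.
    index = {}
    for off in range(len(source) - block + 1):
        index.setdefault(source[off:off + block], off)

    ops, pending = [], bytearray()
    i = 0
    while i < len(target):
        off = index.get(target[i:i + block])
        if off is None:
            pending.append(target[i])  # no match here: keep a literal byte
            i += 1
            continue
        # Extend the match as far as source and target keep agreeing.
        length = block
        while (off + length < len(source) and i + length < len(target)
               and source[off + length] == target[i + length]):
            length += 1
        if pending:
            ops.append(("insert", bytes(pending)))
            pending.clear()
        ops.append(("copy", off, length))
        i += length
    if pending:
        ops.append(("insert", bytes(pending)))
    return ops


def apply_delta(source: bytes, ops: list) -> bytes:
    """Rebuild the target from `source` plus the ops produced by make_delta()."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Only the (usually short) list of operations has to cross the network;
the receiver already holds the old version and rebuilds the new one
with `apply_delta'.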


Subversion's Design
-------------------

Subversion has a modular design; it's implemented as a collection of C
libraries.  Each layer has a well-defined purpose and interface.  In
general, control starts at the top of the diagram and moves
"downward" -- each layer provides an interface to the layer above it.

              <<insert diagram here:  svn.tiff>>


Let's take a short tour of these layers, starting at the bottom.


--> The Subversion filesystem.

The Subversion Filesystem is *not* a kernel-level filesystem that one
would install in an operating system (like the Linux ext2 fs.)
Instead, it refers to the design of Subversion's repository.  The
repository is built on top of a database -- currently Berkeley DB --
and thus is a collection of .db files.  However, a library accesses
these files and exports a C API that simulates a filesystem --
specifically, a "versioned" filesystem.

This means that writing a program to access the repository is like
writing against any other filesystem API: you can open files and
directories for reading and writing as usual.  The main difference is
that this particular filesystem never loses data when written to; old
versions of files and directories are always saved as historical
artifacts.
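
A minimal Python sketch can show the shape of such an interface.
Writes go into a transaction, committing it creates a new revision,
and every earlier revision stays readable.  The names `VersionedFS',
`Txn', `begin_txn' and `commit' are hypothetical (they are not
Subversion's C API), and a plain Python list of dictionaries stands in
for the Berkeley DB tables.

```python
# Hypothetical sketch of a versioned-filesystem API; not Subversion's C API.
from typing import Optional


class OutOfDateError(Exception):
    """Raised when a transaction's base revision is no longer the youngest."""


class VersionedFS:
    """Toy versioned filesystem: each revision is a {path: bytes} snapshot."""

    def __init__(self):
        self._revisions = [{}]  # revision 0 is the empty tree

    def youngest(self) -> int:
        return len(self._revisions) - 1

    def read(self, path: str, revision: Optional[int] = None) -> bytes:
        rev = self.youngest() if revision is None else revision
        return self._revisions[rev][path]

    def begin_txn(self) -> "Txn":
        return Txn(self, self.youngest())


class Txn:
    """Pending changes; nothing is visible to readers until commit()."""

    def __init__(self, fs: VersionedFS, base: int):
        self._fs, self._base = fs, base
        self._tree = dict(fs._revisions[base])  # private copy of the base tree

    def write(self, path: str, data: bytes) -> None:
        self._tree[path] = data

    def delete(self, path: str) -> None:
        del self._tree[path]

    def commit(self) -> int:
        # Refuse a stale transaction instead of overwriting newer history.
        if self._fs.youngest() != self._base:
            raise OutOfDateError("base revision %d is out of date" % self._base)
        self._fs._revisions.append(self._tree)  # one step: all or nothing
        self._tree = None  # a committed transaction cannot be reused
        return self._fs.youngest()
```

Because `commit' appends a complete snapshot in a single step, readers
see either all of a transaction's changes or none of them.  The
out-of-date check is an assumption of this sketch: it refuses a
transaction built on a stale revision rather than silently discarding
someone else's newer commit.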

Whereas CVS's backend (RCS) stores revision numbers on a per-file
basis, Subversion numbers entire trees.  Each atomic 'commit' to the
repository creates a completely new filesystem tree, which is labeled
with a single, global revision number.  Files and directories that
have changed are rewritten (and older versions are backed up and
stored as differences against the latest version), while unchanged
entries are pointed to via a shared-storage mechanism.  This is how
the repository is able to version tree structures, not just file
contents.
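
The sharing described above can be sketched with immutable directory
nodes.  In this Python illustration, a commit rebuilds only the
directories along each changed path; every other subtree in the new
revision is literally the same object as in the previous one.  It
deliberately leaves out the delta storage of old versions, and the
function names `make_dir', `set_file' and `commit' are hypothetical.

```python
# Hypothetical sketch of per-commit trees with shared unchanged subtrees.
from types import MappingProxyType


def make_dir(entries: dict) -> MappingProxyType:
    """A read-only directory node mapping names to file bytes or subdirectories."""
    return MappingProxyType(dict(entries))


def set_file(root, path: str, data: bytes):
    """Return a new root in which `path` holds `data`.

    Only the directories on the path are copied; every sibling subtree
    is the very same object as in `root` (shared storage)."""
    head, _, rest = path.partition("/")
    children = dict(root)
    if rest:
        children[head] = set_file(root.get(head, make_dir({})), rest, data)
    else:
        children[head] = data
    return make_dir(children)


def commit(revisions: list, changes: dict) -> int:
    """Apply {path: data} to the youngest tree; return the new global revision number."""
    root = revisions[-1]
    for path, data in changes.items():
        root = set_file(root, path, data)
    revisions.append(root)
    return len(revisions) - 1
```

After committing `src/main.c' and `doc/README', then changing only
`src/main.c', the `doc' directory node of revision 2 is the same
object as in revision 1, while `src' has been rebuilt; revision 1
still returns the old contents of `src/main.c'.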

Finally, it should be mentioned that using a database like Berkeley DB
immediately provides other nice features that Subversion needs: data
integrity, atomic writes, recoverability, and hot backups.  (See
www.sleepycat.com for more information.)


--> The network layer.

Subversion has the mark of Apache all over it.  At its very core, the
client uses the Apache Portable Runtime (APR) library.  (In fact, this
means that the Subversion client should compile and run anywhere
Apache httpd does -- right now, this list includes all flavors of
Unix, Win32, BeOS, OS/2, Mac OS X, and possibly NetWare.)

However, Subversion depends on more than just APR -- the Subversion
"server" is Apache httpd itself.

Why was Apache chosen?  Ultimately, the decision was about not
reinventing the wheel.  Apache is a time-tested, open-source server
process that is ready for serious use, yet is still extensible.  It
can sustain a high network load.  It runs on many platforms and can
operate through firewalls.  It's able to use a number of different
authentication protocols.  It can do network pipelining and caching.
By using Apache as a server, Subversion gets all these features for
free.  Why start from scratch?

Subversion uses WebDAV as its network protocol.  DAV (Distributed
Authoring and Versioning) is a whole discussion in itself (see
www.webdav.org) -- but in short, it's an extension to HTTP that allows
reads/writes and "versioning" of files over the web.  The Subversion
project is hoping to ride a slowly rising tide of support for this
protocol: all of the latest file-browsers for Win32, MacOS, and GNOME
speak this protocol already.  Interoperability will (hopefully) become
more and more of a bonus over time.
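
Since WebDAV is plain HTTP plus a few extra methods and XML bodies,
its requests are easy to read on the wire.  The Python sketch below
assembles a generic PROPFIND request, the WebDAV method that asks a
server for the properties of a resource (and, depending on the
`Depth' header, of its children).  It is not one of the specific
requests Subversion's client sends, and the helper name
`build_propfind' is hypothetical.

```python
# Hypothetical helper that builds a generic WebDAV PROPFIND request by hand.
ALLPROP_BODY = (b'<?xml version="1.0" encoding="utf-8"?>\n'
                b'<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>\n')


def build_propfind(host: str, path: str, depth: str = "1") -> bytes:
    """Return raw HTTP/1.1 bytes for a PROPFIND of `path` on `host`.

    Depth "0" asks about the resource itself, "1" also about its
    immediate children, and "infinity" about the whole subtree."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError("Depth must be '0', '1' or 'infinity'")
    head = (f"PROPFIND {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Depth: {depth}\r\n"
            'Content-Type: text/xml; charset="utf-8"\r\n'
            f"Content-Length: {len(ALLPROP_BODY)}\r\n"
            "\r\n")
    return head.encode("ascii") + ALLPROP_BODY
```

A DAV-enabled server normally answers a PROPFIND with a 207
Multi-Status response whose XML body lists the requested properties
for each resource.  In practice one would let Python's `http.client'
send the request (its `request' method accepts arbitrary method names
such as "PROPFIND") rather than assembling the bytes by hand.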

For users who simply wish to access Subversion repositories on local
disk, the client can do this too; no network is required.  The
"Repository Access" layer (RA) is an abstract API implemented by both
the DAV and local-access RA libraries.  This is a specific benefit of
writing a "librarized" version control system; it's a big win over
CVS, which has two very different, difficult-to-maintain codepaths for
local vs. network repository-access.  Feel like writing a new network
protocol for Subversion?  Just write a new library that implements the
RA API!
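
Here is a Python sketch of the "one abstract API, several
implementations" arrangement: an abstract base class defines the RA
interface, and a table maps URL schemes to the library that handles
them.  The names `RepositoryAccess', `LocalRA', `register_ra',
`open_ra' and `cat_latest' are hypothetical stand-ins for Subversion's
C interfaces, and `LocalRA' reads from an in-memory list of snapshots
rather than from real .db files.

```python
# Hypothetical sketch of a pluggable repository-access (RA) layer.
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class RepositoryAccess(ABC):
    """Interface that every RA library must implement."""

    @abstractmethod
    def latest_revision(self) -> int: ...

    @abstractmethod
    def get_file(self, path: str, revision: int) -> bytes: ...


class LocalRA(RepositoryAccess):
    """Network-free access; snapshots[n] is the {path: bytes} tree of revision n."""

    def __init__(self, snapshots: list):
        self._snapshots = snapshots

    def latest_revision(self) -> int:
        return len(self._snapshots) - 1

    def get_file(self, path: str, revision: int) -> bytes:
        return self._snapshots[revision][path]


_RA_LIBRARIES = {}  # URL scheme -> factory(url) returning a RepositoryAccess


def register_ra(scheme: str, factory) -> None:
    _RA_LIBRARIES[scheme] = factory


def open_ra(url: str) -> RepositoryAccess:
    scheme = urlparse(url).scheme
    if scheme not in _RA_LIBRARIES:
        raise ValueError(f"no RA library registered for {scheme!r}")
    return _RA_LIBRARIES[scheme](url)


def cat_latest(url: str, path: str) -> bytes:
    """Client code written once against the interface; works with any RA library."""
    ra = open_ra(url)
    return ra.get_file(path, ra.latest_revision())
```

A DAV implementation would subclass `RepositoryAccess' and call
`register_ra("http", ...)'; client code such as `cat_latest' would not
change at all.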


--> The client libraries.

On the client side, the Subversion "working copy" library maintains
administrative information within special SVN/ subdirectories, similar
in purpose to the CVS/ administrative directories found in CVS working
copies.

A glance inside a typical SVN/ directory turns up a bit more than you
would find in CVS/, however.  The `entries' file contains XML which
describes the current state of the working copy directory (and which
basically serves the purposes of CVS's Entries, Root, and Repository
files combined).  But other items present (and not found in CVS/)
include storage locations for the versioned "properties" (the metadata
mentioned in 'Subversion's Features' above) and private caches of
pristine versions of each file.  This latter feature provides the
ability to report local modifications -- and revert them -- *without*
network access.  Authentication data is also stored within SVN/,
rather than in a single .cvspass-like file.
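
A short Python sketch shows why the pristine cache makes offline
status and revert possible: both operations only compare or copy local
files.  The cache layout `SVN/text-base/NAME' and the helper names are
assumptions for illustration; the article does not describe how the
cache is organized.

```python
# Hypothetical sketch: offline status/revert using cached pristine copies.
import shutil
from pathlib import Path

ADMIN_DIR = "SVN"
PRISTINE_DIR = "text-base"  # assumed name for the pristine-copy cache


def pristine_path(wc_dir, name: str) -> Path:
    return Path(wc_dir) / ADMIN_DIR / PRISTINE_DIR / name


def is_modified(wc_dir, name: str) -> bool:
    """Compare the working file with its pristine copy; no network needed."""
    working = Path(wc_dir) / name
    if not working.exists():
        return True  # a locally deleted file counts as modified
    return working.read_bytes() != pristine_path(wc_dir, name).read_bytes()


def revert(wc_dir, name: str) -> None:
    """Discard local edits by restoring the cached pristine copy."""
    shutil.copyfile(pristine_path(wc_dir, name), Path(wc_dir) / name)


def status(wc_dir) -> list:
    """Sorted names of versioned files (those with a pristine copy) that differ locally."""
    cache = Path(wc_dir) / ADMIN_DIR / PRISTINE_DIR
    return sorted(p.name for p in cache.iterdir() if is_modified(wc_dir, p.name))
```

A real implementation would presumably avoid reading every file on
each status check (for instance by comparing sizes or timestamps
first); this sketch compares full contents for simplicity.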

The Subversion "client" library has the broadest responsibility; its
job is to mingle the functionality of the working-copy library with
that of the repository-access library, and then to provide the
highest-level API to any application that wishes to perform general
version control actions.

For example, the C routine `svn_client_checkout()' takes a URL as an
argument.  It passes this URL to the repository-access library and
opens an authenticated session with a particular repository.  It then
asks the repository for a certain tree, and sends this tree into the
working-copy library, which then writes a full working copy to disk
(SVN/ directories and all.)
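
The checkout flow just described can be sketched in Python: the client
layer opens a repository-access session for the URL, asks it for a
tree, and then writes the working copy, including each directory's
SVN/ area.  Everything here is hypothetical: `FakeRA' stands in for an
authenticated RA session, `open_session' is an injected factory, and
the `text-base' cache and XML element names are illustrative rather
than Subversion's actual formats.

```python
# Hypothetical sketch of a checkout; not Subversion's real API or file formats.
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path, PurePosixPath

ADMIN_DIR = "SVN"


class FakeRA:
    """Stand-in for an authenticated repository-access session."""

    def __init__(self, revisions: list):
        self._revisions = revisions  # revisions[n] is the {path: bytes} tree of revision n

    def latest_revision(self) -> int:
        return len(self._revisions) - 1

    def get_tree(self, revision: int) -> dict:
        return self._revisions[revision]


def checkout(url: str, target, open_session, revision=None) -> int:
    """Write a working copy of `url` into `target`; return the revision checked out."""
    ra = open_session(url)  # repository-access layer
    if revision is None:
        revision = ra.latest_revision()
    tree = ra.get_tree(revision)

    # Group files by directory; also list every ancestor directory so that
    # each one receives its own SVN/ area, even if it holds no files.
    by_dir = defaultdict(dict)
    for path, data in tree.items():
        p = PurePosixPath(path)
        by_dir[p.parent][p.name] = data
        for ancestor in p.parent.parents:
            by_dir.setdefault(ancestor, {})

    # Working-copy layer: editable files, pristine copies, and an XML entries file.
    for rel_dir, files in by_dir.items():
        wc_dir = Path(target).joinpath(*rel_dir.parts)
        text_base = wc_dir / ADMIN_DIR / "text-base"  # hypothetical cache location
        text_base.mkdir(parents=True, exist_ok=True)
        dir_url = url.rstrip("/") + "/" + str(rel_dir) if rel_dir.parts else url
        entries = ET.Element("entries", url=dir_url, revision=str(revision))
        for name, data in sorted(files.items()):
            (wc_dir / name).write_bytes(data)     # the user's editable copy
            (text_base / name).write_bytes(data)  # pristine copy for offline diff/revert
            ET.SubElement(entries, "entry", name=name)
        ET.ElementTree(entries).write(wc_dir / ADMIN_DIR / "entries",
                                      encoding="utf-8", xml_declaration=True)
    return revision
```

Keeping the session factory as a parameter mirrors the layering in
the text: the checkout logic never needs to know whether the tree came
over the network or from local disk.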

The client library is designed to be used by any application.  While
the Subversion source code includes a standard command-line client, it
should be very easy to write any number of GUI clients on top of the
client library.  Hopefully, these GUIs will someday prove to be much
better than the current crop of CVS GUI applications (the majority of
which are no more than fragile "wrappers" around the CVS command-line
client.)

In addition, proper SWIG bindings (www.swig.org) should make
the Subversion API available to any number of languages:  Java, Perl,
Python, Guile, and so on.  In order to Subvert CVS, it helps to be
ubiquitous!


Subversion's Future
-------------------

The release of Subversion 1.0 is currently planned for early 2002.
After the release of 1.0, Subversion is slated for additions such as
i18n support, "intelligent" merging, better "changeset" manipulation,
client-side plugins, and improved features for server administration.
(Also on the wishlist is an eclectic collection of ideas, such as
distributed, replicating repositories.)

A final thought from Subversion's FAQ:

   "We aren't (yet) attempting to break new ground in SCM systems, nor
   are we attempting to imitate all the best features of every SCM
   system out there.  We're trying to replace CVS."

If, in three years, Subversion is widely presumed to be the "standard"
SCM system in the open-source community, then the project will have
succeeded.  But the future is still hazy:  ultimately, Subversion
will have to win this position on its own technical merits.

Patches are welcome.


For More Information
--------------------

Please visit the Subversion project website at
http://subversion.tigris.org.  There are discussion lists to join, and
the source code is available via anonymous CVS -- and soon through
Subversion itself.
