	@(#)README	8.1 (Berkeley) 6/9/93
  $FreeBSD: releng/10.2/usr.bin/compress/doc/README 108470 2002-12-30 21:18:15Z schweikh $

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
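
As a rough illustration of the first item above, the sketch below contrasts
a division (modulo) hash with an xor hash for a (character, prefix code)
pair.  The function names and the constants (12-bit codes, a 5003-entry
table, a shift of 4) are assumptions chosen for the example; the hash in
compress.c may differ in detail.

```c
/* Illustrative constants, not taken from this README. */
#define EX_BITS   12        /* code width in bits */
#define EX_HSIZE  5003L     /* table size, a prime above 2^EX_BITS */
#define EX_HSHIFT 4         /* keeps (c << EX_HSHIFT) below 2^EX_BITS */

/* Division hash: one modulo (a divide on most CPUs) per lookup. */
static long hash_div(unsigned c, unsigned ent)
{
	return (((long)c << EX_BITS) + (long)ent) % EX_HSIZE;
}

/*
 * Xor hash: a shift and an xor, no divide.  For c < 256 and
 * ent < 2^EX_BITS the result stays below 2^EX_BITS, hence below EX_HSIZE.
 */
static long hash_xor(unsigned c, unsigned ent)
{
	return ((long)c << EX_HSHIFT) ^ (long)ent;
}
```

Either function only picks the first slot to try; handling collisions
needs a secondary probe, which is left out of this sketch.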

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.]  If you can't get it to work at all, just create a file
named "USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
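
The memory table above can be written out as a small lookup.  This is only
a sketch: compress.c makes the choice at compile time from USERMEM and
SACREDMEM, whereas the hypothetical function below does it at run time so
that the thresholds are easy to check.

```c
/*
 * Return the maximum BITS for a given amount of available memory,
 * where avail stands for usermem - sacredmem in bytes.
 * Thresholds are copied from the table in this README.
 * max_bits_for_memory is a made-up name, not part of compress.c.
 */
static int max_bits_for_memory(long avail)
{
	if (avail >= 433484L)
		return 16;
	if (avail >= 229600L)
		return 15;
	if (avail >= 127536L)
		return 14;
	if (avail >= 73464L)
		return 13;
	return 12;
}
```

For example, max_bits_for_memory(200000L) yields 14, since 200,000 lies
between the 127,536 and 229,600 rows.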

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
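
The check behind this warning can be sketched as follows.  It assumes the
header layout used by the widely distributed compress.c: the two magic
bytes \037 \235, followed by one byte whose low five bits hold the BITS
value the file was written with.  The third-byte layout is not stated in
this README, so treat it as an assumption; the function names are
hypothetical.

```c
#include <stddef.h>

#define EX_MAGIC_1  0x1f    /* \037 */
#define EX_MAGIC_2  0x9d    /* \235 */
#define EX_BIT_MASK 0x1f    /* assumed: low five bits carry BITS */

/* Return the BITS value recorded in the header, or -1 if hdr does not
 * start with the compress magic number. */
static int header_bits(const unsigned char *hdr, size_t len)
{
	if (len < 3 || hdr[0] != EX_MAGIC_1 || hdr[1] != EX_MAGIC_2)
		return -1;
	return hdr[2] & EX_BIT_MASK;
}

/* Nonzero if a compress built with local_max_bits can read the file. */
static int can_decompress(const unsigned char *hdr, size_t len,
    int local_max_bits)
{
	int bits = header_bits(hdr, len);

	return bits >= 0 && bits <= local_max_bits;
}
```

A file written with 16 bits would be rejected by can_decompress() on a
machine whose compress was built with BITS=12, which is the situation the
warning describes.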

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
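
The ratio check in change 1 can be sketched roughly as below.  The struct,
the function name, and the scaling by 256 (used here to stay in integer
arithmetic) are assumptions made for illustration; the README does not say
how compress.c computes or stores the ratio.

```c
/* Best compression ratio seen since the last clear, scaled by 256. */
struct ratio_state {
	long best;
};

/*
 * Called every so often once the code table is full.  Returns 1 if the
 * ratio has decreased and the table should be cleared, 0 otherwise.
 * Assumes in_count * 256 fits in a long.
 */
static int ratio_dropped(struct ratio_state *st, long in_count,
    long bytes_out)
{
	long rat;

	if (bytes_out <= 0)
		return 0;               /* nothing written yet */
	rat = (in_count * 256L) / bytes_out;
	if (rat >= st->best) {
		st->best = rat;         /* still holding or improving */
		return 0;
	}
	st->best = 0;                   /* start afresh after the clear */
	return 1;
}
```

When ratio_dropped() returns 1, the caller would empty its string table
and begin building new substrings from the input that follows.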

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has a 16-bit "int", define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
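
The links work because a single binary can pick its behaviour from the
name it was invoked under.  The sketch below shows one way to do that; the
enum and the function name are hypothetical, and compress.c may perform
the check differently.

```c
#include <string.h>

enum ex_mode { EX_COMPRESS, EX_UNCOMPRESS, EX_ZCAT };

/* Choose a mode from argv[0], ignoring any leading directory part. */
static enum ex_mode mode_from_name(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = (base != NULL) ? base + 1 : argv0;
	if (strcmp(base, "uncompress") == 0)
		return EX_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return EX_ZCAT;
	return EX_COMPRESS;             /* any other name compresses */
}
```

A main() would call mode_from_name(argv[0]) before parsing its flags, so
that "/usr/local/zcat" selects EX_ZCAT and "compress" selects EX_COMPRESS.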

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844