1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml">
4<head>
5<meta name="generator" content="HTML Tidy, see www.w3.org" />
6<title>Clean up your Web pages with HTML TIDY</title>
7<meta name="keywords"
8content="HTML, validation, error correction, pretty-printing" />
9<meta name="author" content="Dave Raggett &lt;dsr@w3.org&gt;" />
10<style type="text/css">
11  body { 
12    margin-left: 10%; 
13    margin-right: 10%; 
14    font-family: sans-serif
15  }
16  h1 { margin-left: -8% }
17  h2,h3 { margin-left: -4% }
18  pre { color: green; font-weight: bold; font-family: monospace}
19  em { font-style: italic; color: rgb(0, 0, 153) }
20  strong { text-transform: uppercase; font-weight: bold }
21  .note {font-style: italic; color: rgb(192, 101, 101) }
  /* hr {text-align: center; width: 60% } */
23  blockquote {
24    color: navy;
25    font-family: "Comic Sans MS", "Times New Roman", serif
26  }
27  blockquote.people { text-align: center; }
28  p.splash { color: maroon}
  div h4 {margin-left: 3%}
30  div p {margin-left: 5%}
31  table {
32    font-family: sans-serif;
33    font-size: 80%;
34    background: rgb(255,255,153)
35  }
36  td {
37    font-size: 80%
38  }
39  .people {font-family: "Lucida Calligraphy", serif}
40  :link { color: rgb(0, 0, 153) }
41  :visited { color: rgb(153, 0, 153) }
42  :active { color: rgb(255, 0, 102) }
  a:hover { color: rgb(0, 0, 255) }
44</style>
45
46<style type="text/css">
47 blockquote.c9 {font-style: italic}
48 span.c8 {color: maroon}
49 p.c7 {font-style: italic}
50 a.c6 {font-weight: bold}
51 div.c5 {text-align: center}
52 hr.c4 {text-align: center}
53 p.c3 {text-align: center}
54 p.c2 {font-weight: bold; text-align: center}
55 h1.c1 {text-align: center}
56</style>
57
58<style type="text/css">
59 p.c1 {font-weight: bold}
60</style>
61</head>
62<body bgcolor="#FFFFFF" background="grid.gif" text="black"
63link="navy" vlink="black" alink="red">
64<h1 class="c1"><img src="tidy.gif" width="32" height="32"
65align="top" alt="icon" /> Clean up your Web pages<br />
66 with HTML TIDY</h1>
67
68<p class="c2">This version 4th August 2000</p>
69
70<p class="c3"><small>Copyright &#169; 1998-2000 <a
71href="http://www.w3.org/">W3C</a>, see <a
72href="tidy.c">tidy.c</a> for copyright notice.</small></p>
73
74<blockquote>With many thanks to <a
75href="http://www.hp.com/">Hewlett Packard</a> for financial
76support during the development of this software!</blockquote>
77
78<hr width="80%" class="c4" />
79<p class="c3"><a href="#help">How to use Tidy</a> | <a
80href="#download">Downloading Tidy</a> | <a
81href="release-notes.html">Release Notes</a><br />
82 <a href="#quotes">Integration with other Software</a> | <a
83href="#acks">Acknowledgements</a></p>
84
85<hr width="80%" class="c4" />
86<p>To get the latest version of Tidy please visit the original
87version of this page at: <a
88href="http://www.w3.org/People/Raggett/tidy/">http://www.w3.org/People/Raggett/tidy/</a>.
89Courtesy of Netmind, you can register for email reminders when
90new versions of tidy become available.</p>
91
92<form method="get"
93action="http://www.netmind.com/cgi-bin/uncgi/url-mind">
94<div class="c5"><input type="submit"
95value="Press Here to Register" /></div>
96</form>
97
98<p>The public email list devoted to HTML Tidy is: &lt;<a
99href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;. To
100subscribe send an email to html-tidy-request@w3.org with the word
101subscribe in the subject line (include the word unsubscribe if
102you want to unsubscribe). The <a
103href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
104for this list is accessible online. Please use this list to
105report errors or enhancement requests. See the <a
106href="release-notes.html" class="c6">release notes</a> for
107information on recent changes. Your feedback is welcome!</p>
108
109<p>If you find HTML Tidy useful and you would like to say thanks,
110then please send me a (paper) postcard or other souvenir from the
111area in which you live along with a few words on what you are
112using Tidy for. It will be fun to map out where Tidy users are to
113be found! My <a href="#address">postal address</a> is given at
114the end of this file.</p>
115
116<h3>Tutorials for HTML and CSS</h3>
117
118<p>If you are just starting off and would like to know more about
119how to author Web pages, you may find my <a
120href="http://www.w3.org/MarkUp/Guide/">guide to HTML and CSS</a>
121helpful. Please send me feedback on this, and I will do my best
122to further improve it.</p>
123
124<h4>Support for Word2000</h4>
125
126<p>Tidy can now perform wonders on HTML saved from Microsoft Word
1272000! Word bulks out HTML files with stuff for round-tripping
128presentation between HTML and Word. If you are more concerned
about using HTML on the Web, check out Tidy's "<a
href="#word2000">word-2000</a>" config option! Of course Tidy
131does a good job on Word'97 files as well!</p>
132
133<h3>Introduction to TIDY</h3>
134
135<p>When editing HTML it's easy to make mistakes. Wouldn't it be
136nice if there was a simple way to fix these mistakes
automatically and tidy up sloppy editing into nicely laid out
138markup? Well now there is! Dave Raggett's HTML TIDY is a free
139utility for doing just that. It also works great on the
140atrociously hard to read markup generated by specialized HTML
141editors and conversion tools, and can help you identify where you
142need to pay further attention on making your pages more
143accessible to people with disabilities.</p>
144
145<p>Tidy is able to fix up a wide range of problems and to bring
146to your attention things that you need to work on yourself. Each
147item found is listed with the line number and column so that you
148can see where the problem lies in your markup. Tidy won't
149generate a cleaned up version when there are problems that it
150can't be sure of how to handle. These are logged as "errors"
151rather than "warnings".</p>
152
153<p class="c7">Tidy features in a <a
154href="http://webreview.com/wr/pub/1999/07/16/feature/index.html">recent
155article on XHTML</a> by webreview.com.</p>
156
157<!-- is the final "index.html" needed or appropriate? -->
158<h3>Examples of TIDY at work</h3>
159
160<p>Tidy corrects the markup in a way that matches where possible
161the observed rendering in popular browsers from Netscape and
162Microsoft. Here are just a few examples of how TIDY perfects your
163HTML for you:</p>
164
165<ul>
166<li><b>Missing or mismatched end tags are detected and
167corrected</b> 
168
169<pre>
170   &lt;h1&gt;heading
171   &lt;h2&gt;subheading&lt;/h3&gt;
172</pre>
173
174<p>is mapped to</p>
175
176<pre>
177   &lt;h1&gt;heading&lt;/h1&gt;
178   &lt;h2&gt;subheading&lt;/h2&gt;
179</pre>
180</li>
181
182<li><b>End tags in the wrong order are corrected:</b> 
183
184<pre>
185   &lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/b&gt; bold?&lt;/i&gt; normal?
186</pre>
187
188<p>is mapped to</p>
189
190<pre>
191   &lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/i&gt; bold?&lt;/b&gt; normal?
192</pre>
193</li>
194
195<li><b>Fixes problems with heading emphasis</b> 
196
197<pre>
198   &lt;h1&gt;&lt;i&gt;italic heading&lt;/h1&gt;
199   &lt;p&gt;new paragraph
200</pre>
201
202<p>In Netscape and Internet Explorer this causes everything
203following the heading to be in the heading font size, not the
204desired effect at all!</p>
205
206<p>Tidy maps the example to</p>
207
208<pre>
209   &lt;h1&gt;&lt;i&gt;italic heading&lt;/i&gt;&lt;/h1&gt;
210   &lt;p&gt;new paragraph
211</pre>
212</li>
213
214<li><b>Recovers from mixed up tags</b> 
215
216<pre>
217   &lt;i&gt;&lt;h1&gt;heading&lt;/h1&gt;&lt;/i&gt;
218   &lt;p&gt;new paragraph &lt;b&gt;bold text
219   &lt;p&gt;some more bold text
220</pre>
221
222<p>Tidy maps this to</p>
223
224<pre>
225   &lt;h1&gt;&lt;i&gt;heading&lt;/i&gt;&lt;/h1&gt;
226   &lt;p&gt;new paragraph &lt;b&gt;bold text&lt;/b&gt;
227   &lt;p&gt;&lt;b&gt;some more bold text&lt;/b&gt;
228</pre>
229</li>
230
231<li><b>Getting the &lt;hr&gt; in the right place:</b> 
232
233<pre>
234   &lt;h1&gt;&lt;hr&gt;heading&lt;/h1&gt;
235   &lt;h2&gt;sub&lt;hr&gt;heading&lt;/h2&gt;
236</pre>
237
238<p>Tidy maps this to</p>
239
240<pre>
241   &lt;hr&gt;
242   &lt;h1&gt;heading&lt;/h1&gt;
243   &lt;h2&gt;sub&lt;/h2&gt;
244   &lt;hr&gt;
245   &lt;h2&gt;heading&lt;/h2&gt;
246</pre>
247</li>
248
249<li><b>Adding the missing "/" in end tags for anchors:</b> 
250
251<pre>
252   &lt;a href="#refs"&gt;References&lt;a&gt;
253</pre>
254
255<p>Tidy maps this to</p>
256
257<pre>
258   &lt;a href="#refs"&gt;References&lt;/a&gt;
259</pre>
260</li>
261
262<li><b>Perfecting lists by putting in tags missed out:</b> 
263
264<pre>
265   &lt;body&gt;
266   &lt;li&gt;1st list item
267   &lt;li&gt;2nd list item
268</pre>
269
270<p>is mapped to</p>
271
272<pre>
273   &lt;body&gt;
274   &lt;ul&gt;
275   &lt;li&gt;1st list item&lt;/li&gt;
276   &lt;li&gt;2nd list item&lt;/li&gt;
277   &lt;/ul&gt;
278</pre>
279</li>
280
281<li><b>Missing quotes around attribute values are added</b> 
282
283<p>Tidy inserts quote marks around all attribute values for you.
284It can also detect when you have forgotten the closing quote
285mark, although this is something you will have to fix
286yourself.</p>
287</li>
288
289<li><b>Unknown/Proprietary attributes are reported</b> 
290
291<p>Tidy has a comprehensive knowledge of the attributes defined
292in the HTML 4.0 recommendation from W3C. This often allows you to
293spot where you have mistyped an attribute or value.</p>
294</li>
295
296<li><b>Proprietary elements are recognized and reported as
297such.</b> 
298
299<p>Tidy will even work out which version of HTML you are using
and insert the appropriate DOCTYPE declaration, as per the W3C
301recommendations.</p>
302</li>
303
304<li><b>Tags lacking a terminating '&gt;' are spotted</b> 
305
306<p>This is something you then have to fix yourself as Tidy is
307unsure of where the &gt; should be inserted.</p>
308</li>
309</ul>
310
311<h3>Layout style</h3>
312
313<p>You can choose which style you want Tidy to use when it
314generates the cleaned up markup: for instance whether you like
315elements to indent their contents or not. Several people have
316asked if Tidy could preserve the original layout. I am sorry to
317say that this would be very hard to support due to the way Tidy
318is implemented. Tidy starts by building a clean parse tree from
319the source file. The parse tree doesn't contain any information
320about the original layout. Tidy then pretty prints the parse tree
321using the current layout options. Trying to preserve the original
322layout would interact badly with the repair operations needed to
323build a clean parse tree and considerably complicate the
324code.</p>
325
326<p>Some browsers can screw up the right alignment of text
depending on how you lay out headings. As an example,
328consider:</p>
329
330<pre>
331&lt;h1 align="right"&gt;
332  Heading
333&lt;/h1&gt;
334
335&lt;h1 align="right"&gt;Heading&lt;/h1&gt;
336</pre>
337
338<p>Both of these should be rendered the same. Sadly a common
339browser bug fails to trim trailing whitespace and misaligns the
340first heading. HTML Tidy will protect you from this bug, except
341when you set the indent option to "yes".</p>
342
343<p>Setting the indent option to yes can also cause problems with
344table layout for some browsers:</p>
345
346<pre>
347&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
348&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
349</pre>
350
351<p>will look slightly different from:</p>
352
353<pre>
354&lt;td&gt;
355  &lt;img src="foo.gif"&gt;
356&lt;/td&gt;
357&lt;td&gt;
358  &lt;img src="foo.gif"&gt;
359&lt;/td&gt;
360</pre>
361
362<p>You can avoid such quirks by using indent:&#160;no or
363indent:&#160;auto in the config file.</p>
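
<p>As a minimal sketch, a config file that still indents nested
block-level content but sidesteps these quirks could contain the
single line below. Choosing <em>auto</em> rather than <em>no</em>
here is only an illustration:</p>

<pre>
  indent: auto
</pre>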
364
365<h3>Internationalization issues</h3>
366
367<p>Tidy offers you a choice of character encodings: US ASCII, ISO
368Latin-1, UTF-8 and the ISO 2022 family of 7 bit encodings. The
full set of HTML 4.0 entities is defined. Cleaned up output uses
370HTML entity names for characters when appropriate. Otherwise
371characters outside the normal range are output as numeric
372character entities. Tidy defaults to assuming you want the output
373to be in US ASCII. Tidy doesn't yet recognize the use of the HTML
374meta element for specifying the character encoding.</p>
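
<p>If your pages are authored in UTF-8, for instance, the ASCII
default can be overridden with the char-encoding config option
described later on this page. A one-line sketch of such a config
file entry:</p>

<pre>
  char-encoding: utf8
</pre>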
375
376<h3>Accessibility</h3>
377
378<p>Tidy offers advice on accessibility problems for people using
379non-graphical browsers. The most common thing you will see is the
380suggestion you add a summary attribute to table elements. The
381idea is to provide a summary of the table's role and structure
382suitable for use with aural browsers.</p>
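
<p>As a rough sketch of what Tidy is asking for, a table with a
summary attribute might look like this. The table content and the
wording of the summary are invented for illustration:</p>

<pre>
&lt;table summary="Opening hours; one row per day, with opening and closing times"&gt;
  &lt;tr&gt;&lt;th&gt;Day&lt;/th&gt;&lt;th&gt;Opens&lt;/th&gt;&lt;th&gt;Closes&lt;/th&gt;&lt;/tr&gt;
  &lt;tr&gt;&lt;td&gt;Monday&lt;/td&gt;&lt;td&gt;9:00&lt;/td&gt;&lt;td&gt;17:00&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
</pre>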
383
384<h3>Cleaning up presentational markup</h3>
385
386<p>Many tools generate HTML with an excess of FONT, NOBR and
387CENTER tags. Tidy's <em>-clean</em> option will replace them by
388style properties and rules using CSS. This makes the markup
389easier to read and maintain as well as reducing the file size!
390Tidy is expected to get smarter at this in the future.</p>
391
<p>Some pages rely on the presentation effects of isolated
&lt;p&gt; or &lt;/p&gt; tags. Tidy deletes empty paragraph and
heading elements etc. The use of empty paragraph elements is not
recommended for adding vertical whitespace. Instead use style
sheets, or the &lt;br&gt; element. Tidy won't discard paragraphs
containing only a nonbreaking space (&amp;nbsp;).</p>
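
<p>For instance, extra space above a paragraph can be requested
with a style rule instead of empty paragraphs. This is only a
sketch; the class name "spaced" and the 2em value are made up for
the example:</p>

<pre>
&lt;style type="text/css"&gt;
  p.spaced { margin-top: 2em }
&lt;/style&gt;

&lt;p class="spaced"&gt;This paragraph starts after a larger gap.&lt;/p&gt;
</pre>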
398
399<h3>Teaching Tidy about new tags!</h3>
400
<p>You can teach Tidy about new tags by declaring them in the
configuration file. The syntax is:</p>
403
404<pre>
405  new-inline-tags: <em>tag1, tag2, tag3</em>
406  new-empty-tags: <em>tag1, tag2, tag3</em>
407  new-blocklevel-tags: <em>tag1, tag2, tag3</em>
408  new-pre-tags: <em>tag1, tag2, tag3</em>
409</pre>
410
411<p>The same tag can be defined as empty and as inline or as empty
412and as block.</p>
413
<p>These declarations can be combined to define a new empty
inline or empty block element, but you are advised not to declare
tags as being both inline and block!</p>
417
418<p>Note that the new tags can only appear where Tidy expects
419inline or block-level tags respectively. This means you can't
420(yet) place new tags within the document head or other contexts
421with restricted content models. So far the most popular use of
422this feature is to allow Tidy to be applied to Cold Fusion
423files.</p>
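
<p>As a sketch for Cold Fusion files, the declarations might look
like the following. The tag names, and the way they are split
between inline and block-level, are illustrative assumptions; adjust
them to the tags your own pages actually use:</p>

<pre>
  new-inline-tags: cfif, cfelse
  new-blocklevel-tags: cfoutput, cfquery
</pre>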
424
425<p class="c7">I am working on ways to make it easy to customize
426the permitted document syntax using <a
427href="http://www.w3.org/People/Raggett/dtdgen/Docs/">assertion
428grammars</a>, and hope to apply this to a much smarter version of
429Tidy for release later this year or early next year.</p>
430
431<h3>Limited support for ASP, JSTE and PHP</h3>
432
433<p>Tidy is somewhat aware of the preprocessing language called
434ASP which uses a pseudo element syntax &lt;%&#160;...&#160;%&gt;
435to include preprocessor directives. ASP is normally interpreted
436by the web server before delivery to the browser. JSTE shares the
437same syntax, but sometimes also uses &lt;#&#160;...&#160;#&gt;.
438Tidy can also cope with another such language called PHP, which
439uses the syntax &lt;?php&#160;...&#160;?&gt;</p>
440
441<p>Tidy will cope with ASP, JSTE and PHP pseudo elements within
442element content and as replacements for attributes, for
443example:</p>
444
445<pre>
446  &lt;option &lt;% if rsSchool.Fields("ID").Value
447    = session("sessSchoolID")
448    then Response.Write("selected") %&gt;
449    value='&lt;%=rsSchool.Fields("ID").Value%&gt;'&gt;
450    &lt;%=rsSchool.Fields("Name").Value%&gt;
451    (&lt;%=rsSchool.Fields("ID").Value%&gt;)
452  &lt;/option&gt;
453</pre>
454
455<p>Note that Tidy doesn't understand the scripting language used
456within pseudo elements and attributes, and can easily get
457confused. Tidy may report missing attributes when these are
458hidden within preprocessor code. Tidy can also get things wrong
459if the code includes quote marks, e.g. if the example above is
460changed to:</p>
461
462<pre>
463    value="&lt;%=rsSchool.Fields("ID").Value%&gt;"
464</pre>
465
466<p>Tidy will now see the quote mark preceding ID as ending the
467attribute value, and proceed to complain about what follows. Note
468you can choose whether to allow line wrapping on spaces within
469pseudo elements or not using the <tt>wrap-asp</tt> option. If you
470used ASP, JSTE or PHP to create a start tag, but placed the end
471tag explicitly in the markup, Tidy won't be able to match them
472up, and will delete the end tag for you. So in this case you are
advised to make the start tag explicit and to use ASP, JSTE or PHP
474for just the attributes, e.g.</p>
475
476<pre>
477   &lt;a href="&lt;%=random.site()%&gt;"&gt;do you feel lucky?&lt;/a&gt;
478</pre>
479
480<p>Tidy allows you to control whether line wrapping is enabled
481for ASP, JSTE and PHP instructions, see the wrap-asp, wrap-jste
482and wrap-php config options, respectively.</p>
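
<p>For example, to stop Tidy from wrapping lines inside any of the
three kinds of pseudo element, a config file could contain the
lines below. This is a sketch using only the three options just
named:</p>

<pre>
  wrap-asp: no
  wrap-jste: no
  wrap-php: no
</pre>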
483
484<p>I regret that Tidy does <b>not</b> support Tango preprocessing
485instructions which look like:</p>
486
487<pre>
488&lt;@if variable_1='a'&gt;
489    do something
490&lt;@else&gt;
491    do nothing
492&lt;/@if&gt;
493
494&lt;@include &lt;@cgi&gt;&lt;@appfilepath&gt;includes/message.html&gt;
495</pre>
496
<p>Tidy supports Tango syntax only within attribute values.
Adding support for pseudo elements written in Tango looks as if it
would be quite tough, so I would like to gauge the level of
interest before committing to this work.</p>
502
503<h3>Limited support for XML</h3>
504
505<p>XML processors compliant with W3C's XML 1.0 recommendation are
506very picky about which files they will accept. Tidy can help you
507to fix errors that cause your XML files to be rejected. Tidy
508doesn't yet recognize all XML features though, e.g. it doesn't
509understand CDATA sections or DTD subsets.</p>
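
<p>A minimal sketch of checking an XML file, using the input-xml
config option set from the command line and the -f option to
collect the messages. The file names "data.xml" and "errs.txt" are
placeholders:</p>

<pre>
   tidy --input-xml yes -f errs.txt data.xml
</pre>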
510
511<h3>Creating Slides</h3>
512
513<p>The <em>-slides</em> option allows you to burst a single HTML
514file into a number of linked slides. Each H2 element in the input
515file is treated as delimiting the start of the next slide. The
516slides are named slide1.html, slide2.html, slide3.html etc. This
517is a relatively new feature and ideas are welcomed as to how to
518improve it. In particular, I plan to add support to the
519configuration file for setting the style sheet for slides and for
520customizing the slides via a template.</p>
521
522<p>I would be interested in hearing from anyone who can offer
523help with using JavaScript for adding dynamic effects to slides,
524for instance similar to those available in Microsoft
525PowerPoint.</p>
526
527<h3>Indenting text for a better layout</h3>
528
529<p>Indenting the content of elements makes the markup easier to
530read. Tidy can do this for all elements or just for those where
531it's needed. The auto-indent mode has been used below to avoid
532indenting the content of title, p and li elements:</p>
533
534<pre>
535&lt;html&gt;
536  &lt;head&gt;
537    &lt;title&gt;Test document&lt;/title&gt;
538  &lt;/head&gt;
539
540  &lt;body&gt;
541    &lt;p&gt;para which has enough text to cause a line break,
542    and so test the wrapping mechanism for long lines.&lt;/p&gt;
543&lt;pre&gt;
544This is
545&lt;em&gt;genuine
546       preformatted&lt;/em&gt;
547   text
548&lt;/pre&gt;
549
550    &lt;ul&gt;
551      &lt;li&gt;1st list item&lt;/li&gt;
552
553      &lt;li&gt;2nd list item&lt;/li&gt;
554    &lt;/ul&gt;
555    &lt;!-- end comment --&gt;
556  &lt;/body&gt;
557&lt;/html&gt;
558</pre>
559
560<p>Indenting the content does increase the size of the file, so
561you may prefer Tidy's default style:</p>
562
563<pre>
564 &lt;html&gt;
565 &lt;head&gt;
566 &lt;title&gt;Test document&lt;/title&gt;
567 &lt;/head&gt;
568 &lt;body&gt;
569 &lt;p&gt;para which has enough text to cause a line break,
570 and so test the wrapping mechanism for long lines.&lt;/p&gt;
571 
572 &lt;pre&gt;This is
573 &lt;em&gt;genuine
574       preformatted&lt;/em&gt;
575    text
576 &lt;/pre&gt;
577 
578 &lt;ul&gt;
579 &lt;li&gt;1st list item &lt;/li&gt;
580 
581 &lt;li&gt;2nd list item&lt;/li&gt;
582 &lt;/ul&gt;
583 
584 &lt;!-- end comment --&gt;
585 &lt;/body&gt;
586 &lt;/html&gt;
587 
588</pre>
589
590<h3><a id="help" name="help">How to run tidy</a></h3>
591
592<pre>
593   <span class="c8">tidy</span> <em>[[options] filename]*</em>
594</pre>
595
596<p>HTML tidy is not (yet) a Windows program. If you run tidy
597without any arguments, it will just sit there waiting to read
598markup on the stdin stream. Tidy's input and output default to
599stdin and stdout respectively. Errors are written to stderr but
600can be redirected to a file with the -f <em>filename</em>
601option.</p>
602
603<p>I generally use the -m option to get tidy to update the
604original file, and if the file is particularly bad I also use the
605-f option to write the errors to a file to make it easier to
606review them. Tidy supports a small set of character encoding
607options. The default is ASCII, which makes it easy to edit markup
608in regular text editors.</p>
609
610<p>For instance:</p>
611
612<pre>
613   tidy -f errs.txt -m index.html
614</pre>
615
<p>which runs tidy on the file "index.html" updating it in place
and writing the error messages to the file "errs.txt". It's a good
idea to save your work before tidying it since, as with all complex
software, tidy may have bugs. If you find any please let me
know!</p>
621
<p>Thanks to Jacek Niedziela, the Win32 executable for tidy is
now able to expand wild cards in filenames. This utilizes the
setargv library supplied with VC++.</p>
625
626<p>Tidy writes errors to stderr, and won't be paused by the more
627command. A work around is to redirect stderr to stdout as
628follows. This works on Unix and Windows NT, but not on other
629platforms. My thanks to Markus Wolf for this tip!</p>
630
631<pre>
632   tidy file.html 2&gt;&amp;1 | more
633</pre>
634
635<h4>Tidy's Options</h4>
636
637<p>To get a list of available options use:</p>
638
639<pre>
640   tidy -help
641</pre>
642
643<p>You may want to run it through more to view the help a page at
644a time.</p>
645
646<pre>
647   tidy -help | more
648</pre>
649
650<p>Input and Output default to stdin/stdout respectively. Single
651letter options apart from -f may be combined as in: tidy -f
652errs.txt -imu foo.html</p>
653
654<p>Matej Vela &lt;<a
655href="mailto:vela@debian.org">vela@debian.org</a>&gt; has written
656a <a href="man_page.txt">Unix man page for Tidy</a>, but for the
657latest details on config options and for the release notes please
658visit this page: <a
659href="http://www.w3.org/People/Raggett/tidy">http://www.w3.org/People/Raggett/tidy</a>.</p>
660
661<h3><a id="config" name="config">Using a Configuration
662File</a></h3>
663
<p>Tidy now supports a configuration file, and this is now by far
the most convenient way to configure Tidy. Assuming you have
666created a config file named "config.txt" (the name doesn't
667matter), you can instruct Tidy to use it via the command line
668option <tt>-config config.txt</tt>, e.g.</p>
669
670<pre>
671   tidy -config config.txt file1.html file2.html
672</pre>
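
<p>The config file itself is plain text, with options written as
<em>name: value</em> in the same way as the fragments shown
elsewhere on this page. As a small sketch, "config.txt" might
contain the following; the particular values are only illustrative,
and each option is described in the list further down:</p>

<pre>
  indent: auto
  wrap: 72
  tidy-mark: no
  char-encoding: latin1
</pre>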
673
674<p>Alternatively, you can name the default config file via the
675environment variable named "HTML_TIDY". Note this should be the
676absolute path since you are likely to want to run Tidy in
677different directories. You can also set a config file at compile
678time by defining CONFIG_FILE as the path string, see
679platform.h.</p>
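
<p>On Unix with a Bourne-style shell, setting the environment
variable might look like the sketch below. The path is a made-up
example, and other shells and platforms have their own syntax for
setting environment variables:</p>

<pre>
   HTML_TIDY=/home/dave/tidy/config.txt
   export HTML_TIDY
   tidy -m index.html
</pre>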
680
681<p>You can now set config options on the command line by
682preceding the name of the option immediately (no intervening
683space) by "--", for example:</p>
684
685<pre>
686  tidy --break-before-br true --show-warnings false
687</pre>
688
689<p>The following options are supported:</p>
690
691<dl>
692<dt>tidy-mark: <em>bool</em></dt>
693
694<dd>If set to <em>yes</em> (the default) Tidy will add a meta
695element to the document head to indicate that the document has
696been tidied. To suppress this, set tidy-mark to <em>no</em>. Tidy
697won't add a meta element if one is already present.</dd>
698
699<dt>markup: <em>bool</em></dt>
700
701<dd>Determines whether Tidy generates a pretty printed version of
702the markup. Bool values are either <em>yes</em> or <em>no</em>.
703Note that Tidy won't generate a pretty printed version if it
704finds unknown tags, or missing trailing quotes on attribute
705values, or missing trailing '&gt;' on tags. The default is
706<em>yes</em>.</dd>
707
708<dt>wrap: <em>number</em></dt>
709
710<dd>Sets the right margin for line wrapping. Tidy tries to wrap
711lines so that they do not exceed this length. The default is 66.
712Set wrap to zero if you want to disable line wrapping.</dd>
713
714<dt>wrap-attributes: <em>bool</em></dt>
715
716<dd>If set to <em>yes</em>, attribute values may be wrapped
717across lines for easier editing. The default is no. This option
can be set independently of wrap-script-literals.</dd>
719
720<dt>wrap-script-literals: <em>bool</em></dt>
721
722<dd>If set to <em>yes</em>, this allows lines to be wrapped
723within string literals that appear in script attributes. The
724default is <em>no</em>. The example shows how Tidy wraps a really
725really long script string literal inserting a backslash character
726before the linebreak: 
727
728<pre>
729&lt;a href="somewhere.html" onmouseover="document.status = '...some \
730really, really, really, really, really, really, really, really, \
731really, really long string..';"&gt;test&lt;/a&gt;
732</pre>
733</dd>
734
735<dt>wrap-asp: <em>bool</em></dt>
736
737<dd>If set to <em>no</em>, this prevents lines from being wrapped
738within ASP pseudo elements, which look like:
739&lt;%&#160;...&#160;%&gt;. The default is <em>yes</em>.</dd>
740
741<dt>wrap-jste: <em>bool</em></dt>
742
743<dd>If set to <em>no</em>, this prevents lines from being wrapped
744within JSTE pseudo elements, which look like:
745&lt;#&#160;...&#160;#&gt;. The default is <em>yes</em>.</dd>
746
747<dt>wrap-php: <em>bool</em></dt>
748
749<dd>If set to <em>no</em>, this prevents lines from being wrapped
750within PHP pseudo elements. The default is <em>yes</em>.</dd>
751
752<dt>literal-attributes: <em>bool</em></dt>
753
754<dd>If set to <em>yes</em>, this ensures that whitespace
755characters within attribute values are passed through unchanged.
756The default is <em>no</em>.</dd>
757
758<dt>tab-size: <em>number</em></dt>
759
760<dd>Sets the number of columns between successive tab stops. The
761default is 4. It is used to map tabs to spaces when reading
762files. Tidy never outputs files with tabs.</dd>
763
764<dt>indent: <em>no, yes</em> or <em>auto</em></dt>
765
766<dd>If set to <em>yes</em>, Tidy will indent block-level tags.
767The default is <em>no</em>. If set to <em>auto</em> Tidy will
768decide whether or not to indent the content of tags such as
769title, h1-h6, li, td, th, or p depending on whether or not the
770content includes a block-level element. You are advised to avoid
771setting indent to yes as this can expose layout bugs in some
772browsers.</dd>
773
774<dt>indent-spaces: <em>number</em></dt>
775
776<dd>Sets the number of spaces to indent content when indentation
777is enabled. The default is 2 spaces.</dd>
778
779<dt>indent-attributes: <em>bool</em></dt>
780
781<dd>If set to <em>yes</em>, each attribute will begin on a new
782line. The default is <em>no</em>.</dd>
783
784<dt>hide-endtags: <em>bool</em></dt>
785
786<dd>If set to <em>yes</em>, optional end-tags will be omitted
787when generating the pretty printed markup. This option is ignored
788if you are outputting to XML. The default is <em>no</em>.</dd>
789
790<dt>input-xml: <em>bool</em></dt>
791
792<dd>If set to <em>yes</em>, Tidy will use the XML parser rather
793than the error correcting HTML parser. The default is
794<em>no</em>.</dd>
795
796<dt>output-xml: <em>bool</em></dt>
797
<dd>If set to <em>yes</em>, Tidy will generate the pretty
799printed output writing it as well-formed XML. Any entities not
800defined in XML 1.0 will be written as numeric entities to allow
801them to be parsed by an XML parser. The tags and attributes will
802be in the case used in the input document, regardless of other
803options. The default is <em>no</em>.</dd>
804
805<dt>add-xml-pi: <em>bool</em></dt>
806
807<dt>add-xml-decl: <em>bool</em></dt>
808
<dd>If set to <em>yes</em>, Tidy will add the XML declaration
810when outputting XML or XHTML. The default is <em>no</em>. Note
811that if the input document includes an &lt;?xml?&gt; declaration
812then it will appear in the output independent of the value of
813this option.</dd>
814
815<dt>output-xhtml: <em>bool</em></dt>
816
817<dd>If set to <em>yes</em>, Tidy will generate the pretty printed
818output writing it as extensible HTML. The default is <em>no</em>.
819This option causes Tidy to set the doctype and default namespace
820as appropriate to XHTML. If a doctype or namespace is given they
will be checked for consistency with the content of the document. In
822the case of an inconsistency, the corrected values will appear in
823the output. For XHTML, entities can be written as named or
824numeric entities according to the value of the "numeric-entities"
825property. The tags and attributes will be output in the case used
826in the input document, regardless of other options.</dd>
827
828<dt>doctype: <em>omit, auto, strict, loose</em> or
829&lt;<em>fpi</em>&gt;</dt>
830
831<dd>This property controls the doctype declaration generated by
832Tidy. If set to <em>omit</em> the output file won't contain a
833doctype declaration. If set to <em>auto</em> (the default) Tidy
834will use an educated guess based upon the contents of the
835document. If set to <em>strict</em>, Tidy will set the doctype to
836the strict DTD. If set to <em>loose</em>, the doctype is set to
837the loose (transitional) DTD. Alternatively, you can supply a
838string for the formal public identifier (fpi) for example:</dd>
839
840<dd>
841<pre>
842    doctype: "-//ACME//DTD HTML 3.14159//EN"
843</pre>
844</dd>
845
846<dd>If you specify the fpi for an XHTML document, Tidy will set
847the system identifier to the empty string. Tidy leaves the
848document type for generic XML documents unchanged.</dd>
849
850<dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or
851<em>iso2022</em></dt>
852
853<dd>Determines how Tidy interprets character streams. For
854<em>ascii</em>, Tidy will accept Latin-1 character values, but
855will use entities for all characters whose value &gt; 127. For
856<em>raw</em>, Tidy will output values above 127 without
857translating them into entities. For <em>latin1</em> characters
858above 255 will be written as entities. For <em>utf8</em>, Tidy
859assumes that both input and output is encoded as UTF-8. You can
860use <em>iso2022</em> for files encoded using the ISO2022 family
861of encodings e.g. ISO 2022-JP. The default is
862<em>ascii</em>.</dd>
863
864<dt>numeric-entities: <em>bool</em></dt>
865
866<dd>Causes entities other than the basic XML 1.0 named entities
867to be written in the numeric rather than the named entity form.
The default is <em>no</em>.</dd>
869
870<dt>quote-marks: <em>bool</em></dt>
871
872<dd>If set to <em>yes</em>, this causes " characters to be
873written out as &amp;quot; as is preferred by some editing
874environments. The apostrophe character ' is written out as
875&amp;#39; since many web browsers don't yet support &amp;apos;.
876The default is <em>no</em>.</dd>
877
878<dt>quote-nbsp: <em>bool</em></dt>
879
880<dd>If set to <em>yes</em>, this causes non-breaking space
881characters to be written out as entities, rather than as the
882Unicode character value 160 (decimal). The default is
883<em>yes</em>.</dd>
884
885<dt>quote-ampersand: <em>bool</em></dt>
886
887<dd>If set to <em>yes</em>, this causes unadorned &amp;
888characters to be written out as &amp;amp;. The default is
889<em>yes</em>.</dd>
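
<dd>The three quoting options above can be set together. This
fragment is just one possible combination, shown to make the
effect of each setting concrete:</dd>

<dd>
<pre>
// write " as &amp;quot; and ' as &amp;#39;
quote-marks: yes
// write non-breaking spaces as entities rather than as character 160
quote-nbsp: yes
// write unadorned &amp; characters as &amp;amp;
quote-ampersand: yes
</pre>
</dd>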
890
891<dt>assume-xml-procins: <em>bool</em></dt>
892
893<dd>If set to <em>yes</em>, this changes the parsing of
894processing instructions to require ?&gt; as the terminator rather
895than &gt;. The default is <em>no</em>. This option is
896automatically set if the input is in XML.</dd>
897
898<dt>fix-backslash: <em>bool</em></dt>
899
900<dd>If set to <em>yes</em>, this causes backslash characters "\"
901in URLs to be replaced by forward slashes "/". The default is
902<em>yes</em>.</dd>
903
904<dt>break-before-br: <em>bool</em></dt>
905
906<dd>If set to <em>yes</em>, Tidy will output a line break before
907each &lt;br&gt; element. The default is <em>no</em>.</dd>
908
909<dt>uppercase-tags: <em>bool</em></dt>
910
<dd>Causes tag names to be output in upper case. The default is
<em>no</em>, resulting in lowercase, except for XML input, where
the original case is preserved.</dd>
914
915<dt>uppercase-attributes: <em>bool</em></dt>
916
<dd>If set to <em>yes</em>, attribute names are output in upper
case. The default is <em>no</em>, resulting in lowercase, except
for XML, where the original case is preserved.</dd>
920
921<dt><a id="word2000" name="word2000">word-2000:
922<em>bool</em></a></dt>
923
924<dd>If set to <em>yes</em>, Tidy will go to great pains to strip
925out all the surplus stuff Microsoft Word 2000 inserts when you
926save Word documents as "Web pages". The default is <em>no</em>.
927Note that Tidy doesn't yet know what to do with VML markup from
928Word, but in future I hope to be able to map VML to SVG.<br />
929<br />
930 Microsoft has developed its own optional filter for exporting to
931HTML, and the 2.0 version is much improved. You can download the
932filter free from the <a
933href="http://officeupdate.microsoft.com/2000/downloadDetails/Msohtmf2.htm">
934Microsoft Office Update site</a>.</dd>
935
936<dt>clean: <em>bool</em></dt>
937
<dd>If set to <em>yes</em>, causes Tidy to strip out surplus
presentational tags and attributes, replacing them by style rules
and structural markup as appropriate. It works well on the HTML
saved from Microsoft Office 97. The default is <em>no</em>.</dd>
942
943<dt>logical-emphasis: <em>bool</em></dt>
944
945<dd>If set to <em>yes</em>, causes Tidy to replace any occurrence
946of i by em and any occurrence of b by strong. In both cases, the
947attributes are preserved unchanged. The default is <em>no</em>.
948This option can now be set independently of the clean and
949drop-font-tags options.</dd>
950
951<dt>drop-empty-paras: <em>bool</em></dt>
952
<dd>If set to <em>yes</em>, empty paragraphs will be discarded.
If set to <em>no</em>, empty paragraphs are replaced by a pair of
<code>br</code> elements, as HTML4 precludes empty paragraphs. The
default is <em>yes</em>.</dd>
957
958<dt>drop-font-tags: <em>bool</em></dt>
959
960<dd>If set to <em>yes</em> together with the clean option (see
961above), Tidy will discard font and center tags rather than
962creating the corresponding style rules. The default is
963<em>no</em>.</dd>
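
<dd>For example, a config file aimed at cleaning up pages saved
from Word 2000 might combine several of the options above. This
is only a sketch, and which of these settings you want will
depend on your documents:</dd>

<dd>
<pre>
// strip the surplus markup Word 2000 adds to "Web pages"
word-2000: yes
// replace presentational tags and attributes by style rules
clean: yes
// with clean, discard font and center tags instead of creating style rules
drop-font-tags: yes
// replace i by em and b by strong
logical-emphasis: yes
// discard empty paragraphs
drop-empty-paras: yes
</pre>
</dd>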
964
965<dt>enclose-text: <em>bool</em></dt>
966
967<dd>If set to <em>yes</em>, this causes Tidy to enclose any text
968it finds in the body element within a p element. This is useful
969when you want to take an existing html file and use it with a
970style sheet. Any text at the body level will screw up the
971margins, but wrap the text within a p element and all is well!
972The default is <em>no</em>.</dd>
973
974<dt>enclose-block-text: <em>bool</em></dt>
975
976<dd>If set to <em>yes</em>, this causes Tidy to insert a p
977element to enclose any text it finds in any element that allows
978mixed content for HTML transitional but not HTML strict. The
979default is <em>no</em>.</dd>
980
981<dt>fix-bad-comments: <em>bool</em></dt>
982
983<dd>If set to <em>yes</em>, this causes Tidy to replace
984unexpected hyphens with "=" characters when it comes across
985adjacent hyphens. The default is <em>yes</em>. This option is
986provided for users of Cold Fusion which uses the comment syntax:
987&lt;!---&#160;---&gt;</dd>
988
989<dt>add-xml-space: <em>bool</em></dt>
990
991<dd>If set to <em>yes</em>, this causes Tidy to add
992xml:space="preserve" to elements such as pre, style and script
993when generating XML. This is needed if the whitespace in such
994elements is to be parsed appropriately without having access to
995the DTD. The default is <em>no</em>.</dd>
996
997<dt>alt-text: <em>string</em></dt>
998
<dd>This allows you to set the default alt text for img
elements. This feature is dangerous as it suppresses further
accessibility warnings. <b>YOU ARE RESPONSIBLE FOR MAKING YOUR
DOCUMENTS ACCESSIBLE TO PEOPLE WHO CAN'T SEE THE
IMAGES!!!</b></dd>
1004
1005<dt>write-back: <em>bool</em></dt>
1006
<dd>If set to <em>yes</em>, Tidy will write back the tidied
markup to the same file it read from. The default is <em>no</em>.
You are advised to keep copies of important files before tidying
them, as on rare occasions the result may not be what you
expect.</dd>
1012
1013<dt>keep-time: <em>bool</em></dt>
1014
<dd>If set to <em>yes</em>, Tidy won't alter the last modified
time for files it writes back to. The default is <em>yes</em>.
This allows you to tidy files without affecting which ones will
be uploaded to the Web server when using a tool such as
'SiteCopy'. Note that this feature may not work on some
platforms.</dd>
1021
1022<dt>error-file: <em>filename</em></dt>
1023
1024<dd>Writes errors and warnings to the named file rather than to
1025stderr.</dd>
1026
1027<dt>show-warnings: <em>bool</em></dt>
1028
1029<dd>If set to <em>no</em>, warnings are suppressed. This can be
1030useful when a few errors are hidden in a flurry of warnings. The
1031default is <em>yes</em>.</dd>
1032
1033<dt>quiet: <em>bool</em></dt>
1034
1035<dd>If set to <em>yes</em>, Tidy won't output the welcome message
1036or the summary of the numbers of errors and warnings. The default
1037is <em>no</em>.</dd>
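
<dd>As an illustration, when tidying files in place from a batch
job, the options above might be combined as follows. The file
name tidy-errors.txt is made up for this example:</dd>

<dd>
<pre>
// overwrite each input file with the tidied markup
write-back: yes
// leave the last modified time alone
keep-time: yes
// send errors and warnings to a file rather than to stderr
error-file: tidy-errors.txt
// report errors only
show-warnings: no
// no welcome message or summary
quiet: yes
</pre>
</dd>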
1038
1039<dt>gnu-emacs: <em>bool</em></dt>
1040
1041<dd>If set to <em>yes</em>, Tidy changes the format for reporting
1042errors and warnings to a format that is more easily parsed by GNU
1043Emacs. The default is <em>no</em>.</dd>
1044
1045<dt>split: <em>bool</em></dt>
1046
<dd>If set to <em>yes</em>, Tidy will use the input file to create
a sequence of slides, splitting the markup prior to each
successive &lt;h2&gt;. You can see an example of the results in a
<a
href="http://www.w3.org/Talks/1999/03/24-stockholm-xhtml/">recent
talk I gave on XHTML</a>. The slides are written to
"slide1.html", "slide2.html" etc. The default is
<em>no</em>.</dd>
1055
1056<dt>new-empty-tags: <em>tag1, tag2, tag3</em></dt>
1057
1058<dd>Use this to declare new empty inline tags. The option takes a
1059space or comma separated list of tag names. Unless you declare
1060new tags, Tidy will refuse to generate a tidied file if the input
1061includes previously unknown tags. Remember to also declare empty
1062tags as either inline or blocklevel, see below.</dd>
1063
1064<dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>
1065
1066<dd>Use this to declare new non-empty inline tags. The option
1067takes a space or comma separated list of tag names. Unless you
1068declare new tags, Tidy will refuse to generate a tidied file if
1069the input includes previously unknown tags.</dd>
1070
1071<dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>
1072
1073<dd>Use this to declare new block-level tags. The option takes a
1074space or comma separated list of tag names. Unless you declare
1075new tags, Tidy will refuse to generate a tidied file if the input
1076includes previously unknown tags. Note you can't change the
1077content model for elements such as table, ul, ol and dl. This is
1078explained in more detail in the <a
1079href="release-notes.html">release notes</a>.</dd>
1080
1081<dt>new-pre-tags: <em>tag1, tag2, tag3</em></dt>
1082
1083<dd>Use this to declare new tags that are to be processed in
1084exactly the same way as HTML's pre element. The option takes a
1085space or comma separated list of tag names. Unless you declare
1086new tags, Tidy will refuse to generate a tidied file if the input
1087includes previously unknown tags. Note you can't as yet add new
1088CDATA elements (similar to script).</dd>
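
<dd>To make the four options above concrete, here is a small
sketch. The tag names pagebreak, term, sidebar and codeblock are
invented for the example. Note that the empty tag is declared
twice, once as empty and once as inline:</dd>

<dd>
<pre>
// hypothetical tag names, for illustration only
new-empty-tags: pagebreak
new-inline-tags: pagebreak, term
new-blocklevel-tags: sidebar
new-pre-tags: codeblock
</pre>
</dd>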
1089</dl>
1090
1091<h4>Sample Config File</h4>
1092
1093<p>This is just an example to get you started.</p>
1094
<pre>
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>
1119
1120<h3><a id="scripts" name="scripts">Using Tidy from
1121scripts</a></h3>
1122
<p>If you want to run Tidy from Perl or another scripting
language, you may find it useful to inspect the result returned
by Tidy when it exits: 0 if everything is fine, 1 if there were
warnings and 2 if there were errors. This is an example using
Perl:</p>
1128
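<pre>
# Assumed setup, not part of the original example: $file names the
# document to check, and -e asks Tidy to report errors only.
open(TIDY, "tidy -e $file |") or die "can't run tidy: $!\n";
my @output = &lt;TIDY&gt;;  # drain the pipe before closing it

# close() returns false if Tidy exited with a non-zero status
if (close(TIDY) == 0) {
  my $exitcode = $? &gt;&gt; 8;  # the exit status is in the high byte of $?
  if ($exitcode == 1) {
    print STDERR "tidy issued warning messages\n";
  } elsif ($exitcode == 2) {
    print STDERR "tidy issued error messages\n";
  } else {
    die "tidy exited with code: $exitcode\n";
  }
} else {
  print STDERR "tidy detected no errors\n";
}
</pre>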
1143
1144<h3><a id="download" name="download">Downloadable
1145Binaries</a></h3>
1146
1147<p class="note">If you are prepared to maintain a public URL for
1148HTML Tidy compiled for a specific platform, please let me know so
1149that I can add a link to your page. This will avoid the need for
1150me to update this page whenever you recompile.</p>
1151
1152<div class="platforms">
1153<h4>Windows 95/98/NT/2000</h4>
1154
1155<p><b><a
1156href="http://www.w3.org/People/Raggett/tidy.exe">tidy.exe</a></b>.
1157Windows 95/98/NT/2000 executable (32-bit Windows console-mode
1158program). This is the executable that I maintain as part of the
1159HTML Tidy distribution. The command line parameters are described
1160above, along with the extensive configuration file options.</p>
1161
1162<p><b><a
1163href="http://www.chami.com/free/html-kit/">HTML-Kit</a></b> - a
1164free HTML editor for Windows 95/98/NT/2000 with integrated
1165support for Tidy.</p>
1166
1167<p><b><a
1168href="http://perso.wanadoo.fr/ablavier/TidyGUI/">TidyGUI</a></b>.
1169Windows front end for running Tidy, written by Andr&#233;
1170Blavier. Andr&#233; has also written a <b><a
1171href="http://perso.wanadoo.fr/ablavier/TidyCOM/">Windows COM
1172wrapper</a></b> for Tidy. He describes how to use this from
1173Visual Basic.</p>
1174
1175<p><b><a href="http://www.evrsoft.com/">Evrsoft's 1st Page
11762000</a></b> - a free HTML editor for Windows 95/98/NT/2000 with
1177integrated support for Tidy. 1st Page 2000 is a high-end
1178authoring tool that makes it easy to add effects based upon
1179scripting.</p>
1180
1181<p><b><a href="http://www.notetab.com/">NoteTab</a></b> - an
1182award winning text and html editor for Windows with built-in
1183support for running HTML Tidy. NoteTab is written by Eric
1184Fookes.</p>
1185
1186<h4>Mac OS</h4>
1187
<p>Several versions of <a
href="http://www.geocities.com/SiliconValley/1057/tidy.html">HTML
Tidy for Mac OS</a> are available, including a standalone
Macintosh application with a graphical user interface, a BBEdit
plugin, an MPW tool, and a FilterTop filter (<a
href="http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">Screenshot</a>).
My thanks to <a
href="mailto:teague@mailandnews.com">Terry Teague</a> for this
port.</p>
1199
1200<h4>Atari</h4>
1201
1202<p>Arnaud Bercegeay's site for the <a
1203href="http://tidy.atari.org">Atari binary for Tidy</a>.</p>
1204
1205<h4>Amiga</h4>
1206
1207<p>Keith Blakemore-Noble maintains a page for <a
1208href="http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Tidy
1209on Amiga</a>.</p>
1210
1211<h4>BeOS</h4>
1212
1213<p>Peter Enzerink is maintaining <a
1214href="http://www.bytepeople.com/beos/apps/htmltidy.html">HTML
1215Tidy</a> for BeOS. Link points to download for HTML Tidy as well
1216as HTML Tidy editor addons for BeOS.</p>
1217
1218<h4>AIX</h4>
1219
1220<p>Ciaran Deignan maintains an <a
1221href="http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
1222binary for Tidy</a>. The link is to a general download page. The
1223executable is available for AIX 4.3.2 and later.</p>
1224
1225<h4>Linux</h4>
1226
<p>Dimitri Papadopoulos maintains a <a
href="http://perso.club-internet.fr/dpo/rpm/">Tidy RPM package
for Red Hat Linux</a>. You may also be able to find Tidy on other
Linux distribution sites, e.g. <a
href="http://rpmfind.net/">http://rpmfind.net/</a>.</p>
1232
1233<!-- no longer accessible :-( 
1234      <p><b><a href= 
1235      "http://www.astro.uni-bonn.de/~webstw/cm/w3c_tidy/index.html">
1236      Linux users</a></b>! ochen M. Braun is maintaining Tidy binary
1237      for Linux (ELF 32-bit LSB executable using '<tt>libc.so.5</tt>'
1238      for Intel&#160;80386): '<a href= 
1239      "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy"><tt>tidy</tt></a>
1240      '. Additionally a man page can be downloaded: <a href= 
1241      "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy.1"><tt>
1242      tidy.1</tt></a>.</p>
1243       -->
1244<h4>UnixWare</h4>
1245
1246<p>Simon Trimmer &lt;<a
1247href="mailto:simon@ocston.org">simon@ocston.org</a>&gt; maintains
1248a <a href="http://www.ocston.org/~simon/tidy/">Tidy binary for
1249Unixware</a>.</p>
1250
1251<h4>HP-UX</h4>
1252
<p>You can get precompiled versions of Tidy for HP-UX from <a
href="http://www.informatik.uni-stuttgart.de/ifi/gr/mitarbeiter/hopp/tidy/tidy.html">
Olaf Hopp</a> and from <a
href="http://geocities.com/ian_springer/hpux_tidy.html">Ian
Springer</a>.</p>
1258
1259<h4>MSDOS</h4>
1260
1261<p>Nick B. maintains <a
1262href="http://members.xoom.com/nickbeee/tidy386/">Tidy386 for
1263DOS</a>. This exploits the DPMI mechanism for the memory
1264management.</p>
1265
1266<h4>Solaris</h4>
1267
1268<p>Stephen Fuqua maintains a page for <a
1269href="http://www.hep.utexas.edu/~sfuqua/unix">Tidy on
1270Solaris</a>.</p>
1271
1272<h4>OS/2</h4>
1273
1274<p>Kaz SHiMZ &lt;<a
1275href="mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>&gt; maintains
1276an <a
1277href="http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/index.html">OS/2
1278binary for Tidy</a>.</p>
1279
1280<h4>FreeBSD</h4>
1281
1282<p>Martin Fouts maintains <a
1283href="http://www.fogey.com/fouts/tidy.htm">Tidy on
1284FreeBSD</a>.</p>
1285
1286<h4>RISC OS</h4>
1287
1288<p><a href="mailto:archifishal@altavista.net">Alex Macfarlane
1289Smith</a> maintains a <a
1290href="http://www.toth.org.uk/~aardvark/programs/tidy.shtml">port
1291of Tidy to the RISC OS</a>.</p>
1292
1293<h4>MiNT (Atari) OS</h4>
1294
<p><a href="mailto:eaiching@t0.or.at">Edgar Aichinger</a>
1296maintains a <a
1297href="http://wh58-508.st.uni-magdeburg.de/sparemint/html/packages/tidy.html">
1298port of Tidy to the MiNT OS</a>. MiNT is a UNIX for m68k Atari
1299computers and is nearly FHS compliant (we don't use bootable OS
1300images nor have any mounting capabilities, so neither /boot nor
1301/mnt are used). The binary also runs on ordinary TOS, since the
1302MiNT libraries cover all GEMDOS/GEM functions.</p>
1303</div>
1304
1305<h3><a id="quotes" name="quotes">Integrating Tidy as part of
1306other Software</a></h3>
1307
1308<p>You can also incorporate Tidy as part of a larger program, for
1309instance in HTML editors or HTML transformation tools used for
1310import filters, or for when you want to customize Web content to
1311get the best out of different kinds of browsers. Imagine
1312authoring clean HTML with CSS and at a touch of a button
1313producing variants that look great and work reliably on a large
1314variety of different browsers, taking into account the quirks of
1315each. For instance, providing the ability to tune content for
1316different versions of Netscape and Internet Explorer, and for
1317browsers running on set-top boxes for televisions, handheld and
1318palmtop devices, cell phones, and voice browsers. I am happy to
1319quote for software development for such tools.</p>
1320
<p>Sebastian Lange has contributed a Perl wrapper for calling
Tidy from your Perl scripts, see <a
href="sl-tidy.pl">sl-tidy.pl</a>.</p>
1324
<h4>Using Tidy from Emacs</h4>

<p>Pete Gelbman emailed this <a
href="http://lists.w3.org/Archives/Public/html-tidy/2000AprJun/0047.html">
tip</a> for using Tidy with the Unix version of Emacs. It lets you
highlight a region of text and run Tidy on it. Tidy's "fixed"
output will replace your highlighted region right in place. The
error and warning output will be directed into a separate
mini-buffer below in your main screen.</p>
1334
1335<h3><a id="java" name="java">Java port of HTML Tidy</a></h3>
1336
1337<p>Andy Quick &lt;<a
1338href="mailto:ac.quick@sympatico.ca">ac.quick@sympatico.ca</a>&gt;
1339maintains a Java port of Tidy, so you can now integrate Tidy into
1340your Java applications. Andy is tracking the releases of Tidy in
1341C (this page). More information is available on <a
1342href="http://www3.sympatico.ca/ac.quick/">Andy's home
1343page</a>.</p>
1344
1345<h3><a id="implementation" name="implementation">Source
1346Code</a></h3>
1347
<p>The code is in ANSI C and uses the C standard library for I/O.
The parser works top down, building a complete parse tree in
memory. Document text is held as Unicode represented as UTF-8 in
a character buffer that expands as needed. The code has so far
been tested on Windows 95, Windows 98, Windows NT, Windows 2000,
Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXtStep,
MacOS, BeOS, OS/2, AIX, Amiga, Atari, SunOS, Solaris and
HP-UX, amongst others.</p>
1356
1357<p>Here is a link to the Open Source <a href="tidy.c">copyright
1358notice and license</a>.</p>
1359
1360<dl>
1361<dt><a href="/tidy4aug00.tgz">tidy4aug00.tgz</a></dt>
1362
1363<dd>gzipped tar file for source code (Unix line ends)</dd>
1364
1365<dt><a href="/tidy4aug00.zip">tidy4aug00.zip</a></dt>
1366
1367<dd>zipped source code (Windows line ends)</dd>
1368
1369<dt><a href="platform.h">platform.h</a>, <a
1370href="html.h">html.h</a></dt>
1371
1372<dd>the include files with common definitions</dd>
1373
1374<dt><a href="config.c">config.c</a></dt>
1375
1376<dd>support for customizing Tidy via config files</dd>
1377
1378<dt><a href="lexer.c">lexer.c</a></dt>
1379
1380<dd>lexical analysis and buffer management</dd>
1381
1382<dt><a href="parser.c">parser.c</a></dt>
1383
1384<dd>HTML and XML parsers</dd>
1385
1386<dt><a href="tags.c">tags.c</a></dt>
1387
1388<dd>dictionary of tags and their properties</dd>
1389
1390<dt><a href="attrs.c">attrs.c</a></dt>
1391
1392<dd>dictionary of attributes and their properties</dd>
1393
1394<dt><a href="istack.c">istack.c</a></dt>
1395
1396<dd>stack of active inline elements</dd>
1397
1398<dt><a href="entities.c">entities.c</a></dt>
1399
1400<dd>dictionary of entities</dd>
1401
1402<dt><a href="clean.c">clean.c</a></dt>
1403
1404<dd>smarts for cleaning up presentational markup</dd>
1405
1406<dt><a href="pprint.c">pprint.c</a></dt>
1407
1408<dd>pretty printing for HTML and XML</dd>
1409
1410<dt><a href="localize.c">localize.c</a></dt>
1411
1412<dd>Change this file to localize tidy's messages</dd>
1413
1414<dt><a href="tidy.c">tidy.c</a></dt>
1415
1416<dd>main() and error reporting routines</dd>
1417
1418<dt><a href="Makefile">Makefile</a></dt>
1419
1420<dd>Makefile for gcc</dd>
1421
1422<dt><a href="man_page.txt">Unix Man page</a></dt>
1423
1424<dd>Maintained by Matej Vela &lt;vela@debian.org&gt;</dd>
1425</dl>
1426
1427<p>Conventions for whether lines end with CRLF, LF or CR vary
1428from one system to another. I have included the C source for a
1429utility <b>tab2space</b> which can be used to ensure that files
1430use the line end convention of your choice, and to expand tabs to
1431spaces.</p>
1432
<pre>
   tab2space -t4 -unix *.h *.c
   tab2space -tabs -unix Makefile
</pre>
1437
1438<p>Note use of "-tabs" to ensure that tabs are preserved in the
1439Makefile (it won't work without them!).</p>
1440
1441<p>For those of you on Unix, here is a script you can use to
1442strip carriage returns:</p>
1443
<pre>
#!/bin/sh
echo Stripping Carriage Returns from files...
for i
do
        # If a writable file
        if [ -f "$i" ]
        then
                if [ -w "$i" ]
                then
                        echo "$i"
                        # strip CRs from input and output to temp file
                        tr -d '\015' &lt; "$i" &gt; toix.tmp
                        mv toix.tmp "$i"
                else
                        echo "$i: write-protected"
                fi
        else
                echo "$i: not a file"
        fi
done
</pre>
1466
<p>Save this script to a file, e.g. "<em>stripcr</em>", and use
"<em>chmod +x stripcr</em>" to make it executable. You can then
run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>
1470
1471<h2><a id="acks" name="acks">Acknowledgements</a></h2>
1472
1473<p>I would like to thank the many people who have written to me
1474with suggestions for improvements or reporting bugs. Your help
1475has been invaluable.</p>
1476
1477<blockquote class="people">Jonathan Adair, Drew Adams, Osma
1478Ahvenlampi, Carsten Allefeld, Richard Allsebrook, Jacob Sparre
1479Andersen, Joe D'Andrea, Jerry Andrews, Bruce Aron, Takuya Asada,
Edward Avis, Carlos Piqueres Ayela, Nick B, Chang Hyun Baek,
Denis Barbier, Chuck Baslock, Christer Bernerus, David J.
1482Biesack, John Bigby, Yu Jian Bin, Alexander Biron, Keith
1483Blakemore-Noble, Eric Blossom, Berend de Boer, Ochen M. Braun,
1484Dave Bryan, David Brooke, Andy Brown, Keith B. Brown, Andreas
1485Buchholz, Maurice Buxton, Jelks Cabaniss, John Cappelletti,
1486Trevor Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Rob
1487Clark, Jeremy Clulow, Dan Connolly, Larry Cousin, Ken Cox, Luis
1488M. Cruz, John Cumming, Ian Davey, Keith Davies, Ciaran Deignan,
1489David Duffy, Emma Duke-Williams, Tamminen Eero, Bodo Eing, Peter
1490Enzerink, Baruch Even, David Fallon, Claus Andr&#233;
1491F&#228;rber, Stephanie Foott, Darren Forcier, Martin Fouts,
1492Frederik Fouvry, Rene Fritz, Stephen Fuqua, Martin Gallwey, Pete
1493Gelbman, Francisco Guardiola, David Getchell, Michael Giroux,
1494Davor Golek, Guus Goos, L&#233;a Gris, Rainer Gutsche, Kai
1495Hackemesser, Juha H&#228;iki&#246;, David Halliday,
1496Johann-Christian Hanke, Vlad Harchev, Shane Harrelson, Andre
1497Hinrichs, Bjoern Hoehrmann, G. Ken Holman, Bill Homer, Olaf Hopp,
1498Craig Horman, Jack Horsfield, Nigel Horspool, Pao-Hsi Huang,
1499Stuart Hungerford, Marc Jauvin, Rick Jelliffe, Peter Jeremy,
1500Craig Johnson, Charles LaFountain, Steven Lobo, Zdenek Kabelac,
1501Michael Kay, Jeffery Kendall, Axel Kielhorn, Konstantinos
1502Kleisouris, Johannes Koch, Daniel Kohn, Rudy Kohut, Allan
1503Kuchinsky, Volker Kuhlmann, Michael LaStella, Johnny Lee, Steve
1504Lee, Tony Leneis, Nick Leverton, Todd Lewis, Dietmar Lippold,
1505Gert-Jan C. Lokhorst, Murray Longmore, John Love-Jensen,
1506Satwinder Mangat, Carole Mah, Anton Marsden, Bede McCall, Shane
1507McCarron, Thomas McGuigan, Ian McKellar, Al Medeiros, Chris
1508Nappin, Ann Navarro, Jacek Niedziela, Morten Blinksbjerg Nielsen,
1509Kenichi Numata, Allan Odgaard, Matt Oshry, Gerald Oskoboiny, Paul
1510Ossenbruggen, Ernst Paalvast, Christian Pantel, Dimitri
1511Papadopoulos, Rick Parsons, Steven Pemberton, Daniel Persson, Lee
1512Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Jany
1513Quintard, Julian Reschke, Stephen Reynolds, Thomas Ribbrock, Ross
1514L. Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Peter
1515Ruevski, Christian Ruetgers, Klaus Johannes Rusch, John Russell,
1516Eric Schindler, J. Schlauch, Christian Sch&#252;ler, Klaus
1517Alexander Seistrup, Jim Seymour, Kazuyoshi Shimizu, Geoff
1518Sinclair, Jo Smith, Paul Smith, Steve Spilker, Rafi Stern,
1519Jacques Steyn, Michael J. Suzio, Zac Thompson, Eric Thorbjornsen,
1520Oren Tirosh, John Tobler, Omri Traub, Lo&#239;c Tr&#233;gan,
1521Jason Tribbeck, Simon Trimmer, Steffen Ullrich, Stuart Updegrave,
1522Charles A. Upsdell, Jussi Vestman, Larry W. Virden, Daniel
1523Vogelheim, Nigel Wadsworth, Jez Wain, Randy Waki, Paul Ward, Neil
1524Weber, Bertilo Wennergren, Yudong Yang, Jeff Young, Edward Zalta,
1525Johannes Zellner, Christian Zuckschwerdt</blockquote>
1526
1527<h3><a id="address" name="address">Dave's Address</a></h3>
1528
<pre>
    73b Ground Corner
    Holt
    Wiltshire
    BA14 6RT
    United Kingdom
</pre>
1536
<p><small><a href="http://www.w3.org/People/Raggett">Dave
Raggett</a> &lt;<a href="mailto:dsr@w3.org">dsr@w3.org</a>&gt; is
an engineer from <a href="http://www.hp.com/">Hewlett
Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
Laboratories</a>, and works on assignment to the World Wide Web
Consortium, where he is the W3C lead for HTML, XForms, Voice
Browsers and Math.</small></p>
1544</body>
1545</html>
1546
1547