<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Clean up your Web pages with HTML TIDY</title>
<meta name="keywords"
content="HTML, validation, error correction, pretty-printing" />
<meta name="author" content="Dave Raggett &lt;dsr@w3.org&gt;" />
<style type="text/css">
 body {
   margin-left: 10%;
   margin-right: 10%;
   font-family: sans-serif
 }
 h1 { margin-left: -8% }
 h2,h3 { margin-left: -4% }
 pre { color: green; font-weight: bold; font-family: monospace}
 em { font-style: italic; color: rgb(0, 0, 153) }
 strong { text-transform: uppercase; font-weight: bold }
 .note {font-style: italic; color: rgb(192, 101, 101) }
 /* hr {text-align: center; width: 60% } */
 blockquote {
   color: navy;
   font-family: "Comic Sans MS", "Times New Roman", serif
 }
 blockquote.people { text-align: center; }
 p.splash { color: maroon}
 div h4 {margin-left: 3%}
 div p {margin-left: 5%}
 table {
   font-family: sans-serif;
   font-size: 80%;
   background: rgb(255,255,153)
 }
 td {
   font-size: 80%
 }
 .people {font-family: "Lucida Calligraphy", serif}
 :link { color: rgb(0, 0, 153) }
 :visited { color: rgb(153, 0, 153) }
 :active { color: rgb(255, 0, 102) }
 a:hover { color: rgb(0, 0, 255) }
</style>

<style type="text/css">
 blockquote.c9 {font-style: italic}
 span.c8 {color: maroon}
 p.c7 {font-style: italic}
 a.c6 {font-weight: bold}
 div.c5 {text-align: center}
 hr.c4 {text-align: center}
 p.c3 {text-align: center}
 p.c2 {font-weight: bold; text-align: center}
 h1.c1 {text-align: center}
</style>

<style type="text/css">
 p.c1 {font-weight: bold}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
link="navy" vlink="black" alink="red">
<h1 class="c1"><img
src="tidy.gif" width="32" height="32"
align="top" alt="icon" /> Clean up your Web pages<br />
 with HTML TIDY</h1>

<p class="c2">This version 4th August 2000</p>

<p class="c3"><small>Copyright &copy; 1998-2000 <a
href="http://www.w3.org/">W3C</a>, see <a
href="tidy.c">tidy.c</a> for copyright notice.</small></p>

<blockquote>With many thanks to <a
href="http://www.hp.com/">Hewlett Packard</a> for financial
support during the development of this software!</blockquote>

<hr width="80%" class="c4" />
<p class="c3"><a href="#help">How to use Tidy</a> | <a
href="#download">Downloading Tidy</a> | <a
href="release-notes.html">Release Notes</a><br />
 <a href="#quotes">Integration with other Software</a> | <a
href="#acks">Acknowledgements</a></p>

<hr width="80%" class="c4" />
<p>To get the latest version of Tidy please visit the original
version of this page at: <a
href="http://www.w3.org/People/Raggett/tidy/">http://www.w3.org/People/Raggett/tidy/</a>.
Courtesy of Netmind, you can register for email reminders when
new versions of Tidy become available.</p>

<form method="get"
action="http://www.netmind.com/cgi-bin/uncgi/url-mind">
<div class="c5"><input type="submit"
value="Press Here to Register" /></div>
</form>

<p>The public email list devoted to HTML Tidy is: &lt;<a
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;. To
subscribe, send an email to html-tidy-request@w3.org with the word
"subscribe" in the subject line (use the word "unsubscribe" instead
if you want to unsubscribe). The <a
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
for this list is accessible online. Please use this list to
report errors or enhancement requests. See the <a
href="release-notes.html" class="c6">release notes</a> for
information on recent changes.
Your feedback is welcome!</p>

<p>If you find HTML Tidy useful and you would like to say thanks,
then please send me a (paper) postcard or other souvenir from the
area in which you live along with a few words on what you are
using Tidy for. It will be fun to map out where Tidy users are to
be found! My <a href="#address">postal address</a> is given at
the end of this file.</p>

<h3>Tutorials for HTML and CSS</h3>

<p>If you are just starting off and would like to know more about
how to author Web pages, you may find my <a
href="http://www.w3.org/MarkUp/Guide/">guide to HTML and CSS</a>
helpful. Please send me feedback on this, and I will do my best
to further improve it.</p>

<h4>Support for Word2000</h4>

<p>Tidy can now perform wonders on HTML saved from Microsoft Word
2000! Word bulks out HTML files with stuff for round-tripping
presentation between HTML and Word. If you are more concerned
about using HTML on the Web, check out Tidy's "<a
href="#word2000">word-2000</a>" config option! Of course Tidy
does a good job on Word'97 files as well!</p>

<h3>Introduction to TIDY</h3>

<p>When editing HTML it's easy to make mistakes. Wouldn't it be
nice if there were a simple way to fix these mistakes
automatically and tidy up sloppy editing into nicely laid out
markup? Well now there is! Dave Raggett's HTML TIDY is a free
utility for doing just that. It also works great on the
atrociously hard to read markup generated by specialized HTML
editors and conversion tools, and can help you identify where you
need to pay further attention to making your pages more
accessible to people with disabilities.</p>

<p>Tidy is able to fix up a wide range of problems and to bring
to your attention things that you need to work on yourself. Each
item found is listed with the line number and column so that you
can see where the problem lies in your markup.
Tidy won't
generate a cleaned up version when there are problems that it
can't be sure how to handle. These are logged as "errors"
rather than "warnings".</p>

<p class="c7">Tidy is featured in a <a
href="http://webreview.com/wr/pub/1999/07/16/feature/index.html">recent
article on XHTML</a> by webreview.com.</p>

<!-- is the final "index.html" needed or appropriate? -->
<h3>Examples of TIDY at work</h3>

<p>Tidy corrects the markup in a way that matches where possible
the observed rendering in popular browsers from Netscape and
Microsoft. Here are just a few examples of how TIDY perfects your
HTML for you:</p>

<ul>
<li><b>Missing or mismatched end tags are detected and
corrected</b>

<pre>
  &lt;h1&gt;heading
  &lt;h2&gt;subheading&lt;/h3&gt;
</pre>

<p>is mapped to</p>

<pre>
  &lt;h1&gt;heading&lt;/h1&gt;
  &lt;h2&gt;subheading&lt;/h2&gt;
</pre>
</li>

<li><b>End tags in the wrong order are corrected:</b>

<pre>
  &lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/b&gt; bold?&lt;/i&gt; normal?
</pre>

<p>is mapped to</p>

<pre>
  &lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/i&gt; bold?&lt;/b&gt; normal?
</pre>
</li>

<li><b>Fixes problems with heading emphasis</b>

<pre>
  &lt;h1&gt;&lt;i&gt;italic heading&lt;/h1&gt;
  &lt;p&gt;new paragraph
</pre>

<p>In Netscape and Internet Explorer this causes everything
following the heading to be in the heading font size, not the
desired effect at all!</p>

<p>Tidy maps the example to</p>

<pre>
  &lt;h1&gt;&lt;i&gt;italic heading&lt;/i&gt;&lt;/h1&gt;
  &lt;p&gt;new paragraph
</pre>
</li>

<li><b>Recovers from mixed up tags</b>

<pre>
  &lt;i&gt;&lt;h1&gt;heading&lt;/h1&gt;&lt;/i&gt;
  &lt;p&gt;new paragraph &lt;b&gt;bold text
  &lt;p&gt;some more bold text
</pre>

<p>Tidy maps this to</p>

<pre>
  &lt;h1&gt;&lt;i&gt;heading&lt;/i&gt;&lt;/h1&gt;
  &lt;p&gt;new paragraph &lt;b&gt;bold text&lt;/b&gt;
  &lt;p&gt;&lt;b&gt;some more bold text&lt;/b&gt;
</pre>
</li>

<li><b>Getting the &lt;hr&gt; in the right place:</b>

<pre>
  &lt;h1&gt;&lt;hr&gt;heading&lt;/h1&gt;
  &lt;h2&gt;sub&lt;hr&gt;heading&lt;/h2&gt;
</pre>

<p>Tidy maps this to</p>

<pre>
  &lt;hr&gt;
  &lt;h1&gt;heading&lt;/h1&gt;
  &lt;h2&gt;sub&lt;/h2&gt;
  &lt;hr&gt;
  &lt;h2&gt;heading&lt;/h2&gt;
</pre>
</li>

<li><b>Adding the missing "/" in end tags for anchors:</b>

<pre>
  &lt;a href="#refs"&gt;References&lt;a&gt;
</pre>

<p>Tidy maps this to</p>

<pre>
  &lt;a href="#refs"&gt;References&lt;/a&gt;
</pre>
</li>

<li><b>Perfecting lists by putting in missing tags:</b>

<pre>
  &lt;body&gt;
  &lt;li&gt;1st list item
  &lt;li&gt;2nd list item
</pre>

<p>is mapped to</p>

<pre>
  &lt;body&gt;
  &lt;ul&gt;
  &lt;li&gt;1st list item&lt;/li&gt;
  &lt;li&gt;2nd list item&lt;/li&gt;
  &lt;/ul&gt;
</pre>
</li>

<li><b>Missing quotes around attribute values are added</b>

<p>Tidy inserts quote marks around all attribute values for you.
It can also detect when you have forgotten the closing quote
mark, although this is something you will have to fix
yourself.</p>
</li>

<li><b>Unknown/Proprietary attributes are reported</b>

<p>Tidy has comprehensive knowledge of the attributes defined
in the HTML 4.0 recommendation from W3C. This often allows you to
spot where you have mistyped an attribute or value.</p>
</li>

<li><b>Proprietary elements are recognized and reported as
such.</b>

<p>Tidy will even work out which version of HTML you are using
and insert the appropriate DOCTYPE declaration, as per the W3C
recommendations.</p>
</li>

<li><b>Tags lacking a terminating '&gt;' are spotted</b>

<p>This is something you then have to fix yourself as Tidy is
unsure of where the &gt; should be inserted.</p>
</li>
</ul>

<h3>Layout style</h3>

<p>You can choose which style you want Tidy to use when it
generates the cleaned up markup: for instance whether you like
elements to indent their contents or not. Several people have
asked if Tidy could preserve the original layout. I am sorry to
say that this would be very hard to support due to the way Tidy
is implemented. Tidy starts by building a clean parse tree from
the source file. The parse tree doesn't contain any information
about the original layout. Tidy then pretty prints the parse tree
using the current layout options. Trying to preserve the original
layout would interact badly with the repair operations needed to
build a clean parse tree and considerably complicate the
code.</p>

<p>Some browsers can screw up the right alignment of text
depending on how you lay out headings. As an example,
consider:</p>

<pre>
&lt;h1 align="right"&gt;
  Heading
&lt;/h1&gt;

&lt;h1 align="right"&gt;Heading&lt;/h1&gt;
</pre>

<p>Both of these should be rendered the same.
Sadly a common
browser bug fails to trim trailing whitespace and misaligns the
first heading. HTML Tidy will protect you from this bug, except
when you set the indent option to "yes".</p>

<p>Setting the indent option to yes can also cause problems with
table layout for some browsers:</p>

<pre>
&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
</pre>

<p>will look slightly different from:</p>

<pre>
&lt;td&gt;
  &lt;img src="foo.gif"&gt;
&lt;/td&gt;
&lt;td&gt;
  &lt;img src="foo.gif"&gt;
&lt;/td&gt;
</pre>

<p>You can avoid such quirks by using <tt>indent: no</tt> or
<tt>indent: auto</tt> in the config file.</p>

<h3>Internationalization issues</h3>

<p>Tidy offers you a choice of character encodings: US ASCII, ISO
Latin-1, UTF-8 and the ISO 2022 family of 7-bit encodings. The
full set of HTML 4.0 entities is defined. Cleaned up output uses
HTML entity names for characters when appropriate. Otherwise
characters outside the normal range are output as numeric
character entities. Tidy defaults to assuming you want the output
to be in US ASCII. Tidy doesn't yet recognize the use of the HTML
meta element for specifying the character encoding.</p>

<h3>Accessibility</h3>

<p>Tidy offers advice on accessibility problems for people using
non-graphical browsers. The most common thing you will see is the
suggestion that you add a summary attribute to table elements. The
idea is to provide a summary of the table's role and structure
suitable for use with aural browsers.</p>

<h3>Cleaning up presentational markup</h3>

<p>Many tools generate HTML with an excess of FONT, NOBR and
CENTER tags. Tidy's <em>-clean</em> option will replace them by
style properties and rules using CSS. This makes the markup
easier to read and maintain as well as reducing the file size!
Tidy is expected to get smarter at this in the future.</p>

<p>Some pages rely on the presentation effects of isolated
&lt;p&gt; or &lt;/p&gt; tags. Tidy deletes empty paragraph and
heading elements etc. The use of empty paragraph elements is not
recommended for adding vertical whitespace. Instead use style
sheets, or the &lt;br&gt; element. Tidy won't discard paragraphs
that contain only a nonbreaking space (&amp;nbsp;).</p>

<h3>Teaching Tidy about new tags!</h3>

<p>You can teach Tidy about new tags by declaring them in the
configuration file; the syntax is:</p>

<pre>
  new-inline-tags: <em>tag1, tag2, tag3</em>
  new-empty-tags: <em>tag1, tag2, tag3</em>
  new-blocklevel-tags: <em>tag1, tag2, tag3</em>
  new-pre-tags: <em>tag1, tag2, tag3</em>
</pre>

<p>The same tag can be defined as empty and as inline or as empty
and as block.</p>

<p>These declarations can be combined to define a new empty
inline or empty block element, but you are not advised to declare
tags as being both inline and block!</p>

<p>Note that the new tags can only appear where Tidy expects
inline or block-level tags respectively. This means you can't
(yet) place new tags within the document head or other contexts
with restricted content models. So far the most popular use of
this feature is to allow Tidy to be applied to Cold Fusion
files.</p>

<p class="c7">I am working on ways to make it easy to customize
the permitted document syntax using <a
href="http://www.w3.org/People/Raggett/dtdgen/Docs/">assertion
grammars</a>, and hope to apply this to a much smarter version of
Tidy for release later this year or early next year.</p>

<h3>Limited support for ASP, JSTE and PHP</h3>

<p>Tidy is somewhat aware of the preprocessing language called
ASP, which uses a pseudo element syntax &lt;% ... %&gt;
to include preprocessor directives.
ASP is normally interpreted
by the web server before delivery to the browser. JSTE shares the
same syntax, but sometimes also uses &lt;# ... #&gt;.
Tidy can also cope with another such language called PHP, which
uses the syntax &lt;?php ... ?&gt;.</p>

<p>Tidy will cope with ASP, JSTE and PHP pseudo elements within
element content and as replacements for attributes, for
example:</p>

<pre>
  &lt;option &lt;% if rsSchool.Fields("ID").Value
    = session("sessSchoolID")
    then Response.Write("selected") %&gt;
    value='&lt;%=rsSchool.Fields("ID").Value%&gt;'&gt;
    &lt;%=rsSchool.Fields("Name").Value%&gt;
    (&lt;%=rsSchool.Fields("ID").Value%&gt;)
  &lt;/option&gt;
</pre>

<p>Note that Tidy doesn't understand the scripting language used
within pseudo elements and attributes, and can easily get
confused. Tidy may report missing attributes when these are
hidden within preprocessor code. Tidy can also get things wrong
if the code includes quote marks, e.g. if the example above is
changed to:</p>

<pre>
  value="&lt;%=rsSchool.Fields("ID").Value%&gt;"
</pre>

<p>Tidy will now see the quote mark preceding ID as ending the
attribute value, and proceed to complain about what follows. Note
that you can choose whether or not to allow line wrapping on spaces
within pseudo elements using the <tt>wrap-asp</tt> option. If you
used ASP, JSTE or PHP to create a start tag, but placed the end
tag explicitly in the markup, Tidy won't be able to match them
up, and will delete the end tag for you.
So in this case you are
advised to make the start tag explicit and to use ASP, JSTE or PHP
for just the attributes, e.g.</p>

<pre>
  &lt;a href="&lt;%=random.site()%&gt;"&gt;do you feel lucky?&lt;/a&gt;
</pre>

<p>Tidy allows you to control whether line wrapping is enabled
for ASP, JSTE and PHP instructions; see the <tt>wrap-asp</tt>,
<tt>wrap-jste</tt> and <tt>wrap-php</tt> config options,
respectively.</p>

<p>I regret that Tidy does <b>not</b> support Tango preprocessing
instructions which look like:</p>

<pre>
&lt;@if variable_1='a'&gt;
  do something
&lt;@else&gt;
  do nothing
&lt;/@if&gt;

&lt;@include &lt;@cgi&gt;&lt;@appfilepath&gt;includes/message.html&gt;
</pre>

<p>Tidy does accept Tango syntax, but only within attribute
values. Adding support for pseudo elements written in Tango looks
as if it would be quite tough, so I would like to gauge the level
of interest before committing to this work.</p>

<h3>Limited support for XML</h3>

<p>XML processors compliant with W3C's XML 1.0 recommendation are
very picky about which files they will accept. Tidy can help you
to fix errors that cause your XML files to be rejected. Tidy
doesn't yet recognize all XML features though, e.g. it doesn't
understand CDATA sections or DTD subsets.</p>

<h3>Creating Slides</h3>

<p>The <em>-slides</em> option allows you to burst a single HTML
file into a number of linked slides. Each H2 element in the input
file is treated as delimiting the start of the next slide. The
slides are named slide1.html, slide2.html, slide3.html etc. This
is a relatively new feature and ideas on how to improve it are
welcome.
In particular, I plan to add support to the
configuration file for setting the style sheet for slides and for
customizing the slides via a template.</p>

<p>I would be interested in hearing from anyone who can offer
help with using JavaScript for adding dynamic effects to slides,
for instance similar to those available in Microsoft
PowerPoint.</p>

<h3>Indenting text for a better layout</h3>

<p>Indenting the content of elements makes the markup easier to
read. Tidy can do this for all elements or just for those where
it's needed. The auto-indent mode has been used below to avoid
indenting the content of title, p and li elements:</p>

<pre>
&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Test document&lt;/title&gt;
  &lt;/head&gt;

  &lt;body&gt;
    &lt;p&gt;para which has enough text to cause a line break,
    and so test the wrapping mechanism for long lines.&lt;/p&gt;
&lt;pre&gt;
This is
&lt;em&gt;genuine
       preformatted&lt;/em&gt;
   text
&lt;/pre&gt;

    &lt;ul&gt;
      &lt;li&gt;1st list item&lt;/li&gt;

      &lt;li&gt;2nd list item&lt;/li&gt;
    &lt;/ul&gt;
    &lt;!-- end comment --&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>

<p>Indenting the content does increase the size of the file, so
you may prefer Tidy's default style:</p>

<pre>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Test document&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;para which has enough text to cause a line break,
and so test the wrapping mechanism for long lines.&lt;/p&gt;

&lt;pre&gt;This is
&lt;em&gt;genuine
       preformatted&lt;/em&gt;
   text
&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;1st list item&lt;/li&gt;

&lt;li&gt;2nd list item&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- end comment --&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>

<h3><a id="help" name="help">How to run tidy</a></h3>

<pre>
  <span class="c8">tidy</span> <em>[[options] filename]*</em>
</pre>

<p>HTML Tidy is not (yet) a Windows program. If you run tidy
without any arguments, it will just sit there waiting to read
markup on the stdin stream.
Tidy's input and output default to
stdin and stdout respectively. Errors are written to stderr but
can be redirected to a file with the -f <em>filename</em>
option.</p>

<p>I generally use the -m option to get tidy to update the
original file, and if the file is particularly bad I also use the
-f option to write the errors to a file to make it easier to
review them. Tidy supports a small set of character encoding
options. The default is ASCII, which makes it easy to edit markup
in regular text editors.</p>

<p>For instance:</p>

<pre>
  tidy -f errs.txt -m index.html
</pre>

<p>which runs tidy on the file "index.html", updating it in place
and writing the error messages to the file "errs.txt". It's a good
idea to save a copy of your work before tidying it since, as with
all complex software, tidy may have bugs. If you find any please
let me know!</p>

<p>Thanks to Jacek Niedziela, the Win32 executable for tidy is
now able to expand wildcards in filenames. This utilizes the
setargv library supplied with VC++.</p>

<p>Tidy writes errors to stderr, so piping its output through the
more command won't page the error messages. A workaround is to
redirect stderr to stdout as follows. This works on Unix and
Windows NT, but not on other platforms. My thanks to Markus Wolf
for this tip!</p>

<pre>
  tidy file.html 2&gt;&amp;1 | more
</pre>

<h4>Tidy's Options</h4>

<p>To get a list of available options use:</p>

<pre>
  tidy -help
</pre>

<p>You may want to run it through more to view the help a page at
a time.</p>

<pre>
  tidy -help | more
</pre>

<p>Input and Output default to stdin/stdout respectively.
Single-letter
options other than -f may be combined, as in: <tt>tidy -f
errs.txt -imu foo.html</tt></p>

<p>Matej Vela &lt;<a
href="mailto:vela@debian.org">vela@debian.org</a>&gt; has written
a <a href="man_page.txt">Unix man page for Tidy</a>, but for the
latest details on config options and for the release notes please
visit this page: <a
href="http://www.w3.org/People/Raggett/tidy">http://www.w3.org/People/Raggett/tidy</a>.</p>

<h3><a id="config" name="config">Using a Configuration
File</a></h3>

<p>Tidy now supports a configuration file, and this is by far
the most convenient way to configure Tidy. Assuming you have
created a config file named "config.txt" (the name doesn't
matter), you can instruct Tidy to use it via the command line
option <tt>-config config.txt</tt>, e.g.</p>

<pre>
  tidy -config config.txt file1.html file2.html
</pre>

<p>Alternatively, you can name the default config file via the
environment variable named "HTML_TIDY". Note that this should be
an absolute path since you are likely to want to run Tidy in
different directories. You can also set a config file at compile
time by defining CONFIG_FILE as the path string; see
platform.h.</p>

<p>You can now set config options on the command line by
preceding the name of the option immediately (no intervening
space) by "--", for example:</p>

<pre>
  tidy --break-before-br true --show-warnings false
</pre>

<p>The following options are supported:</p>

<dl>
<dt>tidy-mark: <em>bool</em></dt>

<dd>If set to <em>yes</em> (the default) Tidy will add a meta
element to the document head to indicate that the document has
been tidied. To suppress this, set tidy-mark to <em>no</em>. Tidy
won't add a meta element if one is already present.</dd>

<dt>markup: <em>bool</em></dt>

<dd>Determines whether Tidy generates a pretty printed version of
the markup.
Bool values are either <em>yes</em> or <em>no</em>.
Note that Tidy won't generate a pretty printed version if it
finds unknown tags, or missing trailing quotes on attribute
values, or missing trailing '&gt;' on tags. The default is
<em>yes</em>.</dd>

<dt>wrap: <em>number</em></dt>

<dd>Sets the right margin for line wrapping. Tidy tries to wrap
lines so that they do not exceed this length. The default is 66.
Set wrap to zero if you want to disable line wrapping.</dd>

<dt>wrap-attributes: <em>bool</em></dt>

<dd>If set to <em>yes</em>, attribute values may be wrapped
across lines for easier editing. The default is <em>no</em>. This
option can be set independently of wrap-script-literals.</dd>

<dt>wrap-script-literals: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this allows lines to be wrapped
within string literals that appear in script attributes. The
default is <em>no</em>. The following example shows how Tidy wraps
a really, really long script string literal, inserting a backslash
character before the linebreak:

<pre>
&lt;a href="somewhere.html" onmouseover="document.status = '...some \
really, really, really, really, really, really, really, really, \
really, really long string..';"&gt;test&lt;/a&gt;
</pre>
</dd>

<dt>wrap-asp: <em>bool</em></dt>

<dd>If set to <em>no</em>, this prevents lines from being wrapped
within ASP pseudo elements, which look like:
&lt;% ... %&gt;. The default is <em>yes</em>.</dd>

<dt>wrap-jste: <em>bool</em></dt>

<dd>If set to <em>no</em>, this prevents lines from being wrapped
within JSTE pseudo elements, which look like:
&lt;# ... #&gt;. The default is <em>yes</em>.</dd>

<dt>wrap-php: <em>bool</em></dt>

<dd>If set to <em>no</em>, this prevents lines from being wrapped
within PHP pseudo elements.
The default is <em>yes</em>.</dd>

<dt>literal-attributes: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this ensures that whitespace
characters within attribute values are passed through unchanged.
The default is <em>no</em>.</dd>

<dt>tab-size: <em>number</em></dt>

<dd>Sets the number of columns between successive tab stops. The
default is 4. It is used to map tabs to spaces when reading
files. Tidy never outputs files with tabs.</dd>

<dt>indent: <em>no, yes</em> or <em>auto</em></dt>

<dd>If set to <em>yes</em>, Tidy will indent block-level tags.
The default is <em>no</em>. If set to <em>auto</em>, Tidy will
decide whether or not to indent the content of tags such as
title, h1-h6, li, td, th, or p depending on whether or not the
content includes a block-level element. You are advised to avoid
setting indent to yes as this can expose layout bugs in some
browsers.</dd>

<dt>indent-spaces: <em>number</em></dt>

<dd>Sets the number of spaces to indent content when indentation
is enabled. The default is 2 spaces.</dd>

<dt>indent-attributes: <em>bool</em></dt>

<dd>If set to <em>yes</em>, each attribute will begin on a new
line. The default is <em>no</em>.</dd>

<dt>hide-endtags: <em>bool</em></dt>

<dd>If set to <em>yes</em>, optional end-tags will be omitted
when generating the pretty printed markup. This option is ignored
if you are outputting to XML. The default is <em>no</em>.</dd>

<dt>input-xml: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will use the XML parser rather
than the error correcting HTML parser. The default is
<em>no</em>.</dd>

<dt>output-xml: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will generate the pretty
printed output, writing it as well-formed XML. Any entities not
defined in XML 1.0 will be written as numeric entities to allow
them to be parsed by an XML parser.
The tags and attributes will
be in the case used in the input document, regardless of other
options. The default is <em>no</em>.</dd>

<dt>add-xml-pi: <em>bool</em></dt>

<dt>add-xml-decl: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will add the XML declaration
when outputting XML or XHTML. The default is <em>no</em>. Note
that if the input document includes an &lt;?xml?&gt; declaration
then it will appear in the output independent of the value of
this option.</dd>

<dt>output-xhtml: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will generate the pretty printed
output, writing it as Extensible HTML (XHTML). The default is
<em>no</em>. This option causes Tidy to set the doctype and
default namespace as appropriate to XHTML. If a doctype or
namespace is given, they will be checked for consistency with the
content of the document. In the case of an inconsistency, the
corrected values will appear in the output. For XHTML, entities
can be written as named or numeric entities according to the
value of the "numeric-entities" property. The tags and attributes
will be output in the case used in the input document, regardless
of other options.</dd>

<dt>doctype: <em>omit, auto, strict, loose</em> or
&lt;<em>fpi</em>&gt;</dt>

<dd>This property controls the doctype declaration generated by
Tidy. If set to <em>omit</em> the output file won't contain a
doctype declaration. If set to <em>auto</em> (the default) Tidy
will use an educated guess based upon the contents of the
document. If set to <em>strict</em>, Tidy will set the doctype to
the strict DTD. If set to <em>loose</em>, the doctype is set to
the loose (transitional) DTD.
Alternatively, you can supply a
string for the formal public identifier (fpi), for example:</dd>

<dd>
<pre>
  doctype: "-//ACME//DTD HTML 3.14159//EN"
</pre>
</dd>

<dd>If you specify the fpi for an XHTML document, Tidy will set
the system identifier to the empty string. Tidy leaves the
document type for generic XML documents unchanged.</dd>

<dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or
<em>iso2022</em></dt>

<dd>Determines how Tidy interprets character streams. For
<em>ascii</em>, Tidy will accept Latin-1 character values, but
will use entities for all characters with values above 127. For
<em>raw</em>, Tidy will output values above 127 without
translating them into entities. For <em>latin1</em>, characters
above 255 will be written as entities. For <em>utf8</em>, Tidy
assumes that both input and output are encoded as UTF-8. You can
use <em>iso2022</em> for files encoded using the ISO 2022 family
of encodings, e.g. ISO-2022-JP. The default is
<em>ascii</em>.</dd>

<dt>numeric-entities: <em>bool</em></dt>

<dd>Causes entities other than the basic XML 1.0 named entities
to be written in the numeric rather than the named entity form.
The default is <em>no</em>.</dd>

<dt>quote-marks: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes " characters to be
written out as &amp;quot;, as is preferred by some editing
environments. The apostrophe character ' is written out as
&amp;#39; since many web browsers don't yet support &amp;apos;.
The default is <em>no</em>.</dd>

<dt>quote-nbsp: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes non-breaking space
characters to be written out as entities, rather than as the
Unicode character value 160 (decimal). The default is
<em>yes</em>.</dd>

<dt>quote-ampersand: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes unadorned &amp;
characters to be written out as &amp;amp;.
The default is
<em>yes</em>.</dd>

<dt>assume-xml-procins: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this changes the parsing of
processing instructions to require ?&gt; as the terminator rather
than &gt;. The default is <em>no</em>. This option is
automatically set if the input is in XML.</dd>

<dt>fix-backslash: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes backslash characters "\"
in URLs to be replaced by forward slashes "/". The default is
<em>yes</em>.</dd>

<dt>break-before-br: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will output a line break before
each &lt;br&gt; element. The default is <em>no</em>.</dd>

<dt>uppercase-tags: <em>bool</em></dt>

<dd>Causes tag names to be output in upper case. The default is
<em>no</em> resulting in lowercase, except for XML input where
the original case is preserved.</dd>

<dt>uppercase-attributes: <em>bool</em></dt>

<dd>If set to <em>yes</em> attribute names are output in upper
case. The default is <em>no</em> resulting in lowercase, except
for XML where the original case is preserved.</dd>

<dt><a id="word2000" name="word2000">word-2000:
<em>bool</em></a></dt>

<dd>If set to <em>yes</em>, Tidy will go to great pains to strip
out all the surplus stuff Microsoft Word 2000 inserts when you
save Word documents as "Web pages". The default is <em>no</em>.
Note that Tidy doesn't yet know what to do with VML markup from
Word, but in future I hope to be able to map VML to SVG.<br />
<br />
 Microsoft has developed its own optional filter for exporting to
HTML, and the 2.0 version is much improved.
You can download the
filter free from the <a
href="http://officeupdate.microsoft.com/2000/downloadDetails/Msohtmf2.htm">
Microsoft Office Update site</a>.</dd>

<dt>clean: <em>bool</em></dt>

<dd>If set to <em>yes</em>, causes Tidy to strip out surplus
presentational tags and attributes, replacing them with style rules
and structural markup as appropriate. It works well on the HTML
saved by Microsoft Office '97. The default is <em>no</em>.</dd>

<dt>logical-emphasis: <em>bool</em></dt>

<dd>If set to <em>yes</em>, causes Tidy to replace any occurrence
of i with em and any occurrence of b with strong. In both cases, the
attributes are preserved unchanged. The default is <em>no</em>.
This option can now be set independently of the clean and
drop-font-tags options.</dd>

<dt>drop-empty-paras: <em>bool</em></dt>

<dd>If set to <em>yes</em>, empty paragraphs will be discarded.
If set to <em>no</em>, empty paragraphs are replaced by a pair of
<code>br</code> elements, as HTML 4 precludes empty paragraphs. The
default is <em>yes</em>.</dd>

<dt>drop-font-tags: <em>bool</em></dt>

<dd>If set to <em>yes</em> together with the clean option (see
above), Tidy will discard font and center tags rather than
creating the corresponding style rules. The default is
<em>no</em>.</dd>

<dt>enclose-text: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes Tidy to enclose any text
it finds in the body element within a p element. This is useful
when you want to take an existing HTML file and use it with a
style sheet. Any text at the body level will screw up the
margins, but wrap the text within a p element and all is well!
The default is <em>no</em>.</dd>

<dt>enclose-block-text: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes Tidy to insert a p
element to enclose any text it finds in any element that allows
mixed content for HTML transitional but not HTML strict. The
default is <em>no</em>.</dd>

<dt>fix-bad-comments: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes Tidy to replace
unexpected hyphens with "=" characters when it comes across
adjacent hyphens. The default is <em>yes</em>. This option is
provided for users of Cold Fusion, which uses the comment syntax
&lt;!--- ---&gt;.</dd>

<dt>add-xml-space: <em>bool</em></dt>

<dd>If set to <em>yes</em>, this causes Tidy to add
xml:space="preserve" to elements such as pre, style and script
when generating XML. This is needed if the whitespace in such
elements is to be parsed appropriately without having access to
the DTD. The default is <em>no</em>.</dd>

<dt>alt-text: <em>string</em></dt>

<dd>This allows you to set the default alt text for img
elements. This feature is dangerous as it suppresses further
accessibility warnings. <b>YOU ARE RESPONSIBLE FOR MAKING YOUR
DOCUMENTS ACCESSIBLE TO PEOPLE WHO CAN'T SEE THE
IMAGES!!!</b></dd>

<dt>write-back: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will write back the tidied
markup to the same file it read from. The default is <em>no</em>.
You are advised to keep copies of important files before tidying
them, as on rare occasions the result may not be what you
expect.</dd>

<dt>keep-time: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy won't alter the last modified
time for files it writes back to. The default is <em>yes</em>.
This allows you to tidy files without affecting which ones will
be uploaded to the Web server when using a tool such as
'SiteCopy'.
Note that this feature may not work on some
platforms.</dd>

<dt>error-file: <em>filename</em></dt>

<dd>Writes errors and warnings to the named file rather than to
stderr.</dd>

<dt>show-warnings: <em>bool</em></dt>

<dd>If set to <em>no</em>, warnings are suppressed. This can be
useful when a few errors are hidden in a flurry of warnings. The
default is <em>yes</em>.</dd>

<dt>quiet: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy won't output the welcome message
or the summary of the numbers of errors and warnings. The default
is <em>no</em>.</dd>

<dt>gnu-emacs: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy changes the format for reporting
errors and warnings to a format that is more easily parsed by GNU
Emacs. The default is <em>no</em>.</dd>

<dt>split: <em>bool</em></dt>

<dd>If set to <em>yes</em>, Tidy will use the input file to create
a sequence of slides, splitting the markup prior to each
successive &lt;h2&gt;. You can see an example of the results in a
<a
href="http://www.w3.org/Talks/1999/03/24-stockholm-xhtml/">recent
talk I made on XHTML</a>. The slides are written to
"slide1.html", "slide2.html" etc. The default is
<em>no</em>.</dd>

<dt>new-empty-tags: <em>tag1, tag2, tag3</em></dt>

<dd>Use this to declare new empty inline tags. The option takes a
space- or comma-separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags. Remember to also declare empty
tags as either inline or block-level; see below.</dd>

<dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>

<dd>Use this to declare new non-empty inline tags. The option
takes a space- or comma-separated list of tag names.
Unless you
declare new tags, Tidy will refuse to generate a tidied file if
the input includes previously unknown tags.</dd>

<dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>

<dd>Use this to declare new block-level tags. The option takes a
space- or comma-separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags. Note that you can't change the
content model for elements such as table, ul, ol and dl. This is
explained in more detail in the <a
href="release-notes.html">release notes</a>.</dd>

<dt>new-pre-tags: <em>tag1, tag2, tag3</em></dt>

<dd>Use this to declare new tags that are to be processed in
exactly the same way as HTML's pre element. The option takes a
space- or comma-separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags.
Note that you can't yet add new
CDATA elements (similar to script).</dd>
</dl>

<h4>Sample Config File</h4>

<p>This is just an example to get you started.</p>

<pre>
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>

<h3><a id="scripts" name="scripts">Using Tidy from
scripts</a></h3>

<p>If you run Tidy from Perl or another scripting language, you
may find it useful to inspect the exit status Tidy returns: 0 if
everything is fine, 1 if there were warnings and 2 if there were
errors. This is an example using Perl:</p>

<pre>
# TIDY is assumed to be a pipe opened earlier, for example:
# open(TIDY, "tidy file.html |") or die "cannot run tidy: $!";
if (!close(TIDY)) {
    # close() on a pipe returns false if the child exited
    # with a non-zero status; the exit code is in the high byte of $?
    my $exitcode = $? &gt;&gt; 8;
    if ($exitcode == 1) {
        print STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        print STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    print STDERR "tidy detected no errors\n";
}
</pre>

<h3><a id="download" name="download">Downloadable
Binaries</a></h3>

<p class="note">If you are prepared to maintain a public URL for
HTML Tidy compiled for a specific platform, please let me know so
that I can add a link to your page.
This will avoid the need for
me to update this page whenever you recompile.</p>

<div class="platforms">
<h4>Windows 95/98/NT/2000</h4>

<p><b><a
href="http://www.w3.org/People/Raggett/tidy.exe">tidy.exe</a></b>.
Windows 95/98/NT/2000 executable (32-bit Windows console-mode
program). This is the executable that I maintain as part of the
HTML Tidy distribution. The command line parameters are described
above, along with the extensive configuration file options.</p>

<p><b><a
href="http://www.chami.com/free/html-kit/">HTML-Kit</a></b> - a
free HTML editor for Windows 95/98/NT/2000 with integrated
support for Tidy.</p>

<p><b><a
href="http://perso.wanadoo.fr/ablavier/TidyGUI/">TidyGUI</a></b>.
A Windows front end for running Tidy, written by André
Blavier. André has also written a <b><a
href="http://perso.wanadoo.fr/ablavier/TidyCOM/">Windows COM
wrapper</a></b> for Tidy. He describes how to use this from
Visual Basic.</p>

<p><b><a href="http://www.evrsoft.com/">Evrsoft's 1st Page
2000</a></b> - a free HTML editor for Windows 95/98/NT/2000 with
integrated support for Tidy. 1st Page 2000 is a high-end
authoring tool that makes it easy to add effects based upon
scripting.</p>

<p><b><a href="http://www.notetab.com/">NoteTab</a></b> - an
award-winning text and HTML editor for Windows with built-in
support for running HTML Tidy. NoteTab is written by Eric
Fookes.</p>

<h4>Mac OS</h4>

Several versions of <a
href="http://www.geocities.com/SiliconValley/1057/tidy.html">HTML
Tidy for Mac OS</a> are available, including a standalone
Macintosh application with a graphical user interface, a BBEdit
plugin, an MPW tool and a FilterTop filter (<a
href="http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">screenshot</a>).
My thanks to <a
href="mailto:teague@mailandnews.com">Terry Teague</a> for this
port.<br />
<br />

<h4>Atari</h4>

<p>Arnaud Bercegeay's site provides the <a
href="http://tidy.atari.org">Atari binary for Tidy</a>.</p>

<h4>Amiga</h4>

<p>Keith Blakemore-Noble maintains a page for <a
href="http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Tidy
on Amiga</a>.</p>

<h4>BeOS</h4>

<p>Peter Enzerink maintains <a
href="http://www.bytepeople.com/beos/apps/htmltidy.html">HTML
Tidy</a> for BeOS. The link points to downloads for HTML Tidy as
well as HTML Tidy editor add-ons for BeOS.</p>

<h4>AIX</h4>

<p>Ciaran Deignan maintains an <a
href="http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
binary for Tidy</a>. The link is to a general download page. The
executable is available for AIX 4.3.2 and later.</p>

<h4>Linux</h4>

<p>Dimitri Papadopoulos maintains a <a
href="http://perso.club-internet.fr/dpo/rpm/">Tidy RPM package
for Red Hat Linux</a>. You may also be able to find Tidy on other
Linux distribution sites, e.g. <a
href="http://rpmfind.net/">http://rpmfind.net/</a>.</p>

<!-- no longer accessible :-(
  <p><b><a href=
  "http://www.astro.uni-bonn.de/~webstw/cm/w3c_tidy/index.html">
  Linux users</a></b>! ochen M. Braun is maintaining Tidy binary
  for Linux (ELF 32-bit LSB executable using '<tt>libc.so.5</tt>'
  for Intel 80386): '<a href=
  "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy"><tt>tidy</tt></a>
  '.
  Additionally a man page can be downloaded: <a href=
  "ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy.1"><tt>
  tidy.1</tt></a>.</p>
  -->
<h4>UnixWare</h4>

<p>Simon Trimmer &lt;<a
href="mailto:simon@ocston.org">simon@ocston.org</a>&gt; maintains
a <a href="http://www.ocston.org/~simon/tidy/">Tidy binary for
UnixWare</a>.</p>

<h4>HP-UX</h4>

<p>You can get precompiled versions of Tidy for HP-UX from <a
href="http://www.informatik.uni-stuttgart.de/ifi/gr/mitarbeiter/hopp/tidy/tidy.html">
Olaf Hopp</a> and from <a
href="http://geocities.com/ian_springer/hpux_tidy.html">Ian
Springer</a>.</p>

<h4>MSDOS</h4>

<p>Nick B. maintains <a
href="http://members.xoom.com/nickbeee/tidy386/">Tidy386 for
DOS</a>. This uses the DPMI mechanism for memory
management.</p>

<h4>Solaris</h4>

<p>Stephen Fuqua maintains a page for <a
href="http://www.hep.utexas.edu/~sfuqua/unix">Tidy on
Solaris</a>.</p>

<h4>OS/2</h4>

<p>Kaz SHiMZ &lt;<a
href="mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>&gt; maintains
an <a
href="http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/index.html">OS/2
binary for Tidy</a>.</p>

<h4>FreeBSD</h4>

<p>Martin Fouts maintains <a
href="http://www.fogey.com/fouts/tidy.htm">Tidy on
FreeBSD</a>.</p>

<h4>RISC OS</h4>

<p><a href="mailto:archifishal@altavista.net">Alex Macfarlane
Smith</a> maintains a <a
href="http://www.toth.org.uk/~aardvark/programs/tidy.shtml">port
of Tidy to RISC OS</a>.</p>

<h4>MiNT (Atari) OS</h4>

<p><a href="mailto:eaiching@t0.or.at">Edgar Aichinger</a>
maintains a <a
href="http://wh58-508.st.uni-magdeburg.de/sparemint/html/packages/tidy.html">
port of Tidy to the MiNT OS</a>.
MiNT is a Unix-like OS for m68k Atari
computers and is nearly FHS-compliant (it doesn't use bootable OS
images or have any mounting capabilities, so neither /boot nor
/mnt is used). The binary also runs on ordinary TOS, since the
MiNT libraries cover all GEMDOS/GEM functions.</p>
</div>

<h3><a id="quotes" name="quotes">Integrating Tidy as part of
other Software</a></h3>

<p>You can also incorporate Tidy into a larger program, for
instance an HTML editor, an HTML transformation tool used as an
import filter, or a tool that customizes Web content to get the
best out of different kinds of browsers. Imagine authoring clean
HTML with CSS and, at the touch of a button, producing variants
that look great and work reliably on a large variety of different
browsers, taking into account the quirks of each. For instance,
you could tune content for different versions of Netscape and
Internet Explorer, and for browsers running on set-top boxes for
televisions, handheld and palmtop devices, cell phones, and voice
browsers. I am happy to quote for software development for such
tools.</p>

<p>Sebastian Lange has contributed a Perl wrapper for calling
Tidy from your Perl scripts; see <a
href="sl-tidy.pl">sl-tidy.pl</a>.</p>

<h4>Using Tidy from Emacs</h4>

<p>Pete Gelbman emailed this <a
href="http://lists.w3.org/Archives/Public/html-tidy/2000AprJun/0047.html">
tip</a> for using Tidy with the Unix version of Emacs. It lets you
highlight a region of text and run Tidy on it. Tidy's "fixed"
output will replace your highlighted region right in place.
The error and warning output is directed into a separate
mini-buffer below your main window.</p>

<h3><a id="java" name="java">Java port of HTML Tidy</a></h3>

<p>Andy Quick &lt;<a
href="mailto:ac.quick@sympatico.ca">ac.quick@sympatico.ca</a>&gt;
maintains a Java port of Tidy, so you can now integrate Tidy into
your Java applications. Andy is tracking the releases of Tidy in
C (this page). More information is available on <a
href="http://www3.sympatico.ca/ac.quick/">Andy's home
page</a>.</p>

<h3><a id="implementation" name="implementation">Source
Code</a></h3>

<p>The code is in ANSI C and uses the C standard library for I/O.
The parser works top down, building a complete parse tree in
memory. Document text is held as Unicode represented as UTF-8 in
a character buffer that expands as needed. The code has so far
been tested on Windows 95, Windows 98, Windows NT, Windows 2000,
Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXTStep,
Mac OS, BeOS, OS/2, AIX, Amiga, Atari, SunOS, Solaris and
HP-UX, amongst others.</p>

<p>Here is a link to the Open Source <a href="tidy.c">copyright
notice and license</a>.</p>

<dl>
<dt><a href="/tidy4aug00.tgz">tidy4aug00.tgz</a></dt>

<dd>gzipped tar file for source code (Unix line ends)</dd>

<dt><a href="/tidy4aug00.zip">tidy4aug00.zip</a></dt>

<dd>zipped source code (Windows line ends)</dd>

<dt><a href="platform.h">platform.h</a>, <a
href="html.h">html.h</a></dt>

<dd>the include files with common definitions</dd>

<dt><a href="config.c">config.c</a></dt>

<dd>support for customizing Tidy via config files</dd>

<dt><a href="lexer.c">lexer.c</a></dt>

<dd>lexical analysis and buffer management</dd>

<dt><a href="parser.c">parser.c</a></dt>

<dd>HTML and XML parsers</dd>

<dt><a href="tags.c">tags.c</a></dt>

<dd>dictionary of tags and their properties</dd>

<dt><a href="attrs.c">attrs.c</a></dt>

<dd>dictionary of attributes and their properties</dd>

<dt><a href="istack.c">istack.c</a></dt>

<dd>stack of active inline elements</dd>

<dt><a href="entities.c">entities.c</a></dt>

<dd>dictionary of entities</dd>

<dt><a href="clean.c">clean.c</a></dt>

<dd>smarts for cleaning up presentational markup</dd>

<dt><a href="pprint.c">pprint.c</a></dt>

<dd>pretty printing for HTML and XML</dd>

<dt><a href="localize.c">localize.c</a></dt>

<dd>change this file to localize Tidy's messages</dd>

<dt><a href="tidy.c">tidy.c</a></dt>

<dd>main() and error reporting routines</dd>

<dt><a href="Makefile">Makefile</a></dt>

<dd>Makefile for gcc</dd>

<dt><a href="man_page.txt">Unix Man page</a></dt>

<dd>maintained by Matej Vela &lt;vela@debian.org&gt;</dd>
</dl>

<p>Conventions for whether lines end with CRLF, LF or CR vary
from one system to another. I have included the C source for a
utility <b>tab2space</b> which can be used to ensure that files
use the line end convention of your choice, and to expand tabs to
spaces.</p>

<pre>
  tab2space -t4 -unix *.h *.c
  tab2space -tabs -unix Makefile
</pre>

<p>Note the use of "-tabs" to ensure that tabs are preserved in the
Makefile (make won't work without them!).</p>

<p>For those of you on Unix, here is a script you can use to
strip carriage returns:</p>

<pre>
#!/bin/sh
echo Stripping Carriage Returns from files...
for i
do
    # only process arguments that are regular, writable files
    if [ -f "$i" ]
    then
        if [ -w "$i" ]
        then
            echo "$i"
            # strip CRs from the input, writing to a temp file
            tr -d '\015' &lt; "$i" &gt; toix.tmp
            mv toix.tmp "$i"
        else
            echo "$i: write-protected"
        fi
    else
        echo "$i: not a file"
    fi
done
</pre>

<p>Save this script to a file, e.g. "<em>stripcr</em>", and use
"<em>chmod +x stripcr</em>" to make it executable. You can then
run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>

<h2><a id="acks" name="acks">Acknowledgements</a></h2>

<p>I would like to thank the many people who have written to me
with suggestions for improvements or bug reports. Your help
has been invaluable.</p>

<blockquote class="people">Jonathan Adair, Drew Adams, Osma
Ahvenlampi, Carsten Allefeld, Richard Allsebrook, Jacob Sparre
Andersen, Joe D'Andrea, Jerry Andrews, Bruce Aron, Takuya Asada,
Edward Avis, Carlos Piqueres Ayela, Nick B, Chang Hyun Baek,
Denis Barbier, Chuck Baslock, Christer Bernerus, David J.
Biesack, John Bigby, Yu Jian Bin, Alexander Biron, Keith
Blakemore-Noble, Eric Blossom, Berend de Boer, Ochen M. Braun,
Dave Bryan, David Brooke, Andy Brown, Keith B. Brown, Andreas
Buchholz, Maurice Buxton, Jelks Cabaniss, John Cappelletti,
Trevor Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Rob
Clark, Jeremy Clulow, Dan Connolly, Larry Cousin, Ken Cox, Luis
M.
Cruz, John Cumming, Ian Davey, Keith Davies, Ciaran Deignan,
David Duffy, Emma Duke-Williams, Tamminen Eero, Bodo Eing, Peter
Enzerink, Baruch Even, David Fallon, Claus André
Färber, Stephanie Foott, Darren Forcier, Martin Fouts,
Frederik Fouvry, Rene Fritz, Stephen Fuqua, Martin Gallwey, Pete
Gelbman, Francisco Guardiola, David Getchell, Michael Giroux,
Davor Golek, Guus Goos, Léa Gris, Rainer Gutsche, Kai
Hackemesser, Juha Häikiö, David Halliday,
Johann-Christian Hanke, Vlad Harchev, Shane Harrelson, Andre
Hinrichs, Bjoern Hoehrmann, G. Ken Holman, Bill Homer, Olaf Hopp,
Craig Horman, Jack Horsfield, Nigel Horspool, Pao-Hsi Huang,
Stuart Hungerford, Marc Jauvin, Rick Jelliffe, Peter Jeremy,
Craig Johnson, Charles LaFountain, Steven Lobo, Zdenek Kabelac,
Michael Kay, Jeffery Kendall, Axel Kielhorn, Konstantinos
Kleisouris, Johannes Koch, Daniel Kohn, Rudy Kohut, Allan
Kuchinsky, Volker Kuhlmann, Michael LaStella, Johnny Lee, Steve
Lee, Tony Leneis, Nick Leverton, Todd Lewis, Dietmar Lippold,
Gert-Jan C. Lokhorst, Murray Longmore, John Love-Jensen,
Satwinder Mangat, Carole Mah, Anton Marsden, Bede McCall, Shane
McCarron, Thomas McGuigan, Ian McKellar, Al Medeiros, Chris
Nappin, Ann Navarro, Jacek Niedziela, Morten Blinksbjerg Nielsen,
Kenichi Numata, Allan Odgaard, Matt Oshry, Gerald Oskoboiny, Paul
Ossenbruggen, Ernst Paalvast, Christian Pantel, Dimitri
Papadopoulos, Rick Parsons, Steven Pemberton, Daniel Persson, Lee
Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Jany
Quintard, Julian Reschke, Stephen Reynolds, Thomas Ribbrock, Ross
L. Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Peter
Ruevski, Christian Ruetgers, Klaus Johannes Rusch, John Russell,
Eric Schindler, J.
Schlauch, Christian Schüler, Klaus
Alexander Seistrup, Jim Seymour, Kazuyoshi Shimizu, Geoff
Sinclair, Jo Smith, Paul Smith, Steve Spilker, Rafi Stern,
Jacques Steyn, Michael J. Suzio, Zac Thompson, Eric Thorbjornsen,
Oren Tirosh, John Tobler, Omri Traub, Loïc Trégan,
Jason Tribbeck, Simon Trimmer, Steffen Ullrich, Stuart Updegrave,
Charles A. Upsdell, Jussi Vestman, Larry W. Virden, Daniel
Vogelheim, Nigel Wadsworth, Jez Wain, Randy Waki, Paul Ward, Neil
Weber, Bertilo Wennergren, Yudong Yang, Jeff Young, Edward Zalta,
Johannes Zellner, Christian Zuckschwerdt</blockquote>

<h3><a id="address" name="address">Dave's Address</a></h3>

<pre>
  73b Ground Corner
  Holt
  Wiltshire
  BA14 6RT
  United Kingdom
</pre>

<p><small><a href="http://www.w3.org/People/Raggett">Dave
Raggett</a> &lt;<a href="mailto:dsr@w3.org">dsr@w3.org</a>&gt; is
an engineer from <a href="http://www.hp.com/">Hewlett
Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
Laboratories</a>, and works on assignment to the World Wide Web
Consortium, where he is the W3C lead for HTML, XForms, Voice
Browsers and Math.</small></p>
</body>
</html>