<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>HTML TIDY - Release Notes</title>
<meta name="keywords"
content="HTML, validation, error correction, pretty-printing" />
<meta name="author" content="Dave Raggett &lt;dsr@w3.org&gt;" />
<style type="text/css">
  body {
    margin-left: 10%;
    margin-right: 10%;
    font-family: sans-serif
  }
  h1 { margin-left: -8% }
  h2,h3,h4,h5,h6 { margin-left: -4% }
  pre { color: green; font-weight: bold;
        font-size: 80%; font-family: monospace}
  em { font-style: italic; font-weight: bold }
  strong { text-transform: uppercase; font-weight: bold }
  .note {font-style: italic; color: rgb(192, 101, 101) }
  /* hr {text-align: center; width: 60% } */
  blockquote {
    color: navy;
    margin-left: 1%;
    margin-right: 1%;
    text-align: center;
    font-family: "Comic Sans MS", "Times New Roman", serif
  }
  table {
    font-family: sans-serif;
    font-size: 80%;
    background: rgb(255,255,153)
  }
  td {
    font-size: 80%
  }
  .people {font-family: "Lucida Calligraphy", serif}
  :link { color: rgb(0, 0, 153) }
  :visited { color: rgb(153, 0, 153) }
  :active { color: rgb(255, 0, 102) }
  a:hover { color: rgb(0, 0, 255) }
</style>

<style type="text/css">
  p.c1 {font-style: italic}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
link="navy" vlink="black" alink="red">
<h1>HTML TIDY - Release Notes</h1>

<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
href="mailto:dsr@w3.org">dsr@w3.org</a></p>

<h4>Public Email List for Tidy: &lt;<a
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;</h4>

<p>I have set up an archived mailing list devoted to Tidy.
To subscribe, send an email to html-tidy-request@w3.org with the word
subscribe in the subject line (use the word unsubscribe if
you want to unsubscribe). The <a
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
for this list is accessible online. Please use this list to
report errors or make enhancement requests.</p>

<h3>Things awaiting further attention</h3>

<p>These have been moved to the <a href="pending.html">pending
page</a>, which includes all the suggestions for improvements and
bug fixes. I am looking for volunteers to help with these, as my
current workload leaves me little time to work on
HTML Tidy.</p>

<h2>August 2000</h2>

<p>Ann Navarro comments that the "appears to" message is
confusing when it differs from the doctype declaration. Perhaps
it would make sense to also report the doctype? Tidy will now
report the formal public identifier (FPI) when present, and then
the apparent version as deduced from the elements and attributes
present in the rest of the document.</p>

<p>John Russell sent in an example which featured a script
element in a frameset document where the script element appears
after the head and before the frameset. This is, I believe,
illegal, but Tidy proceeded to do the dumb thing and discard the
frameset element! I think it should move the script element into
the head and continue. This is now implemented.</p>

<p>Jacques Steyn says that Tidy doesn't know about the HTML4 char
attribute for col elements. Now fixed.</p>

<p>Carlos Piqueres Ayela would like Tidy to detect all cases of
repeated attributes, e.g. repeated valign in table cells. This
was introduced a few releases back, but I forgot to apply this
check to the elements with special purpose attribute checking
methods. Now fixed. Tidy will issue a warning for each repeated
attribute. In principle Tidy could merge repeated class
attributes, but this will require more work.
My apologies to
Carole Mah for not having the time to do this now.</p>

<p>Henry Zrepa would like an option to suppress whitespace
munging on selected attributes used for legacy scripts passed as
parameters to plugins. I have added a new boolean option
"literal-attributes" which can be set to yes to preserve
whitespace within attribute values. A better solution would be to
make this selectable on a per-element basis, but I don't have
time to explore this now.</p>

<p>Edward Zalta spotted that Tidy always removed newlines
immediately after start tags even for empty elements such as img.
An exception to this rule is the br element. Now fixed.</p>

<h2>July 2000</h2>

<p>Edward Zalta sent me an example where Tidy was inadvertently
wrapping lines after an image element. The problem was a
conditional in pprint.c, now fixed.</p>

<p>Andy Quick offered a bug fix for the AddClass() function in
clean.c. My thanks to Terry Teague for bringing this to my
attention. Davor Golek reported a problem with the -f option. I
discovered a bug in line 898 in tidy.c, now fixed.</p>

<h2>June 2000</h2>

<p>Fixed bug in NormalizeSpaces (== in place of =) on line
1699.</p>

<p>I have added a new config option "gnu-emacs" following a
suggestion by David Biesack. The option changes the way errors
and warnings are reported to make them easier for Emacs to
parse.</p>

<p>Tony Leneis noticed that Tidy didn't know that width and
height attributes on the img element aren't allowed in HTML 2.0.
He also noted that Tidy didn't know that HTML 2.0 allows img as a
direct child of body. Both of these bugs are now fixed.</p>

<p>I have refined CanPrune() so that it only blocks the pruning of
empty elements when they have id or name attributes. Previously
any attribute would prevent an empty element from being pruned.
The rationale
is that such empty elements are placed there to be filled
dynamically by a script. This is unlikely to occur unless the
element can be referenced via id or name.</p>

<p>Denis Barbier sent in detailed patches that suppress numerous
warnings when compiling tidy, especially:</p>

<ul>
<li>`static' declaration of subroutines when possible</li>

<li>initialization of variables that might be used before
assignment</li>

<li>renaming of local variables that override global ones
(count, index, fp)</li>

<li>suppression of long jump; buffers are closed in
FatalError</li>
</ul>

<p>Fixed memory leak in CoerceNode. My thanks to Daniel Persson
for spotting this. Tapio Markula asked if Tidy could give
improved detection of spurious &lt;/ in script elements. Now
done.</p>

<p>My thanks to John Russell who pointed out that Tidy wasn't
complaining about src attributes on hr elements. My thanks to
Johann-Christian Hanke who spotted that Tidy didn't know about
the Netscape wrap attribute for the textarea element.</p>

<p>Sebastian Lange has contributed a perl wrapper for calling
Tidy from your perl scripts, see <a
href="sl-tidy.pl">sl-tidy.pl</a>.</p>

<p>Stephen Reynolds would like comments that end with a line
break to retain this property when tidied. I have added a new
boolean property to the node structure which is set by the end
comment parser in lexer.c and acted on by the comment formatting
code in pprint.c.</p>

<p>Henry Zrepa (sp?) reported that XHTML &lt;param\&gt; elements
were being discarded. This was due to an error in ParseBlock, now
fixed.</p>

<p>Carole E. Mah noted that Tidy doesn't complain if there are
two or more title elements.
Tidy will now complain if there is
more than one title element or more than one base element.</p>

<h2>May 2000</h2>

<p>Following a suggestion by Julian Reschke, I have added an
option to add xml:space="preserve" to elements such as pre, style
and script when generating XML. This is needed if these elements
are to be correctly parsed without access to the DTD.</p>

<h2>April 2000</h2>

<p>Randy Waki notes that IsValidAttribute() wasn't checking that
the first character in an attribute name is a letter. Now
fixed.</p>

<p>Jelks Cabaniss wants the naked li style hack made into an
option or at least tweaked to work in IE and Opera as well as
Navigator. Sadly, even Navigator 6 preview 1 replicates the buggy
CSS support for lists found in Navigator 4. Neither Navigator 6
nor IE5 (win32) supports the CSS marker-offset property, and so
far I have been unable to find a safe way to replicate the visual
rendering of naked li elements (ones without an enclosing ul or
ol element). As a result I have opted for the safer approach of
adding a class value to the generated ul element
(class="noindent") to keep track of which li's weren't properly
enclosed.</p>

<p>Rick Parsons would like to be able to use quote marks around
file names which include spaces, when specifying files in the
config file. Currently, this only affects the "error-file"
option. I have changed that to use ParseString. You can now specify
error files with spaces in their names.</p>

<p>Karen Schlesinger would like tidy to avoid pruning empty span
elements when these have id attributes, e.g. for use in setting
the content later via the DOM. Done.</p>

<p>I have modified GetToken() to switch mode from
IgnoreWhitespace to MixedContent when encountering non-white
textual content.
This solves a problem noticed by Murray
Longmore, where Tidy was swallowing white space before an end
tag, when the text is the first child of the body element.</p>

<p>Tidy needs to check for text as a direct child of blockquote
etc. which isn't allowed in HTML 4 strict. This could be
implemented as a special check which or's in transitional into
the version vector when appropriate.</p>

<p>ParseBlock now recognizes that text isn't allowed directly in
the block content model for HTML strict. Furthermore, following a
suggestion by Berend de Boer, a new option enclose-block-text has
the same effect as enclose-text but also applies to any block
element that allows mixed content for HTML transitional but not
HTML strict.</p>

<p>Jany Quintard noted that Tidy didn't realise the width and
height attributes aren't allowed on table cells in HTML strict
(they are fine in HTML transitional). This is now fixed. Nigel
Wadsworth wanted border on table without a value to be mapped
into border="1". Tidy already does this but only if the output is
XHTML.</p>

<p>Jelks Cabaniss wanted Tidy to check that a link to an external
style sheet includes a type attribute. This is now done. He also
suggested extending the clean operation to migrate presentation
attributes on body to style rules. Done.</p>

<h2>March 2000</h2>

<p>I have been working on improving the Word2000 cleanup, but
have yet to figure out foolproof rules of thumb for recognizing
when paragraphs should be included as part of ul or ol lists.
Tidy recognizes the class "MsoListBullet" which Word seems to
derive from the Word style named "List Bullet". I have yet to
deal with nested lists in Word2000.
This is something I was able
to deal with for html exported from Word97, but it looks like
being significantly harder to deal with for Word2000.</p>

<p>Tidy is now able to create a pre element for paragraphs with
the style "Code". So try to use this style in your Word documents
for preformatted text. Tidy strips out the p tags and coerces
non-breaking spaces to regular spaces when assembling the pre
element's content.</p>

<p>I would very much welcome any suggestions on how to make the
Word2000 clean up work better!</p>

<p>Changed Style2Rule() in clean.c to check for an existing class
attribute, and to append the new class after a space. Previously
you got two class attributes, which is an error.</p>

<p>Changed default for add-xml-pi to no since this was causing
serious problems for several browsers.</p>

<p>Joakim Holm notes that tidy crashes on ASP when used for
attributes. The problem turned out to be caused by
CheckUniqueAttribute() which was being inappropriately applied to
ASP nodes.</p>

<p>John Bigby noted that Tidy didn't know about Microsoft's data
binding feature. I have added the corresponding attributes to the
table in attr.c and tweaked CanPrune() so that empty elements
aren't deleted if they have attributes.</p>

<p>Tidy is now more sophisticated about how it treats nested
&lt;b&gt;'s etc. It will prune redundant tags as needed. One
difficulty is in knowing whether a start tag is a typo and should
have been an end-tag or whether it starts a nested element. I
can't think of a hard and fast rule for this. Tidy will coerce a
&lt;b&gt; to &lt;/b&gt; except when it is directly after a
preceding &lt;b&gt;.</p>

<p>Bertilo Wennergren noted that Tidy lost &lt;frame/&gt;
elements.
This has now been fixed with a patch to
ParseFrameSet.</p>

<h2>February 2000</h2>

<p>Dave Bryan spotted an error in pprint.c which allowed some
attributes to be wrapped even when wrap-attributes was set to no.
On a separate point, I have now added a check to issue a warning
if SYSTEM, PUBLIC, //W3C, //DTD or //EN are not in upper
case.</p>

<p>Tidy now realises that inline content and text are not allowed
as a direct child of body in HTML strict.</p>

<p>Dave Bryan also noticed that Tidy was preferring HTML 4.0 to
4.01 when doctype is set to strict or transitional, since the
entries for 4.0 appeared earlier than those for 4.01 in the table
named W3C_Version in lexer.c. I have reversed the order of the
entries to correct this. Dave also spotted that ParseString() in
config.c was erroneously calling NextProperty() even though it had
already reached the end of the line.</p>

<h2>January 2000</h2>

<p>I have added a new function ApparentVersion() which takes the
doctype into account as well as other clues. This is now used to
report the apparent version of the html in use.</p>

<p>Thanks to the encouragement of Denis Barbier, I finally got
around to dealing with the extra bracketing needed to quiet gcc
-Wall. This involved the initialization of the tag, attribute and
entity tables, and miscellaneous side-effecting while and for
loops.</p>

<p>PPrintXMLTree has been updated so that it only inserts line
breaks after start tags and before end tags for elements without
mixed content. This brings Tidy into line with current wisdom for
XML editors.
My thanks to Eric Thorbjornsen for suggesting a fix
to FindTag that ensures that Tidy doesn't mistreat elements
looking like html.</p>

<p>&lt;table border&gt; is now converted to
&lt;table border="1"&gt; when converting to XHTML.</p>

<p>I have added support for CDATA marked sections which are
passed through without change, e.g.</p>

<pre>
&lt;![CDATA[ .. markup here has no effect .. ]]&gt;
</pre>

<p>A number of people were interested in Tidied documents being
marked as such using a meta element. Tidy will now add the
following to the head if not already present:</p>

<pre>
&lt;meta name="generator" content="HTML Tidy, see www.w3.org"&gt;
</pre>

<p>If you don't want this added, set the option tidy-mark to
no.</p>

<p>In the January 12th release, ParseXMLElement screwed up on
doctypes and toplevel comments, causing a memory exception. This
has now been fixed. PPrintXMLTree now uses zero indent for
comments to avoid progressive indentation as an XML document is
repeatedly tidied. I have added a blank line after elements
unless they are the last in the parent's content.</p>

<p>Johnny Lee reports that Tidy didn't realise that HTML4 allows
the object element in the document head. Now fixed. Rainer
Gutsche noticed that Tidy wasn't moving an initial space after an
anchor start tag to just before the element. I have streamlined
the trimming of spaces.</p>

<p>Johannes Zellner spotted that newly declared preformatted tags
weren't being treated as such for XML documents. Now fixed.</p>

<h2>December 1999</h2>

<p>Tidy now generates the XHTML namespace and system identifier
as specified by the current <a
href="http://www.w3.org/TR/xhtml1/">XHTML Proposed
Recommendation</a>. In addition it now assumes the latest version
of HTML4 - HTML 4.01. This fixes an omission in 4.0 by adding the
name attribute to the img and form elements.
This means that
documents with rollovers and smart forms will now validate!</p>

<p>James Pickering noticed that Tidy was missing off the xhtml-
prefix for the XHTML DTD file names in the system identifier on
the doctype. This was a recent change to XHTML. I have fixed
lexer.c to deal with this.</p>

<p>This release adds support for <a
href="http://developer.netscape.com/viewsource/schroder_template/schroder_template.html">
JSTE</a> pseudo elements looking like: &lt;# #&gt;. Note
that Tidy can't distinguish between ASP and JSTE for pseudo
elements looking like: &lt;% %&gt;. Line wrapping of this
syntax is inhibited by setting either the wrap-asp or wrap-jste
options to no.</p>

<p>Thanks to Jacek Niedziela, the Win32 executable for tidy is
now able to expand wild cards in filenames. This utilizes the
setargv library supplied with VC++.</p>

<p>Jonathan Adair asked for the hashtables to be cleared when
emptied to avoid problems when running Tidy a second time, when
Tidy is embedded in other code. I have applied this to
FreeEntities(), FreeAttrTable(), FreeConfig(), and
FreeTags().</p>

<p>Ian Davey spotted that Tidy wasn't deleting inline emphasis
elements when these only contained whitespace (other than
non-breaking spaces). This was due to an oversight in the
CanPrune() function, now fixed.</p>

<p>Michel Lemay spotted some bugs in if statements and provided
some sample html files that caused Tidy to crash. On further
study, I found a bug in the code that moves font elements inside
anchors. I have fixed this and added a new method to test the
tree for internal consistency in its bidirectional links:
CheckNodeIntegrity().</p>

<p>I have also refined the code for handling noframes to make it
more robust. It will now handle noframes within a body within a
noframes etc. (something permitted by HTML4).
It will also
recover if the noframes end tag is missing or is in the wrong
place.</p>

<p>I have fleshed out the table for mapping characters in the
Windows Western character set into Unicode, see Win2Unicode[].
Yahoo was, for example, using the Windows Western character for
bullet, which in Unicode is U+2022.</p>

<p>David Halliday noticed that applets without any content
between the start and end tags were being pruned by Tidy. This is
a bug and has now been fixed.</p>

<p>I have changed the way Tidy handles empty paragraphs when
drop-empty-paras is set to no. HTML4 doesn't allow empty
paragraphs so I am now replacing them by a pair of br elements,
so that the formatting is preserved. When drop-empty-paras is set
to yes, empty paragraphs are simply removed.</p>

<p>Darren Forcier asked for a way to suppress fixing up of
comments when these include adjacent hyphens since this was
screwing up Cold Fusion's special comment syntax. The new option
is called: <i>fix-bad-comments</i> and defaults to yes.</p>

<p>Using Michel's examples I have improved the way the table
parser deals with unexpected content. This is now consistently
moved before the table, or to the head element as appropriate.
Microsoft and Netscape differ in how an unclosed blockquote
renders when found at the table or tr level. Netscape indents the
table but Microsoft does not. This is getting too tricky for me
to deal with!</p>

<p>Using a sample page from Yahoo, I discovered that Netscape
Navigator doesn't implement the text-align style property on tr
or table elements. As a result I have added a special check for
this in BlockStyle() to avoid translating the align attribute on
tr or table into a style rule.</p>

<p>Richard Allsebrook would like to be able to map b/i to
strong/em without the full clean process being invoked. I have
therefore decoupled these two options.
Note that setting
logical-emphasis is also decoupled from drop-font-tags.</p>

<h2>30th November 1999</h2>

<p>This is an interim release to fix a bug
introduced earlier in the month. I have fixed a bug in the
emphasis code which looks for start tags which are most likely
intended as end tags. This bug only appeared in the November
release and could cause a crash or indefinite looping. My thanks
to a respondent calling himself "Michael" who provided a
collection of files that allowed me to track this down.</p>

<p>I have also added page transition effects for the slide maker
feature. The effects are currently only visible on IE4 and above,
and take advantage of the meta element. I will provide an option
to select between a range of transition effects in the next
release.</p>

<h2>November 1999</h2>

<p>David Duffy found a case causing Tidy to loop indefinitely.
The problem occurred when a block-level element is found within a
list item that isn't enclosed in a ul or ol element. I have added
a check to ParseList to prevent this.</p>

<p>Takuya Asada tells me that in Raw mode Tidy is incorrectly
mapping 0xA0 to the entity &amp;nbsp; causing problems for Shift_JIS
etc. Now fixed. Larry Virden reported a problem with ParseConfig
when one of the arguments was null. I have added a check for
this.</p>

<p>Thomas McGuigan notes that Tidy issues a warning for noframes
elements without a body element. HTML4 is defined so that the
content of the noframes element is restricted to a single body
element. However, it also allows you to omit the start and end
tags for body, something that isn't allowed for XHTML. I have
changed the code to only issue the warning when generating
XML.</p>

<p>Added new --version or -v option that reports the release date
to the error stream. ParseConfig() now returns false if it
doesn't use the parameter.
This avoids the next argument on the
command line from being swallowed inadvertently, e.g. for unknown
options. Tidy now warns about unrecognized options.</p>

<p>I have revised the way Tidy deals with comments to avoid
problems with repeated hyphens. First "--" is illegal in XML, and
second, the comment syntax for SGML is very error prone when it
comes to when and where you can use hyphens. As a result, Tidy
will now replace repeated hyphens with "=" characters. My thanks
to Yudong Yang and Randy Waki for their input on this.</p>

<p>Emphasis start tags will now be coerced to end tags when the
corresponding element is already open. For instance
&lt;u&gt;...&lt;u&gt;. This behavior doesn't apply to font tags
or start tags with attributes. My thanks to Luis M. Cruz for
suggesting this idea.</p>

<p>Jonathan Adair would like Tidy to warn when the same attribute
appears more than once in the same element. This is an error for
both SGML and XML. The best way to make this check would be to
sort the attributes and look for duplicate entries. Other people
have asked for the attributes to be sorted, but I need further
input on the appropriate sort order. As an interim solution, Tidy
uses a simple test which generates n+1 warnings if an attribute
is repeated n times.</p>

<h2>October 1999</h2>

<p>On Unix systems you can get Tidy to look for a config file in
~/.tidyrc or ~your/.tidyrc etc. when the HTML_TIDY environment
variable isn't set. To enable this feature don't forget to
uncomment SUPPORT_GETPWNAM in the platform.h file. This feature
won't work on Windows.
My thanks to Todd Lewis who contributed
the code.</p>

<p>Darren Forcier reports that Cold Fusion uses the following
syntax:</p>

<pre>
&lt;CFIF True IS True&gt;
  This should always be output
&lt;CFELSE&gt;
  This will never output
&lt;/CFIF&gt;
</pre>

<p>After declaring the CFIF tag in the config file, Tidy was
screwing up the Cold Fusion expression syntax, mapping 'True' to
'True=""' etc. My fix was to leave such pseudo attributes
untouched if they occur on user defined elements.</p>

<p>Jelks Cabaniss noticed that Tidy wasn't adding an id attribute
to the map element when converting to XHTML. I have added
routines to do this for both 'a' and 'map'. The value of the id
attribute is taken from the name attribute.</p>

<p>Larry Cousin noted that Tidy is now screwing up on option
elements. This proved to be a recently introduced error, which I
have now fixed. Peter Ruevski forwarded an example that caused
Tidy to loop endlessly. The problem was caused by an ol start tag
followed by a b start tag and then an li element. I have solved
the problem with a fix to ParseBlock.</p>

<p>I have revised the way Tidy deals with unexpected content in
lists. Tidy now wraps such content in list items with the style
attribute set to "list-style: none" to suppress list bullets. If
an li element is found unexpectedly in the body or block-level
content, it is wrapped into a ul element with the style attribute
set to "margin-left: -2em". This provides a closer match to the
observed rendering on current browsers. I use a couple of
postprocessing steps (List2BQ and BQ2Div) to further clean this
up to use div elements. My thanks to Thomas Ribbrock for sending
me a challenging example that led me to this solution.</p>

<p>A number of people have asked for a config option to set the
alt attribute for images when missing. The alt-text property can
now be used for this purpose.
Please note that YOU are
responsible for making your documents accessible to people who
can't view the images!</p>

<p>Terry Teague spotted a bug in ParseConfigFile() that prevented
Tidy from parsing more than one file. This has been fixed by
setting the char buffer to zero in the call to InitConfig()
before parsing. Terry also noted a few places where I had slipped
back into using malloc and free rather than MemAlloc and MemFree,
now fixed.</p>

<p>Bjoern Hoehrmann notes that the September 27th release mapped
empty paragraphs to br elements, which introduces extra
whitespace in IE and Navigator. The former behavior of stripping
empty paragraphs is as per HTML4 and works fine on most browsers
with the exception of Lynx. I have reverted to stripping empty
P's, but have added an option to leave them alone.</p>

<p>Bjoern also drew my attention to a bug in the September
release where table content is lacking a preceding td or th start
tag. Tidy moves such content to before the table element to match
the observed rendering. This is now working as planned. I have
tweaked the printing behavior when the omit end tags option is
set. It now omits the &lt;/html&gt; as well as the optional start
tags for html, head and body.</p>

<p>Pao-Hsi Huang had problems with the contents of the option
element being discarded. I was unable to reproduce this problem,
but did notice that I was unintentionally preserving newlines within
option text. This is now fixed. Shane Harrelson spotted that
table cells containing a single font element, when cleaned,
dropped the font element without getting the corresponding style.
Now fixed via a tweak to InlineStyle().</p>

<p>Andre Hinrichs wanted Tidy to do a better job on font elements
with relative size changes. This is in fact rather tricky.
Currently, Tidy uses percentage scaling values for fonts rather
than the enumeration defined by CSS [xx-small | x-small | small |
medium | large | x-large | xx-large]. The first problem is to
match these 7 values onto the 6 defined by the font element. The
next problem is caused by the fact that CSS doesn't provide
matching relative font size values that you could match to the
ones defined for the font element. I have done my best using
percentage values, based on tests with IE and Navigator. If anyone
can come up with a better approach, please let me know.</p>

<p>Tom Berger reported a problem when quote-marks was set to yes.
Using his test file everything is now working fine. Several
people asked for a way to turn off line wrapping. Tidy will now
interpret zero as meaning disable wrapping. Johannes Zellner
wants to include some tcl code in his XML markup and asks for a
way to define new tags that behave in the same way as HTML's pre
element. The new option is new-pre-tags.</p>

<h2>September 1999</h2>

<p>Tidy will now add a type attribute to the style and script
elements when this is missing. Tidy examines the language
attribute to determine what media type to use. I have also added
code to create an id attribute for anchors when a name attribute
is present, and to report a warning if id and name don't
match.</p>

<p>Added support for cleaning up HTML generated by Microsoft Word
2000 when you save as "Web Page". When you set "word-2000: yes"
Tidy makes a Herculean effort to clean up the mess created when
Word 2000 exports to HTML. Word bulks out HTML with presentation
information that allows it to round-trip documents between HTML
and Word without loss of information. This makes the HTML hard to
edit and can cause some very popular browsers to crash!
I haven't
dealt with the VML markup Word uses for line drawings.</p>

<p>Applied fix to InsertNodeAfterElement() to set
node-&gt;next-&gt;prev. My thanks to "Advocate" for this. This
was only encountered when dealing with PRE tags containing
content illegal for PRE. (Called twice by ParsePre to move
illegal PRE content to be a later sibling of PRE, then open PRE
again afterward.)</p>

<p>Change to table row parser so that when Tidy comes across an
empty row, it inserts an empty cell rather than deleting it. This
is consistent with browser behavior and avoids problems with
cells that span rows.</p>

<p>Baruch Even sent extensive patches for improved support for
the PHP preprocessing pseudo tags. You can now use 'wrap-php:
no' to suppress line wrapping within PHP instructions. In the
process of this work, I have created a new function InsertMisc()
for dealing with comments, processing instructions, ASP and
PHP.</p>

<p>I have updated the table of tags to include additional
proprietary tags such as server, ilayer, layer, nolayer and
multicol. Using patches sent in by Edward Avis, Tidy now offers a
quiet mode which suppresses the initial welcome message and the
summary report on the number of errors or warnings. Jason
Tribbeck sent in patches to allow config options normally set in
the config file to be set on the command line, by preceding them
with a "--" (no intervening space), for example:</p>

<pre>
  tidy --break-before-br true --show-warnings false
</pre>

<p>Kenichi Numata discovered that Tidy looped indefinitely for
examples similar to the following:</p>

<pre>
&lt;font size=+2&gt;Title
&lt;ol&gt;
&lt;/font&gt;Text
&lt;/ol&gt;
</pre>

<p>I have now cured this problem which used to occur when a
&lt;/font&gt; tag was placed at the beginning of a list element.
If the example included a list item before the &lt;/ol&gt; Tidy
will now create the following markup:</p>

<pre>
&lt;font size=+2&gt;Title&lt;/font&gt;
&lt;blockquote&gt;Text &lt;/blockquote&gt;
&lt;ol&gt;
&lt;li&gt;list item&lt;/li&gt;
&lt;/ol&gt;
</pre>

<p>This uses blockquote to indent the text without the
bullet/number and switches back to the ol list for the first true
list item.</p>

<p>I have worked hard to improve support for server side
preprocessing instructions such as ASP, PHP and Tango. Tidy now
allows you to replace attribute values by such instructions and
is able to fix up the case where the instruction appears without
delimiting quote marks. Tidy supports ASP and PHP in element
content and also in place of attribute value pairs. Support for
Tango is limited to attribute values only.</p>

<p>John Love-Jensen contributed a table for mapping the MacRoman
character set into Unicode. I have added a new charset option
"mac" to support this. Note the translation is one way and
doesn't convert back to the Mac codes on output.</p>

<p>Some people place &lt;p&gt; at the end of their list items to
introduce whitespace before the next item. I have modified
TrimEmptyElement to coerce empty p elements to br elements to
reproduce this rendering. If a p start tag is found in dt
elements, I now coerce the p to a br. Satwinder Mangat has
alerted me to several such problems. First, text as a direct
child of dl should be wrapped in a dt and not a dd element.
Second, unlike other inline tags, browsers only close anchors on an
anchor start or end tag. Actually Navigator and IE differ in how
they handle this. Try the following example:</p>

<pre>
&lt;p&gt;&lt;b&gt;&lt;a href=foo&gt;some text&lt;/i&gt; which should be in the label&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;next para and guess what the emphasis will be?&lt;/p&gt;
</pre>

<p>Navigator 4 renders the second paragraph in normal text while
IE renders it in bold.
If you substitute <a> for the </i>, once again the browsers differ. IE stops underlining at the <a> text while Navigator continues until the </a>, although it realizes that you can't click there.</p>

<p>Satwinder continues: browsers happily interpret center within a heading. Tidy now moves the center element to be the parent of the rest of the heading, splitting it as needed, rather than prematurely ending the heading. The same applies to a div element within a heading. Satwinder notes that Tidy inserts a ul when an li is encountered as a direct child of body.</p>

<p>This is a case where you can't produce a legal HTML file that renders the same way browsers render the original markup. The same applies to a dt or dd element without an enclosing dl element. I can report that W3C's HTML working group was unwilling to bless naked li's etc. A similar problem arises for dt elements when they contain hr, center or div. The specs say this is illegal, but browsers render it fine!</p>

<p>I have done my best for hr, splitting the dt as needed and enclosing the hr within a dd. The hr doesn't look the same, sadly, as it now starts at the left margin for dd's rather than the left margin for dt's. I wasn't sure how to deal with center and div within dt, and chose to discard them.</p>

<p></br> is now mapped to <br> to match observed browser rendering. On the same basis, an unmatched </p> is mapped to <br><br>. This should improve fidelity of tidied files to the original rendering, subject to the limitations in the HTML standards described above.</p>

<p>Vlad Harchev spotted that Tidy was swallowing the first and last spaces within inline elements when in a pre element. Now fixed. Zac Thompson spotted that Tidy didn't know that the tags s, strike and u weren't allowed in HTML4 strict.
I have now fixed this.</p>

<p>Tidy now preserves the last modified time for the files it writes back to. This was introduced on the suggestion of René Fritz, who uses the SiteCopy utility to upload recently modified files to his Web server. By preserving file timestamps Tidy can be used on all files in a directory without impacting which ones will be uploaded the next time SiteCopy runs. This is implemented using the fstat and futime system calls. If your platform doesn't support these calls, set PRESERVEFILETIMES to 0 in platform.h.</p>

<p>I have fixed a bug in lexer.c which screwed up the removal of doctype elements. This bug was associated with the symptom of printing an indefinite number of doctype elements.</p>

<h2>August 1999</h2>

<p>Added lowsrc and bgproperties attributes to attribute table. Rob Clark tells me that bgproperties="fixed" on the body element causes NS and IE to fix the background relative to the window rather than the document's content.</p>

<p>Terry Teague kindly drew my attention to several bugs discovered by other people: My thanks to Randy Waki for discovering a bug when an unexpected inline end-tag is found in a ul or ol element. I have added new code to ParseList in parser.c to pop the inline stack and discard the end tag. I am checking to see whether a similar problem occurs elsewhere. Randy also discovered a bug (now fixed) in TrimInitialSpace() in parser.c which caused it to fail when the element was the first in the content. John Cumming found that comments cause problems in table row group elements such as tbody. I have fixed this oversight in this release.</p>

<p>Bjoern Hoehrmann tells me that bgsound is only allowed in the head and not in the body, according to the Microsoft documentation. I have therefore updated the entry in tags.c.
The slide generation feature caused an exception when the original document didn't include a document type declaration. The fix involved setting the link to the parent node when creating the doctype node.</p>

<h2>26th July 1999</h2>

<p>Jussi Vestman reported a bug in FixDocType in lexer.c which caused tidy to corrupt the parse tree, leading to an infinite loop. I independently spotted this and fixed it. Justin Farnsworth spotted that Tidy wasn't handling XML processing instructions which end in ?> rather than just > as specified by SGML. I have added a new option, assume-xml-procins, which when set to yes expects the XML style of processing instruction. It defaults to no, but is automatically set to yes for XML input. Justin notes that the XML PIs are used for a server preprocessor format called PHP, which will now be easy to handle with Tidy. Richard Allsebrook's mail prompted me to make sure that the contents of processing instructions are treated as CDATA so that < and > etc. are passed through unescaped.</p>

<p>Bill Sowers asks for Tidy to support another server preprocessor format called Tango which features syntax such as:</p>

<pre>
<b><@include <@cgi><appfilepath>includes/message.html></b>
</pre>

<p>I don't have time to add support for Tango in this release, but would be happy if someone else were to mail in appropriate changes. Darrell Bircsak reports problems when using DOS on Win98. I am using Win95 and have been unable to reproduce the problem. Jelks Cabaniss notes that Tidy doesn't support XML document type subset declarations. This is a documented shortcoming and needs to be fixed in the not too distant future. Tidy focuses on HTML, so this hasn't been a priority to date.</p>

<p>Jussi Vestman asks for an optional feature for mapping IP addresses to DNS hostnames and back again in URLs.
Sadly, I don't expect to be able to do this for quite a while. Adding network support to Tidy would also allow it to check for bad URLs.</p>

<p>Ryan Youck reports that Tidy's behavior when finding a ul element when it expects an li start tag doesn't match Netscape or IE. I have confirmed this and have changed the code for parsing lists to append misplaced lists to the end of the previous list item. If a new list is found in place of the first list item, I now place it into a blockquote and move it before the start of the current list, so as to preserve the intended rendering.</p>

<p>I have added a new option, enclose-text, which encloses any text it finds at the body level within p elements. This is very useful for curing problems with the margins when applying style sheets.</p>

<h2>9th July 1999</h2>

<p>Added bgsound to tags.c. Added '_' to definition of namechars to match html4.decl. My thanks to Craig Horman for spotting this.</p>

<p>Jelks Cabaniss asked for the clean option to be automatically set when the drop-font-tags option is set. Jelks also notes that a lot of the authoring tools automatically generate, for example, <I> and <B> in place of <em> and <strong> (MS FrontPage 98 generated the latter, but FP2000 has reverted to the former - with no option to change or set it). Jelks suggested adding a general tag substitution mechanism. As a simpler measure for now, I have added a new property called logical-emphasis to the config file for replacing i by em and b by strong.</p>

<h2>7th July 1999</h2>

<p>Fixed recent bug with escaping ampersands and plugged memory leaks following Terry Teague's suggestions. Changed IsValidAttrName() in lexer.c to test for namechars to allow - and : in names.</p>

<h2>2nd July 1999</h2>

<p>Chami noticed that the definition for the marquee tag was wrong.
I have fixed the entry in tags.c and Tidy now works fine on the example he sent. To support mixing MathML with HTML I have added a new config option for declaring empty inline tags, "new-empty-tags". Philip Riebold noted that single quote marks were being silently dropped unless quote-marks was set to yes. This is an unfortunate bug recently introduced and now fixed.</p>

<p>Paul Smith sent in an example of badly formed tables, where paragraph elements occurred in table rows without enclosing table cells. Tidy was handling this by inserting a table cell. After comparison with Netscape and IE, I have revised the code for parsing table rows to move unexpected content to just before the table.</p>

<h2>26th June 1999</h2>

<p>Tony Leneis reports that Tidy incorrectly thinks the table frame attribute is a transitional feature. Now fixed. Chami reported a bug in ParseIndent in config.c and that onsubmit is missing from the table of attributes. Both now fixed. Carsten Allefeld reports that Tidy doesn't know that the valign attribute was introduced in HTML 3.2 and is ok in HTML 4.0 strict, necessitating a trivial change to attrs.c.</p>

<p>Axel Kielhorn notes that Tidy wasn't checking that the preamble for the DOCTYPE tag matches either "html PUBLIC" or "html SYSTEM". Bill Homer spotted changes needed for Tidy to compile with SGI MIPSpro C++. All of Bill's changes have been incorporated, except for the include file "unistd.h" (for the unlink call) which isn't available on win32. To include this, define NEEDS_UNISTD_H.</p>

<p>Bjoern Hoehrmann asked for information on how to use the result returned by Tidy when it exits. I have included an example using Perl that Bjoern sent in. Bodo Eing reported that Tidy gave a misleading warning when title text is emphasized.
It now reports a missing </title> before any unexpected markup.</p>

<p>Bruce Aron says that many WYSIWYG HTML editors place a font element around a hypertext link, enclosing the anchor element rather than its contents. Unfortunately, the anchor element then overrides the color change specified by the font element! I have added an extra rule to ParseInline to move the font element inside an anchor when the anchor is the only child of the font element. Note CSS is a better long term solution, and Tidy can be used to replace font elements by style rules using the clean option.</p>

<p>Carsten Allefeld reported that valign on table cells caused Tidy to mislabel content as HTML 4.0 transitional rather than strict. Now fixed. A number of people said they expected the quote-mark option to apply to all text and not just to attribute values. I have obliged and changed the option accordingly.</p>

<p>Some people have wondered why "</" causes an error when present within scripts. The reason is that this substring is not permitted by the SGML and XML standards. Tidy now fixes this by inserting a backslash, changing the substring to "<\/". Note this is only done for JavaScript and not for other scripting languages.</p>

<p>Chami reported that onsubmit wasn't recognized by Tidy - now fixed. Chris Nappin drew my attention to the fact that script string literals in attributes weren't being wrapped correctly when QuoteMarks was set to no. Now fixed. Christian Zuckschwerdt asked for support for the POSIX long options format e.g. --help. I have modified tidy.c to support this for all the long options. I have kept support for -help and -clean etc.</p>

<p>Craig Horman sent in a routine for checking that attribute names don't contain invalid characters, such as commas. I have used this to avoid spurious attribute/value pairs when a quote mark is misplaced.
Darren Forcier is interested in wrapping Tidy up as a Win32 DLL. Darren asked for Tidy to release its memory resources for the various tables on exit. Now done, see DeInitTidy() in tidy.c.</p>

<p>Darren also asks about the config file mechanism for declaring additional tags, e.g. <b>new-blocklevel-tags: cfoutput, cfquery</b> for use with Cold Fusion. You can add inline and blocklevel elements but as yet you can't add empty elements (similar to br or hr) or change the content model for the table, ul, ol and dl elements. Note that the indent option applies to new elements in the same way as it does for built-in elements. Tidy will accept the following:</p>

<pre>
<cfquery name="MyQuery" datasource="Customer">
  select CustomerName from foo where x > 1
</cfquery>

<cfoutput query="MyQuery">
  <table>
    <tr>
      <td>#CustomerName#</TD>
    </tr>
  </table>
</cfoutput>
</pre>

<p>but the next example <b>won't</b> since you can't as yet modify the content model for the table element:</p>

<pre>
<cfquery name="MyQuery" datasource="Customer">
  select CustomerName from foo where x > 1
</cfquery>

<table>
  <cfoutput query="MyQuery">
    <tr>
      <td>#CustomerName#</TD>
    </tr>
  </cfoutput>
</table>
</pre>

<p>I have been studying richer ways to support modular extensions to HTML using assertions and a generalization of regular expressions to trees. This work has led to a tool for generating DTDs named <b>dtdgen</b> and I am in the process of creating a further tool for verification. More information is available in my note on <a
href="http://www.w3.org/People/Raggett/dtdgen/Docs">Assertion Grammars</a>. Please contact me if you are interested in helping with this work.</p>

<p>David Fallon is interested in using Tidy to dynamically repair markup in an HTML editor as people type.
My recommendation is to take advantage of the tables in tags.c and attrs.c for this, and to defer application of the full range of heuristics until the document is saved to disk or a cleanup is explicitly requested. The CM_OPT property in the tags table indicates that the end tag is optional, while CM_EMPTY indicates that an element is <i>empty</i>, i.e. has no content.</p>

<p>Betsy Miller reports: <i>I tried printing the HTML Tidy page for a class I am teaching tomorrow on HTML, and everything in the "green" style (all of the examples) print in the smallest font I have ever seen (in fact they look like tiny little horizontal lines). Any explanation?</i></p>

<p>Yes. This is a problem with Internet Explorer and Style Sheets. The Tidy page includes a CSS style sheet that tries to set the font used for the examples to 80% of the size used for normal text. Internet Explorer gets this wrong, picking a very much smaller font. I am hoping this bug is fixed in the IE 5.0 release. I have changed the style sheet to work around this.</p>

<p>Francisco Guardiola writes that Tidy wasn't fixing frameset documents with body elements unenclosed in noframes elements. Now fixed. Frederik Fouvry found that comments after the html end tag generated a warning for content after body. I can't reproduce this symptom and assume it was fixed in an earlier release.</p>

<p>Indrek Toom wants to know how to format tables so that tr elements indent their content, but td tags do not. The solution is to use <i>indent: auto</i>. Jelks Cabaniss noted that the clean option created style rules with tag names in uppercase, which would cause problems for Extensible HTML (XHTML). This prompted me to overhaul Tidy to switch to lower case for the tag tables and literals. I have adopted Jelks' suggestion for adding support for a doctype property in config files.
This supports <em>omit, auto, strict, loose</em> or a string specifying the FPI (formal public identifier).</p>

<p>Johannes Koch notes that Tidy doesn't fix up the doctype correctly when bursting to slides. He says that if a document contains the HTML 4.0 strict doctype declaration, then the slides also include the same strict doctype declaration, but also contain the center tag which does not appear in the strict DTD. I have applied a simple work around, which is to remove the original doctype when bursting to slides.</p>

<p>I have extended the support for the ASP preprocessing syntax to cope with the use of ASP within tags for attributes. I have also added a new option <tt>wrap-asp</tt> to the config file support to allow you to turn off wrapping within ASP code. Thanks to Ken Cox for this idea.</p>

<p>Larry Virden asked for a compile-time option for setting the config file. He says: "The reason it would be useful is to be able to define a set of commonly used additional tags. For instance, our site is starting to use a lot of ColdFusion. I would love to be able to put the CF tags into a site wide file so that users of tidy automatically get them defined". You can now do this by defining CONFIG_FILE in platform.h.</p>

<p>Loïc Trégan asks: Is there a way to generate a "light" xml, with no "<!DOCTYPE...>" and "xmlns=..."? I have tweaked the code to allow the doctype property to apply when outputting XML, and added a new property "add-xml-pi" to control whether an <?xml?> processing instruction is added or not. To generate a minimal XML document, you can set the xml-out property to yes, the doctype and add-xml-pi property to no.</p>

<p>Marc Jauvin has been using Windows applications to generate Web pages and found that some of them generate very "non-portable" HTML.
One of the problems that is often introduced is the use of "\" in URLs instead of "/" which confuses Unix Web servers. To deal with this I have introduced the "fix-backslash" property. This has been set by default to yes, but can be set to no if that causes problems.</p>

<p>The new property <tt>indent-attributes</tt> when set to yes places each attribute on a new line. Note that the attributes are only indented one space. Paul Ossenbruggen asked for something slightly different, where the second and subsequent attributes start on a new line and are indented to line up under the first attribute. That proved to involve rather more work to implement than I have time for right now. I plan to work some more on this for a future release.</p>

<p>Peter Jeremy reported that when an error file is specified to tidy (-f file), the error file is opened for every HTML file specified on the command line, but not closed until all HTML files have been processed. If a large number of files are specified on the command line (e.g. processing the FreeBSD handbook), this can overflow the process or system file descriptor table. I have now fixed this so that the error file is only opened once.</p>

<p>Rafi Stern notes: I have entered output-xml: yes in my config file, not output-xhtml. Tidy second guesses me and adds the xmlns attribute for XHTML at the head of my file, which I then have to remove as this interferes with my XSLT parser. Fixed along with the other bugs reported by Rafi.</p>

<p>Steffen Ullrich and Andy Quick both spotted a problem with attribute values consisting of an empty string, e.g. <tt>alt=""</tt>. This was caused by bugs in tidy.c and in lexer.c, both now fixed. Jussi Vestman noted Tidy had problems with hr elements within headings. This appears to be an old bug that came back to life! Now fixed.
Jussi also asked for a config file option for fixing URLs where non-conforming tools have used backslash instead of forward slash.</p>

<p>An example from Thomas Wolff led me to the idea of inserting the appropriate container elements for naked list items when these appear in block level elements. At the same time I have fixed a bug in the table code to infer implicit table rows for text occurring within row group elements such as thead and tbody. An example sent in by Steve Lee allowed me to pinpoint an endless loop when a head or body element is unexpectedly found in a table cell.</p>

<h2>15th April 1999</h2>

<p>Another minor release. Jacob Sparre Andersen reports a bug with &quot; in attribute values. Now fixed. Francisco Guardiola reports problems when a body element follows the frameset end tag. I have fixed this with a patch to ParseHTML, ParseNoFrames and ParseFrameset in parser.c. Chris Nappin wrote in with the suggestion for a config file option for enabling wrapping script attributes within embedded string literals. You can now do this using "wrap-script-strings: yes".</p>

<h2>14th April 1999</h2>

<p>Added check for ASP tags on line 2674 in parser.c so that ASP tags are not forcibly moved inside an HTML element. My thanks to Stuart Updegrave for this. Fixed problem with & entities. Bede McCall spotted that &amp; was being written out as &amp;amp;. The fix alters ParseEntity() in lexer.c.</p>

<h2>12th April 1999</h2>

<p>Added a missing "else" on line 241 in config.c (thanks to Keith Blakemore-Noble for spotting this).
Added config.c and config.o to the Makefile (an oversight in the release on the 8th April).</p>

<h2>8th April 1999</h2>

<h4>Localization:</h4>

<p>All the message text is now defined in localize.c which should make it a tad easier to localize Tidy for different languages.</p>

<h4>Config file support:</h4>

<p>I have added support for configuring tidy via a configuration file. The new code is in config.h which provides a table driven parser for RFC822 style headers. The new command line option -config <filename> can be used to identify the config file. The environment variable "HTML_TIDY" may be used to name the config file. If defined, it is parsed before scanning the command line. You are advised to use an absolute path for the variable to avoid problems when running tidy in different directories.</p>

<h4>Allan Kuchinsky:</h4>

<p>Reports that the XML DOM parser by Eduard Derksen screws up on &nbsp;, naked & and % in URLs as well as having problems with newlines after the '=' before attribute values.</p>

<p>I have tweaked PrintChar when generating XML to output &#160; in place of &nbsp; and &amp; in place of &. In general XHTML when parsed as well-formed XML shouldn't use named entities other than those defined in XML 1.0. Note that this isn't a problem if the parser uses the XHTML DTDs which import the entity definitions.</p>

<h4>Allan Odgaard:</h4>

<p>When tidy encounters entities without a terminating semi-colon (e.g. "&copy") then it correctly outputs "&copy;", but it doesn't report an error.</p>

<p>I have added a ReportEntityError procedure to localize.c and updated ParseEntity to call this for missing semicolons and unknown entities.</p>

<h4>Andreas Buchholz:</h4>

<p>Tidy warns if the summary attribute is missing from a table element.
This is incorrect for HTML 3.2 which doesn't define this attribute.</p>

<p>The summary attribute was introduced in HTML 4.0 as an aid for accessibility. I have modified CheckTABLE to suppress the warning when the document type explicitly designates the document as being HTML 2.0 or HTML 3.2.</p>

<h4>Andy Brown:</h4>

<p>I have renamed the field from class to tag_class as "class" is a reserved word in C++ with the goal of allowing tidy to be compiled as C++ e.g. when part of a larger program.</p>

<p>I have switched to Bool and the values yes and no to avoid problems with detecting which compilers define bool and those that don't.</p>

<p>Andy would prefer a return code or C++ exception rather than an exit. I have removed the calls to exit from pprint.c and used a long jump from FatalError() back to main() followed by returning 2. It should be easy to adapt this to generate a C++ exception.</p>

<p>Sometimes the prev links are inconsistent with next links. I have fixed some tree operations which might have caused this. Let me know if any inconsistencies remain.</p>

<h4>Ann Navarro:</h4>

<p>Would like to be able to use:</p>

<pre>
  tidy file.html | more
</pre>

<p>to pause the screen output, and/or full output passing to file as with</p>

<pre>
  tidy file.html > output.txt
</pre>

<p>Tidy writes markup to stdout and errors to stderr. 'More' only works for stdout so that the errors fly by. My compromise is to write errors to stdout when the markup is suppressed using the command line option -e or "markup: no" in the config file.</p>

<h4>html-kit@chamisplace.com</h4>

<p>Writes asking for a single output routine for Tidy.
Acting on his suggestion, I have added a new routine tidy_out() which should make it easier to embed HTML Tidy in a GUI application such as HTML-Kit. The new routine is in localize.c. All input takes place via ReadCharFromStream() in tidy.c, excepting command line arguments and the new config file mechanism.</p>

<p>Chami also asks for single routines for initializing and de-initializing Tidy, something that happens often from the GUI environment of HTML-Kit. I have added InitTidy() and DeInitTidy() in tidy.c to try to satisfy this need. Chami now supports an online interface for Tidy at the URL:</p>

<pre>
  <a
href="http://www.chamisplace.com/asp/hk.asp">http://www.chamisplace.com/asp/hk.asp</a>
</pre>

<p>He further asks for Tidy to optionally output a length parameter whenever possible. This could represent the length of the element, attribute or code block related to the error. An online validator could then highlight the starting and ending columns which may be easier for beginners to understand, rather than pointing to a single character column. I will investigate this for a future release.</p>

<h4>Chang Hyun Baek:</h4>

<p>Reports a problem when generating XML using -iso2022. Tidy inserts ?/p< rather than </p>. I tried Chang's test file but it worked fine, with </p> in all the right places. Please let me know if this problem persists.</p>

<h4>Christian Ruetgers:</h4>

<p>When using -indent option Tidy emits a newline before which alters the layout of some tables.</p>

<p>I note that browsers aren't conforming to the SGML spec on generally ignoring a newline immediately after start tags and immediately before end tags. Netscape does this for pre elements but not for other tags!
My work around is to avoid additional newlines for the content of th and td elements, except where their content starts with a block level element. This kind of thing is getting really hairy!</p>

<h4>Christian Pantel:</h4>

<p>Would like the servlet tag added to tidy. This looks very similar to applet and is used for preprocessing document content before delivery. Servlet acts as a container for param elements and fallback content to be shown if the server doesn't support servlet. I have added it as a proprietary tag and parse it in the same way as applet.</p>

<p>Christian also reports that <td><hr/></td> caused Tidy to discard the <hr/> element. I have fixed the associated bug in ParseBlock.</p>

<h4>Chuck Baslock:</h4>

<p>Points out that an isolated & is converted to &amp; in element content and in attribute values. This is in fact correct and in agreement with the recommendations for HTML 2.0 onwards.</p>

<h4>Craig Horman:</h4>

<p>Reports that Tidy loops indefinitely if a naked LI is found in a table cell. I have patched ParseBlock to fix this, and now successfully deal with naked list items appearing in table cells, clothing them in a ul.</p>

<h4>Craig Johnson:</h4>

<p>Reports that Tidy gets confused by </comment> before the doctype. This is apparently inserted by some authoring tool or other. I have patched Tidy to safely recover from the unrecognized and unexpected end tag without moving the parse state into the head or body.</p>

<h4>Daniel Vogelheim:</h4>

<p>Asks for Tidy to recognize obsolete elements such as LISTING and to replace them by more modern equivalents, in this case pre. I have added code to issue a warning and replace such elements as xmp, listing, plaintext by pre, and dir and menu by ul. Daniel also asks for a means of suppressing warnings, i.e.
to only report errors. I have added the boolean "show-warnings" to the config file support to deal with this and split off warnings to ReportWarnings().</p>

<h4>Dan Rudman:</h4>

<p>Would love a version of Tidy written in Java. This is a big job. I am working on a completely new implementation of Tidy, this time using an object-oriented approach but I don't expect to have this done until later this year. <b>DEFERRED</b></p>

<h4>David Brooke:</h4>

<p>Reports that when tidying an XML file with characters above 127 Tidy is outputting the numeric entity followed by the character. I have fixed this by a patch to PPrintChar() for XmlTags.</p>

<h4>David Getchell:</h4>

<p>Reports that Tidy thinks an ol list is HTML 4.0 when you use the type attribute. I have fixed an error in attrs.c to correct this feature to first appearing in HTML 3.2.</p>

<h4>Drew Adams:</h4>

<p>Reported problems when using comments to hide the contents of script elements from ancient browsers. I wasn't able to reproduce the problem, and guess I fixed it earlier.</p>

<p>Drew also reported a problem which on further investigation is caused by the very weird syntax for comments in SGML and XML. The syntax for comments is really error prone:</p>

<pre>
  <!--[text excluding --]--[[whitespace]*--[text excluding --]--]*>
</pre>

<p>This means that <!----> is a complete comment but <!------> is not since the parser is expecting a matching terminating -- and as it doesn't find the -- it ploughs on and on treating the rest of the markup as a comment unless it finds another end comment. I have added a rule of thumb (a heuristic) for detecting this situation.
Basically I count the number of comment groups without other characters and if the count is > 2 and a '>' is seen, a warning is generated.</p>

<p>Drew goes on to comment on the -clean option. This made me take another look at the relative font sizes I am using for the absolute font sizes for 0 through 6. I have tweaked them to get a reasonable match before/after applying -clean as viewed on NS4 and IE4. Font size=3 is taken as the normal body font size and as such the font element is silently dropped unless it also defines a color.</p>

<p>I have also added InlineStyle to deal with the cases where an inline element has as its only child a font element. A further possibility would be to promote style properties common to all children of an element to the element. I will have to leave this for future work.</p>

<p>Drew asks why </ is not allowed in script content. The answer is that SGML treats </ as delimiting the end of CDATA element content, so that it ends prematurely before the </script> end tag. Browsers tend not to follow the SGML standard in this respect, but Tidy is designed to help you do so.</p>

<h4>Guus Goos:</h4>

<p>Notes that tidy *.html doesn't work under DOS. This is because DOS unlike Unix doesn't expand names with wildcards to the list of matching file names. This is a right nuisance and one more reason why Linux is gaining popularity. I plan to provide a work around in a future release of Tidy. Are there any free drop-in replacements for the DOS shell that fix this problem?</p>

<h4>Jack Horsfield:</h4>

<p>Like a number of others would like list items and table cells to be output compactly where possible.
I have added a flag to
tags.c that avoids further
indentation when the content is inline, e.g.</p>

<pre>
 &lt;ul&gt;
   &lt;li&gt;some text&lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;
       a new paragraph
     &lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
</pre>

<p>This behavior is enabled via "smart-indent: yes" and overrides
"indent: no". Use "indent-spaces: 5" to set the number of spaces
used for each level of indentation.</p>

<h4>Jeff Young:</h4>

<p>Has a few suggestions that will make Tidy work with XSL.
Thanks, I have incorporated all of them into the new release.</p>

<h4>Jelks Cabaniss:</h4>

<p>Reports that Tidy thinks the end tag is missing if the
script element has no content. I have patched ParseScript to fix
this. Jelks also asks for a way to ask Tidy to hide the contents
of script and style elements; a way to avoid promoting inline
styles with -clean to style rules as a work around for a bug in
IE for URLs with relative URLs; finally, a way to avoid empty
elements being discarded, especially if they define an ID for
scripting. Very reasonable, but I would prefer to leave these to a
future release. (This release is big enough right now!).</p>

<p>One thing I can satisfy right away is a mailing list for Tidy.
html-tidy@w3.org has been created for discussing Tidy and I have
placed the details for subscribing and accessing the Web archive
on the Tidy overview page.</p>

<h4>Johannes Koch:</h4>

<p>Reports that Tidy isn't quite right about when it reports the
doctype as inconsistent or not. I have tweaked HTMLVersion() to
fix this. Let me know if any further problems arise.</p>

<h4>John Tobler:</h4>

<p>Wants to know how to get Tidy to preserve his explicit
entities e.g. &quot; and &nbsp;.
Currently Tidy interprets all
entities as character values and as such has no way to
distinguish whether these were derived from entities or not. To
help John, with this release you can use "quote-marks: yes" in the
config file if you want all " marks to appear as &quot; and
"quote-nbsp: yes" if you want non-breaking spaces to be shown as
entities. Note that for XML in general &nbsp; is not predeclared,
so you should also use "numeric-entities: yes". This doesn't
apply to XHTML though.</p>

<p>John also reports that the weirdly complex URLs using the
javascript: scheme as used by www.bookmarklets.com can cause Tidy
indigestion. I have made Tidy aware of which attributes are using
Javascript and disabled the missing quote mark heuristic for
these. I have also tweaked the way unknown entities are reported
to say that the markup may contain unescaped ampersands.</p>

<h4>Mathew Cepl:</h4>

<p>Notes that dir and menu are deprecated and not allowed in
HTML4 strict. I have updated the entry in the tags table for
these two. I also now coerce them automatically to ul when -clean
is set.</p>

<h4>Maurice Buxton:</h4>

<p>Reports that some implementations of gcc don't work with the
current compiler directive Tidy uses to avoid duplicate typedefs
for uint and ulong. I don't have a truly platform independent
solution for this, so you may need to edit platform.h if the code
doesn't compile out of the box on your platform.</p>

<h4>Osma Ahvenlampi:</h4>

<p>Found that Tidy is confused by map elements in the head. Tidy
knows that map is only allowed in the body and thinks the author
has left out the &lt;body&gt; start tag. Thereafter elements which
it knows only belong in
the head are moved to the head, so things should work out ok.
Osma also reports having difficulties with non-breaking spaces,
but I was unable to reproduce these with the new release of Tidy,
so perhaps the problems have been fixed.</p>

<h4>Paul Ward:</h4>

<p>Reports that Tidy caused JavaScript errors when it introduced
linebreaks in JavaScript attributes. Tidy goes to some effort to
avoid this and I am interested in any reports of further problems
with the new release.</p>

<h4>Rafi Stern:</h4>

<p>Would like Tidy to warn when a tag has an extra quote mark, as
in &lt;a href="xxxxxx""&gt;. I have patched ParseAttribute to do
this.</p>

<h4>Rene Fritz:</h4>

<p>Reported a space being inserted at the end of lines when the
text is wrapped at the start of hypertext links. This isn't
occurring with this release, so I guess the problem was solved a
while back. Rene also suggests that Tidy could be used to add and
remove metadata and attributes etc. for a group of files, e.g. to
add a link to a style sheet or to assert attribution. This sounds
like a good idea for work in the future.</p>

<h4>Shane McCarron:</h4>

<p>Reports that Tidy sometimes wraps text within markup that
occurs in the context of a pre element. I am only able to repeat
this when the markup wraps within start tags, e.g. between
attribute values. This is perfectly legitimate and doesn't affect
rendering.</p>

<h4>Steven Lobo:</h4>

<p>Notes that Tidy doesn't remove entities such as &amp;nbsp; or
&amp;copy; which aren't defined by XML 1.0. That is true - these
entities <b>are</b> fine if you are using XHTML. If you want to
generate generic XML then you need to use the -n option or to set
"numeric-entities: yes" in the config file.
This will then output
all such entities in their numeric form or as direct character
values according to the character encoding flags.</p>

<h4>Steven Pemberton:</h4>

<p>Comments that he would like Tidy to replace naked &amp; in
URLs by &amp;amp;. You can now use "quote-ampersands: yes" in the
config file to ensure this. Note that this is always done when
outputting to XML where naked '&amp;' characters are illegal.</p>

<p>Steven also asks for a way to allow Tidy to proceed after
finding unknown elements. The issue is how to parse them, e.g. to
treat them as inline or block level elements? The latter would
terminate the current paragraph whereas the former would not.</p>

<p>If treated as inline, presumably, unknown tags should be
treated specially, for instance, normal inline end tags close the
currently open inline element, but this doesn't feel right for
unknown tags. What should the content model for unknown tags be -
flow? Again it's far from obvious. One way to avoid these
difficulties would be to provide a means for authors to declare
unknown tags in the config file.</p>

<p>You can now declare new inline and block-level tags in the
config file, e.g.:</p>

<pre>
define-inline-tags: foo, bar
define-blocklevel-tags: blob
</pre>

<p>The content model for new tags allows for block or inline
content. Steven further comments that some authors use ul without
an li to indent content. Tidy currently coerces these to wrap the
content within an li which alters the rendering. He suggests
using blockquote instead. I have done this, and if you use the
-clean option at the same time, it gets replaced by a div element
with a class and style rule for indenting the content.</p>

<h4>Stuart Updegrave:</h4>

<p>Would like to be able to coerce attributes to uppercase.
I
have added support for "uppercase-attributes: yes" for this.
Stuart also asks for Tidy to support Microsoft's ASP tags. These
are part of Microsoft's server-side scripting model (similar to
CGI). I have treated ASP tags in the same way as processing
instructions, and they don't affect the version of HTML as they
are assumed to have been interpreted before delivery to the
client.</p>

<p>Stuart is also interested in having Tidy reading from and
writing back to the Windows clipboard. This sounds interesting
but I have to leave this to a future release.</p>

<h4>Terry Cassidy:</h4>

<p>Points out that Tidy doesn't like "top" or "bottom" for the
align attribute on the caption element. I have added a new
routine to check the align attribute for the caption element and
cleaned up the code for checking the document type.</p>

<h4>Xavier Plantefeve:</h4>

<p>Suggests that I should ensure that the options are self
consistent, e.g. if -asxml is set, then this should imply lower
case and override any instruction to omit optional end tags.
Accordingly, I have introduced a new routine AdjustConfig() that
is applied after reading the command line and config files and
before tidying any files.</p>

<p>Xavier wonders whether name attributes should be replaced or
supplemented by id attributes when translating HTML anchors to
XHTML. This is something I am thinking about for a future release
along with supplementing lang attributes by xml:lang
attributes.</p>

<h4>Zdenek Kabelac:</h4>

<p>Asks for headings and paragraphs to be treated specially when
other tags are indented. I have dealt with this via the new
smart-indent mechanism.</p>

<h2>22nd February 1999</h2>

<p>Tidy can now fix up XML empty tags for which the attribute
values are unquoted, e.g. &lt;br clear=all/&gt;.
Care is taken to
avoid this being applied to tags with URLs, e.g.
&lt;a href=http://acme.com/&gt; where the / is part of the attribute
value and doesn't signify an empty tag. Authors are advised to
always quote attribute values to avoid such problems!</p>

<h2>22nd January 1999</h2>

<p>Tidy no longer complains about a missing &lt;/tr&gt; before a
&lt;tbody&gt;. Added link to a free <a
href="http://www.chami.com/free/html-kit/">win32 GUI for
tidy</a>.</p>

<h2>11th January 1999</h2>

<p>Added a link to the OS/2 distribution of Tidy made available
by Kaz SHiMZ. No changes to Tidy's source code.</p>

<h2>7th January 1999</h2>

<p>Fixed bug in ParseBlock that resulted in nested table
cells.</p>

<p>Fixed clean.c to add the style property "text-align:" rather
than "align:".</p>

<p>Disabled line wrapping within HTML alt, content and value
attribute values. Wrapping will still occur when output as
XML.</p>

<h2>16th December 1998</h2>

<p>This release fixes a problem with missing quotemarks in
attribute values introduced in the December 14th release. It also
fixes problems with parsing tables when the table cells include
naked list items and when unexpected end tags are encountered for
td and tr cells. Warnings are now generated for unknown entities
(those not defined by HTML 4.0). It may be worth thinking about a
new option to determine how to handle these, especially for
XML.</p>

<h2>14th December 1998</h2>

<p>Rewrote parser for elements with CDATA content to fix problems
with tags in script content.</p>

<p>New pretty printer for XML mode. I have also modified the XML
parser to recognize xml:space attributes appropriately.
I have
yet to add support for CDATA marked sections though.</p>

<p>script and noscript are now allowed in inline content.</p>

<p>To make it easier to drive tidy from scripts, it now returns 2
if any errors are found, 1 if any warnings are found, otherwise
it returns 0. Note tidy doesn't generate the cleaned up markup if
it finds errors other than warnings.</p>

<p>Fixed bug causing the column to be reported incorrectly when
there are inline tags early on the same line.</p>

<p>Added -numeric option to force character entities to be
written as numeric rather than as named character entities.
Hexadecimal character entities are never generated since Netscape
4 doesn't support them.</p>

<p>Entities which aren't part of HTML 4.0 are now passed through
unchanged, e.g. &amp;precompiler-entity; This means that an
isolated &amp; will be passed through unchanged since there is no
way to distinguish this from an unknown entity.</p>

<p>Tidy now detects malformed comments, where something other
than whitespace or '--' is found when '&gt;' is expected at the
end of a comment.</p>

<p>The &lt;br&gt; tags are now positioned at the start of a blank
line to make their presence easier to spot.</p>

<p>The -asxml mode now inserts the appropriate Voyager html
namespace on the html element and strips the doctype.
The html
namespace will be usable for rigorous validation as soon as W3C
finishes work on formalizing the definition of document profiles,
see: <a
href="http://www.w3.org/TR/WD-html-in-xml/">WD-html-in-xml</a>.</p>

<h2>13th November 1998 and earlier releases</h2>

<p>Fixed bug wherein &lt;style type=text/css&gt; was written
out as &lt;style type="text/ss"&gt;.</p>

<p>Tidy now handles wrapping of attributes containing JavaScript
text strings, inserting the line continuation marker as needed,
for instance:</p>

<pre>
onmouseover="window.status='Mission Statement, \
Our goals and why they matter.'; return true"
</pre>

<p>You can now set the wrap margin with the -wrap option.</p>

<p>When the output is XML, tidy now ensures the content starts
with &lt;?xml version="1.0"?&gt;.</p>

<p>The Document type for HTML 2.0 is now
"-//IETF//DTD HTML 2.0//". In previous versions of tidy, it was
incorrectly set to "-//W3C//DTD HTML 2.0//".</p>

<p>When using the -clean option isolated FONT elements are now
mapped to SPAN elements. Previously these FONT elements were
simply dropped.</p>

<p>NOFRAMES now works fine with BODY element in frameset
documents.</p>
</body>
</html>