UnixReview: HTML Tidy tool

Warren Togami warren at togami.com
Wed Sep 19 00:31:44 PDT 2001


http://unixreview.com/articles/2001/0109/0109e/0109e.htm

(excerpt from the article)

One thing I love about the UNIX philosophy is the idea that each program
should do one job and do it really well. There are zillions of small tools
for UNIX-type OSes that make life much easier and are hugely useful, but
they don't necessarily get written about. They certainly don't receive the
same kind of coverage that Apache and Sendmail receive. One of my favorites,
HTML Tidy, is a tool for HTML/Web development that I think will interest a
lot of folks. HTML Tidy cleans up HTML produced by WYSIWYG editors and such.

Keep It Tidy

Webmasters are often asked to look after HTML documents that were produced
by other folks using WYSIWYG editors or word processors like Microsoft Word.
Although the use of such editors and word processors simplifies production
for many people, the HTML produced by these programs is usually pretty ugly.
It might be acceptable in a browser, but it's downright unpleasant to
maintain.

In my case, I use DocBook for a lot of documents and SGML&Tools to render
HTML from DocBook markup and then tweak it slightly. Although the HTML looks
just fine in a browser, it's kind of ugly when you open it up in Vim.
Running the output through HTML Tidy makes it easier to read and leaves you
with "valid" HTML 4.01.

Here's a short before and after example of HTML Tidy's handiwork:

      Before
<HTML
><HEAD
><TITLE
>DissociatedPress.net</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.49"><LINK
REL="NEXT"
TITLE="Projects"
HREF="x6.html"></HEAD
><BODY
CLASS="ARTICLE"
><DIV
CLASS="ARTICLE"
><DIV
CLASS="TITLEPAGE"
><H1
CLASS="TITLE"
><A
NAME="AEN1"
>DissociatedPress.net</A
></H1
><HR></DIV
><P
><B
>DissociatedPress.net</B
></P
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AEN3"
>Site Guide</A
></H1
>...</BODY
></HTML
>



      After
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title>DissociatedPress.net</title>
<meta name="GENERATOR" content=
"Modular DocBook HTML Stylesheet Version 1.49">
<link rel="NEXT" title="Projects" href="x6.html">
<style type="text/css">
 hr.c2 {text-align: left}
 p.c1 {font-weight: bold}
</style>
</head>
<body class="ARTICLE">
<div class="ARTICLE">
<div class="TITLEPAGE">
<h1 class="TITLE"><a name="AEN1">DissociatedPress.net</a></h1>
...
</body>
</html>



Cleaning up after WYSIWYG programs and conversion tools isn't the only thing
HTML Tidy is good for, of course. Tidy will also validate your files and
correct them, if necessary. You can also tell Tidy to simply check for
errors without making any corrections.
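
As a rough illustration of the two modes described above (correcting a file
versus only checking it), here is a minimal Python sketch that builds the
corresponding tidy command lines. It assumes the tidy executable is on your
PATH. The helper names tidy_command and run_tidy are hypothetical and do not
come from the article. The flags -m (modify the file in place) and -e (report
errors and warnings only) are taken from Tidy's command-line help, so confirm
them with "tidy -help" on your version.

```python
import shutil
import subprocess


def tidy_command(path, check_only=False):
    """Build an argument list for the tidy command-line program.

    Hypothetical helper for illustration; not part of HTML Tidy itself.
    """
    if check_only:
        # -e (-errors): report errors and warnings, emit no cleaned markup
        return ["tidy", "-e", path]
    # -m (-modify): rewrite the input file in place with the cleaned markup
    return ["tidy", "-m", path]


def run_tidy(path, check_only=False):
    """Run tidy if it is installed.

    Returns tidy's exit status, or None when no tidy executable is found.
    """
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(tidy_command(path, check_only))
    return result.returncode


if __name__ == "__main__":
    # Show the commands that would be run, without touching any file.
    print(" ".join(tidy_command("index.html", check_only=True)))
    print(" ".join(tidy_command("index.html")))
```

Keeping the command construction separate from the call to subprocess makes
it easy to inspect what would be run before letting Tidy rewrite a file in
place. Since -m overwrites the original, working on a copy is the safer
habit.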


