HTML::Clean(3pm) User Contributed Perl Documentation HTML::Clean(3pm)
NAME
HTML::Clean - Cleans up HTML code for web browsers, not humans
SYNOPSIS
use HTML::Clean;
$h = new HTML::Clean($filename); # or...
$h = new HTML::Clean(\$htmlcode);
$h->compat();
$h->strip();
$data = $h->data();
print $$data;
DESCRIPTION
The HTML::Clean module encapsulates a number of common techniques for minimizing the size
of HTML files. You can typically save between 10% and 50% of the size of an HTML file
using these methods. It provides the following features:
Remove unneeded whitespace (beginning of line, etc.)
Remove unneeded META elements.
Remove HTML comments (except for styles, javascript and SSI)
Replace tags with equivalent shorter tags (<strong> --> <b>)
etc.
The entire process is configurable, so you can pick and choose what you want to clean.
THE HTML::Clean CLASS
$h = new HTML::Clean($dataorfile, [$level]);
This creates a new HTML::Clean object, a prerequisite for all other functions in this
module.
The $dataorfile parameter supplies the input HTML, either a filename, or a reference
to a scalar value holding the HTML, for example:
$h = new HTML::Clean("/htdocs/index.html");
$html = "<strong>Hello!</strong>";
$h = new HTML::Clean(\$html);
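The rule for $dataorfile (a scalar reference is the HTML itself, anything else is a
filename) can be sketched in plain Perl as follows. This is not HTML::Clean's source: the
helper name load_html is hypothetical, and it only illustrates the documented contract.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Clean): mimics the documented
# $dataorfile contract of new() and initialize(). A scalar reference is
# taken as the HTML itself; anything else is treated as a filename.
# Returns a scalar reference to the HTML, or undef if the file can't be read.
sub load_html {
    my ($dataorfile) = @_;
    return $dataorfile if ref($dataorfile) eq 'SCALAR';

    open(my $fh, '<', $dataorfile) or return undef;
    local $/;                     # slurp mode: read the whole file at once
    my $html = <$fh>;
    close($fh);
    return \$html;
}
```

Returning a reference in both cases matches the way data() hands back the HTML as a scalar
reference, so large documents are not copied around.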
An optional 'level' parameter controls the level of optimization performed. Levels
range from 1 to 9. Level 1 includes only simple fast optimizations. Level 9 includes
all optimizations.
$h->initialize($dataorfile)
This function allows you to reinitialize the HTML data used by the current object.
This is useful if you are processing many files.
$dataorfile has the same usage as the new method.
Returns 0 on error and 1 on success.
$h->level([$level])
Get/set the optimization level. $level is a number from 1 to 9.
$myref = $h->data()
Returns the current HTML data as a scalar reference.
$h->strip(\%options)
Removes excess whitespace and applies the other size-reducing optimizations listed below
to the HTML.
You can control which optimizations are used by specifying them in the %options hash
reference.
The following options are recognized:
Boolean options (0 or 1):
whitespace     Remove excess whitespace.
shortertags    Replace tags with shorter equivalents (<strong> -> <b>, etc.).
blink          Remove <blink> tags.
contenttype    Remove the default content type.
comments       Remove excess comments.
entities       Replace entities with shorter equivalents (&quot; -> ", etc.).
dequote        Remove quotes from tag parameters where possible.
defcolor       Recode colors in a shorter form (#ffffff -> white, etc.).
javascript     Remove excess spaces and newlines in JavaScript code.
htmldefaults   Remove default values for some HTML tags.
lowercasetags  Translate all HTML tags to lowercase.
Parameterized options:
meta           Takes a space-separated list of meta tags to remove;
               default "GENERATOR FORMATTER".
emptytags      Takes a space-separated list of tags to remove when there is no
               content between the start and end tag, like this: <b></b>.
               The default is 'b i font center'.
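To illustrate how three of these options (shortertags, dequote, emptytags) transform
markup, here is a minimal regex-based sketch. It is not HTML::Clean's implementation: the
function name sketch_strip, the regular expressions, and the quoting rule (only values made
of letters, digits, '.', '_', ':' and '-', as HTML 4 allows unquoted) are assumptions.

```perl
use strict;
use warnings;

# Illustrative sketch only: NOT HTML::Clean's implementation. Regexes are
# not a robust HTML parser; comments, scripts and <pre> are not special-cased.
my %DEFAULTS = (
    shortertags => 1,
    dequote     => 1,
    emptytags   => 'b i font center',    # the documented default list
);

# Long tag => shorter equivalent; only the documented <strong> -> <b> case.
my %SHORTER = (strong => 'b');

sub sketch_strip {
    my ($html, $opts) = @_;
    my %o = (%DEFAULTS, %{ $opts || {} });

    if ($o{shortertags}) {
        # Rewrite both start and end tags, keeping any attributes.
        for my $long (keys %SHORTER) {
            $html =~ s{<(/?)$long\b}{<$1$SHORTER{$long}}gi;
        }
    }

    if ($o{dequote}) {
        # Assumption: unquote only values made of HTML 4 "safe" characters.
        $html =~ s{(\w+)="([A-Za-z0-9._:-]+)"}{$1=$2}g;
    }

    if ($o{emptytags}) {
        # Remove listed tags with nothing between start and end tag, e.g.
        # <b></b>; repeat so nested empty tags collapse as well.
        my $names = join '|', map { quotemeta($_) } split ' ', $o{emptytags};
        1 while $html =~ s{<($names)\b[^>]*></\1>}{}gi;
    }

    return $html;
}
```

For example, sketch_strip('<strong>Hi</strong><b></b>') yields '<b>Hi</b>', while passing
{ shortertags => 0 } leaves the <strong> tags untouched.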
Please note that if your HTML includes preformatted regions (that is, <pre>...</pre>
blocks), we do not recommend removing whitespace, as doing so will alter how those
regions are rendered.
HTML::Clean will print a warning if it finds a preformatted region and is asked to
strip whitespace. To prevent this, specify that you don't want to strip whitespace,
i.e.:
$h->strip( {whitespace => 0} );
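If you process many files, the decision can be automated. The sketch below uses a
hypothetical helper, strip_overrides_for, that turns whitespace stripping off only when a
<pre> region is present; we assume that passing an empty hash leaves strip()'s other
settings at their defaults.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Clean): returns option overrides
# for strip(). A <pre> region disables whitespace stripping; otherwise an
# empty hash is returned (assumed to keep strip()'s defaults).
sub strip_overrides_for {
    my ($html) = @_;
    return $html =~ /<pre\b/i ? { whitespace => 0 } : {};
}

# With HTML::Clean installed, usage would look like:
#   my $h = new HTML::Clean(\$html);
#   $h->strip(strip_overrides_for($html));
```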
$h->compat()
This function improves the cross-platform compatibility of your HTML. It currently
checks for the following problems:
IMG tags that lack ALT attributes.
Use of Arial, Futura, or Verdana as a font face.
A <TITLE> tag that is not positioned immediately after the <HEAD> tag.
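The IMG/ALT check listed above can be approximated with a short regex scan. This is an
illustrative sketch, not the code compat() runs; the function name imgs_missing_alt is
hypothetical, and the scan ignores comments, scripts, and other HTML subtleties.

```perl
use strict;
use warnings;

# Illustrative sketch (not HTML::Clean's code): return every IMG tag that
# has no ALT attribute. In scalar context the result is the count.
sub imgs_missing_alt {
    my ($html) = @_;
    my @missing;
    while ($html =~ /(<img\b[^>]*>)/gi) {
        my $tag = $1;
        push @missing, $tag unless $tag =~ /\salt\s*=/i;
    }
    return @missing;
}
```

An empty ALT="" counts as present here, since it is a valid way to mark decorative images.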
$h->defrontpage()
This function converts pages created with Microsoft FrontPage into something a Unix
server will understand a bit better. It currently does the following:
Converts FrontPage 'hit counters' into a Unix-specific format.
Removes some FrontPage-specific HTML comments.
SEE ALSO
Modules
FrontPage::Web, FrontPage::File
Web Sites
Distribution Site - http://people.itu.int/~lindner/
AUTHORS
Paul Lindner for the International Telecommunication Union (ITU)
COPYRIGHT
The HTML::Clean module is Copyright (c) 1998,99 by the ITU, Geneva, Switzerland. All
rights reserved.
You may distribute under the terms of either the GNU General Public License or the
Artistic License, as specified in the Perl README file.
perl v5.8.8 2008-03-07 HTML::Clean(3pm)