XML::RSS::Parser(3pm) User Contributed Perl Documentation XML::RSS::Parser(3pm)
NAME
XML::RSS::Parser - A liberal object-oriented parser for RSS feeds.
SYNOPSIS
#!/usr/bin/perl -w
use strict;
use XML::RSS::Parser;
use FileHandle;
my $p = XML::RSS::Parser->new;
my $fh = FileHandle->new('/path/to/some/rss/file')
    or die "Cannot open feed file: $!";
# the parse methods return undef on failure; errstr holds the reason
my $feed = $p->parse_file($fh) or die $p->errstr;
# output some values
my $feed_title = $feed->query('/channel/title');
print $feed_title->text_content;
my $count = $feed->item_count;
print " ($count)\n";
# print the title of every item in the feed
foreach my $i ( $feed->query('//item') ) {
    my $node = $i->query('title');
    print ' '.$node->text_content;
    print "\n";
}
DESCRIPTION
XML::RSS::Parser is a lightweight liberal parser of RSS feeds. This parser is "liberal" in
that it does not demand compliance of a specific RSS version and will attempt to
gracefully handle tags it does not expect or understand. The parser's only requirement
is that the file is well-formed XML and remotely resembles RSS. Roughly speaking, that
means well-formed XML with a "channel" element as the root tag or a direct child of it,
plus "item" elements.
There are a number of advantages to using this module rather than a standard parser-tree
combination. There are a number of different RSS formats in use today. In very subtle
ways these formats are not entirely compatible from one to another. XML::RSS::Parser makes
a couple of assumptions to "normalize" the parse tree into a more consistent form. For
instance, it forces "channel" and "item" into a parent-child relationship. For more detail
see "SPECIAL PROCESSING NOTES".
This module is leaner than XML::RSS, most of whose code is for generating RSS files.
It also provides an XPath-esque interface to the feed's tree.
While XML::RSS::Parser creates a normalized parse tree, it still leaves the mapping of
overlapping and alternate tags common in the RSS format space to the developer. For this
look at the XML::RAI (RSS Abstraction Interface) package which provides an object-oriented
layer to XML::RSS::Parser trees that transparently maps these various tags to one common
interface.
XML::RSS::Parser is based on XML::Elemental, a SAX-based package for easily parsing XML
documents into a more native and mostly object-oriented perl form.
SPECIAL PROCESSING NOTES
There are a number of different RSS formats in use today. In very subtle ways these
formats are not entirely compatible from one to another. What's worse is that there are
unlabeled versions within the standard in addition to tags with overlapping purposes and
vague definitions. (See Mark Pilgrim's "The myth of RSS compatibility"
<http://diveintomark.org/archives/2004/02/04/incompatible-rss> for just a sampling of
what I mean.) To ease working with RSS data in different formats, the parser does not
create the feed's parse tree verbatim. Instead it makes a few assumptions to "normalize"
the parse tree into a more consistent form.
With the refactoring of this module and the switch to a true tree structure, the
normalization process has been simplified. Some of the version 2.x normalization proved
problematic with more advanced and complex feeds.
o The RSS namespace (if any) is extracted from the first child of the root tag. We
don't use the root tag because in RSS 1.0 the root tag is in the RDF namespace and not
RSS. That namespace is treated as the '#default' (no prefix) namespace for the parse
tree.
o The parser will not include the root tags of "rss" or "RDF" in the tree. Namespace
declaration information is still extracted.
o The parser forces "channel" and "item" into a parent-child relationship. In versions
0.9 and 1.0, "channel" and "item" tags are siblings.
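To illustrate the normalization described above, here is a minimal sketch that parses a small RSS 1.0 document in which "item" is a sibling of "channel". The feed content and the example.org URLs are made up for illustration; the methods used ("parse_string", "query", "text_content", "item_count", "errstr") are the ones shown in the SYNOPSIS and METHODS sections. Assuming normalization behaves as described, the same queries that work for an RSS 2.0 feed should work here.

```perl
use strict;
use warnings;
use XML::RSS::Parser;

# A tiny RSS 1.0 document (invented for this example). Note that
# "item" is a sibling of "channel" here, not a child of it.
my $xml = <<'RSS';
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>An example</description>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First entry</title>
    <link>http://example.org/1</link>
  </item>
</rdf:RDF>
RSS

my $parser = XML::RSS::Parser->new;
my $feed   = $parser->parse_string($xml)
    or die 'Parse failed: ' . ( $parser->errstr || 'unknown error' );

# The RSS 1.0 namespace is treated as the default namespace, so
# unprefixed names are used in the queries.
my $title = $feed->query('/channel/title');
print $title->text_content, "\n";

my $count = $feed->item_count;
print "items: $count\n";

foreach my $item ( $feed->query('//item') ) {
    my $node = $item->query('title');
    print ' ', $node->text_content, "\n";
}
```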
Two significant changes were made with the release of version 4.0.
XML::RSS::Parser is not a subclass of XML::Elemental.
This change should be transparent in most cases, but was deemed necessary for the error
handling and special handling of RSS data.
XML::RSS::Parser uses Clarkian Notation for element and attribute names.
This change is inherited from recent changes in XML::Elemental. The previous system
was flawed and not widely adopted. Clarkian notation is the form used by XML::SAX and
XML::Simple, to name a few. Use "process_name" in XML::Elemental::Util to parse
element and attribute names into their namespace URI and local name parts.
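For readers unfamiliar with Clarkian notation, a name is written as "{namespace-uri}local-name". The sketch below shows the idea with a hypothetical helper, "split_clark_name", written only for this example. It is not the "process_name" function from XML::Elemental::Util, whose exact behavior should be checked in that module's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: split a name in
# Clarkian notation ("{namespace-uri}local-name") into its parts.
sub split_clark_name {
    my ($name) = @_;
    if ( $name =~ /^\{([^}]*)\}(.+)$/ ) {
        return ( $1, $2 );    # (namespace URI, local name)
    }
    return ( '', $name );     # no namespace part present
}

my ( $ns, $local ) =
    split_clark_name('{http://purl.org/dc/elements/1.1/}creator');
print "namespace:  $ns\n";
print "local name: $local\n";

# A name without a namespace part comes back unchanged.
my ( $ns2, $local2 ) = split_clark_name('title');
print "namespace:  '$ns2'\n";
print "local name: $local2\n";
```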
NAMESPACE PREFIXES
The following prefix and namespace combinations are recognized by default. Use
"register_ns_prefix" to add more as needed.
admin http://webns.net/mvcb/
ag http://purl.org/rss/1.0/modules/aggregation/
annotate http://purl.org/rss/1.0/modules/annotate/
atom http://www.w3.org/2005/Atom
audio http://media.tangent.org/rss/1.0/
cc http://web.resource.org/cc/
company http://purl.org/rss/1.0/modules/company
content http://purl.org/rss/1.0/modules/content/
cp http://my.theinfo.org/changed/1.0/rss/
dc http://purl.org/dc/elements/1.1/
dcterms http://purl.org/dc/terms/
email http://purl.org/rss/1.0/modules/email/
ev http://purl.org/rss/1.0/modules/event/
feedburner http://rssnamespace.org/feedburner/ext/1.0
foaf http://xmlns.com/foaf/0.1/
image http://purl.org/rss/1.0/modules/image/
itunes http://www.itunes.com/DTDs/Podcast-1.0.dtd
l http://purl.org/rss/1.0/modules/link/
openSearch http://a9.com/-/spec/opensearchrss/1.0/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
ref http://purl.org/rss/1.0/modules/reference/
reqv http://purl.org/rss/1.0/modules/richequiv/
rss091 http://purl.org/rss/1.0/modules/rss091#
search http://purl.org/rss/1.0/modules/search/
slash http://purl.org/rss/1.0/modules/slash/
ss http://purl.org/rss/1.0/modules/servicestatus/
str http://hacks.benhammersley.com/rss/streaming/
sub http://purl.org/rss/1.0/modules/subscription/
sy http://purl.org/rss/1.0/modules/syndication/
tapi http://api.technorati.com/dtd/tapi-001.xml#
taxo http://purl.org/rss/1.0/modules/taxonomy/
thr http://purl.org/rss/1.0/modules/threading/
trackback http://madskills.com/public/xml/rss/module/trackback/
wiki http://purl.org/rss/1.0/modules/wiki/
xhtml http://www.w3.org/1999/xhtml
xml http://www.w3.org/XML/1998/namespace/
creativeCommons http://backend.userland.com/creativeCommonsRssModule
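As a rough sketch of how the prefix table can be consulted and extended, the example below uses the "namespace", "prefix", "register_ns_prefix" and "ns_qualify" methods documented under METHODS, called as class methods as written there. The "ex" prefix and its example.org URI are invented for illustration.

```perl
use strict;
use warnings;
use XML::RSS::Parser;

# Look up one of the default entries in both directions.
my $dc_uri    = XML::RSS::Parser->namespace('dc');
my $dc_prefix = XML::RSS::Parser->prefix('http://purl.org/dc/elements/1.1/');
print "dc => $dc_uri\n";
print "$dc_uri => $dc_prefix\n";

# Register an additional, made-up prefix for use in queries.
XML::RSS::Parser->register_ns_prefix( 'ex', 'http://example.org/ns/' );
my $ex_uri = XML::RSS::Parser->namespace('ex');
print "ex => $ex_uri\n";

# Build a namespace-qualified (Clarkian notation) element name.
my $qualified = XML::RSS::Parser->ns_qualify( 'creator', $dc_uri );
print "qualified name: $qualified\n";

# Unknown prefixes yield undef.
my $unknown = XML::RSS::Parser->namespace('no-such-prefix');
print "unknown prefix is not registered\n" unless defined $unknown;
```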
METHODS
The following objects and methods are provided in this package.
XML::RSS::Parser->new
Constructor. Returns a reference to a new XML::RSS::Parser object.
$parser->parse
$parser->parse_file
$parser->parse_string
$parser->parse_uri
These methods are mostly pass-throughs to the underlying SAX parser provided by
XML::Elemental. (See XML::SAX::Base for more.)
XML::RSS::Parser wraps these calls in eval statements and, rather than dying, returns
"undef". Any parsing errors can be retrieved by using the "errstr" method inherited
from Class::ErrorHandler.
Once the markup has been parsed it is automatically passed through the "rss_normalize"
method before the parse tree is returned to the caller.
XML::RSS::Parser->register_ns_prefix(prefix,uri)
Registers the given prefix with a namespace URI for XPath lookups. Both parameters are
required.
XML::RSS::Parser->ns_qualify(element, namespace_uri)
A simple utility implemented as an abstract method that will return a fully namespace
qualified string for the supplied element. Return values are now in Clarkian notation.
XML::RSS::Parser->prefix(namespace_uri)
Returns the prefix to the given namespace URI. Returns "undef" if the prefix is not
known.
XML::RSS::Parser->namespace(prefix)
Returns the namespace URI to the given prefix. Returns "undef" if the namespace is not
registered.
error
Sets an error message for later retrieval and returns "undef". Inherited from
Class::ErrorHandler.
errstr
Returns the last error message set by "error". Inherited from Class::ErrorHandler.
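Because the parse methods return "undef" instead of dying, callers are expected to check the return value and consult "errstr". A minimal sketch of that pattern follows. The truncated markup is deliberately malformed, and the exact error text depends on the underlying SAX parser, so none is shown here.

```perl
use strict;
use warnings;
use XML::RSS::Parser;

my $parser = XML::RSS::Parser->new;

# Deliberately malformed markup: the tags are never closed.
my $broken = '<rss version="2.0"><channel><title>broken';

my $feed = $parser->parse_string($broken);
if ( defined $feed ) {
    print "parsed ", $feed->item_count, " item(s)\n";
}
else {
    # errstr returns the message recorded by the failed parse; fall
    # back to a generic message in case none was recorded.
    my $why = $parser->errstr || 'unknown error';
    warn "Could not parse feed: $why\n";
}
```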
DEPENDENCIES
XML::SAX, XML::Elemental, Class::ErrorHandler, Class::XPath 1.4*
* Versions of Class::XPath before 1.4 have a design flaw that causes them to choke on
feeds with the / character in an attribute value, for example the Yahoo! feeds.
SEE ALSO
XML::RAI
The Feed Validator <http://www.feedvalidator.org/>
What is RSS? <http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html>
Raising the Bar on RSS Feed Quality
<http://www.oreillynet.com/pub/a/webservices/2002/11/19/rssfeedquality.html>
The myth of RSS compatibility <http://diveintomark.org/archives/2004/02/04/incompatible-rss>
AUTHOR & COPYRIGHT
Except where otherwise noted, XML::RSS::Parser is Copyright 2003-2005, Timothy Appnel,
cpan AT timaoutloud.org. All rights reserved.
perl v5.10.0 2005-11-18 XML::RSS::Parser(3pm)