An RSS Parser in PHP
RSS (Really Simple Syndication) was developed by Netscape and is an XML application created specifically for describing news feeds.
The format is already in its second version and has become the standard for marking up news.
An example of a simple RSS document:
<?xml version="1.0" encoding="windows-1251"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
"http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>none</title>
    <description>filling of emptiness</description>
    <link>http://riscom.com/~none</link>
    <image>
      <title>none</title>
      <url>http://riscom.com/~none/favicon.gif</url>
      <link>http://riscom.com/~none</link>
      <width>16</width>
      <height>16</height>
    </image>
    <item>
      <title>RSS - tasty news</title>
      <link>http://riscom.com/~none/</link>
      <description>What RSS is and what you eat it with</description>
    </item>
  </channel>
</rss>
The structure is fairly self-evident.
There are two general blocks, channel and image, which apply to the whole document, and one or more item blocks, which contain the news.
The channel block identifies the source of the news:
title - the name of the site;
description - its description;
link - its address.
The image block is the graphic that represents the site:
title - the name;
url - the address of the picture;
link - the address of the site the picture points to;
width, height - the width and height.
There can be as many item blocks as necessary; they describe the news:
title - the headline;
link - the address of the news item;
description - the description.
Everything above the rss tag is the document prolog; it applies to any XML document, with the appropriate changes, of course.
Now that we have learned how to create an RSS document, let's think about what to do with it.
The first and easiest idea is, of course, to do nothing with it: simply write a small procedure that automatically generates RSS from the published news and forget about it. Let those who need it parse it.
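A generator of the kind just mentioned might look roughly like the sketch below. The function name buildrss, the shape of the $channel and $news arrays, and the UTF-8 encoding are assumptions made for this illustration; the article does not show its own generator.

```php
<?php
// Hypothetical helper: builds a small RSS 0.91 document as a string.
// $channel and every element of $news are arrays with the keys
// 'title', 'link' and 'description' (an assumed layout).
function buildrss(array $channel, array $news) {
    // Escape text so that characters such as & and < do not break the XML.
    $e = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    // The closing "?>" is split so that no editor or tool mistakes it for the end of PHP code.
    $xml  = '<?xml version="1.0" encoding="UTF-8"?' . ">\n";
    $xml .= "<rss version=\"0.91\">\n<channel>\n";
    foreach (array('title', 'description', 'link') as $k) {
        $xml .= "<$k>" . $e($channel[$k]) . "</$k>\n";
    }
    // One <item> block per news entry, placed inside <channel>.
    foreach ($news as $n) {
        $xml .= "<item>\n";
        foreach (array('title', 'link', 'description') as $k) {
            $xml .= "<$k>" . $e($n[$k]) . "</$k>\n";
        }
        $xml .= "</item>\n";
    }
    $xml .= "</channel>\n</rss>\n";
    return $xml;
}
```

The result can be written to a file or sent with echo; where the news entries come from (a database, a directory of files) is left open here.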
But suppose we have a resource on which we want to publish news from the well-known site www.ionpopescu.md, and Mr. Popescu flatly refuses to provide it (the news) in any format other than RSS.
What is left for us to do?
Right! We will parse it.
Here, again, there are two ways out: the first is to use a well-known ready-made XML processor such as Sablotron and not rack our brains; the second is to do it the hard way ourselves.
The second option has one more justification: imagine that you are on free hosting, and the hosting provider flatly refuses to install Sablotron or its equivalents.
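For comparison, the first way (relying on a ready-made parser) can be sketched as follows. This example swaps in PHP's SimpleXML extension rather than Sablotron; that choice, the function name readitems, and the assumption that SimpleXML is installed are all mine, not the article's.

```php
<?php
// Assumption: the SimpleXML extension is available on the host.
// Hypothetical helper: returns the items of an RSS document
// (items nested inside <channel>) as plain arrays.
function readitems($xmlstring) {
    libxml_use_internal_errors(true);          // collect XML errors instead of printing warnings
    $doc = simplexml_load_string($xmlstring);
    if ($doc === false || !isset($doc->channel->item)) {
        return array();                        // not well-formed, or no items at all
    }
    $items = array();
    foreach ($doc->channel->item as $item) {
        $items[] = array(
            'title' => trim((string) $item->title),
            'link'  => trim((string) $item->link),
            'desc'  => trim((string) $item->description),
        );
    }
    return $items;
}
```

If such an extension is not available on the host, this route is closed, which is exactly the situation the second way is meant for.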
----------------------------------------------------
And here is the script:
<?php
// xml_parser_create() enables case folding by default, so the handlers
// receive element names in upper case (RSS, CHANNEL, ITEM, ...).

// Opening tag: remember the root element and append the tag to the current path.
function startelement($parser, $name, $attrs) {
    global $tag, $rss;
    if ($name == 'RSS')
        $rss = '^RSS';
    elseif ($name == 'RDF:RDF')
        $rss = '^RDF:RDF';
    $tag .= '^' . $name;
}

// Closing tag: after an item, prepare an empty slot for the next one,
// then remove the last tag from the current path.
function endelement($parser, $name) {
    global $tag;
    global $itemcount, $items;
    if ($name == 'ITEM') {
        $itemcount++;
        if (!isset($items[$itemcount])) $items[$itemcount] = array('title' => '', 'link' => '', 'desc' => '', 'pubdate' => '');
    }
    $tag = substr($tag, 0, strrpos($tag, '^'));
}

// Text inside a tag: store it according to the current path.
// The text can arrive in several chunks, hence the .= everywhere.
function characterdata($parser, $data) {
    global $tag, $chantitle, $chanlink, $chandesc, $rss, $imgtitle, $imglink, $imgurl;
    global $items, $itemcount;
    $rsschannel = '';
    if ($data !== '') {
        if ($tag == $rss . '^CHANNEL^TITLE') {
            $chantitle .= $data;
        } elseif ($tag == $rss . '^CHANNEL^LINK') {
            $chanlink .= $data;
        } elseif ($tag == $rss . '^CHANNEL^DESCRIPTION') {
            $chandesc .= $data;
        }
        // In RSS, image and item live inside channel; in RDF they sit next to it.
        if ($rss == '^RSS') $rsschannel = '^CHANNEL';
        if ($tag == $rss . $rsschannel . '^ITEM^TITLE') {
            $items[$itemcount]['title'] .= $data;
        } elseif ($tag == $rss . $rsschannel . '^ITEM^LINK') {
            $items[$itemcount]['link'] .= $data;
        } elseif ($tag == $rss . $rsschannel . '^ITEM^DESCRIPTION') {
            $items[$itemcount]['desc'] .= $data;
        } elseif ($tag == $rss . $rsschannel . '^ITEM^PUBDATE') {
            $items[$itemcount]['pubdate'] .= $data;
        } elseif ($tag == $rss . $rsschannel . '^IMAGE^TITLE') {
            $imgtitle .= $data;
        } elseif ($tag == $rss . $rsschannel . '^IMAGE^LINK') {
            $imglink .= $data;
        } elseif ($tag == $rss . $rsschannel . '^IMAGE^URL') {
            $imgurl .= $data;
        }
    }
}

// Download the feed, parse it and display the result.
function parserss($url) {
    global $tag, $chantitle, $chanlink, $chandesc, $rss, $items, $itemcount, $imgtitle, $imglink, $imgurl;
    $chantitle = '';
    $chanlink = '';
    $chandesc = '';
    $imgtitle = '';
    $imglink = '';
    $imgurl = '';
    $tag = '';
    $rss = '';
    $itemcount = 0;
    $items = array(0 => array('title' => '', 'link' => '', 'desc' => '', 'pubdate' => ''));
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startelement", "endelement");
    xml_set_character_data_handler($xml_parser, "characterdata");
    // Read the whole feed into $data; skip reading if the URL cannot be opened.
    $data = '';
    $fp = @fopen($url, "r");
    if ($fp) {
        while (!feof($fp)) {
            $datas = fread($fp, 4096);
            if ($datas === false) {
                break;
            }
            $data .= $datas;
        }
        fclose($fp);
    }
    if ($data != '') {
        // true: this is the final (and only) chunk of the document
        $xmlresult = xml_parse($xml_parser, $data, true);
        $xmlerror = xml_error_string(xml_get_error_code($xml_parser));
        $xmlcrtline = xml_get_current_line_number($xml_parser);
        if ($xmlresult)
            displaydata();
        else
            print("Error parsing this feed!<br/>Error: $xmlerror, at line: $xmlcrtline");
    } else {
        print("Error while retrieving feed $url");
    }
    xml_parser_free($xml_parser); // has no effect as of PHP 8.0
}

// Print the channel header and the list of news as an HTML page.
function displaydata() {
    global $chantitle, $chanlink, $chandesc, $rss, $items, $itemcount, $imgtitle, $imglink, $imgurl;
?>
<html><head><title><?= $chantitle ?></title></head>
<body>
<div>
<a href="<?= $chanlink ?>"><img src="<?= $imgurl ?>" alt="<?= $imgtitle ?>" border="0"/></a>
<h1><?= $chantitle ?></h1>
<h3><?= $chandesc ?></h3>
</div>
<hr/>
<?php
    // The last element of $items is always an empty placeholder, hence count - 1.
    for ($i = 0; $i < count($items) - 1; $i++) {
        echo "<h4>" . $items[$i]['title'] . "</h4>";
        echo "<h5>" . $items[$i]['pubdate'] . "</h5>";
        echo "<a href='" . $items[$i]['link'] . "'>" . $items[$i]['desc'] . "</a>";
    }
?>
</body></html>
<?php
}
$url = "http://xmlhack.ru/index.rdf";
parserss($url);
?>
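The heart of the script is the $tag variable: every opening tag appends "^NAME" to it and every closing tag cuts the last segment off, so at any moment it holds the full path to the current element. The stripped-down sketch below shows only that mechanism on an in-memory string. The function name collectpaths is hypothetical, and closures are used instead of global variables purely to keep the example self-contained. With PHP's default case folding the handlers receive upper-case names.

```php
<?php
// Hypothetical helper: returns a map "path => text found at that path".
function collectpaths($xmlstring) {
    $path = '';
    $found = array();
    $parser = xml_parser_create();   // case folding is on by default
    xml_set_element_handler(
        $parser,
        // opening tag: extend the path
        function ($p, $name, $attrs) use (&$path) {
            $path .= '^' . $name;
        },
        // closing tag: drop the last path segment
        function ($p, $name) use (&$path) {
            $path = substr($path, 0, strrpos($path, '^'));
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$path, &$found) {
            if (trim($data) === '') return;            // skip whitespace between tags
            if (!isset($found[$path])) $found[$path] = '';
            $found[$path] .= $data;                    // text may arrive in several chunks
        }
    );
    $ok = xml_parse($parser, $xmlstring, true);        // true: the whole document is passed at once
    if (PHP_VERSION_ID < 80000) xml_parser_free($parser);
    return $ok ? $found : array();
}
```

For the fragment `<rss><channel><title>none</title></channel></rss>` the only recorded path is `^RSS^CHANNEL^TITLE`, which is the same string the full script compares against in characterdata.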
