[Zope] HTML parsers and Wget like function

Bakhtiar A Hamid kedai at kedai.com.my
Thu Jul 1 11:36:39 EDT 2004


On Thu, 01 Jul 2004 20:02:02 +0900, Grant Morganryuuguu wrote
> I am considering Zope/python for a project and would like to get 
> some pointers to see if this is a reasonable fit. I need to get a 
> URL from the web, parse the HTML, extract some data from the page, 
> rewrite the <a href> tags and display it on the website. I found the 
> HTML parser in the library 
> http://www.python.org/doc/current/lib/markup.html and 
> http://www.crummy.com/software/BeautifulSoup/ (which is down now but 
> was up a couple of days ago). Does anyone have any other suggestions 
> for manipulating HTML in Zope/python? For getting the page from 
> a URL, is there something like Wget (unix program) in Zope for this? 
> I searched around the manual but did not see anything.
> 

there's KebasData (http://www.zope.org/Members/kedai/KebasData) 

it can scrape pages and parse out whatever you need, but the regex may be a 
bit of a head spinner.  so a regex tool would help (kde has one, there's one 
for bash, iirc, etc) 

rewriting urls can be done in the render_method.  it's a bit tricky, since 
the original page can change at any time
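a standalone sketch of the rewriting step, outside KebasData's render_method. it assumes you want every <a href> resolved against the page's own url and then routed back through your site; the `rewrite_links` name and the "/proxy?url=" prefix are hypothetical:

```python
import re
from urllib.parse import quote, urljoin

# Matches href="..." or href='...' inside an <a ...> tag. Regexes are
# fragile against odd markup; html.parser is the sturdier option.
HREF_RE = re.compile(r"""(<a\b[^>]*?\bhref\s*=\s*)(["'])(.*?)\2""",
                     re.IGNORECASE | re.DOTALL)


def rewrite_links(html, base_url, proxy_prefix):
    """Resolve each <a href> against base_url, then send it via proxy_prefix."""
    def repl(m):
        # urljoin turns relative links ("/news/1.html") into absolute ones.
        absolute = urljoin(base_url, m.group(3))
        # quote(..., safe="") percent-encodes the whole url, "/" and ":" included.
        return m.group(1) + m.group(2) + proxy_prefix + quote(absolute, safe="") + m.group(2)
    return HREF_RE.sub(repl, html)


print(rewrite_links('<a href="/news/1.html">x</a>',
                    "http://example.com/index.html", "/proxy?url="))
# <a href="/proxy?url=http%3A%2F%2Fexample.com%2Fnews%2F1.html">x</a>
```

resolving against the base url first means relative links keep working even though the page is now served from your site.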

it's not great code, but works for me. 

cookies are not supported yet.  neither is python's own socket timeout 
(socket.setdefaulttimeout(), in the core since python 2.3).  kebasdata was 
written a while back, when there was no timeout support in the python core, 
so i used the timeoutsocket module to ..er.. time out .. :P

soon, methinks 
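for the wget part, a minimal sketch using only the python 3 standard library (not KebasData): `urllib.request` with a per-request timeout, an optional in-memory cookie jar, and the process-wide `socket.setdefaulttimeout()`. the `fetch`/`make_opener` names and the latin-1 fallback charset are assumptions:

```python
import socket
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Build an opener whose CookieJar keeps cookies between requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, opener=None, timeout=10.0):
    """Fetch url wget-style and return the body as text.

    timeout (seconds) bounds blocking socket operations; when it expires
    the call raises (socket.timeout or URLError) instead of hanging.
    """
    opener = opener or urllib.request.build_opener()
    with opener.open(url, timeout=timeout) as resp:
        # Assumed fallback when the server sends no charset.
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


# Process-wide default for code that never passes a timeout explicitly.
socket.setdefaulttimeout(10.0)
```

usage: `opener, jar = make_opener()` then `fetch("http://example.com/", opener=opener)`; reusing the same opener sends back any cookies the site set.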

> Thanks,
> Grant


--
NSTP (M) BHD


