Web::Scraper

Web Scraping Toolkit using HTML and CSS Selectors or XPath expressions

Web::Scraper Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Tatsuhiko Miyagawa
  • Publisher web site: http://search.cpan.org/~miyagawa/

Web::Scraper Description

Web::Scraper is a web scraper toolkit, inspired by Ruby's equivalent Scrapi. It provides a DSL-ish interface for traversing HTML documents and returning a neatly arranged Perl data structure.

The scraper and process blocks provide a method to define what segments of a document to extract. It understands HTML and CSS selectors as well as XPath expressions.

SYNOPSIS

    use URI;
    use Web::Scraper;

    # First, create your scraper block
    my $tweets = scraper {
        # Parse all LIs with the class "status", store them into a resulting
        # array 'tweets'. We embed another scraper for each tweet.
        process "li.status", "tweets[]" => scraper {
            # And, in that array, pull in the elements with the class
            # "entry-content", "entry-date" and the link
            process ".entry-content", body => 'TEXT';
            process ".entry-date", when => 'TEXT';
            process 'a', link => '@href';
        };
    };

    my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );

    # The result has the populated tweets array
    for my $tweet (@{$res->{tweets}}) {
        print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
    }

Requirements:

  • Perl
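The synopsis selects elements with CSS selectors only. As a minimal sketch of the XPath side mentioned in the description, the variant below selects the list items with an XPath expression and scrapes an in-memory HTML string instead of a live URL. The sample markup and the example.com links are invented for illustration; only the scraper, process, and scrape calls come from the synopsis, and passing a plain HTML string to scrape is assumed to behave as in current Web::Scraper releases.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample markup standing in for a fetched page.
my $html = <<'HTML';
<html><body>
<ul>
  <li class="status">
    <span class="entry-content">Hello world</span>
    <span class="entry-date">2009-01-01</span>
    <a href="http://example.com/1">permalink</a>
  </li>
  <li class="status">
    <span class="entry-content">Second post</span>
    <span class="entry-date">2009-01-02</span>
    <a href="http://example.com/2">permalink</a>
  </li>
</ul>
</body></html>
HTML

my $tweets = scraper {
    # An expression starting with "/" is treated as XPath; this one
    # plays the role of the CSS selector "li.status" in the synopsis.
    process '//li[@class="status"]', 'tweets[]' => scraper {
        # The nested scraper keeps the CSS selectors from the synopsis.
        process '.entry-content', body => 'TEXT';
        process '.entry-date',    when => 'TEXT';
        process 'a',              link => '@href';
    };
};

# scrape() is given the HTML string directly, so no network access is needed.
my $res = $tweets->scrape($html);

for my $tweet (@{ $res->{tweets} }) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
```

Working from a string keeps the example independent of any remote site, which is convenient when trying out selectors before pointing the scraper at a real page.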


Web::Scraper Related Software