webreaper

webreaper can download a web page and its links.

webreaper Ranking & Summary


  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: brian d foy
  • Publisher web site: http://search.cpan.org/~bdfoy/

webreaper Description

webreaper can download a web page and its links.

SYNOPSIS

    webreaper URL

The webreaper program downloads web sites. It creates a directory, named after the host of the URL given on the command line, in the current working directory, and can optionally create a tarball of it.

Getting around web site misfeatures

This script has many features to make it look like a normal, interactive web browser. You can set values for some features, or use the defaults, enumerated later.

Set the user-agent string with the -a switch. Some web sites refuse to work with certain browsers because they want you to use Internet Explorer. While webreaper is not subject to JavaScript checks (except for ones that try to redirect you), some servers try that behind the scenes.

Set the referer string. Some sites limit what you can see based on how they think you got to the address (i.e. they want you to click on a certain link). The script automatically sets the referer string for links it finds in web pages, but you can set the referer for the first link (the one you specify on the command line) with the -r switch.

Basic browser features

For websites that use a login and password, use the -u and -p switches. This feature is still somewhat broken because it sends the authorization string with every request.

Script features

Watch the action by turning on verbose messages with the -v switch. If you run this script from another script, cron, or some other automated method, you probably want no output, so do not use -v. You can also set the WEBREAPER_VERBOSE environment variable.

To get even more output, use the -d switch to turn on debugging output. You can also set the WEBREAPER_DEBUG variable.

You can collect everything you download into a single file by creating an archive with the -t switch, which creates a tarball.

The script limits its traversal to URLs below the starting URL.
This may change in the future.

Command line switches

  -a USER_AGENT    set the user agent string
  -e               list of file extensions to store (not yet implemented)
  -E               list of file extensions to skip (not yet implemented)
  -d               turn on debugging output
  -D DIRECTORY     use this directory for downloads
  -f               store all files in the same directory (flat)
  -h HOSTS         allowed hosts, comma separated
  -n NUMBER        stop after requesting NUMBER resources, whether or not webreaper stored them
  -N NUMBER        stop after storing NUMBER resources
  -r REFERER_URL   referer for the first URL
  -p PASSWORD      password for basic auth
  -s SECONDS       sleep between requests
  -t               create a tar archive
  -u USERNAME      username for basic auth
  -v               verbose output
  -z               create a zip archive

Requirements:

  · Perl
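As a sketch of how the switches above combine, an invocation like the following mirrors a site while presenting a browser-like user agent and referer, pacing its requests, and bundling the results into a tarball. The URL, user-agent string, and request limit here are placeholders, not values from the documentation:

```shell
# Hypothetical example run; webreaper must be installed from CPAN first.
# -v  show progress    -a  spoof the user agent    -r  set the first referer
# -s  sleep 2s between requests    -n  stop after 500 requests    -t  make a tarball
webreaper -v \
  -a "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" \
  -r "http://www.example.com/" \
  -s 2 \
  -n 500 \
  -t \
  http://www.example.com/docs/
```

For automated runs (cron, wrapper scripts), the same effect as -v and -d can presumably be had via the environment instead, e.g. `WEBREAPER_VERBOSE=1 webreaper http://www.example.com/`, which keeps the command line itself free of output switches.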

