robot-detection

Library for detecting whether an HTTP User-Agent header is likely to belong to a bot
robot-detection Ranking & Summary

  • License: GPL v3
  • Price: Free
  • Publisher name: Rory McCann
  • Publisher web site: http://technomancy.org

robot-detection Description

robot_detection is a Python module that detects whether a given HTTP User-Agent string belongs to a web crawler. It uses the list of registered robots from the Robots Database at http://www.robotstxt.org.

Usage

There is only one function, is_robot, which takes a string (unicode or not) and returns True if and only if that string matches a known robot in the robotstxt.org robot database.

Example

    >>> import robot_detection
    >>> robot_detection.is_robot(user_agent_string)

Updating

You can download a new version of the Robot Database from robotstxt.org. Download the database dump, then run robot_detection.py with the dump file as its first argument:

    $ wget http://www.robotstxt.org/db/all.txt
    $ python robot_detection.py all.txt

If the database has changed, the script prints the new value of the robot_useragents variable, which you need to paste into the source code.

Tests

Some simple unit tests are included. Running tests.py runs them.
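To illustrate the kind of check is_robot performs, here is a minimal, self-contained sketch. It is not the module's actual code: the name is_robot_sketch, the two-entry KNOWN_ROBOT_USERAGENTS list, and the case-insensitive substring matching are all assumptions made for this example. The real module matches against its generated robot_useragents variable, and its matching rule may differ.

```python
# Hypothetical stand-in for the module's robot_useragents variable.
# The real list is generated from the robotstxt.org database dump.
KNOWN_ROBOT_USERAGENTS = [
    "Googlebot/2.1",
    "ExampleCrawler/1.0",  # made-up entry, for illustration only
]


def is_robot_sketch(user_agent):
    """Return True if user_agent contains a known robot string.

    Accepts str or bytes. Matching is a case-insensitive substring
    test, which is an assumption of this sketch.
    """
    if isinstance(user_agent, bytes):
        user_agent = user_agent.decode("utf-8", errors="replace")
    lowered = user_agent.lower()
    return any(known.lower() in lowered for known in KNOWN_ROBOT_USERAGENTS)


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"
    print(is_robot_sketch(crawler))
    print(is_robot_sketch(browser))
```

A substring test keeps the sketch short, but it would flag any User-Agent that merely contains a listed string, so a stricter rule may be preferable in practice.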