Wayback Machine

Wayback Machine is an open source Java implementation of the Internet Archive Wayback Machine.

Wayback Machine Ranking & Summary

  • License: MPL
  • Price: FREE
  • Publisher Name: Jeff Kaplan
  • Publisher web site: http://www.archive.org

Wayback Machine Description

Wayback Machine is an open source Java implementation of the Internet Archive Wayback Machine. The current production version of the Wayback Machine is implemented in Perl and is difficult to maintain and extend. Its code is also not open source. The primary motivation for the new version is to address these three issues, enabling public distribution of the application and easy experimentation with new features and access technologies.

The current Java version of the Wayback Machine supports two access, or replay, modes of operation: "Archival URL" mode and "Proxy" mode.

Archival URL mode provides a user experience very close to that of the current production Wayback Machine. All query and replay access requests can be expressed as URLs. In Archival URL replay mode, HTML documents are delivered with additional JavaScript embedded in the page. This JavaScript alters the document within the browser, attempting to make links and embedded content refer back to the Wayback Machine by rewriting them as Archival URLs.

Proxy mode allows archived documents to be replayed within a client browser by configuring the browser to proxy all HTTP requests through the Wayback Machine. This has the strong advantage that no JavaScript page markup is required to coerce the client browser to request additional URLs and embedded content from the Wayback Machine; content just works as-is. One major disadvantage of this mode is that there is no way to forward temporal information with each replay request. Because of this limitation, only the most recently archived version of any resource is accessible through the Wayback Machine in Proxy mode. Another limitation of Proxy mode is that it requires special configuration of the client web browser to access the Wayback service. This browser configuration is not complex, but it means that content cannot be accessed as a global URL.
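
The link rewriting used in Archival URL mode can be illustrated with a small sketch. The real rewriting is done by JavaScript in the browser; the Java below only shows the mapping from an original link to an Archival URL. It assumes a layout of Wayback prefix, 14-digit timestamp, then the original URL, which is not specified in the description above. The class name, method name, and the prefix http://localhost:8080/wayback/ are hypothetical.

```java
import java.net.URI;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of the Wayback Machine code base.
final class ArchivalUrlSketch {
    // Assumed timestamp layout: yyyyMMddHHmmss (14 digits).
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Maps a link found in an archived page to a URL that points back at the
    // Wayback service instead of the live web.
    static String toArchivalUrl(String waybackPrefix, LocalDateTime captured,
                                String pageUrl, String link) {
        // Resolve relative links against the URL of the page being replayed.
        String absolute = URI.create(pageUrl).resolve(link).toString();
        return waybackPrefix + TIMESTAMP.format(captured) + "/" + absolute;
    }

    public static void main(String[] args) {
        String rewritten = toArchivalUrl(
                "http://localhost:8080/wayback/",          // assumed service prefix
                LocalDateTime.of(2006, 1, 2, 3, 4, 5),     // capture time of the page
                "http://example.com/docs/index.html",      // page being replayed
                "about.html");                             // relative link in that page
        System.out.println(rewritten);
    }
}
```

Running main prints http://localhost:8080/wayback/20060102030405/http://example.com/docs/about.html. Because the capture time travels inside the URL, each request carries its own temporal information, which is what Proxy mode cannot do.
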
See the User Manual to learn more about access modes.

The current Java version is intended to operate as a standalone webapp, maintaining an index on the machine hosting the webapp. This index contains records of the resources within a set of ARC files, which are also assumed to be stored on the same machine. The software can scan for ARC files in a specified location and automatically index and serve content in newly discovered ARC files as they appear. Directing the Wayback Machine to look for ARC files in the directory where an instance of the Heritrix web crawler is writing ARC output should make it possible to browse content archived by Heritrix as it is crawled. Future versions of this software may integrate more tightly with the Heritrix web crawler application.

What's New in This Release:

  • A sorted CDX flat file ResourceIndex implementation was added, allowing for much larger data sets.
  • Support for ArchivalUrl Date-Range requests was added.
  • Character set detection was improved so pages are not mangled when server side modification occurs.
  • Several new command-line tools were added for generating and updating each ResourceIndex type.
  • Indexing and merging processing were separated into different threads.
  • Bugfixes were made to allow integration with NutchWax full-text searching.
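
The scan for newly discovered ARC files described above might look roughly like the following. This is a minimal sketch, not the Wayback Machine's own code: the class and method names are hypothetical, and it assumes ARC files can be recognised by a .arc or .arc.gz suffix. It does not try to detect files that a crawler is still writing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch; names are not taken from the Wayback Machine source.
final class ArcDirectoryScanner {
    private final Path arcDir;
    // File names already handed out for indexing by earlier scans.
    private final Set<String> alreadySeen = new HashSet<>();

    ArcDirectoryScanner(Path arcDir) {
        this.arcDir = arcDir;
    }

    // Returns the ARC files that have appeared since the previous call,
    // sorted by path so that results are deterministic.
    List<Path> scanForNewArcs() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (Stream<Path> entries = Files.list(arcDir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> isArcName(p.getFileName().toString()))
                   .sorted()
                   .forEach(p -> {
                       // Set.add returns false for names seen in an earlier scan.
                       if (alreadySeen.add(p.getFileName().toString())) {
                           fresh.add(p);
                       }
                   });
        }
        return fresh;
    }

    // Assumption: ARC files end in .arc or .arc.gz.
    static boolean isArcName(String name) {
        return name.endsWith(".arc") || name.endsWith(".arc.gz");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        ArcDirectoryScanner scanner = new ArcDirectoryScanner(dir);
        // A real service would repeat this on a timer and index each file.
        for (Path arc : scanner.scanForNewArcs()) {
            System.out.println("would index: " + arc);
        }
    }
}
```

Calling scanForNewArcs() repeatedly against a crawler's output directory returns each ARC file once, the first time it is seen, which is the point at which an indexer could pick it up.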

