wayback

Open source Java implementation of the Internet Archive Wayback Machine

wayback Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: Brad Tofel
  • Publisher web site: http://www.archive.org/index.php
  • Operating Systems: Mac OS X
  • File Size: 58.2 MB

wayback Description

Open source Java implementation of the Internet Archive Wayback Machine.

The current production version of the Wayback Machine is implemented in Perl and lacks maintainability and extensibility. Also, the code is not open source. The primary motivation for the new version is to address these three issues, enabling public distribution of the application and easy experimentation with new features and access technologies.

The current Java version of the Wayback Machine supports three access, or Replay, modes of operation: "Archival URL" mode, "Proxy" mode, and "Domain Prefix" mode.

Archival URL mode provides a user experience very close to the current production Wayback Machine. All query and replay access requests can be expressed as URLs. In Archival URL replay mode, archived content is modified as it is returned to users, attempting to make links and embedded content refer back to the Wayback Machine by rewriting them as Archival URLs.

Proxy mode allows replaying of archived documents within a client browser by configuring the browser to proxy all HTTP requests through the Wayback Machine. This has the strong advantage that no JavaScript or server-side page markup is required to coerce the client browser to request additional URLs and embedded content from the Wayback Machine; content just works as-is. When used with the Firefox plugin extension, client browsers can navigate between versions of the current document, and the Wayback Machine server will attempt to display images from the same time period as the pages being viewed. Proxy mode requires special configuration of the client web browser to access the Wayback service. This browser configuration is not complex, but it means that content will not be available as a global URL.

Domain Prefix mode is similar to Archival URL mode, but uses a wildcard DNS scheme to rewrite URLs, allowing all URL substitution to occur on the server. This mode is considered experimental.

The current Java version can operate in several deployment modes, ranging from a standalone application on a single host holding all archived documents and indexes, up to a highly distributed system where indexes and archived content are spread across hundreds of machines.

In the local, standalone mode, this software can scan for new archived content in a specified location and automatically index and serve the new content as it appears. Directing the Wayback to look for ARC files in the directory where an instance of the Heritrix web crawler is writing ARC output should make it possible to browse content archived by Heritrix as it is crawled.

Requirements:

· Java
· Tomcat

What's New in This Release:

· Completely new implementation of ResourceStore classes, including recursive local directory scanning, scanning multiple local directories, an experimental remote directory scanning capability, and groundwork for future support of both non ARC/WARC file formats and large scale automatic indexing.
· Complete overhaul of the Replay system, allowing jspInserts within ArchivalUrl, DomainPrefix, and Proxy replay modes. Also includes groundwork for future fine-grained mime-type and url-based Replay customizations.
· Added capability to explicitly set the Locale to use for an AccessPoint, overriding the default behavior of using the user agent's specified preferred language.
· New flat file implementation of FileLocationDB. See CDXCollection.xml within the .war file for an example usage.
· AnchorDate feature, tracking the date on which a user begins a replay session. During this session, Wayback will always attempt to remain near this date, preventing time-drift within a replay session.
· AnchorWindow feature, which allows users to specify a maximum time window, in either direction of the AnchorDate, within which they wish to view replayed content. When a user has set this option, Wayback will not display captures outside the specified window.
· New command line tool location-db to create a location DB offline, populating it with lines read from STDIN.
· Added new AccessControlSettingOperation authentication control component, allowing the configuration of the appropriate Exclusion system per request, as defined by arbitrary BooleanOperators. See ComplexAccessPoint.xml within the .war file for an example usage.
· Added .asx archival URL replay, which rewrites links inside archived .asx files, attempting to make them point back into the Wayback service.
· Now accept "http:/" as identical to "http://" at the beginning of a URL, working around a browser bug which stripped multiple "/"s in URL paths.
· Refactoring of ResourceIndex interfaces, to allow for future update-able ResourceIndex implementations beyond BDBIndex based ResourceIndexes.
· Major internal refactoring of the WaybackRequest object, providing more stable get/set methods for accessing the standard internal fields with type-safety.
· Major internal refactoring of SearchResults, which was previously under-specified and often confusing, into CaptureSearchResults and UrlSearchResults. These new classes provide more stable get/set methods for accessing the standard internal fields with type-safety.
· Changed locations of replay, query, and exception .jsp files within the .war file to underneath WEB-INF, so they are not directly accessible via HTTP.
· German translation of default Wayback UI. Thanks Andreas!
· Czech translation of default Wayback UI. Thanks Lukáš Matějka!
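
Two of the release notes above describe behavior that is easy to picture in code: accepting "http:/" as identical to "http://", and hiding captures that fall outside an AnchorWindow around the AnchorDate. The following is only an illustrative sketch of those two ideas, not the Wayback implementation. The class name WaybackSketch and the methods normalizeScheme and withinAnchorWindow are hypothetical names invented for this example, and captures are modelled as plain timestamps, which is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Wayback code base.
final class WaybackSketch {

    // Treat "http:/host/path" as "http://host/path", mimicking the described
    // workaround for browsers that collapse repeated "/" characters.
    static String normalizeScheme(String url) {
        String broken = "http:/";
        if (url.startsWith(broken) && !url.startsWith("http://")) {
            return "http://" + url.substring(broken.length());
        }
        return url;
    }

    // Keep only capture timestamps within `window` of `anchor`, in either
    // direction. Assumption: a capture is represented by its Instant alone.
    static List<Instant> withinAnchorWindow(List<Instant> captures,
                                            Instant anchor,
                                            Duration window) {
        List<Instant> kept = new ArrayList<>();
        for (Instant capture : captures) {
            Duration distance = Duration.between(anchor, capture).abs();
            if (distance.compareTo(window) <= 0) {
                kept.add(capture);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(normalizeScheme("http:/example.com/page.html"));

        Instant anchor = Instant.parse("2008-01-01T00:00:00Z");
        List<Instant> captures = List.of(
                Instant.parse("2007-12-30T00:00:00Z"),
                Instant.parse("2008-01-05T00:00:00Z"),
                Instant.parse("2008-06-01T00:00:00Z"));
        // With a 7-day window, the June capture is filtered out.
        System.out.println(withinAnchorWindow(captures, anchor, Duration.ofDays(7)));
    }
}
```

The window check is symmetric because the release note speaks of a window "in either direction" of the AnchorDate. How the real software stores the anchor and applies the filter is not stated here.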

