Wikiprep

Perl script for preprocessing Wikipedia XML dumps

Wikiprep Ranking & Summary

  • License: GPL
  • Price: FREE
  • Publisher Name: Evgeniy Gabrilovich
  • Operating Systems: Mac OS X
  • File Size: 26 KB

Wikiprep Description

Wikiprep is a Perl script that parses MediaWiki data dumps in XML format and extracts useful information from them. Wikiprep implements a subset of MediaWiki syntax (such as template inclusion with parameters, external and internal links, redirects, and headings). Output takes the form of several files: some in a simple, line-oriented format and some in XML. One of the files also contains processed Wikipedia pages in a simple HTML-like syntax.

The goal of Wikiprep is to convert Wikipedia data dumps into a format that can be easily processed with other tools. These tools then do not need full knowledge of all the quirks and odd corners of MediaWiki syntax.

Requirements:
  • Perl
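As a rough illustration of the kind of extraction described above, the sketch below streams a tiny, hand-written sample shaped like a MediaWiki XML dump and pulls out page titles, redirect targets, internal links, and headings. It is not Wikiprep's code: it is written in Python rather than Perl, the sample data, regular expressions, and function names (`extract_pages`, `local_name`) are all assumptions made for this example, and it ignores templates and most other MediaWiki syntax.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a MediaWiki XML dump.
# Real dumps carry an XML namespace and many more fields per page.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Perl</title>
    <revision>
      <text>== History ==
'''Perl''' was created by [[Larry Wall]]. See also [[Regular expression|regexes]].</text>
    </revision>
  </page>
  <page>
    <title>PERL</title>
    <revision>
      <text>#REDIRECT [[Perl]]</text>
    </revision>
  </page>
</mediawiki>"""

# Illustrative patterns only; they cover a small part of the syntax.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(stream):
    """Yield one dict per <page>, parsing incrementally to bound memory."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            redirect = REDIRECT_RE.match(text)
            yield {
                "title": title,
                "redirect": redirect.group(1).strip() if redirect else None,
                "links": [target.strip() for target in LINK_RE.findall(text)],
                "headings": [h[1] for h in HEADING_RE.findall(text)],
            }
            elem.clear()  # release the finished page subtree
            title, text = None, ""


pages = list(extract_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for page in pages:
    print(page["title"], page["redirect"], page["links"], page["headings"])
```

Incremental parsing matters here because full Wikipedia dumps are far too large to load as a single tree. A downstream tool would consume records like these instead of handling raw wiki markup itself.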

