Wikiprep: Perl script for preprocessing Wikipedia XML dumps
Wikiprep Ranking & Summary
- License: GPL
- Price: Free
- Publisher Name: Evgeniy Gabrilovich
- Publisher web site:
- Operating Systems: Mac OS X
- File Size: 26 KB
Wikiprep Description
Wikiprep is a Perl script that parses MediaWiki data dumps in XML format and extracts useful information from them. It implements a subset of MediaWiki syntax, including template inclusion with parameters, external and internal links, redirects, and headings.

Output is written to several files: some in a simple, line-oriented format and some in XML. One of the files contains the processed Wikipedia pages in a simple HTML-like syntax.

The goal of Wikiprep is to convert Wikipedia data dumps into a format that other tools can process easily, so that those tools do not need to know every quirk and odd corner of MediaWiki syntax.

Requirements: Perl
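To give a rough feel for the kind of preprocessing described above, here is a minimal Python sketch that streams through a tiny, hand-written dump and pulls out page titles, redirect targets, and internal links. It is not Wikiprep's code and does not reproduce its output files. The sample XML, the regular expressions, and the names `extract_pages`, `LINK_RE`, and `REDIRECT_RE` are all assumptions made for illustration, and templates and most other MediaWiki syntax are ignored.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a MediaWiki XML dump; real dumps use an XML
# namespace, carry much more metadata, and are far larger.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Perl</title>
    <revision><text>Perl is a [[programming language]] created by [[Larry Wall|Wall]].</text></revision>
  </page>
  <page>
    <title>Perl language</title>
    <revision><text>#REDIRECT [[Perl]]</text></revision>
  </page>
</mediawiki>"""

# Simplified patterns (assumptions): [[Target]], [[Target|label]], [[Target#section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def local(tag):
    """Strip an XML namespace prefix such as '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(stream):
    """Yield one dict per <page>, reading the dump incrementally."""
    title, text = None, ""
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            match = REDIRECT_RE.match(text)
            if match:
                yield {"title": title, "redirect": match.group(1).strip(), "links": []}
            else:
                links = [target.strip() for target in LINK_RE.findall(text)]
                yield {"title": title, "redirect": None, "links": links}
            title, text = None, ""
            elem.clear()  # release the finished page so memory use stays flat


if __name__ == "__main__":
    dump = io.BytesIO(SAMPLE_DUMP.encode("utf-8"))
    # Emit a simple tab-separated, line-oriented record per page.
    for page in extract_pages(dump):
        print(page["title"], page["redirect"] or "", "|".join(page["links"]), sep="\t")
```

Streaming with `iterparse` rather than loading the whole tree is a deliberate choice in this sketch, since full Wikipedia dumps are too large to hold in memory comfortably.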