Duke

A fast deduplication engine

Duke Ranking & Summary

  • License: Apache License 2.0
  • Publisher Name: Lars Marius Garshol
  • File Size: 1.8 MB

Duke Description

Duke is a fast and flexible deduplication (also called entity resolution or record linkage) engine written in Java on top of Lucene. At the moment it can process 1,000,000 records in 11 minutes on a standard laptop in a single thread. It consists of a command-line tool that can read CSV, JDBC, SPARQL, and NTriples data. There is also an API for programming incremental processing and for storing the results of processing in a relational database.
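To make the idea of deduplication concrete, here is a minimal sketch of pairwise record matching in Python. It is not Duke's API or algorithm: the function names (`similarity`, `record_score`, `find_duplicates`), the sample records, and the 0.85 threshold are all assumptions chosen for illustration, and the string comparison uses the standard library's `difflib` rather than anything Duke ships.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity of two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_score(r1: dict, r2: dict, fields: list) -> float:
    """Average the per-field similarities (hypothetical scoring rule)."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_duplicates(records: list, fields: list, threshold: float = 0.85) -> list:
    """Compare every pair of records and return the id pairs scoring at or above threshold."""
    pairs = []
    for r1, r2 in combinations(records, 2):
        if record_score(r1, r2, fields) >= threshold:
            pairs.append((r1["id"], r2["id"]))
    return pairs


# Made-up sample data: records 1 and 2 differ only in case and whitespace.
records = [
    {"id": "1", "name": "Acme Corporation", "city": "Oslo"},
    {"id": "2", "name": "ACME Corporation ", "city": "oslo"},
    {"id": "3", "name": "Globex Inc", "city": "Bergen"},
]

duplicates = find_duplicates(records, ["name", "city"])
print(duplicates)
```

This sketch compares every pair of records, so its cost grows quadratically with the number of records. An engine meant for millions of records has to avoid that, typically by using an index to retrieve only plausible candidates for each record before comparing them.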

