Duke: A fast deduplication engine
Duke Description
Duke is a fast and flexible deduplication (also known as entity resolution or record linkage) engine written in Java on top of Lucene. At the moment it can process 1,000,000 records in 11 minutes (roughly 1,500 records per second) on a standard laptop in a single thread. It includes a command-line tool that can read CSV, JDBC, SPARQL, and NTriples data. There is also an API for programming incremental processing and for storing the results of processing in a relational database.