PROSPERA: PRospering knOwledge with Scalability, PrEcision, and RecAll

PROSPERA is a scalable, Hadoop-based knowledge-harvesting engine. It combines pattern-based gathering of relational fact candidates with weighted MaxSat-based consistency reasoning to identify the facts that are most likely to be correct.
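
As a rough illustration of the consistency-reasoning idea, the sketch below treats each fact candidate as a Boolean variable with a soft weight and treats functionality of a relation (one object per subject) as a hard constraint. It then picks the consistent subset with the highest total weight. This is a toy under stated assumptions, not PROSPERA's reasoner: the candidate facts, the weights, the function names and the exhaustive search are all made up for illustration, and exhaustive search is only feasible for a handful of candidates.

```python
from itertools import product

# Hypothetical fact candidates (relation, subject, object) with made-up weights.
candidates = {
    ("hasAdvisor", "Alice", "Bob"): 0.9,
    ("hasAdvisor", "Alice", "Carol"): 0.4,
    ("graduatedFrom", "Alice", "MIT"): 0.7,
}


def violates_functional(facts, relation):
    """Return True if `relation` maps some subject to more than one object."""
    seen = {}
    for rel, subj, obj in facts:
        if rel != relation:
            continue
        # Remember the first object per subject; any different object conflicts.
        if seen.setdefault(subj, obj) != obj:
            return True
    return False


def solve(candidates, functional_relations):
    """Exhaustively find the maximum-weight subset that satisfies all hard constraints."""
    facts = list(candidates)
    best_weight, best_set = -1.0, set()
    for bits in product([False, True], repeat=len(facts)):
        chosen = {f for f, keep in zip(facts, bits) if keep}
        # Hard constraints: discard any assignment that breaks functionality.
        if any(violates_functional(chosen, r) for r in functional_relations):
            continue
        # Soft clauses: each accepted candidate contributes its weight.
        weight = sum(candidates[f] for f in chosen)
        if weight > best_weight:
            best_weight, best_set = weight, chosen
    return best_set, best_weight


accepted, total = solve(candidates, ["hasAdvisor"])
for fact in sorted(accepted):
    print(fact)
```

Here the two hasAdvisor candidates for the same student conflict, so only the higher-weighted one can be accepted alongside the graduatedFrom candidate. A realistic setting would need an approximate or specialized MaxSat solver rather than enumeration.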

Publications

Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Scalable Knowledge Harvesting with High Precision and High Recall" (pdf)
4th ACM International Conference on Web Search and Data Mining (WSDM 2011)

Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Find your Advisor: Robust Knowledge Gathering from the Web" (pdf)
13th International Workshop on the Web and Databases (WebDB 2010), co-located with SIGMOD/PODS 2010

Source Code

Download code here: prospera.tar.gz

Experiments

The following data are from the experiments reported in Nakashole et al. (WSDM 2011).

Experiment 1: Scalability experiment (sports relations)

Precision/Recall per iteration: Sports precision/recall
Extracted Facts: Iteration 1; Iteration 2; Iteration 3; Iteration 4; Iteration 5; Iteration 6; PROSPERA Variant: NoReasoner; PROSPERA Variant: UnWeighted
Labeled Samples: 1; 2; 3; 4; 5; 6; NoReasoner; UnWeighted
Constraints: Sports constraints
Seeds: Sports relation seeds

Experiment 2: Constraints experiment (academic relations)

Precision/Recall per iteration: Academia precision/recall
Extracted Facts: Iteration 1; Iteration 2; PROSPERA Variant: NoReasoner
Labeled Samples: 1; 2; NoReasoner
Constraints: Academic constraints
Seeds: From the YAGO ontology