PROSPERA: PRospering knOwledge with Scalability, PrEcision, and RecAll
PROSPERA is a scalable, Hadoop-based knowledge-harvesting engine that combines pattern-based gathering of relational fact candidates with weighted MaxSat-based consistency reasoning to identify the most likely correct facts.
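To make the consistency-reasoning idea concrete, here is a minimal toy sketch in Python. It is not PROSPERA's implementation: the fact candidates, their weights, the `hasAdvisor` relation, and the single "at most one advisor per student" constraint are all assumptions chosen for illustration, and the brute-force search stands in for a real weighted MaxSat solver.

```python
from itertools import product

# Hypothetical fact candidates with made-up confidence weights.
# In a harvesting system such weights would be derived from pattern matches.
candidates = {
    ("hasAdvisor", "Alice", "Bob"): 0.9,
    ("hasAdvisor", "Alice", "Carol"): 0.4,
    ("hasAdvisor", "Dave", "Carol"): 0.7,
}


def violates_functional(accepted):
    """Assumed hard constraint: a subject has at most one object per relation."""
    seen = set()
    for relation, subject, _ in accepted:
        key = (relation, subject)
        if key in seen:
            return True
        seen.add(key)
    return False


def best_consistent_subset(candidates):
    """Brute-force weighted MaxSat over the candidates.

    Each candidate is a soft unit clause with its weight; the functional
    constraint is hard. Exponential in the number of candidates, so this
    is for illustration only.
    """
    facts = list(candidates)
    best, best_weight = [], 0.0
    for mask in product([False, True], repeat=len(facts)):
        accepted = [f for f, keep in zip(facts, mask) if keep]
        if violates_functional(accepted):
            continue
        weight = sum(candidates[f] for f in accepted)
        if weight > best_weight:
            best, best_weight = accepted, weight
    return best, best_weight


accepted, total = best_consistent_subset(candidates)
```

In this toy instance the two candidates for Alice's advisor conflict, so the search keeps the higher-weighted one and drops the other, while the unrelated fact about Dave is accepted unchanged.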
Publications
- Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Scalable Knowledge Harvesting with High Precision and High Recall" (pdf)
4th ACM International Conference on Web Search and Data Mining (WSDM 2011)
- Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Find your Advisor: Robust Knowledge Gathering from the Web" (pdf)
13th International Workshop on the Web and Databases (WebDB 2010) (co-located with SIGMOD/PODS 2010)
Source Code
Download code here: prospera.tar.gz
Experiments
The following data is for the experiments reported in N. Nakashole et al., WSDM 2011.
Experiment 1: Scalability Experiment (sports relations)
- Precision/Recall per iteration
- Extracted Facts
- Labeled Samples
- Constraints
- Seeds
Experiment 2: Constraints experiment (academic relations)
- Precision/Recall per iteration
- Extracted Facts
- Labeled Samples
- Constraints
- Seeds (from the YAGO ontology)