SOFIE: A Self-Organizing Framework for Information Extraction
SOFIE is a system for automated ontology extension. It parses natural-language documents, extracts ontological facts from them, and links these facts into an ontology. SOFIE applies logical reasoning to both the existing knowledge and the newly extracted knowledge in order to disambiguate words to their most probable meaning, to reason about the meaning of text patterns, and to take world-knowledge axioms into account. This allows SOFIE to check the plausibility of hypotheses and to avoid inconsistencies with the ontology. The SOFIE framework unites pattern matching, word sense disambiguation, and ontological reasoning in a single model.
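The interplay of disambiguation, pattern evidence, and world-knowledge axioms described above can be illustrated with a small toy sketch. It is not SOFIE's implementation: the data, the weights, the brute-force search, and all names (`ONTOLOGY`, `CANDIDATES`, `consistent`, `best_interpretation`) are assumptions made for this example. The sketch takes one sentence fragment, "Einstein was born in Munich", assumes the pattern is already known to express `bornIn`, and picks the word meaning whose resulting fact does not contradict the ontology.

```python
# Toy ontology: facts already known (illustrative data).
ONTOLOGY = {("bornIn", "AlbertEinstein", "Ulm")}

# World-knowledge axiom (assumed): a functional relation allows
# at most one object per subject, e.g. one birthplace per person.
FUNCTIONAL = {"bornIn"}

# One text occurrence: ambiguous word, relation expressed by the pattern, object.
OCCURRENCE = ("Einstein", "bornIn", "Munich")

# Candidate meanings of the ambiguous word with made-up prior weights.
CANDIDATES = {"AlbertEinstein": 0.7, "AlfredEinstein": 0.3}

# Made-up reward for explaining the occurrence with an accepted fact.
EVIDENCE_WEIGHT = 1.0


def consistent(facts):
    """Return True if ontology plus hypothesized facts respect the functional axiom."""
    seen = {}
    for rel, subj, obj in ONTOLOGY | facts:
        if rel in FUNCTIONAL:
            # setdefault stores the first object seen and returns it afterwards.
            if seen.setdefault((rel, subj), obj) != obj:
                return False
    return True


def best_interpretation():
    """Enumerate all (meaning, accept-fact) combinations and keep the best consistent one."""
    _word, rel, obj = OCCURRENCE
    best = (float("-inf"), None, frozenset())
    for entity, prior in CANDIDATES.items():
        for accept in (False, True):
            facts = {(rel, entity, obj)} if accept else set()
            if not consistent(facts):
                continue  # hypothesis contradicts the ontology: implausible
            score = prior + (EVIDENCE_WEIGHT if accept else 0.0)
            if score > best[0]:
                best = (score, entity, frozenset(facts))
    return best


if __name__ == "__main__":
    score, entity, facts = best_interpretation()
    print(entity, sorted(facts), round(score, 2))
```

In this toy setting the more popular reading, `AlbertEinstein`, cannot take the new fact because the ontology already records Ulm as his birthplace, so the less likely reading `AlfredEinstein` wins once the textual evidence is counted. Exhaustive enumeration only works for a handful of hypotheses; a real system would need a scalable reasoning procedure.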
More recently, SOFIE has been extended to generalize textual patterns; the resulting project is called PROSPERA. SOFIE and PROSPERA are part of the YAGO-NAGA project at the Max Planck Institute for Informatics in Saarbrücken, Germany.
Please find more information below.
Downloads
Download the Java source code of SOFIE (CC-BY license).
Publications
- Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Scalable Knowledge Harvesting with High Precision and High Recall" (pdf)
4th ACM International Conference on Web Search and Data Mining (WSDM 2011)
- Ndapandula Nakashole, Martin Theobald and Gerhard Weikum
"Find your Advisor: Robust Knowledge Gathering from the Web" (pdf)
13th International Workshop on the Web and Databases (WebDB 2010) (co-located with SIGMOD/PODS 2010)
- Fabian M. Suchanek, Mauro Sozio and Gerhard Weikum
"SOFIE: A Self-Organizing Framework for Information Extraction" (pdf, bib, TechRep pdf)
18th International World Wide Web Conference (WWW 2009)
- Fabian M. Suchanek
"Automated Construction and Growth of a Large Ontology" (pdf, abstract)
PhD thesis
People
- Suchanek, Fabian
- Sozio, Mauro
- Weikum, Gerhard