HIGGINS
The HIGGINS project aims to combine crowdsourcing with automated information extraction techniques to enable high-quality fact extraction from complex natural-language text.
Overview
Ambiguity, complexity, and diversity in natural-language expressions are major obstacles to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noisy output. With the advent of human computing, computationally hard tasks have been addressed through human input. Text-based knowledge acquisition can benefit from this approach, but humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today, and paying for crowdsourced acquisition at that scale quickly becomes prohibitively expensive. HIGGINS employs principled methods to garner human computing input effectively and use it to improve the extraction of knowledge-base facts from natural-language text. The idea is to complement automatic extraction techniques with human computing, reaping the benefits of both while overcoming each other's limitations.
The HIGGINS architecture combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. The IE engine combines statistics derived from Web corpora (Wikipedia and ClueWeb) with semantic resources (WordNet and ConceptNet) to construct a large dictionary of entity and relational phrases. It employs statistical language models designed specifically for phrase relatedness to generate questions and relevant candidate answers, which are then presented to human workers. In our experiments we extract relation-centric facts about fictional characters in narrative text, where the diversity and complexity of relational expressions are far more pronounced.
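To make the question-generation step more concrete, here is a minimal Python sketch under loudly stated assumptions: the relation dictionary is a plain list of phrases, and relatedness is scored with simple Jaccard token overlap. That overlap score is a hypothetical stand-in for illustration only, not the statistical language models HIGGINS actually uses, and `Question`, `relatedness`, and `build_question` are illustrative names. The appended "no relation" and "other relation" options mirror the columns of the released result files.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A multiple-choice question for a human worker (hypothetical structure)."""
    entity_one: str
    entity_two: str
    sentences: list
    candidates: list = field(default_factory=list)


def tokens(text):
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w}


def relatedness(context, relation):
    # Stand-in score: Jaccard overlap of context and relation tokens.
    # HIGGINS uses statistical language models here; this is NOT that model.
    a, b = tokens(context), tokens(relation)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_question(entity_one, entity_two, sentences, dictionary, k=3):
    """Rank dictionary relations against the sentences and keep the top k."""
    context = " ".join(sentences)
    # sorted() is stable, so ties keep their dictionary order.
    ranked = sorted(dictionary, key=lambda rel: relatedness(context, rel), reverse=True)
    # Fallback options mirroring the "no relation" / "other relation" columns.
    options = ranked[:k] + ["no relation", "other relation"]
    return Question(entity_one, entity_two, list(sentences), options)


q = build_question(
    "Elizabeth", "Darcy",
    ["Elizabeth is married to Darcy at the end."],
    ["is married to", "works for", "is the father of", "betrays"],
)
print(q.candidates)
# ['is married to', 'is the father of', 'works for', 'no relation', 'other relation']
```

Keeping the two fallback options outside the ranking means a worker can always reject every automatically proposed relation, whatever the scorer returns.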
If you use HIGGINS in scientific work, please cite this paper:
Sarath Kumar Kondreddi, Peter Triantafillou and Gerhard Weikum. Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), Chicago, IL, USA, 2014.
People
- Weikum, Gerhard
- Triantafillou, Peter
- Kondreddi, Sarath
Publications
- Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition
IEEE International Conference on Data Engineering, Chicago, IL, USA, 2014.
- Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
Human Computing Games for Knowledge Acquisition. Demonstration Track.
ACM International Conference on Information and Knowledge Management, San Francisco, CA, USA, 2013.
- Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
HIGGINS: Knowledge Acquisition Meets the Crowds. Poster Track.
ACM 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 2013.
HIGGINS Data
Dictionary of Relations
- Link: Hand-crafted relations
- Link: Person-person relations from ReVerb extractions on ClueWeb09
- Link: Person-person relations from PATTY
Format (all three files): <lemmatized relation> <TAB> <relation string> <TAB> <POS string>
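A small reader for these dictionary files might look like the Python sketch below. It assumes one relation per line with exactly three tab-separated fields (<lemmatized relation>, <relation string>, <POS string>). The sample rows, including their POS tag strings, are made up for illustration, and `RelationEntry`, `read_relations`, and `group_by_lemma` are hypothetical names. Grouping by the lemmatized form collects the surface variants of the same relation.

```python
from collections import defaultdict
from typing import NamedTuple


class RelationEntry(NamedTuple):
    lemma: str    # <lemmatized relation>
    phrase: str   # <relation string>
    pos: str      # <POS string>


def read_relations(lines):
    """Parse dictionary lines of the form lemma<TAB>phrase<TAB>POS."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields: {line!r}")
        entries.append(RelationEntry(*fields))
    return entries


def group_by_lemma(entries):
    """Collect all surface strings that share a lemmatized relation."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.lemma].append(entry.phrase)
    return dict(groups)


# Made-up sample rows; the real POS strings may look different.
sample = [
    "be married to\twas married to\tVBD VBN TO\n",
    "be married to\tis married to\tVBZ VBN TO\n",
]
entries = read_relations(sample)
print(group_by_lemma(entries))
# {'be married to': ['was married to', 'is married to']}
```

For a downloaded file, passing an open file object (for example `open(path, encoding="utf-8")`, where the UTF-8 encoding is itself an assumption) works the same way, since `read_relations` only iterates over lines.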
Experiments
Higgins Results
Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)
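Because each row holds a variable number of candidate facts, a parser has to split off the five fixed leading fields and then read the remainder as (option, ground truth, crowdsource) triples, the last two of which are the "no relation" and "other relation" options. The Python sketch below assumes the CSV files are standard comma-separated text with quoted fields and no header row, and it keeps the ground-truth and crowdsource values as raw strings because their encoding is not documented here. The sample row and its values are made up, and all class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Judgement:
    option: str        # relation text, or the no/other-relation option
    ground_truth: str  # raw value; its encoding is not documented here
    crowdsource: str   # raw value; its encoding is not documented here


@dataclass
class ResultRow:
    question_id: str
    title: str
    entity_one: str
    entity_two: str
    sentences: str
    facts: list              # variable number of candidate-fact Judgements
    no_relation: Judgement
    other_relation: Judgement


def parse_row(fields):
    """Split one row into 5 fixed fields plus (option, truth, crowd) triples."""
    fields = [f.strip() for f in fields]
    # Need the 5 fixed fields, the 2 trailing triples, and only complete triples.
    if len(fields) < 11 or (len(fields) - 5) % 3:
        raise ValueError(f"unexpected field count: {len(fields)}")
    head, rest = fields[:5], fields[5:]
    triples = [Judgement(*rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return ResultRow(*head, facts=triples[:-2],
                     no_relation=triples[-2], other_relation=triples[-1])


def read_results(text):
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [parse_row(row) for row in reader if row]


# Made-up sample row with one candidate fact.
sample = ('q1,Pride and Prejudice,Elizabeth,Darcy,"Elizabeth marries Darcy.",'
          'is married to,1,1,no relation,0,0,other relation,0,0\n')
row = read_results(sample)[0]
print(row.facts[0].option, row.no_relation.crowdsource)
# is married to 0
```

Relying on `csv.reader` rather than a plain `split(",")` matters here, since the <sentences> field is free text that may itself contain commas.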
Prominent Set
- Movie Plots Prominent Set: CSV, HTML
- Movie Cast Prominent Set: CSV, HTML
- Book Plots Prominent Set: CSV, HTML
- Book Cast Prominent Set: CSV, HTML
Random Set
- Movie Plots Random Set: CSV, HTML
- Movie Cast Random Set: CSV, HTML
- Book Plots Random Set: CSV, HTML
- Book Cast Random Set: CSV, HTML
C. Comparison of Higgins Components
Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)
Statistics Only
Random Set
- Movie Plots Random Set: CSV, HTML
- Movie Cast Random Set: CSV, HTML
- Book Plots Random Set: CSV, HTML
- Book Cast Random Set: CSV, HTML
Semantics Only
Random Set
- Movie Plots Random Set: CSV, HTML
- Movie Cast Random Set: CSV, HTML
- Book Plots Random Set: CSV, HTML
- Book Cast Random Set: CSV, HTML
E. HC Only
Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)
Prominent Set
- Movie Plots Prominent Set: CSV, HTML
- Movie Cast Prominent Set: CSV, HTML
- Book Plots Prominent Set: CSV, HTML
- Book Cast Prominent Set: CSV, HTML
Random Set
- Movie Plots Random Set: CSV, HTML
- Movie Cast Random Set: CSV, HTML
- Book Plots Random Set: CSV, HTML
- Book Cast Random Set: CSV, HTML
D. OLLIE extractions
Format: (<question id>, <entity one>, <entity two>, <sentences>, <number of sentences>)
Note: Each (entity one, entity two) pair may have multiple extractions; extractions for the same pair can be identified by their shared question id.
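Since one entity pair can span several rows, a natural first step is to group the OLLIE rows by question id. The Python sketch below assumes standard comma-separated CSV with quoted fields, no header row, exactly the five fields listed above, and an integer <number of sentences>. The sample rows are made up, and `group_ollie_extractions` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def group_ollie_extractions(text):
    """Group OLLIE rows by question id; one entity pair may span several rows."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue
        # Raises ValueError if a row does not have exactly five fields.
        qid, e1, e2, sentences, n_sentences = (f.strip() for f in row)
        groups[qid].append({
            "pair": (e1, e2),
            "sentences": sentences,
            "num_sentences": int(n_sentences),  # assumes an integer count
        })
    return dict(groups)


# Made-up sample rows: q7 has two extractions for the same pair.
sample = (
    'q7,Elizabeth,Darcy,"Darcy proposes to Elizabeth.",1\n'
    'q7,Elizabeth,Darcy,"Elizabeth rejects Darcy.",1\n'
    'q8,Jane,Bingley,"Jane loves Bingley.",1\n'
)
groups = group_ollie_extractions(sample)
print({qid: len(rows) for qid, rows in groups.items()})
# {'q7': 2, 'q8': 1}
```

Grouping on the question id rather than the entity names keeps the grouping aligned with the question ids used in the result files.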
Prominent Set
- Movie Plots Prominent Set: CSV, HTML
- Movie Cast Prominent Set: CSV, HTML
- Book Plots Prominent Set: CSV, HTML
- Book Cast Prominent Set: CSV, HTML
Random Set
- Movie Plots Random Set: CSV, HTML
- Movie Cast Random Set: CSV, HTML
- Book Plots Random Set: CSV, HTML
- Book Cast Random Set: CSV, HTML
HIGGINS Games
Please follow this link: http://higgins.mpi-inf.mpg.de/games/index.html