HIGGINS

The HIGGINS project aims to combine crowdsourcing with automated information extraction techniques to enable high-quality fact extraction from complex textual inputs.

Overview

Ambiguity, complexity, and diversity in natural-language expressions are major hindrances to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human input. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today, and paying for crowdsourced acquisition can quickly become prohibitively expensive. HIGGINS employs principled methods to effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. The idea is to complement automatic extraction techniques with human computing, reaping the benefits of both while each compensates for the other's limitations.

The HIGGINS architecture combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. The IE engine combines statistics derived from Web corpora (Wikipedia and ClueWeb) with semantic resources (WordNet and ConceptNet) to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to generate questions and relevant candidate answers, which are presented to human workers. In our experiments we extract relation-centric facts about fictional characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced.

For scientific work, please cite this paper:

Sarath Kumar Kondreddi, Peter Triantafillou, and Gerhard Weikum. Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition. In Proceedings of the International Conference on Data Engineering (ICDE), Chicago, IL, USA, 2014.

People

Weikum, Gerhard
Triantafillou, Peter
Kondreddi, Sarath

Publications

Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition
IEEE International Conference on Data Engineering, Chicago, IL, USA, 2014.

Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
Human Computing Games for Knowledge Acquisition. Demonstration Track.
ACM International Conference on Information and Knowledge Management, San Francisco, CA, USA, 2013.

Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum
HIGGINS: Knowledge Acquisition Meets the Crowds. Poster Track.
ACM 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 2013.

HIGGINS Data

Dictionary of Relations

Link: Hand-crafted relations
Format: <lemmatized relation> tab-space <relation string> tab-space <POS string>
Link: Person-person relations from ReVerb extractions on ClueWeb09
Format: <lemmatized relation> tab-space <relation string> tab-space <POS string>
Link: Person-person relations from PATTY
Format: <lemmatized relation> tab-space <relation string> tab-space <POS string>
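The three dictionary files share one line format, so a single reader can cover all of them. The following is a minimal sketch, assuming that "tab-space" means a single tab character between fields and that each line holds exactly three fields. The names RelationEntry and parse_relation_lines are hypothetical, and the sample row is made up for illustration; it is not taken from the HIGGINS data.

```python
from typing import Iterable, List, NamedTuple


class RelationEntry(NamedTuple):
    lemma: str   # <lemmatized relation>
    phrase: str  # <relation string>
    pos: str     # <POS string>


def parse_relation_lines(lines: Iterable[str]) -> List[RelationEntry]:
    """Parse dictionary lines, assuming three tab-separated fields per line."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        entries.append(RelationEntry(*(p.strip() for p in parts)))
    return entries


# Made-up sample input, for illustration only.
sample = [
    "fall in love with\tfell in love with\tVBD IN NN IN\n",
    "\n",
    "line without any tab characters\n",
]
entries = parse_relation_lines(sample)
```

To read one of the downloaded files, the same function can be given an open file handle, since it only iterates over lines.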

Experiments

HIGGINS Results

Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)
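Each record in this format has five leading fields followed by a variable number of (candidate, ground truth, crowdsource) triples, the last two being the "no relation" and "other relation" options. The sketch below shows one way to split such a record, assuming the CSV files can be read with Python's standard csv module. The function name parse_result_row and the sample values are hypothetical; how ground truth and crowdsource values are encoded is not stated here, so they are kept as plain strings.

```python
import csv
import io
from typing import Dict, List

HEADER_FIELDS = 5  # question id, title, entity one, entity two, sentences


def parse_result_row(row: List[str]) -> Dict[str, object]:
    """Split one record into its header fields and candidate triples."""
    fields = [f.strip() for f in row]
    head, tail = fields[:HEADER_FIELDS], fields[HEADER_FIELDS:]
    if len(head) < HEADER_FIELDS or len(tail) % 3 != 0:
        raise ValueError("row does not match the assumed record layout")
    # Each triple is (candidate, ground truth, crowdsource), kept as strings.
    candidates = [tuple(tail[i:i + 3]) for i in range(0, len(tail), 3)]
    return {
        "question_id": head[0],
        "title": head[1],
        "entity_one": head[2],
        "entity_two": head[3],
        "sentences": head[4],
        "candidates": candidates,
    }


# Made-up sample record, for illustration only.
sample_csv = (
    "q1,Some Title,Alice,Bob,Alice met Bob.,"
    "friend of,1,3,no relation,0,0,other relation,0,1\n"
)
record = parse_result_row(next(csv.reader(io.StringIO(sample_csv))))
```

Keeping the triples in a list preserves their order, so the "no relation" and "other relation" options stay the last two entries.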

Prominent Set

Movie Plots Prominent Set CSV HTML
Movie Cast Prominent Set CSV HTML
Book Plots Prominent Set CSV HTML
Book Cast Prominent Set CSV HTML

Random Set

Movie Plots Random Set CSV HTML
Movie Cast Random Set CSV HTML
Book Plots Random Set CSV HTML
Book Cast Random Set CSV HTML

C. Comparison of Higgins Components

Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)

Statistics Only

Random Set

Movie Plots Random Set CSV HTML
Movie Cast Random Set CSV HTML
Book Plots Random Set CSV HTML
Book Cast Random Set CSV HTML

Semantics Only

Random Set

Movie Plots Random Set CSV HTML
Movie Cast Random Set CSV HTML
Book Plots Random Set CSV HTML
Book Cast Random Set CSV HTML

E. HC Only

Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)

Prominent Set

Movie Plots Prominent Set CSV HTML
Movie Cast Prominent Set CSV HTML
Book Plots Prominent Set CSV HTML
Book Cast Prominent Set CSV HTML

Random Set

Movie Plots Random Set CSV HTML
Movie Cast Random Set CSV HTML
Book Plots Random Set CSV HTML
Book Cast Random Set CSV HTML

D. OLLIE extractions

Format: (<question id> , <entity one> , <entity two> , <sentences> , <number of sentences>)
Note: Each <entity one>-<entity two> pair may have multiple extractions (these can be identified by the <question id>).
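Because one entity pair can appear in several rows, it can help to collect the rows per question id before comparing them with other results. The sketch below assumes that rows sharing a question id belong to the same entity pair, which is one reading of the note above, and that rows are already split into five fields. The function name group_by_question and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_question(rows: Iterable[List[str]]) -> Dict[str, List[dict]]:
    """Collect OLLIE extraction rows under their question id."""
    grouped = defaultdict(list)
    for row in rows:
        if len(row) != 5:
            continue  # skip rows that do not match the assumed layout
        qid, entity_one, entity_two, sentences, count = (f.strip() for f in row)
        grouped[qid].append({
            "entity_one": entity_one,
            "entity_two": entity_two,
            "sentences": sentences,
            "number_of_sentences": count,  # kept as a string
        })
    return dict(grouped)


# Made-up sample rows, for illustration only.
sample_rows = [
    ["q7", "Alice", "Bob", "Alice met Bob.", "1"],
    ["q7", "Alice", "Bob", "Bob thanked Alice.", "1"],
    ["q9", "Carol", "Dan", "Carol hired Dan.", "1"],
]
extractions = group_by_question(sample_rows)
```

With the sample rows, the two "q7" rows end up in one list, so all extractions for that pair can be inspected together.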

Prominent Set

Movie Plots Prominent Set CSV HTML
Movie Cast Prominent Set CSV HTML
Book Plots Prominent Set CSV HTML
Book Cast Prominent Set CSV HTML

Random Set

Movie Plots Random Set CSV HTML
Movie Cast Random Set CSV HTML
Book Plots Random Set CSV HTML
Book Cast Random Set CSV HTML

HIGGINS Games

Please follow this link: http://higgins.mpi-inf.mpg.de/games/index.html