Information Extraction
Block seminar, 7 ECTS credits, winter semester 2016–17
Basic Information
- Type: block seminar
- Lecturer: Jannik Strötgen
- Credits: 7 ECTS credits
- Registration: The seminar and the waiting list are full. No further registrations can be considered, sorry!
- Block seminar: February 9 and February 10, 2017 (for details, see schedule below)
News
The block seminar is over. The final grades can be found here. Note that the list contains the last four digits of your "Matrikelnummer".
Topics
In this seminar, we will cover topics such as named entity recognition and normalization, temporal information extraction, relation extraction, and fact extraction. A list of topics can be found below.
Organization
Registration and initial information
- The number of participants is limited, but you can be put on the waiting list. Participation in the kick-off meeting and the first lecture is mandatory; if registered students do not show up, their places are given to waitlisted students.
- To register, please send me an email with: (i) your name, (ii) Matrikelnummer, (iii) preferred email address, (iv) field of study, and (v) semester (incl. whether BA or MA).
- You will receive a reply telling you whether you are registered or waitlisted.
The seminar is a block seminar and will take place on two consecutive days at the end of January or the beginning of February; the exact days will be agreed on with the participants. However, there will also be two meetings at the beginning of the semester:
October 26, 2016 -- Kick-off meeting (participation is mandatory)
- date and time: October 26, 2016, 2:15 pm - 3:45 pm
- place: seminar room 23, MPI-Inf building (E 1.4, ground level)
- explanation of the structure and organization of the seminar
- brief introduction to information extraction
- presentation of the topics
November 2, 2016 -- Lecture (participation is mandatory)
- date and time: November 2, 2016, 2:15 pm - 3:45 pm
- place: seminar room 0.01, MMCI building (E 1.7)
- "How to prepare and present a seminar talk"
- As this is a block seminar, it is particularly crucial that the students' presentations are of high quality. This lecture aims at preparing the participants in such a way that their slides and presentations will be of high quality.
Schedule
slides and material can be found below
- October 26, 2016:
- Kick-off meeting
- November 2, 2016:
- "How to prepare and present a seminar talk"
- November 2, 2016:
- students send a ranked list of their top 3 topics via email
- November 30, 2016:
- students send a suggestion of the outline of their seminar paper, including an itemization of the planned content for each section
- January 11, 2017:
- students submit their final seminar paper
- two weeks before the first day of the block seminar:
- students send preliminary slides
- two days before the first day of the block seminar:
- students send their final slides which they will use in the block seminar
- block seminar, day 1: February 9, 2017
- block seminar, day 2: February 10, 2017
Schedule of the block seminar
might be subject to change
All seminar papers are now available to the participants. Get them here.
Day 1 - February 9, 2017 -- 10:15 to 15:45
- 10:15 -- 10:25 Introduction
- 10:25 -- 11:00 Talk 1-1
- 11:00 -- 11:35 Talk 1-2
- 11:35 -- 12:10 Talk 1-3
- 12:10 -- 12:45 Talk 1-4
- 12:45 -- 14:00 lunch
- 14:00 -- 14:35 Talk 1-5
- 14:35 -- 15:10 Talk 1-6
- 15:10 -- 15:45 Talk 1-7
Day 2 - February 10, 2017 -- 10:15 to 15:45
- 10:15 -- 10:50 Talk 2-1
- 10:50 -- 11:25 Talk 2-2
- 11:25 -- 12:00 Talk 2-3
- E-1: Open Domain Event Extraction -- Alexander Mohr [slides]
- Ritter et al. (2012): Open Domain Event Extraction from Twitter, KDD. [pdf]
- 12:00 -- 12:35 Talk 2-4
- 12:35 -- 13:45 lunch break
- 13:45 -- 14:20 Talk 2-5
- 14:20 -- 14:55 Talk 2-6
- 14:55 -- 15:30 Talk 2-7
- 15:30 -- 15:45 Final Words, Discussion, Wrap-up
Rules and Grading
- participation in the kick-off meeting, the lecture and both days of the block seminar is mandatory
- students will be assigned a particular topic and have to hand in a seminar paper (template will be provided) and give a presentation (20 minutes + 10 minutes for discussion)
- grading will be based on
- the report
- the presentation
- knowledge of the subject (as evidenced in the discussion after the presentation)
- activity in the discussions
- ability to stick to deadlines
- Attention: According to the study regulations, you are only allowed to withdraw from the seminar within three weeks after the kick-off meeting, i.e., until November 16. Later withdrawal counts as "failed".
Slides, Templates, and Material
all documents are password protected
Slides (if you are not one of my students but are interested in my slides, just contact me)
2016-10-26: organization and introduction [pdf]
2016-11-02: "How to present" lecture slides [pdf]
Templates
2016-11-02: LaTeX beamer template for presentations [pdf] [tar.gz]
2016-11-02: LaTeX seminar paper template [pdf] [tar.gz]
Material
- Introduction
- Sarawagi (2008): Information Extraction, Foundations and Trends in Databases, 2008. [pdf]
- Named Entity Recognition and Disambiguation
- N-1: Nadeau and Sekine (2007): A Survey of Named Entity Recognition and Classification, Linguisticae Investigationes, 2007. [pdf]
- N-2: Hoffart et al. (2011): Robust Disambiguation of Named Entities in Text, EMNLP, 2011. [pdf]
- N-3: Luo et al. (2015): Joint Named Entity Recognition and Disambiguation, EMNLP, 2015. [pdf]
- N-4: Guo et al. (2009): Named Entity Recognition in Query, SIGIR, 2009. [pdf]
- Temporal and Geographic Information Extraction
- T-1: Strötgen, Gertz (2013): Multilingual and Cross-domain Temporal Tagging, Language Resources and Evaluation, 2013. [pdf]
- T-2: Kuzey et al. (2016): As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes, WWW, 2016. [pdf]
- T-3: Lieberman, Samet (2012): Adaptive Context Features for Toponym Resolution in Streaming News, SIGIR, 2012. [pdf]
- Relation Extraction and Text Mining
- R-1: Zhu et al. (2009): StatSnowball: a Statistical Approach to Extracting Entity Relationships, WWW, 2009. [pdf]
- R-2: del Corro, Gemulla (2013): ClausIE: Clause-Based Open Information Extraction, WWW, 2013. [pdf]
- R-3: Yeung and Jatowt (2011): Studying How the Past is Remembered: Towards Computational History through Large Scale Text Mining, CIKM, 2011. [pdf]
- Knowledge and Commonsense Harvesting
- K-1: Hoffart et al. (2013): YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia, AI journal, 2013. [pdf]
- K-2: Tandon et al. (2014): WebChild: Harvesting and Organizing Commonsense Knowledge from the Web, WSDM, 2014. [pdf]
- Event Extraction
- Word Representations