Information Retrieval and Data Mining
Core course, 9 ECTS credits, winter semester 2019 – 2020
News
- 23.10.2019: First assignment is out!
- 30.10.2019: Slides from Lecture 02 updated.
- 30.10.2019: First assignment is due today at the lecture by 16:15!
- 05.11.2019: Hints added to Assignment 2!
- 15.11.2019: Clarifications added to Assignment 4 Problem 3.
- 26.11.2019: Tutorial presentation bonuses introduced - see Google Groups!
- 09.03.2020: Final Exam grades out - check here. Congrats! :)
- 09.03.2020: Final Exam inspection will be on the 12th of March. Check email in Google group for details.
- 09.03.2020: The re-exam topics will largely be based on those in the assignments and the first exam.
- 06.05.2020: The re-exam date has been decided (see below).
- 14.05.2021: Re-exam grades out - check here. Congrats! :)
Basic Information
Type | Core course, 9 ECTS |
Lecturers | |
Coordinators and Contact | |
Lectures | Wednesdays, 16-18, E1 3 - Hörsaal II (0.02) and Fridays, 14-16, E1 3 - Hörsaal II (0.02) |
Tutorials | |
Exams | Final Exam: Wednesday 26.02.2020, 14:00 - 17:00, Lecture Hall 001, E2.5. Re-exam: Monday 19.10.2020, 14:00 - 17:00, Lecture Hall GHH, 001, 002 and 003, E1 3 |
Teaching Assistants | |
Google Group | IRDM19 |
Lecture Schedule
Lecture | Date | Topic | Lecturer | Reading |
Lecture 01 | Oct 16 | Foundations I | RSR | Aggarwal Ch. 2 |
Lecture 02 | Oct 18 | Foundations II | RSR | Aggarwal Ch. 12 |
Lecture 03 | Oct 23 | Statistics I | AY | Wasserman Ch. 1-5 |
Lecture 04 | Oct 25 | Statistics II | AY | Wasserman Ch. 6, 7, 9, 10 |
Lecture 05 | Oct 30 | Pattern Mining I | RSR | Aggarwal Ch. 4, Zaki & Meira Ch. 8 |
Holiday | | | | |
Lecture 06 | Nov 06 | Pattern Mining II | RSR | Aggarwal Ch. 5, Zaki & Meira Ch. 9, 12 |
Lecture 07 | Nov 08 | Classification | AY | Aggarwal Ch. 10, Zaki & Meira Ch. 18, 19, 22 |
Lecture 08 | Nov 13 | Clustering I | JV | Aggarwal Ch. 6 |
Lecture 09 | Nov 15 | Clustering II | JV | Aggarwal Ch. 7 |
Lecture 10 | Nov 20 | Sequences I | RSR | Aggarwal Ch. 3, 14, 15 |
Lecture 11 | Nov 22 | Sequences II | RSR | Aggarwal Ch. 14, 15 |
Lecture 12 | Nov 27 | Graphs I | RSR | Aggarwal Ch. 17, 19, Zaki & Meira Ch. 4, 11, 16 |
Lecture 13 | Nov 29 | Graphs II | RSR | Zaki & Meira Ch. 16 |
Lecture 14 | Dec 04 | Anomaly Detection | RSR | Aggarwal Ch. 8, 9 |
Lecture 15 | Dec 06 | IR Basics | AY | Manning et al. Ch. 1, 5.1, 6, Zhai & Massung Ch. 8 |
Lecture 16 | Dec 11 | Ranking I | AY | Manning et al. Ch. 6, 12, Zhai & Massung Ch. 6 |
Lecture 17 | Dec 13 | Preprocessing & Evaluation | AY | Manning et al. Ch. 2.1-2.2, 3.3, 8, Zhai & Massung Ch. 9 |
Lecture 18 | Dec 18 | Ranking II | AY | Manning et al. Ch. 11, 18, Zhai & Massung Ch. 17 |
Lecture 19 | Dec 20 | Indexing | AY | Manning et al. Ch. (3,) 4, 5 |
Christmas break | | | | |
Lecture 20 | Jan 08 | Link Analysis | RSR | Manning et al. Ch. 21, Aggarwal Ch. 18 |
Lecture 21 | Jan 10 | Click Analysis | RSR | |
Lecture 22 | Jan 15 | Neural IR I | AY | |
Lecture 23 | Jan 17 | Neural IR II | AY | Deep Learning Book Ch. 9, MacAvaney et al. 2019, Dai & Callan 2019 |
Lecture 24 | Jan 22 | Query Expansion | AY | Manning et al. Ch. 9, 19.6 |
Lecture 25 | Jan 24 | Entities in IR | AY | |
Lecture 26 | Jan 29 | Question Answering Systems | RSR | |
Lecture 27 | Jan 31 | Recap | RSR, AY |
Tutorial Schedule
Release Date | Submission Date | Tutorial Date | Topic | Exercise Sheet | Solution |
Oct 23 | Oct 30 | Nov 4/5 | Foundations | ||
Oct 30 | Nov 6 | Nov 11/12 | Statistics | ||
Nov 6 | Nov 13 | Nov 18/19 | Pattern Mining | ||
Nov 13 | Nov 20 | Nov 25/26 | Classification | ||
Nov 20 | Nov 27 | Dec 2/3 | Clustering | ||
Nov 27 | Dec 4 | Dec 9/10 | Sequences | ||
Dec 4 | Dec 11 | Dec 16/17 | Graphs | ||
Dec 11 | Dec 18 | Jan 6/7 | IR Basics | ||
Dec 18 | Jan 8 | Jan 13/14 | Ranking and Evaluation | ||
Jan 8 | Jan 15 | Jan 20/21 | Ranking and Indexing | ||
Jan 15 | Jan 22 | Jan 27/28 | Link and Click Analysis | Assignment 11 | Solution 11 |
Jan 22 | Jan 29 | Feb 3/4 | Neural IR | | Solution 12 |
Jan 29 | Feb 5 | Feb 10/11 | Query Expansion | ||
Feb 5 | Feb 12 | Feb 17/18 | Entities and QA |
Course Contents
Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.
Prerequisites
Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.
Tutorials and Exercises
After you receive the assignment sheet, you solve the problems (individually) at home, and submit them on the appointed dates to the TAs before the lecture (by 16:15). During the tutorial sessions, the TAs will ask some of you to present your solutions. Every student must present their solutions at least 2 times during the semester. The TAs will also help in clarifying your answers. Your submitted sheets will be graded and handed back to you at the end of the session.
To do the exercises, you have to study the required reading material and go through the slides.
We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be de-registered from the course.
Grading and Requirements for Passing the Course
The overall grade will be the better of the final exam and re-exam results (there will be no further attempts). There will be no mid-term exams. The final exam is closed-book and no discussion is allowed.
To participate in the final written exam, the following prerequisites are required:
- Submit ALL 14 assignments
- Obtain 50% or more on average over all assignments (an average of 80% or more earns a bonus point, which improves the final exam grade by one grade step where possible)
- Present solutions at least 2 times in the tutorials
Literature
We will use the following primary textbooks.
For Probability and Statistics,
- Larry Wasserman: All of Statistics, Springer, 2004.
For Data Mining,
- Charu Aggarwal: Data Mining - The Textbook, Springer, 2015.
- Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.
For Information Retrieval,
- Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
- ChengXiang Zhai and Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016.
These and additional references are available in the library: