Semantic Culturomics
This project is developed jointly with the DBWeb team of Télécom ParisTech.
The last decade has seen the rise of large knowledge bases, such as YAGO, DBpedia, Freebase, or NELL. In this project, we show how this structured knowledge can help understand and mine trends in unstructured data. By combining YAGO with the archive of the French newspaper Le Monde, we can conduct analyses that would not be possible with word frequency statistics alone. We find indications of the increasing role that women play in politics, of the impact that the city of birth can have on a person's career, and of the average age of famous people in different professions. By mining commercial products from the Web, we can trace the global trade flow on a map.
People
- Huet, Thomas
- Suchanek, Fabian
Publications
- Aliaksandr Talaika, Joanna “Asia” Biega, Antoine Amarilli, Fabian M. Suchanek:
“IBEX: Harvesting Entities from the Web Using Unique Identifiers” (pdf)
Workshop paper at Web and Databases (WebDB) at SIGMOD, 2015
See also: IBEX Web page
- Fabian M. Suchanek, Nicoleta Preda:
“Semantic Culturomics” (pdf) (slides)
Vision paper at Very Large Databases (VLDB), 2014
- Original paper:
Thomas Huet, Joanna “Asia” Biega, Fabian M. Suchanek:
“Mining History with Le Monde” (pdf)
Workshop paper at Automated Knowledge Base Construction (AKBC) at CIKM, 2013
Results
Temporal statistics
Mentions of men and women over time.
Average age of people per occupation.
Mentions of countries per continent.
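As a rough illustration of how temporal statistics like these can be derived once entity mentions in articles are linked to a knowledge base, here is a minimal sketch. It is not the project's actual pipeline: the dictionary `KB`, the list `MENTIONS`, and both function names are hypothetical stand-ins, and the data is a toy example rather than real YAGO or Le Monde content.

```python
from collections import defaultdict

# Hypothetical toy knowledge base (a stand-in for YAGO-style facts).
KB = {
    "Angela_Merkel": {"gender": "female", "born": 1954, "occupation": "politician"},
    "Jacques_Chirac": {"gender": "male", "born": 1932, "occupation": "politician"},
    "Zinedine_Zidane": {"gender": "male", "born": 1972, "occupation": "footballer"},
}

# Hypothetical entity mentions, assumed to be already extracted from
# articles: (publication year, entity identifier).
MENTIONS = [
    (1998, "Zinedine_Zidane"),
    (1998, "Jacques_Chirac"),
    (2006, "Angela_Merkel"),
    (2006, "Jacques_Chirac"),
    (2006, "Zinedine_Zidane"),
]


def share_of_women_per_year(mentions, kb):
    """Fraction of mentions per year that refer to women."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for year, entity in mentions:
        gender = kb.get(entity, {}).get("gender")
        if gender in ("female", "male"):
            counts[year][gender] += 1
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }


def average_age_per_occupation(mentions, kb):
    """Average age at the time of mention, grouped by occupation."""
    ages = defaultdict(list)
    for year, entity in mentions:
        facts = kb.get(entity)
        if facts is None:
            continue  # entity unknown to the knowledge base
        ages[facts["occupation"]].append(year - facts["born"])
    return {occ: sum(a) / len(a) for occ, a in ages.items()}


shares = share_of_women_per_year(MENTIONS, KB)
avg_ages = average_age_per_occupation(MENTIONS, KB)
```

Neither statistic can be obtained from word frequencies alone, because gender, birth year, and occupation come from the knowledge base rather than from the article text.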
Spatial statistics
Importance of the capital per country. In articles about countries in blue, the capital is never mentioned, whereas for countries in red, it is always mentioned.
Percentage of foreign companies mentioned per country. For countries in blue, only local companies are mentioned, whereas for countries in red, only foreign companies are mentioned.
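The capital statistic above can be read as a ratio between 0 (blue, capital never mentioned) and 1 (red, capital always mentioned). The following sketch shows one way such a ratio could be computed. It is an assumption about the computation, not the authors' code: `CAPITAL_OF`, `ARTICLES`, and `capital_mention_ratio` are hypothetical names, and each article is reduced to the set of entities it mentions.

```python
# Hypothetical lookup taken from a knowledge base: country -> capital.
CAPITAL_OF = {"France": "Paris", "Germany": "Berlin"}

# Hypothetical articles, each represented by the set of entities it mentions.
ARTICLES = [
    {"France", "Paris"},
    {"France"},
    {"Germany", "Berlin"},
    {"Germany", "Berlin", "France"},
]


def capital_mention_ratio(articles, capital_of):
    """For each country, the fraction of articles mentioning the country
    that also mention its capital."""
    ratios = {}
    for country, capital in capital_of.items():
        about = [a for a in articles if country in a]
        if about:  # skip countries that never appear
            ratios[country] = sum(capital in a for a in about) / len(about)
    return ratios


ratios = capital_mention_ratio(ARTICLES, CAPITAL_OF)
```

The foreign-company percentage could be computed in the same way, with the knowledge base supplying each company's home country instead of each country's capital.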