Named Entity Extraction and Disambiguation from an
Uncertainty Perspective
Mena B. Habib, Maurice van Keulen
Database group, University of Twente, The Netherlands {m.b.habib, m.vankeulen}@ewi.utwente.nl
Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the Semantic Web. This work addresses two problems with named entity extraction and disambiguation. First, almost no existing work examines the interdependency between extraction and disambiguation. Second, existing disambiguation techniques mostly take extracted named entities as input without considering the uncertainty and imperfection of the extraction process.
The aim of this work is to investigate both avenues and to show that explicitly handling the uncertainty of annotation has much potential for making both extraction and disambiguation more robust. We conducted experiments on a set of holiday home descriptions, with the aim of extracting and disambiguating toponyms as a representative example of named entities. We show that the effectiveness of extraction influences the effectiveness of disambiguation and, reciprocally, that retraining the extraction models with information automatically derived from the disambiguation results improves the extraction models. This mutual reinforcement is shown to have an effect even after several iterations.
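The reinforcement loop between extraction and disambiguation can be illustrated with a minimal sketch. It is not the method used in the experiments: the gazetteer, the capitalisation-based extractor, the country-vote disambiguation, and the per-token weight update (a stand-in for retraining a statistical extraction model) are all hypothetical simplifications, as are the names `GAZETTEER`, `extract`, `disambiguate`, and `reinforce`.

```python
from collections import Counter

# Hypothetical toy gazetteer: surface form -> list of (entity id, country).
GAZETTEER = {
    "Paris": [("Paris_FR", "FR"), ("Paris_US", "US")],
    "Lyon": [("Lyon_FR", "FR")],
    "Nice": [("Nice_FR", "FR")],
}

def extract(text, weights, threshold=0.5):
    """Return candidate toponyms with a confidence instead of a hard yes/no."""
    candidates = {}
    for token in text.replace(",", " ").replace(".", " ").split():
        if token[0].isupper():
            # Capitalised tokens not seen before start at an uncertain 0.5.
            conf = weights.get(token, 0.5)
            if conf >= threshold:
                candidates[token] = conf
    return candidates

def disambiguate(candidates):
    """Resolve each candidate to the gazetteer entry in the best-supported country."""
    votes = Counter()
    for name, conf in candidates.items():
        entries = GAZETTEER.get(name, [])
        for _, country in entries:
            # Spread the extraction confidence over the possible referents.
            votes[country] += conf / len(entries)
    resolved = {}
    for name in candidates:
        entries = GAZETTEER.get(name, [])
        if entries:
            resolved[name] = max(entries, key=lambda e: votes[e[1]])[0]
    return resolved

def reinforce(text, rounds=3, step=0.2):
    """Alternate extraction and disambiguation, feeding results back as weights."""
    weights = {}
    resolved = {}
    for _ in range(rounds):
        candidates = extract(text, weights)
        resolved = disambiguate(candidates)
        for name, conf in candidates.items():
            # Assumed feedback rule: reward candidates that could be resolved,
            # penalise those that could not.
            delta = step if name in resolved else -step
            weights[name] = min(1.0, max(0.0, conf + delta))
    return extract(text, weights), resolved

final, resolved = reinforce("Lovely house near Paris, between Lyon and Nice.")
```

In this toy setting the capitalised non-toponym "Lovely" is initially extracted with low confidence, fails disambiguation, and is dropped in later rounds, while the ambiguous "Paris" is resolved through the other toponyms in the same description.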
References
1. M. van Keulen and Mena B. Habib. “Handling Uncertainty in Information Extraction.” In Proceedings of the 7th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2011), pages 109-112, 2011.
2. Mena B. Habib and M. van Keulen. “Named Entity Extraction and Disambiguation: The Reinforcement Effect.” In Proceedings of the 5th International Workshop on Management of Uncertain Data (MUD 2011), co-located with the International Conference on Very Large Data Bases (VLDB 2011), pages 9-16, 2011.