Wikipedia is a huge source of multilingual knowledge curated by human contributors. Wiki articles are independently written in the various languages and may cover different perspectives about a given subject. The aim of this paper is to exploit Wikipedia multilingual information for knowledge enrichment and summarization. Investigating the link structure of a Wiki article in a source language and comparing it with the structure of articles about the same subject written in other languages gives insights about the body of knowledge shared among languages. This investigation is also useful to identify knowledge perspectives not covered in the source language but covered in other languages. We implemented these ideas in CAKES, which: i) exploits Wikipedia information on the fly without requiring any data preprocessing; ii) enables to specify the set of languages to be considered and; iii) ranks subjects interesting for a given article on the basis of their popularity among languages. © 2012 The Author(s).
CAKES: Cross-lingual wikipedia knowledge enrichment and summarization
FIONDA, Valeria;PIRRO', GIUSEPPE
2012-01-01
Abstract
Wikipedia is a huge source of multilingual knowledge curated by human contributors. Wiki articles are independently written in the various languages and may cover different perspectives about a given subject. The aim of this paper is to exploit Wikipedia multilingual information for knowledge enrichment and summarization. Investigating the link structure of a Wiki article in a source language and comparing it with the structure of articles about the same subject written in other languages gives insights about the body of knowledge shared among languages. This investigation is also useful to identify knowledge perspectives not covered in the source language but covered in other languages. We implemented these ideas in CAKES, which: i) exploits Wikipedia information on the fly without requiring any data preprocessing; ii) enables to specify the set of languages to be considered and; iii) ranks subjects interesting for a given article on the basis of their popularity among languages. © 2012 The Author(s).I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.