Project: The Dictionary/Grammar Reading Machine: Computational Tools for Accessing the World's Linguistic Heritage

Acronym DReaM (Reference Number: JPICH.DH.17.010)
Duration 01/01/2018 - 31/12/2020
Project Topic The diversity of the world's 6,500 languages embodies a wealth of information on human cognition and the history of populations. As languages go extinct, the linguistic heritage of humankind increasingly resides in grammars and dictionaries, which are rapidly accumulating. Accessing this heritage requires that the descriptions are available and that someone reads them. Availability is a problem because publications are often difficult to access. The project aims to enhance access to the world’s linguistic heritage by making an existing collection of more than 9,000 PDF documents that are no longer protected by copyright available in a stable archive, enriched with added metadata and with computational tools developed to search for information within the texts. Moreover, a number of dictionaries will be converted to apps for mobile devices that can be distributed to speakers of minority languages, handing back to these speakers some of their linguistic heritage. The next step, that of reading language descriptions, sounds trivial, but when all relevant publications are taken into account, a researcher who would like to access information on all the world’s languages is faced with hundreds of thousands of publications. Therefore, another aim of the project is to develop information-extraction tools specifically tailored to the task of dealing with language descriptions. Using cutting-edge methods from Machine Learning and Natural Language Processing, the researchers intend to build a system that can extract millions of snippets of information and link them so that it is possible to construct individual language profiles from a variety of sources and to output comparative databases for typological and historical linguistics.
Network JPI Cultural Heritage
Call JPI Cultural Heritage - Digital Heritage Joint Call

Project partners

Number  Name                   Role         Country
1       Uppsala University     Coordinator  Sweden
2       Gothenburg University  Observer     Sweden
3       Leiden University      Partner      Netherlands
4       LLACAN / CNRS          Partner      France