Project: Chemically Informed Knowledge Extraction from Literature

A large proportion of scientific knowledge is available only as text. This may be scientific papers, internal reports, patents, or text fields within semi-structured data such as medical records or electronic lab notebooks. Given the unprecedented and continuing growth in scientific information, new ways are required to extract the information needed to inform commercial and academic research and development.

Over the last decade, improved search technology has made it easier for scientists to find relevant documents. However, key information may be in lower-ranked documents, and scientists can fail to make connections where information is spread across multiple documents or across different types of document, such as medical records and scientific literature. Advanced text mining technology has recently emerged as an automated solution for extracting and connecting information from various sources at large scale. The technology has been making a large impact in the life sciences, for example in pre-clinical safety, systems biology, and target selection.

This project will provide a chemically aware text mining solution for information analysis. The system will restructure the text-based information using chemical knowledge, mapping from textual descriptors to explicit chemical structures. To achieve this, the system will need to recognise chemicals expressed in a variety of formats and transform them into a standard chemical structure description format. These formats include chemical names, proprietary drug names, names based on the structure (IUPAC name, International Chemical Identifier) and Markush descriptors. This structured view of the data will be fully integrated into the text mining system to allow users to perform chemical, textual, numerical and biological searches, and to extract structured information for further analysis. This functionality will enable users to gather chemical structures and other data, then manage and search based on the chemical structures. The technology is relevant across all the chemistry-related sciences.

The project brings together Linguamatics, the leading text mining provider in the life sciences, with ChemAxon, a leading supplier of chemical software. By pooling the expertise of the two companies, we believe we can provide new software that will introduce the benefits of text mining to the chemical community.

Acronym ChiKEL (Reference Number: 5492)
Duration 01/06/2010 - 31/05/2012
Project Topic Development of an interactive text mining platform focussed on the needs of chemists, combining innovative text mining technology with state-of-the-art chemical software.
Project Results
We have developed a chemically aware text mining system integrating ChemAxon's name-to-structure code with the I2E system. This was released to customers as part of the Linguamatics I2E 4.0 software release in December 2012. We also intend to publish chemically enabled indexes on our hosted patent solution during February 2013.
Network Eurostars
Call Eurostars Cut-Off 4

Project partner

Number  Name                  Role         Country
2       Linguamatics Limited  Coordinator  United Kingdom
2       ChemAxon Kft.         Partner      Hungary