The notion "Semantic Web" has been widely used (or abused) to qualify very different technologies, such as computer service interoperability, meaning extraction, natural-language computer interaction, or XML schemas for describing taxonomies.

The OpenSem project aims at putting together the best of these technologies, with a very pragmatic approach and a clear focus: putting end users in touch with the richness of web content. OpenSem is a vertical project that will build and integrate content extraction services, semantic analysis services, indexing services, and rich user interfaces.

The ideas behind the OpenSem project come from an analysis that its members share about the web, its future, the way people use it, and the technical difficulties of building a sensible view of it:

1. The web is full of free conceptual content, which is only lightly exploited by the available tools. Wikipedia, for instance, offers a free catalogue of facts, places, and famous people, but this catalogue is not exploited as structured content. A book e-commerce website is also a great catalogue that links the greatest works of the human mind with their authors and with reviews from its users. Yet no one can take advantage of this data as an open semantic asset.

2. Each page, site, or concept available on the web is obviously related to the entities that link to it, to the pages that comment on it, and so on. These data make it possible to give a meaningful context to web content and to compute its relevance. However, they are only available to the few web actors able to put together the heavy and expensive infrastructure needed to process such large amounts of data. Synthetic data about context and relevance could prove very useful to many web applications, provided they were easily available.

3. After decades of research, mature technologies have emerged for ontology analysis and inference. However, these technologies are challenged by the web because of its huge volume and mutation rate. To be effective, semantic technologies need to be adapted to take into account principles such as "freshness", "relevance", and "spam detection", which are core technologies that a search engine masters. Without complete mastery of these search- and relevance-oriented technologies, semantic analysis will fail when facing real, unfiltered web content.

4. Online communities are built around small groups of highly motivated people who are eager to contribute and to discover new things about their subject of expertise. These communities form around a subject that can be broken down into multiple sub-concepts with a complex structure, and they would benefit from a comprehensive view of their shared topic of interest. Consider, for instance, the virtual online community of all web users passionate about ecology. It covers topics such as renewable energy, organic farming, genetically modified organisms, and oil spills. From one site to another, these subjects are tackled from either an analytical or a controversial point of view. Some websites build extended encyclopedic resources around a subject while others react to news. A synthetic view of these sites, along with their resources, topics, and subtopics, would be of great value to community members or to professional analysts.

5. Simple user interfaces are, in the end, the only ones able to attract and retain end users in daily use: a home page, navigation clicks, a search field. User interfaces for the Semantic Web must keep these simple interactions and just make them smoother.

6. To carry out an accurate analysis and classification of content, semantic technology must rely on effective stemming and morphological tagging technologies.

7. Projects that try to mine and explore semantic content are too often limited to the content of a single language, namely English, and do not take into account the problems that arise when dealing with multiple languages. That is also the case for most platforms that create collaborative content and social networks.

8. To stay relevant and reliable in the long term, a semantic platform must open itself up as a collaborative platform on which multiple kinds of web actors can contribute to the common knowledge: content providers, social networks, end users, and so on. All these web actors must be able to use OpenSem not only as a web service providing a semantic context around web content and topics, but also as a place where they can publish knowledge as data.
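The stemming and morphological tagging mentioned in point 6 can be illustrated with a minimal sketch. The suffix rules and the tiny tag lexicon below are illustrative assumptions for demonstration only, not OpenSem's actual modules; a real system would use a full stemmer and a statistical tagger.

```python
# Minimal illustration of stemming and morphological tagging.
# The suffix rules and the tag lexicon are toy assumptions; a real
# module would rely on far richer linguistic resources.

SUFFIX_RULES = [("ation", "ate"), ("ies", "y"), ("ing", ""), ("es", ""), ("s", "")]

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of >= 3 letters."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) + len(replacement) >= 3:
            return w[: len(w) - len(suffix)] + replacement
    return w

# Tiny hand-made lexicon mapping stems to part-of-speech tags (assumed).
TAG_LEXICON = {"farm": "NOUN", "energy": "NOUN", "renew": "VERB"}

def morphotag(word: str) -> tuple:
    """Return (stem, tag); unknown stems fall back to 'UNK'."""
    s = stem(word)
    return s, TAG_LEXICON.get(s, "UNK")

print(morphotag("farms"))    # -> ('farm', 'NOUN')
print(morphotag("renewing")) # -> ('renew', 'VERB')
```

Classifying content by stem rather than surface form is what lets a semantic index treat "farms", "farming", and "farm" as the same concept.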

Acronym OPENSEM (Reference Number: 4276)
Duration 27/11/2008 - 27/05/2011
Project Topic OpenSem aims at gathering and leveraging the best of "Semantic Web" technologies (computer service interoperability, meaning extraction, natural-language computer interaction, XML schemas for describing taxonomies) to dramatically improve end users' experience of the richness of web content.
Project Results
(after finalisation)
The main achievements of the OpenSem project are:
- the OpenMot platform, with an open API that allows a user to annotate documents using semantic processing;
- Priberam's and Synapse's semantic technologies: a sentiment analysis module, a named entity extraction module, an ontology extraction module, and question answering modules;
- the integration of these technologies in a common platform.

As a proof of concept, three prototypes demonstrate different parts of the project results with real-world applications. They address various types of corpora and show the semantic processing developed within the project for the four languages addressed by the consortium.

The first prototype is a vertical search on cameras. With this prototype, users can search for cameras and refine their search very accurately by using the characteristics semantically detected in raw descriptions of the camera models, in any of the four languages handled by the consortium (English, French, Portuguese, and Spanish).

The second prototype is a cross-language natural-language search engine on political news. Users can ask questions in natural language in any of the four languages handled by the consortium. Precise answers are then returned in the languages defined by the user (all four if not specified). The user can then navigate through the results using different refinements on named entities and on the answers themselves.

The third prototype is a sentiment analyser and navigator on movie comments. Users can search for movies or actors and see a sentiment analysis performed on movie comments. Sentiment analysis synthesis is performed for all four languages. Users can give feedback on the opinions analysed by the technological modules.
Navigation offers a high degree of refinement features, so that users can really navigate between results rather than rely on textual search alone.

A website provides general public information about the OpenSem content and technologies.
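The kind of sentiment analysis performed by the third prototype can be sketched, purely as an illustration, with a simple lexicon-based scorer. The word lists and the scoring scheme below are assumptions, not the consortium's actual modules, which handle all four project languages with much richer resources.

```python
# Toy lexicon-based sentiment scorer, sketching the kind of analysis
# the movie-comment prototype performs. The word lists are illustrative
# assumptions; real modules use full linguistic resources per language.

POSITIVE = {"great", "excellent", "moving", "brilliant"}
NEGATIVE = {"boring", "poor", "disappointing", "weak"}

def sentiment(comment: str) -> str:
    """Classify a comment as 'positive', 'negative', or 'neutral'."""
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("A brilliant and moving film"))  # -> positive
print(sentiment("boring and weak plot"))         # -> negative
```

Aggregating such per-comment labels across all comments for a movie or actor is what yields the sentiment synthesis the prototype displays.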
Network Eurostars
Call Eurostars Cut-Off 1

Project partner

Number Name Role Country
3 EXALEAD S.A. Coordinator France
3 Priberam Informática, S.A. Partner Portugal