Project: Applying Textual Analysis Tools for Visual Search

Current internet search technologies can search internet content by keywords only. Users must express a search query in a form that often yields many irrelevant results.

There are several common problems with current text search:
• Text search is language specific: when a user searches by keywords, they must use a specific language to specify the search. Even within a given language, there are many terminologies and ways to describe the same thing.
• Text search is ambiguous: for non-trivial searches, the user must know which keywords the requested website uses.

Visual search supports the natural way in which users want to search when it is difficult to express the query in words, or when they are on the go and see something they want.
Visual search is a necessary and natural extension of current search tools, and it has attracted the attention of all the internet giants, which have their eye on the enormous potential of the e-shopping market.

Because digitized images consist of arrays of pixel intensities with no inherent meaning, the image databases that contain them are generally unstructured. Content-Based Image Retrieval (CBIR) is the process of searching for and retrieving images from an unstructured database based on information extracted from the content of those images.
When searching a database for images, CBIR systems base their retrievals on the content of an image, and not on external tags such as file names, captions, headings or keywords attached as metatags. This focus on content, rather than on manually defined external tags, gives CBIR systems the potential to be qualitatively more effective for image searches than any other type of search.

CBIR systems currently operate effectively only on certain characteristics of the content of an image, the most common of which are color, texture and shape. Queries based on these characteristics yield visually similar images that have no logical correlation between them and that are, most of the time, completely irrelevant. To search images efficiently, a search based on features derived through logical inference of the objects in the image is required. Queries based on logical features will retrieve similar objects rather than similar images.

With the solution that will be developed in the project, the user will be able to query the web by picture. The system will immediately detect objects and, through real-time comparison of the elements' footprints, present the user with information about the selected objects and with images from all over the web that contain similar objects. This means that a user can immediately get images with objects similar to the one in the selected picture, along with links and information extracted from these links.
The solution will be based on cutting-edge image identification technology and a novel indexing method for feature vectors and text mining.
The solution will also be offered as a browser plug-in that enables users to select an image while they surf the internet by right-clicking with the mouse.
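To illustrate why queries on low-level characteristics such as color tend to return similar images rather than similar objects, the following minimal Python sketch ranks images by color histogram intersection. It is not the ATAT-VS method: the function names, the quantization into 4 bins per channel and the toy pixel data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram (hypothetical helper)."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero on empty input
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: Sequence[Pixel],
                  database: Dict[str, Sequence[Pixel]]) -> List[Tuple[str, float]]:
    """Rank database images by color similarity to the query, best first."""
    q = color_histogram(query)
    scored = [(name, histogram_intersection(q, color_histogram(px)))
              for name, px in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumed): each "image" is just a flat list of pixels.
query = [(200, 10, 10)] * 90 + [(255, 255, 255)] * 10            # a red apple
database = {
    "green_apple": [(20, 200, 20)] * 90 + [(255, 255, 255)] * 10,
    "red_car": [(210, 20, 20)] * 80 + [(250, 250, 250)] * 20,
}
ranking = rank_by_color(query, database)
print([name for name, _ in ranking])  # ['red_car', 'green_apple']
```

In this toy case the red car outranks the green apple for a red-apple query, because only the color distribution is compared. This is the kind of result that a search based on logical features of the detected objects is intended to avoid.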

Acronym ATAT-VS (Reference Number: 5241)
Duration 15/03/2010 - 31/01/2013
Project Topic The solution to be developed will allow web users to query the web by image. The system will detect objects in the uploaded or selected images and, based on real-time comparison of the detected objects, present the user with information about the objects or with images containing similar objects.
Project Results
(after finalisation)
1. Crawling:
Work on the crawling module has been fundamental. A study and a technical analysis were carried out to assess whether to change the platform completely or to improve the existing module. As a result, we kept the existing module and evolved it considerably, in terms of both functionality and industrialisation, to make it more reliable in the ATAT-VS environment.
In terms of results, the current crawler module is more robust, can manage larger volumes and includes image crawling.
2. ULI:
This module, initiated during the first year of the project, is now able to extract information associated with the crawled images. This data is composed of: the text of the link, the title of the link, the alternative text of the image, and a selection of the page content related to the image. The module manages expressions, code page normalisation and format normalisation, and provides output in SQL or XML format. The second period of the project was used to consolidate and industrialise this module.
3. Representative terms identification:
This module, initiated during the first year of the project, extracts representative and differentiating words and expressions from a text. It is integrated into the ULI module so that this functionality can be applied to the textual data associated with images and thus characterise them better. This is a multilingual module.
Work during the second period of the project focused on this module. Many iterations between Pertimm and Corrigon were performed to test and assess this module in many cases. Each iteration was used to identify positive and negative points, and their analysis led to the specification of improvements for the next iteration. The first iterations were quality oriented, which required resources for analysis, algorithm definition and development. The latest iterations focused more on scalability issues and led to a module that is well industrialised and exploitable, even if some light improvements may still be necessary.
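The idea of extracting representative and differentiating terms from the text associated with images can be sketched with TF-IDF scoring. The project description does not state which algorithm the module uses, so TF-IDF is an assumed stand-in here, and the function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (works on Unicode text)."""
    return re.findall(r"\w+", text.lower())


def representative_terms(docs: Dict[str, str], top_k: int = 3) -> Dict[str, List[str]]:
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    n_docs = len(docs)
    result: Dict[str, List[str]] = {}
    for doc_id, words in tokens.items():
        term_freq = Counter(words)
        # A term that occurs in every document gets idf = log(1) = 0,
        # so it is never considered differentiating.
        scores = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, then alphabetically for stable ties.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[doc_id] = ranked[:top_k]
    return result


# Hypothetical text gathered around three crawled images
# (for example link text, link title and alternative text).
docs = {
    "img1": "red leather handbag handbag with gold zipper",
    "img2": "red running shoes with white laces",
    "img3": "red cotton dress with floral print",
}
print(representative_terms(docs)["img1"])  # ['handbag', 'gold', 'leather']
```

Words shared by every text, such as "red" and "with" in this sample, score zero and drop out, while terms specific to one image rise to the top. A production module would also need multi-word expressions and language-specific processing, which this sketch leaves out.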
Network Eurostars
Call Eurostars Cut-Off 3

Project partner

Number Name Role Country
2 Pertimm Partner France
2 Corrigon LTD Coordinator Israel