The Quaero project – new European search technology
In the first of several articles on European search engine research, Pandia takes a look at the French search engine Quaero.
Although the search engine scene is dominated by American companies like Google, Yahoo!, Microsoft and Ask, there is quite a lot of search engine research going on in Europe as well.
What is typical of many of these projects is that they are technology driven, i.e. they take a bottom-up approach in which research institutions and companies, supported by public institutions, try to develop new search technologies that businesses can apply at a later stage.
The Chorus project
The Chorus project is an EU Commission coordination project that aims to “enhance the interactions and to invoke discussions between the key players of the IST ‘Cluster’ and thus to stimulate the creation of the European Research Area (ERA)”.
This is Euro-speak meaning that Chorus is to function as a learning arena for companies and research institutions involved in information and communication technology innovation.
Chorus is, for instance, supporting the preparation of a road map for the development of audio-visual search engines in the European Union.
The Quaero multimedia search adventure
One of these search engines is the French Quaero. At a recent Chorus workshop Pieter van der Linden and Henri Gouraud presented the current state of the project.
Van der Linden and Gouraud argue that multimedia content will become more and more important, and that such content will be accessed from different types of devices, including mobile PCs, phones and cameras.
Moreover, the traditional distinction between consumers and producers will vanish. As we have pointed out here at Pandia, the arrival of services like YouTube and Flickr shows that the user has become a content provider, uploading videos and photos for all to see.
Van der Linden and Gouraud also draw attention to the arrival of blogs and podcasts, and the increasing use of multimedia content. In April 2006, 26 percent of French Internet users (called “internauts” in French; now, that is a good word!) had watched video online. One year later the percentage had risen to 36.6 percent.
More than a search engine
In order to handle all this information, we need tools for search and selection, including aggregators for personalized content. We will no longer accept that the TV companies decide what we are to see and when we are to see it. If we want to watch The Sopranos at 1.15 AM, that’s the way it will be, and we want to find it quickly and easily.
Quaero is to take part in this market by developing technologies for finding, accessing, manipulating and processing multimedia and multilingual content.
This means, as far as we understand it, that Quaero is more than a search engine for multimedia content. It will also be a tool for reading and using such content. Moreover, the ability to interpret and make use of content in different languages is a secondary objective, which makes a lot of sense in a European context.
Six Quaero cases
Van der Linden and Gouraud present six cases:
1. A Consumer Multimedia Search Engine
This search tool will enable you to search through podcasts and radio and TV broadcasts using speech to text transcription. It will also search images and video using names, context and metadata annotations.
2. Multimedia Search Services to enrich European portals
This module will enhance the user experience by delivering more convenient interfaces.
3. Personalized Video
Quaero also plans to give users access to video on interactive consumer networked devices anytime and anywhere. If we understand this correctly, Quaero aims at covering all types of video delivery tools, including IPTV networks, set-top boxes, PCs and mobile phones.
4. “Recondition the Audiovisual Cultural Heritage”
It is not totally clear to us what this means. They say that Quaero will “document once, publish multiple – improved annotation and encoding”, by a combination of automatic and manual means. This could be a reference to the semantic web, where content is tagged according to certain rules for easy retrieval. Quaero will structure audiovisual content and digital books for access from public and professional portals.
5. Professional Digital Media Asset Management for Broadcasting Industry
This module includes tools for the handling, (post-)production, aggregation, storage, search and reuse of video material in the multimedia industry.
6. Platform for Text and Image Annotation
This module includes tools for the digitization and translation of paper-based information (digitization of libraries, patents at patent offices, etc.).
Combining technology push with market pull
Van der Linden and Gouraud argue that Quaero aims to cover the whole development chain in one single, large, structured, collaborative program, including (in their terms) “applied research, basic research, corpus development and evaluation infrastructure”.
The research and development is organized as application-specific projects.
Six technology areas
The technological challenges are enormous. Van der Linden and Gouraud present no fewer than six technological areas:
1. Infrastructure
Quaero wants to develop tools for the large-scale distributed storage of documents and indexes. It will need instruments for combining information from these systems and devices. The team also needs to develop workflow procedures for managing and aggregating content and metadata.
2. Metadata
Here they mention the “ingestion” and structuring of metadata. It is all about standards, presumably.
3. Automatic annotation
To tag all this information by hand would be a nightmare, so Quaero needs to develop technologies for speech to text conversion, image recognition, OCR, language recognition, summarizing, translation, thesaurus building and more.
4. User interface
To help users find relevant content more easily, Quaero is experimenting with natural language queries and question answering, cross lingual queries, context management and profiling, summarizing, and translation.
5. Search and extraction
We guess this is the ultimate Google challenge: how to index billions of documents, rank them, store them and use them for answering questions. This module also includes the “use of web document statistics for approximate, phonetic and multilingual search.”
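Van der Linden and Gouraud do not explain how document statistics feed into approximate search. One well-known technique, shown here only as an illustration and not as Quaero’s method, is frequency-based spelling correction: generate every string within one edit of the query and choose the candidate that occurs most often in the document collection. The tiny corpus and the names build_counts, edits1 and suggest are our own hypothetical choices, and phonetic and multilingual matching are not covered.

```python
from collections import Counter
import re

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def build_counts(documents):
    """Count word frequencies across a (hypothetical) document collection."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"\w+", doc.lower()))
    return counts


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(query, counts):
    """Return the query if known, else the most frequent known word one edit away."""
    query = query.lower()
    if query in counts:
        return query
    candidates = [w for w in edits1(query) if w in counts]
    if not candidates:
        return query  # nothing close enough in the collection
    return max(candidates, key=lambda w: counts[w])


docs = ["Quaero multimedia search", "video search engine", "Sopranos video"]
counts = build_counts(docs)
print(suggest("serch", counts))
print(suggest("vidoe", counts))
```

The document statistics do the ranking here: when several known words are equally close to a misspelled query, the one that appears most often in the collection wins.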
6. Security
Finally there is a module for security, including video and audio fingerprinting. We are not sure, but this may have something to do with intellectual property rights protection.
Quaero partners
There are both public and private partners:
The private companies mentioned are Thomson, France Telecom, Jouve, the European search engine Exalead, Bertin Technologies, LTU Technologies, Vecsys and Synapse Development.
The public research laboratories or institutes taking part are LIMSI-CNRS, RWTH-Aachen, Karlsruhe University, INRIA, LIG-UJF, IRCAM, ENST-GET, IRIT, INIST-CNRS, MIG-INRA, LIPN. These acronyms do not mean much to many of us, but it is important to note that not all of these are French. The Germans are, for instance, still on board.
There are also several French public institutions in the consortium. This is important, as it seems clear to us that Quaero can also be considered part of the French government’s attempts to develop a counterweight to American dominance of this industry. The French Agence de l’Innovation Industrielle has allotted EUR 100 million to the five-year budget. The state aid has to be authorized by the European Commission’s DG Competition, but we would guess that this will not be a problem.
Searching the Web with Quaero
So, can you do a search using Quaero today? No. The consortium doesn’t even own the Quaero.com domain name, which leads us to believe that Quaero is not so much going to become a new search engine in its own right as a supplier of technology to other online properties.
The French search engine Exalead is part of the team, and Exalead’s image search engine is given as one example of the use of Quaero search technology by van der Linden and Gouraud. Another one is the French Audiosurf search engine.