The Theseus image and video search project

The Germans will not be left behind in the search engine race, which is why the German federal government is supporting the Theseus search engine project.

Theseus grew out of the French-German Quaero project, but Theseus is now an independent exercise. The Germans apparently got tired of French America-bashing and the idea of developing an alternative to Google.

The future is semantic video search

Theseus is definitely not about developing a European Google. This is a technology-push initiative in which research institutions and companies cooperate to develop search technologies that can be used in various search engines and tools. The main focus is on image and video search.

Theseus is to develop so-called semantic search technologies, where the search engine analyzes the context of the information found in order to determine what it is all about. Given the social nature of today’s Web this should, according to the Germans, be more feasible.

The first prototype is expected to be ready in 2011.

Theseus was presented at a recent Chorus workshop.

The Theseus Content Technology Cluster (CTC)

The project consists of several cases and a cross-cutting Content Technology Cluster.

The Theseus Content Technology Cluster (CTC) was presented by Dr. Ralf Schäfer of the Fraunhofer Heinrich-Hertz-Institut. Fraunhofer is one of Europe’s largest public research institutions. Schäfer is coordinator of the CTC.

It seems to us that CTC is the part of Theseus that is to solve the most fundamental science and technology challenges in this project.

CTC consists of eight so-called work packages (Eurospeak for project modules). The headlines alone give you a certain idea of what this is all about:

  • WP1: CTC Management
  • WP2: Video, Audio, Metadata, Platforms
  • WP3: Ontology Management
  • WP4: Situation Aware Dialogue Shell for the Semantic Access to Media and Services
  • WP5: User Interface, Visualization
  • WP6: Statistical Machine Learning
  • WP7: DRM/IPR Management
  • WP8: Evaluation

Image recognition and analysis

WP2 is all about image recognition. The challenge is to get search engines to recognize and catalogue objects found in images and videos.

Theseus is doing research on video recognition and video codec compression. The idea is to analyze still frames of videos to extract features, faces etc. You should be able to use this information to find similar images, other videos or pictures of the same person etc.
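To give a rough idea of what "finding similar images" can mean in practice, here is a minimal sketch in Python. It is not Theseus code. It assumes that every still frame has already been reduced to a short list of numbers (a feature vector), and it ranks frames by cosine similarity, one common way of comparing such vectors. The frame names, the vectors and the function names are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, index, top_k=2):
    """Rank the indexed frames by similarity to the query, best first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical feature vectors; a real system would extract them from the frames.
frame_index = {
    "news_clip_frame_12": [0.9, 0.1, 0.3],
    "holiday_photo": [0.1, 0.8, 0.5],
    "interview_frame_3": [0.8, 0.2, 0.4],
}
query_vector = [1.0, 0.0, 0.3]

results = most_similar(query_vector, frame_index)
print(results)
```

The hard part, and presumably the part the researchers are working on, is producing feature vectors that actually capture faces and objects. The ranking step itself is simple.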

Metadata

Theseus is also looking at metadata standards and standardization, metadata generation, indexing and retrieval and automatic picture quality assessment.

It is unclear to us how much importance Theseus puts on metadata tagging for interpreting images and videos. It is notoriously difficult to get webmasters and content providers to tag media in any systematic manner, at least in the traditional German sense of the word “systematic”.

Ontology management and semantic access

WP3 is about ontology management - i.e. about how to sort the information you find into different classes and make it more easily searchable.
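A toy example may help here. The sketch below is our own illustration, not something taken from the presentation: it assumes a tiny ontology in which every class has at most one parent, and shows how a search for a general class ("vehicle") can then also find media labelled with a more specific one ("car"). The class names, file names and functions are hypothetical.

```python
# Hypothetical mini-ontology: each class points to its parent class.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "object",
    "dog": "animal",
    "animal": "object",
}

def ancestors(cls):
    """Return cls followed by all of its superclasses, most specific first."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def search(query_class, media_labels):
    """Return media whose label is the query class or one of its subclasses."""
    return sorted(
        name
        for name, label in media_labels.items()
        if query_class in ancestors(label)
    )

# Hypothetical media files, each labelled with one ontology class.
media_labels = {
    "clip_a.mpg": "car",
    "photo_b.jpg": "dog",
    "clip_c.mpg": "bicycle",
}

vehicle_hits = search("vehicle", media_labels)
print(vehicle_hits)
```

A plain keyword search for "vehicle" would miss both clips, since neither is labelled with that word. Real ontologies are of course far larger and allow richer relations than a single parent per class.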

WP4 — with the poetic title “Situation Aware Dialogue Shell for the Semantic Access to Media and Services” — is apparently about how to present this information to searchers.

We readily admit that Schäfer’s presentation is getting so technical here that we have problems following him. However, it seems that Theseus is looking for new and better ways of combining a better understanding of what the searcher is looking for with a more efficient analysis of the content of media files. User adaptation and personalization are part of this.

Search result presentation

WP5 is focused on the user interface and the visualization of search results. The goal is to present data in various types of media (clients) including “Web 3.0 clients”. Schäfer seems to be talking about mobile web clients here, as well as TV based search.

In the Theseus context Web 3.0 is understood to be the end product of the marriage of the social web (Web 2.0) with semantic search and presentation capabilities.

In any case, it is clear that Theseus sees far beyond your regular browser window when it comes to tools for searching for information.

They also look at “visualization techniques for semantic annotation”. One way of visualizing semantic annotation is the tag clouds you find on various Web 2.0 sites. The Germans plan to go far beyond that kind of presentation and navigation.

Statistical machine learning

WP6 is about statistical machine learning, i.e. how to develop software that can use statistics to handle the large amount of information that is extracted from images and videos. The goal is to develop software that “learns” as it goes along.

The software may for instance try to divide the image into sets of types of information (color, shape, contrast, entropy) and use statistical methods to define what the image is all about.
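The following Python sketch shows the general idea on a very small scale. It is our own simplified illustration and makes several assumptions: images are tiny grayscale grids with pixel values from 0 to 255, the only features are mean brightness, contrast (standard deviation) and entropy, and the "learning" is a basic nearest-centroid classifier. The labels "plain" and "patterned" and all function names are invented for the example.

```python
import math
from collections import Counter

def image_features(pixels):
    """Return (mean brightness, contrast, entropy) for a grayscale pixel grid."""
    values = [v for row in pixels for v in row]
    n = len(values)
    mean = sum(values) / n
    contrast = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Shannon entropy (in bits) of the pixel-value distribution.
    entropy = sum((c / n) * math.log2(n / c) for c in Counter(values).values())
    return (mean, contrast, entropy)

def train_centroids(examples):
    """Average the feature vectors of each label (nearest-centroid model)."""
    grouped = {}
    for label, feats in examples:
        grouped.setdefault(label, []).append(feats)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def classify(feats, centroids):
    """Pick the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))

# Hypothetical training images with hand-assigned labels.
training = [
    ("plain", image_features([[128, 128], [128, 128]])),
    ("plain", image_features([[100, 100], [100, 100]])),
    ("patterned", image_features([[0, 255], [255, 0]])),
    ("patterned", image_features([[10, 240], [240, 10]])),
]
centroids = train_centroids(training)

guess = classify(image_features([[120, 120], [120, 121]]), centroids)
print(guess)
```

The software "learns" only in the sense that the centroids are computed from labelled examples, so more examples shift its idea of each class. Real systems use far richer features and models, and would normally rescale the features so that no single one dominates the distance.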

Intellectual property rights

WP7 is about DRM and IPR Management, in other words: intellectual property rights and access control for digital media.

The Germans are clearly trying to solve the problem of all the copyright violations taking place on the Internet, and are doing research on watermarking, authentication, encryption and identification.

Theseus case studies

The following information is from a presentation made by Thomas Niessen, Director of Program Management, the Theseus Program Office. The cases are to represent practical applications of Theseus research.

We will not try to give a more thorough explanation here. This link leads to a Google translation of the relevant page on the German site.

ALEXANDRIA (Lycos Europe)
Publishing platform for user generated content.
Combination of semantics and community recommendation.

TEXO (SAP)
Future business value networks (Web of services).
Combination of SOA-based software components.

MEDICO (Siemens)
Scalable semantical analysis of diagnostic images in medicine.

CONTENTUS (DNB)
Process chain for providing semantic access to AV-archives as a part of safeguarding the national cultural heritage.

ORDO (empolis)
Integrated Digital Control center for distributed, heterogeneous information.
IP-related content (as pilot application).
Individual organization of local and remote media assets.

PROCESSUS (empolis)
Integration of semantically enriched process-chains in and across industrial companies.
Dynamic composition of content and services.
Integration of in-house and external communication.

Institutions involved in Theseus

German National Library
DFKI
Deutsche Thomson
FZI Karlsruhe
Empolis
Fraunhofer Gesellschaft
Festo
LMU Munich
IRT
TU Darmstadt
Intelligent Views
TU Dresden
Lycos
TU Munich
M2any
University of Karlsruhe
Moresophy
University of Erlangen
Ontoprise
SAP
Siemens
VFI

Theseus is funded by the German Federal Ministry of Economics and Technology (BMWi). The five-year project has a budget of 180 million Euro. BMWi provides some 90 million Euro. The participating partners from industry and research provide the rest.

See also: The Quaero project - new European search technology
Theseus site (in German)
