Thesis Topics

If you are a student at the University of Tübingen in an MSc degree program at the Department of Computer Science and are interested in working with us (e.g. for a master's thesis, research internship, or research project), please send an enquiry to eml-sekretariat at Please include a current CV with a description of previous research and work experience, a transcript of all previous courses and grades, and a brief statement of what research you want to do and why you want to join our group. A list of open topics is below. Note, however, that this list is not exhaustive, and we might have additional topics available.

A benchmark for multimodal explanation
In this project, we will review the state of the art in multimodal explanation and establish a benchmark of the existing techniques. The goal of this project is to contribute a better understanding of existing approaches and to propose convincing quantitative and qualitative evaluation metrics for multimodal explanation.
Cross-domain few-shot learning
Most of the existing few-shot learning methods work well when base classes with abundant training examples are semantically close to the novel classes. In this project, we study the challenging cross-domain few-shot learning problem where there is a significant semantic domain gap between base and novel classes.
Explanations for audio-visual representation learning
Many recent audio-visual representations are learned by exploiting the natural alignment of audio and visual information in videos. In this project, we aim to generate explanations for audio-visual representations, and we will analyse the strength and quality of the natural (temporal/semantic) alignment in the different video benchmark datasets used for training.
Exploiting context for open world semantic segmentation
In this project, we will explore the possibility of performing zero-shot semantic segmentation in an open world, i.e. without any prior assumption about the unseen concepts encountered at test time. In particular, we will analyze the impact that context and external knowledge can have in this challenging scenario.
Few-shot video action recognition
Few-shot learning aims to recognize novel classes with only limited labelled data. In this project, we study and develop new methods for few-shot learning in the context of video action recognition.
Few-shot object detection
Few-shot object detection aims to localise novel classes with only a few annotated images. In this project, we study and develop new methods for few-shot object detection.
Learning audio representations
Data augmentation is commonly used to increase the amount of data used for training (e.g. by additionally using transformed versions of the input data). In this project, we aim to learn powerful audio representations using a variety of different data augmentation methods.
Long-tail object recognition
Real-world datasets often follow a long-tailed distribution, i.e. only a few classes have abundant examples while the majority of classes have only a few examples. In this project, we study and develop new methods that address the class imbalance in long-tailed recognition datasets.
Transferring multimodal knowledge for efficient network adaptation
Can the knowledge learned from different modalities, such as language, vision, and audio, be transferred to solve a new task? In this project, we will explore how to build a generic adaptor on top of easily acquired open-source pre-trained networks by devising a lightweight network and effective supervision signals to solve a new task, i.e. to reuse the representations or supervision from pre-trained networks efficiently, without the added complexity of re-training or fine-tuning.
When compositional zero-shot learning meets predictive domain adaptation
Despite their differences, both Predictive Domain Adaptation (PDA) and Compositional Zero-Shot Learning (CZSL) require working with composed information: domain descriptions + semantics in PDA, states + objects in CZSL. In this project, we will explore how advances in one task can benefit the other, even in the case of limited supervision.
Semi-supervised few-shot learning
Few-shot learning aims to recognize novel classes with only limited labelled data. In this project, we study and develop new methods that leverage a large amount of unlabeled data to boost few-shot learning.
Fine-grained visually informed sound source separation
In this project, we consider the problem of sound source separation in video data. We aim to leverage fine-grained visual information to separate different sounds in a video.
Visual explanations for self-supervised learning
In this project, we will use existing visual explanation techniques to study self-supervised learning models in visual recognition tasks. The main goal of this project is to provide visual explanations that help us understand what content self-supervised models have learned and why they are powerful representation learners.
Zero-shot object detection
In this project, we study and develop new methods for zero-shot object recognition, which aims to recognize novel classes with no training examples.