Lecture: Advanced Speech Processing, Winter Term 2025/2026
        
- Instructor: Prof. Dr. Emanuël Habets
 
- Teaching Assistant: TBD
 
- Time: Winter Term 2025/2026, Tuesdays, 14:15-15:45
 
- Place: Am Wolfsmantel 33, Erlangen-Tennenlohe, Room 3R4.04
 
- Format: Lecture
 
- Credits: 2.5 ECTS
 
- Exam (graded): Oral examination at the end of the term
 
News
The first lecture will be held on October 28 at 14:15.
Format
The lecture has the following format:
- Each meeting lasts 90 minutes.
 
For further information, please contact Prof. Dr. Emanuël Habets.
Content
Speech is at the core of human communication and increasingly central to our interaction with technology. From voice assistants and teleconferencing to hearing aids, security applications, and immersive media, speech technologies must perform robustly in real-world acoustic environments. These environments are often far from ideal: noise, reverberation, and interfering sources can severely degrade the quality and intelligibility of speech signals. At the same time, advances in machine learning and signal processing have opened new opportunities for creating, modifying, and analyzing speech in powerful ways.
This lecture provides a comprehensive introduction to advanced speech processing, covering both classical and modern neural approaches. Topics include:
- Speech quality and intelligibility assessment: objective and subjective methods for evaluating speech processing algorithms.
 
- Speech enhancement: noise reduction and dereverberation with classical signal processing and deep learning.
 
- Speech extraction and separation: isolating target speakers or signals from complex mixtures.
 
- Beamforming: spatial filtering of microphone-array signals to enhance speech, using classical array processing and deep learning.
 
- Speaker identification and verification: modeling and recognizing speaker characteristics for personalization and security.
 
- Text-to-speech synthesis (TTS): generating natural and expressive speech from text with modern neural architectures.
 
- Voice anonymization: transforming speech signals to protect privacy while preserving intelligibility.
 
- Self-supervised and foundation speech models: learning robust speech representations from unlabeled data, and transferring large-scale pre-trained models (e.g., wav2vec 2.0, HuBERT, Whisper) to downstream speech tasks.
 
The lecture combines theoretical foundations, algorithmic insights, and practical demonstrations. Students will gain an understanding of both classical methods and cutting-edge neural approaches, as well as their application in real-world scenarios.
Target audience: This lecture is designed for graduate students and researchers interested in speech and audio technology. By the end of the lecture, participants will have a strong foundation to understand, design, and critically evaluate advanced speech processing methods.
Complementary courses:
- Speech and Language Understanding by Prof. Dr.-Ing. Andreas Maier, which covers automatic speech recognition.
 
- Generative Models for Signal Processing by Dr.-Ing. Andreas Brendel and Dr. Nicola Pia, which covers neural speech coding.
 
Course Material
The lecture slides can be downloaded from StudOn.
Links
Further audio-related courses offered by the AudioLabs can be found at: