Informed Sound Activity Detection in Music and Audio Signals (ISAD2)

In the ISAD2 project, we develop model-based and data-driven techniques for learning and detecting characteristic sound events in acoustic data, including music recordings and environmental sounds. The project is funded by the German Research Foundation (DFG). On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.

Project Description

Informed Sound Activity Detection in Music and Audio Signals

In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the project's first phase (2017-2020), we explored fundamental techniques for detecting characteristic sound events in a given music recording. Here, our focus was on informed approaches that exploit musical knowledge in the form of score information, instrument samples, or musically salient sections. We considered concrete tasks such as locating audio sections with a specific timbre or instrument, identifying monophonic themes in complex polyphonic music recordings, and classifying music genres or playing styles based on melodic contours. We tested our approaches in complex music scenarios, including instrumental Western classical music, jazz, and opera recordings.

In the project's second phase, we significantly extend these goals. First, we go beyond the music scenario by considering environmental sounds as a second challenging audio domain. Second, as our central methodology, we explore and combine the benefits of model-based and data-driven techniques to learn task-specific sound event representations. Furthermore, we investigate hierarchical approaches that jointly incorporate, learn, and capture sound events that manifest on different temporal scales and belong to hierarchically ordered categories. An overarching goal of the second phase is to develop explainable deep learning models that provide a better understanding of the structural and acoustic properties of sound events.

Project-Related Activities

  • Organization (Stefan Balke, Jakob Abeßer, Meinard Müller): Special Session Sound Analysis for Music and Audio Signals, Jahrestagung für Akustik (DAGA), Hannover, Germany, March 20/21, 2024

  • Lecture and Seminar (4 SWS) by Jakob Abeßer: Computational Analysis of Sound and Music. TU Ilmenau, Summer Semester 2024
    Lecture Slides & Jupyter Notebooks

  • Organization (Jakob Abeßer, Sebastian Stober, Meinard Müller): Special Session Sound Analysis for Music and Audio Signals, Jahrestagung für Akustik (DAGA), Hamburg, Germany, March 8, 2023
    PDF (table of contents)

  • Research Seminar (2 SWS) by Jakob Abeßer and Martin Pfleiderer: KI-gestützte Audioanalyse von Musik und Soundscapes (AI-Based Audio Analysis of Music and Soundscapes). HfM Weimar, Winter Semester 2022/2023

  • Talk (Jakob Abeßer): Erkennung akustischer Quellen in komplexen Szenarien (Detection of Acoustic Sources in Complex Scenarios). Jenaer Akustiktag, Ernst Abbe Hochschule, Jena, April 27, 2022

  • Talk (Jakob Abeßer): Technische Aspekte in der KI-Musikanalyse (Technical Aspects of AI-Based Music Analysis). Annual Meeting of the DMV (Deutscher Musikverleger-Verband e.V., German Music Publishers Association), Erfurt, October 16, 2023

Project-Related Resources and Demonstrators

The following list provides an overview of the most important publicly accessible resources created in the ISAD2 project:

Project-Related Publications

The following publications reflect the main scientific contributions of the work carried out in the ISAD2 project.

  1. Jakob Abeßer, Zhiwei Liang, and Bernhard Seeber
    Sound recurrence analysis for acoustic scene classification
    EURASIP Journal on Audio, Speech, and Music Processing, 2025(1), 2025. DOI
    @article{AbesserLS25_SoundRecurrence_JASMP,
    author =       {Jakob Abe{\ss}er and Zhiwei Liang and Bernhard Seeber},
    title =        {Sound recurrence analysis for acoustic scene classification},
    journal = {EURASIP Journal on Audio, Speech, and Music Processing},
    volume      = {2025},
    number =       {1},
    year =         {2025},
    doi = {10.1186/s13636-024-00390-2}
    }
  2. Jakob Abeßer, Simon Schwär, and Meinard Müller
    Pitch Contour Exploration across Audio Domains: A Vision-Based Transfer Learning Approach
    arXiv preprint arXiv:2503.19161, 2025. DOI
    @article{AbesserSM25_PitchContour_arXiv,
    author       = {Jakob Abe{\ss}er and Simon Schw{\"a}r and Meinard M{\"u}ller},
    title        = {Pitch Contour Exploration across Audio Domains: {A} Vision-Based Transfer Learning Approach},
    journal      = {arXiv preprint arXiv:2503.19161},
    year         = {2025},
    doi = {10.48550/arXiv.2503.19161}
    }
  3. Hans-Ulrich Berendes, Ben Maman, and Meinard Müller
    Tuning Matters: Analyzing Musical Tuning Bias in Neural Vocoders
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 166–173, 2025. PDF DOI
    @inproceedings{BerendesMM25_TuningMatters_ISMIR,
    author    = {Hans-Ulrich Berendes and Ben Maman and Meinard M{\"u}ller},
    title     = {Tuning Matters: {A}nalyzing Musical Tuning Bias in Neural Vocoders},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    pages     = {166--173},
    address   = {Daejeon, Korea},
    year      = {2025},
    doi       = {10.5281/zenodo.17706359},
    url-pdf   = {2025_BerendesMM_TuningMatters_ISMIR_ePrint.pdf}
    }
  4. Amir Latifi Bidarouni and Jakob Abeßer
    Towards Domain Shift in Location-Mismatch Scenarios for Bird Activity Detection
    In Proceedings of the European Signal Processing Conference (EUSIPCO): 1267–1271, 2024. DOI
    @InProceedings{BidarouniA24_DomainShift_EUSIPCO,
    author =    {Amir Latifi Bidarouni and Jakob Abe{\ss}er},
    title =     {Towards Domain Shift in Location-Mismatch Scenarios for Bird Activity Detection},
    booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
    year =      {2024},
    address =   {Lyon, France},
    pages = {1267--1271},
    doi = {10.23919/EUSIPCO63174.2024.10715313}
    }
  5. Sebastian Strahl and Meinard Müller
    Semi-Supervised Piano Transcription Using Pseudo-Labeling Techniques
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 173–181, 2024. PDF DOI
    @inproceedings{StrahlM24_PianoTranscriptionSemiSup_ISMIR,
    author    = {Sebastian Strahl and Meinard M{\"u}ller},
    title     = {Semi-Supervised Piano Transcription Using Pseudo-Labeling Techniques},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {San Francisco, CA, United States},
    year      = {2024},
    pages     = {173--181},
    doi       = {10.5281/zenodo.14877303},
    url-pdf   = {2024_StrahlM_PianoTranscriptionSemiSup_ISMIR_ePrint.pdf}
    }
  6. Michael Krause and Meinard Müller
    Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 2567–2578, 2023. PDF Details DOI
    @article{Krause23_HierarchicalClassificationInstrument_IEEE-TASLP,
    author = {Michael Krause and Meinard M{\"u}ller},
    title = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings},
    journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
    year={2023},
    volume={31},
    pages={2567--2578},
    doi = {10.1109/TASLP.2023.3291506},
    url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass/},
    url-pdf = {https://ieeexplore.ieee.org/abstract/document/10171391}
    }
  7. Jakob Abeßer, Sascha Grollmisch, and Meinard Müller
    How Robust are Audio Embeddings for Polyphonic Event Tagging?
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 2658–2667, 2023. PDF Demo DOI
    @article{AbesserGM23_PolyphonicSound_TASLP,
    author      = {Jakob Abe{\ss}er and Sascha Grollmisch and Meinard M{\"u}ller},
    title       = {How Robust are Audio Embeddings for Polyphonic Event Tagging?},
    journal     = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
    volume      = {31},
    pages       = {2658--2667},
    year        = {2023},
    doi         = {10.1109/TASLP.2023.3293032},
    url-pdf     = {https://ieeexplore.ieee.org/document/10178070},
    url-demo    = {https://zenodo.org/record/7912746}
    }
  8. Jakob Abeßer, Asad Ullah, Sebastian Ziegler, and Sascha Grollmisch
    Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes
    Journal of the Audio Engineering Society (AES), 71(12): 860–872, 2023.
    @article{AbesserUZG23_SoundPolyphony_JAES,
    author =       {Jakob Abe{\ss}er and Asad Ullah and Sebastian Ziegler and Sascha Grollmisch},
    title =        {Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes},
    journal = {Journal of the Audio Engineering Society (AES)},
    year =         {2023},
    volume =       {71},
    number =       {12},
    pages =        {860--872},
    url =          {https://www.aes.org/e-lib/browse.cfm?elib=22348}
    }
  9. Amir Latifi Bidarouni and Jakob Abeßer
    Unsupervised Feature-Space Domain Adaptation applied for Audio Classification
    In Proceedings of the IEEE International Symposium on the Internet of Sounds (IS2): 1–7, 2023. DOI
    @InProceedings{BidarouniA23_DomainAdaptation_I2S,
    author =    {Amir Latifi Bidarouni and Jakob Abe{\ss}er},
    title =     {Unsupervised Feature-Space Domain Adaptation applied for Audio Classification},
    booktitle = {Proceedings of the {IEEE} International Symposium on the Internet of Sounds ({IS2})},
    address   = {Pisa, Italy},
    year =      {2023},
    pages     = {1--7},
    doi       = {10.1109/IEEECONF59510.2023.10335455},
    }
  10. Sascha Grollmisch, Estefanía Cano, Hanna Lukashevich, and Jakob Abeßer
    Uncertainty in Semi-supervised Audio Classification — A Novel Extension for FixMatch
    In Proceedings of the European Signal Processing Conference (EUSIPCO): 161–165, 2023. DOI
    @InProceedings{GrollmischCLA23_UncertaintySemiSupervised_EUSIPCO,
    author =    {Sascha Grollmisch and Estefan{\'i}a Cano and Hanna Lukashevich and Jakob Abe{\ss}er},
    title =     {Uncertainty in Semi-supervised Audio Classification -- A Novel Extension for {FixMatch}},
    booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
    year =      {2023},
    pages     = {161--165},
    address =   {Helsinki, Finland},
    doi = {10.23919/EUSIPCO58844.2023.10289789}
    }
  11. Michael Krause, Christof Weiß, and Meinard Müller
    A Cross-Version Approach to Audio Representation Learning for Orchestral Music
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 832–839, 2023. PDF Details DOI
    @inproceedings{KrauseWM23_CrossVersionRep_ISMIR,
    author    = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller},
    title     = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Milano, Italy},
    year      = {2023},
    pages     = {832--839},
    doi       = {10.5281/ZENODO.10265419},
    url-details  =  {https://doi.org/10.5281/zenodo.10265419},
    url-pdf   = {2023_KrauseWM_CrossVersionRep_ISMIR_ePrint.pdf}
    }
  12. Michael Krause, Sebastian Strahl, and Meinard Müller
    Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.
    @inproceedings{KrauseSM23_WeakPitchCrossVersion_ISMIR,
    author    = {Michael Krause and Sebastian Strahl and Meinard M{\"u}ller},
    title     = {Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Milano, Italy},
    year      = {2023},
    }
  13. Hanna Lukashevich, Sascha Grollmisch, and Jakob Abeßer
    Temperature scaling for reliable uncertainty estimation: Application to automatic music genre classification
    In Proceedings of the Uncertainty Meets Explainability Workshop, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2023.
    @InProceedings{LukashevichGA23_TemperatureScaling_ECMLPKDD,
    author =    {Hanna Lukashevich and Sascha Grollmisch and Jakob Abe{\ss}er},
    title =     {Temperature scaling for reliable uncertainty estimation: {A}pplication to automatic music genre classification},
    booktitle = {Proceedings of the Uncertainty Meets Explainability Workshop, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
    year =      {2023},
    address =   {Torino, Italy}
    }
  14. Hanna Lukashevich, Sascha Grollmisch, Jakob Abeßer, Sebastian Stober, and Joachim Bös
    How reliable are posterior class probabilities in automatic music classification?
    In Proceedings of the Audio Mostly Conference: 45–50, 2023. DOI
    @InProceedings{LukashevichGASB23_PosteriorClass_AM,
    author =    {Hanna Lukashevich and Sascha Grollmisch and Jakob Abe{\ss}er and Sebastian Stober and Joachim B{\"o}s},
    title =     {How reliable are posterior class probabilities in automatic music classification?},
    booktitle = {Proceedings of the Audio Mostly Conference},
    year =      {2023},
    address =  {Edinburgh, Scotland},
    pages     = {45--50},
    doi = {10.1145/3616195.36162}
    }
  15. Jakob Abeßer
    Classifying Sounds in Polyphonic Urban Sound Scenes
    In Proceedings of the AES Convention, 2022.
    @InProceedings{Abesser22_UrbanSounds_AES,
    author =    {Jakob Abe{\ss}er},
    title =     {Classifying Sounds in Polyphonic Urban Sound Scenes},
    booktitle = {Proceedings of the AES Convention},
    address   = {The Hague, Netherlands},
    year =      {2022}
    }
  16. Stefan Balke, Julian Reck, Christof Weiß, Jakob Abeßer, and Meinard Müller
    JSD: A Dataset for Structure Analysis in Jazz Music
    Transactions of the International Society for Music Information Retrieval (TISMIR), 5(1): 156–172, 2022. PDF Demo DOI
    @article{BalkeRWAM22_JSD_TISMIR,
    author = {Stefan Balke and Julian Reck and Christof Wei{\ss} and Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title = {{JSD}: {A} Dataset for Structure Analysis in Jazz Music},
    journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
    volume = {5},
    number = {1},
    pages = {156--172},
    year = {2022},
    publisher = {Ubiquity Press},
    doi = {10.5334/tismir.131},
    url       = {https://doi.org/10.5334/tismir.131},
    url-pdf   = {2022_BalkeRWAM_JSD_TISMIR_ePrint.pdf},
    url-demo = {https://github.com/stefan-balke/jsd}
    }
  17. Sascha Grollmisch, Estefanía Cano, and Jakob Abeßer
    Audio Augmentations for Semi-Supervised Learning with Fixmatch
    In Demos and Late Breaking News of the International Society for Music Information Retrieval Conference (ISMIR), 2022.
    @inproceedings{GrollmischVA22_Fixmatch_ISMIR,
    author =    {Sascha Grollmisch and Estefan{\'i}a Cano and Jakob Abe{\ss}er},
    title =     {Audio Augmentations for Semi-Supervised Learning with Fixmatch},
    booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address     = {Bengaluru, India},
    year =      {2022}
    }
  18. Michael Krause and Meinard Müller
    Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 406–410, 2022. DOI
    @inproceedings{KrauseM22_HierarchyClass_ICASSP,
    author    = {Michael Krause and Meinard M{\"u}ller},
    title     = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages     = {406--410},
    address   = {Singapore},
    year      = {2022},
    doi       = {10.1109/ICASSP43922.2022.9747690}
    }
  19. Christon Ragavan Nadar, Michael Taenzer, and Jakob Abeßer
    Towards Interpreting and Improving the Latent Space for Musical Chord Recognition
    In Proceedings of the International Computer Music Conference (ICMC), 2022.
    @InProceedings{NadarTA_2022_ChordLatentSpace_ICMC,
    author =    {Christon Ragavan Nadar and Michael Taenzer and Jakob Abe{\ss}er},
    title =     {Towards Interpreting and Improving the Latent Space for Musical Chord Recognition},
    booktitle = {Proceedings of the International Computer Music Conference (ICMC)},
    address =  {Limerick, Ireland},
    year =      {2022}
    }
  20. Jakob Abeßer and Meinard Müller
    Jazz Bass Transcription Using a U-Net Architecture
    Electronics, 10(6): 1–11, 2021. PDF DOI
    @article{AbesserM21_JazzBassTranscription_Electronics,
    author    = {Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title     = {Jazz Bass Transcription Using a {U}-Net Architecture},
    journal   = {Electronics},
    volume    = {10},
    number     = {6},
    pages     = {670:1--11},
    year      = {2021},
    doi       = {10.3390/electronics10060670},
    url-pdf   = {2021_AbesserM_JazzBassTranscription_Electronics.pdf}
    }
  21. Michael Krause, Meinard Müller, and Christof Weiß
    Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization
    Electronics, 10(10): 1–14, 2021. PDF DOI
    @article{KrauseMW21_OperaSingingActivity_Electronics,
    author    = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}},
    title     = {Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization},
    journal   = {Electronics},
    volume    = {10},
    number    = {10},
    pages     = {1214:1--14},
    year      = {2021},
    doi       = {10.3390/electronics10101214},
    url-pdf   = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf}
    }

Project-Related Student Projects and Theses

  • Franca Bittner: Melody Extraction of Music Recordings Using Transformer Models. Master Thesis, TU Ilmenau, 2026
  • Bora Aydin: Zero- and Few-Shot Learning for Marine Mammal Detection and Classification. Master Thesis, University of Toulon, Norwegian University of Science and Technology (NTNU), 2025
  • Anjana Rajasekhar: Assessing the effectiveness of obfuscation techniques for speech privacy preservation and their impact on utility. Master Thesis, FAU, 2025
  • Daniel Wägner: Automatic Music Transcription and Tuning Frequency Estimation for Guitar. Master Thesis, Saarland University, 2025
  • Zhiwei Liang: Self-Similarity Based Representations of Soundscape Recordings for Acoustic Scene Classification. Master Thesis, Technische Universität München, 2024
  • Rasmus Merten: Real-Time Piano Multipitch Estimation using Convolutional Neural Networks. Master Thesis, TU Ilmenau, 2024
  • Robert Viehweg: Deep Learning based Drum Transcription. Bachelor Thesis, TU Ilmenau, 2024
  • Asad Ullah: Improving a System for Bioacoustics Sound Event Detection based on Few-Shot Learning. Master Thesis, TU Ilmenau, 2023

Project-Related Ph.D. and Habilitation Theses

  1. Jakob Abeßer
    Computational Analysis of Sounds and Music
    Habilitation Thesis, Technische Universität Ilmenau, Submitted for Review, 2026.
    @misc{Abesser26_CompAnalSoundMusic_Habil,
    author    = {Jakob Abe{\ss}er},
    title     = {Computational Analysis of Sounds and Music},
    howpublished = {Habilitation Thesis},
    note      = {Technische Universit{\"a}t Ilmenau, submitted for review},
    year      = {2026}
    }
  2. Sascha Grollmisch
    Semi-Supervised and Transfer Learning for Few-Shot Audio Classification
    PhD Thesis, Technische Universität Ilmenau, 2025. PDF
    @phdthesis{Grollmisch25_SemisupervisedTransferLearning_PhD,
    author      = {Sascha Grollmisch},
    year        = {2025},
    title       = {Semi-Supervised and Transfer Learning for Few-Shot Audio Classification},
    school      = {Technische Universit{\"a}t Ilmenau},
    url-pdf     = {https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00070024/ilm1-202500041.pdf}
    }
  3. Michael Krause
    Activity Detection for Sound Events in Orchestral Music Recordings
    PhD Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2023. PDF
    @phdthesis{Krause23_ActivityDetectionMusic_PhD,
    author      = {Michael Krause},
    year        = {2023},
    title       = {Activity Detection for Sound Events in Orchestral Music Recordings},
    school      = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
    url-pdf     = {https://open.fau.de/items/e75949af-4aad-4a40-a374-b850dcb5676a}
    }