A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2

Thomas Serre; Mathieu Fontaine; Éric Benhaim; Geoffroy Dutour; Slim Essid

Conference paper. Year: 2024

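Affiliations: Thomas Serre (1, 2, 3); Mathieu Fontaine (1, 2); Éric Benhaim (3); Geoffroy Dutour (3); Slim Essid (1, 2)

(1) Signal, Statistique et Apprentissage
(2) Département Images, Données, Signal
(3) Orosound, Signal Processing and Machine Learning Lab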

Thomas Serre

Role: Author
PersonId: 1374014

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Orosound, Signal Processing and Machine Learning Lab

Mathieu Fontaine

Role: Author
PersonId: 13405
IdHAL: mathieu-fontaine
ORCID: 0000-0002-7657-6271
IdRef: 236886681

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Éric Benhaim

Role: Author

Orosound, Signal Processing and Machine Learning Lab

Geoffroy Dutour

Role: Author

Orosound, Signal Processing and Machine Learning Lab

Slim Essid

Role: Author
PersonId: 181234
IdHAL: slimessid
ORCID: 0000-0002-0028-327X
IdRef: 11025130X

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Abstract

Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research efforts have yielded promising PSE models, albeit often with computationally intensive architectures that are unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage speech enhancement (SE) model and implement it within DeepFilterNet2, an SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model, exploring different positions at which the speaker embeddings can be integrated into the dual-stage enhancement architecture. We also investigate a tailored training strategy for adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performance of DeepFilterNet2 while incurring minimal computational overhead.
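The abstract refers to integrating speaker embeddings at different positions of the enhancement network but does not describe the mechanism on this page. Purely as an illustration, assuming the common conditioning approach of concatenating one fixed, utterance-level speaker embedding to every time frame of an intermediate feature sequence (not necessarily what the paper does), the operation and its parameter cost for the dense layer that consumes the widened input could be sketched as follows. The function names, tensor shapes, and layer sizes are hypothetical.

```python
import numpy as np


def concat_speaker_embedding(features: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
    """Append a speaker embedding to every frame of a feature sequence.

    Hypothetical helper, assuming concatenation-based conditioning.
    features: array of shape (T, C), one C-dimensional vector per frame.
    spk_emb:  array of shape (D,), a single utterance-level speaker embedding.
    Returns an array of shape (T, C + D).
    """
    if features.ndim != 2 or spk_emb.ndim != 1:
        raise ValueError("expected features of shape (T, C) and spk_emb of shape (D,)")
    num_frames = features.shape[0]
    # Repeat the same embedding for each frame (a read-only broadcast view, no copy).
    tiled = np.broadcast_to(spk_emb, (num_frames, spk_emb.shape[0]))
    # Concatenate along the feature axis so every frame carries speaker information.
    return np.concatenate([features, tiled], axis=-1)


def extra_linear_params(emb_dim: int, out_dim: int) -> int:
    """Extra weights a dense layer needs when its input grows by emb_dim features.

    The bias size depends only on out_dim, so only emb_dim * out_dim weights are added.
    """
    return emb_dim * out_dim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((100, 64))  # hypothetical: 100 frames, 64 channels
    emb = rng.standard_normal(128)          # hypothetical: 128-dim speaker embedding
    conditioned = concat_speaker_embedding(feats, emb)
    print(conditioned.shape)                # (100, 192)
    print(extra_linear_params(128, 64))     # 8192
```

Under this assumption, the embedding only needs to be computed once per enrolled speaker, so the added per-frame cost is confined to the layer that reads the widened input. This is one way such conditioning can remain cheap, though the paper's actual integration points and costs are not given here.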

Keywords

Target speech extraction; speech enhancement; real-time

Domains

Artificial Intelligence [cs.AI]; Signal and Image Processing [eess.SP]

Main file

main.pdf (263.26 KB)

Origin: Files produced by the author(s)
License: Copyright (All rights reserved)

https://telecom-paris.hal.science/hal-04541350

Submitted on: Wednesday, 10 April 2024, 17:12:34

Last modified on: Saturday, 13 April 2024, 03:19:37

Dates and versions

hal-04541350, version 1 (10-04-2024)

Identifiers

HAL Id: hal-04541350, version 1

Cite

Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid. A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2. ICASSP, Apr 2024, Seoul, South Korea. ⟨hal-04541350⟩

Collections

INSTITUT-TELECOM, PARISTECH, LTCI, IDS, S2A, IP_PARIS
