A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2
Equipe Signal, Statistique et Apprentissage (Signal, Statistics and Learning team)
Conference paper. Year: 2024

Abstract

Isolating the desired speaker’s voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker’s voice. Recent research efforts have yielded promising PSE models, albeit often with computationally intensive architectures that are unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage speech enhancement (SE) model and implement it within DeepFilterNet2, an SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model, exploring different positions at which to inject the speaker embeddings into the dual-stage enhancement architecture. We also investigate a tailored training strategy for adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performance of DeepFilterNet2 while adding minimal computational overhead.
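As a rough illustration of what "integrating speaker embeddings" into an enhancement network can mean, the sketch below shows one common conditioning scheme: concatenating a fixed speaker embedding to every time frame of an intermediate feature sequence. The abstract does not state which conditioning scheme or which positions the authors use; the function name `condition_on_speaker`, the list-based shapes, and the toy values are hypothetical assumptions, not the paper's implementation.

```python
from typing import List, Sequence

Frame = List[float]


def condition_on_speaker(frames: Sequence[Sequence[float]],
                         speaker_embedding: Sequence[float]) -> List[Frame]:
    """Concatenate one speaker embedding to every time frame (hypothetical helper).

    Assumptions (illustrative only, not taken from the paper):
      - `frames` is a T x F sequence of per-frame features, e.g. the
        activations entering one stage of an enhancement network;
      - `speaker_embedding` is a D-dim vector computed once from
        enrollment audio of the target speaker.
    Returns a T x (F + D) list: each frame keeps its own features and
    gains an identical copy of the speaker embedding.
    """
    emb = list(speaker_embedding)
    return [list(frame) + emb for frame in frames]


if __name__ == "__main__":
    # Toy example: 3 frames of 2 features each, plus a 2-dim speaker embedding.
    feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    spk = [1.0, -1.0]
    out = condition_on_speaker(feats, spk)
    print(len(out), len(out[0]))  # number of frames, widened feature size
```

Because the embedding is computed once per speaker and only widens the feature dimension from F to F + D, this kind of conditioning adds relatively little computation, mainly a wider input to the next layer. In a dual-stage model the same step could in principle be applied before either stage or both, which is the placement question the abstract describes.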
Main file
main.pdf (263.26 KB)
Origin: Files produced by the author(s)
License: Copyright (All rights reserved)

Dates and versions

hal-04541350, version 1 (10 April 2024)

Identifiers

  • HAL Id: hal-04541350, version 1

Cite

Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid. A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2. ICASSP, Apr 2024, Seoul, South Korea. ⟨hal-04541350⟩