A. , R. Aissaioui, A. Martinet, J. Et, D. et al., Ré-identification de personnes dans les journaux télévisés basée sur les histogrammes spatio-temporels, Extraction et gestion des connaissances (EGC'2012), pp.547-548, 2012.

B. , C. Zhu, X. Meignier, S. Et, G. et al., Multistage speaker diarization of broadcast news, Trans. on Audio, Speech and Language Processing, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01434241

B. , F. Et, C. , and E. , Unsupervised knowledge acquisition for extracting named entities from speech, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.

D. , M. Et, G. , and C. , Text detection with convolutional neural networks, International Conference on Computer Vision Theory and Applications, 2008.

F. , B. Hakkani-tür, D. Et, C. , and S. , , 2007.

F. , C. Evans, and N. , New implementations of the E-HMM-based system for speaker diarisation in meeting rooms, Proc. ICASSP'08, 2008.

L. , G. Nocéra, P. Massonié, D. Et, M. et al., The lia speech recognition system : from 10xrt to 1xrt, Lecture Notes in Computer Science, 4629 LNAI, pp.302-308, 2007.

L. , A. Fei, J. Tang, S. Fan, J. Zhang et al., Confusion network based video ocr post-processing approach, IEEE International Conference on Multimedia and Expo, 2009.

P. , R. Saleem, S. Macrostie, E. Natarajan, P. Et et al., Multi-frame combination for robust videotext recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

Z. , W. Chellappa, R. Rosenfeld, A. Phillips, and P. J. , Face recognition : A literature survey, ACM Computing Surveys, pp.399-458, 2003.