Validation of predicted anonymous proteins simply using Fisher’s exact test
Journal article, Bioinformatics Advances, 2021

Abstract

Motivation: Genome sequencing has become the primary (and often the sole) experimental method to characterize newly discovered organisms, in particular from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins whose existence remains unverified, in particular among those without homologs in model organisms. As a last resort, the selection pressure computed from pairwise alignments of the corresponding open reading frames (ORFs) can be used to validate their existence. However, this approach is error-prone, as it is usually not associated with a significance test.

Results: We introduce the straightforward Fisher’s exact test as a post-processing step for the results of the popular CODEML sequence comparison software. The respective rates of nucleotide changes at non-synonymous vs. synonymous positions (as determined by CODEML) are turned into the entries of a 2×2 contingency table, whose probability is computed under the null hypothesis that, if the ORFs do not encode actual proteins, non-synonymous and synonymous positions should not behave differently. Using the genome sequences of two recently isolated giant viruses, we show that strong negative selection pressures do not always provide a solid argument in favor of the existence of proteins.
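The abstract does not spell out how the 2×2 table is filled. Below is a minimal Python sketch, assuming the table crosses changed/unchanged positions with non-synonymous/synonymous positions, and assuming the numbers of changes are approximated as N·dN and S·dS from the pairwise estimates CODEML reports (N and S being the numbers of non-synonymous and synonymous sites). The helper names `codeml_table` and `fisher_exact` are hypothetical, and the one-sided "less" alternative is an illustrative choice that asks whether non-synonymous changes are depleted, as expected under negative selection. The test itself is implemented from scratch with exact integer arithmetic so the block has no dependencies.

```python
from fractions import Fraction
from math import comb


def _hypergeom_pmf(x: int, row1: int, col1: int, col2: int) -> Fraction:
    """Exact probability that the top-left cell equals x when all margins are fixed."""
    return Fraction(comb(col1, x) * comb(col2, row1 - x), comb(col1 + col2, row1))


def fisher_exact(table: list[list[int]], alternative: str = "less") -> float:
    """Fisher's exact test on a 2x2 table [[a, b], [c, d]] of non-negative integers.

    "less":      P(top-left cell <= a), i.e. odds ratio < 1.
    "two-sided": total probability of all tables no more likely than the observed one.
    """
    (a, b), (c, d) = table
    if min(a, b, c, d) < 0:
        raise ValueError("contingency table entries must be non-negative")
    row1, col1, col2 = a + b, a + c, b + d
    lo, hi = max(0, row1 - col2), min(row1, col1)  # support of the hypergeometric law
    pmf = {x: _hypergeom_pmf(x, row1, col1, col2) for x in range(lo, hi + 1)}
    if alternative == "less":
        p = sum(pmf[x] for x in range(lo, a + 1))
    elif alternative == "two-sided":
        # Exact fractions, so no floating-point tolerance is needed in the comparison.
        p = sum(v for v in pmf.values() if v <= pmf[a])
    else:
        raise ValueError("alternative must be 'less' or 'two-sided'")
    return float(p)


def codeml_table(n_sites: float, s_sites: float, dn: float, ds: float) -> list[list[int]]:
    """HYPOTHETICAL mapping from CODEML pairwise estimates (N, S, dN, dS) to a 2x2 table.

    Rows: changed / unchanged positions; columns: non-synonymous / synonymous.
    Assumption: changes are approximated as N*dN and S*dS, rounded to integers.
    """
    n_total, s_total = round(n_sites), round(s_sites)
    n_changed, s_changed = round(n_sites * dn), round(s_sites * ds)
    if n_changed > n_total or s_changed > s_total:
        # dN and dS are corrected for multiple hits and can exceed 1 near saturation.
        raise ValueError("estimated changes exceed the number of sites")
    return [[n_changed, s_changed], [n_total - n_changed, s_total - s_changed]]


if __name__ == "__main__":
    # Made-up estimates for one ORF pair, for illustration only (not from the paper).
    table = codeml_table(n_sites=300.0, s_sites=100.0, dn=0.02, ds=0.10)
    print(table)  # [[6, 10], [294, 90]]
    print(fisher_exact(table, alternative="less"))
```

A small one-sided P-value means that non-synonymous positions changed significantly less often than synonymous ones, given the number of sites of each kind. Conversely, a low dN/dS ratio computed from only a handful of changes may fail to reach significance, which is the caveat the abstract raises. If SciPy is available, the `pvalue` returned by `scipy.stats.fisher_exact(table, alternative="less")` should match this sketch.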
Main file
vbab034.pdf (626.69 KB)
Origin: publisher files authorized for deposit in an open archive

Dates and versions

hal-03463386, version 1 (02-12-2021)

License

Attribution

Cite

Jean-Michel Claverie, Sébastien Santini. Validation of predicted anonymous proteins simply using Fisher’s exact test. Bioinformatics Advances, 2021, ⟨10.1093/bioadv/vbab034⟩. ⟨hal-03463386⟩

Collections

CNRS UNIV-AMU IGS