Reproducibility issues in science, is P value really the only answer?

Jean Gaudart, Laetitia Huiart, P. J. Milligan, Rodolphe Thiebaut, Roch Giorgi
Affiliations: SESSTIM (U912 INSERM, Aix Marseille Université, IRD); London School of Hygiene and Tropical Medicine (LSHTM); Epidémiologie et Biostatistique, ISPED (Université Bordeaux Segalen, INSERM)
Journal article, 2014. DOI: 10.1073/pnas.1323051111. HAL: hal-01307492, https://hal-amu.archives-ouvertes.fr/hal-01307492

Johnson describes the lack of reproducibility of scientific studies, which he attributes to the leniency of the commonly used significance level (1). We appreciate the quality of this work and its importance for the interpretation of statistical evidence, and these results should be considered in statistical guidelines. Nevertheless, we would like to point out some important issues not thoroughly discussed in this publication.

Not publishing "nonsignificant" results leads to the well-known publication bias, whereby studies with low statistical power are underrepresented. This bias would become more severe, despite recommendations to allow the publication of "negative" results. Lowering the significance level will further increase the type II error, which is clinically as important as the type I error. Focusing only on the type I error may lead to an excessive false nondiscovery rate. For severe diseases, it is not uncommon, at early stages, to fix the significance level at 0.1 (2) to avoid discarding an effective treatment. Johnson argues that this may be corrected by increasing the sample size. However, increasing the size of clinical trials will reduce their feasibility and lengthen their duration. Beyond these issues, including more patients means exposing more patients to an experimental treatment and may challenge the equipoise concept.

The issue of fixing a threshold that defines significance goes back to the controversy between Fisher and Neyman–Pearson. Estimating a P value is needed to quantify the strength of evidence.
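The trade-off described above can be illustrated numerically. The sketch below uses a normal-approximation power calculation for a one-sided two-sample z-test; the effect size and sample size are illustrative assumptions, not values from the letter. It shows that, at a fixed sample size, lowering the significance level from 0.05 to 0.005 roughly doubles the type II error, and that recovering the original power roughly doubles the required sample size.

```python
# Illustrative sketch (assumed effect size and sample size): how lowering
# alpha raises the type II error at fixed n, for a one-sided two-sample
# z-test under the normal approximation.
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(d, n, alpha):
    """Power of a one-sided two-sample z-test with standardized effect
    size d, n subjects per arm, and significance level alpha."""
    z_crit = Z.inv_cdf(1 - alpha)          # critical value
    noncentrality = d * (n / 2) ** 0.5     # expected z under the alternative
    return Z.cdf(noncentrality - z_crit)

d, n = 0.3, 100                # modest treatment effect, 100 patients per arm
p05 = power(d, n, 0.05)        # ~0.68, i.e. type II error ~0.32
p005 = power(d, n, 0.005)      # ~0.32, i.e. type II error ~0.68

def n_for_power(d, alpha, target):
    """Per-arm sample size reaching the target power (normal approximation)."""
    z_crit, z_pow = Z.inv_cdf(1 - alpha), Z.inv_cdf(target)
    return 2 * ((z_crit + z_pow) / d) ** 2

# Holding power fixed while moving from alpha=0.05 to alpha=0.005
# roughly doubles the number of patients per arm (~100 -> ~207 here).
n_needed = n_for_power(d, 0.005, p05)
```

This is the arithmetic behind the feasibility concern: the stricter threshold is paid for either in missed effective treatments (type II error) or in substantially larger, longer trials.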
However, fixing a threshold is needed to make a decision while controlling the risks of type I and type II error. Regarding the issue addressed by Johnson, it would be interesting to assess whether a priori specification of the threshold is required, or whether research results could be compared using the P value together with the magnitude of the test statistic.

The significance level is only the tip of the iceberg. Indeed, design issues should not be overlooked when discussing the lack of reproducibility. Selection bias leads to extrapolating results to a population different from the target population (3). Furthermore, the "poor reporting" practices highlighted by Altman et al. (4) and the lack of compliance with reporting recommendations (e.g., Consolidated Standards of Reporting Trials) hinder a proper assessment of study quality and can hide selection bias or misuse of statistical tests; the latter leads to nonreproducibility of the reported research. As an extreme example, monthly numbers of American air passengers and Australian electricity production in the late 1950s are highly correlated (Pearson's correlation = 0.88, P = 8.8 × 10⁻¹³) without any meaning. The causality criteria defined by Hill (5) highlight other important considerations in the interpretation of results. Reliance on P values remains surprisingly widespread, but good decision making depends on the magnitude of effects, the plausibility of scientific
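The air-passenger/electricity example illustrates that any two series with a shared upward trend correlate strongly regardless of causal connection. A minimal sketch, using synthetic stand-ins rather than the actual historical data:

```python
# Sketch of the spurious-correlation point: two causally unrelated series
# that both trend upward are strongly correlated. The series below are
# synthetic stand-ins, NOT the actual air-passenger or electricity data.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

months = range(60)
# linear growth plus a yearly seasonal cycle
passengers = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in months]
# exponential growth, generated independently of the series above
electricity = [50 * 1.01 ** t for t in months]

r = pearson(passengers, electricity)   # large r despite no causal link
```

Here r exceeds 0.9 even though the two series share nothing but a trend, which is exactly why a small P value alone, without Hill-style causal reasoning, cannot establish a meaningful association.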