Journal articles

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

Abstract : Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. Features unique to matrix-clustering are its dynamic visualisation of aligned TFBMs and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or from the command line for integration in pipelines.
Document type :
Journal articles

https://hal-amu.archives-ouvertes.fr/hal-01624366
Contributor : Lionel Spinelli
Submitted on : Wednesday, November 10, 2021 - 10:02:13 AM
Last modification on : Thursday, March 17, 2022 - 10:08:43 AM
Long-term archiving on : Friday, February 11, 2022 - 6:17:39 PM

Licence


Distributed under a Creative Commons Attribution - NonCommercial 4.0 International License

Citation

Jaime Abraham Castro-Mondragon, Sébastien Jaeger, Denis Thieffry, Morgane Thomas-Chollier, Jacques van Helden. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections. Nucleic Acids Research, Oxford University Press, 2017, 45 (13), p. e119. ⟨10.1093/nar/gkx314⟩. ⟨hal-01624366⟩
