O-GlcNAcylation site mapping by (azide-alkyne) click chemistry and mass spectrometry following intensive fractionation of skeletal muscle cell proteins

O-linked N-acetyl-D-glucosaminylation (O-GlcNAcylation) modulates numerous cellular processes. Akin to phosphorylation, O-GlcNAcylation is highly dynamic and reversible, and responds rapidly to extracellular demand. Although mapping the modified sites is absolutely necessary to fully understand the role of O-GlcNAcylation, it remains highly challenging, mainly because unmodified proteins are in excess compared with the O-GlcNAcylated ones. Based on a click chemistry approach, O-GlcNAcylated proteins were labelled with azido-GalNAc and coupled to agarose beads. The proteome extracted from C2C12 myotubes was submitted to intensive fractionation prior to azide-alkyne click chemistry. This combination of fractionation and click chemistry is a powerful methodology to map O-GlcNAc sites; indeed, 342 proteins were identified through the sequencing of 620 peptides containing one or more O-GlcNAc sites. We localized O-GlcNAc sites on proteins involved in signalling pathways or in protein modification, as well as on structural proteins. Considering the recently described role of O-GlcNAcylation in the modulation of sarcomere morphometry and of the interaction between key structural proteins, we focused on proteins involved in the cytoarchitecture of skeletal muscle cells. In particular, several O-GlcNAc sites were located within protein-protein interaction domains, suggesting that O-GlcNAcylation could be strongly involved in the organisation and reorganisation of sarcomeres and myofibrils.

GlcNAc site localization; thus, it remains indispensable to precisely map the O-GlcNAcylated sites to fully understand the role of this modification on a given protein. For this purpose, we combined extensive fractionation of the skeletal muscle cell proteome with click chemistry to map O-GlcNAc sites without any a priori consideration. A total of 620 peptides containing one or more O-GlcNAc sites were sequenced; interestingly, several of them belong to proteins expressed at low levels, in particular proteins involved in signalling pathways. We also focused on structural proteins in view of recent data supporting the role of O-GlcNAcylation in the modulation of sarcomere cytoarchitecture; importantly, some of the O-GlcNAc sites were mapped within protein-protein interaction domains, reinforcing the involvement of O-GlcNAcylation in the organisation and reorganisation of the sarcomere and, to a larger extent, of myofibrils.

O-N-acetyl-β-D-glucosaminylation, termed O-GlcNAcylation, is an atypical glycosylation corresponding to the transfer of a single monosaccharide, N-acetyl-β-D-glucosamine, onto the hydroxyl group of serine and threonine residues of nuclear, cytosolic and mitochondrial proteins (1, 2). O-GlcNAcylation has emerged as a key regulator of several cellular processes such as transcription, translation, regulation of signalling pathways, degradative processes, subcellular localization of targets, and so on (1, 3-7). Because of its involvement in nearly all, if not all, cellular processes, O-GlcNAcylation is nowadays clearly associated with the aetiology of several acquired diseases, in particular diabetes, neurodegenerative disorders, cardiovascular diseases and cancer (8).
O-GlcNAcylation bears similarities with phosphorylation, in particular its reversibility: both the phosphate and the GlcNAc moieties can be added and removed several times during the lifetime of a protein, and their turnover is faster than that of the protein backbone (9). O-GlcNAcylation rapidly emerged as a major cellular mechanism which could compete with phosphorylation in terms of the number of modified proteins and their importance in cellular physiology. However, in contrast to the plethora of kinases and phosphatases responsible for the phosphorylation/dephosphorylation of specific proteins, a single pair of antagonistic enzymes (OGT/OGA) is involved in the O-GlcNAcylation process. Because kinases recognize consensus sequences, phosphorylation sites are readily predictable from the primary sequence of a protein. In contrast, no consensus sequence has been clearly defined for OGT, but it appears that the peptide sequences modified by O-GlcNAcylation are enriched in small amino acids, with a proximal proline residue; these sequences also preferentially adopt secondary structures such as loops and disordered regions rather than α-helices and β-strands (10-15). The O-GlcNAcylated sites can also correspond to phosphorylated ones; thus, many proteins are modified by both O-GlcNAc and phosphate groups, and these two post-translational modifications could compete for the same or for neighbouring sites (4, 16).

fractionation of the muscle cell proteome according to solubility, hydrophobicity and isoelectric point of proteins prior to the click chemistry. Thus, click chemistry was performed (i) on the whole proteome extracted from C2C12 differentiated myotubes, (ii) on a subproteome, the cytosol-enriched extract, and (iii) on the extensively fractionated cytosol-enriched extract.
The non-glycosylated peptides, and the glycosylated peptides released by beta-elimination, were analysed by mass spectrometry. Through the analysis of peptides retained on agarose beads, we identified 342 O-GlcNAcylated proteins in the fractionated subproteome, corresponding to a 2-fold increase in the number of identified proteins compared with the whole extract, or a 3.5-fold increase compared with the non-fractionated subproteome, which underlines the benefit of the fractionation. Among these O-GlcNAcylated proteins, we also sequenced 620 peptides containing one or several dehydrated serine or threonine residues, thus corresponding to O-GlcNAcylated sites.

EXPERIMENTAL SECTION

Cell culture
Mouse C2C12 skeletal myoblasts were obtained from ATCC (American Type Culture Collection). Myoblasts were grown on 100 mm Petri dishes in proliferation medium (DMEM supplemented with 10% FBS and 1% antibiotic-antimycotic) at 37 °C in a humidified atmosphere of 5% CO2 until reaching 90-95% confluence. They were then induced to differentiate into myotubes by switching to differentiation medium (DMEM containing 2% HI-HS and 1% antibiotic-antimycotic). Medium was changed every two days, and myotubes were maintained for 5 days until they were mature.

Fractionation of the cytosolic extract
Ammonium sulphate precipitation. One hundred milligrams of the protein extract were fractionated through three steps of ammonium sulphate (AS) precipitation. Briefly, the extract was salted out with AS at 25% saturation for 2 h at 4 °C and centrifuged (10000 g, 4 °C, 15 min). The pellet, which corresponds to the AS25 fraction, was stored, while the supernatant was salted out with AS at 50% saturation. The third step corresponded to a precipitation with AS at 75% saturation. The three pellets obtained corresponded to the AS25, AS50 and AS75 fractions, respectively. The pellets were solubilized in UTCD buffer (4 M urea; 2 M thiourea; 2% CHAPS (w/v); 5 mM DTT), desalted with Zeba Spin columns, and assayed using a reducing agent and detergent compatible protein assay (RC DC™ protein assay).

Twenty micrograms of each fraction (or 10 µl in the case of IEF fractionation) were boiled in Laemmli buffer (62.5 mM Tris/HCl, pH 6.8; 10% glycerol; 2% SDS; 5% β-mercaptoethanol; 0.02% bromophenol blue) and separated electrophoretically on 7.5% or on Any kD Mini-PROTEAN TGX Stain-Free™ (SF) Precast Gels (25 min, 300 V). The SF imaging was performed with a ChemiDoc MP Imager and Image Lab 4.0.1 software (Bio-Rad); a 5-min activation time was used for imaging of the whole protein pattern.

O-GlcNAc proteins enrichment
Click chemistry (azide/alkyne click reaction and enrichment) was performed on the whole extract as well as on the cytosol-enriched extract using the Click-iT™ O-GlcNAc Enzymatic Labelling System and the Click-iT™ Protein Enrichment Kit according to the manufacturer's instructions and to the protocol described by Hahne and coworkers (55). The click chemistry protocol was also applied to each fraction issued from the MicroRotofor runs (themselves issued from ammonium sulphate precipitation of cytosol-enriched proteins); a total of 30 fractions were labelled. It is worth noting that, for each sample, the same protein quantity (i.e. 2 mg) was used for the click chemistry-based enrichment.
After chloroform/methanol precipitation, performed at room temperature, the O-GlcNAc proteins were labelled overnight at 4 °C with the Click-iT™ O-GlcNAc Enzymatic Labelling System. Briefly, Gal-T1 (Y289L) was incubated with proteins in labelling buffer (20 mM HEPES, pH 7.9; 50 mM NaCl; 2% NP-40; 5.5 mM MnCl2; 25 µM UDP-GalNAz), according to the manufacturer's recommendations. All reagents were provided in the kit, but the volume of each reagent was adjusted according to protein quantities. The reaction was performed at 4 °C under gentle agitation for 20 h, and the azide-labelled proteins were then chloroform/methanol precipitated.
The azide-labelled proteins were then resuspended in urea lysis buffer, according to manufacturer's

μg of trypsin/Lys-C mix. Following the on-resin digestion, the remaining solution was collected, and the resin was washed with 500 µL of digestion buffer; both solutions, corresponding to non-retained peptides (NR peptides, i.e. the non-linked peptides), were pooled and stored before desalting. The resin was then washed twice with 1.5 mL of MS grade water, followed by 2 x 1.5 mL washes with dephosphorylation buffer (50 mM Tris/HCl, pH 7.6; 100 mM NaCl; 1 mM DTT; 10 mM MgCl2; mM MnCl2).
Peptides linked to agarose beads were submitted to dephosphorylation at 37 °C for 6 h in 400 μL of dephosphorylation buffer using 800 U of λ phosphatase and 20 U of calf intestine phosphatase. Following dephosphorylation, the resin was washed twice with 1.8 mL of H2O, and the slurry volume was adjusted to 300 μL with H2O before β-elimination with the GlycoProfile β-elimination kit. The β-elimination reaction was incubated on an end-over-end shaker with extensive mixing at 4 °C and quenched after 24 h by adding 1% TFA until the pH reached 6-8. Agarose beads were discarded, and the resulting solution contained the β-eliminated peptides, which correspond to the initially O-GlcNAcylated peptides. The non-retained and the β-eliminated peptides were desalted on C18 reversed-phase columns and dried in a vacuum concentrator before mass spectrometry analysis.
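To illustrate how a β-eliminated peptide differs in mass from its unmodified counterpart, the short sketch below computes the neutral monoisotopic mass of a peptide with and without dehydrated residues. It assumes that β-elimination leaves each formerly O-GlcNAcylated serine or threonine in its dehydrated form, i.e. with the loss of one water molecule (18.01056 Da); the function name `peptide_mass` and the example peptide are hypothetical and are not taken from the study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O (Da)


def peptide_mass(sequence, dehydrated=()):
    """Neutral monoisotopic mass of a peptide.

    `dehydrated` lists the 1-based positions of Ser/Thr residues assumed to
    have lost one water during beta-elimination (hypothetical helper).
    """
    # Sum of residue masses plus one water for the free N- and C-termini.
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for pos in set(dehydrated):
        if sequence[pos - 1] not in "ST":
            raise ValueError(f"position {pos} is not Ser or Thr")
        mass -= WATER  # dehydrated Ser/Thr: loss of one water
    return mass


if __name__ == "__main__":
    peptide = "GAS"  # hypothetical tripeptide used only for illustration
    unmodified = peptide_mass(peptide)
    dehydrated = peptide_mass(peptide, dehydrated=[3])
    print(f"unmodified: {unmodified:.5f} Da")
    print(f"dehydrated: {dehydrated:.5f} Da")
    print(f"mass shift: {dehydrated - unmodified:.5f} Da")
```

In a database search, the same shift would typically be declared as a variable modification on serine and threonine; the exact search settings used in the study are not reproduced here.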

Protein identification
The acquired raw LC Orbitrap MS data were processed using Proteome Discoverer 1.4.1.14 (Thermo Fisher Scientific). This software was used to search data via an in-house Mascot server (version

Fractionation of cytosol-enriched fraction from C2C12 myotubes
The workflow applied in our study is presented in Fig. 1. Whole proteins were extracted from C2C12 differentiated myotubes (Whole Extract, WE), or submitted to successive fractionation steps as described in Fig. 1; the protein profiles of the resulting fractions are shown in Fig. 2. Briefly, a differential extraction protocol was applied to recover a cytosol-enriched fraction (CYT), a membrane-enriched fraction (MB) and a myofilament-enriched fraction (MYO); these fractions contained 64.2%, 18.9% and 16.9% of WE proteins, respectively (Fig. 2A). It is noteworthy that the profile of the cytosol-enriched fraction was quite similar to that of the whole extract (Fig. 2A). This fraction was then submitted to two successive fractionation protocols, while the membrane- and the myofilament-enriched fractions were excluded from the subsequent analysis, mainly because these fractions were poorly resolubilized following the chloroform/methanol precipitation.
The cytosol-enriched fraction was first partitioned through ammonium sulphate (AS) precipitation. Four fractions were obtained, annotated as AS25, AS50, AS75 and AS100 according to

were obtained for AS25, AS50, AS75 and AS100 fractions, respectively. Because of its poor protein content, the AS100 fraction was excluded from the remaining analysis. A second fractionation was performed on the AS25, AS50 and AS75 fractions, based on the separation of proteins according to their isoelectric point, using the MicroRotofor apparatus. Ten fractions were obtained in each case, containing on average 1.7%, 3.6%, 4.7%, 5%, 7%, 10.6%, 9.9%, 9.8%, 16.5% and 31.2% of the proteins from F1 (the most acidic fraction) to F10 (the most basic fraction). The corresponding pH for each fraction is indicated in supplemental Table I

In the end, a total of 30 fractions were obtained from the cytosol-enriched fraction (fCYT). Each of them, as well as the non-fractionated cytosol-enriched fraction (CYT) and the whole extract (WE), was submitted to the click chemistry protocol in order to map the O-GlcNAc sites. These fractions, submitted to the labelling-coupling protocol and analysed by mass spectrometry, are indicated in bold italic in Fig. 1.
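The two-step fractionation scheme described above can be summarised in a few lines of code. This is only a bookkeeping sketch: the fraction labels such as `AS25-F1` are a hypothetical naming convention, while the percentages are the average protein distributions reported in the text.

```python
from itertools import product

# Ammonium sulphate cuts retained for isoelectric focusing (AS100 was excluded).
AS_FRACTIONS = ["AS25", "AS50", "AS75"]
# Ten MicroRotofor fractions per AS cut, F1 (most acidic) to F10 (most basic).
IEF_FRACTIONS = [f"F{i}" for i in range(1, 11)]

# Hypothetical labels for the fCYT fractions submitted to click chemistry.
fcyt_fractions = [f"{cut}-{ief}" for cut, ief in product(AS_FRACTIONS, IEF_FRACTIONS)]

# Average share of protein (%) recovered in F1..F10, as reported in the text.
ief_share = [1.7, 3.6, 4.7, 5.0, 7.0, 10.6, 9.9, 9.8, 16.5, 31.2]
# Share of whole-extract protein (%) in each subcellular fraction.
extract_share = {"CYT": 64.2, "MB": 18.9, "MYO": 16.9}

if __name__ == "__main__":
    print(len(fcyt_fractions), "fractions:", fcyt_fractions[0], "...", fcyt_fractions[-1])
    print("IEF shares sum to", round(sum(ief_share), 1), "%")
    print("Extract shares sum to", round(sum(extract_share.values()), 1), "%")
```

Both sets of percentages add up to 100%, and the three AS cuts combined with ten IEF fractions each give the 30 fractions that were labelled.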

Efficiency of the O-GlcNAcylation mapping after extensive fractionation
The improvement in O-GlcNAcylation mapping brought by the fractionation is shown in Table I, which indicates the number of sequenced peptides and of the resulting identified proteins. Data are given for the peptides which were not linked to agarose beads (corresponding to non-O-GlcNAcylated peptides, but which belong to the O-GlcNAcylated proteins coupled to agarose beads), as well as for peptides which were linked to agarose beads (corresponding to O-GlcNAcylated peptides covalently linked to agarose beads and released from the beads by beta-elimination). We also indicate the percentage of peptides containing one or more dehydrated serine or threonine residues (thus corresponding to peptides containing one or more O-GlcNAc sites). Data are presented for the non-fractionated C2C12 extract (WE, whole extract), the non-fractionated cytosol-enriched extract (CYT), and the fractionated cytosol-enriched extract (CYT-AS-IEF combined fractions, corresponding to the fCYT fraction). Based on these data, we defined the efficiency factor as the ratio of the number of peptides/proteins identified after extensive fractionation (fCYT fraction) to the number identified in the non-fractionated cytosol-enriched extract (CYT).
We first analysed the non-linked peptides in order to identify the O-GlcNAcylated proteins linked to agarose beads, independently of their O-GlcNAc sites. In the non-fractionated whole extract, 554 proteins were identified, while 449 were identified in the cytosol-enriched fraction. Once the fractionation protocol was applied, the number of identifications increased threefold compared with the CYT fraction, since 1362 proteins were identified in the fCYT fraction. It is worth noting that 14573 peptides (9.3% of which were dehydrated) led to the identification of these 1362 O-GlcNAcylated proteins, while "only" 5540 peptides (6.2% of which were dehydrated) were identified in the non-fractionated cytosol-enriched fraction. Thus, nearly three times more peptides (2.6-fold) were identified when extensive fractionation was applied to the CYT fraction. Interestingly, we identified 620 peptides containing one or more dehydrated serine or threonine residues (thus corresponding to O-GlcNAc sites) derived from the beta-elimination (BE) of peptides covalently linked to agarose beads; in parallel, 311 and 142 dehydrated peptides were identified in the WE and CYT fractions, respectively. Thus, extensive fractionation of the cytosol-enriched fraction increased the number of identified peptides bearing one or more O-GlcNAc site(s) by a factor of 4.4 compared with the non-fractionated cytosol-enriched fraction. It should be mentioned that some of the peptides identified in the beta-eliminated fractions were not dehydrated, suggesting that some peptides remained non-specifically retained on agarose beads despite the pre-clearing step with agarose beads, intended to avoid non-specific retention of proteins, and despite the extensive washing steps applied to disrupt protein-protein interactions and thus to eliminate the proteins which were not covalently linked to the beads.
All data files corresponding to the identified proteins and the sequenced peptides are presented as supplemental data (Supplemental Table II
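As a worked check of the efficiency factor defined above (fCYT count divided by CYT count), the following sketch recomputes the ratios from the counts quoted in this paragraph. The dictionary keys and the function name `efficiency_factor` are illustrative choices, not names used in the study.

```python
# (CYT count, fCYT count) pairs quoted in the text.
counts = {
    "proteins identified from non-retained peptides": (449, 1362),
    "non-retained peptides": (5540, 14573),
    "dehydrated beta-eliminated peptides": (142, 620),
}


def efficiency_factor(cyt, fcyt):
    """Ratio of identifications after fractionation (fCYT) to those without (CYT)."""
    return fcyt / cyt


if __name__ == "__main__":
    for label, (cyt, fcyt) in counts.items():
        print(f"{label}: {efficiency_factor(cyt, fcyt):.1f}-fold")
```

With these counts the ratios come out at about 3.0 for proteins, 2.6 for non-retained peptides and 4.4 for dehydrated peptides.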

Global analysis of O-GlcNAcylated proteins and focus on particular protein classes
As indicated in Table I, 342 proteins were identified following the sequencing of peptides released from agarose beads by beta-elimination, i.e. about threefold more than in the non-fractionated CYT fraction. We classified these proteins using the PANTHER classification system (Protein Analysis THrough Evolutionary Relationships, http://www.pantherdb.org/) (64, 65). Proteins were classified according to their molecular function (Fig. 3A) or their protein class (Fig. 3B). Among the proteins classified according to their molecular function, 44.1% had a binding activity, 27.8% a catalytic activity, and 16.8% a structural molecule activity; the other activities, each representing less than 5% of the proteins, were transporter, translation regulator, channel regulator, receptor, signal transducer or antioxidant activities. Proteins were also classified according to the class they belong to: nucleic acid metabolism (32.5%), enzymatic activities (27.7%), cytoskeletal proteins and chaperones (14.5%), signalling proteins (7.3%), structural proteins (5.7%), transporter and binding proteins (5.3%), or cell adhesion molecules (4.4%); 2.6% of proteins (others) were not assigned to any of the classes described above.
We focused on proteins identified through the sequencing of beta-eliminated peptides containing one or several dehydrated serine and threonine residues. We mapped the O-GlcNAcylation site(s) (indicated in bold and underlined in the peptide sequence) within different classes of proteins, in particular those involved in cytoskeleton and sarcomeric organization (Table II) (Table III). Table II

UniProtKB (http://www.uniprot.org) or GeneCards database (http://www.genecards.org/). We indicated the fraction (WE, CYT, fCYT) in which the peptides were identified. As shown in these tables, some peptides were identified in all fractions; in contrast, a large number of them were only identified when the cytosol-enriched extract was extensively fractionated. We also observed that, in some cases, serine or threonine residues within the same peptide were differently dehydrated (for example, this was particularly observed for the Nascent polypeptide-associated complex subunit alpha, in Table III).

Regarding the method, we opted for the enzymatic labelling of O-GlcNAcylated proteins rather than metabolic labelling since GlcNAz incorporation preferentially occurs in complex glycans (66). We added several fractionation steps prior to the click chemistry; from the analysis of peptides linked to agarose beads and released from the beads by beta-elimination, we identified 342 proteins in the fractionated subproteome, corresponding to a 3.5-fold increase in identified proteins compared with the non-fractionated cytosol-enriched fraction. Indeed, the fractionation protocol, based on the physicochemical properties of proteins, makes the sample less complex. Certain proteins that could not have been identified in a complex mixture are thereby enriched. As a consequence, the number of identified peptides/proteins increased after fractionation. We also sequenced 620 peptides containing one or several O-GlcNAcylated sites.
Beyond increasing the number of identified peptides/proteins, the fractionation protocol performed prior to click chemistry led to the mapping of O-GlcNAc sites on numerous signalling proteins, such as proteins involved in the MAPK pathway, including the TGF-beta pathway. In addition, we identified several proteins involved in the ubiquitination process, in particular several E3 ubiquitin ligases, as well as proteins responsible for deubiquitination. Thus, our data suggest that the modulation of ubiquitination through O-GlcNAcylation could be involved in the modulation of degradative processes (and so in the regulation of protein homeostasis, which is essential for muscle health), as well as of the intracellular processes modulated by ubiquitination. It is worth noting that we opted for fractionation according to ionic strength and isoelectric point; of course, other fractionation protocols could be applied to yield complementary data and complete the non-exhaustive list of O-GlcNAcylated peptides.
About fifteen years ago, we attempted to map the O-GlcNAcylated proteins in skeletal muscle and we identified structural proteins, proteins involved in signalling pathways and contractile proteins as being O-GlcNAcylated (54). Five O-GlcNAc sites were mapped on actin and myosin using a BEMAD approach, and interestingly, some of them were located within or close to protein-protein interaction domains, suggesting that O-GlcNAcylation could play an important role in the modulation of sarcomeric protein interactions (32). Along these lines, we recently demonstrated that O-GlcNAcylation is a key modulator of sarcomere morphometry, in particular through the modulation of protein-protein interactions within multiprotein complexes including key structural proteins such as desmin, αB-crystallin, α-actinin, filamin-C and moesin (53).
In addition, we showed that, following global changes in the O-GlcNAcylation level, the interaction between desmin and αB-crystallin was modulated; in this paper, we localized the O-GlcNAc sites of these two proteins. For desmin, a protein

GlcNAcylated peptides (BE, beta-eliminated peptides). Table II corresponds to proteins identified from the whole extract (WE), Table IV to proteins identified from the cytosol-enriched extract (CYT), and Table VI to proteins identified from the fractionated cytosol-enriched extract (fCYT).

Tables III, V and VII: Data files corresponding to the identification of proteins from the sequencing of peptides released from agarose beads by trypsin. These peptides correspond to non-O-GlcNAcylated peptides (NR, non-retained peptides), but they belong to O-GlcNAcylated proteins covalently linked to agarose beads.