Non-contiguous Finished Genome Sequence and Description of Paenibacillus camerounensis sp. nov.

Strain G4T was isolated from the stool sample of a wild gorilla (Gorilla gorilla gorilla) from Cameroon. It is a facultatively anaerobic, Gram-negative, rod-shaped bacterium. This strain exhibits a 16S rRNA nucleotide sequence similarity of 97.48 % with Paenibacillus typhae, the phylogenetically closest species with standing in nomenclature. Moreover, strain G4T differs phenotypically from other Paenibacillus species and yields a MALDI-TOF mass spectrometry score too low to allow any identification. It is therefore likely that this strain represents a new species. Here, we describe the characteristics of this organism together with its complete genome sequence and annotation. The 6,933,847-bp genome (one chromosome, no plasmid) contains 5972 protein-coding genes and 54 RNA genes, including 44 tRNA genes. In addition, digital DNA-DNA hybridization values between the genome of strain G4T and the closest Paenibacillus genomes range from 19.7 to 22.1 %, further confirming its status as a new species. On the basis of these polyphasic data, comprising phenotypic and genomic analyses, we propose the creation of Paenibacillus camerounensis sp. nov., which contains strain G4T.


Introduction
The genus Paenibacillus, described by Ash et al. [1,2] about 20 years ago, currently includes 177 species (167 with validly published names and 10 published but not validated) [3]. Species of this genus are Gram-positive, Gram-negative, or Gram-variable, frequently motile, spore-forming bacteria. Many studies have described Paenibacillus species in various environments, including soil, water, and food. Although Paenibacillus species are rarely associated with human disease, some have been implicated in infections such as endocarditis, bacteremia, and wound infections [4][5][6][7][8][9].
Strain G4T (= CSUR P208 = DSM 26182) is the type strain of Paenibacillus camerounensis sp. nov. This bacterium is a Gram-negative, facultatively anaerobic, indole-negative rod with rounded ends. It was isolated from the feces of a western lowland gorilla as part of a culturomics study describing the bacterial communities of the gorilla gut [10]. In that study, the use of diverse culture conditions led to the identification of numerous new bacterial species from gorilla fecal samples [10].
In this study, we present a summary classification and the phenotypic features of P. camerounensis sp. nov. strain G4T, together with a description of its complete genome sequence and annotation. These characteristics support the circumscription of the species P. camerounensis [11].

Strain Isolation and Phenotypic Tests
Information about fecal sample collection and storage has been described previously [10]. Strain G4T was isolated in January 2012 as part of a culturomics study [10] by cultivation on a novel medium prepared as follows: mango fruit was crushed and lyophilized, and a solution containing 12 mg of mango per ml of sterile water was prepared and filtered through 0.2 μm filters. In addition, a solution of 14 mg of agar per ml of sterile water was prepared. The medium was then made by mixing 20 ml of the filtered mango solution with 80 ml of the agar solution. The 16S rRNA gene of this strain was sequenced [10]. A phylogenetic tree was built using the maximum-likelihood method and the Kimura 2-parameter model in the MEGA 6 software [12]. Moreover, matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany), with 12 distinct deposits performed for strain G4T from 12 isolated colonies. The 12 G4T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the 6253 bacterial reference spectra of the BioTyper database, including 124 spectra from 68 Paenibacillus strains. Scores were interpreted as follows: a score ≥2 enabled identification at the species level, a score between 1.7 and 2 enabled identification at the genus level, and a score below 1.7 did not enable any identification (these thresholds were established by the manufacturer, Bruker Daltonics). Growth was tested at different temperatures (25, 30, 37, and 45°C). Growth of the strain was also tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux, Marcy l'Etoile, France), and under aerobic conditions, with or without 5 % CO2.
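As a worked check of the mango-agar medium recipe above, the final concentrations follow from simple dilution, assuming the two volumes are additive (20 ml + 80 ml = 100 ml). These derived values are our own arithmetic and are not reported by the authors:

```latex
c_{\text{mango}} = \frac{12\ \text{mg/ml} \times 20\ \text{ml}}{100\ \text{ml}} = 2.4\ \text{mg/ml},
\qquad
c_{\text{agar}} = \frac{14\ \text{mg/ml} \times 80\ \text{ml}}{100\ \text{ml}} = 11.2\ \text{mg/ml} \approx 1.1\ \%\ (\text{w/v})
```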
API 50CH and API ZYM systems (BioMérieux) were used for carbohydrate metabolism tests and enzyme detection, respectively, according to the manufacturer's instructions. Antimicrobial susceptibility was tested by the standard disc method following the recommendations of the Société Française de Microbiologie (SFM).

Genomic DNA Preparation
P. camerounensis sp. nov. strain G4T was cultured aerobically on four Petri dishes (5 % sheep blood-enriched Columbia agar) at 37°C. The bacteria were then collected from the Petri dishes, suspended in 3 × 500 μl of TE buffer, and stored at −80°C. Five hundred microliters of this suspension were thawed, centrifuged for 3 min at 10,000 rpm, and resuspended in 3 × 100 μl of G2 buffer (EZ1 DNA Tissue kit, Qiagen, Courtaboeuf, France). Mechanical lysis was performed with glass powder on the Fastprep-24 device (Sample Preparation system, MP Biomedicals, USA) for two cycles of 20 s. Lysozyme (2.5 μg/μl) was then added, and the tube was incubated at 37°C for 30 min. Finally, DNA was extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA concentration, measured with the Quant-it Picogreen kit (Invitrogen, Cergy Pontoise, France) on the Genios Tecan fluorometer, was 50 ng/μl.

Genome Sequencing and Assembly
A 3-kb paired-end library was sequenced on the 454 Roche Titanium platform. The project was loaded on a quarter region of the PicoTiterPlate (PTP) for each application. The library was prepared from 5 μg of bacterial DNA fragmented on the Covaris S-Series (S2) instrument (Woburn, Massachusetts, USA), with a target fragment size of 3.2 kb. DNA fragmentation was visualized with the Agilent 2100 BioAnalyzer on a DNA LabChip 7500. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche). Circularization and nebulization were performed and produced a size distribution with an optimum at 606 bp. After PCR amplification (17 cycles) and double size selection, the single-stranded paired-end library was quantified with the Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 420 pg/μl. The library concentration equivalence was calculated as 1.27 × 10⁹ molecules/μl. The library was clonally amplified at 0.5 copies per bead (cpb) in three emPCR reactions using the GS Titanium SV emPCR Kit (Lib-L) v2. The emPCR yield was 13.88 %, within the 5 to 20 % range expected by Roche.
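The reported concentration equivalence can be approximately reproduced with the standard mass-to-molecule conversion. This is an assumption-laden sketch, not the authors' calculation: it takes the 606-bp optimum as the mean molecule length and an average of 330 g/mol per nucleotide of single-stranded DNA, and the text does not state which constants were actually used.

```python
# Sketch: convert a single-stranded DNA mass concentration into molecules per microliter.
# Assumptions (not stated in the source): mean length = 606 nt, 330 g/mol per nucleotide.
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def ssdna_molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float,
                           g_per_mol_per_nt: float = 330.0) -> float:
    """molecules/ul = mass (g/ul) * N_A / (length (nt) * mass per nucleotide (g/mol))."""
    grams_per_ul = conc_pg_per_ul * 1e-12              # pg -> g
    molar_mass = mean_length_nt * g_per_mol_per_nt     # g/mol for one library molecule
    return grams_per_ul * AVOGADRO / molar_mass


estimate = ssdna_molecules_per_ul(conc_pg_per_ul=420.0, mean_length_nt=606.0)
print(f"{estimate:.3g} molecules/ul")
```

Under these assumptions the estimate is about 1.26 × 10⁹ molecules/μl, within 1 % of the reported 1.27 × 10⁹; the small difference presumably reflects the exact constants used by the authors.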
A total of 790,000 beads were loaded on a quarter region of the GS Titanium PicoTiterPlate (PTP Kit 70 × 75) for each application and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster using gsRunBrowser and the Newbler assembler (Roche). A total of 236,286 passed-filter wells were obtained, generating 79.84 Mb of sequence with an average read length of 337 bp. The passed-filter sequences were assembled with Newbler using 90 % identity and a 40-bp overlap. The final assembly comprised 153 contigs (>200 bp) totaling 6.93 Mb, which corresponds to a genome coverage of 52.7×.

Genome Annotation
Open reading frames (ORFs) were predicted using Prodigal [13] with default parameters; predicted ORFs spanning a sequencing gap region were excluded. The predicted bacterial protein sequences were searched against the GenBank database [14] and the Clusters of Orthologous Groups (COG) database using BLASTP. The tRNAScanSE tool [15] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [16] and BLASTn against the GenBank […] [17] for detecting orthologous proteins between genomes compared two by two, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. Moreover, we used the Genome-to-Genome Distance Calculator (GGDC) web server (http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes and to replace wet-lab DNA-DNA hybridization (DDH) with digital DDH (dDDH) [18,19]. GGDC 2.0 with BLAST+ was chosen as the alignment method, and the recommended formula 2 was used to interpret the results.

Fig. 2 Gel view comparing Paenibacillus camerounensis G4T spectra with other members of the Paenibacillus genus. The gel view displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like format. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. Peak intensity is expressed by a grayscale code. The color bar and the right y-axis indicate the relation between the color in which a peak is displayed and the peak intensity in arbitrary units.
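The identity step described in the annotation methods (mean nucleotide identity among orthologous ORFs computed with the Needleman-Wunsch global alignment algorithm) can be illustrated with a minimal implementation. This is a sketch under stated assumptions: the scoring scheme (match +1, mismatch -1, linear gap -2) and the definition of percent identity (identical columns divided by alignment length, gaps included) are illustrative choices, not the parameters of the authors' pipeline.

```python
def needleman_wunsch_identity(a: str, b: str, match: int = 1,
                              mismatch: int = -1, gap: int = -2) -> float:
    """Globally align a and b (Needleman-Wunsch, linear gaps) and return percent identity,
    defined here as identical columns / alignment length (gap columns included)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back one optimal path, counting alignment columns and identical pairs.
    i, j, columns, identical = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


print(round(needleman_wunsch_identity("ATGGCTAAA", "ATGGCAAAA"), 1))  # 88.9 (8 of 9 columns identical)
```

A genome-level value such as AGIOS would then be the mean of these percentages over all orthologous ORF pairs; when several optimal alignments tie, this sketch simply reports the one found first during traceback.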

Strain and Sequences Deposition
Strain G4T was deposited in two microbial culture collections: the German collection of microorganisms (Deutsche Sammlung von Mikroorganismen, DSM) under accession number DSM 26182 and the French culture collection (Collection de Souches de l'Unité des Rickettsies, CSUR) under accession number CSUR P208. The 16S rRNA gene and genome sequences are available in the GenBank database under accession numbers JX650057 and CCDG000000000, respectively.

Classification and Phenotypic Features
When compared against the NCBI database and the Ribosomal Database Project (RDP), strain G4T showed 97.48 % 16S rRNA nucleotide sequence similarity with Paenibacillus typhae, the phylogenetically closest validly published Paenibacillus species (Fig. 1). This value is below the 16S rRNA gene sequence similarity threshold recommended by Meier-Kolthoff et al. for Firmicutes to delineate a new species without carrying out DNA-DNA hybridization, at a maximum error probability of 0.01 % [20]. Moreover, strain G4T obtained a poor MALDI-TOF MS score (<1.4) that did not allow any identification, suggesting that it does not belong to any species represented in the database. We added the spectrum of strain G4T to our database. Spectral differences with other Paenibacillus species are presented in Fig. 2.
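The score thresholds stated in the MALDI-TOF methods can be written as a small decision function. This is an illustrative sketch: the function name is hypothetical, and the boundary handling (a score of exactly 1.7 treated as genus-level and exactly 2.0 as species-level) is our reading of the thresholds given by the manufacturer.

```python
def interpret_biotyper_score(score: float) -> str:
    """Identification level for a MALDI BioTyper score, using the thresholds described
    in this study: >= 2.0 species, 1.7 to < 2.0 genus, < 1.7 no identification."""
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "no identification"


# Strain G4T scored below 1.4; 1.35 is a hypothetical value in that range.
print(interpret_biotyper_score(1.35))  # no identification
```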

Genome Sequencing Information and Genome Properties
On the basis of its phenotypic characteristics and MALDI-TOF results, and because of its low 16S rRNA similarity to other members of the genus Paenibacillus, the strain was considered likely to represent a new species and was therefore chosen for genome sequencing. It was the 45th genome of a Paenibacillus species (Genomes Online Database) and the first genome of P. camerounensis sp. nov. The genome is 6,933,847 bp long (one chromosome, no plasmid) (Fig. 3), with a G+C content of 51.4 %, and is composed of 153 contigs. Of the 6022 predicted genes, 5972 were protein-coding genes and 54 were RNA genes (one 16S rRNA gene, one 23S rRNA gene, eight 5S rRNA genes, and 44 tRNA genes, including two tRNA pseudogenes), and 133 (2.22 %) were annotated as encoding signal peptides. A total of 4491 genes (75.25 %) were assigned to COGs, 3956 genes (66.8 %) were assigned a predicted function, and 1750 genes (29.32 %) were predicted to contain transmembrane helices. In addition, 1418 genes were annotated as hypothetical proteins, and 1406 ORFans were found. The distribution of genes into COG functional categories is presented in Table 2.

Comparison with Other Paenibacillus Species Genomes
The genome of P. camerounensis strain G4T was compared to those of seven closely related Paenibacillus species (Table 3). The draft genome of P. camerounensis is larger than those of Paenibacillus odorifer, Paenibacillus stellifer, Paenibacillus sabinae, and Paenibacillus zanthoxyli (6.93 vs 6.81, 5.66, 5.27, and 5.05 Mb, respectively) but smaller than those of Paenibacillus graminis, Paenibacillus sonchi, and Paenibacillus borealis (6.93 vs 7.17, 7.51, and 8.16 Mb, respectively). The G+C content of P. camerounensis is higher than those of P. graminis, P. sonchi, P. odorifer, and P. zanthoxyli (51.40 vs 50.60, 50.40, 44.20, and 50.90 %, respectively), lower than those of P. stellifer and P. sabinae (51.40 vs 53.50 and 52.60 %, respectively), and equal to that of P. borealis (Table 3). The number of proteins in P. camerounensis is lower than in P. sonchi, P. borealis, and P. graminis (5972 vs 6705, 6967, and 6211, respectively) but higher than in P. zanthoxyli, P. sabinae, P. stellifer, and P. odorifer (5972 vs 4907, 4865, 5161, and 5960, respectively) (Table 4). The distribution of genes into COG categories was similar in all the compared genomes (Fig. 4). In addition, P. camerounensis shares 3445, 2494, 2851, 4016, 2956, 3743, and 3664 orthologous genes with P. sonchi, P. zanthoxyli, P. sabinae, P. borealis, P. stellifer, P. graminis, and P. odorifer, respectively (Table 4). Based on the MAGi analysis, the average genomic identity of orthologous gene sequences (AGIOS) ranged from 66.79 to 91.06 % among the compared Paenibacillus species, and from 69.21 to 75.58 % between P. camerounensis and the other compared Paenibacillus species. Strain G4T is closest to P. borealis, with 75.58 % genomic identity and 4016 shared orthologous genes. dDDH estimates between strain G4T and the compared genomes ranged from 19.7 to 22.1 %.
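Shared-ortholog counts such as those above rest on pairwise ortholog detection. A common baseline is the reciprocal best hit (RBH) criterion, sketched below with hypothetical protein identifiers and BLASTP bit scores. This is a simplified stand-in: the ortholog-detection software cited as [17] may apply additional criteria (for example, score cutoffs or handling of near-equal hits), so this is not the authors' exact procedure.

```python
from typing import Dict, Set, Tuple

# query protein -> {subject protein: BLASTP bit score}
Hits = Dict[str, Dict[str, float]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (ties: first encountered)."""
    return {query: max(subjects, key=lambda s: subjects[s])
            for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical hits between strain G4T proteins (G4_*) and a reference proteome (REF_*).
g4_vs_ref = {
    "G4_0001": {"REF_0100": 850.0, "REF_0200": 120.0},
    "G4_0002": {"REF_0200": 300.0},
    "G4_0003": {"REF_0100": 400.0},  # paralog-like hit that loses to G4_0001
}
ref_vs_g4 = {
    "REF_0100": {"G4_0001": 845.0, "G4_0003": 390.0},
    "REF_0200": {"G4_0002": 310.0},
}

orthologs = reciprocal_best_hits(g4_vs_ref, ref_vs_g4)
print(sorted(orthologs))  # [('G4_0001', 'REF_0100'), ('G4_0002', 'REF_0200')]
```

Run over two complete proteomes, the size of this set would give a shared-ortholog count of the kind reported above, under the simplifying assumptions noted.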
These values are well below the 70 % species cutoff, further confirming that strain G4T represents a new species. Tables 3 and 4 summarize the number of orthologous genes and the average percentage of nucleotide sequence identity between the genomes studied.
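For readers unfamiliar with GGDC formula 2 ("identities / HSP length"), the sketch below computes the intergenomic distance it is based on: one minus the ratio of identical positions to total length over the high-scoring segment pairs (HSPs) found between two genomes. It is illustrative only: the HSP values are hypothetical, GGDC's own HSP filtering is not reproduced, and GGDC converts this distance into a dDDH percentage with a fitted statistical model that is not implemented here, so the function does not return values comparable to the dDDH estimates above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hsp:
    length: int      # aligned length of one high-scoring segment pair
    identities: int  # identical positions within that HSP


def formula2_distance(hsps: Iterable[Hsp]) -> float:
    """Distance behind GGDC formula 2: 1 - (sum of identities / sum of HSP lengths)."""
    hsps = list(hsps)
    total_length = sum(h.length for h in hsps)
    if total_length == 0:
        raise ValueError("at least one non-empty HSP is required")
    return 1.0 - sum(h.identities for h in hsps) / total_length


# Hypothetical HSPs between two genomes.
d = formula2_distance([Hsp(length=1000, identities=800), Hsp(length=500, identities=380)])
print(round(d, 4))  # 0.2133
```

Because the ratio uses only the aligned (HSP) portion of the genomes, it is largely insensitive to genome completeness, which is one reason formula 2 is recommended for draft genomes such as this one.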

Conclusions
On the basis of phenotypic characteristics (Table 1), phylogenetic position (Fig. 1), MALDI-TOF analyses, genomic analyses (taxonogenomics) (Tables 3 and 4), and GGDC results, we formally propose the creation of P. camerounensis sp. nov. (ca.me.rou.ne'n.sis. L. gen. masc. n. camerounensis, of Cameroun, the French name of Cameroon, where the gorilla fecal sample was collected), which contains strain G4T.
P. camerounensis is a facultatively anaerobic, rod-shaped, endospore-forming, motile, Gram-negative bacterium. Optimal growth occurs at 37°C. Cells have a diameter of 0.73 μm and a length of 14 μm. Colonies on blood-enriched Columbia agar are brown and 1 to 2.5 mm in diameter. The G+C content of the genome is 51.4 %. The GenBank accession numbers for the 16S rRNA gene and genome sequences are JX650057 and CCDG000000000, respectively. The type strain G4T (= CSUR P208 = DSM 26182) was isolated from the fecal sample of a western lowland gorilla from Cameroon.