
University of Groningen The ecology and evolution of bacteriophages of mycosphere-inhabiting Paraburkholderia spp. Pratama, Akbar Adjie


The ecology and evolution of bacteriophages of mycosphere-inhabiting Paraburkholderia spp.

Pratama, Akbar Adjie

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Pratama, A. A. (2018). The ecology and evolution of bacteriophages of mycosphere-inhabiting Paraburkholderia spp. Rijksuniversiteit Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Species boundaries and ecological features among Paraburkholderia terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T

Akbar Adjie Pratama, Diego Javier Jiménez, Qian Chen, Boyke Bunk, Cathrin Spröer, Jörg Overmann and Jan Dirk van Elsas

In preparation for publication


Abstract

Paraburkholderia is a recently defined bacterial genus that encompasses diverse, versatile and adaptable species. In the light of its interactive lifestyle with soil fungi, the P. terrae-like subgroup (containing P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T) is of particular interest. Here, we describe and examine the genomes of the type strains of P. terrae (DSM 17804T; 10 Mb), P. hospita (DSM 17164T; 11.5 Mb) and P. caribensis (DSM 13236T; 9 Mb), in comparison to those of the ecophysiologically diverse P. terrae-like strains BS001, BS007, BS110 and BS437. Based on the phylogenetic analysis of 16S rRNA and housekeeping genes, shared orthologous genes, average nucleotide identity with the MUMmer algorithm (ANIm) and tetranucleotide frequency correlations (TETRA), we found a close relationship between P. hospita DSM 17164T and P. terrae DSM 17804T (ANIm = 95.42%; TETRA = 0.99784); further, low divergence was found with P. terrae strains BS001, BS007, BS110 and BS437 (ANIm = 99%; TETRA = 0.99). Analyses of the genomes, carbon source utilization (BIOLOG assays) and CAZyme patterns showed a range of genomic features explaining interactions with fungi in the Paraburkholderia type strains, including biofilm formation (pgaABCD, alginate biosynthesis), motility/chemotaxis, type-4 pili and diverse secretion systems (including a type-3 secretion system). Furthermore, P. terrae DSM 17804T as well as P. hospita DSM 17164T showed high migratory behavior towards fungi in soil, while P. caribensis DSM 13236T was a relatively poor migrator. On the basis of the genomic information and capability to colonize fungi, we propose the reclassification of the investigated organisms into two highly related species, i.e. P. hospita (including P. terrae strains BS001, BS007, BS110, BS437, DSM 17804T and P. hospita DSM 17164T) and P. terrae, that together form one species cluster.

Keywords: Average nucleotide identity, comparative genomics, Paraburkholderia,


Introduction

Classical studies have revealed the genus Burkholderia to be ubiquitous in soils (Salles et al., 2002), plants and humans (Estrada-De Los Santos et al., 2013; Sahl et al., 2015; Stoyanova et al., 2007). Several species of this genus have been examined for their potential to produce novel antibiotics, bioactive metabolites and serve as plant growth promoting agents (Depoorter et al., 2016; Paungfoo-Lonhienne et al., 2014; Sasso et al., 2016). A recent study provides robust evidence for the contention that the classical genus Burkholderia actually is split into two clades, denoted Burkholderia and Paraburkholderia (Sawana et al., 2014). The second clade comprises mainly environmental as well as poorly characterized species. A particularly interesting soil-derived species cluster in this clade is formed by the tightly knit species P. terrae, P. hospita and P. caribensis.

Type strains of these species, i.e. P. terrae strain DSM 17804T, P. hospita strain DSM 17164T and P. caribensis strain DSM 13236T, have been used as taxonomical and ecophysiological references for the genus (Achouak et al., 1999; Goris et al., 2002; Yang et al., 2006). The species P. caribensis was first isolated from a vertisol soil in Martinique, French West Indies. Members of P. caribensis produce high amounts of exopolysaccharides on carbon-rich media (Achouak et al., 1999). The species P. hospita was first isolated from agricultural soil in Pittem, Belgium, as a key in situ recipient of introduced catabolic plasmids pJP4 and pEMT1. These plasmids encode the degradation of 2,4-dichlorophenoxyacetic acid (2,4-D) (Goris et al., 2002). P. terrae was originally isolated from a forest soil in Daejeon, South Korea (Yang et al., 2006).

In our initial studies on bacteria that interact with the soil fungus Lyophyllum sp. strain Karsten, P. terrae came up as a prime bacterial associate (Nazir et al., 2012; Warmink et al., 2011). The relevant P. terrae strain BS001 possessed genes for strategies that enable such interactions, i.e. biofilm formation (Warmink et al., 2011), a type three secretion system (T3SS) (Yang et al., 2016), chemotaxis encoded by flagella genes, type 4 pili (T4P) and adherence traits (Haq et al., 2016). Moreover, co-migration of P. terrae strain BS001 with fungal hyphae was dependent on the presence of flagella, with the T4P system having minor effects (Yang et al., 2017). A five-gene cluster has also been reported to be up-regulated when BS001 was in contact with the fungus (Haq et al., 2016). Several other strains identified as P. terrae, notably BS110, BS007 and BS437, were later isolated from other soils and also found to be fungal-interactive. Evidence was also provided for fungal interactivity of P. hospita DSM 17164T and P. caribensis DSM 13236T (Nazir et al., 2012).

Here, we hypothesize that the aforementioned three species share common genetic systems that allow them to interact with fungi. In order to address this hypothesis, we determined their genome sequences and explored their evolutionary relationship and ecological versatilities in a comparative genomics approach. The questions posed were: how related are the type strains to one another and to other Paraburkholderia strains? What are their unique versus common features? How may their evolutionary trajectory have shaped their strategies to interact with soil fungi? Can we find sets of genes or gene clusters that enable such bacterial-fungal interactions?

We here describe and analyze the complete genomes of P. hospita DSM 17164T, P. caribensis DSM 13236T and P. terrae DSM 17804T. Furthermore, we calculated the average nucleotide identity (ANI) and tetranucleotide frequency (TETRA) values (Richter and Rossello-Mora, 2009), weighted against those of the genomes of P. terrae strains BS001, BS007, BS110 and BS437. The purpose was to examine whether the organisms are different or form a coherent ‘species cluster’.

Materials and Methods

Growth conditions and genomic DNA preparation

P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T were cultured aerobically overnight in Luria-Bertani medium at 28°C with shaking at 180 rpm. Genomic DNA was extracted using a modified protocol for the UltraClean Microbial DNA isolation kit (MOBio Laboratories Inc., Carlsbad, CA, USA). The modification consisted of adding glass beads to the cultures to spur mechanical cell lysis; otherwise, the manufacturer’s instructions were followed. The extracted DNA was purified with the Wizard DNA cleanup system (Promega, Madison, USA). DNA quality and quantity were then determined with a Nanodrop spectrometer (Thermo Scientific, Wilmington, USA) and by electrophoresis in 1% agarose.

Bacterial metabolism, ecological capacities

Metabolic tests using BIOLOG GN2 plates (Biolog Inc., Hayward, CA) were performed according to the manufacturer’s protocols (Nazir et al., 2012). Briefly, early exponential phase cultures were used as inocula for the test plates (150 µl per well). Each plate contained 96 microwells, each with one of 95 different carbon sources and tetrazolium violet as an indicator of metabolic activity. Plates were incubated at 28°C for 48 h to allow the development of a purple color, which was scored as positive for metabolic utilization.

Interaction assays with fungi were done according to Nazir et al. (2012). Briefly, single-strain migratory assays were done using petri dishes with three compartments (Greiner Bio-One, Frickenhausen, Germany), of which two were filled with pre-sterilized (autoclaved) test soil (at 60% of water holding capacity, bulk density of about 1.3 g/cm3 and 8 mm depth). The third compartment was filled with oat flake agar (OFA, 30 g/L oat flake, 15 g/L agar) (Warmink and Van Elsas, 2009) and served as a nutrient source for the fungus. Overnight bacterial cultures were washed and introduced evenly in one 3 mm-wide streak in the soil compartment directly adjacent to the front of the growing fungal hyphae, and in a system without fungi as a negative control. The systems were then incubated at 23°C for 12-14 days. Afterwards, 100 mg soil suspensions were plated onto R2A media, the plates were incubated, and the numbers of colony forming units (CFU) were determined. Paraburkholderia strains with detectable CFU were considered to be good migrators. In a second assay, Lyophyllum sp. strain Karsten was grown on propionate-containing minimal medium for two weeks at 28°C. The fungal mycelia were then harvested by centrifugation, and the supernatant was filtered (0.45 µm pore size) and used as a medium to monitor the growth of selected Paraburkholderia strains.

Genome sequencing and assembly

Complete genome sequences were determined using a combination of two genomic libraries, of which one was prepared for sequencing with the PacBio RSII (Pacific Biosciences, Menlo Park, CA, USA) platform. This SMRTbell™ template library was prepared and sequenced according to the instructions from Pacific Biosciences following the Procedure & Checklist “Greater than 10 kb Template Preparation and Sequencing”. Briefly, for preparation of 15 kb libraries, 5 µg genomic DNA was end-repaired and ligated overnight to hairpin adapters applying components from the DNA/Polymerase Binding Kit P6 from Pacific Biosciences, Menlo Park, CA, USA. Reactions were carried out according to the instructions of the manufacturer. BluePippin™ size selection to greater than 4 kb was performed according to the manufacturer’s instructions (Sage Science, Beverly, MA, USA). Conditions for annealing of sequencing primers and binding of polymerase to purified SMRTbell™ template were assessed with the Calculator in RS Remote (Pacific Biosciences, Menlo Park, CA, USA). SMRT sequencing was carried out on the PacBio RSII (Pacific Biosciences, Menlo Park, CA, USA), taking one 240-min movie for one SMRT cell using the P6 Chemistry. Totals of around 712, 722 and 689 million bases were produced for P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T, respectively. Paired-end short-read libraries for hybrid error correction were generated and sequenced on the Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) with 200 cycles, resulting in ~3.5 million paired-end reads per genome.

Long-read genome assemblies were generated using the “RS_HGAP_Assembly.3” protocol included in SMRTPortal version 2.3.0, applying default parameters with the exception of P. hospita DSM 17164T, for which the target genome size was set to 20 Mbp. For P. terrae DSM 17804T and P. caribensis DSM 13236T, four chromosomal contigs could be assembled, whereas the assembly of P. hospita led to five chromosomal contigs and the additional plasmid pEMT1. All assembled replicons were trimmed, circularized and adjusted to dnaA or their replication gene as the first gene. Total genome coverages of 52-61x were calculated within the long-read assembly process. Hybrid error correction was performed for each of the genomes by mapping the Illumina short-read data onto the draft circular genomes using BWA (Li and Durbin, 2009), followed by automated variant calling using VarScan 2 (Koboldt et al., 2009) and consensus calling using GATK (McKenna et al., 2009).

The genome sequences of P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T have been deposited at NCBI GenBank under accession numbers CP026111-CP026114, CP026105-CP026110 and CP026101-CP026104, respectively.

Phylogenetic analyses

Phylogenetic analyses were done for the 16S rRNA genes of the three type strains of Paraburkholderia (P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T). The 16S rRNA gene sequences were aligned using the SILVA 16S rRNA aligner (Pruesse et al., 2007) and MAFFT (multiple alignment program for amino acid or nucleotide sequences) version 7 (Katoh et al., 2002). A maximum-likelihood phylogenetic tree was built with FastTree, with bootstrap support from 1,000 replicates (Price et al., 2009). The second phylogenetic analysis was done using seven housekeeping genes (aroE, dnaE, groEL, gyrB, mutL, recA and rpoB). Each gene was aligned independently using MUSCLE (Edgar, 2004) and edited in accordance with Gblocks (Talavera et al., 2007). The alignments of all genes were then concatenated, aligned and edited as previously reported (Hug et al., 2016). The maximum-likelihood phylogenetic tree was built with FastTree using 1,000 bootstrap replicates (Price et al., 2009). Both phylogenetic trees were visualized using Interactive Tree of Life (iTOL) v3 (Letunic and Bork, 2016).
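The concatenation of the per-gene alignments can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the strain labels and the toy sequences are invented for illustration, and the sketch only assumes that each gene alignment is available as a mapping from strain name to aligned sequence.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments (strain -> aligned sequence) into one supermatrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    concatenated = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A strain missing from one gene alignment is padded with gaps.
            concatenated[taxon] += aln.get(taxon, gap * width)
    return concatenated


# Toy alignments standing in for two of the seven genes (sequences are invented).
recA = {"strain_A": "ATG-CA", "strain_B": "ATGGCA"}
rpoB = {"strain_A": "TTGA", "strain_B": "TT-A"}
supermatrix = concatenate_alignments([recA, rpoB])
```

Padding missing genes with gaps keeps all rows of the supermatrix at the same length, which tree-building programs generally require.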

The prophage (PP) phylogenetic tree was based on identified capsid genes. In this analysis, after manual checking of the identified PPs, P. hospita DSM 17164T was excluded, as no complete PPs were found. Further, the top three BLAST hits for the complete prophages of P. terrae DSM 17804T and P. caribensis DSM 13236T were included in the analysis, i.e. Stenotrophomonas phage S1 (YP_002321458.1), Pseudoalteromonas phage B8b (AII30180.1), Shewanella sp. phage 1/44 (YP_009103725.1), Salmonella phage FSL SP-016 (AGF88089.1), Phage BP-4795 (YP_001449293.1) and Stx2-converting phage Stx2a_WGPS8 (BAT32725.1). We also included known Burkholderia phages, i.e. Burkholderia cepacia phage Bcep22 (AY349011), B. cenocepacia phage BcepM (AY539836), B. cenocepacia phage BcepB1A (NC_005886), B. pseudomallei phage 1026b (AY453853), Burkholderia virus E125 (AF447491), Burkholderia phage BcepIL02 (FJ937737), Burkholderia phage 52237 (NC_007145), Burkholderia phage E202 (NC_009234), Burkholderia phage E255 (NC_009237), Burkholderia phage 644-2 (NC_009235), Burkholderia phage E12-2 (NC_009236), Burkholderia phage Bcep1 (NC_005263), Burkholderia phage Bcep43 (NC_005342), Burkholderia phage Bcep781 (NC_004333) and Burkholderia phage BcepNY4 (0096001), as well as other known phages, including Enterobacteria phage T4 (NC_00086), Enterobacteria phage Mu (NC_000929) and Enterobacteria phage lambda (NC_001416). The capsid gene sequences were aligned using MUSCLE (Edgar, 2004) and edited in accordance with Gblocks (Talavera et al., 2007). The maximum-likelihood phylogenetic tree was built with FastTree using 1,000 bootstrap replicates (Price et al., 2009) and visualized using Interactive Tree of Life (iTOL) v3 (Letunic and Bork, 2016).

Genome annotation analysis and comparative genome analysis

The MicroScope web platform hosted at Genoscope (MaGe) (Vallenet et al., 2013) was used for genomic comparisons; therefore, the locus tags referred to are based on MaGe. Annotations of all type strains are publicly available in MaGe (http://www.genoscope.cns.fr/agc/microscope/home/index.php). The gene sequence information of P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T was analyzed using TrEMBL and SwissProt alignments, the PubMed database, the InterPro database and SignalP. In addition, MicroScope identified the relevant RNA genes (rRNA and tRNA). MicroScope also allowed insight into metabolic profiles, e.g. using the microbial Pathway/Genome Databases (PGDBs). The metabolic profile comparison is based on the computation of a ‘pathway completion’ value, i.e. the ratio between the number of reactions for pathway X in a given organism and the total number of reactions of pathway X defined in the given database. Finally, the secondary metabolite detection program antiSMASH was used (Vallenet et al., 2013). A KEGG-based annotation of the three genomes was done using GhostKOALA (Kanehisa et al., 2016). The web server OrthoVenn (Wang et al., 2015) was used to compare the clusters of orthologous genes between the genomes of P. terrae DSM 17804T, P. hospita DSM 17164T, P. caribensis DSM 13236T and P. terrae BS007.
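The ‘pathway completion’ value described above is a simple ratio, which the following Python sketch spells out. It is an illustration only and not MicroScope code; the function name and the reaction identifiers are hypothetical.

```python
def pathway_completion(reactions_in_organism: set, reactions_in_pathway: set) -> float:
    """Fraction of the reactions defined for a pathway that are found in the organism."""
    if not reactions_in_pathway:
        raise ValueError("pathway definition contains no reactions")
    found = reactions_in_organism & reactions_in_pathway
    return len(found) / len(reactions_in_pathway)


# Hypothetical reaction identifiers: pathway X is defined by four reactions,
# three of which are annotated in the organism.
pathway_x = {"R1", "R2", "R3", "R4"}
organism = {"R1", "R3", "R4", "R9"}
completion = pathway_completion(organism, pathway_x)
```

A value of 1 would indicate that all reactions of the pathway, as defined in the database, are annotated in the genome.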

Average nucleotide identity (ANI) values and tetranucleotide frequency correlation coefficients (TETRA) were obtained using JSpeciesWS (Richter et al., 2015). The ANI calculation relied on two different algorithms: BLAST+ (ANIb) and MUMmer, i.e. maximal unique matches (ANIm). Additionally, Tetra Correlation Search (TCS) analyses (shown as a Z-value) were done to provide a hit list for fast insights into the relationships of the organism of interest against an entire genome reference database (Richter et al., 2015). Carbohydrate-active enzymes (CAZymes) potentially involved in the synthesis, degradation and modification of carbohydrates were analyzed using the platform dbCAN (Yin et al., 2012).
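As a rough illustration of the idea behind TETRA, the following Python sketch counts tetranucleotides on both strands of two sequences and correlates the resulting frequency vectors. It is a deliberate simplification and not the JSpeciesWS implementation: TETRA correlates z-scores of observed versus expected tetranucleotide frequencies, a normalization step that is omitted here, so the numbers produced are not comparable to the values reported in this chapter. All function names and the toy sequences are assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tetra_frequencies(seq: str) -> list:
    """Relative tetranucleotide frequencies counted on both strands."""
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq.upper(), reverse_complement(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in counts:  # windows containing N or other codes are skipped
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented toy sequences; real analyses use whole genomes.
seq_1 = "ATGCGTACGTTAGCCGATAGGCTAACGT" * 5
seq_2 = "ATGCGTACGATAGCCGTTAGGCTTACGA" * 5
correlation = pearson(tetra_frequencies(seq_1), tetra_frequencies(seq_2))
```

Counting on both strands makes the result independent of which strand of a replicon happens to be deposited.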

Identification of regions of genomic plasticity (RGPs), prophages and CRISPR spacers

The presence of RGPs was predicted through MicroScope (Vallenet et al., 2013). The platform employs the RGP finder pipeline together with other well-known genomic island (GI) identifier programs, i.e. a GI identifier based on Hidden Markov Models (Waack et al., 2006) and AlienHunter-IVOM (Vernikos and Parkhill, 2006). The pipeline identified RGPs based on the following criteria: (i) RGPs have a minimal length of 5 kb, (ii) their CDSs do not belong to conserved synteny groups between the compared organisms and (iii) regions with less than 50% gene similarity with the reference organism were removed. RGPs in the three type strains were identified in comparison with Paraburkholderia terrae strains BS001, BS007, BS110 and BS437. Moreover, bacteriophage sequences and CRISPR spacers were identified with PHAST (Zhou et al., 2011) and CRISPRFinder (Grissa et al., 2008), respectively. The criteria used to determine complete prophages are described in Pratama et al. (2018).

Results

Summary of phenotypic traits

Microscopic studies of P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T confirmed that cells of all three type strains were Gram-negative, rod-shaped and motile, as described (Achouak et al., 1999; Goris et al., 2002; Yang et al., 2006). On R2A agar plates, all three strains grew between 15 and 37°C and optimally at 28°C. Based on BIOLOG GN2 assays, all strains used three of the 95 carbon sources tested in common, i.e. D-trehalose, α-ketovaleric acid and D,L-carnitine, for growth. Eight additional compounds (L-fucose, lactulose, xylitol, D-alanine, L-ornithine, urocanic acid, D,L-α-glycerol phosphate and D-glucose-6-phosphate) were utilized by P. terrae DSM 17804T as well as P. hospita DSM 17164T, but not by P. caribensis DSM 13236T.

P. terrae DSM 17804T and P. hospita DSM 17164T revealed migratory capabilities along Lyophyllum sp. strain Karsten hyphae (Table 3.1) akin to those previously observed for P. terrae BS001, BS007 and BS110 (Nazir et al., 2012). In contrast, P. caribensis DSM 13236T was a poor migrator, yet its numbers in soil increased in the presence of the fungus. Furthermore, P. terrae DSM 17804T and P. hospita DSM [...] propionate, whereas P. caribensis DSM 13236T did not show such responses (Table 3.1).

Thus, the three strains studied here (i) interact with fungi, and (ii) utilize carbon sources to different degrees; clearly, P. terrae DSM 17804T and P. hospita DSM 17164T were very similar to each other, and diverged from P. caribensis DSM 13236T.

Table 3.1. Fungal-interactive traits (partly modified from Nazir et al., 2012)

Strain                      Inoculation site (a)   Migration site (a)   Fungal exudate (propionate) (b)
P. terrae DSM 17804T        +                      ++                   ++
P. hospita DSM 17164T       +                      +++                  ++
P. caribensis DSM 13236T    +                      -                    -
P. terrae BS001             +                      +++                  +++
P. terrae BS007             +                      ++                   ++
P. terrae BS110             +                      +++                  +++
P. terrae BS437             +                      -                    ++

(a) +: log CFU/g 6.0-6.5; ++: log CFU/g 6.5-7.5; +++: log CFU/g 7.5-8.5.
(b) +++: excellent response; ++: moderate response; -: no response.

Overall analysis and general properties of the genomes

Final assembly of the three genomes revealed large genome sizes of about 10 Mb, suggesting a multi-faceted “generalistic” lifestyle in soil. Specifically, the genome size of P. terrae DSM 17804T was 10,062,489 bp (G+C content of 61.79%), that of P. hospita DSM 17164T 11,527,706 bp (G+C content of 61.79%) and that of P. caribensis DSM 13236T 9,032,490 bp (G+C content of 62.58%). P. terrae DSM 17804T was assembled into four, P. hospita DSM 17164T into six and P. caribensis DSM 13236T into four contigs (Table 3.2). All assembled contigs were deposited as circular bacterial chromosomes, with the exception of a ~100-kb contig in P. hospita strain DSM 17164T, which represents the full plasmid pEMT1 that had originally been introduced by Goris et al. (2002). This circular 61.3% G+C plasmid (GenBank accession no. CP026110) encodes the degradation of 2,4-dichlorophenoxyacetic acid (2,4-D) (Goris et al., 2002). Totals of 8,752, 10,009 and 7,761 coding sequences (CDS) were predicted for P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T, respectively. Moreover, the genome of P. terrae DSM 17804T was found to contain 18 ribosomal RNA (rRNA) and 60 tRNA encoding genes, that of P. hospita DSM 17164T 21 and 67 and that of P. caribensis DSM 13236T 18 and 61, respectively. A summary of these features is shown in Supplementary Table 3.1A, and project information in Supplementary Table 3.1B. The numbers of genes associated with COG functional categories are given [...]

Table 3.2. Genome statistics of P. terrae strain DSM 17804T, P. hospita strain DSM 17164T and P. caribensis strain DSM 13236T (NCBI genome statistics).

Attribute              P. terrae DSM 17804T          P. hospita DSM 17164T         P. caribensis DSM 13236T
                       Value        % of total (a)   Value        % of total (a)   Value        % of total (a)
Genome size (bp)       10,062,489   100              11,527,706   100              9,032,490    100
DNA coding (bp)        8,781,534    87.27            9,874,633    85.88            7,889,880    87.35
DNA G+C content        7,122,969    61.79            7,122,969    61.79            5,652,532    62.58
DNA scaffolds          4            -                6            -                4            -
Total genes            9,068        100              10,603       100              8,101        100
Protein coding genes   8,752        96.51            10,009       94.40            7,761        95.80
Pseudogenes            233          2.57             501          4.73             256          3.16
rRNA                   18                            21                            18
tRNA                   61                            68                            62
CRISPR spacers         13                            14                            9

(a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
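The “% of total” columns of Table 3.2 are plain ratios, and a few of them can be recomputed from the listed values as a consistency check. The helper below is a hypothetical illustration, not part of the NCBI statistics pipeline; the input numbers are taken from the table.

```python
def percent_of_total(value: int, total: int) -> float:
    """Express value as a percentage of total, rounded to two decimals."""
    return round(100 * value / total, 2)


# Values as listed in Table 3.2.
gc_caribensis = percent_of_total(5_652_532, 9_032_490)   # G+C bases / genome size
coding_terrae = percent_of_total(8_781_534, 10_062_489)  # coding bases / genome size
pseudo_terrae = percent_of_total(233, 9_068)             # pseudogenes / total genes
```

Recomputing entries in this way is a quick means of spotting transcription errors in such tables.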

Phylogenetic analyses

Phylogenetic analyses based on alignment of the 16S rRNA genes showed that P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T (Figure 3.1A) indeed clustered as a tight group within the genus Paraburkholderia. The group (<1% divergence at the 16S rRNA sequence level), denoted as “species cluster”, had as its closest relatives P. terrae strains BS001 (NZ_AKAU00000000), BS007 (NFVE00000000), BS110 (NFVD00000000), BS437 (NFVC00000000), NBRC100964 (AB201285) and P. caribensis MWAP64 (Y17009) (Figure 3.1A). Interestingly, P. terrae BS007 was closest to P. hospita DSM 17164T (Figure 3.1A). The concatenated-gene tree (Figure 3.1B) was largely in line with the above tree, in that it confirmed that all three type strains are indeed closely related to each other. The tree also indicated a tight relatedness to the selected P. terrae strains BS007, BS110, BS437 and BS001.

Average nucleotide identity (ANI) and tetranucleotide frequency (TETRA) analyses

To explore the findings of similarity from the trees built on the basis of the 16S rRNA and seven-gene-concatenate sequences, we compared the evolutionary distances between the three genomes under study, next to nine other ones, together belonging to seven Paraburkholderia species. As an outgroup, we used the Burkholderia cenocepacia J2315 genome (see Figure 3.2 and Supplementary Table 3.2).


Figure 3.1. Phylogenetic trees of Paraburkholderia terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T based on (A) 16S rRNA genes and (B) seven concatenated core genes (aroE, dnaE, groEL, gyrB, mutL, recA and rpoB). The 16S rRNA genes were aligned with the SILVA 16S rRNA aligner and MAFFT (multiple alignment program for amino acid or nucleotide sequences) version 7. The maximum-likelihood phylogenetic trees were built with FastTree. Bootstrap values (from 1,000 replicates) are indicated at the nodes, and the clade labeled “P. hospita species cluster” is marked in the trees. P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T belong to clade II (pink box). Clade I (blue box) mainly consists of pathogenic Burkholderia, while clade II mainly consists of environmental strains assigned to the new genus Paraburkholderia (Sawana et al., 2014).


Figure 3.2. Heat maps (panels A-C) of average nucleotide identity (ANIb, color scale 75-100%; ANIm, color scale 84-100%) and tetranucleotide frequency (TETRA, color scale 0.8-1) analyses of Paraburkholderia terrae DSM 17804T, P. hospita DSM 17164T, P. caribensis DSM 13236T, P. terrae BS001, P. terrae BS007, P. terrae BS110, P. terrae BS437, P. glathei DSM 50014, P. phytofirmans PsJN, P. xenovorans LB400, P. fungorum ATCC BAA-463 and Burkholderia cenocepacia J2315. The ANI (threshold 95-96%) and TETRA (>0.99) values were used for species circumscription (Richter and Rossello-Mora, 2009).

The data indicated that the aforementioned three genomes were highly similar, with ANIb and ANIm values between P. hospita DSM 17164T and P. terrae DSM 17804T of 93.8% and 95.4%, respectively. In general, the P. hospita DSM 17164T genome showed very low genomic divergence from the comparator genomes of P. terrae BS001, BS007, BS110 and BS437 (ANIb > 97% and ANIm > 99%). Using TCS analysis, the genome of P. terrae BS001 showed Z-score values of 0.99974, 0.99789 and 0.99613 against those of P. hospita DSM 17164T, P. terrae DSM 17804T and P. caribensis DSM 13236T, respectively (Supplementary Table 3.3). These data indicate that P. hospita DSM 17164T and P. terrae DSM 17804T, next to the comparator P. terrae strains, are indeed

(and - by inference - ecologically) more distant from the aforementioned organisms than these are among themselves.
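To make the tetranucleotide comparison more concrete, the following is a minimal sketch and not the published TETRA procedure. It counts overlapping tetranucleotides in two sequences and reports the Pearson correlation of the raw relative frequencies. The TETRA statistic referred to above correlates z-scores of observed versus expected tetranucleotide counts; that normalization step is deliberately omitted here. The function names and the toy sequence are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides over the DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_frequencies(seq):
    """Relative frequency of each overlapping tetranucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined (division by zero) for constant vectors

def tetra_correlation(seq_a, seq_b):
    """Simplified TETRA-like score: correlation of raw 4-mer frequencies."""
    return pearson(tetra_frequencies(seq_a), tetra_frequencies(seq_b))

# Hypothetical toy sequence; real input would be whole-genome sequences.
toy = "ACGTTGCAAGGCTTAACCGGATATCGCGTA" * 5
self_score = tetra_correlation(toy, toy)
```

A genome compared with itself gives a correlation of 1; real genome pairs would be read from FASTA files and compared against the >0.99 cutoff mentioned in the caption of Figure 3.2.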

Clustering of orthologous proteins

Based on OrthoVenn analysis, 6,076 clusters of predicted proteins were shared across the genomes of the three type strains plus that of the close relative P. terrae BS007; this set was regarded as the “shared set of predicted orthologous proteins” across these genomes. The next-largest set, 1,699 clusters, was shared between P. hospita DSM 17164T and P. terrae BS007. Remarkably, another 1,236 clusters were shared between P. hospita DSM 17164T, P. terrae DSM 17804T and P. terrae BS007, whereas far fewer clusters were shared with P. caribensis DSM 13236T (Figure 3.3).

Figure 3.3. Venn diagram of the orthologous protein clusters of Paraburkholderia terrae DSM 17804T, P. hospita DSM 17164T, P. caribensis DSM 13236T and P. terrae BS007. Black numbers represent the number of orthologous protein clusters shared between the genomes; blue numbers represent the number of proteins in the genome-unique clusters; red numbers represent the number of unique clusters with a functional annotation in the Swiss-Prot database.
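The region counts of such a Venn diagram can be derived from cluster membership alone. The sketch below is illustrative only: it does not use OrthoVenn itself, and the cluster identifiers and genome labels are hypothetical toy data. Each cluster is mapped to the set of genomes that contribute at least one protein to it, and clusters are then tallied per exact genome combination.

```python
from collections import Counter

def venn_regions(cluster_members):
    """Count clusters per exact combination of genomes.

    cluster_members maps a cluster id to the set of genomes that
    contribute at least one protein to that cluster.
    """
    return Counter(frozenset(genomes) for genomes in cluster_members.values())

# Hypothetical toy input; real input would come from the clustering output.
toy_clusters = {
    "c1": {"hospita", "terrae", "caribensis", "BS007"},
    "c2": {"hospita", "terrae", "caribensis", "BS007"},
    "c3": {"hospita", "BS007"},
    "c4": {"hospita", "terrae", "BS007"},
    "c5": {"caribensis"},
}

regions = venn_regions(toy_clusters)
all_four = frozenset({"hospita", "terrae", "caribensis", "BS007"})
core_count = regions[all_four]                      # shared by all four genomes
unique_caribensis = regions[frozenset({"caribensis"})]  # genome-unique clusters
```

Using `frozenset` keys means a cluster is counted in exactly one region, which matches how the numbers in a Venn diagram partition the full set of clusters.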

We identified 93 (238 unique proteins), 99 (281), 57 (130) and 52 (117) genome-unique COG clusters in P. hospita DSM 17164T, P. caribensis DSM 13236T, P. terrae DSM 17804T and P. terrae BS007, respectively. Of these, 31, 27, 29 and 13 COG clusters were functionally annotated using the Swiss-Prot database. The analysis also showed that the unique clusters predicted to be involved in nutrient uptake and membrane transport made up 10.5% (n=6), 6.45% (n=6) and 9% (n=9) of the unique clusters of P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T, respectively. In contrast, zero, 11.8% (n=11) and 1% (n=1) of the unique clusters were assigned to genetic plasticity (i.e. transposases, phages and insertion elements) in P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T, respectively. However, the analysis did not reveal any unique cluster that supports the divergence between P. terrae DSM 17804T / P. hospita DSM 17164T and P. caribensis DSM 13236T.

Finally, based on the unique COG clusters, the production of particular enzymes, i.e. carboxyl esterases, dioxygenases, oxidoreductases and oxidases, and the uptake of 4-hydroxybenzoate and glycerol were found to be dominant in P. terrae DSM 17804T. Remarkably, in P. terrae BS007 some unique COG clusters were related to the transport of benzoate and sulfate, transposases, dehydrogenases and regulator proteins. The clustering of the unique genes is shown in Supplementary Table 3.4.

Table 3.3. Number of genes associated with general COG functional categories of P. terrae strain DSM 17804T, P. hospita strain DSM 17164T and P. caribensis strain DSM 13236T (as reported by the MicroScope platform).

| Code | P. terrae DSM 17804T: value | % of total (a) | P. hospita DSM 17164T: value | % of total (a) | P. caribensis DSM 13236T: value | % of total (a) | Description |
|---|---|---|---|---|---|---|---|
| J | 242 | 2.35% | 250 | 2.07% | 235 | 2.55% | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.01% | 1 | 0.01% | 1 | 0.01% | RNA processing and modification |
| K | 926 | 8.98% | 1023 | 8.48% | 788 | 8.54% | Transcription |
| L | 260 | 2.52% | 520 | 4.31% | 271 | 2.94% | Replication, recombination and repair |
| B | 4 | 0.03% | 4 | 0.03% | 4 | 0.04% | Chromatin structure and dynamics |
| D | 59 | 0.57% | 75 | 0.62% | 53 | 0.57% | Cell cycle control, cell division, chromosome partitioning |
| V | 79 | 0.77% | 95 | 0.79% | 75 | 0.81% | Defense mechanisms |
| T | 505 | 4.90% | 585 | 4.85% | 466 | 5.05% | Signal transduction mechanisms |
| M | 503 | 4.88% | 537 | 4.45% | 469 | 5.08% | Cell wall/membrane biogenesis |
| N | 166 | 1.61% | 178 | 1.48% | 151 | 1.64% | Cell motility |
| U | 184 | 1.79% | 214 | 1.77% | 181 | 1.96% | Intracellular trafficking and secretion |
| O | 236 | 2.29% | 274 | 2.27% | 236 | 2.56% | Posttranslational modification, protein turnover, chaperones |
| C | 632 | 6.13% | 722 | 5.99% | 555 | 6.02% | Energy production and conversion |
| G | 735 | 7.13% | 733 | 6.08% | 638 | 6.92% | Carbohydrate transport and metabolism |
| E | 1044 | 10.13% | 1129 | 9.36% | 948 | 10.28% | Amino acid transport and metabolism |
| F | 106 | 1.03% | 109 | 0.90% | 102 | 1.11% | Nucleotide transport and metabolism |
| H | 241 | 2.34% | 261 | 2.16% | 219 | 2.37% | Coenzyme transport and metabolism |
| I | 442 | 4.29% | 449 | 3.72% | 359 | 3.89% | Lipid transport and metabolism |
| P | 611 | 5.93% | 658 | 5.46% | 531 | 5.76% | Inorganic ion transport and metabolism |
| Q | 344 | 3.34% | 376 | 3.12% | 260 | 2.82% | Secondary metabolites biosynthesis, transport and catabolism |
| R | 1354 | 13.14% | 1428 | 11.84% | 1166 | 12.64% | General function prediction only |
| S | 602 | 5.84% | 697 | 5.78% | 553 | 5.99% | Function unknown |
| W | 12 | 0.12% | 11 | 0.09% | 11 | 0.12% | Extracellular structure |
| Z | 1 | 0.01% | 1 | 0.01% | - | - | Cytoskeleton |

Genomic insights into the three type strains

Metabolic and ecological competence traits

The genome analyses showed that the genomes of the three type strains contain genes or operons for diverse metabolic capacities (Supplementary Table 3.5 and Supplementary Table 3.6), without a clear division in metabolic range. Sets of various carbohydrate metabolism genes were found, indicating the capacity to degrade both simple sugars (e.g. glucose, fructose) and complex carbohydrates (e.g. cellulose and hemicellulose). Metabolic profile analyses showed that all three type strains have relatively similar numbers of metabolic pathways, including the TCA cycle, glycolysis, the Entner-Doudoroff pathway and gluconeogenesis. Moreover, some distinctive metabolic pathways were found only in P. caribensis DSM 13236T: putrescine biosynthesis III, glutamate biosynthesis IV and V, aromatic compound o-diquinones biosynthesis, methanol oxidation to formaldehyde II, fructose degradation and nitrate reduction IV (dissimilatory). Conversely, the following were found only in P. hospita DSM 17164T and P. terrae DSM 17804T: fatty acid biosynthesis (yeast), trehalose biosynthesis IV, VI and VII, penicillin K biosynthesis, alanine degradation IV, D-serine degradation, methionine degradation II, anthranilate degradation I (aerobic) and phenol degradation I (aerobic) (for more details see Supplementary Table 3.5). Further, all three type strains carry biosynthesis pathways for all essential amino acids (i.e. histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, arginine, cysteine, glutamine, glycine, proline and tyrosine), next to those for the non-essential amino acids (i.e. alanine, asparagine, glutamic acid, serine and selenocysteine). Furthermore, evidence for a suite of fermentation pathways was found in the three type strain genomes (Supplementary Table 3.5).

Figure 3.4. Heat maps of genes for ecological traits in the three type strains (P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T): biofilm formation systems (pgaABCD and alginate), chemotaxis system, type-4 pili (T4P), flagella and secretion systems (T1SS, T2SS, T3SS, T4SS and T6SS). The blue-red bar represents the gene copy number (scale 0 to 14) in each type strain.

Interestingly, all three genomes also contain lactose transformation genes (involved in the 3-ketolactose hydrolase and β-galactosidase reactions).

Remarkably, testosterone and androsterone degradation, arsenate detoxification, phenylmercury acetate degradation and oxidized GTP and dGTP detoxification pathways were present only in the genome of P. caribensis DSM 13236T and absent from those of P. terrae DSM 17804T and P. hospita DSM 17164T. Finally, alginate and psl biofilm formation genes, as well as siderophore biosynthesis systems, were found consistently across the three type strains (see Figure 3.4 and Supplementary Table 3.5).

Carbohydrate-active enzymes (CAZymes)

P. terrae DSM 17804T and P. hospita DSM 17164T contained 303 and 308 genes encoding CAZymes, respectively. In contrast, P. caribensis DSM 13236T had only 253 genes for carbohydrate-active enzymes. P. terrae strains BS001, BS007, BS110 and BS437 yielded 301, 310, 299 and 301 hits with the CAZy database, respectively.

In general, the CAZyme profiles did not differ greatly among the type strains. All genomes revealed the presence of 31-39 genes for AA (auxiliary activities) family proteins, 14-26 for CBM (carbohydrate-binding modules), 45-59 for CE (carbohydrate esterases), 70-89 for GH (glycoside hydrolases), 92-109 for GT (glycosyltransferases) and 1-2 for PL (polysaccharide lyases). Overall, the most abundant genes for CAZy family proteins were AA3 (oxidoreductases), CE1 and CE10 (involved in hydrolysis of carbohydrate esters), GH109 (hydrolysis of glycoproteins: α-N-acetylgalactosaminidase, EC 3.2.1.49) and GH23 (lytic transglycosylases) (Supplementary Table 3.7).
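Per-class totals such as those above can be obtained by grouping family labels (e.g. GH3, CBM32) by their class prefix. The following is a minimal sketch under the assumption that each hit is available as a plain family label; the input list is a hypothetical toy example, not the annotation output used in this study.

```python
import re
from collections import Counter

# The six CAZy classes named in the text.
CAZY_CLASSES = ("AA", "CBM", "CE", "GH", "GT", "PL")
_FAMILY = re.compile(r"^(AA|CBM|CE|GH|GT|PL)(\d+)$")

def count_by_class(families):
    """Tally family labels per CAZy class; unparseable labels are ignored."""
    counts = Counter({cls: 0 for cls in CAZY_CLASSES})
    for label in families:
        match = _FAMILY.match(label.strip().upper())
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical toy list of family labels, one per annotated gene.
toy_hits = ["GH3", "GH23", "GH109", "CBM32", "CE1", "CE10", "AA3", "GT2", "unknown"]
class_counts = count_by_class(toy_hits)
```

Pre-filling the counter with zeros keeps classes without hits visible in the result, which is convenient when profiles from several genomes are compared side by side.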

Remarkably, high numbers (25-26) of genes for CBM family proteins were observed in P. terrae DSM 17804T and P. hospita DSM 17164T, especially those for CBM32 proteins (about 3 and 8, respectively), whereas CBM32 is absent from P. caribensis DSM 13236T (Supplementary Table 3.7). One example of a key enzyme (GH3 family) is the glycoside hydrolase that hydrolyzes glycosidic bonds in complex molecules such as cellulose, hemicellulose and starch (amylase) (Kusaoke et al., 2017). Genes for GH3 family proteins were found across all type strains and Paraburkholderia strains BS001, BS007, BS110 and BS437, and were hence analyzed. These were tightly knit and related to those of a suite of related Paraburkholderia strains. However, based on the GH3 tree (Figure 3.5), it is clear that the GH3 protein of P. caribensis DSM 13236T diverges from those of P. terrae DSM 17804T and P. hospita DSM 17164T.

Membrane transporters

The genomes of the three type strains contained a plethora of different membrane transporters. In particular, energy-dependent membrane transporters, i.e. ABC transporters and phosphotransferase systems (PTS), and secondary membrane transporters, i.e. major facilitator superfamily (MFS) and solute carrier family (SLC) transporters, were found (Supplementary Table 3.8). We also found genes for aquaporins and small neutral solute transporters (Supplementary Table 3.8). Similar glycerol transporters and oxalate:formate antiporter (OFA) family transporters were found across all three type strains. In contrast, a particular suite of genes for membrane transporters was found only in P. terrae DSM 17804T and P. hospita DSM 17164T, but not in P. caribensis DSM 13236T. These were the iron(III) transporter genes afuA/fbpA, afuB/fbpB and afuC/fbpC and the transmembrane electron carrier gene torZ (trimethylamine-N-oxide reductase, cytochrome c) in P. terrae DSM 17804T, and torZ in P. hospita DSM 17164T. Moreover, a few specific membrane transporters were unique per strain, i.e. an erythritol transporter in P. terrae DSM 17804T, arginine/ornithine and heme transporters in P. hospita DSM 17164T and an organophosphate:Pi antiporter (OPA) family transporter in P. caribensis DSM 13236T (Supplementary Table 3.8).

Motility complexes

As expected, rather similar flagellar, type-4 pili (T4P) and chemotaxis complexes were found in the genomes of the three type strains (Figure 3.4). The flagellum biosynthesis cluster stretches over 45.3 kb, with high similarity (>90%) among the three type strains and P. terrae BS001. Moreover, a set of chemotaxis genes, i.e. cheA, cheW, cheD, cheR, cheB, cheBR, cheY, cheZ and cheV, was found across these organisms. Genes for the T4P assembly proteins, i.e. pilABCDQW, were also found, in a syntenous cluster, across the three genomes and that of BS001. In addition, genes for a methyl-accepting chemotaxis protein (mcp) and an aerotaxis receptor (aer) were found across all strains. A gene for a serine sensor receptor, methyl-accepting chemotaxis protein I (tsr), was found only in P. hospita DSM 17164T (see Supplementary Table 3.9).

Traits predicted to confer associative behavior with soil fungi

All three type strains were found to contain the secretion systems screened for, i.e. type-1, type-2, type-3, type-4 and type-6 secretion systems (T1SS, T2SS, T3SS, T4SS and T6SS) (Figures 3.4 and 3.6), with the exception of the T4SS, which was absent from the P. caribensis DSM 13236T genome. Specifically, a complete T1SS was found in P. terrae DSM 17804T, i.e. OMP: tolC, MFP: raxA and ABC: raxB, cvaB. In contrast, only the ABC protein and the hemolysin D protein of the T1SS were found in P. hospita DSM 17164T and P. caribensis DSM 13236T (Figures 3.4 and 3.6). With respect to the T2SS, all three type strains contained nine gsp genes, i.e. gspD, gspE, gspF, gspG, gspH, gspJ, gspK, gspL and gspN. This T2SS also contained genes for tight adherence (Tad) export apparatuses, i.e. cpaA/tadV, cpaB/rcpC, cpaC/rcpA, cpaE/tadZ, cpaF/tadA, tadB and tadC (Figure 3.4). With respect to the T3SS, clusters containing 19 genes were identified in P. terrae DSM and P. caribensis DSM 13236T. Ten of the 19 genes were highly conserved and syntenous (sctC, sctD, sctJ, sctL, sctN, sctQ, sctR, sctT, sctU and sctV), at ~17-60% similarity (Figure 3.6).

Also, copies of the T6SS (consisting of the core components of the imp/Vas secretion system) were found in all three strains (Figure 3.6). With respect to the imp system, one T6SS cluster was found in P. terrae DSM 17804T, three in P. hospita DSM 17164T and two in P. caribensis DSM 13236T (see Figure 3.6 and Supplementary Table 3.9). The high synteny found among the T6SS of P. terrae DSM 17804T, cluster 3 of P. hospita DSM 17164T and cluster 1 of the T6SS of P. terrae BS001 indicated that these clusters are related. Cluster 1 of P. hospita DSM 17164T was highly syntenous with that of P. caribensis DSM 13236T and with T6SS cluster 3 of P. terrae BS001. Also, high synteny was observed between cluster 2 of P. hospita DSM 17164T and cluster 2 of P. terrae BS001 (Figure 3.6). Interestingly, a T4SS complex consisting of VirD4 and VirB1 to VirB11 was found only in P. terrae DSM 17804T and P. hospita DSM 17164T.

Glycerol and oxalate metabolism and five-gene cluster

Genome analysis of the three type strains did not find the glycerol uptake gene GUP that was previously discovered in P. terrae BS001 (Haq et al., 2014). However, other glycerol transporter genes (glpV, glpP, glpO and glpS) and putative sn-glycerol 3-phosphate transporter genes (ugpB, ugpA, ugpE and ugpC) were found in all type strains (Supplementary Table 3.10).

We found sets of genes potentially involved in oxalate and formate oxidation in all type strains (Supplementary Table 3.10). The high similarities, for example of the oxalyl-CoA decarboxylase of P. terrae BS001 with those in P. terrae DSM 17804T (99%) and P. hospita DSM 17164T (100%), suggested that these strains may respond similarly to soil fungi; relatedness to P. caribensis DSM 13236T was lower (98%).

The five-gene cluster hypothesized by Haq et al. (2017) to generate energy from small carbonaceous molecules released by soil fungi (e.g. oxalate) was identified as a prime responder to fungal presence. Here, we report the finding of a highly similar (77-100%) and syntenous gene cluster in P. terrae DSM 17804T, P. hospita DSM 17164T and also P. caribensis DSM 13236T (see Figure 3.7 and Supplementary Table 3.9). Moreover, the other P. terrae strains examined, i.e. BS001, BS007, BS110 and BS437, also contained the cluster (Figure 3.7A). Remarkably, in the five-gene cluster in the P. caribensis DSM 13236T genome, we could not find the putative

Figure 3.5. Phylogenetic analysis of the GH3 CAZyme proteins of the type strains and Paraburkholderia strains BS001, BS007, BS110 and BS437. The proteins from the top PSI-BLASTP hits were aligned using MUSCLE (Edgar, 2004) and edited manually using Gblocks (Talavera et al., 2007). The alignments were used to build a maximum-likelihood tree with FastTree, with bootstrap support from 1,000 replicates (Price et al., 2009), and the tree was visualized using Interactive Tree of Life (iTOL) v3 (Letunic and Bork, 2016). Red dots correspond to type strains and white dots to other Paraburkholderia strains (BS001, BS007, BS110 and BS437). The pink box represents Paraburkholderia species; the purple box is the outgroup.

Presence of regions of genomic plasticity (RGPs), prophage-related sequences and CRISPR-Cas arrays

The genomes of the three type strains all contained a range of RGPs. Specifically, P. terrae DSM 17804T and P. hospita DSM 17164T had higher numbers (97 and 99, respectively), whereas P. caribensis DSM 13236T had 76 (Supplementary Table 3.11). The total sizes of these RGPs were 3,009,744 bp (29.91% of the genome), 4,401,854 bp (38.18%) and 2,133,117 bp (23.62%), respectively.
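The percentages above are the total RGP length divided by the genome length. As the genome lengths are not restated in this passage, the short sketch below inverts that relation to recover the genome sizes implied by the reported figures. The derived sizes are approximations limited by the rounding of the percentages, not values reported in the text, and the variable and function names are hypothetical.

```python
# Total RGP length (bp) and RGP share of the genome (%), as quoted in the text.
rgp_totals = {
    "P. terrae DSM 17804T": (3_009_744, 29.91),
    "P. hospita DSM 17164T": (4_401_854, 38.18),
    "P. caribensis DSM 13236T": (2_133_117, 23.62),
}

def implied_genome_size(rgp_bp, percent_of_genome):
    """Genome length implied by percent = 100 * rgp_bp / genome_bp."""
    return rgp_bp / (percent_of_genome / 100.0)

implied_sizes = {
    strain: implied_genome_size(bp, pct) for strain, (bp, pct) in rgp_totals.items()
}

for strain, size in implied_sizes.items():
    print(f"{strain}: ~{size / 1e6:.2f} Mb")
```

The implied sizes come out at roughly 10.1, 11.5 and 9.0 Mb, so the larger RGP fraction in P. hospita DSM 17164T coincides with the largest implied genome.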

Figure 3.6. Synteny comparison of the secretion systems (T2SS, T3SS, T4SS and T6SS) of all type strains and P. terrae BS001. Comparison percentages were generated using BLAST+ 2.4.0 (tBLASTx with cutoff value 10⁻³) and map comparison figures were created with the Easyfig program (Sullivan et al., 2011).

Figure 3.7. (A) Unrooted concatenated tree of the five-gene cluster. The proteins were aligned using MUSCLE (Edgar, 2004) and edited in accordance with Gblocks (Talavera et al., 2007). The alignments were used to build a maximum-likelihood tree with FastTree, with bootstrap support from 1,000 replicates (Price et al., 2009), and the tree was visualized using Interactive Tree of Life (iTOL) v3 (Letunic and Bork, 2016). (B) Synteny comparison of the five-gene cluster among strains (P. terrae BS001, BS007, BS110 and BS437, P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T); the genes are labeled alkylhydroperoxidase AhpD family core domain-containing protein, cupin, LysR, putative nucleoside-diphosphate sugar epimerase and conserved exported protein of unknown function. Comparison percentages (scale 77-100%) were generated using BLAST+ 2.4.0 (tBLASTx with cutoff value 10⁻³) and map comparison figures were created with the Easyfig program (Sullivan et al., 2011). (C) Phylogenetic analysis of phage capsid genes. The proteins were aligned using MUSCLE (Edgar, 2004) and edited manually in accordance with Gblocks (Talavera et al., 2007). The alignments were used to build a maximum-likelihood tree with FastTree, with bootstrap support from 1,000 replicates (Price et al., 2009), and the tree was visualized using Interactive Tree of Life (iTOL) v3 (Letunic and Bork, 2016). The accession numbers of the genomes used can be found in Materials and Methods.

Briefly, the biggest RGP in P. terrae DSM 17804T was RGP72 (283,846 bp; 308 CDS), in P. hospita DSM 17164T RGP98 (1,365,074 bp; 1,480 CDS) and in P. caribensis DSM 13236T RGP.. (267,272 bp; 302 CDS). An RGP (RGP95) previously recognized as an integrated plasmid in P. terrae BS001 (Haq et al., 2014) was found in P. hospita DSM 17164T (RGP99: 520,079 bp), as well as in P. terrae DSM 17804T (RGP72), but not in P. caribensis DSM 13236T. Furthermore, the RGPs of P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T contained transposase genes (42, 158 and 36, respectively) and integrase genes (11, 44 and 8, respectively) (Supplementary Table 3.11).

The analysis of prophage (PP) regions in all type strains using PHAST (Zhou et al., 2011) identified one predicted PP of 25 kb in P. terrae DSM 17804T, two predicted PPs with a total size of 40.7 kb in P. hospita DSM 17164T and two predicted PPs with a total size of 89.9 kb in P. caribensis DSM 13236T (Supplementary Table 3.12). However, manual checking of all identified PPs showed that both regions in P. hospita DSM 17164T mostly consist of mobile genetic elements, hypothetical proteins and integrases, with no phage structural genes (e.g. capsid, tail, terminase). Therefore, these regions were not considered “true” PP regions. Furthermore, only one region was found to be a complete PP in P. terrae DSM 17804T (ϕPt17804) and two in P. caribensis DSM 13236T (ϕPcari1DS and ϕPcari2DS). None of these phage sequences resembled sequences in any of the other strains. The unrooted phylogenetic tree revealed that the P. caribensis DSM 13236T prophages are only distantly related to that of P. terrae DSM 17804T (Figure 3.7C). We also analyzed the three genomes for the presence of CRISPR-Cas spacer sequences. Using the (web-based) CRISPR-Finder program, we found CRISPR spacer sequences in all three strains: 13, 14 and 9 in P. terrae DSM 17804T, P. hospita DSM 17164T and P. caribensis DSM 13236T, respectively.
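The manual check of predicted prophage regions described above, which asks whether a region carries any phage structural gene, can be approximated by a keyword screen over gene annotations. This is only a sketch of that idea: the keyword list follows the examples given in the text (capsid, tail, terminase), the annotation strings are hypothetical, and naive substring matching would need curation in practice (for instance, "tail" also matches unrelated words that contain it).

```python
# Structural-gene keywords taken from the examples named in the text.
STRUCTURAL_KEYWORDS = ("capsid", "tail", "terminase")

def has_structural_gene(annotations, keywords=STRUCTURAL_KEYWORDS):
    """Return True if any annotation mentions a phage structural keyword."""
    return any(
        keyword in annotation.lower()
        for annotation in annotations
        for keyword in keywords
    )

# Hypothetical annotation lists for two predicted prophage regions.
region_without_structure = ["integrase", "hypothetical protein", "transposase"]
region_with_structure = ["integrase", "major capsid protein", "terminase large subunit"]

keep_first = has_structural_gene(region_without_structure)
keep_second = has_structural_gene(region_with_structure)
```

A region that fails this screen would be flagged for exclusion, mirroring the reasoning applied to the two predicted regions in P. hospita DSM 17164T.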

Discussion

Phenotypic traits: fungal interactivity

In this study, we examined the genomes and metabolic/fungal-interactive traits of three Paraburkholderia type strains, next to selected other strains, in order to delineate species boundaries. All strains have a soil origin, and to date it has been unclear to what extent they cluster together, or are divergent, with respect to genomic and ecological features. Here, we first provide evidence that P. hospita DSM 17164T and P. terrae DSM 17804T can migrate through soil along the hyphae of L. sp. strain Karsten, much like previously shown for the comparator P. terrae strains BS001, BS007 and BS110 (Table 3.1). Additionally, these two type strains were also able to utilize the exudates produced by L. sp. strain Karsten, which can attract Paraburkholderia types that can utilize such compounds (Nazir et al., 2012; Zhang et al., 2014; Haq et al., 2018). In contrast, we found that P. caribensis DSM 13236T was much less able to migrate or utilize such compounds. These results suggest that these two strains behave ecologically like the other P. terrae strains, yet differently from P. caribensis DSM 13236T.

ANI and TETRA coupled to 16S rRNA gene and concatenated core gene analyses determine the existence of species clusters in the genus Paraburkholderia

Since the first delineation of Burkholderia in 1993 (Yabuuchi et al., 1992), over 80 new species have been added to this genus (Depoorter et al., 2016). Recently, the genus Paraburkholderia (encompassing mostly environmental strains) has been split off (Depoorter et al., 2016; Estrada-De Los Santos et al., 2013; Sawana et al., 2014). The use of 16S rRNA and concatenated housekeeping genes (atpD, gltB, lepA and recA) (Estrada-De Los Santos et al., 2013), together with ANI (using a 95-96% threshold) coupled with the TETRA correlation coefficient (>0.99), has been proposed to replace DNA-DNA hybridization (DDH) as the new gold standard for species circumscription (Richter and Rossello-Mora, 2009).
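The combined ANI and TETRA criterion can be written as a simple decision rule. The sketch below is a hedged illustration, not a tool used in this study: the function name is hypothetical, the default ANI cutoff is set to the lower bound of the 95-96% range, and the example values are those quoted in this chapter (95.42/0.99784 for P. terrae DSM 17804T versus P. hospita DSM 17164T; 93.06 from Figure 3.2 and 0.99613 for P. terrae BS001 versus P. caribensis DSM 13236T).

```python
ANI_THRESHOLD = 95.0    # lower bound of the 95-96% range (assumed default)
TETRA_THRESHOLD = 0.99  # TETRA correlation coefficient cutoff

def same_species(ani_percent, tetra,
                 ani_threshold=ANI_THRESHOLD,
                 tetra_threshold=TETRA_THRESHOLD):
    """True if both the ANI and the TETRA criterion are met."""
    return ani_percent >= ani_threshold and tetra > tetra_threshold

# Values quoted in this chapter.
terrae_vs_hospita = same_species(95.42, 0.99784)
bs001_vs_caribensis = same_species(93.06, 0.99613)

# The same pair evaluated at the upper bound of the ANI range.
terrae_vs_hospita_strict = same_species(95.42, 0.99784, ani_threshold=96.0)
```

With the 95% cutoff the two type strains fall within one species, whereas at 96% they do not, which illustrates how sensitive the outcome is to the choice of threshold within the recommended range.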

Here, using the ANIm and TETRA values, we found close yet divergent relatedness among the three type strains investigated. Interestingly, ANI and TETRA values for P. terrae strains BS001 (ANIb/TETRA of 95.45/0.99789), BS007 (ANIm/TETRA of 95.58/0.99658), BS110 (ANIm/TETRA of 95.56/0.99683) and BS437 (ANIm/TETRA of 95.58/0.99666) were coherent with the recommended threshold range relative to the type strain of the species (P. terrae DSM 17804T). In other words, by these criteria the designation of these strains as P. terrae would hold. However, the ANIm and TETRA values towards P. hospita DSM 17164T were higher (Figure 3.2). To add to the complexity of this conundrum, the type strains P. terrae DSM 17804T and P. hospita DSM 17164T (ANIm/TETRA of 95.42/0.99784) had ANIm and TETRA values coherent with (higher than) the recommended threshold for species circumscription. Thus, by these criteria, these organisms would constitute one species, which may be described as a ‘species cluster’. This proposition is in line with recognizable polyphasic characters, such as morphology, physiology, pathogenicity, cultural characteristics and secondary metabolites. It has been proposed that ANI values of 95% represent DDH boundaries of 70% and thus delineate species (Richter and Rossello-Mora, 2009). We then considered the DDH analyses previously done for the type strains, which identify these as different species (Goris et al., 2002; Yang et al., 2006). We propose that it would be in agreement with the DDH data to use ANIb threshold values >95%
