
Shigella spp. and entero-invasive Escherichia coli

van den Beld, Maaike

DOI: 10.33612/diss.101452646

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

van den Beld, M. (2019). Shigella spp. and entero-invasive Escherichia coli: diagnostics, clinical implications and impact on public health. University of Groningen. https://doi.org/10.33612/diss.101452646

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


Maaike van den Beld

Genomic taxonomy of Shigella spp. and entero-invasive Escherichia coli (EIEC); current classification needs to be reconsidered

Submitted

Maaike J.C. van den Beld¹,², Monika A. Chlebowicz-Flissikowska², Mithila Ferdous², Natacha Couto², Alexander W. Friedrich², A.M.D. (Mirjam) Kooistra-Smid²,³, John W.A. Rossen²,⁴#, Frans A.G. Reubsaet¹

¹ Infectious Disease Research, Diagnostics and laboratory Surveillance, Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands

² Department of Medical Microbiology and Infection Prevention, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands

³ Department of Medical Microbiology, Certe, Groningen, The Netherlands

⁴ ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD), Basel, Switzerland


Abstract

The description of Shigella spp. and Escherichia coli in different genera and species is maintained, despite their genetic relatedness. This hinders routine diagnostics and the practical applicability of infectious disease control measures for shigellosis, because distinguishing Shigella spp. from E. coli, particularly its pathotype EIEC, is difficult. This study is the first to use modern standards in genomic taxonomy, with inclusion of all typestrains of Shigella and Escherichia together with reference EIEC isolates, to prove the misclassification of Shigella spp. as a separate genus. Short-read sequence data of isolates were used to analyze the 16S rRNA gene, the Average Nucleotide Identity (ANI) and the Average Amino acid Identity (AAI), and to perform in silico dDDH analysis. All analyses resulted in similarities above the species thresholds for Shigella species and E. coli, and below the species thresholds for other Escherichia species. Additionally, eight previously described E. coli and Shigella identification methods were evaluated in silico using Shigella and E. coli genomes; this evaluation confirmed their inapplicability for identification of EIEC isolates following the current classification. In conclusion, this study provides more evidence that, based on recognized genomic taxonomy including all relevant typestrains, the current classification of Shigella spp. and E. coli does not reflect actual genetic relationships. Therefore, the current classification should be reconsidered: either Shigella spp. should be incorporated into E. coli, or the pathotype EIEC should be incorporated into the genus Shigella. Both options would greatly facilitate routine diagnostics and infectious disease control management, while complying with nomenclature rules.

Introduction

The genus Shigella consists of the species Shigella dysenteriae, Shigella flexneri, Shigella boydii and Shigella sonnei, which all have a strong evolutionary relationship with Escherichia coli. Upon discovery, Shigella was first named ‘Bacillus dysenteriae’, because of the high resemblance to E. coli, formerly known as ‘Bacillus coli’ [1, 2]. As early as the 1950s, it was discovered that genetic material of E. coli could be transferred into Shigella isolates by phage transduction, indicating their close genetic relationship [3]. Since the 1970s, DNA-DNA hybridization (DDH) techniques have proved the genetic relatedness of all Shigella species and E. coli [4, 5]. In the last two decades, the sequencing of multiple housekeeping genes [6, 7] and complete genomes confirmed the genetic relationships of Shigella spp. with E. coli [8-10]. One of the pathotypes of E. coli is the entero-invasive E. coli (EIEC), which is genetically more related to Shigella spp. than to the other pathotypes or commensal E. coli [7]. Shigella spp. and EIEC have relatively recently evolved from E. coli lineages on multiple occasions, which explains the presence of multiple genetic lineages of Shigella spp. and EIEC that do not necessarily reflect their taxonomic species designations [6]. For these lineages, the acquisition of a large virulence plasmid (pINV) marked the onset of the parallel evolution into an invasive lifestyle [9, 11].

EIEC isolates also possess the pINV, as well as other Shigella-associated virulence genes and pathogenicity islands located on the chromosome, which enable them to cause a dysentery-like disease [1, 6, 12]. Like Shigella spp., EIEC is associated with diarrhea in the community and is able to cause food-related outbreaks [13-15].

Despite the genetic relatedness of Shigella spp. and E. coli and the similar disease outcomes of EIEC, classification into two genera and current nomenclature are maintained because of the clinical implications of shigellosis, for epidemiological surveillance purposes, to facilitate communication and for historical reasons [16, 17]. However, since these publications, detection of Shigella spp. from fecal samples has shifted from culture techniques with limited sensitivity to molecular techniques that cannot distinguish Shigella spp. from EIEC [18-20]. Culture-based methods seldom recovered EIEC and played no significant role in diagnosing EIEC infections. The need for diagnostic identification methods able to distinguish Shigella spp. from EIEC is high, as in many countries shigellosis cases need to be reported to health authorities for infectious disease control measures, whilst infections with EIEC are currently not under these regulations [21, 22].

The classification, nomenclature and identification of bacteria are part of bacterial taxonomy. Since the 1970s, classification of undescribed bacteria is performed by polyphasic taxonomy, using a broad range of techniques to evaluate genomic, phenotypic and chemotaxonomic traits. DNA-DNA hybridization (DDH) is considered the gold standard if a 16S ribosomal RNA gene (16S rRNA gene) similarity percentage of ≥98.7% (formerly >97%) is obtained [23-25]. More recently, taxonomists have called for a revision of this polyphasic taxonomy in favor of genomic taxonomy. Different analyses are used in genomic taxonomy, and their species thresholds are often assessed for their congruence with 70% DDH similarity [26-29]. The classification of Shigella spp. and E. coli predates the techniques of polyphasic and genomic taxonomy and was conducted based on morphological, phenotypical and antigenic properties [16, 30].

In nomenclature, typestrains are important entities; they represent isolates with a consensus degree of similarity that bear the same name [31, 32]. The rules and recommendations for the naming of classified bacteria and assignment of typestrains are described in the International Code of Nomenclature of Prokaryotes (ICNP) [32]. The current nomenclature and assignment of typestrains for the genus Shigella and its species was issued by the Judicial Commission in Opinion 11 in 1954 [32]. However, in DDH studies, only the S. flexneri typestrain was included [4, 5] and in studies that assessed the phylogenetic relationships of rRNA sequences only S. flexneri and S. dysenteriae typestrains were used [33, 34].

Identification of bacteria is the assignment of unknown bacteria to formerly classified and validly published taxa. In modern diagnostics, molecular procedures that target virulence genes, such as the ipaH gene, are used for detection and identification of Shigella spp. These virulence genes successfully distinguish the Shigella-EIEC pathovar from other E. coli, but are not suitable markers for distinction between Shigella spp. and EIEC. Several research groups started the quest for more reliable molecular markers. In 2011, a duplex real-time PCR targeting the uidA and lacY genes was described [35], which was modified to a multiplex real-time PCR in 2016 [36]. Later, other researchers concluded that these targets are unsuitable for distinction of Shigella spp. and EIEC [9, 37]. In 2013, a pentaplex PCR was developed to distinguish the different Shigella species, using the invC, rfpB, rfc and wbgZ genes as genus- or species-specific targets and the ompA gene as internal control [38]. Unfortunately, four of the five targets used in this assay were plasmid-borne, and no EIEC strains were used to determine the specificity.

In recent years, research groups have used complete genomes to evaluate genus- and species-specific molecular markers for Shigella spp. and for E. coli in general or EIEC in particular, using k-mer approaches or alignments of all coding regions [39, 40]. In addition, genes uniquely present in EIEC and absent in Shigella spp. were used to develop a multiplex PCR to subtype EIEC isolates [37]. Phylogenomic analyses based on alignments of conserved regions [41] or core Single Nucleotide Polymorphisms (SNPs) [9] classified Shigella spp., non-invasive E. coli and EIEC strains in clades, which do not necessarily reflect species designation [9, 41]. It was acknowledged that a large genomic diversity exists in EIEC isolates, and it was anticipated that the sequencing of more genomes of EIEC would coincide with an increase in diversity [9, 42].

Although multiple studies have already demonstrated that Shigella spp. and E. coli should be classified as the same species, this is the first study in which a proof of this principle is provided using modern standards in genomic taxonomy with inclusion of all Shigella and Escherichia typestrains. Additionally, as proof of the inapplicability of identification techniques following the current classification, the assays for identification of Shigella spp. and EIEC described above were evaluated in silico. Finally, practical recommendations for reclassification of Shigella spp. and E. coli were explored; these comply with the rules of nomenclature, and simultaneously facilitate diagnostics in laboratories, epidemiological surveillance, communication between medical microbiology laboratories, clinicians and public health services, and guidelines for infectious disease control measures.

Material and Methods

Isolates and identification

The isolates (n = 21) sequenced in this study are listed in Table 1. From the species S. dysenteriae, S. flexneri, S. boydii, S. sonnei and E. coli the typestrains were included. Because EIEC is a pathotype of E. coli, without the status of species, no typestrain is assigned. To compensate for this, reference EIEC isolates from culture collections, isolates used for O-antigen preparation and clinical EIEC isolates were included. The clinical isolates were identified with classical phenotypic testing complemented with molecular tests and serology as previously described [43]. To provide context, nearly complete 16S rRNA gene sequences of the typestrains of other Escherichia spp., Escherichia albertii (LMG 20976, acc: AJ508775), Escherichia fergusonii (ATCC 35469, acc: AF530475), Escherichia hermannii (GTC 347, acc: AB273738) and Escherichia marmotae (HT073016, acc: KJ787692), were downloaded from the National Center for Biotechnology Information (NCBI). For context in taxonomic analyses based on whole genomes, in addition to the 16S rRNA gene sequences, the following genomes were downloaded from NCBI: the typestrains of E. fergusonii (acc: NC_011740.1) and E. marmotae (acc: CP025979.1), and additionally, E. albertii isolate CDC05-3106 (acc: CP030778.1), because an assembled genome of the typestrain of the latter species is lacking in public databases.

Genome sequencing, quality control, trimming, assembly and annotation

High molecular weight DNA was extracted from all isolates using the Ultraclean Microbial DNA isolation kit (Mo Bio Laboratories, Carlsbad, CA, USA). DNA fragmentation and barcoding were performed using the Nextera DNA Library Preparation Kit (Illumina Inc., San Diego, USA), after which isolates were sequenced using a MiSeq® Reagent Kit v3 (600-cycles paired-end) on an Illumina MiSeq® (Illumina Inc., San Diego, USA). Quality control, trimming, de novo assembly and consensus extraction were performed with CLC Genomics Workbench version 8.5.3 (QIAGEN, Aarhus, Denmark). Trimming was performed with the default parameters (p=0.05), whereas de novo assembly options were adjusted to a word size of 29 and a minimum contig length of 1000. Genomes were annotated and amino-acid FASTA files were downloaded from the RAST annotation server using the default settings [44]. All sequences are available in the European Nucleotide Archive (https://www.ebi.ac.uk/ena) under study number PRJEB27313.

Genomic classification analyses

The minimal proposed standards for taxonomy of prokaryotes based on genomics were described earlier [45]. They consist of a 16S rRNA gene analysis first; if the similarity between isolates is equal to or above the species threshold of 98.7%, the whole genome analyses Average Nucleotide Identity (ANI) or digital DDH (dDDH) need to be performed to confirm the species status, as the discriminatory power of the 16S rRNA gene is not sufficient to separate all species [45]. In this study, after next generation sequencing of the 16S rRNA gene, ANIb, Average Amino acid Identity (AAI) and dDDH analyses were performed. The species thresholds were ≥98.7% for the 16S rRNA gene, >95% for ANI, >95% for AAI, and >70% for dDDH [27]. For 16S rRNA gene analysis, trimmed reads were mapped against one of the copies of the rrs gene (16S rRNA gene, position 1769491-1771032) of EIEC reference isolate 53638 (Genbank accession: NZ_AAKB02000001.1). For each isolate, a 16S rRNA gene consensus fragment was exported to Clone Manager Professional 9.3 (Sci-ed Software, Denver, USA, http://www.scied.com) in which similarity was analyzed. As the consensus sequences contain reads resulting from different copies of the 16S rRNA genes, representative ambiguous bases were ignored in the analysis of similarity. In addition, the 16S rRNA genes of the typestrains of the other Escherichia species, E. albertii, E. fergusonii, E. hermannii and E. marmotae, were analyzed for their similarity with the isolates sequenced in this study and with each other. Strains above the 16S rRNA similarity species threshold were included in the ANIb, AAI and dDDH value calculations.

For ANIb and AAI calculation, the genome-based distance matrix calculator tool of the Environmental Microbial Genomics Laboratory from the Georgia Institute of Technology was used [46]. Reciprocal BLASTN matches were used, in which the average of the best hits and reciprocal best hits is calculated, because the taxonomic value of these methods compared to DDH has been proven [29]. For dDDH, genome-to-genome distance was calculated for all genomes against all genomes, using the genome-to-genome distance calculator 2.1 (GGDC) web software [26]. The local realignment tool BLAST+ was used and genomic distances were calculated as identities/high-scoring segment pair (HSP) length. From these distances, dDDH values and their confidence intervals were predicted using a generalized linear model [47].

In-silico evaluation of identification methods

As most identification methods were developed for Shigella spp. and E. coli, only Shigella, E. coli and EIEC isolates were used for this evaluation. First, identification based on the 16S-23S rRNA encoding region was performed as described [48], by selecting fragments flanked by the described primers from EIEC reference isolate 53638 (Genbank accession: NZ_AAKB02000001.1). One of the 16S-23S rRNA regions was selected as reference, trimmed reads of all copies of the 16S rRNA genes were randomly mapped against it and a consensus fragment of each genome was compared using BLASTn of the NCBI (https://blast.ncbi.nlm.nih.gov/). Species identification was based on alignment of fragments with 16S-23S rRNA sequences deposited in the “Nucleotide Collection (NC/nt)” and the “Refseq_representative genome” databases (access date: 2019-04-02). A bacterial species was assigned when the similarity score was 99% or higher and the difference in similarity score between the first match and the next closest species was equal to or greater than 0.2% [48]. If the difference between the first and second match was <0.2%, both hits were reported.
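The assignment rule used for the 16S-23S rRNA region (a similarity of at least 99%, and a margin of at least 0.2% over the next closest species, otherwise both hits are reported) can be illustrated with a minimal Python sketch. The function name, the input format and the handling of a best hit below 99% (no assignment) are assumptions made for illustration; they are not part of the published method [48].

```python
from typing import Dict, List, Tuple


def assign_species(
    hits: List[Tuple[str, float]],
    min_similarity: float = 99.0,  # threshold stated in the methods
    min_margin: float = 0.2,       # required gap to the next closest species
) -> List[str]:
    """Return the species reported for one consensus fragment.

    hits: (species name, similarity in %) pairs from a BLASTn search.
    Hypothetical input format; only the best hit per species is considered.
    """
    best: Dict[str, float] = {}
    for species, similarity in hits:
        best[species] = max(similarity, best.get(species, 0.0))

    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)

    # Assumption: no species is reported when the best hit is below 99%.
    if not ranked or ranked[0][1] < min_similarity:
        return []
    if len(ranked) == 1:
        return [ranked[0][0]]

    (first, s1), (second, s2) = ranked[0], ranked[1]
    # Rounding avoids floating-point noise in the subtraction.
    if round(s1 - s2, 6) >= min_margin:
        return [first]
    # Margin below 0.2%: both hits are reported.
    return [first, second]


# Toy values, not taken from the study:
print(assign_species([("Escherichia coli", 99.9), ("Shigella sonnei", 99.5)]))
print(assign_species([("Shigella flexneri", 99.7), ("Escherichia coli", 99.6)]))
```

With the toy values above, the first call reports a single species and the second reports both, which mirrors how ambiguous Shigella/E. coli fragments are handled in this evaluation.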

Table 1 Strains sequenced in this study

| Genus, species, or pathotype | Strain number (a) | O-type (b) | From original collection (c) | Country of origin | Country of isolation | Year of isolation | Accession (ENA) |
|---|---|---|---|---|---|---|---|
| Shigella dysenteriae | CIP 57.28 T | 1 | CIP | Unknown | UK | 1934 | ERR2642343 |
| Shigella flexneri | CIP 82.48 T | 2a | CIP | Unknown | USA | Prior 1973 | ERR2642344 |
| Shigella boydii | CIP 82.50 T | 2 | CIP | Unknown | India | Prior 1938 | ERR2642346 |
| Shigella sonnei | CIP 82.49 T | | CIP | Unknown | USA | Prior 1973 | ERR2642345 |
| Escherichia coli | U5-41 T | O1 | CDC > Cib (d) | Denmark | Denmark | 1941 | ERR2642354 |
| Entero-invasive Escherichia coli (EIEC) | DSM 9027 | O112ac | DSM | Portugal | Portugal | Prior 1948 | ERR2642340 |
| EIEC | CCUG 11335 | O28 | CCUG | unknown | unknown | Prior 1956 | ERR2642341 |
| EIEC | CCUG 38092 | O143 | CCUG | Hungary | Hungary | 1986 | ERR2642342 |
| EIEC | CCUG 38080 | O124 | CCUG | USA | USA | 1992 | ERR2642347 |
| EIEC | EW227 | O124 | CDC > Cib (d) | Italy | Italy | 1944-1945 | ERR2642355 |
| EIEC | 1624-56 | O144 | CDC > Cib (d) | unknown | unknown | Prior 1972 | ERR2642356 |
| EIEC | 1184-68 | O152 | CDC > Cib (d) | Japan | Japan | 1971-1972 | ERR2642358 |
| EIEC | 145/46 | O164 | CDC > Cib (d) | North Africa | UK | 1946 | ERR2642360 |
| EIEC | L119B-10 | O173 | CDC > Cib (d) | Thailand | Thailand | 1984 | ERR2642362 |
| EIEC | BD11-00138 | O102 | Cib | unknown | Netherlands | 2011 | ERR2642348 |
| EIEC | BD12-00018 | O29 | Cib | unknown | unknown | unknown | ERR2642349 |
| EIEC | BD12-00020 | O124 | Cib | unknown | unknown | unknown | ERR2642350 |
| EIEC | BD13-00032 | O159 | Cib | unknown | Netherlands | 2013 | ERR2642351 |
| EIEC | BD13-00037 | NT | Cib | unknown | Netherlands | 2013 | ERR2642352 |
| EIEC | BD13-00229 | O124 | Cib | unknown | Netherlands | 2013 | ERR2642353 |
| EIEC | BD16-00087 | NT | Cib | unknown | Netherlands | 2016 | ERR2642369 |

NT = Not typable. T = typestrain of the species.
(a) Clinically encountered isolates start with “BD”, other isolates are reference strains.
(b) E. coli O-type if E. coli, Shigella serotype if Shigella spp.; either provided by culture collection or established by classic methods.
(c) CDC: Centers for Disease Control and Prevention, Atlanta, USA. Cib: Centre for Infectious Disease Control, Bilthoven, the Netherlands. CCUG: Culture Collection, University of Göteborg, Göteborg, Sweden. CIP: Collection de l’Institut Pasteur, Paris, France. DSMZ: Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany.
(d) Historical isolates donated by CDC to Cib for O-antiserum preparation and validation.


Second, assays that target species- or genus-specific fragments, identified by the groups of Pavlovic et al. [35], Sahl et al. [41], Ojha et al. [38], Kim et al. [39] and Dhakal et al. [37], were checked in silico for accurate identification of the isolates (Table 1). Primers and probes for targets of all tested assays were aligned using BLASTn against the database of isolates with an expected value of 100 and default settings in CLC Genomics Workbench (version 9.1.1). If expected primers were absent, trimmed reads were mapped against these primers as verification.

Third, Pettengill et al. described 254 diagnostic SNPs that should be clade specific. To assess to which clades the isolates in this study belonged, they were examined for the described diagnostic SNPs in CLC Genomics Workbench (version 9.1.1) using basic variant detection with read mapping against reference strain S. dysenteriae Sd197 (accession: CP000034) using the default settings. Variant tracks were exported and an Access database (Microsoft Office 2010) was created to query the diagnostic SNPs from the specified positions.

Finally, an identification method based on k-mers was performed on the isolates in this study using the available k-mer ID software, with a k-mer size of 18 bp [40]. A local database of reference genomes was created to compare with the k-mers of isolates in this study (Supplementary File 1). Selection of genomes in this database was based on the highest scoring reference genomes per species from the study of Chattaway et al. [40], supplemented with five EIEC reference genomes (Supplementary File 1).

Results

Genomic classification analyses

Similarity percentages for the 16S rRNA gene ranged from 98.83% to 100% between Shigella and E. coli isolates, including EIEC isolates, in this study (Figure 1A). For all Shigella spp., E. coli, EIEC isolates, E. fergusonii, E. albertii and E. marmotae, 16S rRNA gene similarity percentages exceeded the threshold of 98.7% (Table 2A and Figure 1A). Therefore, taxonomic analyses based on draft genomes were performed for all isolates as modern standards describe [45], except for E. hermannii, which had 16S rRNA gene similarity percentages below 98.7% with all other isolates.

For all Shigella spp., E. coli and EIEC isolates, dDDH values were above the species threshold (70%) and varied from 73.4% [70.4 - 76.3] for isolates U5-41 (typestrain of E. coli) and CCUG 11335 (EIEC isolate) to 99.6% [99.3 - 99.7] for isolates EW227 and CCUG 38080 (Figure 1B). Mutual dDDH values of the typestrains ranged from 74.3% [71.3 - 77.1] to 89.3% [86.9 - 91.3] (Table 2A). The typestrains of E. fergusonii, E. albertii and E. marmotae had dDDH values with each other and the other isolates below the species threshold (Table 2A, Figure 1B).

The ANIb and AAI ranged from 97-100% and 96-100%, respectively, for Shigella spp., the E. coli typestrain and EIEC isolates (Table 2B and Figure 1C and 1D), which is above the species threshold of 95%. The typestrains of E. fergusonii, E. albertii and E. marmotae had an ANIb of 88% to 92% and an AAI of 90% to 95%, respectively, with all other isolates, which is below the species threshold (Table 2B and Figure 1C and 1D).

In-silico evaluation of identification methods

With the analysis of the 16S-23S rDNA fragments using the BLAST database “Nucleotide Collection”, the typestrains of E. coli, S. dysenteriae and S. sonnei were correctly identified, as well as ten of the 16 (63%) EIEC isolates (Table 3, Supplementary File 2). When using the Refseq_representative genome database, the typestrains of E. coli, S. dysenteriae and S. flexneri and nine (56%) EIEC isolates were correctly identified (Table 3, Supplementary File 2). Almost all assays that target species- or genus-specific fragments, k-mers and diagnostic SNPs were able to identify all Shigella spp. and the E. coli typestrains correctly (Table 3). The method of Kim et al. performed best for EIEC isolates, as 14 out of 16 (88%) EIEC isolates from this study were correctly identified in silico (Table 3). With the assay of Ojha et al., it was not possible to detect any of the specific markers for the S. boydii and S. sonnei typestrains, and most of the EIEC isolates were incorrectly identified (Table 3). For detailed results of the evaluation of these assays with the isolates sequenced in this study, see Supplementary Files 3 and 4.

In this study, the diagnostic SNPs described by Pettengill et al. were found 11 nucleotides further downstream than the positions stated by the authors, both in the isolates sequenced in this study and in the isolates used by the authors themselves [9]. Therefore, all positions were corrected for these 11 nucleotides. In all but one isolate (S. dysenteriae typestrain), one out of the 37 diagnostic SNPs from Pettengill’s SD10 cluster was found on position 2,481,651 with the described nucleotide. Additionally, one SNP of the SF cluster at position 2,259,738 was found, but thymine was replaced by adenine in the isolates from this study, instead of guanine as reported earlier (Supplementary File 5) [9]. Additionally, in seven EIEC isolates (CCUG 38092, EW227, 1184-68, BD11-00138, BD12-00020, BD13-00229 and BD16-00087) one SNP out of 71 of the ExPEC cluster was found, in one isolate (BD11-00138 at position 817401) with the described nucleotide, and in six isolates with cytosine instead of the described adenine (position 3664060, Supplementary File 5). If these single SNPs were disregarded in the interpretation, 100% of Shigella spp. and the E. coli typestrains were designated to the expected phylogenetic clades consisting of at least the corresponding species (Table 3). Of the EIEC isolates, eight (50%) were correctly assigned to phylogenetic clades that contain EIEC isolates: Pettengill’s EIEC_EHEC_EAEC and EIEC_small clades. Of the other eight EIEC isolates, three possessed all specific SNPs that belong to the clusters consisting of S. boydii and S. dysenteriae isolates, while five EIEC isolates from this study possessed none of the diagnostic SNPs as described, apart from the single SNPs that were disregarded earlier [9].
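The 11-nucleotide position correction applied to the diagnostic SNPs of Pettengill et al., followed by a lookup of the observed base at each corrected position, can be sketched in Python as follows. This is only an illustration of the bookkeeping: the study itself used variant tracks from CLC Genomics Workbench queried through an Access database, and the function name, data structures and positions below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

POSITION_OFFSET = 11  # SNPs were found 11 nucleotides further downstream


def check_diagnostic_snps(
    published_snps: List[Tuple[int, str]],
    observed_bases: Dict[int, str],
    offset: int = POSITION_OFFSET,
) -> List[Tuple[int, str, Optional[str], bool]]:
    """Compare published diagnostic SNPs with the bases observed in an isolate.

    published_snps: (published position, described nucleotide) pairs.
    observed_bases: variant calls of one isolate, keyed by reference position.
    Returns (corrected position, described base, observed base, match) tuples;
    the observed base is None when no variant was called at that position.
    """
    results = []
    for position, described in published_snps:
        corrected = position + offset
        observed = observed_bases.get(corrected)
        results.append((corrected, described, observed, observed == described))
    return results


# Toy data with made-up positions, for illustration only.
published = [(100, "A"), (250, "G"), (400, "T")]
variant_calls = {111: "A", 261: "C"}

for row in check_diagnostic_snps(published, variant_calls):
    print(row)
```

In this toy example the first SNP is present with the described nucleotide, the second is present with a different nucleotide, and the third is absent, which corresponds to the three situations reported for the isolates in this study.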


Table 2 Similarity matrices of 16S rRNA gene, dDDH, ANIb and AAI values (a) for typestrains and EIEC reference isolates

A. 16S rRNA gene similarity (%, above the diagonal) and dDDH values (%, with confidence intervals in brackets, below the diagonal; gray shaded in the original)

| Strain, species | CIP57.28 | CIP82.50 | CIP82.48 | CIP82.49 | U5-41 | ATCC 35469 | LMG 20976 / CDC 05-3016 (a) | HT073016 | GTC 347 |
|---|---|---|---|---|---|---|---|---|---|
| CIP57.28 S. dysenteriae T | 100 | 98.8 | 99.2 | 99.4 | 99.2 | 98.9 | 98.3 | 98.9 | 97.4 |
| CIP82.50 S. boydii T | 81.2 [78.3 - 83.8] | 100 | 99.9 | 99.9 | 99.6 | 99.7 | 99.4 | 98.5 | 97.4 |
| CIP82.48 S. flexneri T | 81.0 [78.1 - 83.6] | 86.0 [83.3 - 88.3] | 100 | 99.9 | 99.9 | 100 | 99.5 | 98.8 | 97.6 |
| CIP82.49 S. sonnei T | 81.6 [78.7 - 84.1] | 89.3 [86.9 - 91.3] | 87.3 [84.8 - 89.5] | 100 | 99.9 | 99.9 | 99.5 | 99.1 | 98.1 |
| U5-41 E. coli T | 74.3 [71.3 - 77.1] | 74.4 [71.4 - 77.2] | 74.4 [71.4 - 77.2] | 74.3 [71.3 - 77.1] | 100 | 99.8 | 99.3 | 98.9 | 97.5 |
| ATCC 35469 E. fergusonii T | 40.8 [38.3 - 43.3] | 40.9 [38.4 - 43.5] | 40.9 [38.4 - 43.5] | 40.5 [38 - 43] | 40 [37.5 - 42.5] | 100 | 99.2 | 98.6 | 97.4 |
| LMG 20976 / CDC 05-3016 (a) E. albertii T | 40.2 [37.8 - 42.8] | 40.5 [38 - 43] | 39.8 [37.3 - 42.4] | 40.3 [37.8 - 42.8] | 40 [37.5 - 42.5] | 34.1 [31.6 - 36.6] | 100 | 98.1 | 96.8 |
| HT073016 E. marmotae T | 43.4 [40.9 - 46] | 43.4 [40.9 - 46] | 43.6 [41 - 46.1] | 43.7 [41.2 - 46.3] | 43.2 [40.7 - 45.8] | 34.8 [32.4 - 37.3] | 38.6 [36.1 - 41.1] | 100 | 97.4 |
| GTC 347 E. hermannii T | | | | | | | | | 100 |

B. ANIb (%, above the diagonal) and AAI (%, below the diagonal; gray shaded in the original)

| Strain, species | CIP57.28 | CIP82.50 | CIP82.48 | CIP82.49 | U5-41 | ATCC 35469 | LMG 20976 / CDC 05-3016 (a) | HT073016 | GTC 347 |
|---|---|---|---|---|---|---|---|---|---|
| CIP57.28 S. dysenteriae T | 100 | 98 | 98 | 98 | 97 | 92 | 89 | 90 | |
| CIP82.50 S. boydii T | 97 | 100 | 98 | 99 | 97 | 92 | 90 | 90 | |
| CIP82.48 S. flexneri T | 97 | 98 | 100 | 99 | 97 | 92 | 89 | 90 | |
| CIP82.49 S. sonnei T | 97 | 98 | 98 | 100 | 97 | 92 | 90 | 90 | |
| U5-41 E. coli T | 97 | 97 | 96 | 97 | 100 | 92 | 89 | 90 | |
| ATCC 35469 E. fergusonii T | 92 | 91 | 91 | 91 | 92 | 100 | 88 | 89 | |
| LMG 20976 / CDC 05-3016 (a) E. albertii T | 93 | 94 | 93 | 93 | 93 | 90 | 100 | 89 | |
| HT073016 E. marmotae T | 95 | 94 | 94 | 94 | 94 | 91 | 90 | 100 | |
| GTC 347 E. hermannii T | | | | | | | | | 100 |

dDDH = digital DNA-DNA Hybridization, ANIb = Average Nucleotide Identity, AAI = Average Amino acid Identity. T = typestrain.
(a) For dDDH, ANIb and AAI calculations, isolate CDC 05-3016 was used, because of the lack of a genome of the E. albertii typestrain.
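The way the species thresholds are applied to pairwise values such as those in Table 2 can be sketched in Python. The sketch assumes that a pair is called the same species only when the 16S rRNA gene similarity is ≥98.7% and ANIb, AAI and dDDH all exceed their thresholds; the function name and the requirement that all three whole-genome measures agree are illustrative assumptions, not a description of the software used in the study.

```python
from typing import Optional

# Species thresholds as stated in the methods [27].
THRESHOLD_16S = 98.7   # >= 98.7 %
THRESHOLD_ANI = 95.0   # > 95 %
THRESHOLD_AAI = 95.0   # > 95 %
THRESHOLD_DDDH = 70.0  # > 70 %


def classify_pair(
    sim_16s: float,
    anib: Optional[float] = None,
    aai: Optional[float] = None,
    dddh: Optional[float] = None,
) -> str:
    """Apply the two-step workflow: 16S rRNA gene first, whole genome second."""
    if sim_16s < THRESHOLD_16S:
        # Whole-genome analyses are not needed below the 16S threshold.
        return "different species (16S rRNA gene below threshold)"
    if anib is None or aai is None or dddh is None:
        return "undetermined (whole-genome values required)"
    if anib > THRESHOLD_ANI and aai > THRESHOLD_AAI and dddh > THRESHOLD_DDDH:
        return "same species"
    return "different species (whole-genome values below threshold)"


# Pairwise values against the E. coli typestrain, read from Table 2.
print(classify_pair(99.9, anib=97, aai=96, dddh=74.4))  # S. flexneri typestrain
print(classify_pair(99.8, anib=92, aai=92, dddh=40.0))  # E. fergusonii typestrain
print(classify_pair(97.5))                              # E. hermannii typestrain
```

The three calls reproduce the pattern reported in the results: the Shigella typestrain falls within the species boundary of E. coli, E. fergusonii passes the 16S rRNA gene step but not the whole-genome step, and E. hermannii is already excluded by its 16S rRNA gene similarity.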

Figure 1 Similarity matrices of 16S rRNA gene (A), dDDH values (B), ANIb (C), and AAI (D)

[Heat maps not reproduced. Each panel compares the typestrains of S. dysenteriae, S. boydii, S. flexneri, S. sonnei and E. coli, the 16 EIEC isolates, and E. fergusonii, E. albertii (a) and E. marmotae; panel A additionally includes the E. hermannii typestrain. Color scales: 16S rRNA gene similarity from 100% down to <98.7%; dDDH similarity from 100% down to <70%, plus “No data”; ANIb and AAI similarity from 100% down to <95%.]

(a) For dDDH, ANIb and AAI calculations, isolate CDC 05-3016 was used, because of the lack of a genome of the E. albertii typestrain.


In contrast to Shigella spp. and E. coli, the genomic classification analyses in this study

support the current separate species designations for the other species of Escherichia. For E. fergusonii, E. albertii and E. marmotae, 16S rRNA similarity is >98.7%, however, ANIb, AAI and dDDH are below species threshold, and therefore they do not belong to the same species as E. coli and Shigella spp. The results of the dDDH analyses of the typestrains of Escherichia in this study are in concordance with another study that had access to the assembled genome sequence of the E. albertii typestrain [50].

In our study, the assay of Ohja et al. and the identification based on the 16S-23S rRNA region were not able to identify all typestrains correctly. For the first assay, this is explainable by the fact that most targets used are only located on plasmids, which are unsuitable as marker because of their unstable nature. For 16S-23S rRNA analysis, cut-off values for species boundaries are not available, however, isolates in this study have a similarity above 99.4% with both Shigella spp. and E. coli fragments present in the NCBI databases, once again confirming their relatedness. The main disadvantage of identification based on the 16S-23S rRNA sequencing method is a lack of 16S-23S rRNA reference sequences for many bacterial species, which hinders proper identification. The differences between the results of the two used databases are indicating the high dependence of this method on the additions and changes in the public databases as has been reported before [51]. Moreover, the quality of sequences or the qualities of species designations in the public databases are not constant. For a proper use of this identification method, it is necessary to develop a curated database containing reference sequences of the 16S-23S rRNA region for all Shigella spp., EIEC and their closely related species, and to develop cut-off criteria for genus-and species-level identification.

With the other six identification methods, all typestrains of Shigella spp. and E. coli were identified correctly; however, all had difficulties with the EIEC isolates. This confirms the heterogeneity of EIEC genomes already described by other groups [9, 42]. The more EIEC genomes are sequenced, the more diversity becomes apparent: multiple groups developed assays that worked perfectly well with their own EIEC isolates, but not with the isolates sequenced in this study [9, 35, 39, 41]. Clearly, the close relationship between EIEC and Shigella spp., despite the diversity of their genomes, indicates that assays based on a single feature will identify only a subgroup.

Although the number of analyzed isolates was relatively small, all essential typestrains for the genomic classification analyses were included. Incorrect species designations and diversity were demonstrated in the genomic classification analyses and assay evaluation even with this small number of EIEC isolates. Nevertheless, it would be of value to include larger numbers of isolates covering more pathotypes of E. coli in future studies, for instance to properly benchmark the identification assays. Additionally, in order to validate

Discussion

In this study, genomic classification using modern standards was performed on Shigella spp., E. coli, and the pathotype EIEC in particular, with inclusion of all available assigned typestrains of the genera Shigella and Escherichia. Additionally, the Shigella and E. coli isolates from this study were used to evaluate eight identification assays that were described previously [9, 35, 37-41, 48].

All genomic classification analyses revealed that all Shigella species and E. coli are genetically related to such an extent that they should be classified as one species, as they were above the thresholds of ≥98.7% 16S rRNA gene similarity, >95% ANI, >95% AAI, and >70% dDDH similarity [27]. Within these species borders, the EIEC strains were more closely related to the Shigella typestrains than to the E. coli typestrain in all performed classification analyses. This is in concordance with other studies that also indicated that the EIEC-Shigella pathovar forms a different subgroup within E. coli that originated from independent lineages [7, 9]. The calculated dDDH values correspond with a relative binding of >80% at 60°C and >71% at 75°C in the classical in vitro DDH experiments, although in the in vitro experiments E. coli K-12 and other Shigella isolates were used in addition to the S. flexneri typestrain only [49]. Other studies have also demonstrated that all Shigella species and E. coli should be classified as one species [5, 9, 11]. However, in former studies not all typestrains were used, and the proposed minimal standards in genomic taxonomy were not applied [45].
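The threshold logic described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in this study: the function name, the dictionary keys and the example values are hypothetical, and only the cut-offs (≥98.7% 16S rRNA gene similarity, >95% ANI, >95% AAI, >70% dDDH) are taken from the text.

```python
# Species cut-offs quoted in the text [27]; everything else is illustrative.
# Each entry is (cut-off in %, whether the cut-off itself counts as passing).
THRESHOLDS = {
    "16S": (98.7, True),    # >= 98.7% 16S rRNA gene similarity
    "ANI": (95.0, False),   # > 95% average nucleotide identity
    "AAI": (95.0, False),   # > 95% average amino acid identity
    "dDDH": (70.0, False),  # > 70% digital DNA-DNA hybridization
}


def same_species(metrics):
    """Return True only if every pairwise value passes its species cut-off."""
    for name, (cutoff, inclusive) in THRESHOLDS.items():
        value = metrics[name]
        passed = value >= cutoff if inclusive else value > cutoff
        if not passed:
            return False
    return True


# Hypothetical pairwise values, NOT results from this study.
pair_above = {"16S": 99.5, "ANI": 97.0, "AAI": 97.5, "dDDH": 80.0}
pair_below = {"16S": 99.0, "ANI": 90.0, "AAI": 92.0, "dDDH": 45.0}

print(same_species(pair_above))
print(same_species(pair_below))
```

The second hypothetical pair mirrors the situation in which 16S rRNA similarity alone is above the cut-off while the genome-wide measures are not, so the pair is not assigned to one species.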

Table 3 Identification accuracy predicted by in silico analyses of earlier described assays with isolates sequenced in this study

Isolates | Ojha et al.a | Pavlovic et al.a | Sahl et al.a | Kim et al.a | Dhakal et al.a | 16S-23S region (nucleotide collection)a | 16S-23S region (refseq_rep_genomes)a | Pettengill et al. (diagnostic SNPs)a | Chattaway et al. (k-mer ID)a
U5-41T (E. coli) | 1 | 1 | 1 | 1 | NA | 1 | 1 | 1 | 1
CIP 57.28T (S. dysenteriae) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
CIP 82.50T (S. boydii) | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1
CIP 82.49T (S. sonnei) | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1
CIP 82.48T (S. flexneri) | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1
EIEC (n=16) | 2 | 9 | 10 | 14 | 13 | 10 | 9 | 8 | 9
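The counts in the EIEC row of Table 3 can be converted into percentages to make the assays easier to compare. The short Python sketch below does only that arithmetic; the counts are copied from the table, while the variable and function names are illustrative assumptions.

```python
# EIEC isolates (out of 16) identified correctly per assay, copied from Table 3.
EIEC_TOTAL = 16
eiec_correct = {
    "Ojha et al.": 2,
    "Pavlovic et al.": 9,
    "Sahl et al.": 10,
    "Kim et al.": 14,
    "Dhakal et al.": 13,
    "16S-23S region (nucleotide collection)": 10,
    "16S-23S region (refseq_rep_genomes)": 9,
    "Pettengill et al. (diagnostic SNPs)": 8,
    "Chattaway et al. (k-mer ID)": 9,
}


def percent_correct(correct, total=EIEC_TOTAL):
    """Percentage of isolates identified correctly, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


# List the assays from most to least accurate on the EIEC isolates.
for assay, n in sorted(eiec_correct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{assay}: {n}/{EIEC_TOTAL} ({percent_correct(n)}%)")
```

None of the assays reaches 16/16 on the EIEC isolates, which is the point made in the discussion of these results.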


An alternative option for reclassification is the incorporation of the pathotype EIEC into the genus Shigella. With this option, serovar designations can also be used, for instance Shigella serovar Flexneri 2a, or Shigella variant Coli O124 in the case of current EIEC isolates. An advantage of the incorporation of EIEC isolates into the genus Shigella is that it reflects the similarity in implications and impact on public health, as Shigella spp. and EIEC form a distinct pathovar with the same virulent behavior and both are able to cause dysentery. Another advantage is that it provides more manageability for diagnostics, epidemiological surveillance and infectious disease control measures. Although the incorporation of EIEC into the genus Shigella improves the manageability of diagnostics and the application of disease control measures, it would be a compromise, as it still does not reflect the actual genetic relatedness in the classification and nomenclature. Specifically, isolates from the newly formed Shigella genus in this option have genetic relatedness above species thresholds with E. coli isolates that do not belong to the pathotype EIEC. Another disadvantage of the incorporation of EIEC into the genus Shigella is that the current description of Shigella needs to be amended, as it includes key phenotypical features for the genus Shigella, such as the absence of lysine decarboxylase activity, motility, fermentation of salicin, esculin hydrolysis and the combined features gas from D-glucose and indole production [16, 57]. This description cannot be applied to all EIEC isolates, as some of them are biochemically more similar to other E. coli and possess the features that are by definition negative for Shigella spp.

With both options for reclassification, the current diagnostic difficulties in identifying Shigella spp. and EIEC are solved, as there will no longer be a need to distinguish between them. In both options, either EIEC or the genus Shigella can be separated from other E. coli by the presence of the ipaH gene, which facilitates communication: the ipaH gene is a multicopy virulence gene located on the pINV and on the chromosome, present in all Shigella spp. and EIEC, but absent in other E. coli pathotypes and commensals. The correlation of the presence of the ipaH gene with invasiveness of the isolate was proven [58-60], although even avirulent Shigella spp. and EIEC could be detected when using the conserved core of the ipaH gene as target [61, 62]. Implementation of a conventional PCR method targeting one fragment provides a low-cost and simple diagnostic test that is feasible in all laboratories, even in low-resource areas.

Conclusions

This study provided more evidence that the current classification of Shigella spp. and E. coli does not reflect actual genetic relationships, by using modern standards in genomic classification techniques with the use of all essential typestrains. The genetic relatedness and current classification hinder the diagnostics, epidemiological surveillance, communication between medical microbiology laboratories, clinicians and public health

the separate species status and combined genus status of all members of the genus Escherichia it would be of value to sequence the whole genome of typestrains of E. albertii and E. hermannii for future taxonomic studies regarding Escherichia spp. and Shigella spp.

The current classification of Shigella spp. and E. coli introduces difficulties with identification, particularly for EIEC isolates, because it does not reflect phylogenetic relationships. For a useful taxonomic scheme, defined taxa should be identifiable with easily applicable identification methods [52, 53]. Additionally, the artificial framework of bacterial taxonomy is designed to provide manageability of applications in science [54]. Neither the criterion of identifiability nor that of manageability is met by the classification of Shigella spp. and EIEC in separate genera. On the contrary, the current classification complicates the development of diagnostics and the application of proper infectious disease control measures. Additionally, one of the arguments for maintaining the current classification is to facilitate communication, as Shigella spp. cause dysentery, while E. coli consists of various pathotypes and commensals [17]. However, the pathotype EIEC also causes dysentery, thus the separate classification does not facilitate communication about Shigella spp. in relation to EIEC. Moreover, genetic relationships should be reflected in nomenclature, or taxa should be reconstructed [53, 55]. This is applicable to Shigella spp. and E. coli, because all Shigella species have a separate species status and altogether a separate genus status exists for Shigella spp. and E. coli, while phylogenetic relations indicate they all should be classified and named as one species.

This study, which complied with modern taxonomic standards, added to earlier proof from other studies that showed evidence for the reclassification of Shigella spp. Two options for reclassification with their advantages and disadvantages are explored below. The first option is to reclassify Shigella spp. as E. coli, pathotype EIEC, as other researchers already have suggested [9, 11, 56]. In this option, it was suggested that all serotypes of the newly formed pathogroup EIEC should be renamed using their common O antigen names, for instance, EIEC O13 for current S. flexneri isolates [9]. Another option is to use serovar designations as a reference to the historical classification, according to rule 5c and appendix 10 of the ICNP, for instance EIEC serovar Flexneri [32]. The incorporation of Shigella into the species E. coli is the preferred option, as it reflects their genetic and evolutionary relationship, because Shigella spp. and EIEC are pathotypes of E. coli, just as STEC, ETEC and other described pathotypes. However, E. coli comprises different pathotypes with different infection mechanisms and outcomes, as well as commensal isolates, whilst Shigella spp. are only pathogenic and cause an invasive illness with disruption of the colonic epithelial cells. Therefore, there is a risk that the proposition for reclassification of Shigella species as E. coli, pathotype EIEC, will be rejected based on rule 56a of the ICNP [32]. This rule describes that perilous names should be rejected; these are names that cause confusion that can lead to health deterioration or death.


References

1. Hale, T.L., Genetic basis of virulence in Shigella species. Microbiol Rev, 1991. 55(2): p. 206-24.

2. Nataro, J.P.B., C.A.; Fields, P.I.; Kaper J.B.; Strockbine N.A., Escherichia, Shigella and Salmonella, in Manual of Clinical Microbiology, J. Versalovic, Editor. 2011, ASM Press: Washington D.C. p. 603-626.

3. Luria, S.E. and J.W. Burrous, Hybridization between Escherichia coli and Shigella. J Bacteriol, 1957. 74(4): p. 461-76.

4. Brenner, D.J., Polynucleotide sequence relatedness among Shigella species. Int J Syst Bacteriol, 1973. 23: p. 1-7.

5. Brenner, D.J., et al., Confirmation of aerogenic strains of Shigella boydii 13 and further study of Shigella serotypes by DNA relatedness. J Clin Microbiol, 1982. 16(3): p. 432-6.

6. Pupo, G.M., R. Lan, and P.R. Reeves, Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc Natl Acad Sci U S A, 2000. 97(19): p. 10567-72.

7. Lan, R., et al., Molecular evolutionary relationships of enteroinvasive Escherichia coli and Shigella spp. Infect Immun, 2004. 72(9): p. 5080-8.

8. Jin, Q., et al., Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res, 2002. 30(20): p. 4432-41.

9. Pettengill, E.A., J.B. Pettengill, and R. Binet, Phylogenetic analyses of Shigella and enteroinvasive Escherichia coli for the identification of molecular epidemiological markers: whole-genome comparative analysis does not support distinct genera designation. Front Microbiol, 2015. 6: p. 1573.

10. Alnajar, S. and R.S. Gupta, Phylogenomics and comparative genomic studies delineate six main clades within the family Enterobacteriaceae and support the reclassification of several polyphyletic members of the family. Infect Genet Evol, 2017. 54: p. 108-127.

11. Lan, R. and P.R. Reeves, Escherichia coli in disguise: molecular origins of Shigella. Microbes Infect, 2002. 4(11): p. 1125-32.

12. DuPont, H.L., et al., Pathogenesis of Escherichia coli diarrhea. N Engl J Med, 1971. 285(1): p. 1-9.

13. Platts-Mills, J.A., et al., Pathogen-specific burdens of community diarrhoea in developing countries: a multisite birth cohort study (MAL-ED). Lancet Glob Health, 2015. 3(9): p. e564-75.

14. Escher, M., et al., A severe foodborne outbreak of diarrhoea linked to a canteen in Italy caused by enteroinvasive Escherichia coli, an uncommon agent. Epidemiol Infect, 2014. 142(12): p. 2559-66.

15. Herzig, C.T.A., et al., Notes from the Field: Enteroinvasive Escherichia coli Outbreak Associated with a Potluck Party - North Carolina, June-July 2018. MMWR Morb Mortal Wkly Rep, 2019. 68(7): p. 183-184.

16. Strockbine, N.A., Maurelli, A.T., Genus XXXV. Shigella, in Bergey’s Manual of Systematic Bacteriology. 2005, Springer Science and Business Media, Inc.: New York, USA. p. 811-823.

17. Brenner, D.J., Family I. Enterobacteriaceae Rahn 1937, Nom. fam. cons. Opin. 15, Jud. Com. 1958, 73; Ewing, Farmer, and Brenner 1980, 674; Judicial Commission 1981, 104, in Bergey’s Manual of Systematic Bacteriology, N.R. Krieg, Editor. 1984. p. 408-420.

18. Van Lint, P., et al., A screening algorithm for diagnosing bacterial gastroenteritis by real-time PCR in combination with guided culture. Diagn Microbiol Infect Dis, 2016. 85(2): p. 255-9.

19. Liu, J., et al., Use of quantitative molecular diagnostic methods to identify causes of diarrhoea in children: a reanalysis of the GEMS case-control study. Lancet, 2016. 388(10051): p. 1291-301.

20. Lede IO, K.-D.M., van den Kerkhof JHTC, Notermans DW, Gebrek aan uniformiteit bij meldingen van Shigatoxineproducerende Escherichia coli en Shigella aan en door GGDen. Infect. Bull., 2012. 23: p. 116-118.

21. EU. Commission Implementing Decision (EU) 2018/945 of 22 June 2018 on the communicable diseases and related special health issues to be covered by epidemiological surveillance as well as relevant case definitions. Official Journal of the European Union, 6 July 2018. 61, L170.

22. CDC. Shigellosis (Shigella spp.) 2017 Case Definition. 2017 [cited 21 November 2018]; Available from: https://wwwn.cdc.gov/nndss/conditions/shigellosis/case-definition/2017/.

23. Colwell, R.R., Polyphasic taxonomy of the genus Vibrio: numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. J Bacteriol, 1970. 104(1): p. 410-33.

24. Tindall, B.J., et al., Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst Evol Microbiol, 2010. 60(Pt 1): p. 249-66.

25. Kim, M., et al., Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol, 2014. 64(Pt 2): p. 346-51.

26. Meier-Kolthoff, J.P., et al., Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics, 2013. 14: p. 60.

27. Thompson, C.C., et al., Microbial genomic taxonomy. BMC Genomics, 2013. 14: p. 913.

28. Chun, J. and F.A. Rainey, Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. Int J Syst Evol Microbiol, 2014. 64(Pt 2): p. 316-24.

29. Goris, J., et al., DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol, 2007. 57(Pt 1): p. 81-91.

services and infectious disease control measures of Shigella spp. and the pathotype EIEC, which have similar clinical implications and impact on public health. The first option for reclassification is a rejection of the genus name Shigella, followed by the incorporation of Shigella spp. into E. coli as the pathotype EIEC. The second option is that EIEC isolates are incorporated into the genus Shigella. Although the first option is preferred because it reflects the actual relatedness of E. coli and Shigella spp., both options are easily manageable in routine diagnostics, as the Shigella/EIEC pathovar is easily distinguished from other E. coli by the presence of the ipaH gene. For a formal Request for an Opinion to the Judicial Commission of the International Committee on Systematics of Prokaryotes, more isolates need to be analyzed to support one option or the other.

Acknowledgements

We would like to thank our colleagues from the Molecular Unit, from the department of Medical Microbiology and Infection Prevention, UMCG Groningen for sequencing our isolates. For help with the analyses we would like to thank Sigrid Rosema from the department of Medical Microbiology and Infection Prevention, UMCG Groningen, Evert van Zanten from the department of Medical Microbiology, Certe, Groningen and Tom van Wijk from the Centre of Infectious Disease Control, Institute for Public Health and the Environment in Bilthoven.


59. Talukder, K.A., et al., The emerging strains of Shigella dysenteriae type 2 in Bangladesh are clonal. Epidemiol Infect, 2006. 134(6): p. 1249-56.

60. Oberhelman, R.A., et al., Evaluation of alkaline phosphatase-labelled ipaH probe for diagnosis of Shigella infections. J Clin Microbiol, 1993. 31(8): p. 2101-4.

61. Venkatesan, M.M., J.M. Buysse, and A.B. Hartman, Sequence variation in two ipaH genes of Shigella flexneri 5 and homology to the LRG-like family of proteins. Mol Microbiol, 1991. 5(10): p. 2435-45.

62. Venkatesan, M.M., J.M. Buysse, and D.J. Kopecko, Use of Shigella flexneri ipaC and ipaH gene sequences for the general identification of Shigella spp. and enteroinvasive Escherichia coli. J Clin Microbiol, 1989. 27(12): p. 2687-91.

30. Scheutz, F. and N.A. Strockbine, Genus I. Escherichia Castellani and Chalmers, in Bergey’s Manual of Systematic Bacteriology, G.M. Garrity, Editor. 2005, Springer Science: New York. p. 616.

31. Sneath, P.H.A., Bacterial Nomenclature, in Bergey’s Manual of Systematic Bacteriology, D.J. Brenner, N.R. Krieg, and J.T. Staley, Editors. 2005, Springer Science and Business Media, Inc.: New York.

32. Parker, C.T., B.J. Tindall, and G.M. Garrity, International Code of Nomenclature of Prokaryotes. Int J Syst Evol Microbiol, 2015. 69(1A): p. S1-S111.

33. Christensen, H., S. Nordentoft, and J.E. Olsen, Phylogenetic relationships of Salmonella based on rRNA sequences. Int J Syst Bacteriol, 1998. 48 Pt 2: p. 605-10.

34. Cilia, V., B. Lafay, and R. Christen, Sequence heterogeneities among 16S ribosomal RNA sequences, and their effect on phylogenetic analyses at the species level. Mol Biol Evol, 1996. 13(3): p. 451-61.

35. Pavlovic, M., et al., Development of a duplex real-time PCR for differentiation between E. coli and Shigella spp. J Appl Microbiol, 2011. 110(5): p. 1245-51.

36. Lobersli, I., et al., Molecular Differentiation of Shigella Spp. from Enteroinvasive E. Coli. Eur J Microbiol Immunol (Bp), 2016. 6(3): p. 197-205.

37. Dhakal, R., et al., Novel multiplex PCR assay for identification and subtyping of enteroinvasive Escherichia coli and differentiation from Shigella based on target genes selected by comparative genomics. J Med Microbiol, 2018. 67(9): p. 1257-1264.

38. Ojha, S.C., et al., A pentaplex PCR assay for the detection and differentiation of Shigella species. Biomed Res Int, 2013. 2013: p. 412370.

39. Kim, H.J., et al., Multiplex Polymerase Chain Reaction for Identification of Shigellae and Four Shigella Species Using Novel Genetic Markers Screened by Comparative Genomics. Foodborne Pathog Dis, 2017. 14(7): p. 400-406.

40. Chattaway, M.A., et al., Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences. J Clin Microbiol, 2017. 55(2): p. 616-623.

41. Sahl, J.W., et al., Defining the phylogenomics of Shigella species: a pathway to diagnostics. J Clin Microbiol, 2015. 53(3): p. 951-60.

42. Hazen, T.H., et al., Investigating the relatedness of enteroinvasive Escherichia coli to other E. coli and Shigella isolates by using comparative genomics. Infect Immun, 2016. 84(8): p. 2362-71.

43. van den Beld, M.J.C., et al., Evaluation of a culture dependent algorithm and a molecular algorithm for identification of Shigella spp., Escherichia coli, and enteroinvasive E. coli (EIEC). J Clin Microbiol, 2018. 56: p. e00510-18.

44. Overbeek, R., et al., The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res, 2014. 42(Database issue): p. D206-14.

45. Chun, J., et al., Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol, 2018. 68(1): p. 461-466.

46. Rodriguez-R, L.M. and K.T. Konstantinidis, The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Preprints, 2016. 4:e1900v1.

47. Auch, A.F., et al., Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Stand Genomic Sci, 2010. 2(1): p. 117-34.

48. Sabat, A.J., et al., Targeted next-generation sequencing of the 16S-23S rRNA region for culture-independent bacterial identification - increased discrimination of closely related species. Sci Rep, 2017. 7(1): p. 3434.

49. Brenner, D.J., Characterization and clinical identification of Enterobacteriaceae by DNA hybridization. Prog Clin Pathol, 1978. 7: p. 71-117.

50. Liu, S., et al., Escherichia marmotae sp. nov., isolated from faeces of Marmota himalayana. Int J Syst Evol Microbiol, 2015. 65(7): p. 2130-4.

51. Peker, N., et al., A Comparison of Three Different Bioinformatics Analyses of the 16S-23S rRNA Encoding Region for Bacterial Identification. Front Microbiol, 2019. 10: p. 620.

52. Vandamme, P., et al., Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiol Rev, 1996. 60(2): p. 407-38.

53. Wayne, L.G., et al., Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol, 1987. 37: p. 463-464.

54. Brenner, D.J., N.R. Krieg, and J.T. Staley, Classification of Prokaryotic Organisms and the Concept of Bacterial Speciation, in Bergey’s Manual of Systematic Bacteriology, D.J. Brenner, N.R. Krieg, and J.T. Staley, Editors. 2005, Springer Science and business media, Inc.: New York.

55. Murray, R., et al., Report of the ad hoc committee on approaches to taxonomy within the Proteobacteria. Int J Syst Bacteriol, 1990. 40: p. 213-215.

56. Chaudhuri, R.R. and I.R. Henderson, The evolution of the Escherichia coli phylogeny. Infect Genet Evol, 2012. 12(2): p. 214-26.

57. Bopp, C.A.B., F.W.; Fields, P.I.; Wells, J.G.; Strockbine, N.A., Escherichia, Shigella and Salmonella, in Manual of Clinical Microbiology, P.R. Murray, Editor. 2003, ASM Press: Washington D.C. p. 654-671.

58. Talukder, K.A., et al., Temporal shifts in the dominance of serotypes of Shigella dysenteriae from 1999 to 2002 in Dhaka, Bangladesh. J Clin Microbiol, 2003. 41(11): p. 5053-8.


Supplementary Materials

Supplementary File 1 References used in our local database for k-mer identification a

EIEC | Accession | Shigella_dysenteriae | Accession
Escherichia coli_4608_58 | NZ_JTCO00000000 | Shigella_dysenteriae_1012 | NZ_AAMJ00000000.2
Escherichia coli_53638 | NZ_AAKB00000000.2 | Shigella_dysenteriae_155_74 | NZ_AFFZ00000000.1
Escherichia coli_CFSAN029787 | CP011416.1 | Shigella_dysenteriae_225_75 | NZ_AKNG00000000.1
Escherichia coli_LT68 | ADUP00000000.1 | Shigella_dysenteriae_CDC_74_1112 | NZ_AERM00000000.1
Escherichia coli_M4163 | JTCN00000000.1 | Shigella_dysenteriae_Sd197 | NC_007606.1

Escherichia_coli | Accession | Shigella_flexneri | Accession
Escherichia_coli_042 | NC_017626.1 | Shigella_flexneri_2002017 | NC_017328.1
Escherichia_coli_clone_Di2 | CP002211.1 | Shigella_flexneri_2a str. 2457T | ADUV00000000
Escherichia_coli_IAI1 | NC_011741.1 | Shigella_flexneri_2a_301 | AE005674.2
Escherichia_coli_K12 | NC_000913.3 | Shigella_flexneri_5_8401 | CP000266.1
Escherichia_coli_KTE138 | ANYB00000000 | Shigella_flexneri_6603_63 | AGRC00000000
Escherichia_coli_KTE143 | ANVB00000000 | Shigella_flexneri_CDC_796_83 | AERO00000000
Escherichia_coli_KTE237 | ANUA00000000 | Shigella_flexneri_K_1770 | AKMX00000000
Escherichia_coli_O157H7Sakai | NC_002695.1 | Shigella_flexneri_K_218 | AFGV00000000
Escherichia_coli_SE15 | AP009378.1 | Shigella_flexneri_K_315 | AKMY00000000
Escherichia_coli_UMN026 | NC_011751.1 | Shigella_flexneri_VA-6 | AFGW00000000

Shigella_boydii | Accession | Shigella_sonnei | Accession
Shigella_boydii_5216_82 | AFGE00000000 | Shigella_sonnei_4822_66 | AKNE00000000
Shigella_boydii_965_58 | NZ_AKNA00000000.1 | Shigella_sonnei_53G | GCA_000283715.1
Shigella_boydii_CDC_3083_94 | NC_010658.1 | Shigella_sonnei str. Moseley | AGRD00000000.1
Shigella_boydii_Sb227 | NC_007613.1 | Shigella_sonnei_Ss046 | NC_007384.1


Supplementary File 2A Detailed results of identification based on the 16S-23S region, using the BLAST database “Nucleotide Collection”

Sample | 16-23S DNA Star BLAST GenBank (4kb) 1st match | NGS Score | 16-23S DNA Star BLAST GenBank (4kb) 2nd match, with NGS Score | ID 16S-23S | R/C a | ID biochemical & serological
DSM 9027 | Escherichia coli strain NCTC9066 | 4383/4395 (99.73%) | No other species b, na | E. coli | R | EIEC, O112ac
CCUG 11335 | Escherichia coli strain RM10042 | 4389/4395 (99.86%) | Shigella dysenteriae ATCC 9752, 4383/4395 (99.73%); Shigella boydii ATCC BAA-1247, 4381/4395 (99.68%) | E. coli/S. dysenteriae/S. boydii | R | EIEC, O28
CCUG 38092 | Escherichia coli strain 2016C-3936C1 | 4385/4394 (99.7%) | No other species b, na | E. coli | R | EIEC, O143
CCUG 38080 | Escherichia coli isolate WI2 | 4390/4396 (99.9%) | No other species b, na | E. coli | R | EIEC
BD11-00138 | Escherichia coli strain 89-3156 | 4382/4396 (99.7%) | No other species b, na | E. coli | C | EIEC, O102
BD12-00018 | Escherichia coli strain RM10042 | 4386/4395 (99.84%) | Shigella dysenteriae ATCC 9752, 4382/4395 (99.70%); Shigella boydii ATCC BAA-1247, 4380/4395 (99.66%); Shigella sonnei Ss046, 4380/4396 (99.64%) | E. coli/S. dysenteriae/S. boydii/S. sonnei | C | EIEC, O29
BD12-00020 | Escherichia coli PCN061 | 4389/4396 (99.8%) | No other species b, na | E. coli | C | EIEC, O124
BD13-00032 | Shigella dysenteriae strain ATCC 12037 | 4394/4395 (99.98%) | Escherichia coli strain 13E0767, 4382/4395 (99.7%); Shigella flexneri strain 64-5500, 4389/4396 (99.84%) | S. dysenteriae/E. coli/S. flexneri | C | EIEC, O159
BD13-00037 | Shigella boydii strain ATCC BAA-1247 | 4393/4395 (99.95%) | Shigella dysenteriae strain ATCC 12037, 4393/4395 (99.95%); Shigella flexneri strain 64-5500, 4389/4395 (99.84%) | S. boydii/S. dysenteriae/S. flexneri | C | EIEC, O-untypeable
BD13-00229 | Escherichia coli strain 90-9272 | 4391/4398 (99.84%) | No other species b, na | E. coli | C | EIEC, O124
BD16-00087 | Escherichia coli strain 90-9272 | 4390/4398 (99.82%) | No other species b, na | E. coli | C | EIEC, O-untypeable
EW227 | Escherichia coli strain 90-9272 | 4389/4398 (99.80%) | No other species b, na | E. coli | R | EIEC
1624-56 | Escherichia coli RM10466 | 4386/4396 (99.77%) | No other species b, na | E. coli | R | EIEC
1184-68 | Escherichia coli strain 3426 | 4392/4397 (99.89%) | No other species b, na | E. coli | R | EIEC
145/46 | Escherichia coli strain RM10042 | 4389/4395 (99.86%) | Shigella dysenteriae strain ATCC 9752, 4383/4395 (99.73%); Shigella boydii ATCC BAA-1247, 4381/4395 (99.68%); Shigella sonnei Ss046, 4381/4396 (99.66%) | E. coli/S. dysenteriae/S. boydii/S. sonnei | R | EIEC
L119B-10 | Escherichia coli strain RM10042 | 4388/4395 (99.84%) | Shigella dysenteriae ATCC 9752, 4382/4395 (99.70%); Shigella boydii ATCC BAA-1247, 4381/4395 (99.66%); Shigella sonnei Ss046, 4380/4396 (99.64%) | E. coli/S. boydii/S. sonnei | R | EIEC
CIP 57.28 | Shigella dysenteriae strain 80-547 | 4386/4386 (100%) | Escherichia coli strain Combat 11-9, 4363/4386 (99%) | E. coli | R | S. dysenteriae
CIP 82.48 | Shigella flexneri AUSMDU00008332 | 4395/4396 (99.98%) | Shigella boydii strain 54-1621, 4393/4396 (99.93%); Escherichia coli strain 89-3156, 4380/4396 (99.64%) | S. flexneri/S. boydii | R | S. flexneri
CIP 82.49 | Shigella sonnei Ss046 | 4391/4396 (99.89%) | Escherichia coli CFSAN004176, 4388/4396 (99.82%) | E. coli | R | S. sonnei
CIP 82.50 | Shigella boydii ATCC8700 | 4396/4396 (100%) | Shigella flexneri strain 64-5500, 4394/4396 (99.95%); Shigella dysenteriae ATCC 12037, 4386/4396 (99.77%) | S. boydii/S. flexneri | R | S. boydii
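The scores in this table have the form "matches/length (percentage)". As a small aid for reading them, the Python sketch below parses one such entry and recomputes the percentage. It assumes that the first number is the count of matching positions and the second the aligned length, which the table itself does not state explicitly; the function names are hypothetical.

```python
import re


def parse_ngs_score(score):
    """Split a score such as '4383/4395 (99.73%)' into (matches, length, reported %)."""
    m = re.fullmatch(r"\s*(\d+)/(\d+)\s*\((\d+(?:\.\d+)?)%\)\s*", score)
    if m is None:
        raise ValueError(f"unrecognised score: {score!r}")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def percent_identity(matches, length):
    """Percentage of matching positions (assumed meaning of the score)."""
    return 100.0 * matches / length


# First-match score of DSM 9027, copied from the table.
matches, length, reported = parse_ngs_score("4383/4395 (99.73%)")
print(round(percent_identity(matches, length), 2), reported)
```

Recomputing the percentage in this way can serve as a simple consistency check when such scores are transcribed by hand.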


Supplementary File 2B Detailed results of identification based on the 16S-23S region, using the Refseq_representative genome database

Sample | 16-23S DNA Star BLAST GenBank (4kb) 1st match | NGS Score | 16-23S DNA Star BLAST GenBank (4kb) 2nd match | NGS Score | ID 16S-23S | R/C* | ID biochemical & serological
DSM 9027 | Escherichia coli O157:H7 str. Sakai | 4373/4398 (99.43%) | Shigella flexneri 2a str. 301 | 4365/4398 (99.25%) | E. coli/S. flexneri | R | EIEC, O112ac
CCUG 11335 | Escherichia coli str. K-12 substr. MG1655 | 4377/4396 (99.57%) | Shigella flexneri 2a str. 301 | 4369/4396 (99.39%) | E. coli/S. flexneri | R | EIEC, O28
CCUG 38092 | Escherichia coli UMN026 | 4376/4396 (99.55%) | Shigella flexneri 2a str. 301 | 4360/4398 (99.14%) | E. coli | R | EIEC, O143
CCUG 38080 | Escherichia coli str. K-12 substr. MG1655 | 4387/4396 (99.80%) | Shigella flexneri 2a str. 301 | 4364/4396 (99.27%) | E. coli | R | EIEC
BD11-00138 | Escherichia coli str. K-12 substr. MG1655 | 4373/4396 (99.48%) | Shigella flexneri 2a str. 301 | 4357/4396 (99.11%) | E. coli | C | EIEC, O102
BD12-00018 | Escherichia coli str. K-12 substr. MG1655 | 4376/4396 (99.55%) | Shigella flexneri 2a str. 301 | 4368/4396 (99.36%) | E. coli/S. flexneri | C | EIEC, O29
BD12-00020 | Escherichia coli str. K-12 substr. MG1655 | 4378/4397 (99.57%) | Shigella flexneri 2a str. 301 | 4363/4397 (99.23%) | E. coli | C | EIEC, O124
BD13-00032 | Escherichia coli str. K-12 substr. MG1655 | 4371/4396 (99.43%) | Shigella flexneri 2a str. 301 | 4364/4396 (99.27%) | E. coli/S. flexneri | C | EIEC, O159
BD13-00037 | Escherichia coli str. K-12 substr. MG1655 | 4371/4396 (99.43%) | Shigella flexneri 2a str. 301 | 4364/4396 (99.27%) | E. coli/S. flexneri | C | EIEC, O-untypeable
BD13-00229 | Escherichia coli str. K-12 substr. MG1655 | 4386/4396 (99.77%) | Shigella flexneri 2a str. 301 | 4365/4396 (99.29%) | E. coli | C | EIEC, O124
BD16-00087 | Escherichia coli str. K-12 substr. MG1655 | 4385/4396 (99.75%) | Shigella flexneri 2a str. 301 | 4366/4396 (99.32%) | E. coli | C | EIEC, O-untypeable
EW227 | Escherichia coli str. K-12 substr. MG1655 | 4384/4396 (99.73%) | Shigella flexneri 2a str. 301 | 4363/4396 (99.25%) | E. coli | R | EIEC
1624-56 | Escherichia coli str. K-12 substr. MG1655 | 4380/4396 (99.64%) | Shigella flexneri 2a str. 301 | 4363/4396 (99.25%) | E. coli | R | EIEC
1184-68 | Escherichia coli str. K-12 substr. MG1655 | 4385/4396 (99.75%) | Shigella flexneri 2a str. 301 | 4363/4396 (99.27%) | E. coli | R | EIEC
145/46 | Escherichia coli str. K-12 substr. MG1655 | 4377/4396 (99.57%) | Shigella flexneri 2a str. 301 | 4369/4396 (99.39%) | E. coli/S. flexneri | R | EIEC
L119B-10 | Escherichia coli str. K-12 substr. MG1655 | 4376/4396 (99.55%) | Shigella flexneri 2a str. 301 | 4368/4396 (99.36%) | E. coli/S. flexneri | R | EIEC
CIP 57.28 | Shigella dysenteriae Sd197 | 4386/4386 (100%) | Escherichia coli UMN026 | 4349/4398 (98.89%) | S. dysenteriae | R | S. dysenteriae
CIP 82.48 | Shigella flexneri | 4394/4396 (99.95%) | Escherichia coli str. K-12 substr. MG1655 | 4377/4396 (99.57%) | S. flexneri | R | S. flexneri
CIP 82.49 | Escherichia coli str. K-12 substr. MG1655 | 4383/4396 (99.70%) | Shigella flexneri 2a str. 301 | 4376/4396 (99.55%) | E. coli/S. flexneri | R | S. sonnei
CIP 82.50 | Escherichia coli str. K-12 substr. MG1655 | 4371/4396 (99.43%) | Shigella flexneri 2a str. 301 | 4365/4396 (99.29%) | E. coli/S. flexneri | R | S. boydii
U5-41 | Escherichia strain NRG 857C | 4389/4396 (99.84%) | Shigella flexneri 2a str. 301 | 4360/4499 (99.11%) | E. coli | R | E. coli
