A Missense Mutation in SARS-CoV-2 Potentially Differentiates Between Asymptomatic and Symptomatic Cases


Alejandro Lopez-Rincon¹, Alberto Tonda², Lucero Mendoza-Maldonado³, Eric Claassen⁴, Johan Garssen¹ & Aletta D. Kraneveld¹

1 Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht.

2 INRAE, France.

3 Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Jalisco, Mexico

4 Athena Institute, Vrije Universiteit, De Boelelaan 1085, 1081 HV Amsterdam, the Netherlands.

Correspondence to Alejandro Lopez-Rincon (email: alejandro.lopezrn@hotmail.com)

(Submitted: 9 April 2020 – Published online: 9 April 2020)

DISCLAIMER

This paper was submitted to the Bulletin of the World Health Organization and was posted to the COVID-19 open site, according to the protocol for public health emergencies of international concern as described in Vasee Moorthy et al. (http://dx.doi.org/10.2471/BLT.20.251561).

The information herein is available for unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited as indicated by the Creative Commons Attribution 3.0 Intergovernmental Organizations licence (CC BY IGO 3.0).

RECOMMENDED CITATION

Lopez-Rincon A, Tonda A, Mendoza-Maldonado L, Claassen E, Garssen J & Kraneveld AD. A Missense Mutation in SARS-CoV-2 Differentiates Between Asymptomatic and Symptomatic Cases. [Preprint]. Bull World Health Organ. E-pub: 9 April 2020. doi: http://dx.doi.org/10.2471/BLT.20.258889


Abstract

By analyzing a convolutional neural network trained to separate asymptomatic from symptomatic COVID-19 samples from the GISAID repository, we identify several 21-bp sequences that can be used to predict patient status from the viral genome. By checking only 3 of the identified mutations, we show that the status of a patient can be correctly classified with 95.11% accuracy. These data could be used to identify a strain of SARS-CoV-2 with a higher likelihood of a positive outcome. In fact, the most significant mutation identified (11083G>T) is frequent in samples originating from the Diamond Princess cruise ship, where 46.5% of the cases positive for SARS-CoV-2 were asymptomatic and the mortality rate was notably low (1.3%).

Among samples from other locations worldwide, the same mutation appears in only 11.88% of symptomatic cases. Since asymptomatic patients are less likely to be tested, this low frequency is consistent with the possibility that the changes induced by the mutation positively affect the patients' status.

In December 2019, SARS-CoV-2, a novel human-infecting coronavirus, was identified in Wuhan, China [1]. As of 8 April 2020, there were 1,279,722 confirmed cases of SARS-CoV-2 infection across almost all countries, with 72,614 related deaths [2]. Given the size of the outbreak, a major problem is the shortage of the medical equipment needed to care for patients relative to the number of infected people. We therefore use a genomic analysis of SARS-CoV-2 viral sequences to try to develop a tool that predicts the number of cases that will require medical care, by differentiating between asymptomatic and symptomatic cases.

From the Global Initiative on Sharing All Influenza Data (GISAID) repository, we downloaded 3,498 complete viral sequences with host="homo sapiens", of which 3,091 were unique [3]. Of these 3,091 sequences, 358 have metadata on patient status: asymptomatic, symptomatic, or hospitalized. The submitting laboratories that we were able to reach confirmed that "hospitalized" meant the patient presented with COVID-19 pneumonia. From these 358 samples we then created a dataset of asymptomatic (n=55) and symptomatic (including hospitalized) patients (n=303).
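The filtering steps above (human host, exact-duplicate removal, status metadata, hospitalized merged into symptomatic) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(sequence, host, status)` record layout, the status strings and the function name `build_dataset` are hypothetical, since GISAID exports store this metadata in their own format.

```python
# Hypothetical record layout: (sequence, host, patient_status).
# Status strings are assumptions; real GISAID metadata fields differ.
SYMPTOMATIC_STATUSES = {"symptomatic", "hospitalized"}


def build_dataset(records):
    """Return (sequence, label) pairs: 0 = asymptomatic, 1 = symptomatic.

    Hospitalized cases are merged into the symptomatic class, as in the text.
    """
    seen = set()
    dataset = []
    for sequence, host, status in records:
        if host.lower() != "homo sapiens":
            continue
        if sequence in seen:  # keep only the first copy of identical genomes
            continue
        seen.add(sequence)
        status = (status or "").strip().lower()
        if status == "asymptomatic":
            dataset.append((sequence, 0))
        elif status in SYMPTOMATIC_STATUSES:
            dataset.append((sequence, 1))
        # unique sequences without usable status metadata are left out
    return dataset


records = [
    ("ACGT", "Homo sapiens", "Asymptomatic"),
    ("ACGT", "Homo sapiens", "Symptomatic"),   # duplicate genome, dropped
    ("ACGA", "Homo sapiens", "Hospitalized"),  # counted as symptomatic
    ("ACGC", "Felis catus", "symptomatic"),    # non-human host, dropped
    ("ACGG", "Homo sapiens", None),            # no status metadata, dropped
]
print(build_dataset(records))
```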

To differentiate the patient's status from SARS-CoV-2 samples, we used a Convolutional Neural Network (CNN) composed of one convolutional layer with 12 filters (each with a window size of 21), max pooling (with a pool size and stride of 148), a fully connected layer (196 rectified linear units with dropout probability 0.5), and a final softmax layer with 2 units. In simple terms, the convolutional layer analyzes subsequences of 21 base pairs (bp) that can appear at different points of the viral genome. With the CNN, we uncovered 18,258 features (21-bp sequences) that allow the network to differentiate between symptomatic and asymptomatic patients. From these features, we ran a feature reduction algorithm [4] to find the 21-bp sequences that gave the maximum mean accuracy across 8 different classifiers from the scikit-learn toolbox [5] (Gradient Boosting, Passive Aggressive, Logistic Regression, Support Vector, Random Forest, Stochastic Gradient Descent, Ridge and Bagging), ultimately obtaining 53 meaningful 21-bp sequences (Fig. 1) with a global accuracy of 97.05%.
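To make the layer sizes concrete, the sketch below runs one forward pass of a network with the architecture described above, written in plain NumPy with random, untrained weights. Several details are assumptions not stated in the text: one-hot encoding of the genome (A, C, G, T, with ambiguous bases as all-zero rows), a "valid" convolution without padding, a ReLU after the convolution, and flattening of the pooled output before the dense layer. Dropout is omitted because it only acts during training. The function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a (length, 4) array; non-ACGT symbols (e.g. N) become zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(sequence), 4))
    for pos, base in enumerate(sequence.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def init_params(seq_len, n_filters=12, window=21, pool=148, hidden=196, seed=0):
    """Random (untrained) weights with the layer sizes described in the text."""
    rng = np.random.default_rng(seed)
    n_pooled = (seq_len - window + 1) // pool
    return {
        "conv_w": rng.normal(0.0, 0.1, (n_filters, window, 4)),
        "conv_b": np.zeros(n_filters),
        "dense_w": rng.normal(0.0, 0.1, (n_pooled * n_filters, hidden)),
        "dense_b": np.zeros(hidden),
        "out_w": rng.normal(0.0, 0.1, (hidden, 2)),
        "out_b": np.zeros(2),
    }


def forward(x, params, window=21, pool=148):
    """Conv1D(12 filters x 21 bp) -> MaxPool(148, stride 148) -> Dense(196, ReLU) -> Softmax(2)."""
    # Every 21-bp window of the genome: shape (L - 20, 4, 21).
    windows = np.lib.stride_tricks.sliding_window_view(x, window, axis=0)
    conv = np.einsum("pcw,fwc->pf", windows, params["conv_w"]) + params["conv_b"]
    conv = np.maximum(conv, 0.0)  # assumed ReLU after the convolution
    n_pooled = conv.shape[0] // pool  # non-overlapping pools; the remainder is dropped
    pooled = conv[: n_pooled * pool].reshape(n_pooled, pool, -1).max(axis=1)
    hidden = np.maximum(pooled.reshape(-1) @ params["dense_w"] + params["dense_b"], 0.0)
    logits = hidden @ params["out_w"] + params["out_b"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(1)
genome = "".join(rng.choice(list(BASES), size=1000))  # short random stand-in for a genome
probs = forward(one_hot(genome), init_params(len(genome)))
print(probs)  # two class probabilities summing to 1
```

Each convolutional filter scores every 21-bp window of the genome, which is why individual filters can later be read back as 21-bp sequences.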


Fig. 1. Output of the feature reduction algorithm used to find the minimal number of 21-bp sequences necessary to differentiate between symptomatic and asymptomatic patients with high accuracy. The accuracy peaks at 53 sequences.
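A curve like the one in Fig. 1 can be approximated by ranking the features and measuring the mean cross-validated accuracy of several classifiers on the top-k features for increasing k, then keeping the k at the peak. The sketch below is a simplified stand-in, not the ensemble algorithm of [4]: it ranks binary presence/absence features by the difference in class frequency, uses only three of the eight scikit-learn classifiers for brevity, and runs on synthetic data. The function names `rank_features` and `accuracy_curve` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score


def rank_features(X, y):
    """Rank binary features by the absolute difference in frequency between the two classes."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(-diff, kind="stable")


def accuracy_curve(X, y, ranking, max_features, cv=5):
    """Mean cross-validated accuracy over several classifiers for the top-k features, k = 1..max_features."""
    classifiers = [
        LogisticRegression(max_iter=1000),
        RidgeClassifier(),
        RandomForestClassifier(n_estimators=50, random_state=0),
    ]
    curve = []
    for k in range(1, max_features + 1):
        X_k = X[:, ranking[:k]]
        scores = [cross_val_score(clf, X_k, y, cv=cv, scoring="accuracy").mean()
                  for clf in classifiers]
        curve.append(float(np.mean(scores)))
    best_k = int(np.argmax(curve)) + 1  # number of features at the accuracy peak
    return best_k, curve


# Synthetic example: 120 samples, 20 binary features, label driven by feature 3 with 10% noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 20))
y = X[:, 3].copy()
flip = rng.random(120) < 0.1
y[flip] = 1 - y[flip]

ranking = rank_features(X, y)
best_k, curve = accuracy_curve(X, y, ranking, max_features=5)
print(best_k, [round(a, 3) for a in curve])
```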

For each of the 53 features, we counted how often the corresponding sequence appears among the samples of each patient type (see results in Table 1).
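The counting step, together with the more-than-10% difference criterion from the Table 1 caption, can be sketched as below. This is a minimal illustration under stated assumptions: a sequence "appears" in a sample when it is an exact substring of that genome, frequencies are percentages within each class, and the 10% threshold is read as percentage points. The function names are hypothetical.

```python
def class_frequencies(sequences, labels, kmers):
    """Percentage of samples in each class (0 = asymptomatic, 1 = symptomatic) containing each k-mer."""
    totals = {0: labels.count(0), 1: labels.count(1)}
    table = {}
    for kmer in kmers:
        hits = {0: 0, 1: 0}
        for seq, label in zip(sequences, labels):
            if kmer in seq:  # exact substring match
                hits[label] += 1
        table[kmer] = {c: 100.0 * hits[c] / totals[c] for c in (0, 1)}
    return table


def discriminative(table, min_difference=10.0):
    """Keep k-mers whose class frequencies differ by more than `min_difference` percentage points."""
    return [k for k, f in table.items() if abs(f[0] - f[1]) > min_difference]


# Toy data: two asymptomatic and two symptomatic "genomes".
sequences = ["AAAC", "AAAG", "CCCC", "AAAC"]
labels = [0, 0, 1, 1]
table = class_frequencies(sequences, labels, ["AAA", "CCC", "AC"])
print(table)
print(discriminative(table))
```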

Table 1. Frequency of appearance among the samples for selected sequences. All chosen sequences show a difference in frequency of more than 10% between symptomatic and asymptomatic patients.

From the analysis summarized in Table 1, we found 3 sequences that appear mostly in asymptomatic cases (94.55%) and 2 that are mostly found in symptomatic cases (85.15%). Further analysis of these 5 sequences, each 21 bp long, showed that they are all part of the same 28-bp viral sequence, and that the difference between symptomatic and asymptomatic cases can be reduced to a single mutation (Fig. 2).
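The reduction to a single mutation can be checked directly on the two 28-bp sequences given in the next paragraph. The sketch below, with hypothetical helper names, compares them position by position and confirms that they differ at exactly one base (0-based index 8, G in the symptomatic sequence and T in the asymptomatic one). It also shows that every 21-bp window of the 28-bp sequence spans that position, so every such window differs between the two variants.

```python
ASYMPTOMATIC_28 = "TTTTTTTTTTATGAAAATGCCTTTTTAC"
SYMPTOMATIC_28 = "TTTTTTTTGTATGAAAATGCCTTTTTAC"


def differences(a, b):
    """Return (index, base_in_a, base_in_b) for every position where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def windows(sequence, k=21):
    """All k-bp substrings of `sequence`, ordered by start position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(differences(SYMPTOMATIC_28, ASYMPTOMATIC_28))  # [(8, 'G', 'T')]
print(len(windows(ASYMPTOMATIC_28)))                  # 8
```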


Fig. 2. A single mutation between the discovered sequences separates asymptomatic from symptomatic cases in 311 of the 358 selected samples, for a resulting accuracy of 86.87%.
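A classifier based on this single mutation reduces to a presence test. The sketch below assumes a hypothetical rule (predict asymptomatic when the asymptomatic 28-bp sequence is present, symptomatic otherwise) and shows that the accuracy quoted in Fig. 2 is simply 311 correct calls out of 358 samples.

```python
ASYMPTOMATIC_28 = "TTTTTTTTTTATGAAAATGCCTTTTTAC"


def predict(genome):
    """Hypothetical rule: 0 (asymptomatic) if the asymptomatic 28-mer is present, else 1 (symptomatic)."""
    return 0 if ASYMPTOMATIC_28 in genome else 1


def accuracy(genomes, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(g) == y for g, y in zip(genomes, labels))
    return correct / len(labels)


# Fig. 2 reports 311 correctly classified samples out of 358.
print(f"{311 / 358:.2%}")  # 86.87%
```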

Further analysis of the mutation showed that 94.54% of the asymptomatic samples contain the sequence TTTTTTTTTTATGAAAATGCCTTTTTAC, whereas 85.47% of the symptomatic patients present the sequence TTTTTTTTGTATGAAAATGCCTTTTTAC (Table 2). This sequence lies within the ORF1ab gene and contains a transversion that discriminates between symptomatic and asymptomatic patients (11083G>T). This genetic variation results in the substitution of leucine by phenylalanine in non-structural protein 6 (nsp6) (p.L3606F).
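Under the standard genetic code, a single G>T change can turn a leucine codon into a phenylalanine codon in only one way. The short enumeration below, with a hypothetical function name and only the Leu and Phe entries of the code table, checks every G in every leucine codon. It finds that the change must be TTG to TTT (G at the third codon position, index 2), which is consistent with a G>T transversion producing p.L3606F.

```python
# Standard genetic code entries for leucine and phenylalanine only.
LEUCINE = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
PHENYLALANINE = {"TTT", "TTC"}


def leu_to_phe_by_g_to_t():
    """List (codon, position, mutant) where one G>T change turns a Leu codon into a Phe codon."""
    hits = []
    for codon in sorted(LEUCINE):
        for pos, base in enumerate(codon):
            if base == "G":
                mutant = codon[:pos] + "T" + codon[pos + 1:]
                if mutant in PHENYLALANINE:
                    hits.append((codon, pos, mutant))
    return hits


print(leu_to_phe_by_g_to_t())  # [('TTG', 2, 'TTT')]
```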

Table 2. Frequency of appearance of the two identified sequences in asymptomatic and symptomatic cases.

Given the importance of the mutation, we used Primer3Plus [6] and FastPCR [7] to design a primer set that amplifies the region of interest, with the following result:

TTCCAAAGTGCAGTGAAAAGAA (positions 10967–10988) as forward primer and TTGCAAAAGCAGACATAGCAA (positions 11121–11141, reverse strand) as reverse primer, with an amplicon size of 175 bp in the NC_045512.2 sequence [8] used as reference. The amplicon must then be sequenced to determine the variant.
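The primer coordinates can be checked with simple arithmetic on 1-based, inclusive reference positions: each primer length matches its coordinate span, the amplicon runs from the 5' end of the forward primer to the 5' end of the reverse primer, and position 11083 falls between the two primers, so sequencing the amplicon reads the variant. The sketch uses hypothetical helper names; `reverse_complement` is included because a reverse primer matches the reverse complement of the reference strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD = "TTCCAAAGTGCAGTGAAAAGAA"  # reference positions 10967-10988
REVERSE = "TTGCAAAAGCAGACATAGCAA"   # reference positions 11121-11141, reverse strand


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_size(forward_start, reverse_end):
    """Inclusive length of the PCR product between the outer primer ends (1-based coordinates)."""
    return reverse_end - forward_start + 1


assert len(FORWARD) == 10988 - 10967 + 1  # 22 bp
assert len(REVERSE) == 11141 - 11121 + 1  # 21 bp
print(amplicon_size(10967, 11141))        # 175
print(10988 < 11083 < 11121)              # True: the variant lies between the primers
```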

This SARS-CoV-2 mutation has been reported in [9,10], with possible functional implications predicted. Additionally, our data analysis suggests a correlation between this mutation and the severity of COVID-19. ORF1ab proteins play an important role in pathogenesis, specifically in viral replication. This role might be carried out through interactions between structural and nonstructural proteins, as well as through regulatory sequences in the viral RNA.

Furthermore, it has been noted that mutations in ORF1ab are positively selected during trans-species transmission of SARS-CoV and SARS-like coronaviruses [11]. Nonstructural proteins (NSPs), such as NSP6, can induce double-membrane vesicles in the endoplasmic reticulum of SARS-CoV-infected cells and belong to the replication-transcription complexes [12,13]. In addition, NSP6 has 6 conserved transmembrane domains and participates in limiting autophagosome expansion [14,15]. This may favor coronavirus infection by compromising the ability of autophagosomes to deliver viral components to lysosomes for degradation. Thus, we hypothesize that the missense variation in the ORF1ab gene (11083G>T, using the reference genome NC_045512.2), located in the NSP6 protein, could affect NSP interactions and the formation of double-membrane vesicles, resulting in a reduced viral load during infection and less severe symptoms in patients infected with SARS-CoV-2.

A further analysis of the metadata revealed that 51 of the 55 asymptomatic samples originated from passengers of the Diamond Princess (DP) cruise ship [16]. Of the four remaining asymptomatic cases, only one carries the mutation. Although no complementary information is available for the DP samples, such as the age of the patient or whether symptoms developed afterwards, we know from [17] that the super-spreading on the DP started before 3 February, and that the samples were taken on 15–20 February. Of the 3,711 people on board, 712 tested positive, of whom 331 were asymptomatic (46.5%) and 9 died (1.3%) [18-20], whereas global deaths stood at 72,614 of 1,279,722 confirmed cases (5.67%) as of 8 April [2].
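The percentages in the previous paragraph follow directly from the quoted counts. The check below (the helper name `percent` is hypothetical) recomputes them: both mortality figures are crude ratios of deaths to confirmed cases, and the asymptomatic share is taken over the 712 positive cases.

```python
def percent(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


dp_positive, dp_asymptomatic, dp_deaths = 712, 331, 9
global_cases, global_deaths = 1_279_722, 72_614

print(percent(dp_asymptomatic, dp_positive))    # 46.5
print(percent(dp_deaths, dp_positive))          # 1.3
print(percent(global_deaths, global_cases, 2))  # 5.67
```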

The identified mutation is present in all of the samples from DP passengers (51 asymptomatic, 6 symptomatic) and appears in only a few samples from other locations (36 symptomatic, 1 asymptomatic). Nevertheless, we found that by additionally checking two mutations in other parts of the genome, 26144G>T and 1397G>A, we can reach an accuracy of 95.11% (Fig. 3). It is of course possible to reach even higher accuracies by checking the presence of all 53 previously identified sequences, but that would involve complex, non-linear relationships that could generalize poorly and would be difficult for human experts to interpret.
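Checking the three mutations in a new genome amounts to reading the base at each reference coordinate. The sketch below assumes that the genome has already been aligned to NC_045512.2, so that string index `position - 1` holds the base at that 1-based reference position; alignment itself is outside its scope. How the three calls are combined into a final prediction is given in Fig. 3 and is not reproduced here. The function name `genotype` is hypothetical.

```python
# Variants named in the text, in 1-based NC_045512.2 coordinates: (position, ref, alt).
VARIANTS = [(11083, "G", "T"), (26144, "G", "T"), (1397, "G", "A")]


def genotype(aligned_genome, variants=VARIANTS):
    """For each variant, return True (alt base), False (ref base) or None (other base, gap or missing).

    Assumes `aligned_genome` is already aligned to the reference coordinates.
    """
    calls = {}
    for pos, ref, alt in variants:
        base = aligned_genome[pos - 1].upper() if pos <= len(aligned_genome) else "N"
        if base == alt:
            calls[f"{pos}{ref}>{alt}"] = True
        elif base == ref:
            calls[f"{pos}{ref}>{alt}"] = False
        else:
            calls[f"{pos}{ref}>{alt}"] = None
    return calls


# Toy aligned genome: only the three variant positions are filled in.
bases = ["N"] * 26200
bases[11083 - 1] = "T"  # carries 11083G>T
bases[26144 - 1] = "G"  # reference base
bases[1397 - 1] = "G"   # reference base
print(genotype("".join(bases)))
```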

Fig. 3. Mutations that need to be checked to reach an accuracy of 95.11%.

The sequences presented in Table 3 differentiate the symptomatic cases that carry the main mutation from the asymptomatic cases. The results we obtained are promising, and we aim to improve our analysis as more data becomes available.

Table 3. Percentage of appearance of the sequences able to discriminate between asymptomatic and symptomatic cases of SARS-CoV-2 with an accuracy of 95.11%.

In conclusion, we believe that this report shows for the first time that a missense mutation in SARS-CoV-2 is strongly correlated with the difference between asymptomatic and symptomatic cases. This could explain the low frequency of the mutation (11.88%) among symptomatic samples not connected to the DP, as patients without symptoms are less likely to be tested. Nevertheless, more asymptomatic samples must be collected to provide further evidence for the generality of our findings.

References

[1] Lu, Roujian, et al. "Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding." The Lancet 395.10224 (2020): 565-574.

[2] World Health Organization. "Coronavirus disease 2019 (COVID-19): situation report, 78." (2020).

[3] Shu, Yuelong, and John McCauley. "GISAID: Global initiative on sharing all influenza data – from vision to reality." Eurosurveillance 22.13 (2017).

[4] Lopez-Rincon, Alejandro, et al. "Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection." BMC Bioinformatics 20.1 (2019): 480.

[5] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.

[6] Untergasser, Andreas, et al. "Primer3Plus, an enhanced web interface to Primer3." Nucleic Acids Research 35.suppl_2 (2007): W71-W74.

[7] Kalendar, Ruslan, David Lee, and Alan H. Schulman. "FastPCR software for PCR primer and probe design and repeat search." Genes, Genomes and Genomics 3.1 (2009): 1-14.


[8] Wassenaar, Trudy M., and Ying Zou. "2019_nCoV: Rapid classification of betacoronaviruses and identification of traditional Chinese medicine as potential origin of zoonotic coronaviruses." Letters in Applied Microbiology (2020).

[9] Phan, Tung. "Genetic diversity and evolution of SARS-CoV-2." Infection, Genetics and Evolution 81 (2020): 104260.

[10] Cárdenas‐Conejo, Yair, et al. "An exclusive 42 amino acid signature in pp1ab protein provides insights into the evolutive history of the 2019 novel human‐pathogenic coronavirus (SARS‐CoV2)." Journal of Medical Virology (2020).

[11] Graham, Rachel L., et al. "SARS coronavirus replicase proteins in pathogenesis." Virus Research 133.1 (2008): 88-100.

[12] Angelini, Megan M., et al. "Severe acute respiratory syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles." mBio 4.4 (2013): e00524-13.

[13] Beachboard, Dia C., Jordan M. Anderson-Daniels, and Mark R. Denison. "Mutations across murine hepatitis virus nsp4 alter virus fitness and membrane modifications." Journal of Virology 89.4 (2015): 2080-2089.

[14] Baliji, Surendranath, et al. "Detection of nonstructural protein 6 in murine coronavirus-infected cells and analysis of the transmembrane topology by using bioinformatics and molecular approaches." Journal of Virology 83.13 (2009): 6957-6962.

[15] Cottam, Eleanor M., Matthew C. Whelband, and Thomas Wileman. "Coronavirus NSP6 restricts autophagosome expansion." Autophagy 10.8 (2014): 1426-1441.

[16] Mizumoto, Kenji, et al. "Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020." Eurosurveillance 25.10 (2020).

[17] Sekizuka, Tsuyoshi, et al. "Haplotype networks of SARS-CoV-2 infections in the Diamond Princess cruise ship outbreak." medRxiv (2020).

[18] Xu, Xiao-Wei, et al. "Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series." BMJ 368 (2020).

[19] Mizumoto, Kenji, et al. "Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020." Eurosurveillance 25.10 (2020).

[20] Moriarty, Leah F. "Public health responses to COVID-19 outbreaks on cruise ships—worldwide, February–March 2020." MMWR. Morbidity and Mortality Weekly Report 69 (2020).

 
