02 Apr

pangolin lineage covid

Avian influenza a virus (H7N7) epidemic in The Netherlands in 2003: course of the epidemic and effectiveness of control measures. Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. The web application was developed by the Centre for Genomic Pathogen Surveillance. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. 26, 450452 (2020). Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Yres, D. L. et al. Maclean, O. 4 TMRCAs for SARS-CoV and SARS-CoV-2. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5kb: an ORF1a subregion spanning nucleotides (nt) 3,6259,150 and the first half of ORF1b spanning nt13,29119,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins). Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Google Scholar. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 1015% divergent throughout the entire Sprotein (excluding the N-terminal domain). Results and discussion Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real-time ( 4 ). Med. In regionA, we removed subregion A1 (ntpositions 3,8724,716 within regionA) and subregion A4 (nt1,6422,113) because both showed PI signals with other subregions of regionA. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. Coronavirus: Pangolins found to carry related strains. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. 23, 18911901 (2006). Hu, B. et al. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. Note that breakpoints can be shared between sequences if they are descendants of the same recombination events. If stopping an outbreak in its early stages is not possibleas was the case for the COVID-19 epidemic in Hubeiidentification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. In early January, the aetiological agent of the pneumonia cases was found to be a coronavirus3, subsequently named SARS-CoV-2 by an International Committee on Taxonomy of Viruses (ICTV) Study Group4 and also named hCoV-19 by Wu et al.5. The genetic distances between SARS-CoV-2 and RaTG13 (bottom) demonstrate that their relationship is consistent across all regions except for the variable loop. Press, H.) 3964 (Springer, 2009). We focused on these three non-recombining regions/alignments for divergence time estimation; this avoids inappropriate modelling of evolutionary processes with recombination on strictly bifurcating trees, which can result in different artefacts such as homoplasies that inflate branch lengths and lead to apparently longer evolutionary divergence times. a, Breakpoints identified by 3SEQ illustrated by percentage of sequences (out of 68) that support a particular breakpoint position. As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates43,44,52, but this is beyond the scope of the current study. The 2009 influenza pandemic and subsequent outbreaks of MERS-CoV (2012), H7N9 avian influenza (2013), Ebola virus (2014) and Zika virus (2015) were met with rapid sequencing and genomic characterization. Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. Preprint at https://doi.org/10.1101/2020.02.10.942748 (2020). N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. Current sampling of pangolins does not implicate them as an intermediate host. Extensive diversity of coronaviruses in bats from China. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. These datasets were subjected to the same recombination masking approach as NRA3 and were characterized by a strong temporal signal (Fig. 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses indicating that recombination probably played a role in the evolutionary history of these viruses5,7. Evol. The SARS-CoV divergence times are somewhat earlier than dates previously estimated15 because previous estimates were obtained using a collection of SARS-CoV genomes from human and civet hosts (as well as a few closely related bat genomes), which implies that evolutionary rates were predominantly informed by the short-term SARS outbreak scale and probably biased upwards. 5 Comparisons of GC content across taxa. Viruses 11, 174 (2019). Get the most important science stories of the day, free in your inbox. Early detection via genomics was not possible during Southeast Asias initial outbreaks of avian influenza H5N1 (1997 and 20032004) or the first SARS outbreak (20022003). Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the Sprotein. Lin, X. et al. Microbiol. Eden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. & White, P. A. Recombination within the pandemic norovirus GII.4 lineage. Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus. These shy, quirky but cute mammals are one of the most heavily trafficked yet least understood animals in the world. The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. These differences reflect the fact that rate estimates can vary considerably with the timescale of measurement, a frequently observed phenomenon in viruses known as time-dependent evolutionary rates41,43,44. matics program called Pangolin was developed. Lancet 395, 949950 (2020). In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV, and SARS-CoV virus genomes. Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). This is not surprising for diverse viral populations with relatively deep evolutionary histories. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Extended Data Fig. Discovery and genetic analysis of novel coronaviruses in least horseshoe bats in southwestern China. When the genomic data included both coding and non-coding regions we used a single GTR+ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+ model for each partition with a separate gamma model to accommodate inter-site rate variation. A distinct name is needed for the new coronavirus. 3). & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Nature 538, 193200 (2016). performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. This produced non-recombining alignment NRA3, which included 63 of the 68genomes. Lam, T. T. et al. The new paper finds that the genetic sequences of several strains of coronavirus found in pangolins were between 88.5 percent and 92.4 percent similar to those of the novel coronavirus. Its origin and direct ancestral viruses have not been . These residues are also in the Pangolin Guangdong 2019 sequence. Chernomor, O. et al. PubMed Central Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length. PubMed The sizes of the black internal node circles are proportional to the posterior node support. Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown22,25, have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. Python 379 102 pangoLEARN Public Store of the trained model for pangolin to access. https://doi.org/10.1038/s41564-020-0771-4, DOI: https://doi.org/10.1038/s41564-020-0771-4. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 18791999), 1969 (95% HPD: 19302000) and 1982 (95% HPD: 19482009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. 90, 71847195 (2016). While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal . Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST42 v.1.10.4. July 26, 2021. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. The pangolin coronaviruses show lower similarity to SARS-CoV-2 than bat coronavirus RaTG13 across the whole genome, but higher similarity in the spike receptor binding domain, although the similarity at either scale remains too low to implicate . ac, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n=37; a) MERS (n=35; b) and SARS (n=69; c)). T.L. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Zhou, P. et al. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). In our analyses of the sarbecovirus datasets, we incorporated the uncertainty of the sampling dates when exact dates were not available. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity27,37. We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. For weather, science, and COVID-19 . All authors contributed to analyses and interpretations. Biazzo et al. 04:20. The research leading to these results received funding (to A.R. Wong, A. C. P., Li, X., Lau, S. K. P. & Woo, P. C. Y. Evol. The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding . 82, 18191826 (2008). Mol. Although the human ACE2-compatible RBD was very likely to have been present in a bat sarbecovirus lineage that ultimately led to SARS-CoV-2, this RBD sequence has hitherto been found in only a few pangolin viruses. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. Rev. Lemey, P., Minin, V. N., Bielejec, F., Pond, S. L. K. & Suchard, M. A. COVID-19 lineage names can be confusing to navigate; there are many aliases and if you want to catch them all to examine further in data analyses it helps to Allen O'Brien on LinkedIn: #r #rstudio #rstats #pangolin #covid19 #datascience #epidemiology J. Med Virol. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic14,15,16. Sci. However, formal testing using marginal likelihood estimation41 does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. Google Scholar. NTD, N-terminal domain; CTD, C-terminal domain. Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20,1.68), 0.35 (0.30,0.41) and 0.133 (0.129,0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. 11,12,13,22,28)a signal that suggests recombinationthe divergence patterns in the Sprotein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. P.L. 1. The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. Because 3SEQ identified ten BFRs >500nt, we used GARDs (v.2.5.0) inference on 10, 11 and 12 breakpoints. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. Bioinformatics 30, 13121313 (2014). To examine temporal signal in the sequenced data, we plotted root-to-tip divergence against sampling time using TempEst39 v.1.5.3 based on a maximum likelihood tree. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses43,44,52, slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18years. The lineage B.1 has been the major basal and widespread lineage from the initial SARS-CoV-2 spread and it became the more prevalent lineage in Colombia ( 13 ), while the B.1.111 lineage, first detected in the USA from a sample collected on March 7, 2020 and subsequently in Colombia on March 13, 2020 is currently circulating and mainly represented J. Virol. We considered (1) the possibility that BFRs could be combined into larger non-recombinant regions and (2) the possibility of further recombination within each BFR. A new coronavirus associated with human respiratory disease in China. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. eLife 7, e31257 (2018). Patino-Galindo, J. Li, Q. et al. Stegeman, A. et al. 94, e0012720 (2020). Lancet 383, 541548 (2013). Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. Conducting analogous analyses of codon usage bias as Ji et al. Identifying the origins of an emerging pathogen can be critical during the early stages of an outbreak, because it may allow for containment measures to be precisely targeted at a stage when the number of daily new infections is still low. RegionsAC had similar phylogenetic relationships among the southern China bat viruses (Yunnan, Guangxi and Guizhou provinces), the Hong Kong viruses, northern Chinese viruses (Jilin, Shanxi, Hebei and Henan provinces, including Shaanxi), pangolin viruses and the SARS-CoV-2 lineage. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Virus Evol. Alternatively, combining 3SEQ-inferred breakpoints, GARD-inferred breakpoints and the necessity of PI signals for inferring recombination, we can use the 9.9-kb region spanning nucleotides 11,88521,753 (NRR2) as a putative non-recombining region; this approach is breakpoint-conservative because it is conservative in identifying breakpoints but not conservative in identifying non-recombining regions. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. 27) receptors and its RBD being genetically closer to a pangolin virus than to RaTG13 (refs. pango-designation Public Repository for suggesting new lineages that should be added to the current scheme Python 968 73 pangolin Public Software package for assigning SARS-CoV-2 genome sequences to global lineages. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humansa polybasic cleavage site insertion in the Sproteinhas not yet been seen in another close bat relative of the SARS-CoV-2 virus. Temporal signal was tested using a recently developed marginal likelihood estimation procedure41 (Supplementary Table 1). Microbes Infect. Nature 583, 286289 (2020). D.L.R. J. Virol. stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. Syst. In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. Regions AC were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. 4). To evaluate the performance procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test64, (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net65 and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Nature 583, 282285 (2020). J. Med. G066215N, G0D5117N and G0B9317N)) and by the European Unions Horizon 2020 project MOOD (no. 5, 536544 (2020). Pangolin relies on a novel algorithm called pangoLEARN. PureBasic 53 13 constellations Public Python 42 17 6, eabb9153 (2020). You signed in with another tab or window. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination.

Lexical Vs Compositional Semantics, Articles P