

Title: Meta-transcriptomics and the evolutionary biology of RNA viruses

Mang Shi ...

2 January 2018


  • Meta-transcriptomics (bulk RNA-Seq) is a powerful new way to characterise viromes.
  • Meta-transcriptomic data are changing our understanding of virus evolution.
  • Invertebrates harbor an enormous phylogenetic and genomic diversity of RNA viruses.
  • Present sampling schemes have only revealed a minuscule fraction of the virosphere.
  • The new wealth of virus genomic data presents a major challenge to classification.


Metagenomics is transforming the study of virus evolution, allowing the full assemblage of virus genomes within a host sample to be determined rapidly and cheaply. The genomic analysis of complete transcriptomes, so-called meta-transcriptomics, is providing a particularly rich source of data on the global diversity of RNA viruses and their evolutionary history. Herein we review some of the insights that meta-transcriptomics has provided on the fundamental patterns and processes of virus evolution, with a focus on the recent discovery of a multitude of novel invertebrate viruses. In particular, meta-transcriptomics shows that the RNA virus world is more fluid than previously realized, with relatively frequent changes in genome length and structure. As well as having a transformative impact on studies of virus evolution, meta-transcriptomics presents major new challenges for virus classification, with the greater sampling of host taxa now filling many of the gaps on virus phylogenies that were previously used to define taxonomic groups. Given that most viruses in the future will likely be characterized using metagenomic approaches, and that we have evidently only sampled a tiny fraction of the total virosphere, we suggest that proposals for virus classification pay careful attention to the wonders unearthed in this new age of virus discovery.


1. Introduction: virology in the age of metagenomics

2. Overview of meta-transcriptomics

3. Implications of meta-transcriptomics for virus evolution

3.1 A new view of virus diversity

3.2 Linking the vertebrate and invertebrate worlds

3.3 Cross-species transmission and emergence

3.4 The evolution of genome structures

4. Implications of meta-transcriptomics for virus taxonomy

5. Conclusions and future directions 


    Although viruses are the most abundant source of nucleic acid on earth, with every species of cellular life likely harboring multiple viruses, until recently most studies of virus biodiversity and evolution were of limited scope, with a strong focus on aquatic environments and prokaryotic DNA viruses.
    As the transcriptome data generated by RNA-Seq are able to provide an unbiased and likely comprehensive view of all the viruses present within a host sample – that is, their complete virome – it can also be thought of as ‘meta-transcriptomics’.
    Aside from its evolutionary utility, which we will discuss in more detail below, meta-transcriptomics allows the identification of novel microbial pathogens.
    Meta-transcriptomics may eventually be used for routine microbiological diagnostics.
    We will review what, in our opinion, meta-transcriptomics has told us about virus diversity, evolution and taxonomy, and provide some suggestions for future work in this area.

    2. Overview of meta-transcriptomics
    The most robust, although costly, method of virus discovery is through a coupling of metagenomics and high-throughput sequencing technology.
    Among the various metagenomics approaches available, meta-transcriptomics has recently come to the fore.
    Note: compared with viral particle enrichment methods, meta-transcriptomics has many advantages.
    Compared to metagenomics protocols that involve viral particle enrichment (reviewed in Kumar et al., 2017), this method is far simpler yet still achieves a high level of sensitivity, generality, and efficiency for virus discovery (Fig. 1).
    Previous methodologies were often based on removing as much nucleic acid outside viral particles as possible by filtering, centrifugation, lysis, and nuclease treatment, although this seldom results in a complete depletion of host RNA (Firth and Lipkin, 2013, Mokili et al., 2012). In contrast, in meta-transcriptomics total RNA (i.e. the transcriptome) is directly extracted from untreated homogenates and used for library preparation without filtering and nuclease digestion steps.

    Note: another advantage of meta-transcriptomics is that it offers a simple way to quantify each virus in a sample.
    Another benefit of meta-transcriptomics is that it provides a ready way to quantify each virus present in a sample. Specifically, the percentage of reads that map to a particular virus genome is a good indication of how abundant any virus is, especially in the context of conserved host genes (Shi et al., 2016a, Shi et al., 2017).
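    The abundance measure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names, the example read counts, and the choice of a single host marker gene are all assumptions made for the example, and no correction for genome or gene length is applied.

```python
def percent_of_reads(read_counts, total_reads):
    """Percentage of all reads in the library that map to each target."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: 100.0 * count / total_reads
            for name, count in read_counts.items()}


def ratio_to_host_marker(read_counts, host_marker):
    """Read count of each non-host target divided by that of a conserved host gene."""
    marker_reads = read_counts[host_marker]
    if marker_reads <= 0:
        raise ValueError("host marker must have at least one mapped read")
    return {name: count / marker_reads
            for name, count in read_counts.items()
            if name != host_marker}


# Hypothetical mapped-read counts for one library (illustrative values only).
counts = {"virus_A": 50_000, "virus_B": 200, "host_marker_gene": 10_000}

percentages = percent_of_reads(counts, total_reads=1_000_000)
ratios = ratio_to_host_marker(counts, host_marker="host_marker_gene")

for name in sorted(ratios):
    print(f"{name}: {percentages[name]:.3f}% of reads, "
          f"{ratios[name]:.2f}x host marker")
```

    Reporting the ratio to a conserved host gene alongside the raw percentage makes libraries of different depth and host RNA content easier to compare.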
    In turn, abundance levels can provide important pointers to disease associations, indicate whether viruses are segmented (such that genomic components have similar or different expression levels), and help identify those viruses that are in fact derived from other eukaryotic organisms present in the host sampled, such as undigested food or prey, gut microflora, and parasites, or from simple contamination. The greater the virus abundance, the more likely it is that active viral infection has occurred in the host under consideration.
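    One way to use abundance as a pointer to segmentation is to flag pairs of viral contigs whose abundances are similar. The sketch below is only an illustration: the contig names, the abundance values, and the two-fold threshold (`max_log2_fold=1.0`) are assumptions, and similar abundance is a hint that would still need support from phylogeny or genome features.

```python
import math


def log2_fold_difference(a, b):
    """Absolute log2 fold difference between two positive abundance values."""
    if a <= 0 or b <= 0:
        raise ValueError("abundances must be positive")
    return abs(math.log2(a / b))


def candidate_segment_pairs(abundance, max_log2_fold=1.0):
    """Return contig pairs whose abundances differ by at most the given fold."""
    names = sorted(abundance)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            diff = log2_fold_difference(abundance[first], abundance[second])
            if diff <= max_log2_fold:
                pairs.append((first, second))
    return pairs


# Hypothetical per-contig abundances (for example, reads per kilobase).
abundance = {
    "contig_L": 120.0,
    "contig_M": 150.0,
    "contig_S": 200.0,
    "contig_X": 2.0,   # far rarer, so unlikely to belong to the same virus
}

pairs = candidate_segment_pairs(abundance)
print(pairs)
```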
    In addition, compared to genomic nucleic acid, the transcriptome comprises compact information that is more balanced across domains of life, thereby preventing the over-dominance of genetic information from large cellular organisms.

    3. Implications of meta-transcriptomics for virus evolution
        3.1 A new view of virus diversity
        Those meta-transcriptomic studies undertaken to date have transformed our understanding of the extent and nature of viral biodiversity, making it abundantly clear that we have only sampled a tiny fraction of RNA virus biodiversity (as will also be true of DNA viruses). 
        It is possible that such highly biased sampling has distorted our view of virus evolution.
        Indeed, it was recently estimated that approximately 99.995% of the eukaryotic virosphere remains undiscovered or unclassified (Geoghegan and Holmes, 2017). The reality, therefore, is that our study of virus diversity and evolution, and hence taxonomy, has only just begun. 
        The new wealth of diversity revealed by meta-transcriptomics also shows that the virus world is far more connected than we previously thought. New broad-scale RdRp phylogenies have shown that virus families, orders, floating genera, and undefined lineages can often be amalgamated into larger groups, such that they exhibit an evolutionary continuity (Shi et al., 2016a), in turn providing compelling evidence for their common origin (Koonin et al., 2015). It is obvious that the increasing number of newly described viruses from diverse hosts will continue to fill ‘gaps’ in phylogenetic diversity (i.e. the long branches present in inter-virus phylogenies) resulting in a more robust and stable depiction of virus evolutionary history.

        3.2 Linking the vertebrate and invertebrate worlds
        It is now clear that invertebrates carry a huge diversity of RNA viruses.
        What is far less clear is how frequently this huge array of invertebrate viruses is associated with overt disease in their hosts and, if invertebrates are largely refractory to disease, how this is mediated.
        Clearly, the monophyletic nature of vertebrate-specific viruses implies that they have had a long-term evolutionary association with vertebrate hosts.
        Therefore, while it is tempting to conclude that most, if not all, families of vertebrate viruses will have their ultimate ancestry with invertebrates, particularly as so very few of the latter have been sampled, it would be wrong to think that this is a foregone conclusion.
        3.3 Cross-species transmission and emergence
        Determining the host range of viruses is essential to understanding the process of cross-species transmission that underpins disease emergence. Meta-transcriptomic data provide a ready means to determine what viruses are present in which hosts and allow a simple measure of virus abundance.
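        As a rough illustration of reading host range from such data, the sketch below collects, for each virus, the host libraries in which its abundance exceeds a cutoff. The library names, the abundance values, and the `min_percent` threshold are hypothetical; a low-abundance hit may reflect diet, parasites, or contamination rather than infection, so the cutoff is a judgment call and not a rule from the paper.

```python
def host_range(abundance_by_host, min_percent=0.01):
    """Map each virus to the set of host libraries where it reaches the cutoff."""
    ranges = {}
    for host, viruses in abundance_by_host.items():
        for virus, percent in viruses.items():
            if percent >= min_percent:
                ranges.setdefault(virus, set()).add(host)
    return ranges


# Hypothetical percentages of reads per virus in three host libraries.
abundance_by_host = {
    "mosquito_pool": {"virus_A": 2.5, "virus_B": 0.001},
    "tick_pool": {"virus_A": 0.3},
    "spider_pool": {"virus_B": 1.2},
}

ranges = host_range(abundance_by_host)
for virus in sorted(ranges):
    print(virus, sorted(ranges[virus]))
```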

        The combination of meta-transcriptomics and phylogenetics has also told us that virus evolution is a complex interaction between cross-species transmission and virus-host co-divergence, with the evolutionary history of many virus groups reflecting an interweaving of both processes (Geoghegan et al., 2017).

        At the same time, however, it is clear that cross-species transmission has occurred frequently, even among phylogenetically divergent taxa, and is likely the dominant mode of RNA virus evolution.

        Finally, although meta-transcriptomics has profound implications for our understanding of virus evolution, it likely undermines biodiversity-based attempts to predict the virus source of the next major disease pandemic (Olival et al., 2017). 

        Meta-transcriptomics tells us that there are so many viruses in nature that trying to establish which will ultimately appear in a new host from diversity sampling alone is almost certainly a futile exercise. This is apparent in the current vogue to study bat viruses. Since the emergence of SARS coronavirus in humans – a pathogen that has its ultimate ancestry in bats – sampling bat viruses as a means to determine which might next emerge in humans has received considerable attention (Smith and Wang, 2013). While these studies have made it clear that bats indeed harbor an enormous number of viruses (Anthony et al., 2017, Luis et al., 2013, Olival et al., 2017), at the same time they clearly show that the vast majority of these viruses have not jumped to humans.

        The true goal of studies of disease emergence should therefore be to reveal that combination of genetic and ecological factors that underpins successful cross-species transmission and emergence.

        3.4 The evolution of genome structures
        One of the most important impacts of metagenomic data has been to change our understanding of the structure of virus genomes and the evolutionary processes that have given rise to them.

        Indeed, an emerging view is that RNA viruses experience processes of genome evolution as complex as those of DNA-based organisms. To better determine the evolutionary processes that shape viral genome structures, and hence how new viruses are created, it is important to use the new wealth of meta-transcriptomic data to carefully determine the frequency, pattern and history of gene duplications and losses, lateral gene transfers, and genomic rearrangements; combined, these will provide a more complete picture of genome-scale evolutionary processes.

        However, segmentation no longer appears to be a strong taxon-defining trait, and a combination of segmented and unsegmented genomes has now been observed within families of RNA viruses.

        Despite such a data revolution, one key feature of RNA virus genomes that has held firm is an upper limit on genome length of <35 kb, with ball python nidovirus exhibiting the largest RNA virus genome reported to date, at 33.5 kb (Stenglein et al., 2014).

    4. Implications of meta-transcriptomics for virus taxonomy
    Indeed, there is now a growing recognition that the primary way in which viruses will be characterized in the future will be through metagenomic surveys.

    Most importantly, phylogenetic trees are only ever able to depict the relationship among those viruses that are present in the sample of viruses under study; as our sample is likely negligible, so our classification is necessarily incomplete. A more fundamental question is whether the current classification scheme can withstand the onslaught of metagenomic data. The proliferation of ‘family-like’ viruses revealed from meta-transcriptomic surveys amply highlights the scale of the challenge facing taxonomists.

    5. Conclusions and future directions 
    It is therefore of critical importance to perform unbiased metagenomics surveys of prokaryotic taxa that have not been examined to date, followed by novel bioinformatics analyses that are able to accurately identify viruses and reveal their phylogenetic relationships. 
    Key questions for future research that can be addressed with the new wealth of meta-transcriptomic data include (i) determining the flow of viruses between host taxa and the processes that shape virus ecosystems; (ii) revealing the mechanisms of long-term virus macroevolution, particularly lineage birth and death; and (iii) revealing the mechanisms and evolutionary processes that structure viral genomes.

