Did you know SARS-CoV-2 is shed in the feces of individuals with symptomatic or asymptomatic infection? Viral particles shed into wastewater via the sewer system are no longer infectious but can still be measured. Therefore, recent public health monitoring efforts target sewers to identify known genotypes of SARS-CoV-2. Genotyping by sequencing SARS-CoV-2 from wastewater correlates with sequencing results in patients in the wastewater catchment area, providing an efficient monitoring tool for viral epidemiology. Wastewater is readily available at sewage plants, and collection of wastewater samples avoids biases associated with sampling from hospitals or testing facilities (1).

PCR approaches are highly effective for well-targeted variants, and multiplexing strategies that target several mutations simultaneously can reveal the mutation patterns of circulating variants. However, NGS approaches can find new variants, increase the sensitivity of variant detection and provide an unbiased representation of the variants circulating in populations. NGS is also used for whole-genome SNP analysis in local epidemiological investigations, such as hospital infection control and local outbreak tracing.

Whether using Oxford Nanopore, Illumina, PacBio or Ion Torrent technology, and whether using ARTIC or vendor-designed panels, QIAGEN CLC Genomics Workbench has standard SARS-CoV-2 analysis workflows that can easily be adapted to any platform, protocol and application by exchanging workflow elements, primer design files or parameter settings.

The general approach of the workflows is mapping the reads to a reference, calling variants, generating a consensus sequence and generating outputs that enable efficient review of results, including cross-sample comparison. See (2) for examples of building workflows.
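To make the consensus step concrete, here is a minimal Python sketch of majority-rule consensus calling from per-position base counts. It is not the algorithm CLC Genomics Workbench uses: the pileup format, the `min_coverage` and `min_fraction` thresholds and the function name are assumptions chosen for illustration, and indels are ignored.

```python
from collections import Counter


def call_consensus(pileup, min_coverage=10, min_fraction=0.5):
    """Majority-rule consensus from per-position base counts.

    pileup: list where item i is a Counter of the bases observed at
    reference position i (0-based) in the mapped reads.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0 or depth < min_coverage:
            # Too few reads to trust any call: mask the position.
            consensus.append("N")
            continue
        base, n = counts.most_common(1)[0]
        # The top base must exceed min_fraction of the reads; otherwise
        # the position is ambiguous and is masked as well.
        consensus.append(base if n / depth > min_fraction else "N")
    return "".join(consensus)


# Hypothetical counts for four reference positions.
pileup = [
    Counter("A" * 12),            # unanimous -> A
    Counter("C" * 10 + "T" * 2),  # clear majority -> C
    Counter("GG"),                # depth 2 < 10 -> N
    Counter("T" * 6 + "A" * 6),   # 50/50 split -> N
]
print(call_consensus(pileup))  # prints ACNN
```

Masking low-coverage positions with N, rather than falling back to the reference base, keeps the consensus from implying evidence that the reads do not provide.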

When working with several samples, multi-FASTA export of consensus sequences, as well as PDF export of the quality report, is easily accomplished.
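Outside the Workbench, the same kind of multi-FASTA file can be assembled with a few lines of Python. This is a sketch: the sample names and sequences are hypothetical, and the wrapping width is a common convention rather than a requirement of the format.

```python
def to_multi_fasta(consensus_by_sample, width=60):
    """Return one FASTA record per sample, sequence lines wrapped at `width`."""
    lines = []
    for sample, seq in consensus_by_sample.items():
        lines.append(f">{sample}")
        # Fixed-width slices; textwrap could split on '-' gap characters.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical consensus sequences, shortened for display.
consensus = {"sample_01": "ACGTNNACGT", "sample_02": "ACGTTTACGA"}
fasta_text = to_multi_fasta(consensus, width=4)
print(fasta_text, end="")
```

The resulting text can be saved with, for example, `Path("consensus.fasta").write_text(fasta_text)` from `pathlib`.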

Typically, the generated consensus sequences are manually submitted to Nextclade and Pangolin to annotate the samples with the latest phylogenetic lineage information.

For high-throughput use, any manual step introduces errors and inefficiencies. QIAGEN CLC Genomics Server can automate lineage annotation through its “external applications” functionality, which allows regularly updated Docker images of Nextclade or Pangolin to be included in CLC workflows (Figure 1). For other examples of external applications, see (3).


Figure 1. a) Using QIAGEN CLC Genomics Server, Nextclade and Pangolin docker images are added to CLC as an “external application” so that the functionalities can be integrated into CLC workflows to assign lineage information to the sample. b) Example output of the Nextclade functionality and c) example output of the Pangolin functionality of the CLC workflow shown.

The server software is also well-suited for handling many workflow executions in parallel, as it has a “scheduler” functionality that manages the execution queue. This queuing ability ensures that parallel workflow execution is coordinated, and individual steps do not interfere with each other by competing for computational resources. External applications can also be executed in the cloud by using QIAGEN CLC Genomics Cloud Engine, reducing local hardware needs to a minimum. QIAGEN CoV-2 Insights service is an instance of this architecture, available if you wish to use this pipeline without setting up the software on your own.
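The idea behind a scheduler, a queue that caps how many jobs run at once, can be illustrated with Python's standard `concurrent.futures` module. This is only an analogy, not how CLC Genomics Server's scheduler is implemented; `run_workflow` is a hypothetical stand-in for one workflow execution, and `max_parallel` stands in for whatever resource limit the server enforces.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of jobs running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)

    def __exit__(self, *exc):
        with self._lock:
            self.running -= 1
        return False


tracker = ConcurrencyTracker()


def run_workflow(sample):
    """Hypothetical stand-in for one workflow execution on one sample."""
    with tracker:
        time.sleep(0.01)  # simulate work
        return f"{sample}: done"


def run_queue(samples, max_parallel=2):
    # Submitted jobs wait in the executor's internal queue; at most
    # `max_parallel` worker threads run them, so jobs never compete for
    # more resources than that limit allows.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_workflow, samples))


results = run_queue([f"sample_{i:02d}" for i in range(6)], max_parallel=2)
print(results)
print("peak concurrency:", tracker.peak)
```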

These bioinformatic workflows work fine in cases where it can be assumed that there is only one dominant strain in circulation. However, in situations where a novel strain is emerging and there are several possibilities to monitor, it is a better strategy to test for evidence of marker mutations in the reads. A tool that can be used for this purpose, by monitoring predefined reference positions in read mappings, is the “Identify Known Mutations from Sample Mappings” algorithm, which outputs whether the variant could be detected or not, whether the coverage was sufficient at the given position, the frequency and other statistics of the variant(s) in the sample. As input, the tool takes the read mapping and a variant track that holds the specific variants that you wish to test for. By applying the mutation tester tool iteratively, in series, with variant tracks for each SARS-CoV-2 strain one wishes to monitor, you can test for evidence of many strains in a single workflow (Figure 2), which can then be applied on batches of samples simultaneously, providing a fully-scalable solution that only needs updating when new strains are expected to enter the population.
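The logic of checking reads for known marker mutations can be sketched as follows. This is a simplified illustration, not the implementation of “Identify Known Mutations from Sample Mappings”: the pileup structure, the thresholds (`min_coverage`, `min_frequency`) and the function names are assumptions. The two marker positions are those commonly reported on the MN908947.3 reference for the spike mutations N501Y (A23063T) and P681H (C23604A); check current lineage definitions before reusing them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MarkerResult:
    lineage: str
    position: int          # 1-based reference position
    alt: str               # expected alternative base
    coverage: int          # reads covering the position
    frequency: float       # fraction of covering reads carrying `alt`
    sufficient_coverage: bool
    detected: bool


def check_known_mutations(pileup, markers_by_lineage,
                          min_coverage=20, min_frequency=0.1):
    """Test one sample's read counts for each lineage's marker mutations.

    pileup: dict mapping 1-based position -> Counter of observed bases.
    markers_by_lineage: dict mapping lineage name -> list of (position, alt).
    """
    results = []
    for lineage, markers in markers_by_lineage.items():
        for position, alt in markers:
            counts = pileup.get(position, Counter())
            coverage = sum(counts.values())
            frequency = counts[alt] / coverage if coverage else 0.0
            sufficient = coverage >= min_coverage
            results.append(MarkerResult(
                lineage, position, alt, coverage, frequency, sufficient,
                detected=sufficient and frequency >= min_frequency,
            ))
    return results


def summarize(results):
    """Per lineage: (markers detected, markers tested)."""
    summary = {}
    for r in results:
        detected, tested = summary.get(r.lineage, (0, 0))
        summary[r.lineage] = (detected + int(r.detected), tested + 1)
    return summary


# Hypothetical read counts for one sample.
pileup = {
    23063: Counter({"T": 45, "A": 5}),   # N501Y site
    23604: Counter({"A": 8, "C": 4}),    # P681H site, low coverage
}
markers = {"B.1.1.7 (subset)": [(23063, "T"), (23604, "A")]}

results = check_known_mutations(pileup, markers)
for r in results:
    print(r)
print(summarize(results))
```

Adding another lineage to the workflow then only means adding another entry to the marker dictionary, which mirrors chaining one variant track per strain.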

Figure 2. A QIAGEN CLC Genomics Workbench workflow interrogating the read mapping of an input sample against a SARS-CoV-2 reference at genomic positions that define known variants of the virus. The workflow can be executed in batch mode to monitor many samples simultaneously.

References:

  1. Wurtz, N., et al. (2021). Monitoring the Circulation of SARS-CoV-2 Variants by Genomic Analysis of Wastewater in Marseille, South-East France. Pathogens 10, 1042. https://doi.org/10.3390/pathogens10081042
  2. Theiagen Consulting LLC. Video tutorials on how to build workflows in CLC Genomics Workbench for SARS-CoV-2 analysis.
  3. Theiagen Consulting LLC. Video tutorials on how to include external applications in CLC Genomics Server: (a) RAxML, (b) MAFFT, (c) iVAR.


Learn more about the capabilities of QIAGEN CLC Genomics Workbench Premium and download your free trial today.

It’s already happened: Several SARS-CoV-2 variants are upon us, lurking in various populations, bringing more uncertainty about how this pandemic will ever get under control.

SARS-CoV-2 is mutating and, in the process, potentially getting more infectious and dangerous, increasing the potential for immune escape and possibly jeopardizing vaccine efficacy. Several geographic variants have recently been discovered. Some have increased transmissibility. Yet for others, it's too early to tell if they are more transmissible, more deadly, will impact vaccine efficacy or escape an already established immunity. Nevertheless, the possibility is there, with every new, significant variant that we discover.

By now, the whole world has learned what 'PCR' means, a term few people would have recognized just a year ago. Nevertheless, the virus is getting smarter, and there is a growing conviction that relying solely on PCR tests won't cut it any longer. Not now, and certainly not in the future.

Soon, the whole world might know the term 'next-generation sequencing'. With more and more variants emerging and spreading, it is imperative to ramp up genomic surveillance of different variants using whole-genome next-generation sequencing of positive COVID-19 samples – to help avoid further spread locally, nationally and globally.  Public health laboratories, hospitals, sequencing centers and governing authorities/organizations must come together to tackle the challenge of genomic surveillance of SARS-CoV-2.

Along with increased sequencing of positive COVID-19 samples comes the need for laboratories to scale up operations. Every aspect must be scaled: sample management, sequencing, data analysis, IT infrastructure and results reporting. If you're working in this field and are not already scaling operations, you need to be prepared to do so soon. This could require hiring new people, spending resources on optimizing workflows, and investing time in acquiring the bioinformatics and IT expertise needed for data analysis and reporting.

The last, critical part of scaling up your operations is to embrace what's already here and to be ready for what's coming. Labs cannot afford not to be ready. The world cannot afford not to be ready.

The good news is, there is a smarter, more efficient way. We've got a solution that will help.

At QIAGEN Digital Insights, we specialize in bioinformatics software, data analysis, IT infrastructure and interpretation and reporting of results. We have risen to the challenge by developing a fully-automated, constantly-updated, scalable service solution for SARS-CoV-2 genomic surveillance data analysis and reporting of major existing and new variants, with automated quality control and clear visualizations.

With our QIAGEN CoV-2 Insights Service solution, there's no need for extensive IT infrastructure or additional expert bioinformatics personnel. This service automatically processes samples, and within minutes, delivers concise results ready to share with public health partners. It produces a report identifying the lineage of each sample and provides a list of important mutations. Results can also be visualized in QIAGEN CLC Genomics Workbench at no extra cost.  We use a global cloud infrastructure that scales and adapts on-demand, providing the fastest and most secure data processing available.

If there's one thing we've learned so far, it is that SARS-CoV-2 is unpredictable. We must be ready to adapt, work quickly and innovate, and to work together, playing to our strengths as individuals, institutions and companies.

Get ahead of the curve! Learn about our QIAGEN Digital Insights SARS-CoV-2 resources and request a consultation about our QIAGEN CoV-2 Insights Service.  Browse our QIAGEN CoV-2 Insights Service resource library, and if you have any questions, don't hesitate to reach out to your local QIAGEN Digital Insights representative at bioinformaticssales@qiagen.com.

During the current pandemic, continually monitoring viral genomes for new mutations has become fundamental to guiding decisions. The combined efforts of labs across the world have generated enormous amounts of SARS-CoV-2 sequencing data that must be analyzed in order to place them in the broader context of the pandemic.

QIAGEN has several resources to support SARS-CoV-2 data analysis. This includes the CoV-2 Insights Service for genomic surveillance, which offers full bioinformatics analysis for QIAGEN’s QIAseq SARS-CoV-2 Primer Panel, Ion Torrent’s Ion AmpliSeq SARS-CoV-2 Research Panel or the Illumina panels. However, if you find your panel data is currently not supported by this prebuilt solution, don’t worry. You can easily analyze any panel data by creating a simple workflow in QIAGEN CLC Genomics Workbench.

Here we show an example of building a workflow to process the long reads generated by Oxford Nanopore Technologies sequencing, using QIAGEN CLC Genomics Workbench with the Long Read Support plugin. The simple workflow processes the data to generate variant calls. Using sequencing data from the University of Exeter (Baker et al., 2020), we provide an example analysis by examining the mutation signatures at different time points in the pandemic.

Trim and map reads, and call and filter variants

Download reference data as a GenBank file from NCBI and extract annotations using the ‘Convert to Tracks’ tool.

Figure 1 shows an example of a simple workflow that trims and maps the reads, then calls and filters the variants.

Figure 1. End-to-end workflow for variant calling in SARS-CoV-2 samples.
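As an illustration of the filtering step, the sketch below keeps only variant calls that pass minimum coverage, allele frequency and base-quality thresholds. The `VariantCall` fields and the threshold values are hypothetical, not the parameters of the CLC filter tools; for Nanopore data in particular, suitable thresholds depend on the chemistry and basecaller.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    position: int
    ref: str
    alt: str
    coverage: int        # reads covering the position
    frequency: float     # fraction of reads supporting `alt`
    avg_quality: float   # mean base quality of the supporting reads


def filter_variants(calls, min_coverage=20, min_frequency=0.5,
                    min_quality=10.0):
    """Keep calls that pass all three (hypothetical) thresholds."""
    return [
        c for c in calls
        if c.coverage >= min_coverage
        and c.frequency >= min_frequency
        and c.avg_quality >= min_quality
    ]


calls = [
    VariantCall(23063, "A", "T", coverage=120, frequency=0.97, avg_quality=18.0),
    VariantCall(1500, "C", "T", coverage=8, frequency=0.90, avg_quality=20.0),    # too few reads
    VariantCall(2000, "G", "A", coverage=200, frequency=0.12, avg_quality=19.0),  # low frequency
    VariantCall(2500, "T", "C", coverage=90, frequency=0.80, avg_quality=6.0),    # low quality
]
kept = filter_variants(calls)
print([(c.position, c.ref, c.alt) for c in kept])
```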

Visualizing mutations in a track list

The called variants can be visualized in a track list using the reference genome and the variant tracks. This view makes it easy to monitor new mutations. An amino acid track helps us by distinguishing synonymous from non-synonymous mutations. In Figure 2, we show a subset of 9 variant tracks from samples collected at various time points in the pandemic. The tracks span from March 2020 to December 2020 and have been sorted chronologically by sample collection date from top to bottom. Here, we can see that variants are accumulating over time.
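What the amino acid track shows can be reproduced for a single codon: translate the reference codon and the mutated codon with the standard genetic code and compare the results. A minimal sketch, assuming the codon sequences are already known; locating the affected codon from a genomic position and the gene annotation is left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, 'synonymous' | 'non-synonymous')."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


print(classify_codon_change("AAT", "TAT"))  # Asn -> Tyr
print(classify_codon_change("CCT", "CAT"))  # Pro -> His
print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
```

A change such as AAT to TAT alters the encoded amino acid (Asn to Tyr, the same substitution as N501Y), whereas CTT to CTC still encodes leucine and would be flagged as synonymous.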

Figure 2. Variants from different data sets were visualized on the MN908947.3 reference. The Gene track is shown in blue.

The latest data set from December 12, 2020 is a sequencing run of the B.1.1.7 strain. We identify the strain by adding the amino acid changes to the track list. In Figure 3, two of the variants characteristic of this strain can be seen, namely N501Y and P681H in the spike protein.

Figure 3a. An overview of detected SNVs found in the spike protein.

Figure 3b. N501Y variant in the spike protein.
Figure 3c. P681H variant in the spike protein.

We visualize the viral evolution in a SNP tree, using one of the oldest samples (collected 2020-03-25) as the root.
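A SNP tree is built from pairwise differences between samples. The sketch below computes such a SNP distance matrix; it does not reproduce the tree-building method used in CLC Genomics Workbench. It assumes each sample's variants are given as a mapping from reference position to alternative base and treats uncalled positions as matching the reference (so low-coverage positions would need masking in practice). The sample names and positions are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of positions at which two samples carry different alleles.

    a, b: dict mapping reference position -> alternative base.
    A position missing from a dict is assumed to carry the reference base.
    """
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def snp_distance_matrix(variants_by_sample):
    samples = sorted(variants_by_sample)
    dist = {s: {t: 0 for t in samples} for s in samples}
    for s, t in combinations(samples, 2):
        d = snp_distance(variants_by_sample[s], variants_by_sample[t])
        dist[s][t] = dist[t][s] = d
    return dist


# Hypothetical samples and positions, oldest first.
variants = {
    "2020-03-25": {},
    "2020-06-01": {100: "T"},
    "2020-12-12": {100: "T", 200: "G", 300: "A"},
}
matrix = snp_distance_matrix(variants)
for s in matrix:
    print(s, [matrix[s][t] for t in matrix])
```

A distance-based method such as neighbor joining could then turn this matrix into a tree rooted at the oldest sample. Comparing alleles per position, rather than counting mismatched (position, base) pairs, avoids counting a single site twice when two samples carry different alternative bases there.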

Figure 4. SNP tree of the 9 samples.

As you can see, constructing a workflow for the analysis of SARS-CoV-2 variants in QIAGEN CLC Genomics Workbench is quick and easy. The entire workflow shown here runs in less than 5 minutes, from input to variant calls, on a standard laptop for a sample of 400,000 reads. This allows for great scalability and efficiency in sample analysis.

Learn more and sign up for a free trial today.

Reference:

Baker, D. J., et al. (2020). CoronaHiT: Large scale multiplexing of SARS-CoV-2 genomes using Nanopore sequencing. bioRxiv 2020.06.24.

Imagine this scenario: You are working on a high-profile scientific finding (for example, the Nevada SARS-CoV-2 reinfection case), and you need to 1) perform a bioinformatics analysis; 2) provide detailed methods of the exact analysis for publication; 3) share those methods so that others can recreate your analysis; and 4) provide publication-quality images. What’s more, you need it all done yesterday. What do you do? Dr. Joel R. Sevinsky recently found himself in this situation while working with the Nevada State Public Health Laboratory on a potential SARS-CoV-2 reinfection case (published in The Lancet Infectious Diseases). He had numerous analysis software options to choose from and decided on QIAGEN CLC Genomics Workbench, mainly because it satisfied all the requirements above and enabled high productivity at a time of limited bandwidth.

Bioinformatics analysis

Dr. Sevinsky had developed an analysis pipeline for SARS-CoV-2 using ARTIC amplicons and the Illumina DNA Prep library preparation kit; he had designed the workflow in QIAGEN CLC Genomics Workbench and was preparing a tutorial on it. Given the modularity of the workflow, he was able to modify the pipeline in just a few minutes to accept metagenomics data as input, rather than amplicons, and perform the analysis. No coding, no command line. All he had to do was point and click in the workflow diagram, remove one step and redirect a couple of outputs, and the new pipeline was ready. When the analysis was done, he had a complete visualization of the variant differences between the two SARS-CoV-2 strains, confirming the team's hypothesis that this was a clear case of SARS-CoV-2 reinfection supported by genomic data. This visualization could be shared with and viewed by anyone who has QIAGEN CLC Genomics Workbench installed, even without a license. Furthermore, Dr. Sevinsky and his team were able to compare the results with other bioinformatics platforms because the software allows results to be exported in many standard open formats (.bam, .vcf and others).

Detailed methods for publication

According to many researchers, the most important aspect of a scientific publication is the methods section. This section should document in exact detail how an experiment and analysis were accomplished so that other scientists can replicate the findings. The bioinformatics results from Dr. Sevinsky’s analysis were accompanied by a full history of algorithms, workflows, reference files and input files, all with version documentation, used to generate the results. This detailed history was included as supplemental data in their publication.
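In a scripted pipeline, a comparable record can be approximated by storing, for each run, the tools, versions and parameters used, together with checksums of every input and reference file, so that a reader can confirm they are working with exactly the same files. A sketch in Python; the record layout, the step name and the parameter shown are hypothetical and do not reflect the format of the CLC history.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(steps, input_files, reference_files):
    """Bundle tool/version/parameter info with checksums of all files used."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "steps": steps,
        "inputs": {str(p): sha256_of(p) for p in input_files},
        "references": {str(p): sha256_of(p) for p in reference_files},
    }


# Example with a throwaway FASTQ file; step names and versions are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    reads = Path(tmp) / "sample_R1.fastq"
    reads.write_bytes(b"@read1\nACGT\n+\nIIII\n")
    record = provenance_record(
        steps=[{"tool": "map_reads", "version": "1.0",
                "parameters": {"min_similarity": 0.8}}],
        input_files=[reads],
        reference_files=[],
    )

print(json.dumps(record, indent=2))
```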

Share methods with the scientific community

Sometimes, especially in bioinformatics, even the most detailed methods can leave obstacles to recreating an analysis. Unless a documented workflow uses containers and workflow managers, which require significant bioinformatics expertise to maintain, getting the environment right to recreate the analysis can be difficult and time-consuming. Fortunately, the entire workflow, with input data, references and parameters, can be packaged in a single file, exported and shared with the scientific community. Dr. Sevinsky’s journal article will include this file as supplemental data, and for the SARS-CoV-2 tutorial mentioned above, a package that includes input FASTQ files, reference files, primer files and workflows is available in an online resource center.

Publication-quality images

Lastly, to get his findings published in a top-tier journal, Dr. Sevinsky and his team needed high-quality images that clearly communicated their findings. QIAGEN CLC Genomics Workbench provided advanced visualization tools that could easily be exported into editable formats for publication. Moreover, the visualization settings can be saved, so you can further refine your analysis without having to recreate the format of the final figure, which is an enormous time saver.

Overall, QIAGEN CLC Genomics Workbench allowed Dr. Sevinsky and his team to communicate their results as quickly as possible. No GitHub repositories to create, no Docker containers to manage. Just efficient analysis and publication of results.

If your work relies on NGS data and requires a lot of “getting things done”, QIAGEN CLC Genomics Workbench is an invaluable tool for your laboratory.

Ready to ramp up your NGS productivity? Take QIAGEN CLC Genomics Workbench for a spin. Start your free trial, or request a consultation today.
