Discover an easy way to get reproducible analyses, traceability of results and efficient bulk analysis with QIAGEN CLC workflows.
Stringing together bioinformatics tools into pipelines enables reproducible execution of complex workflows, producing, among other things, QC reports, data visualizations, statistical analyses, annotation and filtering of output from raw NGS data. Combined with parallel execution, QIAGEN CLC workflows offer efficient throughput of reproducible analyses with traceable results.
In QIAGEN CLC Workbenches, workflows are easily created and configured using a graphical editor (1, 2). Tools can be added through drag-and-drop from the Workbench Toolbox or by selecting from a list. The output of one element is defined as the input to another simply by drawing a line between them. Fine-grained control over the execution pattern within a workflow can be added with control flow elements, supporting cases such as RNA-Seq and differential expression analysis in a single workflow or providing sets of different inputs per workflow run.
With a QIAGEN CLC Genomics Server, third-party applications (e.g., your own tools or open-source tools) can be configured as external applications, thereby expanding the analysis potential beyond the software provided by QIAGEN (3). External applications can be added to workflows using the graphical workflow editor.
Getting started with workflows is simple. Examples are provided in the Template Workflows folder in the Workbench Toolbox. These workflows can be run directly or edited to add or remove tools, change parameters, reconfigure output naming patterns and much more. The Template Workflows folder initially contains two subfolders: Basic Workflow Designs (containing RNA-seq and DNA-seq workflows) and Prepare Raw Data. When QIAGEN CLC Workbench plugins containing workflows are installed, additional subfolders are created containing those template workflows.
Outputs generated using QIAGEN CLC workflows include information on provenance relevant for auditing or publication. This history information includes the version of the software used, the tool and parameter settings used, the name of the user who ran the workflow, the date and time the element was created and the data that the output was derived from. When analyses are run on a QIAGEN CLC Server, a record of the analysis is also written to the audit log.
For bulk processing, a workflow can be submitted in batch mode, where the workflow is run multiple times, once for each input, or set of inputs, specified. When a workflow is launched in batch mode on a QIAGEN CLC Workbench, the individual jobs in that batch are carried out serially – one workflow run after another. For small analyses, this is fine. However, for routine analyses and for large analyses, we recommend the parallel execution potential and intelligent queuing facilities afforded by QIAGEN CLC Genomics Server using a Job Node or Grid Node setup.
When a workflow is submitted in batch mode to a QIAGEN CLC Genomics Server with nodes, each workflow run can be executed in parallel. The server administrator can choose the level of parallelization desired. Options include executing each workflow run on a single node, splitting execution of individual workflows across nodes or specifying parallelization at the level of sub-workflows (blocks), which are created behind the scenes during execution.
Figure 1. Serial (top) versus parallel (bottom) execution of workflows. On CLC Servers with nodes, queuing and parallel execution capacity support optimal use of computational resources. On a QIAGEN CLC Genomics Server without nodes, workflows are queued and processed serially, regardless of how they were submitted. On QIAGEN CLC Workbenches, batch jobs are run serially. Parallel execution of workflows on a QIAGEN CLC Workbench, triggered by multiple individual job launches in relatively quick succession, is not recommended: each workflow run assumes it has access to the entire system, so there is a risk that jobs will crash due to issues such as memory limitations.
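The difference between serial and parallel batch execution described above can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not the CLC Server scheduler or any QIAGEN API: `run_workflow`, `run_batch` and the sample names are hypothetical, and a thread pool stands in for a set of nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(sample: str) -> str:
    """Hypothetical stand-in for one workflow run on one batch unit."""
    return f"{sample}: done"


def run_batch(samples, max_parallel=1):
    """Run one workflow per sample.

    max_parallel=1 mimics serial batch execution; a larger value mimics
    a queue feeding a fixed number of workers, so runs beyond that limit
    wait instead of competing for the same resources.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in input order, whatever the finish order
        return list(pool.map(run_workflow, samples))


samples = ["sample_A", "sample_B", "sample_C", "sample_D"]
serial = run_batch(samples, max_parallel=1)
parallel = run_batch(samples, max_parallel=2)
```

Capping the number of concurrent runs is the point of the sketch: launching every job at once, with each assuming it owns the whole machine, is the situation the caption warns against.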
Learn more about the features and request your free trial of QIAGEN CLC Genomics Server and QIAGEN CLC Genomics Workbench, and explore the benefits for yourself.
Have questions? Request a consultation today.
References:
Did you know SARS-CoV-2 is shed in the feces of individuals with symptomatic or asymptomatic infection? Viral particles shed into wastewater via the sewer system are no longer infectious but can still be measured. Therefore, recent public health monitoring efforts target sewers to identify known genotypes of SARS-CoV-2. Genotyping by sequencing SARS-CoV-2 from wastewater correlates with sequencing results in patients in the wastewater catchment area, providing an efficient monitoring tool for viral epidemiology. Wastewater is readily available at sewage plants, and collection of wastewater samples avoids biases associated with sampling from hospitals or testing facilities (1).
PCR approaches are highly effective on well-targeted variants, and multiplexing strategies capable of simultaneously targeting several mutations can unravel the mutation patterns of circulating variants. However, NGS approaches can find new variants, increase the sensitivity of variant detection and provide an unbiased representation of the variants circulating in populations. NGS is also used for whole genome SNP analysis in local epidemiological analyses, such as hospital infection control and local outbreak tracing.
Whether using Oxford Nanopore, Illumina, PacBio or IonTorrent technology, and whether using ARTIC or vendor-designed panels, QIAGEN CLC Genomics Workbench has standard SARS-CoV-2 analysis workflows that can easily be modified towards any platform, protocol and application by exchanging workflow elements, primer design files or parameter settings.
The general approach of the workflows is mapping the reads to a reference, calling variants, generating a consensus sequence and generating outputs that enable efficient review of results, including cross-sample comparison. See (2) for examples of building workflows.
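As a rough illustration of the consensus step in that approach, the sketch below applies called single-nucleotide variants to a reference and masks poorly covered positions with N. It is a simplified assumption-laden example, not the algorithm used by the CLC tools: `build_consensus`, the 0-based positions, the SNV-only variant model and the `min_coverage` threshold of 10 are all hypothetical choices.

```python
def build_consensus(reference, coverage, variants, min_coverage=10):
    """Build a consensus sequence from a reference and called SNVs.

    reference: reference sequence as a string
    coverage:  per-position read depth, same length as reference
    variants:  dict mapping 0-based position -> alternative base (SNVs only)
    Positions with depth below min_coverage are masked with 'N'.
    """
    if len(coverage) != len(reference):
        raise ValueError("coverage and reference must have the same length")
    consensus = []
    for pos, ref_base in enumerate(reference):
        if coverage[pos] < min_coverage:
            consensus.append("N")  # too little evidence to call a base
        else:
            consensus.append(variants.get(pos, ref_base))
    return "".join(consensus)


reference = "ACGTACGT"
coverage = [30, 30, 2, 30, 30, 30, 30, 0]
variants = {1: "T", 4: "G"}
consensus = build_consensus(reference, coverage, variants)
print(consensus)  # ATNTGCGN
```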
When working with several samples, multi-FASTA export of consensus sequences, as well as PDF export of the quality report, is easily accomplished.
Typically, the generated consensus sequences are manually submitted to Nextclade and Pangolin to annotate the samples with the latest phylogenetic lineage information.
For high-throughput use, any manual steps introduce errors and inefficiencies. QIAGEN CLC Genomics Server software can automate lineage annotation by making use of its “external applications” functionality, where regularly updated docker images of Nextclade or Pangolin can be included in CLC workflows (Figure 1). For other examples of external applications, see (3).
Figure 1. a) Using QIAGEN CLC Genomics Server, Nextclade and Pangolin docker images are added to CLC as an “external application” so that the functionalities can be integrated into CLC workflows to assign lineage information to the sample. b) Example output of the Nextclade functionality and c) example output of the Pangolin functionality of the CLC workflow shown.
The server software is also well-suited for handling many workflow executions in parallel, as it has a “scheduler” functionality that manages the execution queue. This queuing ability ensures that parallel workflow execution is coordinated, and individual steps do not interfere with each other by competing for computational resources. External applications can also be executed in the cloud by using QIAGEN CLC Genomics Cloud Engine, reducing local hardware needs to a minimum. QIAGEN CoV-2 Insights service is an instance of this architecture, available if you wish to use this pipeline without setting up the software on your own.
These bioinformatic workflows work well when it can be assumed that only one dominant strain is in circulation. However, when a novel strain is emerging and there are several possibilities to monitor, it is a better strategy to test for evidence of marker mutations in the reads. A tool that can be used for this purpose is the “Identify Known Mutations from Sample Mappings” algorithm, which monitors predefined reference positions in read mappings. For each position it reports whether the variant could be detected, whether the coverage was sufficient, and the frequency and other statistics of the variant(s) in the sample. As input, the tool takes the read mapping and a variant track that holds the specific variants you wish to test for. By applying the tool iteratively, in series, with a variant track for each SARS-CoV-2 strain you wish to monitor, you can test for evidence of many strains in a single workflow (Figure 2). The workflow can then be applied to batches of samples simultaneously, providing a fully scalable solution that only needs updating when new strains are expected to enter the population.
Figure 2. A QIAGEN CLC Genomics Workbench workflow interrogating the read mapping of an input sample to a SARS-CoV-2 reference at genomic positions defining known variants of the virus. The workflow can be executed in batch mode to monitor many samples simultaneously.
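The kind of check described above (is coverage sufficient at a marker position, and at what frequency is the marker allele seen) can be sketched as follows. This is not the “Identify Known Mutations from Sample Mappings” implementation; `check_known_mutations`, the pileup representation, the positions, the strain names and both thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter


def check_known_mutations(pileup, markers, min_coverage=10, min_frequency=0.05):
    """Test a read mapping for evidence of predefined marker variants.

    pileup:  dict mapping reference position -> Counter of observed bases
    markers: list of (position, alternative_base) pairs, one "variant track"
    """
    results = []
    for position, alt in markers:
        counts = Counter(pileup.get(position, {}))
        depth = sum(counts.values())
        frequency = counts[alt] / depth if depth else 0.0
        sufficient = depth >= min_coverage
        results.append({
            "position": position,
            "alt": alt,
            "coverage": depth,
            "coverage_sufficient": sufficient,
            "frequency": frequency,
            # only call a marker when there is enough depth to trust it
            "detected": sufficient and frequency >= min_frequency,
        })
    return results


# Toy pileup: base counts at two reference positions (made-up values)
pileup = {100: Counter({"G": 45, "A": 5}), 200: Counter({"A": 4})}

# One hypothetical marker list per strain, tested in series
markers_by_strain = {
    "strain_X": [(100, "G"), (200, "T")],
    "strain_Y": [(300, "C")],
}
report = {
    strain: check_known_mutations(pileup, markers)
    for strain, markers in markers_by_strain.items()
}
```

Keeping "coverage sufficient" separate from "detected" matters in practice: an undetected marker at a well-covered position is evidence of absence, while an undetected marker at a poorly covered position is not.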
References:
Learn more about the capabilities of QIAGEN CLC Genomics Workbench Premium and download your free trial today.
A host of new features help you scale your research, and allow you to ramp up your productivity by taking your multi-sample analyses to the next level:
Figure 1. The ‘Iterate’ and ‘Collect and Distribute’ control elements allow batching over sections of the workflow. In this example, fastq files from a two-level factorial RNA-seq experiment performed in triplicate can be analyzed in a single workflow. The reads are trimmed, quality controlled (QC’ed) and the RNA-seq analysis reads are mapped, sample by sample. Then the RNA-seq expression levels are compared among groups, and comparisons are collected to create heat maps, Venn diagrams and PCA plots. Finally, trimming, QC and RNA-seq analysis read mapping reports are combined across samples. The workflow was used to analyze data from De Maio et al. (2016), comparing the transcriptional profile (RNA-seq) of Dengue virus 2 and mock infected human cells at 24 and 36 hours post-infection. The samples (accessions) are described in a CLC metadata table according to infection status and time point prior to workflow execution.
Figure 2. With the ‘Combined Reports’ tool you can gain a quick overview of the main results in your analysis. In this case, the GC-content has been summarized from the QC reports of 12 RNA-seq samples from De Maio et al. (2016).
Figure 3. Minimum Spanning Tree produced by QIAGEN CLC Microbial Genomics Module.
QIAGEN CLC Genomics Workbench now supports even more QIAseq UMI-based library preparation kits and panels, via a series of new ready-to-use workflows accessible through the Biomedical Genomics Analysis plugin, including:
View all supported QIAseq panels here.
Don't miss our on-demand webinar where we review these latest features of the QIAGEN CLC Genomics Workbench 20, and discuss:
References:
De Maio F.A. et al. (2016). The Dengue virus NS5 protein intrudes in the cellular spliceosome and modulates splicing. PLoS Pathog. 12(8):e1005841.
Here at QIAGEN, we frequently fine-tune our solutions to better serve and support our customers in the international research and clinical communities, so they can continue to advance science and patient care. Changes range from minor tweaks — like bug fixes — to entirely new capabilities, like new templates or plugins. If you missed any of our recent updates about new features and capabilities of our line of bioinformatics solutions, here’s a brief roundup of some of the highlights you might want to know about.
This fall, we announced that our CLC Genomics Workbench 11 can be used as a genome browser to share, view and explore NGS analysis results, with no license required. This release also includes faster speeds, improved trimming and updated executables. We also released Biomedical Genomics Workbench 5, which debuted the QIAseq Targeted Panel Analysis plugin. This plugin enables accurate identification of genetic variants with ease, offers a user-friendly interface to simplify QIAseq data analysis, and introduces unique molecular indices and advanced algorithms to improve the accuracy of variant calling. The fall release of Ingenuity Variant Analysis included improvements to the Phenotype Driven Ranking feature by offering further sub-ranks for variants with identical scores. For QCI Interpret for Hereditary Cancer and QCI Interpret for Somatic Cancer, we introduced four new changes, including alignment of AMP/ASCO/CAP interpretation and reporting guidelines, increased flexibility, improved reporting templates and the ability for lab managers to set up groups. We also released updates that comprise the genome interpretation sector of our end-to-end sequencing solution: CLC Main Workbench, CLC Genomics Server 10, CLC Command Line Tools 5 and CLC Sequence Viewer 8.
Overall, we’re delighted to be ending 2017 with our solutions primed to take on even tougher bioinformatics challenges! If you’d like to learn more about one of these solutions or updates, please contact us here.
We're happy to announce that new releases of our products are now available. The releases offer a number of new features and improvements. You can see a few of the highlights below and visit the individual product pages to view the detailed release notes.
Determine which isoforms have interesting biological properties or enhance your multi-omics research approaches - here are the highlights of the latest IPA release:
See the more detailed release notes:
IPA Fall Release 2016
With our fall release of Ingenuity Variant Analysis comes a number of improvements. The headlines are:
Get more details:
Ingenuity Variant Analysis Fall Release 2016
Take a look at these highlights of features and benefits you get from the QIAGEN Clinical Insight (QCI) Interpret September 2016 release:
See more feature improvements and details on the benefits:
QCI Interpret September 2016 Release
It's a pleasure to present the new releases of both workbenches and servers in our CLC product line. Here are a few highlights:
Read more about these and other new features and improvements:
Biomedical Genomics Workbench 3.5
Biomedical Genomics Server Solution 8.5
CLC Genomics Workbench 9.5
CLC Genomics Server 8.5
At Bio-IT World we had the pleasure of demonstrating the results of our work with the information technology leaders Intel and BioTeam.
Our collaborations create infrastructure solutions that make population-scale genomic analysis feasible for more researchers. We’ve been working together with Intel to bring world-class infrastructure together with industry-leading genome analysis tools to enable massively scalable whole genome analysis at lower cost. Together with BioTeam we're creating a proof-of-concept high-performance computing (HPC) appliance.
For more details about the partnerships, please read the official press release below.
QIAGEN partners with IT leaders on novel infrastructure for genomics
Demonstrates high-performance computing and genome analysis solutions at Bio-IT World
“By combining our industry-leading genome analysis applications with hardware solutions from leaders like Intel and BioTeam, QIAGEN Bioinformatics is providing world-class infrastructure to help scientists reveal actionable insights from genomic data,” said Dr. Laura Furmanski, Senior Vice President and head of QIAGEN’s Bioinformatics Business Area. “While next-generation sequencing is a momentous advance, society cannot realize the full potential without a corresponding ability to analyze NGS data quickly and accurately. Researchers and clinicians need cost-effective, comprehensive tools for calling and interpreting variants across whole human genomes, and we are providing these novel solutions.”
QIAGEN’s collaboration with Intel developed a reference architecture designed to produce high-volume whole genome data analysis, keeping up with the world’s highest-capacity sequencers, helping NGS scientists keep their sequencing pipelines running smoothly and efficiently. This offering leverages QIAGEN’s CLC Genomics Server software on a compute cluster of 32 Intel® Xeon® processor E5 family based nodes. It provides built-in analysis tools, scalability, fast connection and parallel storage, using Intel Enterprise Edition for Lustre, the world’s largest parallel storage system. In tests, the solution analyzed data quickly and for as little as $22 per genome. It will be described in a conference presentation at Bio-IT World from 3:30-3:50 p.m. on April 6.
“The collaboration with QIAGEN Bioinformatics targets the vexing challenges presented by soaring demand for genome analysis, commonly faced by NGS scientists,” said Ketan Paranjape, GM Life Sciences at Intel. “Optimized solution architectures for these workloads enable researchers to keep pace as sequencers process more genomes than we could have imagined, even a few years ago — all while taking advantage of open systems to save money as well.”
BioTeam and QIAGEN’s proof-of-concept appliance packages CLC Genomics Server with the BioTeam Appliance scientific computing platform to provide a cost-effective, high-performance offering. The flexible, customizable solution delivers a system that maps the computational requirements of the CLC Bio software to an infrastructure that complements its capabilities. The BioTeam Appliance demo at Bio-IT World will be at the QIAGEN Bioinformatics booth (#229) from 1:30-1:45 p.m. on April 6.
“Bioinformatics is an ideal market for high-performance computing, and our simple, end-to-end appliance removes a significant barrier to adoption for many customers,” said Stan Gloss, Founding Partner and Chief Executive Officer at BioTeam. “Our plug-and-play solution enables scientists to focus on research rather than on creating complex IT systems from scratch. We look forward to continuing development of this proof-of-concept model with the QIAGEN Bioinformatics team.”
About BioTeam
BioTeam is a high-performance consulting practice dedicated to delivering objective, technology-agnostic solutions to life science researchers. We leverage the right technologies customized to our clients’ unique needs in order to enable them to reach their scientific objectives.
About QIAGEN
QIAGEN N.V., a Netherlands-based holding company, is the leading global provider of Sample to Insight solutions to transform biological materials into valuable molecular insights. QIAGEN sample technologies isolate and process DNA, RNA and proteins from blood, tissue and other materials. Assay technologies make these biomolecules visible and ready for analysis. Bioinformatics software and knowledge bases interpret data to report relevant, actionable insights. Automation solutions tie these together in seamless and cost-effective molecular testing workflows. QIAGEN provides these workflows to more than 500,000 customers around the world in Molecular Diagnostics (human healthcare), Applied Testing (forensics, veterinary testing and food safety), Pharma (pharmaceutical and biotechnology companies) and Academia (life sciences research). As of December 31, 2015, QIAGEN employed approximately 4,600 people in over 35 locations worldwide. Further information can be found at http://www.qiagen.com.
Certain of the statements contained in this news release may be considered forward-looking statements within the meaning of Section 27A of the U.S. Securities Act of 1933, as amended, and Section 21E of the U.S. Securities Exchange Act of 1934, as amended. To the extent that any of the statements contained herein relating to QIAGEN's products, markets, strategy or operating results, including without limitation its expected operating results, are forward-looking, such statements are based on current expectations and assumptions that involve a number of uncertainties and risks. Such uncertainties and risks include, but are not limited to, risks associated with management of growth and international operations (including the effects of currency fluctuations, regulatory processes and dependence on logistics), variability of operating results and allocations between customer classes, the commercial development of markets for our products in applied testing, personalized healthcare, clinical research, proteomics, women's health/HPV testing and nucleic acid-based molecular diagnostics; changing relationships with customers, suppliers and strategic partners; competition; rapid or unexpected changes in technologies; fluctuations in demand for QIAGEN's products (including fluctuations due to general economic conditions, the level and timing of customers' funding, budgets and other factors); our ability to obtain regulatory approval of our products; difficulties in successfully adapting QIAGEN's products to integrated solutions and producing such products; the ability of QIAGEN to identify and develop new products and to differentiate and protect our products from competitors' products; market acceptance of QIAGEN's new products, the consummation of acquisitions, and the integration of acquired technologies and businesses. For further information, please refer to the discussions in reports that QIAGEN has filed with, or furnished to, the U.S. Securities and Exchange Commission (SEC).
It's a pleasure to announce that new releases of a range of our products are now available. The new releases offer a number of new features and improvements specific for each of the products.
For more details on the updates, please visit the latest improvements/statistics pages for the products of your interest:
Biomedical Genomics Workbench 3.0
CLC Genomics Workbench 9.0
CLC Main Workbench 7.7
CLC Drug Discovery Workbench 3.0
CLC Sequence Viewer 7.7
CLC Genomics Server 8.0
CLC Server Command Line Tools 3.0
CLC Bioinformatics Database 4.7
CLC Developer Kit 9.0
CLC Genomics Developer Kit 9.0
CLC Developer Kit Server 9.0
We're committed to enabling our customers to analyze vast amounts of NGS data quickly and at the lowest total cost possible. This year, we made investments designed to enable scalable discovery through the optimization of the speed, accuracy, and cost of our server solution consisting of CLC Genomics Server with the Biomedical Genomics Server extension and Biomedical Genomics Workbench platform. Through extensive benchmark testing, we were able to show that our solution is able to process the maximum throughput from an Illumina HiSeq X Ten, with high accuracy, and at a total cost of ownership (TCO) much lower than alternative solutions.
Data analysis to keep pace with maximum throughput
The maximum throughput of an Illumina HiSeq X Ten has been established at a total of 18,000 whole genome sequences per year. This equates to an average rate of analysis of one whole human genome sequence roughly every 30 minutes. By testing the optimized speed of our solution (including SSE/SIMD code optimizations for Intel x86), we were able to demonstrate that CLC Genomics Server not only keeps pace with the data output of an Illumina HiSeq X Ten sequencer running at maximum throughput, but does so with fewer computing nodes than recommended by others. Testing revealed that CLC Genomics Server requires a computer cluster of only 35 nodes, compared with the 85 nodes recommended by Illumina (variant calling based on BWA+GATK in the HiSeq X System Lab Setup and Site Prep Guide (Part #15050093 Rev. H July 2015)). Our comparison benchmark testing was carried out by installing the CLC Genomics Server software on a compute cluster of 35 nodes, each equipped with a 28-core E5-2697 v3 @ 2.60GHz and 128 GB RAM, on a shared Lustre file system. We used the standard CLC variant calling workflow that comes with the Biomedical Genomics Server solution.
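The throughput figures quoted above can be checked with a few lines of arithmetic. The inputs (18,000 genomes per year, 35 versus 85 nodes) come from the text; the variable names and the derived node-hours figure are illustrative, and the calculation assumes a 365-day year with the cluster running around the clock.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming no downtime

genomes_per_year = 18_000  # stated HiSeq X Ten maximum throughput
minutes_per_genome = MINUTES_PER_YEAR / genomes_per_year

clc_nodes = 35       # cluster size used in the benchmark
illumina_nodes = 85  # cluster size cited from the Illumina guide
node_reduction = 1 - clc_nodes / illumina_nodes

# If 35 nodes keep pace at one genome per ~29 minutes, each genome
# consumes this many node-hours of cluster time on average.
node_hours_per_genome = clc_nodes * minutes_per_genome / 60

print(f"{minutes_per_genome:.1f} minutes per genome")      # 29.2
print(f"{node_reduction:.0%} fewer nodes")                  # 59%
print(f"{node_hours_per_genome:.1f} node-hours per genome")  # 17.0
```

So "one genome every 30 minutes" is a rounding of about 29.2 minutes, and the 35-node cluster is roughly 59% smaller than the 85-node recommendation.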
Full analysis of whole human genomes for as little as $22 each
By minimizing the hardware requirements from 85 nodes to just 35, we also minimize the total cost of ownership (TCO) of the solution over a four-year period, which includes everything from software licenses and hardware, to power, cooling, networking, and floor space. Our calculations of the total ownership costs show that with the given specifications, the cost will be as low as $22 per whole human genome analyzed. Given the high throughput enabled by a HiSeq X Ten, the savings can be sizable.
Accurate identification of disease-causing variants
Of course, the total cost of ownership and speed of the overall solution don’t mean much unless the results of the analysis are also accurate. To demonstrate accuracy, we chose hereditary disease trio analysis as a test case, and are proud to say that in most cases the Biomedical Genomics Server solution (CLC Genomics Server and Biomedical Genomics Workbench) together with Ingenuity® Variant Analysis™ for interpretation accurately identified the disease-causing variant without calling any false-positive de novo or causal variants.
But this is not the end of the story; we’re just getting started. Our focus on application performance and accuracy of results is essential, so we expect to improve these even more in the future.
Learn more about Biomedical Genomics Server solution
Read the story on the Intel Health & Life Sciences blog