A recent publication in the Annals of Internal Medicine — nicely summarized in this STAT News article — reports results from MedSeq, a randomized trial of the utility of whole genome sequence data in the clinic. The project involved 100 participants, so it’s important to remember the small sample size when considering outcomes.

But the upshot of the study was sobering for those who believe that genome sequencing is speeding to the clinic: there were far more disease-causing mutations discovered than people who actually had any disease. Of the 11 people who got reports saying their genomes harbored variants that should cause disease, only two of them actually had the disease. These conditions are not ones expected to have unusually late onset.
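To put the 2-of-11 figure in perspective, here is a minimal sketch that computes the observed fraction together with a rough 95% interval. The Wilson score interval is our own choice for illustration and is not something the MedSeq authors are known to have reported; it simply shows how wide the uncertainty is with so few participants.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


flagged = 11   # participants whose reports listed a disease-causing variant
affected = 2   # of those, participants who actually had the disease

observed_fraction = affected / flagged
low, high = wilson_interval(affected, flagged)
print(f"{observed_fraction:.1%} affected, 95% interval {low:.1%} to {high:.1%}")
```

The observed fraction is about 18%, and the interval is wide, which is consistent with the caution about the small sample size.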

From our perspective, these results are not an indication that genomics has little to contribute to healthcare; instead, they are a stark reminder that efforts to accurately interpret the genome still have a long way to go. Programs like the Allele Frequency Community should help with this, but what we need most of all are more genome sequences and really strong phenotype/clinical data associated with them so that we can hone interpretation algorithms.

For the nine people who were expected to have a disease based on their genomic data but didn’t, it is likely that we will eventually discover protective mechanisms that offset the mutation, or perhaps environmental or dietary factors that explain the disconnect. For now, though, we are still at the earliest stages of truly understanding the human genome, and that’s the main message of the MedSeq results.

At QIAGEN Bioinformatics, we’re working hard to make sure that our genome analysis and interpretation tools incorporate the newest discoveries, rely on high-confidence findings, and help scientists see the big picture of how various mutations, pathways, and other factors fit together. We applaud the MedSeq team for drawing attention to this important topic.

It is not surprising to find disease-causing mutations in healthy people, because almost none of the known disease-causing mutations are 100% penetrant, that is, predictive of developing the resulting disease in all cases. Whether you get the disease depends on genomic context: some diseases are late onset and depend on your age, while others manifest across a spectrum of severity, and the effect may fall below the threshold at which it would be called a disease.

That's why our software solutions allow for phenotype-supported ranking: the ability to combine the patient's observed phenotypes with the genomic data for better interpretation results. Because our QIAGEN Knowledge Base captures mutations, genes, diseases, phenotypes, and the relationships among them, we are able to prioritize mutations that are related to the individual phenotype, and we can show that this increases the rate of resolving causative mutations.
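As a toy illustration of the general idea behind phenotype-based prioritization, the sketch below ranks candidate variants by how well their known phenotype associations overlap with a patient's observed phenotypes. This is not QIAGEN's algorithm: the Jaccard overlap score, the `Candidate` class, the variant names, and the phenotype terms are all hypothetical and chosen only to make the concept concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate variant with its associated phenotype terms."""
    variant: str
    phenotypes: frozenset


def phenotype_score(patient: frozenset, candidate: Candidate) -> float:
    """Jaccard overlap between patient phenotypes and the variant's phenotypes."""
    union = patient | candidate.phenotypes
    return len(patient & candidate.phenotypes) / len(union) if union else 0.0


def rank_candidates(patient: frozenset, candidates: list) -> list:
    """Sort candidates so the best phenotype match comes first."""
    return sorted(candidates, key=lambda c: phenotype_score(patient, c), reverse=True)


# Made-up example data, for illustration only.
patient = frozenset({"seizures", "hypotonia", "developmental delay"})
candidates = [
    Candidate("variant_A", frozenset({"cardiomyopathy"})),
    Candidate("variant_B", frozenset({"seizures", "developmental delay"})),
    Candidate("variant_C", frozenset({"hypotonia", "hearing loss"})),
]

ranked = rank_candidates(patient, candidates)
for c in ranked:
    print(c.variant, round(phenotype_score(patient, c), 2))
```

A production system would draw on curated knowledge and much richer evidence than simple set overlap; the point here is only that observed phenotypes can reorder an otherwise flat list of candidate variants.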

 

Will you be visiting ECCMID this April? If so, you might find our poster presentation interesting:

Whole genome sequencing for outbreak analysis and pathogen typing

Introduction: Whole genome sequencing via next generation sequencing (NGS) is becoming a standard for the surveillance and epidemiological investigation of outbreaks [1]. We introduce a best practices workflow for outbreak analysis streamlining NGS-based Multilocus Sequence Typing (MLST), antimicrobial resistance typing, and taxonomic identification of pathogen isolates.

We further demonstrate that whole genome SNP trees offer unparalleled resolution for source tracking or for the analysis of transmission.

Learn more about the findings and conclusions at ECCMID. We hope to meet you - we'll be at booth 11a.

The 2016 APHL Annual Meeting and Tenth Government Environmental Laboratory Conference takes place June 6-9 in Albuquerque, New Mexico. Join your colleagues for networking and learn about the newest in laboratory technology, supplies and services.

You can find us at booth #313, where we're looking forward to presenting our solutions. On Tuesday, June 7, we're also hosting a workshop:

QIAGEN Next Generation Sequencing and Bioinformatics Analysis of Bacterial and Viral Pathogens  

Speakers:
Bi Linton: QIAGEN Automation Solutions for the NGS Workflow
Neha Jalan: NGS data analysis using CLC Genomics Workbench - WGS Salmonella case study
Cecilie Boysen: Bioinformatics solutions for NGS data analysis: Detection and Characterization of Flu, Mtb, and other pathogens

Date and time: Tuesday, June 7, 8:00 a.m. - 8:45 a.m.

Location: Santa Ana Room

Continental breakfast provided

We're looking forward to seeing you in Albuquerque!

Learn more about CLC Genomics Workbench
Read more about the 2016 APHL Annual Meeting

Your $1,000 genome will only cost $22 to analyze

We're committed to enabling our customers to analyze vast amounts of NGS data quickly and at the lowest total cost possible. This year, we invested in optimizing the speed, accuracy, and cost of our server solution, which consists of CLC Genomics Server with the Biomedical Genomics Server extension and the Biomedical Genomics Workbench platform, to enable scalable discovery. Through extensive benchmark testing, we were able to show that our solution can process the maximum throughput of an Illumina HiSeq X Ten with high accuracy and at a total cost of ownership (TCO) much lower than alternative solutions.

Data analysis to keep pace with maximum throughput

The maximum throughput of an Illumina HiSeq X Ten has been established at a total of 18,000 whole genome sequences per year. This equates to an average analysis rate of one whole human genome sequence roughly every 30 minutes. By testing the optimized speed of our solution (including SSE/SIMD code optimizations for Intel x86), we were able to demonstrate that CLC Genomics Server is not only able to keep pace with the data output of an Illumina HiSeq X Ten sequencer running at maximum throughput, but able to do so with fewer computing nodes than recommended by others. Testing revealed that CLC Genomics Server requires a computer cluster of only 35 nodes, compared with the 85 nodes recommended by Illumina (variant calling based on BWA+GATK in the HiSeq X System Lab Setup and Site Prep Guide (Part #15050093 Rev. H July 2015)). Our comparison benchmark testing was carried out by installing the CLC Genomics Server software on a compute cluster of 35 nodes, each equipped with a 28-core E5-2697 v3 @ 2.60GHz and 128 GB RAM, on a shared Lustre file system. We used the standard CLC variant calling workflow that comes with the Biomedical Genomics Server solution.
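The arithmetic behind these figures is easy to check. The short sketch below derives the average time budget per genome from the 18,000 genomes per year figure and the relative reduction in cluster size from 85 to 35 nodes; the variable names are ours, and a 365-day year of continuous operation is assumed.

```python
# Assumes a 365-day year with the sequencer running continuously.
MINUTES_PER_YEAR = 365 * 24 * 60

genomes_per_year = 18_000   # stated HiSeq X Ten maximum throughput
nodes_reference = 85        # node count cited from the Illumina site prep guide
nodes_clc = 35              # node count used in the benchmark described above

# Average time available to analyze each genome without falling behind.
minutes_per_genome = MINUTES_PER_YEAR / genomes_per_year

# Fraction of nodes saved relative to the 85-node recommendation.
node_reduction = 1 - nodes_clc / nodes_reference

print(f"{minutes_per_genome:.1f} minutes per genome")
print(f"{node_reduction:.0%} fewer nodes")
```

The budget works out to just under 30 minutes per genome, and the smaller cluster uses a little under 60% fewer nodes.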

Full analysis of whole human genomes for as little as $22 each

By reducing the hardware requirements from 85 nodes to just 35, we also reduce the total cost of ownership (TCO) of the solution over a four-year period, which includes everything from software licenses and hardware to power, cooling, networking, and floor space. Our calculations of the total ownership costs show that with the given specifications, the cost will be as low as $22 per whole human genome analyzed. Given the high throughput enabled by a HiSeq X Ten, the savings can be sizable.
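To see how a per-genome figure like this relates to a four-year TCO, here is a small sketch. It assumes the $22 figure corresponds to running at the full 18,000 genomes per year for all four years; the post does not state the utilization behind the calculation, so the implied total below is an illustration, not a published number.

```python
def cost_per_genome(total_cost: float, genomes_per_year: int, years: int) -> float:
    """Spread a total cost of ownership evenly over every genome analyzed."""
    return total_cost / (genomes_per_year * years)


YEARS = 4                    # TCO period stated in the post
GENOMES_PER_YEAR = 18_000    # assumed: full HiSeq X Ten throughput every year
COST_PER_GENOME = 22         # stated lower bound, in US dollars

# Four-year TCO implied by $22 per genome under the full-utilization assumption.
implied_total = COST_PER_GENOME * GENOMES_PER_YEAR * YEARS

# The same fixed total at half utilization doubles the per-genome cost.
half_utilization_cost = cost_per_genome(implied_total, GENOMES_PER_YEAR // 2, YEARS)

print(f"Implied four-year total: ${implied_total:,}")
print(f"Per genome at half utilization: ${half_utilization_cost:.2f}")
```

Because much of the TCO is fixed, the per-genome cost depends strongly on how close to maximum throughput the sequencer actually runs.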

Accurate identification of disease-causing variants

Of course, the total cost of ownership and speed of the overall solution don’t mean much unless the results of the analysis are also accurate. To prove accuracy, we chose hereditary disease trio analysis as a test case, and are proud to say that in most cases the Biomedical Genomics Server solution (CLC Genomics Server and Biomedical Genomics Workbench) together with Ingenuity® Variant Analysis™ for interpretation accurately identified the disease-causing variant without calling any false-positive de novo or causal variants.

But this is not the end of the story; we’re just getting started. Our focus on application performance and accuracy of results is essential, so we expect to improve these even more in the future.

More information 

Learn more about Biomedical Genomics Server solution

Read the story on the Intel Health & Life Sciences blog

 
