Whole human genome analysis

Author:

QIAGEN Digital Insights

Your $1,000 genome will only cost $22 to analyze

We’re committed to enabling our customers to analyze vast amounts of NGS data quickly and at the lowest possible total cost. This year, we invested in optimizing the speed, accuracy, and cost of our server solution, which consists of CLC Genomics Server with the Biomedical Genomics Server extension and the Biomedical Genomics Workbench platform. Extensive benchmark testing showed that our solution can process the maximum throughput of an Illumina HiSeq X Ten with high accuracy, and at a total cost of ownership (TCO) much lower than that of alternative solutions.

Data analysis to keep pace with maximum throughput

The maximum throughput of an Illumina HiSeq X Ten has been established at 18,000 whole genome sequences per year. To keep pace, an analysis pipeline must complete, on average, one whole human genome roughly every 30 minutes. By testing the optimized speed of our solution (including SSE/SIMD code optimizations for Intel x86), we demonstrated that CLC Genomics Server not only keeps pace with the data output of an Illumina HiSeq X Ten running at maximum throughput, but does so with fewer computing nodes than others recommend. Testing showed that CLC Genomics Server requires a compute cluster of only 35 nodes, compared with the 85 nodes recommended by Illumina for variant calling based on BWA+GATK (HiSeq X System Lab Setup and Site Prep Guide, Part #15050093 Rev. H, July 2015). We ran the comparison benchmark by installing the CLC Genomics Server software on a compute cluster of 35 nodes, each equipped with a 28-core E5-2697 v3 @ 2.60GHz and 128 GB RAM, on a shared Lustre file system. We used the standard CLC variant calling workflow that comes with the Biomedical Genomics Server solution.
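The throughput figures above can be checked with a little arithmetic. The Python sketch below is only an illustration: the function names are ours, and the "node-hour budget" assumes genomes are processed independently and that work is spread evenly across all nodes, which the benchmark description does not state.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def minutes_per_genome(genomes_per_year: int) -> float:
    """Average interval between finished genomes needed to keep pace."""
    return MINUTES_PER_YEAR / genomes_per_year


def node_hours_budget_per_genome(nodes: int, genomes_per_year: int) -> float:
    """Upper bound on node-hours one genome may consume on a cluster of
    `nodes` machines (assumes perfectly even load, an idealization)."""
    return nodes * minutes_per_genome(genomes_per_year) / 60


if __name__ == "__main__":
    genomes = 18_000  # HiSeq X Ten maximum throughput quoted in the text
    print(f"Interval per genome: {minutes_per_genome(genomes):.1f} minutes")
    for nodes in (35, 85):  # node counts quoted in the text
        budget = node_hours_budget_per_genome(nodes, genomes)
        print(f"{nodes} nodes: up to {budget:.1f} node-hours per genome")
    print(f"Node reduction: {1 - 35 / 85:.0%}")
```

The interval works out to about 29.2 minutes, which the text rounds to 30. The node-hour budget is a ceiling implied by the cluster size, not a measured runtime.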

Full analysis of whole human genomes for as little as $22 each

By reducing the hardware requirements from 85 nodes to just 35, we also reduce the total cost of ownership (TCO) of the solution over a four-year period. TCO includes everything from software licenses and hardware to power, cooling, networking, and floor space. Our calculations show that, with the given specifications, the cost can be as low as $22 per whole human genome analyzed. Given the high throughput enabled by a HiSeq X Ten, the savings can be sizable.
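To make the per-genome figure concrete, the sketch below relates a four-year TCO to a cost per genome. It is a back-of-envelope illustration, not the calculation behind the $22 figure: the function names and the `utilization` parameter are hypothetical, and it assumes TCO is fixed and spread evenly over every genome analyzed. The text does not publish the total TCO, so the implied total shown here is derived only from the quoted $22 and 18,000 genomes per year.

```python
def implied_tco(cost_per_genome: float, genomes_per_year: int, years: int) -> float:
    """Total spend implied by a per-genome cost at full throughput."""
    return cost_per_genome * genomes_per_year * years


def cost_per_genome(tco: float, genomes_per_year: int, years: int,
                    utilization: float = 1.0) -> float:
    """Per-genome cost when a fixed TCO is spread over the genomes actually
    analyzed; `utilization` is the fraction of maximum throughput used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return tco / (genomes_per_year * years * utilization)


if __name__ == "__main__":
    # Figures quoted in the text: $22 per genome, 18,000 genomes per year,
    # four-year ownership period.
    total = implied_tco(22, 18_000, 4)
    print(f"Implied four-year TCO at full throughput: ${total:,.0f}")
    for u in (1.0, 0.75, 0.5):
        print(f"Utilization {u:.0%}: ${cost_per_genome(total, 18_000, 4, u):.2f} per genome")
```

Under these assumptions the per-genome cost scales inversely with utilization, so the "as low as" figure applies when the sequencer runs at maximum throughput.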

Accurate identification of disease-causing variants

Of course, the total cost of ownership and speed of the overall solution don’t mean much unless the results of the analysis are also accurate. To demonstrate accuracy, we chose hereditary disease trio analysis as a test case. We are proud to say that, in most cases, the Biomedical Genomics Server solution (CLC Genomics Server and Biomedical Genomics Workbench), together with Ingenuity® Variant Analysis™ for interpretation, accurately identified the disease-causing variant without calling any false-positive de novo or causal variants.

But this is not the end of the story; we’re just getting started. Application performance and accuracy of results remain our focus, and we expect to improve both even further in the future.

More information 

Learn more about the Biomedical Genomics Server solution

Read the story on the Intel Health & Life Sciences blog