Scalable whole genome analysis

Author:

QIAGEN Digital Insights

New resource available: Pairing QIAGEN Bioinformatics tools with Intel technology for scalable whole genome analysis

As we discussed in an earlier blog post, we’ve been working with Intel to combine world-class infrastructure with industry-leading genome analysis tools, enabling massively scalable whole genome analysis at lower cost. We have now released a white paper detailing the reference architecture and other technical information for our joint solution.

Designed to help NGS scientists keep their sequencing pipelines running smoothly even at capacity – all while saving money and producing better results – our solution provides whole genome analysis for as little as $22 per genome. It meets the computational and analysis demands of Illumina’s HiSeq X Ten, and Intel’s 32-node offering can save researchers up to $1.3 million in total cost of ownership compared to the 85-node cluster recommended by the vendor for a BWA+GATK variant calling pipeline.

Here’s a quick look at what makes our solution different:

  • Built-in analysis tools: The system uses the Biomedical Genomics Server solution.
  • Scalability: Designed to scale on-demand for computing, networking, and storage, the cluster allows labs to manage capacity easily and cost-effectively.
  • Proven accuracy: While efficiency and cost-effectiveness are important factors for NGS data analysis, the solution’s accuracy in both variant calling and interpretation is proven to be among the best.
  • User friendly: The solution masks the complexity of cluster computing with the easy-to-use Biomedical Genomics Workbench.
  • Fast connection to data: We used a high-speed interconnect system based on Intel True Scale Fabric to link the compute nodes and centralized storage, providing up to 40 Gbps of bandwidth per port.
  • Parallel storage: The solution incorporates Intel Enterprise Edition for Lustre, the world’s leading parallel storage system, to keep all the nodes, cores, and threads operating at high efficiency.

For more details, check out the full white paper.

Our tests showed that the 32-node system could process and analyze 48 genomes in 24 hours, on average – enough capacity to handle all the data produced by a HiSeq X Ten. We also tested the system with exome data and successfully analyzed approximately 1,440 human exomes every 24 hours.
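To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (32 nodes, 48 genomes and roughly 1,440 exomes per 24 hours); the function names are illustrative, and the per-node figure assumes the work is spread evenly across nodes, which the white paper may or may not bear out.

```python
# Figures quoted in this post; everything derived from them is plain arithmetic.
NODES = 32
GENOMES_PER_DAY = 48
EXOMES_PER_DAY = 1440  # approximate figure from our tests
HOURS_PER_DAY = 24


def per_hour(samples_per_day: float) -> float:
    """Cluster-wide throughput in samples per hour."""
    return samples_per_day / HOURS_PER_DAY


def minutes_per_sample(samples_per_day: float) -> float:
    """Average minutes between completed samples at steady state."""
    return HOURS_PER_DAY * 60 / samples_per_day


def per_node_per_day(samples_per_day: float, nodes: int = NODES) -> float:
    """Average samples per node per day, ASSUMING an even spread across nodes."""
    return samples_per_day / nodes


if __name__ == "__main__":
    for label, rate in (("genomes", GENOMES_PER_DAY), ("exomes", EXOMES_PER_DAY)):
        print(
            f"{label}: {per_hour(rate):g} per hour, "
            f"one every {minutes_per_sample(rate):g} min, "
            f"{per_node_per_day(rate):g} per node per day"
        )
```

In other words, the tested cluster finishes a whole genome about every 30 minutes on average, or an exome about every minute. These are cluster-wide averages, not the runtime of any single sample.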

Together with Intel, we are presenting this joint solution at the Bio-IT World 2016 conference in a talk addressing the growing demand for population-scale genomics.

If you’d like to learn more but are not able to attend the conference, please feel free to email us.
More information about Bio-IT World