Here at QIAGEN Bioinformatics, we can’t get enough of the “ten simple rules” guides published in PLoS Computational Biology. They’ve been coming out for years now and they make handy resources for a host of topics. Recently, several lists have really resonated with our team and we wanted to share them here.
From Philip Bourne and collaborators, "Ten simple rules to consider regarding preprint submission" includes rules such as "preprints do not lead to being scooped" and "preprints support the rapid evaluation of controversial results." Preprints have been somewhat controversial in the biomedical field, despite being widely accepted in other scientific disciplines. But as members of the bioinformatics community, we support any effort that gets findings into public view faster; the end result is accelerated scientific discovery that benefits all of us. Kudos to Bourne et al. for bringing attention to this important topic.
"Ten simple rules for making research software more robust" comes from Morgan Taschuk and Greg Wilson, but we felt it could just as easily have come from our own R&D team. We spend a great deal of time thinking about how to make software robust, reliable, and easy to use, but the academic research labs that must do the same for their own algorithms rarely have spare resources to support those efforts. Among the rules on this list: "document your code and usage," "version your releases," and "eliminate hard-coded paths." As the authors note in their abstract, "Software produced for research, published and otherwise, suffers from a number of common problems that make it difficult or impossible to run outside the original institution or even off the primary developer's computer." Indeed, we hear this complaint from scientists throughout the biomedical field, including many who eventually conclude that, given all the work it takes to piece them together and keep them running, open-source pipelines aren't as free as they seem.
"Ten simple rules for responsible big data research" really hits home for those of us trying to enable big data operations in biology and medicine. Lead author Matthew Zook and collaborators were spurred to write the list by the rapid growth of big data, much of it built on human data sets, and the ethical questions that growth raises. "The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone," they report. "We exhort researchers to recognize the human participants and complex systems contained within their data and make grappling with ethical questions part of their standard workflow." The list includes rules like "practice ethical data sharing" and "recognize that privacy is more than a binary value." It's a must-read for scientists crunching ever-larger data sets to answer all kinds of biological questions.