The world of biotechnology is increasingly driven by data. As the number of biotech companies continues to grow, so does this industry’s need for data scientists, data analysts, and other technologists.
According to the Bureau of Labor Statistics, the job market for biotechnology has been growing at an above-average rate, with the median salary standing at about $80,455 a year. Burning Glass, which collects and analyzes millions of job postings from across the country, arrived at a similar conclusion, projecting that research and development jobs in the biotechnology sector will grow 22.8 percent over the next 10 years.
With demand for data—and the associated concerns of where to store it, how to process it, and how data pipelines will be structured—on the rise, there will be plenty of opportunities in biotech for those specializing in app development, software engineering, cloud architecture, and other fields.
“We do a lot of work on robotics and automation, tracking what’s actually happening in the lab, and for that you need a lot of data capturing and data quality organization and curating capabilities,” explained Flo Mazzoldi, head of digital technology for Ginkgo Bioworks, a Boston-based biotechnology firm.
Because of the need for bioinformatics—a field that develops methods and software tools for understanding large, complex sets of biological data—Mazzoldi said companies like Ginkgo are on the hunt for specialists with deep software engineering backgrounds and an understanding of biology.
“We need AWS people that are cloud-native, because traditionally bioinformatics platforms don’t live in the cloud, and as we leverage new algorithms we need to scale infinitely in the cloud. That’s very important,” he explained.
Ginkgo Bioworks is building an API for its team that automates and scales the process of organism design, so anyone building proteins can request services in the same way one would request microservices in other industries.
“Currently, the main data analytics platform that biotech companies use, in my opinion, is Jupyter Notebooks. This gives scientists total flexibility,” Mazzoldi said. “Our regular tech stack is Python, Django, React on the front end, Snowflake for data lake.”
He predicted data lakes would become more and more common in biotech. On top of Snowflake, his team uses Tableau to visualize and explore the data: “These tools are very useful if you don’t need to do lots of data manipulation… Otherwise scientists default back to Jupyter Notebooks, either with Python or R as the language.”
Open-source software is heavily leveraged in biotech R&D, where teams must sometimes engage in heavy customization in order to produce a useful tool. That’s in addition to software firms such as Benchling trying to meet the needs of this market with specialized products.
Because of the high demand, finding technologists who are also interested in biology can prove challenging. When it comes to biotech, some firms are willing to hire technologists with relatively little knowledge of biology, betting that they’ll be able to train those workers in the necessary concepts as time goes on. In any case, qualified candidates are in an excellent bargaining position at the moment.
“In the mission of making biology easier to engineer, the way you do that is doing a lot of biology and learning from what you did,” Mazzoldi said. “Learning means data, making sure you have high quality [data], building the analytics tools so you can scout for patterns, building predictive models with machine learning that can give you good predictions.”
Many areas of biology remain mysteries to scientists. Solving those problems requires tons of data curation, with a focus on data quality and organization. Datasets and toolsets must be standardized in order to optimize the resulting analytics, especially once the datasets in question become extraordinarily large.
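To make the idea of standardization concrete, here is a minimal Python sketch of mapping raw instrument records onto one shared schema. It is illustrative only and does not describe any company's pipeline: the field names (`SampleID`, `sample_id`, `volume`, `unit`), the `Measurement` type, and the choice of microliters as the common unit are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical conversion factors from each accepted unit to microliters.
_TO_MICROLITERS = {"ul": 1.0, "ml": 1000.0, "l": 1_000_000.0}


@dataclass(frozen=True)
class Measurement:
    """Shared schema (an assumption for this sketch): one ID, one volume in uL."""
    sample_id: str
    volume_ul: float


def standardize(record: dict) -> Measurement:
    """Map a raw instrument record onto the shared schema."""
    # Accept either of two hypothetical field names for the sample identifier.
    sample_id = record.get("sample_id") or record.get("SampleID")
    if not sample_id:
        raise ValueError(f"record has no sample identifier: {record!r}")

    # Normalize the unit label; default to microliters when none is given.
    unit = str(record.get("unit", "ul")).strip().lower()
    if unit not in _TO_MICROLITERS:
        raise ValueError(f"unknown unit: {unit!r}")

    volume = float(record["volume"]) * _TO_MICROLITERS[unit]
    return Measurement(sample_id=str(sample_id).strip().upper(), volume_ul=volume)


# Two records as they might arrive from differently configured instruments.
raw = [
    {"SampleID": " s-001 ", "volume": "1.5", "unit": "mL"},
    {"sample_id": "S-002", "volume": 250, "unit": "uL"},
]
clean = [standardize(r) for r in raw]
```

Rejecting unknown units outright, rather than guessing, is a deliberate choice here: in a very large dataset, a silently mis-scaled value is much harder to track down later than a record that failed loudly at ingestion.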
All of that means applicants with a combination of machine learning, data analytics, and biology knowledge will only increase in value as time goes on. DevOps and Agile practices are also essential; at Ginkgo Bioworks, for example, both are used, and such skills are applicable to pretty much any biotechnology firm.
“Another area throughout the industry we need are people who specialize in integrating toolsets: hooking up your instruments to Benchling [a cloud-based software platform for biology researchers], and make sure your bioinformatics are integrated on the back end,” Mazzoldi said. That will require backend and data engineers who can successfully build pipelines of data.