Case study: Using data science in biochemistry research

Applying data science skills to biochemistry research

This course provided me with an overview of other tools in data science that have application in my research, an ability to implement some of these tools, and a stronger basis from which I can learn new methods.
Dr Nicola Moloney, Postdoctoral Researcher at the Department of Biochemistry, University of Cambridge

The Accelerate Programme for Scientific Discovery equips researchers with the skills to advance the frontiers of science through the application of AI. Dr Nicola Moloney, a postdoctoral researcher in biochemistry, recently benefited from one of its training programmes, delivered with Cambridge Spark.

Supported by a donation from Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt, the Accelerate Programme provides young researchers with specialised training in AI techniques. Cambridge Spark are delighted to have worked with the Accelerate Programme on an initiative spearheaded by Professor Neil Lawrence at the University of Cambridge, equipping scientific researchers with the skills they need to use machine learning and AI to power their research.

The Accelerate-Cambridge Spark data science for science residency is open to scientific researchers at the University of Cambridge.

Dr Nicola Moloney, a researcher at the Department of Biochemistry, recently completed the residency and shared her experience below:

My research involves spatially mapping the proteomes of parasites that cause African trypanosomiasis, a disease which substantially reduces the productivity of livestock animals in sub-Saharan Africa. Understanding the proteome – the set of proteins that a cell produces – is crucial in understanding how a cell works, and (in the case of a disease-causing organism) developing effective treatments. One important way to understand the proteome is to study where proteins are found within a cell. Protein function is often tightly linked with localisation and, consequently, where a protein is found can provide information on its role within the cell.

Spatial proteomics can combine machine learning methods with quantitative proteomic data to determine where proteins are localised within a cell. To do this, researchers apply purpose-built supervised machine learning methods to multidimensional experimental data to classify which proteins are found in which parts of a cell (known as subcellular compartments). Such classification requires training data in the form of proteins with a literature-supported localisation (i.e. proteins whose functions and localisations have already been characterised).
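To make the idea of training data concrete, here is a minimal sketch of supervised localisation classification. It deliberately swaps in a simple k-nearest-neighbours classifier rather than the purpose-built methods mentioned above, and the marker proteins, their four-value abundance profiles and the compartment labels are all hypothetical toy values chosen for illustration. Each unlabelled protein is assigned the compartment most common among its nearest marker proteins.

```python
import math
from collections import Counter

# Hypothetical marker proteins (the "training data"): each has a toy
# quantitative profile, e.g. relative abundance across four fractions,
# plus a literature-supported compartment label.
MARKERS = {
    "markerA": ([0.70, 0.20, 0.05, 0.05], "mitochondrion"),
    "markerB": ([0.65, 0.25, 0.05, 0.05], "mitochondrion"),
    "markerC": ([0.05, 0.10, 0.70, 0.15], "nucleus"),
    "markerD": ([0.05, 0.15, 0.65, 0.15], "nucleus"),
    "markerE": ([0.10, 0.60, 0.10, 0.20], "cytosol"),
    "markerF": ([0.15, 0.55, 0.10, 0.20], "cytosol"),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(profile, markers, k=3):
    """Return the majority compartment among the k markers closest to profile."""
    nearest = sorted(markers.values(), key=lambda m: euclidean(profile, m[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical proteins of unknown localisation.
    unknown = {
        "proteinX": [0.68, 0.22, 0.05, 0.05],
        "proteinY": [0.05, 0.12, 0.68, 0.15],
    }
    for name, profile in unknown.items():
        print(name, knn_classify(profile, MARKERS))
```

The quality of such predictions depends directly on how many reliable markers are available for each compartment, which is why curating the training set efficiently matters.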

My main motivation for attending the Accelerate-Cambridge Spark data science for science residency was to learn ways to expedite the curation of this training data via programmatic access to scientific literature, in addition to learning new ways to explore large datasets.

During the course I worked on a project to mine scientific literature automatically for information on the localisations of individual proteins. This was achieved by accessing online literature databases programmatically via their APIs. I had never used an API before, and I learned how from the course lectures, with help from my project mentor throughout. After the course, I was equipped to develop this project further into a fully working version. This enabled me to build a database that allows me to curate a training dataset efficiently for use in my research. Other researchers in the field will also be able to use this tool.
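The case study does not say which literature databases or APIs were used. As one possible illustration, the sketch below assumes PubMed, queried through NCBI's E-utilities `esearch` endpoint with JSON output, and looks for articles whose title or abstract mentions both a protein name and a compartment. The helper names (`build_search_url`, `parse_pmids`, `search_localisation`) and the example query terms are hypothetical; a real tool would also need to respect NCBI's usage guidelines, such as request rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: PubMed via NCBI E-utilities esearch (not confirmed by the source).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(protein, compartment, retmax=20):
    """Build an esearch URL for articles mentioning both terms in title/abstract."""
    term = f'"{protein}"[Title/Abstract] AND "{compartment}"[Title/Abstract]'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text):
    """Extract the list of PubMed IDs from an esearch JSON response."""
    data = json.loads(response_text)
    return data.get("esearchresult", {}).get("idlist", [])


def search_localisation(protein, compartment, retmax=20):
    """Query PubMed over the network and return matching PubMed IDs."""
    with urlopen(build_search_url(protein, compartment, retmax), timeout=30) as resp:
        return parse_pmids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical query terms; replace with real protein names of interest.
    print(search_localisation("ProteinX", "flagellum"))
```

Separating URL construction and response parsing from the network call keeps the logic testable offline; the returned IDs could then be stored alongside each protein as candidate evidence for its localisation.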

The course also trained me to generate interactive visualisations that can be shared easily with others. This is especially helpful because it allows processed data to be shared in informative visualisations without requiring others to run code. Altogether, this course provided me with (i) an overview of other tools in data science that have application in my research, (ii) an ability to implement some of these tools, and (iii) a stronger basis from which I can learn new methods. Importantly, the experience instilled in me the confidence to explore data science further for use in my work. As a result, I more readily try new tools as I learn about them, and have done so successfully.

Generally, I’ve found that the data science methods I learned on the course enable me to understand my data better by interrogating it with a toolbox of methods, analyse it more exhaustively by expediting processes, communicate it proficiently to others through visualisations, and organise it better through improved data management practices. I increasingly use data science in my research and continue to actively develop my skills in this area. I plan to continue along this path in future roles.
