Mining the microbiome

Machine learning approaches for novel biosynthetic gene cluster (BGC) discovery ⛏️

🧪 What is this project about?

Microbes produce a wide variety of secondary metabolites, which allow them to interact with each other and with their environment. Also known as natural products, microbial secondary metabolites can have BIG impacts on human health...both good and bad! Rather than spending lots of time and money searching for the metabolites themselves, we can mine massive numbers of microbial (meta)genomes for the genes responsible for secondary metabolite production, which are grouped together in the genome as biosynthetic gene clusters (BGCs). In this project, we're developing machine learning approaches to mine microbial genomes for novel BGCs with the potential to produce novel secondary metabolites.

Workflow used by GECCO, our software for novel bacterial biosynthetic gene cluster (BGC) discovery!

🧐 Why is it important to research this?

Microbial secondary metabolites play important roles in human health. Some are carcinogenic or toxic and can enable pathogens to cause disease (e.g., colibactin, which has been linked to colorectal cancer; cereulide, a toxin that causes foodborne illness). However, not all secondary metabolites are bad for human health: many of the drugs that we rely on every day are derived from microbial secondary metabolites (e.g., antibiotics, anticancer drugs, antifungal drugs). As a result, microbial (meta)genomes can be a treasure trove of novel compounds...both good and bad for human health!

🤞 What can we hope to get out of this project?

The methods and tools developed through this project will allow biologists and chemists to discover novel microbial secondary metabolites that have applications in medicine or industry, or that play important roles in human health.