Microbes produce a wide variety of secondary metabolites, which allow them to interact with each other and with their environment more generally. 🦠 Also known as natural products, microbial secondary metabolites can have BIG impacts on human health… both good and bad! 💊 Rather than spending lots of time and money hunting for metabolites directly, we can mine huge numbers of microbial (meta)genomes for the genes responsible for secondary metabolite production, which are typically grouped together in biosynthetic gene clusters (BGCs). 🧬 However, making sense of ALL of this BGC data is a major challenge! 🤯

One way to make sense of large BGC datasets is to cluster them into Gene Cluster Families (GCFs), units that attempt to group together BGCs encoding similar biosynthetic pathways. GCFs can then be used to, e.g., identify and filter out redundant BGCs, assign functions to uncharacterized BGCs, and flag putatively novel BGCs. The only problem? Existing GCF delineation tools are too slow for massive BGC datasets! 🦥

To overcome this, we developed IGUA, a fast, scalable method for GCF delineation! 🦎 IGUA is over an order of magnitude faster than the state of the art, just as accurate, and can be used for any genomic segment clustering task! 🧬 Want to know more? Try out IGUA for yourself and check out the preprint! 🦎