Mathematical Biology seminar (contact Erica Rutter)
9:00am-10:00am in ACS 362C and broadcast via Zoom
Speaker: Akshay Paropkari
Title: Predicting novel transcription factor-target gene interactions in the Candida albicans biofilm network
Abstract: Biofilms are surface-adhered communities of microbial cells that can serve as reservoirs of infection. Candida albicans is a common human fungal pathogen capable of forming biofilms on biotic and abiotic surfaces. Transcription factors (TFs), defined as sequence-specific DNA-binding proteins, are important players in regulating transcription during complex developmental processes such as biofilm formation. The transcriptional network controlling biofilm formation in C. albicans, consisting of six “master” regulators (Bcr1, Brg1, Efg1, Ndt80, Rob1, and Tec1) and 1,007 downstream “target” genes, has been previously elucidated for a mature C. albicans biofilm. However, the roles of these TFs in controlling target gene expression at different stages of biofilm development have yet to be determined. In this study, we use a supervised support vector machine (SVM) classifier and a validated set of TF binding sites (TFBSs) to predict novel TF-target gene interactions temporally over the course of C. albicans biofilm formation. First, target sequences were created using previously identified TFBS consensus sequences that represent potential binding sites. The number of TFBS consensus sequences for each TF depended on both the number of validated sites and the fidelity of the motifs, and ranged from a few hundred (for Tec1) to over a million (for Rob1). Second, a feature matrix was built to capture the DNA shape and sequence qualities of each candidate TFBS motif. Next, a positive/true set of potential TFBSs was predicted using a trained SVM classifier based on the feature matrix. The sequence similarity score was the top contributing feature for classifying novel TFBSs. Finally, active TF-target gene interactions were identified by correlating TF binding activity with the time-series gene expression data of target genes.
Interestingly, Ndt80 and Efg1 are predicted to control the greatest number of target genes at any given stage of biofilm development. Overall, by coupling TFBS sequence and DNA shape information, here we predict novel TFBSs, TF-target gene interactions, and ultimately, entire gene regulatory networks controlling each stage of C. albicans biofilm development.
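
Illustrative note (not part of the speaker's abstract): the wide range in candidate counts, from a few hundred to over a million, is what one would expect if low-fidelity motifs contain many degenerate positions. The minimal Python sketch below shows one way a degenerate consensus could be expanded into concrete candidate sequences using the standard IUPAC nucleotide codes. The motif `ACRYN` and the function names `expand_consensus` and `count_expansions` are hypothetical and chosen only for illustration; the abstract does not state how the target sequences were actually generated.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_consensus(consensus):
    """Return every concrete DNA sequence matching a degenerate consensus."""
    choices = [IUPAC[base] for base in consensus.upper()]
    return ["".join(combo) for combo in product(*choices)]


def count_expansions(consensus):
    """Count matching sequences without enumerating them."""
    total = 1
    for base in consensus.upper():
        total *= len(IUPAC[base])
    return total


# Hypothetical toy motif (not a real C. albicans binding site).
toy_motif = "ACRYN"
candidates = expand_consensus(toy_motif)

print(len(candidates))             # number of concrete candidate sequences
print(candidates[:4])              # first few expansions
print(count_expansions("N" * 10))  # ten fully degenerate positions
```

Because the count is the product of the number of bases allowed at each position, every additional degenerate position multiplies the candidate set; ten fully degenerate positions alone already yield more than a million sequences.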