Speaker: Thomas Rube
Title: Measuring Sequence-Specific Biopolymer Interactions using Biophysically Informed Machine Learning
Abstract: Sequence-specific protein-ligand interactions are critical for numerous cellular processes, including transcriptional regulation, RNA processing, post-translational modification, and immune recognition. In recent years, high-throughput methods that combine affinity selection of randomized ligand libraries with DNA sequencing have revolutionized our ability to quantify such sequence recognition. In this seminar, I will discuss the computational challenges in interpreting such data, explain why mathematical modeling is critical, and introduce a general modeling framework that learns accurate and biophysically interpretable models of sequence recognition from these data. This framework employs multi-task learning to jointly analyze complementary datasets, and I will discuss how this can be used to reduce generalization error, identify readout of modified DNA bases, and make quantitative measurements of interaction strengths and enzyme kinetics.