Additional course websites: UIUC Canvas

Instructor

Ge Liu

Instructor, Assistant Prof. @ UIUC CS

geliu[AT]illinois[dot]edu

TA

Stefan Ivanovic

TA

stefan4@illinois.edu

Course description

In this graduate-level course, we will introduce foundational and state-of-the-art machine learning approaches to key problems in life sciences and biology, with an emphasis on deep learning for molecular engineering. The course is structured into four main modules: functional genomics, protein language models, geometric DL, and protein structure models. We will cover a wide range of DL architectures (CNN, Transformer, GNN, etc.), generative models (e.g., language models and diffusion), and machine learning techniques (supervised, unsupervised, optimization), and discuss their applications to modeling, understanding, and designing the sequences, structures, and functions of genomes and proteins. Students will learn both theoretical concepts and their applications to biological problems, and gain practical experience by implementing solutions to problem sets, reading papers, and completing a final course project.

Logistics

  • Lecture: WF 11:00 - 12:15 in 1302 Siebel Center for Comp Sci
  • Office hours: Thursday 3pm-4pm in 3212 Siebel Center for Comp Sci
  • TA office hours: F 10:00-10:50 in Siebel Center floor 0 lobby

Grading

Grading will be based on four problem sets containing programming or written questions (40%), a course project (35%), reading assignments (20%), and participation (5%). Attendance at lectures is important because the class moves quickly. You have three late days to use on problem set deadlines; for other circumstances, email the course staff.
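As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the final score is a plain weighted sum; the function name, the component keys, and the example scores are hypothetical, and the late-day policy is not modeled. The course staff's actual computation may differ.

```python
# Weights copied from the grading paragraph; everything else is an assumption.
WEIGHTS = {
    "problem_sets": 0.40,
    "project": 0.35,
    "reading": 0.20,
    "participation": 0.05,
}


def final_score(scores):
    """Return the weighted sum of component scores (each assumed to be 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 0.40*90 + 0.35*80 + 0.20*100 + 0.05*100
example = {"problem_sets": 90, "project": 80, "reading": 100, "participation": 100}
print(round(final_score(example), 2))
```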

Syllabus and schedule

Date    Lecture    Title    Assignment
8/28/2024 Lecture 1 Logistics and intro
8/30/2024 Lecture 2 ML foundations
9/4/2024 Lecture 3 ML foundations
Module I: Functional Genomics
9/6/2024 Lecture 4 DL for functional genomics I
9/11/2024 Lecture 5 CNN/ResNet
9/13/2024 Lecture 6 RNN, LSTM, State Space Models (S4, Mamba, Hyena) Reading assignment 1
9/18/2024 Lecture 7 Model uncertainty and interpretability Problem set 1 out
9/20/2024 Lecture 8 DL for functional genomics II
Module II: Protein Language Models
9/25/2024 Lecture 9 Protein sequence modeling I
9/27/2024 Lecture 10 Transformer and MLM I
10/1/2024 Lecture 11 Transformer and MLM II Reading assignment 2
10/3/2024 Lecture 12 Protein sequence modeling II Problem set 1 due, Finalize project team-up
10/8/2024 Lecture 13 Protein function prediction (sequence based) Problem set 2 out
10/10/2024 Lecture 14 Protein function optimization (sequence based) I
10/15/2024 Lecture 15 Protein function optimization (sequence based) II
Module III: Intro to Geometric DL
10/17/2024 Lecture 16 GNN
10/22/2024 Lecture 17 Intro to Geometric DL for molecular representation I Problem set 2 due, Reading assignment 3
10/24/2024 Lecture 18 Intro to Geometric DL for molecular representation II Problem set 3 out
Module IV: Protein Structure Models
10/29/2024 Lecture 19 Protein structure prediction I
10/31/2024 Lecture 20 Protein structure prediction II Project proposal due
11/5/2024 Lecture 21 Folding and inverse folding Reading assignment 4
11/7/2024 Lecture 22 Protein interaction, function prediction
11/12/2024 Lecture 23 Generative model primers: VAE, Diffusion, Flow Matching I Problem set 3 due
11/14/2024 Lecture 24 Generative model primers: VAE, Diffusion, Flow Matching II Problem set 4, Reading assignment 5
11/19/2024 Lecture 25 Generative model for protein design I
11/21/2024 Lecture 26 Generative model for protein design II
Final projects
11/26/2024 No lecture Fall break
11/28/2024 No lecture Fall break
12/5/2024 Lecture 27 Project Presentations I
12/7/2024 Lecture 28 Project Presentations II

Prerequisites

Linear algebra, Python programming, intro to ML.

Project

This course has a substantial project component. We recommend (but do not require) working on projects in teams of 2-3 students. You are free to choose any problem related to the course lectures and develop a deep learning solution for it.