Welcome to Genomic Insights
This website is a compilation of my personal notes from a course on the statistical foundations and machine learning applications in genomics. If you are interested in applying statistics, spatial omics, multi-omics, or ML/DL in bioinformatics, you may find this resource helpful!
The course consists of a series of theoretical lectures and programming exercises, with a primary focus on analysis using R and Python. My notes, however, mainly cover the fundamental theoretical concepts and explain the reasoning behind each step rather than the code implementation.
About the Course
The Compgen2025 course is organized by the Max Delbrück Center at the Berlin Institute for Medical Systems Biology. It is designed for computational biology PhD students, experimental biologists, and medical scientists who want to enhance their skills in data analysis.
Course Notes
Module 1: AI-assisted data analysis
Statistics for Genomics - Notes, Course record, Book Link
Unsupervised Learning - Notes, Course record, Book Link
Supervised Learning - Notes, Course record, Book Link
Module 2: Spatial omics data analysis
Module 3: Multi-omics data integration and neural networks
Introduction to Multi-omics - Notes, Course record, Slides
Neural Network Training - Notes, Course record, Slides
Genomics Model - Notes, Course record, Slides (My notes draw on many supplementary materials, so they may differ from the course record.)
Practical Applications
In addition to the theoretical notes, I also worked on several hands-on projects during the course, applying the concepts to real-world problems in computational genomics. Below are some of the complete analyses I conducted:
Visium Analysis
- The markdown document uses 10x Visium data (mouse brain) and covers normalization, dimensionality reduction, clustering, visualization, and spot deconvolution with a single-cell RNA-seq reference for cell type inference.
Xenium Analysis
- Xenium is a cell-level spatial transcriptomics platform that offers subcellular resolution. This allows us to examine not just spatial zones of gene expression but also individual cells and their microenvironments. The full analysis here includes data filtering, disk-based representation for large-scale handling, clustering, marker analysis, and hotspot detection.
- Code
Xenium Analysis with Image Registration
- Registered multimodal images (cell assays, molecular assays, H&E, ROI) to integrate spatial and morphological features. Studied spatially resolved gene expression and microenvironment interactions.
- Code
Breast Cancer Subtype Prediction
- Integrated METABRIC multi-omics data (gene expression, CNA, clinical); selected features with the Laplacian score; trained a model with Bayesian hyperparameter tuning; evaluated classification performance.
- Code
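The Laplacian score mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the code used in the project: the function name `laplacian_scores`, the fully connected heat-kernel graph (the usual formulation uses a k-nearest-neighbour graph), the bandwidth `t`, and the toy data are all assumptions made for illustration. A lower score means the feature varies smoothly over the sample-similarity graph, so it is ranked as more informative.

```python
import math

def laplacian_scores(X, t=1.0):
    """Return one Laplacian score per column of X (lower = more relevant)."""
    n, d = len(X), len(X[0])
    # Assumption: fully connected heat-kernel graph instead of a kNN graph.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist2 = sum((X[i][k] - X[j][k]) ** 2 for k in range(d))
            W[i][j] = W[j][i] = math.exp(-dist2 / t)
    degree = [sum(row) for row in W]  # diagonal of the degree matrix D
    total_degree = sum(degree)
    scores = []
    for k in range(d):
        f = [row[k] for row in X]
        # Remove the degree-weighted mean so constant shifts do not matter.
        mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        f = [fi - mean for fi in f]
        # f^T L f with L = D - W equals the sum over pairs of W_ij (f_i - f_j)^2.
        smoothness = sum(
            W[i][j] * (f[i] - f[j]) ** 2
            for i in range(n) for j in range(i + 1, n)
        )
        variance = sum(di * fi * fi for fi, di in zip(f, degree))  # f^T D f
        scores.append(smoothness / variance if variance > 0 else float("inf"))
    return scores

# Toy data (hypothetical): feature 0 separates two groups, feature 1 alternates.
X = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.0], [5.0, 1.0], [5.1, 0.0], [5.2, 1.0]]
scores = laplacian_scores(X)
print(scores)
```

In this toy example the group-separating feature receives the smaller score, which is the behaviour a feature-selection step would rely on when keeping the lowest-scoring features.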
Survival Analysis
- Integrated TCGA LGG/GBM data (mutation, CNA, clinical); performed EDA; built survival models targeting OS STATUS/OS MONTHS; computed the top 5 features with Integrated Gradients; built a Cox-PH model with markers and clinical variables; visualized log hazard ratios.
- Code
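To make the Cox-PH step above more concrete, here is a hedged sketch of the quantity such a model minimizes: the negative log partial likelihood. It assumes no tied event times, and the function name, variable names, and toy data are hypothetical rather than taken from the project code. The coefficients `beta` are the log hazard ratios that the analysis visualizes.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox-PH model (assumes no ties)."""
    # Linear predictor (log relative hazard) for each subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i in range(len(X)):
        if not event[i]:
            continue  # censored subjects only appear inside risk sets
        # Risk set: subjects still under observation at subject i's event time.
        risk_sum = sum(
            math.exp(eta[j]) for j in range(len(X)) if time[j] >= time[i]
        )
        nll -= eta[i] - math.log(risk_sum)
    return nll

# Toy data (hypothetical): a higher covariate value goes with an earlier event.
X = [[2.0], [1.0], [0.0]]
time = [1.0, 2.0, 3.0]
event = [1, 1, 1]

nll_null = cox_neg_log_partial_likelihood([0.0], X, time, event)
nll_beta1 = cox_neg_log_partial_likelihood([1.0], X, time, event)
print(nll_null, nll_beta1)
```

With `beta = 0` every subject in a risk set is equally likely to fail next, so the value reduces to the sum of the log risk-set sizes. A positive coefficient fits this toy data better and lowers the loss. In practice a survival library would perform the optimization and handle ties.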
Drug Response Prediction
- Integrated CCLE/GDSC data (mutation, RNA) to build models predicting erlotinib response; enhanced neural network models by incorporating the STRINGDB protein network and applying Bayesian hyperparameter tuning; computed the top 10 features with Integrated Gradients for correlation analysis.
- Code
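Integrated Gradients, used above to rank features, can be sketched on a model small enough to differentiate by hand. The example below assumes a logistic model with made-up weights, an all-zero baseline, and a midpoint Riemann sum for the path integral. None of these choices come from the project code, where a deep learning framework would supply the gradients.

```python
import math

def model(x, w, bias):
    """Logistic model: sigmoid(w . x + bias)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, w, bias):
    """Analytic gradient of the logistic model with respect to the input x."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, bias, steps=200):
    """Approximate IG along the straight path from baseline to x."""
    grad_sum = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint rule on [0, 1]
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad_sum = [g + gi for g, gi in zip(grad_sum, gradient(point, w, bias))]
    # Scale the average gradient by the input's distance from the baseline.
    return [(xi - x0) * g / steps for xi, x0, g in zip(x, baseline, grad_sum)]

# Hypothetical weights and input; the third feature has zero weight.
w, bias = [1.5, -2.0, 0.0], 0.1
x, baseline = [1.0, 0.5, 3.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, bias)
print(attributions)
print(sum(attributions), model(x, w, bias) - model(baseline, w, bias))
```

The two values on the last line should agree closely. This is the completeness property of Integrated Gradients: the attributions sum to the difference between the prediction at the input and at the baseline. It is a useful sanity check before ranking features by absolute attribution.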
Acknowledgments
I would like to express my heartfelt gratitude to the scientific organizers, especially Dr. Altuna Akalın from the Max Delbrück Center, for organizing this course. I am also deeply thankful to Dr. Artür Manukyan, Dr. Bora Uyar, and all the other instructors for their valuable contributions. The course is highly beneficial for anyone interested in machine learning and bioinformatics and provides a strong foundation in both fields.