Welcome to Genomic Insights

Author

Ding Yang Wang

Published

April 5, 2025

This website is a compilation of my personal notes from the course on statistical foundations and machine learning applications in genomics. If you are interested in applications of statistics, spatial omics, multi-omics, and ML/DL in bioinformatics, you may find this resource helpful!

The course combines theoretical lectures with programming exercises, and the analyses are carried out mainly in R and Python. In these notes, however, I focus on the fundamental theoretical concepts and the reasoning behind each step rather than on code implementation.

About the Course

The Compgen2025 course is organized by the Berlin Institute for Medical Systems Biology (BIMSB) of the Max Delbrück Center. It is designed for computational biology PhD students, experimental biologists, and medical scientists who want to strengthen their data analysis skills.

🔗 More details about the course here

Course Notes

Practical Applications

In addition to the theoretical notes, I also worked on several hands-on projects during the course, applying the concepts to real-world problems in computational genomics. Below are some of the complete analyses I conducted:

Visium Analysis

  • This analysis uses 10x Visium data (mouse brain) and covers normalization, dimensionality reduction, clustering, visualization, and spot deconvolution with a single-cell RNA-seq reference for cell-type inference.
  • Code
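
To make the normalization step concrete, here is a minimal Python sketch of library-size normalization followed by a log transform, assuming a spots × genes count matrix held in NumPy. The function name `log_normalize` is hypothetical, and the scale factor of 10,000 mirrors the default of Seurat's LogNormalize method; the actual analysis may have used a different normalization scheme.

```python
import numpy as np

def log_normalize(counts, scale_factor=1e4):
    """Library-size normalize a spots x genes count matrix, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total counts per spot
    totals[totals == 0] = 1.0                   # empty spots stay at zero instead of dividing by 0
    return np.log1p(counts / totals * scale_factor)

# toy example (hypothetical data): 2 spots x 3 genes
counts = np.array([[10, 0, 30], [1, 1, 2]])
norm = log_normalize(counts)
```

Dividing by each spot's total count removes differences in sequencing depth between spots, and the log transform compresses the heavy right tail of count data before PCA and clustering.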

Xenium Analysis

  • Xenium is an in situ spatial transcriptomics platform with subcellular resolution. This lets us examine not only spatial zones of gene expression but also individual cells and their microenvironments. The full analysis includes data filtering, a disk-based data representation for handling large datasets, clustering, marker analysis, and hotspot detection.
  • Code

Xenium Analysis with Image Registration

  • Registered multimodal images (cell assays, molecular assays, H&E, ROI) to integrate spatial and morphological features. Studied spatially resolved gene expression and microenvironment interactions.
  • Code

Breast Cancer Subtype Prediction

  • Integrated METABRIC multi-omics data (gene expression, copy-number alterations, clinical); selected features with the Laplacian score; trained a model with Bayesian hyperparameter tuning; evaluated classification performance.
  • Code
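
The Laplacian score ranks each feature by how well it preserves the local neighborhood structure of the samples: build a k-nearest-neighbor graph with heat-kernel edge weights S, form the graph Laplacian L = D − S (D is the diagonal degree matrix), and score a feature f by the ratio f̃ᵀLf̃ / f̃ᵀDf̃, where f̃ is f with its degree-weighted mean removed. Lower scores mean the feature varies smoothly over the graph. Below is a minimal NumPy sketch of that formulation; the values of `k` and `t` and the toy data are illustrative assumptions, not the settings used in the project.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score per column of a samples x features matrix (lower = better)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # pairwise squared Euclidean distances between samples
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # k nearest neighbors of each sample, excluding the sample itself
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    nbrs = np.argsort(masked, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    # heat-kernel weights on the kNN edges, then symmetrize the graph
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-dist2[rows, cols] / t)
    S = np.maximum(S, S.T)
    d = S.sum(axis=1)            # node degrees (diagonal of D)
    L = np.diag(d) - S           # graph Laplacian L = D - S
    scores = np.full(p, np.inf)  # constant features keep an infinite score
    for r in range(p):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()  # remove the degree-weighted mean
        denom = f_tilde @ (d * f_tilde)  # f~^T D f~
        if denom > 0:
            scores[r] = (f_tilde @ L @ f_tilde) / denom
    return scores

# toy data (hypothetical): feature 0 separates two groups, feature 1 is pure noise
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
X = np.column_stack([labels * 10 + rng.normal(0, 0.1, 40), rng.normal(0, 1, 40)])
scores = laplacian_score(X, k=5, t=10.0)
ranking = np.argsort(scores)  # most structure-preserving features first
```

Because the score is computed per feature, keeping the top-ranked columns of `ranking` gives an unsupervised feature filter that can be applied before training the classifier.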

Survival Analysis

  • Integrated TCGA LGG/GBM data (mutations, copy-number alterations, clinical); performed exploratory data analysis (EDA); built survival models targeting overall survival (OS STATUS/OS MONTHS); identified the top 5 features with Integrated Gradients; built a Cox proportional hazards (Cox-PH) model with markers and clinical variables; visualized log hazard ratios.
  • Code
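
A Cox-PH model is fitted by maximizing the partial likelihood, which compares each patient who has an event with everyone still at risk at that time. The sketch below computes the partial log-likelihood (Breslow handling of ties) and its gradient in NumPy and fits the coefficients by plain gradient ascent; each fitted coefficient is a log hazard ratio, the quantity visualized in the project. The function name, learning rate and toy data are hypothetical; in practice one would use an established implementation such as `CoxPHFitter` from the lifelines package or `coxph` from R's survival package.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Breslow partial log-likelihood of a Cox-PH model and its gradient."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = X @ beta                 # linear predictor x_i^T beta
    w = np.exp(eta)                # relative hazards
    loglik = 0.0
    grad = np.zeros(X.shape[1])
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # risk set R(t_i)
        denom = w[at_risk].sum()
        loglik += eta[i] - np.log(denom)
        # observed covariates minus their hazard-weighted mean over the risk set
        grad += X[i] - (w[at_risk, None] * X[at_risk]).sum(axis=0) / denom
    return loglik, grad

# toy data (hypothetical): one covariate, 5 patients, one censored (event = 0)
X = np.array([[0.5], [1.0], [-0.3], [2.0], [0.1]])
time = np.array([5, 3, 4, 1, 2])
event = np.array([1, 1, 0, 1, 1])

beta = np.zeros(X.shape[1])
for _ in range(1000):              # gradient ascent on the concave log-likelihood
    _, g = cox_partial_loglik(beta, X, time, event)
    beta += 0.1 * g
hazard_ratio = np.exp(beta)        # beta is the log hazard ratio
```

At beta = 0 every patient at risk is equally likely to have the event, so the log-likelihood reduces to minus the sum of the log risk-set sizes, which is a handy sanity check.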

Drug Response Prediction

  • Integrated CCLE/GDSC data (mutations, RNA expression) to build models predicting erlotinib response; enhanced the neural network models by incorporating the STRING (STRINGDB) protein interaction network, with Bayesian hyperparameter tuning; identified the top 10 features with Integrated Gradients for correlation analysis.
  • Code
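
Integrated Gradients attributes a model's prediction to its input features by averaging the gradient along the straight path from a baseline x′ to the input x and multiplying by (x − x′); the attributions approximately sum to F(x) − F(x′). Below is a minimal NumPy sketch using a midpoint Riemann sum, applied to a hypothetical logistic-regression model whose gradient is written out by hand. For real neural networks, a library such as Captum (`captum.attr.IntegratedGradients` for PyTorch) computes the gradients automatically. Ranking features by absolute attribution to pick the "top k" is an assumption here, not the project's documented rule.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Midpoint Riemann approximation of Integrated Gradients."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * (total / steps)    # (x - x') * average path gradient

# hypothetical toy model: logistic regression F(x) = sigmoid(w . x + b)
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def model(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x):
    p = model(x)
    return p * (1.0 - p) * w  # dF/dx for the logistic model

x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros(3)
attr = integrated_gradients(model_grad, x, baseline, steps=200)
top = np.argsort(-np.abs(attr))  # features ranked by |attribution|
```

The completeness property (attributions summing to F(x) − F(x′)) is a useful check that the number of integration steps is large enough.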

Acknowledgments

First of all, I would like to express my heartfelt gratitude to the Scientific Organization, especially to Dr. Altuna Akalın from the Max Delbrück Center, for organizing this course. I am also deeply thankful to Dr. Artür Manukyan, Dr. Bora Uyar, and all the other instructors for their valuable contributions. This course is highly beneficial for those interested in machine learning and bioinformatics, providing a strong foundation in these fields.