Statistics for Genomics

Author

Ding Yang Wang

Published

April 3, 2025

1. Statistics for Genomics: Establishing a Reliable Base

Genomic research begins with complex datasets, such as gene expression levels, which contain natural variation and experimental noise. Statistics provides a systematic way to process and interpret this data, ensuring a dependable foundation for further analysis. Below are the key statistical steps and their roles:

1.1 Summarizing Data: Finding “Typical” and “Spread”

Gene expression data often varies widely across samples, so we need a “typical” value to grasp the big picture. The mean and median both summarize central tendency: the mean uses every observation but can be pulled by outliers, while the median is robust to them, which makes it the better summary for skewed datasets. Variance and standard deviation quantify the data’s spread; a large spread can reflect genuine biological diversity or point to experimental error.
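As a minimal sketch of these summaries, the snippet below uses Python's standard `statistics` module on a made-up set of expression values for a single gene. The numbers are hypothetical and include one deliberate outlier to show how it pulls the mean away from the median.

```python
import statistics

# Hypothetical expression values for one gene across six samples;
# the last value is an intentional outlier.
expression = [5.1, 4.8, 5.3, 5.0, 4.9, 25.0]

mean_value = statistics.mean(expression)           # uses every value, sensitive to the outlier
median_value = statistics.median(expression)       # middle of the sorted values, robust
sample_variance = statistics.variance(expression)  # sample variance (n - 1 denominator)
sample_sd = statistics.stdev(expression)           # square root of the sample variance

print(f"mean = {mean_value:.2f}, median = {median_value:.2f}")
print(f"variance = {sample_variance:.2f}, sd = {sample_sd:.2f}")
```

Here the single outlier raises the mean well above the median, which is the usual signal to inspect the data before relying on the mean alone.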

1.2 Distribution Models: Understanding Data Shape

Data frequently follows recognizable patterns: the normal distribution is a common model for continuous measurements such as (log-transformed) gene expression, and the Poisson distribution for discrete counts such as mutations. Identifying the distribution guides the choice of analytical method, because many methods rely on specific assumptions about the shape of the data.
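The sketch below illustrates both models on hypothetical data, using only the Python standard library. The helper `poisson_pmf`, the sample values, and the 6.5 threshold are assumptions made for illustration. The variance-to-mean ratio is a simple, informal check of the Poisson assumption, not a formal goodness-of-fit test.

```python
import math
from statistics import NormalDist, mean, variance


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical mutation counts in ten genomic windows.
mutation_counts = [0, 1, 2, 1, 0, 3, 1, 2, 1, 1]
lam_hat = mean(mutation_counts)  # the sample mean estimates the Poisson rate

# A Poisson variable has variance equal to its mean, so a ratio far above 1
# suggests overdispersion and a poor fit for the Poisson model.
dispersion = variance(mutation_counts) / lam_hat
p_zero = poisson_pmf(0, lam_hat)  # modeled chance of a window with no mutations

# Hypothetical log2 expression values, modeled as normally distributed.
log_expression = [5.8, 6.1, 5.9, 6.3, 6.0, 5.7, 6.2, 6.0]
fitted = NormalDist.from_samples(log_expression)
p_above = 1 - fitted.cdf(6.5)  # modeled chance of a value above 6.5

print(f"rate = {lam_hat:.2f}, dispersion = {dispersion:.2f}, P(0) = {p_zero:.3f}")
print(f"normal fit: mean = {fitted.mean:.2f}, sd = {fitted.stdev:.2f}, P(>6.5) = {p_above:.4f}")
```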

1.3 Confidence Intervals: Measuring Uncertainty

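We estimate population quantities (e.g., average gene expression) from samples. A confidence interval gives a range of plausible values for the true quantity: with a 95% procedure, about 95% of the intervals built from repeated samples would contain the true value. The width of the interval shows how precise the estimate is, and therefore how much trust we can place in it.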
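A minimal sketch of a 95% confidence interval for a mean, assuming ten hypothetical log2 expression measurements that are roughly normally distributed. The critical value 2.262 is the two-sided 95% value of the t distribution with 9 degrees of freedom. It is hard-coded here to keep the example dependency-free; for other sample sizes it would need to be looked up or computed with a statistics library.

```python
import math
import statistics

# Hypothetical log2 expression values for one gene in ten samples.
expression = [7.2, 6.9, 7.5, 7.1, 6.8, 7.4, 7.0, 7.3, 6.7, 7.1]

n = len(expression)
sample_mean = statistics.mean(expression)
standard_error = statistics.stdev(expression) / math.sqrt(n)

# Two-sided 95% critical value of the t distribution with n - 1 = 9
# degrees of freedom (valid only for n = 10).
T_CRIT_95_DF9 = 2.262

margin = T_CRIT_95_DF9 * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"mean = {sample_mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Collecting more samples shrinks the standard error, which narrows the interval around the estimate.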

1.4 Hypothesis Testing: Confirming Real Differences

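We compare groups, such as healthy vs. diseased cells. Hypothesis tests (e.g., t-test, moderated t-test, ANOVA, Wilcoxon signed-rank test, Mann-Whitney U test) produce a p-value: the probability of seeing a difference at least as large as the observed one if there were no true difference between the groups. A small p-value (conventionally < 0.05) suggests the difference is unlikely to be due to chance alone. With thousands of genes tested at once, some small p-values will arise by chance, so multiple testing corrections (e.g., Bonferroni, FDR) are used to curb false positives.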
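To make the effect of multiple testing correction concrete, here is a small sketch in plain Python. It assumes the per-gene p-values have already been produced by one of the tests above; the five values are hypothetical. The function names `bonferroni` and `benjamini_hochberg` are illustrative, and Benjamini-Hochberg is used as one common FDR procedure.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20]
alpha = 0.05

n_raw = sum(p < alpha for p in p_values)
n_bonferroni = sum(p < alpha for p in bonferroni(p_values))
n_bh = sum(p < alpha for p in benjamini_hochberg(p_values))

print(f"significant at {alpha}: raw = {n_raw}, Bonferroni = {n_bonferroni}, BH = {n_bh}")
```

On these values, Bonferroni keeps the fewest genes and Benjamini-Hochberg sits between it and the uncorrected count, reflecting the usual trade-off between strictness and power.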

2. Overall Purpose

Statistics turns raw data into a trustworthy base, revealing what’s happening and what’s worth investigating.
