1 Description

This R Markdown document demonstrates how to:

  • Perform image registration to align transcriptomic and histological data.
  • Integrate cell-level and molecule-level spatial transcriptomics (Xenium).
  • Combine tissue morphology from H&E with molecular insights to study spatial infection patterns in COVID-19 lungs.

We will use:

  • Xenium breast cancer dataset with accompanying H&E image.
  • A Xenium COVID-19 lung dataset with annotated hyaline membrane regions and SARS-CoV-2 viral RNAs (S2_N, S2_orf1ab).

If you want to reproduce the experiment: 👉 Click here to the Docker and the data setup instructions

1.1 Description of the Files

  • Xen_R1: Processed Xenium object of breast cancer section (cells).
  • Xen_R1_image: Corresponding H&E image.
  • vr2_merged_acute1: COVID-19 Xenium dataset with both cells and molecules (Xenium and Xenium_mol).
  • acutecase1_membrane.geojson: Annotated hyaline membrane regions.
  • acutecase1_heimage.jpg: H&E image of lung tissue.

2 Load Required Libraries

library(VoltRon)
library(dplyr)
library(Seurat)
library(ComplexHeatmap)
library(basilisk)
library(vitessceR)
library(ggnewscale)

3 Image Registration – Xenium Breast Cancer Dataset

3.1 Why Image Registration?

Spatial transcriptomics often captures DAPI-based coordinates, while histology gives tissue context (e.g., H&E). Registration bridges these layers. These integrated features enables morphology-guided cell analysis:

For example:

  • Cell localization within tissue landmarks
  • Detection of tumor margins, immune cell hotspots, etc.

3.2 Import Data

Xen_R1 <- loadVoltRon("../workshop/data/ondisk/Xen_R1/")

We store them on disk using an HDF5VoltRon format. Disk-backed formats allow us to analyze large spatial datasets without crashing R due to memory limits.

Xen_R1_image <- importImageData("../workshop/data/BreastCancer/Xenium_R1/Xenium_FFPE_Human_Breast_Cancer_Rep1_he_image_highres.tif",
                                sample_name = "HEimage",
                                channel_names = "H&E", tile.size = 100)
Xen_R1_image_disk <- saveVoltRon(Xen_R1_image, 
                                 format = "HDF5VoltRon", 
                                 output = "../data/ondisk/Xen_R1_image/", 
                                 replace = TRUE)

# load H&E data from disk
Xen_R1_image_disk <- loadVoltRon("../data/ondisk/Xen_R1_image/")

3.2.1 Perform Registration

We now perform automated image registration. Behind the scenes, this detects features (e.g., nuclei, shapes) and applies homography to align two spatial maps.

xen_reg <- registerSpatialData(object_list = list(Xen_R1, Xen_R1_image_disk)) # This will give you a interface to do the registration for two images like below

We then transfer the aligned image and visualize the clusters overlayed on H&E.

Xenium_reg <- xen_reg$registered_spat[[2]]
vrImages(Xen_R1[["Assay1"]], channel = "H&E") <- vrImages(Xenium_reg, name = "main_reg", channel = "H&E")
vrSpatialPlot(Xen_R1, group.by = "Clusters", channel = "H&E")
## Warning: Converting to a dense matrix may use excessive memory
## This message is displayed once every 8 hours.
## Warning: Removed 347 rows containing missing values or values outside the scale range
## (`geom_tile()`).


4 Multi-omics Integration – COVID-19 Lung

4.1 Biological Background

COVID-19 lungs contain viral RNAs (S2_N, S2_orf1ab), some inside cells, some in hyaline membrane regions (mucosal damage zones). We analyze:

  • Cell types: Immune and structural cells.
  • Molecules: Viral RNA transcripts.

4.2 Biological Question

We study:

  • Which cells (e.g., immune, stromal) are present in COVID-affected lung
  • Where viral RNA molecules (e.g., S2_N, ORF1ab) appear
  • How spatial location (e.g., hyaline membrane) modulates infection

4.3 Import Xenium Object

vr2_merged_acute1 <- readRDS(file = "../workshop/data/COVID19/acutecase1_annotated.rds")
SampleMetadata(vr2_merged_acute1)
##             Assay    Layer       Sample
## Assay7     Xenium Section1 acute case 1
## Assay8 Xenium_mol Section1 acute case 1

4.4 Visualize Cells & Molecules

Xenium captures not only transcripts within cells but also those outside the cells. They use DAPI staining to mark the boundary and calculate the transcripts within the cells to get the Cell assay. This allows Xenium to provide Cell assay and Molecule assay. This feature provides a more comprehensive understanding of spatial transcriptomics.

# Cells
p1 <- vrSpatialPlot(vr2_merged_acute1, assay = "Xenium", group.by = "CellType", plot.segments = TRUE)

# Molecules
p2 <- vrSpatialPlot(vr2_merged_acute1, assay = "Xenium_mol", group.by = "gene", group.ids = c("S2_N", "S2_orf1ab"), n.tile = 500)

p3 <- vrSpatialPlot(vr2_merged_acute1, assay = "Xenium_mol", group.by = "gene", group.ids = c("S2_N", "S2_orf1ab"), n.tile = 500) |>
  addSpatialLayer(vr2_merged_acute1, assay = "Xenium", group.by = "CellType", plot.segments = TRUE)
(p1 | p2 | p3)
## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_tile()`).
## Removed 54 rows containing missing values or values outside the scale range
## (`geom_tile()`).

💬 Explanation: Xenium can capture all transcripts with in the slides. Molecules often lie outside cells, particularly in damaged zones, making tissue morphology crucial for interpretation.

Extra Notes: The expression of S2_N and S2_orf1ab genes are done by 10X company. Also, we don’t know the expression here. I think they capture those gene with some threshold and everything above the threshold they will count as a dot in in image. That’s why we do not discuss the expression for S2_N and S2_orf1ab genes but discuss the counts of them!


5 H&E Alignment with ROI

5.1 Import & Register H&E Image

We load an H&E image annotated by a pathologist for hyaline membranes. These are regions of epithelial damage. With this, we could study how hyaline membrane affects COVID-19 activity.

After registration, we register the image and its annotations to our main dataset.

imgdata <- importImageData("../workshop/data/COVID19/acutecase1_heimage.jpg", tile.size = 10,
                           segments = "../workshop/data/COVID19/acutecase1_membrane.geojson", sample_name = "acute case 1 (HE)",
                           channel_names = "H&E")
imgdata <- flipCoordinates(imgdata, assay = "ROIAnnotation")
vr2_merged_acute1 <- modulateImage(vr2_merged_acute1, brightness = 300)
xen_reg <- registerSpatialData(object_list = list(vr2_merged_acute1, imgdata)) # This will have interface like below

5.2 Transfer Image and ROI

imgdata_reg <- xen_reg$registered_spat[[2]]
vrImages(vr2_merged_acute1[["Assay7"]], name = "main", channel = "H&E") <- vrImages(imgdata_reg, assay = "Assay1", name = "main_reg")
vrImages(vr2_merged_acute1[["Assay8"]], name = "main", channel = "H&E") <- vrImages(imgdata_reg, assay = "Assay1", name = "main_reg")

# Add ROI Assay
vr2_merged_acute1 <- addAssay(vr2_merged_acute1,
                              assay = imgdata_reg[["Assay2"]],
                              metadata = Metadata(imgdata_reg, assay = "ROIAnnotation"),
                              assay_name = "ROIAnnotation",
                              sample = "acute case 1", layer = "Section1")
vrMainSpatial(vr2_merged_acute1[["Assay9"]]) <- "main_reg"
vrMainAssay(vr2_merged_acute1) <- "ROIAnnotation"
vr2_merged_acute1$Region <- "Hyaline Membrane"

p1 <- vrSpatialPlot(vr2_merged_acute1, assay = "Xenium_mol", group.by = "gene", 
              group.ids = c("S2_N", "S2_orf1ab"), n.tile = 500) |>
  addSpatialLayer(vr2_merged_acute1, assay = "ROIAnnotation", 
                  group.by = "Region", alpha = 0.3, spatial = "main_reg",
                  colors = list(`Hyaline Membrane` = "blue")) 
p2 <- vrSpatialPlot(vr2_merged_acute1, assay = "ROIAnnotation", 
              group.by = "Region", alpha = 0.3, spatial = "main_reg",
              colors = list(`Hyaline Membrane` = "blue"))
(p1 | p2)
## Warning: Duplicated `override.aes` is ignored.
## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_tile()`).

💬 Explanation: Here, can see the images are overlapping. But we still don’t know how many S2_N and S2_orf1ab are there within hyaline membrane. Therefore, we need to do the following steps.


6 Transfer ROI to Molecule Data

To assess whether molecules are enriched in hyaline membrane zones. This allows us to tag each molecule with the region it resides in — helping quantify infection by zone.

vrSpatialNames(vr2_merged_acute1, assay = "all")
vrMainSpatial(vr2_merged_acute1[["Assay9"]]) <- "main_reg"
vr2_merged_acute1 <- transferData(object = vr2_merged_acute1, 
                                  from = "Assay9", 
                                  to = "Assay8", 
                                  features = "Region")

6.1 Summary of Molecule Counts

s2_summary_hyaline <- Metadata(vr2_merged_acute1, assay = "Xenium_mol") %>%
  filter(gene %in% c("S2_N", "S2_orf1ab"), Region == "Hyaline Membrane") %>%
  summarise(S2_N = sum(gene == "S2_N"), 
            S2_orf1ab = sum(gene == "S2_orf1ab"), 
            ratio = sum(gene == "S2_N")/sum(gene == "S2_orf1ab")) %>% 
  as.matrix()
s2_summary_hyaline
##       S2_N S2_orf1ab    ratio
## [1,] 50323     32769 1.535689

🧠 Interpretation: S2_N / S2_orf1ab ratio informs us about viral replication. A higher ratio indicates more active replication. So from this observation, we can say the viral is growing within the hyaline membrane.

If you wonder why we use ratio instead of expression, see the Explanation section in here.


7 Interactive Visualization

After knowing the viral is growing within the hyaline membrane, we want to explore more about the Xenium dataset. The annotation step doesn’t show here but bascially just annotate the cells base on their gene expression.

Here, we have a cell type called “H.I. Cells”, which means highly infected cells.

We also curious about whether there is a region have lots of “H.I. Cells” and is COVID-19 highly active in the region or not.

## Use can use Rstduio locally, You can use this function.
as.AnnData(vr2_merged_acute1, assay = "Assay7", file = "../workshop/data/ondisk/COVID19/vr2_merged_acute1_annotated_registered.zarr", 
           flip_coordinates = TRUE, create.ometiff = TRUE, name = "main", channel = "H&E")


# If you use Docker, use this function! But the function may not as flexible as the function above
# vrSpatialPlot(vr2_merged_acute1, assay = "Xenium",channel = "H&E", group.by = "CellType", interactive = TRUE)

🧠 Interpretation: We could see there indeed has a region that is full of “H.I. Cells”. As we zoom in, we could see there just like a battle zone of the cells. So now we curious about what’s the S2_N / S2_orf1ab ratio.


8 Region-Based Comparison

Before calculating the S2_N / S2_orf1ab ratio, we need to mark down this region. Here, we used an interactive annotation interface to define infected zones. You can see the video as a demo.

# Manually annotate infected zones
vr2_merged_acute1 <- annotateSpatialData(vr2_merged_acute1, assay = "Xenium_mol", label = "Region2", 
                                         use.image = TRUE, channel = "H&E", annotation_assay = "InfectedAnnotation")
# The functionwill give you GUI interface. I use video for demo here

8.1 Integrate with other layers

Here, we integrated the infected region with “S2_N”, “S2_orf1ab” image and hyaline membrane image.

vrSpatialPlot(vr2_merged_acute1, assay = "Xenium_mol", group.by = "gene", 
              group.ids = c("S2_N", "S2_orf1ab"), n.tile = 500) |>
  addSpatialLayer(vr2_merged_acute1, assay = "ROIAnnotation", group.by = "Region", alpha = 0.3, spatial = "main_reg",
                  colors = list(`Hyaline Membrane` = "blue")) |>
  addSpatialLayer(vr2_merged_acute1, assay = "InfectedAnnotation", group.by = "Region2", alpha = 0.8, 
                  colors = list(`Infected` = "yellow"))
## Warning: Duplicated `override.aes` is ignored.
## Duplicated `override.aes` is ignored.
## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_tile()`).

Now, we can compare the S2_N / S2_orf1ab ratio of infected region with hyaline membrane region.

# Compare abundance
s2_summary_infected <- 
  Metadata(vr2_merged_acute1, assay = "Xenium_mol") %>%
  filter(gene %in% c("S2_N", "S2_orf1ab"), 
         Region2 == "Infected") %>% 
  summarise(S2_N = sum(gene == "S2_N"), 
            S2_orf1ab = sum(gene == "S2_orf1ab"), 
            ratio = sum(gene == "S2_N")/sum(gene == "S2_orf1ab")) %>% 
  as.matrix()

rbind(s2_summary_infected, s2_summary_hyaline)
##       S2_N S2_orf1ab    ratio
## [1,]  6437      2421 2.658819
## [2,] 50323     32769 1.535689

🧠 Interpretation: We could see the infected region has higher S2_N / S2_orf1ab ratio than hyaline membrane region. But how significant it is?

Here, we do the fish test to see how significant the infected region is higher than hyaline membrane region.

# check all abundances
S2_table <- matrix(c(s2_summary_hyaline[,1:2], 
                     s2_summary_infected[,1:2]), 
                   dimnames = list(Region = c("Hyaline", "Infected"), 
                                   S2 = c("N", "orf1ab")),
                   ncol = 2,  byrow = TRUE)
fisher.test(S2_table, alternative = "two.sided")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  S2_table
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.5498699 0.6065638
## sample estimates:
## odds ratio 
##  0.5775876

🧠 Interpretation: With p-value 2.2e-16, we could confidently say the infected region has higher ratio than hyaline membrane region.


9 Hotspot Analysis

Using a spatial graph, we locate clusters of high local expression (i.e., hotspots) of S2_N and ORF1ab.

vr2_merged_acute1 <- getSpatialNeighbors(vr2_merged_acute1, assay = "Xenium_mol", 
                                         group.by = "gene", group.ids = c("S2_N", "S2_orf1ab"),
                                         method = "radius", radius = 50)

vr2_merged_acute1 <- getHotSpotAnalysis(vr2_merged_acute1, assay = "Xenium_mol", 
                                        features = "gene", graph.type = "radius")
                                        
vrSpatialPlot(vr2_merged_acute1, assay = "Xenium_mol", 
              group.by = "gene_hotspot_flag", group.ids = c("cold", "hot"), 
              alpha = 1, background.color = "white")


10 Summary

This pipeline illustrates how integrating multiple spatial data types can enhance biological insight:

  • Image registration aligns cells and morphology.
  • ROI annotation enables pathology-informed molecular analysis.
  • Multi-layer visualization reveals complex infection patterns.
  • Ratio analysis and hotspot detection highlight viral activity and tissue damage zones.

Together, they provide a powerful framework for spatial biology and disease pathology studies.


11 Environment & Dataset Setup

11.1 Run RStudio in Docker

Use a pre-built Docker image with VoltRon and all dependencies pre-installed.

  1. Make sure Docker Desktop is installed.

  2. Pull the image:

docker pull amanukyan1385/rstudio-voltron:main
  1. Start the container:
docker run --rm -ti \
  -e PASSWORD=yourpassword \
  -p 8787:8787 \
  -v Path/to/your/dataset:/home/rstudio/project \
  amanukyan1385/rstudio-voltron:main
  1. Access RStudio:
    http://localhost:8787
    • Username: rstudio
    • Password: yourpassword # you can change your own password

You can also use Docker Desktop GUI if you prefer volume mounting.


11.2 Datasets

Download the complete dataset bundle used in this notebook:

Download all datasets (ZIP) - including datasets for other markdown


11.2.1 Original Dataset Sources