2025-03-20

Remember that setwd() does not persist across chunks in R Markdown: the document uses the folder it is saved in as its working directory. If you need data from other folders, give the full path.
For each point of the exercise, include the text of that point.
Write at least one code chunk for each exercise (you can use more than one if you think it is easier to understand).
Adjust figure dimensions or other figure parameters so that the plots are readable.
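
As a minimal sketch of the first and last points above: the folder name below is hypothetical (replace it with the real location of Datasets on your machine), and the figure sizes are arbitrary starting values, not recommended settings.

```r
# Hypothetical full path to the data folder; adjust to your own machine.
data_dir <- "/home/student/course/Datasets"

# file.path() joins path components with "/" on every platform.
expr_path <- file.path(data_dir, "Spatial_expression_data.rds")
print(expr_path)

# Default figure size (in inches) for all following chunks.
# Only runs if knitr is installed; the values are arbitrary.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(fig.width = 8, fig.height = 6)
}
```

The same fig.width and fig.height options can also be set in the header of a single chunk when only one figure needs a different size.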

Spatial_expression_data.rds (in Datasets): contains normalized gene expression values for genes annotated in KEGG pathways. (Load it into the expr object.)
Spatial_expression_annotation_genes.csv: contains information about marker genes for clusters of spots. (Load it into the mark object.)
Spatial_expression_annotation_pathways.rds: contains gene annotations (the pathway each gene belongs to). (Load it into the path object.)
Spatial_expression_annotation_clusters.rds: contains the assignment of spots to clusters. (Load it into the clus object.)
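
One possible way to load the four files is sketched below. The relative folder "Datasets" and the helper read_if_present() are assumptions made for illustration; the helper only exists so that the sketch runs (returning NULL) even when the files are not where it expects them.

```r
# Assumption: the data folder sits next to the .Rmd file; otherwise use the full path.
data_dir <- "Datasets"

# Hypothetical helper: read a file with the given reader, or return NULL if it is missing.
read_if_present <- function(file, reader) {
  full_path <- file.path(data_dir, file)
  if (!file.exists(full_path)) {
    message("File not found: ", full_path)
    return(NULL)
  }
  reader(full_path)
}

# .rds files are read with readRDS(), the .csv file with read.csv().
expr <- read_if_present("Spatial_expression_data.rds", readRDS)
mark <- read_if_present("Spatial_expression_annotation_genes.csv", read.csv)
path <- read_if_present("Spatial_expression_annotation_pathways.rds", readRDS)
clus <- read_if_present("Spatial_expression_annotation_clusters.rds", readRDS)
```

After loading, functions such as str(), dim() and head() help to check what each object actually contains before manipulating it.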

Remember: you must extract all information and manipulate the data using only R functions. Make your code reproducible.

Keep only genes that are markers for at least three clusters in mark (you can explore the rowSums() function).
Keep only spots in clus that belong to annotated clusters (the ones for which markers are available in mark).
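
The two filters above can be illustrated on toy data. The real layout of mark and clus is not described here, so the sketch assumes a gene-by-cluster 0/1 marker table and a named vector of cluster labels per spot; all names and values are made up.

```r
# Toy marker table (hypothetical): rows are genes, columns are clusters, 1 = marker.
toy_mark <- matrix(
  c(1, 1, 1, 0,
    1, 0, 0, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("c0", "c1", "c2", "c3"))
)

# rowSums() counts, for each gene, how many clusters it marks.
n_clusters <- rowSums(toy_mark == 1)
keep_genes <- names(n_clusters)[n_clusters >= 3]

# Toy cluster assignment (hypothetical): one label per spot; "c9" has no markers.
toy_clus <- c(spot1 = "c0", spot2 = "c9", spot3 = "c3")

# Keep spots whose cluster appears among the annotated clusters.
keep_spots <- names(toy_clus)[toy_clus %in% colnames(toy_mark)]

print(keep_genes)
print(keep_spots)
```

With the real objects, the same idea applies once the marker information has been brought into a comparable shape.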

Then:

Make a complex heatmap with genes on rows and spots on columns. Change the color scale to make it meaningful.

Add annotations on both rows (pathway annotation) and columns (spot clusters).
Split columns according to clusters.
Add a barplot annotation on rows showing, for each gene, the total number of genes in the pathway it belongs to (according to the entire KEGG annotation).
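
The heatmap steps above can be sketched on toy data, assuming the ComplexHeatmap and circlize packages are the intended tools. All data, names and colors below are hypothetical, and the plotting part only runs if both packages are installed.

```r
set.seed(1)  # toy data only
genes <- paste0("gene", 1:6)
spots <- paste0("spot", 1:8)
toy_expr <- matrix(rnorm(48), nrow = 6, dimnames = list(genes, spots))
toy_cluster <- rep(c("0", "1"), each = 4)  # cluster label of each spot

# Hypothetical full pathway annotation; it includes genes absent from the heatmap.
toy_path <- data.frame(
  gene    = c(genes, "gene7", "gene8"),
  pathway = c("P1", "P1", "P2", "P2", "P2", "P1", "P1", "P1")
)

# Pathway size is computed on the entire annotation, then mapped back to each gene.
path_size      <- table(toy_path$pathway)
gene_pathway   <- toy_path$pathway[match(genes, toy_path$gene)]
gene_path_size <- as.numeric(path_size[gene_pathway])

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  # Diverging scale centered on 0; the break points are arbitrary.
  col_fun <- circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))
  ht <- ComplexHeatmap::Heatmap(
    toy_expr, name = "expression", col = col_fun,
    column_split = toy_cluster,  # one column block per cluster
    top_annotation = ComplexHeatmap::HeatmapAnnotation(cluster = toy_cluster),
    left_annotation = ComplexHeatmap::rowAnnotation(pathway = gene_pathway),
    right_annotation = ComplexHeatmap::rowAnnotation(
      pathway_size = ComplexHeatmap::anno_barplot(gene_path_size)
    )
  )
  ComplexHeatmap::draw(ht)
}
```

A scale centered on zero suits values that have been centered or scaled per gene; for raw normalized values, break points taken from the data range may be more meaningful.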

Use Spatial_expression_annotation_genes.csv in Datasets (you have already loaded it).

Compare the markers for clusters 0, 1 and 3 using a Venn diagram. You can try adding colors and changing the labels.
Select the genes that are common to all three clusters and make a boxplot for each gene (use facets) to see whether expression values differ between the three clusters.
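
A minimal sketch of these two steps on made-up data follows. The marker lists, gene names and expression values are hypothetical; VennDiagram and ggplot2 are one possible choice of packages, not the required ones, and the plotting parts only run if they are installed.

```r
# Hypothetical marker lists; the list names become the Venn labels.
markers <- list(
  "Cluster 0" = c("g1", "g2", "g3", "g4"),
  "Cluster 1" = c("g2", "g3", "g5"),
  "Cluster 3" = c("g2", "g3", "g4", "g6")
)

# Genes shared by all three clusters.
common <- Reduce(intersect, markers)
print(common)

if (requireNamespace("VennDiagram", quietly = TRUE)) {
  # filename = NULL returns a grid object instead of writing an image file.
  venn <- VennDiagram::venn.diagram(
    markers, filename = NULL,
    fill = c("tomato", "steelblue", "gold"),
    category.names = names(markers)
  )
  grid::grid.newpage()
  grid::grid.draw(venn)
}

# Toy long-format table: one row per gene, cluster and spot.
set.seed(1)
toy_long <- expand.grid(gene = common, cluster = c("0", "1", "3"),
                        spot = 1:5, stringsAsFactors = FALSE)
toy_long$expression <- rnorm(nrow(toy_long))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # One panel per gene, one box per cluster.
  p <- ggplot2::ggplot(toy_long,
                       ggplot2::aes(x = cluster, y = expression, fill = cluster)) +
    ggplot2::geom_boxplot() +
    ggplot2::facet_wrap(~ gene)
  print(p)
}
```

With the real data, the long-format table would be built from expr and clus, restricted to the common genes and to spots in the three clusters.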