Exercise summary

  1. Read the following two RDS files, available in the Datasets folder:


How many genes are in each table?
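Below is a minimal base-R sketch of this step. It assumes each RDS file stores a data frame with one row per gene. The file names are not given above, so the example saves a small hypothetical table to a temporary file and reads it back. With the real data, pass the path of each file in the Datasets folder to readRDS().

```r
# Hypothetical toy table standing in for one of the RDS files (one row per gene)
toy <- data.frame(
  gene = c("GENE1", "GENE2", "GENE3"),
  log2FoldChange = c(1.5, -2.1, 0.3),
  padj = c(0.001, 0.02, 0.6)
)
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)

# Read the table back; with the real data, use the file path from the Datasets folder
pc3 <- readRDS(path)

# Number of genes = number of rows, assuming one row per gene
n_genes <- nrow(pc3)
print(n_genes)
```

If a file holds a DESeq2 DESeqResults object rather than a data frame, as.data.frame() converts it before you count rows.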


  2. For each dataset, select differentially expressed genes, defined as genes with:


Create a new column diff based on the sign of log2FoldChange:

How many differentially expressed genes are there per cell line model? How many of them are up-regulated and how many are down-regulated?


Summarize the results in a table.
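Here is one way to implement the selection and the summary table, sketched on hypothetical toy data. Several pieces are assumptions: the column names log2FoldChange and padj (taken from DESeq2 output), the placeholder cutoffs (padj < 0.05, |log2FoldChange| > 1), and the helper select_de(), which is hypothetical. Substitute the criteria listed in the exercise.

```r
# Hypothetical toy results standing in for the PC3 and VCAP tables
# (DESeq2-style columns log2FoldChange and padj are assumed)
pc3_res <- data.frame(
  gene = paste0("GENE", 1:6),
  log2FoldChange = c(2.3, -1.8, 0.4, 3.1, -0.2, -2.7),
  padj = c(0.001, 0.01, 0.03, 0.2, 0.5, 0.0001)
)
vcap_res <- data.frame(
  gene = paste0("GENE", 1:4),
  log2FoldChange = c(1.2, 2.5, -3.0, 0.1),
  padj = c(0.01, 0.04, 0.001, 0.01)
)

# Hypothetical helper with placeholder cutoffs: replace with the exercise's criteria
select_de <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  # padj can be NA in DESeq2 output, so drop those rows explicitly
  keep <- !is.na(res$padj) & res$padj < padj_cutoff &
    abs(res$log2FoldChange) > lfc_cutoff
  de <- res[keep, ]
  # Direction from the sign of log2FoldChange
  de$diff <- ifelse(de$log2FoldChange > 0, "up", "down")
  de
}

de_list <- list(PC3 = select_de(pc3_res), VCAP = select_de(vcap_res))

# One summary row per cell line model
summary_tab <- do.call(rbind, lapply(names(de_list), function(model) {
  de <- de_list[[model]]
  data.frame(model = model, total = nrow(de),
             up = sum(de$diff == "up"), down = sum(de$diff == "down"))
}))
print(summary_tab)
```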


  3. Create a Venn diagram showing the intersection of differentially expressed genes between VCAP and PC3.
    Ensure that each set is labeled with its cell line name.
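The numbers behind a two-set Venn diagram are the two set differences and the intersection. This sketch computes them in base R for hypothetical gene lists. Storing the sets in a list named by cell line is what lets a plotting package label each circle. The ggVennDiagram call is left as a comment and assumes that package is installed.

```r
# Hypothetical DE gene identifiers for each cell line, in a named list
de_genes <- list(
  PC3  = c("GENE1", "GENE2", "GENE3", "GENE4"),
  VCAP = c("GENE3", "GENE4", "GENE5")
)

# Region sizes of a two-set Venn diagram
only_pc3  <- setdiff(de_genes$PC3, de_genes$VCAP)
only_vcap <- setdiff(de_genes$VCAP, de_genes$PC3)
shared    <- intersect(de_genes$PC3, de_genes$VCAP)
venn_counts <- c(PC3_only = length(only_pc3),
                 shared = length(shared),
                 VCAP_only = length(only_vcap))
print(venn_counts)

# Assuming ggVennDiagram is installed, it can draw the diagram from the
# named list; by default the list names become the set labels:
# ggVennDiagram::ggVennDiagram(de_genes)
```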

  4. Create an UpSet plot to visualize the intersections of the following gene sets:
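The gene sets to compare are not listed above, so this sketch uses four hypothetical sets: up- and down-regulated genes per cell line. It builds the binary gene-by-set membership matrix that an UpSet plot summarizes, then counts the size of each exact intersection. The commented line shows the UpSetR calls that would draw the plot from the same named list, assuming that package is installed.

```r
# Hypothetical gene sets; the exercise lists the actual sets to compare
sets <- list(
  PC3_up    = c("GENE1", "GENE2", "GENE3"),
  PC3_down  = c("GENE4", "GENE5"),
  VCAP_up   = c("GENE1", "GENE3", "GENE6"),
  VCAP_down = c("GENE5", "GENE7")
)

# Binary membership matrix: one row per gene, one column per set
all_genes <- sort(unique(unlist(sets)))
membership <- sapply(sets, function(s) as.integer(all_genes %in% s))
rownames(membership) <- all_genes

# Label each gene by the exact combination of sets it belongs to
combo <- apply(membership, 1, function(r)
  paste(colnames(membership)[r == 1], collapse = "&"))

# Size of each exact intersection (the bars of an UpSet plot)
intersection_sizes <- sort(table(combo), decreasing = TRUE)
print(intersection_sizes)

# UpSetR::upset(UpSetR::fromList(sets), nsets = length(sets))
```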

  5. Select genes that are up-regulated in both the PC3 and VCAP cell models, and add the corresponding log2FoldChange values from both the PC3 and VCAP experiments. How many such genes are there?

Now, create a scatter plot using:

Evaluate the correlation between the PC3 and VCAP log2FoldChange values.
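Below is a base-R sketch of the merge, the count, the scatter plot and the correlation for genes up-regulated in both models. It runs on hypothetical toy tables that already carry the diff column. merge() keeps only genes present in both inputs, and its suffixes argument gives each model its own log2FoldChange column. Running the same code with diff == "down" handles the down-regulated genes.

```r
# Hypothetical DE tables with a diff column ("up"/"down")
pc3_de <- data.frame(
  gene = paste0("GENE", 1:6),
  log2FoldChange = c(2.0, 1.5, 3.2, -2.2, 1.1, 2.6),
  diff = c("up", "up", "up", "down", "up", "up")
)
vcap_de <- data.frame(
  gene = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE6", "GENE7"),
  log2FoldChange = c(2.4, 1.2, 2.9, -1.7, 2.1, 1.8),
  diff = c("up", "up", "up", "down", "up", "up")
)

# Genes up-regulated in both models, one log2FoldChange column per model
up_both <- merge(
  pc3_de[pc3_de$diff == "up", c("gene", "log2FoldChange")],
  vcap_de[vcap_de$diff == "up", c("gene", "log2FoldChange")],
  by = "gene", suffixes = c("_PC3", "_VCAP")
)
nrow(up_both)

# Scatter plot of PC3 vs VCAP fold changes with a dashed least-squares line
plot(up_both$log2FoldChange_PC3, up_both$log2FoldChange_VCAP,
     xlab = "log2FoldChange (PC3)", ylab = "log2FoldChange (VCAP)",
     main = "Up-regulated in both")
abline(lm(log2FoldChange_VCAP ~ log2FoldChange_PC3, data = up_both), lty = 2)

# Pearson correlation test; method = "spearman" gives a rank-based alternative
ct <- cor.test(up_both$log2FoldChange_PC3, up_both$log2FoldChange_VCAP)
ct$estimate
```

Pearson's r measures linear agreement. Spearman's rank correlation is less sensitive to a few extreme fold changes. With only a handful of toy genes, the p-value here carries no meaning.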


  6. Select genes that are down-regulated in both the PC3 and VCAP cell models, and add the corresponding log2FoldChange values from both the PC3 and VCAP experiments. How many such genes are there?

Now, create a scatter plot using:

Evaluate the correlation between the PC3 and VCAP log2FoldChange values.


  7. Arrange the scatter plots of up-regulated and down-regulated genes together in a single layout for easier comparison.
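With base graphics, par(mfrow = c(1, 2)) places both scatter plots in one figure. This sketch uses hypothetical merged tables whose columns are named log2FoldChange_PC3 and log2FoldChange_VCAP (an assumed naming). The helper plot_pair() is also hypothetical; it prints Pearson's r in each panel title.

```r
# Hypothetical merged tables (column names assumed), one per direction
lfc   <- seq(1, 4, length.out = 8)
noise <- c(0.2, -0.1, 0.3, -0.2, 0.1, 0, -0.3, 0.2)
up_both <- data.frame(log2FoldChange_PC3 = lfc,
                      log2FoldChange_VCAP = lfc + noise)
down_both <- data.frame(log2FoldChange_PC3 = -lfc,
                        log2FoldChange_VCAP = -lfc + rev(noise))

# Hypothetical helper: draw one panel and return Pearson's r
plot_pair <- function(df, title) {
  r <- cor(df$log2FoldChange_PC3, df$log2FoldChange_VCAP)
  plot(df$log2FoldChange_PC3, df$log2FoldChange_VCAP,
       xlab = "log2FoldChange (PC3)", ylab = "log2FoldChange (VCAP)",
       main = sprintf("%s (r = %.2f)", title, r))
  r
}

# Two panels side by side; restore the previous layout afterwards
old_par <- par(mfrow = c(1, 2))
r_up   <- plot_pair(up_both, "Up in both")
r_down <- plot_pair(down_both, "Down in both")
par(old_par)

# For ggplot2 objects p_up and p_down, patchwork (p_up + p_down) or
# gridExtra::grid.arrange(p_up, p_down, ncol = 2) gives the same layout
```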

  8. Create a function to classify genes based on their log2FoldChange values:
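The categories and cutoffs are not specified above, so this sketch assumes four hypothetical bins split at log2FoldChange = ±2. classify_lfc() is a scalar function, matching the wording "apply it to each value", and vapply() applies it element-wise.

```r
# Hypothetical categories and cutoffs: the exercise defines the actual bins
classify_lfc <- function(lfc) {
  if (is.na(lfc)) return(NA_character_)
  if (lfc >= 2) {
    "strong_up"
  } else if (lfc > 0) {
    "moderate_up"
  } else if (lfc > -2) {
    "moderate_down"
  } else {
    "strong_down"
  }
}

# Scalar function, so apply it element-wise to a vector of fold changes
vals <- c(3.1, 1.2, -0.8, -2.5)
categories <- vapply(vals, classify_lfc, character(1))
print(categories)
```

cut() offers a vectorised alternative. Its right argument decides which bin a value exactly on a boundary falls into, so you need to set it to match the chosen cutoffs.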

  9. Create a unified dataframe, all, by concatenating the differentially expressed gene dataframes from PC3 and VCAP.

Before concatenating, add a new column to each dataframe indicating the cell model (PC3 or VCAP) so that the rows from each model can be distinguished.
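This base-R sketch works on hypothetical toy tables. It first adds a column to each table (cell_model, an assumed name) and then stacks them with rbind(). rbind() on data frames matches columns by name, so both tables need the same set of columns.

```r
# Hypothetical DE tables for the two models (same columns in both)
pc3_de  <- data.frame(gene = c("GENE1", "GENE2"), log2FoldChange = c(2.1, -1.4))
vcap_de <- data.frame(gene = c("GENE1", "GENE3"), log2FoldChange = c(1.7, -3.0))

# Tag each table with its cell model before stacking
pc3_de$cell_model  <- "PC3"
vcap_de$cell_model <- "VCAP"

# Stack the rows; columns are matched by name
all <- rbind(pc3_de, vcap_de)
table(all$cell_model)
```

If dplyr is available, dplyr::bind_rows(PC3 = pc3_de, VCAP = vcap_de, .id = "cell_model") does both steps at once.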


  10. Apply the function created in Exercise 8 to each value in the log2FoldChange column of the all dataframe:

  11. Create a barplot to visualize the distribution of genes across the categories in de_category.
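table() counts genes per category and cell model, which gives the bar heights directly. barplot() with beside = TRUE then draws one group of bars per category. The combined table below is hypothetical, and its category labels are placeholders.

```r
# Hypothetical combined table; category labels are placeholders
all <- data.frame(
  cell_model  = c("PC3", "PC3", "PC3", "VCAP", "VCAP", "VCAP", "VCAP"),
  de_category = c("strong_up", "moderate_up", "strong_down",
                  "moderate_up", "moderate_up", "moderate_down", "strong_down")
)

# Fix the category order on the x-axis
all$de_category <- factor(all$de_category,
  levels = c("strong_down", "moderate_down", "moderate_up", "strong_up"))

# Rows = cell models, columns = categories
counts <- table(all$cell_model, all$de_category)
print(counts)

# Grouped bars: one group per category, one bar per cell model
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "de_category", ylab = "Number of genes")
```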

  12. Now, assess whether gene length affects the category of differentially expressed genes:

    a. Create a TxDb object from the GTF file:
       File: gencode.v28lift37.basic.annotation.gtf.gz (located in the Datasets folder).

    b. Extract the genes that are present in the all dataframe.

    c. Compute the length of each of these genes.

    d. Add gene length as a new column in the all dataframe.
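This sketch covers the TxDb, gene extraction, length and join steps. The Bioconductor calls appear only as comments, because they need the GTF file and Bioconductor packages. The runnable part uses a hypothetical stand-in for as.data.frame(genes(txdb)). Gene length here is the genomic span (end - start + 1), which is one common definition. Another is the total length of non-overlapping exons, noted in the comments. The version-suffix stripping assumes that the gene identifiers in all lack the ".N" suffix carried by GENCODE gene_id values.

```r
# Bioconductor route (not run here):
# makeTxDbFromGFF() is in txdbmaker in recent Bioconductor releases
# (GenomicFeatures in older ones)
# txdb <- txdbmaker::makeTxDbFromGFF(
#   "gencode.v28lift37.basic.annotation.gtf.gz", format = "gtf")
# gene_coords <- as.data.frame(GenomicFeatures::genes(txdb))
# Exonic-length alternative, with GenomicFeatures and GenomicRanges loaded:
# sum(width(reduce(exonsBy(txdb, by = "gene"))))

# Hypothetical stand-in for gene_coords (versioned IDs, 1-based coordinates)
gene_coords <- data.frame(
  gene_id = c("GENE1.1", "GENE2.3", "GENE3.2", "GENE4.1"),
  start   = c(1000, 5000, 20000, 70000),
  end     = c(3499, 5999, 59999, 70999)
)

# Genomic span in base pairs (what width() reports for a GRanges)
gene_coords$gene_length <- gene_coords$end - gene_coords$start + 1

# Hypothetical combined DE table whose IDs lack the version suffix
all <- data.frame(
  gene = c("GENE1", "GENE3", "GENE1", "GENE5"),
  cell_model = c("PC3", "PC3", "VCAP", "VCAP")
)

# Strip ".N" version suffixes so identifiers match, then keep genes present in `all`
gene_coords$gene <- sub("\\..*$", "", gene_coords$gene_id)
matched <- gene_coords[gene_coords$gene %in% all$gene, ]

# Look up each row's length; genes missing from the annotation get NA
all$gene_length <- matched$gene_length[match(all$gene, matched$gene)]
print(all)
```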


  13. Create a boxplot to visualize gene length across cell line models (PC3 and VCAP) within each differential expression category (de_category).

Add Wilcoxon rank-sum test results to assess whether gene length differs significantly between PC3 and VCAP within each de_category.
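This base-R sketch draws the boxplot and runs the per-category tests on simulated gene lengths (hypothetical data). With two independent groups, wilcox.test() performs the Wilcoxon rank-sum (Mann-Whitney) test. split() runs the test once per de_category, and p.adjust() corrects for testing several categories.

```r
# Simulated gene lengths (hypothetical): two models x two placeholder categories
set.seed(42)
all <- data.frame(
  cell_model  = rep(c("PC3", "VCAP"), each = 20),
  de_category = rep(rep(c("moderate_up", "strong_down"), each = 10), times = 2),
  gene_length = rlnorm(40, meanlog = 10, sdlog = 1)
)

# One box per cell model within each category, log scale for skewed lengths
boxplot(gene_length ~ cell_model + de_category, data = all, log = "y",
        las = 2, xlab = "", ylab = "Gene length (bp)")

# Wilcoxon rank-sum test of PC3 vs VCAP within each category
pvals <- sapply(split(all, all$de_category), function(d)
  wilcox.test(gene_length ~ cell_model, data = d)$p.value)

# Adjust for testing several categories (Benjamini-Hochberg)
padj <- p.adjust(pvals, method = "BH")
print(round(cbind(p = pvals, padj = padj), 3))
```

For a ggplot2 version, ggpubr::stat_compare_means(method = "wilcox.test") can annotate the same comparisons directly on the plot.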