Biostrings and BSgenome packages (Genomic sequences)

  1. Transform the character DNA sequence "TGCTCAGGTAGCCTCACCTCC" into a DNAString object, then evaluate:

Hint: explore the DNAString() function.
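
The list of properties to evaluate is not reproduced here. The block below is therefore only a minimal sketch of a few common DNAString queries: length, base composition, G+C count and reverse complement. It assumes Biostrings has been installed from Bioconductor. Which queries the exercise actually expects is an assumption.

```r
# Assumes Biostrings is installed from Bioconductor: BiocManager::install("Biostrings")
library(Biostrings)

# Convert the plain character string into a DNAString object
dna_seq <- DNAString("TGCTCAGGTAGCCTCACCTCC")
dna_seq                                      # prints the length and the sequence

length(dna_seq)                              # number of bases
alphabetFrequency(dna_seq, baseOnly = TRUE)  # counts of A, C, G, T and "other"
letterFrequency(dna_seq, "GC")               # combined count of G and C
reverseComplement(dna_seq)                   # reverse complement, still a DNAString
```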


  1. Using the vector sequences <- c("AAATCGA", "ATACAACAT", "TTGCCA"), convert these sequences into a DNAStringSet object.

Hint: explore the DNAStringSet() function.
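
Here is a minimal sketch, assuming Biostrings is installed. Each element of the character vector becomes one sequence of the set, and width() reports the length of each sequence. The variable name dna_set is illustrative.

```r
library(Biostrings)

sequences <- c("AAATCGA", "ATACAACAT", "TTGCCA")
dna_set <- DNAStringSet(sequences)   # one DNAString per element of the vector
dna_set
length(dna_set)   # number of sequences in the set
width(dna_set)    # length of each sequence
dna_set[2]        # [ keeps the DNAStringSet class (a set of length 1)
dna_set[[2]]      # [[ extracts the element as a DNAString
```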


  1. What happens when you try to create a DNAStringSet() from an object that does not contain a DNA sequence?
    And what if you try DNAStringSet("ACGTMRW")? Why?

Check this resource for more information.
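
One way to test both cases, assuming Biostrings is installed. The string "FOOBAR" is an arbitrary example that contains letters outside the DNA alphabet, so the conversion is expected to fail. The error is caught with tryCatch() only so that the script keeps running. By contrast, M, R and W are IUPAC ambiguity codes, which are part of DNA_ALPHABET.

```r
library(Biostrings)

# Letters outside the DNA alphabet: the conversion raises an error, caught here
res <- tryCatch(DNAStringSet("FOOBAR"), error = function(e) conditionMessage(e))
res

# M, R and W are IUPAC ambiguity codes, so this conversion succeeds
iupac <- DNAStringSet("ACGTMRW")
iupac

DNA_ALPHABET                      # every letter accepted in a DNA sequence
IUPAC_CODE_MAP[c("M", "R", "W")]  # bases each ambiguity code stands for
```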


  1. From the available BSgenome genomes, identify the Apis mellifera assembly from BeeBase and install the related package.
    Assign the assembly to a variable named dna.
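
This sketch covers the search and install steps. It assumes BiocManager is installed and internet access is available. The BeeBase assembly is expected to be packaged as BSgenome.Amellifera.BeeBase.assembly4; confirm the name against the grep() output before installing.

```r
# Assumes BSgenome and BiocManager are installed; both steps below need internet access
library(BSgenome)

# Search the list of available BSgenome data packages for Apis mellifera
grep("Amellifera", available.genomes(), value = TRUE)

# Install the BeeBase package only if it is not already present, then load it
if (!requireNamespace("BSgenome.Amellifera.BeeBase.assembly4", quietly = TRUE))
  BiocManager::install("BSgenome.Amellifera.BeeBase.assembly4")
library(BSgenome.Amellifera.BeeBase.assembly4)

# The package exports a BSgenome object with the same name as the package
dna <- BSgenome.Amellifera.BeeBase.assembly4
dna
```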

  1. In which year was the assembly built?
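
This assumes the package from the previous step is installed. The printed summary of the BSgenome object reports its provider and release date, and releaseDate() returns the date alone. Treating the release date as the answer to "when was it built" is an assumption.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

dna               # the printed summary includes the provider and the release date
releaseDate(dna)  # release date recorded in the package metadata
```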

  1. How many chromosomes does the Apis mellifera genome contain?
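
This assumes the same package is installed. seqnames() lists every sequence in the assembly. Not every name is necessarily a chromosome, because an assembly can also include unplaced sequences. Inspect the names before counting.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

seqnames(dna)           # names of all sequences in the assembly
length(seqnames(dna))   # how many sequences there are in total
```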

  1. How many bases does each chromosome contain?
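
This assumes the same package is installed. seqlengths() returns a named integer vector with the number of bases in each sequence. These counts include any N positions.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

seqlengths(dna)   # named vector: number of bases in each sequence
```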

  1. Use letterFrequency() to evaluate the frequency of N bases in the “Group1” sequence.
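
Here is a minimal sketch, assuming the same package is installed. dna[["Group1"]] loads that sequence as a DNAString, and letterFrequency() counts the requested letter.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

group1 <- dna[["Group1"]]                # loads Group1 as a DNAString
n_count <- letterFrequency(group1, "N")
n_count                                  # number of N (unknown) bases in Group1
```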

  1. As before, but pass as.prob = TRUE to letterFrequency(). What changes?
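
This continues the same sketch. With as.prob = TRUE, letterFrequency() returns a proportion instead of a count: the count divided by the sequence length. The final comparison illustrates this.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

group1 <- dna[["Group1"]]
n_count <- letterFrequency(group1, "N")                  # absolute count
n_prop  <- letterFrequency(group1, "N", as.prob = TRUE)  # fraction of the sequence
n_prop

# Same information, rescaled by the sequence length
all.equal(unname(n_prop), unname(n_count) / length(group1))
```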

  1. Because [ subsets a DNAStringSet by selecting whole sequences, it cannot be used to take substrings from within a sequence. Use the subseq() function instead.
    Create a DNAStringSet by extracting the sequences for Group1, Group2 and Group5 from the genome used in the previous exercises.
    Then use subseq() to extract positions 2 to 10 from every group.
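
This sketch assumes the same package is installed. getSeq() retrieves the three whole sequences as a DNAStringSet, and the names are set explicitly for clarity. [ then selects whole elements, and subseq() trims every element to positions 2 to 10, giving 9 bases each. The variable names are illustrative.

```r
library(BSgenome.Amellifera.BeeBase.assembly4)
dna <- BSgenome.Amellifera.BeeBase.assembly4

group_names <- c("Group1", "Group2", "Group5")
groups <- getSeq(dna, group_names)   # DNAStringSet with the three whole sequences
names(groups) <- group_names
groups

groups[2]                            # [ selects whole elements: only Group2 is kept

# subseq() works inside each element: positions 2 to 10, i.e. 9 bases per group
first_bases <- subseq(groups, start = 2, end = 10)
first_bases
```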

  1. Read the FASTA file Homo_sapiens.GRCh38.dna.chromosome.11.22.fa from the Datasets folder and explore the resulting object.
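
This sketch assumes Biostrings is installed; readDNAStringSet() reads FASTA by default. The Datasets/ path is an assumption about where the course files live. If the file is not found, the block writes a tiny toy FASTA to a temporary file so that it still runs. The toy sequences are hypothetical and for demonstration only.

```r
library(Biostrings)

# Assumed location of the course file; adjust to your own Datasets folder
fasta_path <- file.path("Datasets", "Homo_sapiens.GRCh38.dna.chromosome.11.22.fa")

# Demo fallback only: write a tiny hypothetical FASTA if the real file is absent
if (!file.exists(fasta_path)) {
  fasta_path <- tempfile(fileext = ".fa")
  toy <- DNAStringSet(c(toy_seq1 = "ACGTNNACGT", toy_seq2 = "GGCCTTAA"))
  writeXStringSet(toy, fasta_path)
}

seqs <- readDNAStringSet(fasta_path)   # format = "fasta" is the default
seqs                                   # summary: records, widths, sequence heads
length(seqs)                           # number of records in the file
names(seqs)                            # FASTA header lines
width(seqs)                            # length of each record
alphabetFrequency(seqs, baseOnly = TRUE)  # per-record base counts (a matrix)
```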