ApiDB/EuPathDB Workshop

Expression data II (Proteomics) Exercises

Mark Heiges
Tuesday, June 10th - 3:30 pm

Exercise 1: Mass spec and EST data, Giardia


  • From the EuPathDB site, go to the GiardiaDB site by clicking on the GiardiaDB icon
  • Making use of the available queries and the query history function, where needed, answer the following questions:
    • How many Giardia lamblia genes have evidence of protein expression from mass spectrometry analysis but do not have evidence of transcript expression from ESTs?
      • One way to approach this question is to run the "Mass Spec Evidence" and "EST evidence" queries separately. Then, in the Query History, remove the genes with EST evidence from the genes with MS evidence by using the "NOT" operator (1 not 2), which subtracts result 2 from result 1.

        1674 genes with MS evidence, 2855 genes with EST evidence. In the Query History, use "1 minus 2" to get 422 results.

        The number of genes you find depends on how many EST libraries and Mass Spec experiments are selected as options in each of the individual queries. The most complete set of genes is found by selecting all of the EST libraries and all of the Mass Spec experiments.
    • What are some explanations for why a gene would have expression evidence from MS but not from EST sequencing?

        A few explanations for the differences between the two types of experimental evidence include:
      • A long-lived protein persists after its mRNA has degraded.
      • Different life cycle stages were used in the protein and transcript assays (not in this case: trophozoites were used for both).
      • Different growth conditions in different labs.
      • The assays differ in their sensitivity to low copy number.
      • Biochemical characteristics of mRNA and protein sequences can affect detection in different assays.
      • An MS peptide associated with a protein in the database may actually belong to another gene, or the peptide predicted from the MS data may be wrong.
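
The Query History subtraction used in the first question of this exercise is a set difference. As a rough sketch, assuming each query result has been exported as a list of gene IDs, it could be reproduced as below. The gene IDs and the helper name `subtract` are made up for illustration and are not part of GiardiaDB.

```python
# Illustrative sketch only: the gene IDs below are invented, not real GiardiaDB IDs.

def subtract(step1, step2):
    """Return the genes in step1 that are absent from step2 ("1 minus 2")."""
    return set(step1) - set(step2)

# Hypothetical results of the two queries.
ms_genes = {"geneA", "geneB", "geneC"}   # genes with mass spec evidence
est_genes = {"geneB", "geneD"}           # genes with EST evidence

ms_only = subtract(ms_genes, est_genes)
print(sorted(ms_only))  # ['geneA', 'geneC']
```

With the counts reported above, 1674 genes with MS evidence and 422 genes left after the subtraction imply that 1674 - 422 = 1252 genes have both kinds of evidence.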

Exercise 2: Mass spec, CryptoDB and ToxoDB


  • From the EuPathDB site, go to the ToxoDB site or the CryptoDB site by clicking on the icon for that site.
  • Making use of the available queries and the query history function, where needed, answer the following questions:
    • Find examples of mass spectrometry derived peptides in CryptoDB or ToxoDB that support the annotation of intron/exon boundaries.
    • What approach did you take to find them?
      • One way to approach the first part of this question is to run the query "Intron/Exon structure" to search for genes that have at least two exons and hence at least one intron. The results of this query can then be intersected, using the history function, with the results of the query "Mass Spec Evidence". The intersection of these two data sets is the list of intron-containing genes that also have Mass Spec evidence. NOTE: These queries do not prove that a Mass Spec peptide actually spans an intron; this needs to be determined by visual inspection.
    • What other types of experimental evidence contained within CryptoDB or ToxoDB can be used to verify the annotation of intron/exon boundaries?
      • Another type of data that can be used to verify the presence and location of introns is an alignment of ESTs to the genomic sequence, since ESTs presumably have the introns spliced out. When you run the "EST evidence" query, you have the option of searching for ESTs that significantly overlap the predicted gene, for example by 300 or 400 nt. Intersecting the EST evidence query with the Intron/Exon query yields a set of genes that can be inspected manually for regions of genomic DNA within a predicted gene that are not covered by any EST.
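
The visual inspection step for peptides can also be expressed as a simple coordinate check. The sketch below assumes you already know the exon lengths of a gene and where a peptide maps on the spliced coding sequence, given as a 0-based, half-open nucleotide interval. The function names and the example numbers are hypothetical and do not come from CryptoDB or ToxoDB.

```python
# Illustrative sketch only: coordinates and exon lengths are invented examples.

def exon_junctions(exon_lengths):
    """Return the offsets on the spliced sequence where one exon ends and the next begins."""
    junctions = []
    offset = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        offset += length
        junctions.append(offset)
    return junctions

def spans_junction(pep_start, pep_end, exon_lengths):
    """True if the half-open interval [pep_start, pep_end) crosses an exon-exon junction."""
    return any(pep_start < j < pep_end for j in exon_junctions(exon_lengths))

# Hypothetical gene with three exons of 300, 150 and 200 nt.
exons = [300, 150, 200]
print(exon_junctions(exons))            # [300, 450]
print(spans_junction(290, 320, exons))  # True: covers the end of exon 1 and the start of exon 2
print(spans_junction(300, 330, exons))  # False: lies entirely within exon 2
```

A peptide that starts exactly at an exon boundary does not count as spanning it here, because it gives no evidence that the two exons are joined in the protein.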