ApiDB/EuPathDB Workshop

Expression I Exercises

John Brestelli
Tuesday, June 10th - 1:30 pm

Exercise 1: Introduction to EST expression evidence; examining repeat genes in TrichDB


  1. There are >99,000 annotated genes in TrichDB. Run a query that returns only genes that have EST evidence of expression.
  2. The annotated genes for Trichomonas vaginalis contain over 38,000 repeated genes. Use the "Gene Attributes" query to get all repeated genes. How many repeat genes have EST evidence of expression?

Exercise 2: Interrogation of a GiardiaDB scaffold for DNA binding activity in a particular phase of the parasite's life cycle using SAGE-Tag expression evidence


  1. Find all non-demoted genes for scaffold "CH991769".
  2. Find genes that show evidence of expression by SAGE in any of the Encystation libraries. Allow the tag to align 20 bp from either end and align to only one place in the genome. Find only genes with a tag count >= 5.
  3. Roughly, what percentage of genes on this scaffold show evidence of expression by SAGE?
  4. Add a column for "Predicted GO Function" to this combined list. Which genes are shown to have the exact function of "nucleic acid binding"?
  5. In which library (or libraries) are these genes maximally expressed?
  6. Are there additional genes that show any nucleic acid binding activity and evidence of expression by SAGE in an Encystation library? (Hint: the GO Term ID for "nucleic acid binding" is GO:0003676.)
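
If you download the gene lists from steps 1 and 2, the arithmetic in step 3 and the exact-match filter in step 4 can be sketched in Python as follows. This is only an illustration: the gene IDs and GO annotations are made-up placeholders (not GiardiaDB results), and the helper name percent_expressed is hypothetical.

```python
def percent_expressed(scaffold_genes, sage_genes):
    """Percentage of scaffold genes that also appear in the SAGE result list."""
    scaffold = set(scaffold_genes)
    if not scaffold:
        return 0.0
    return 100.0 * len(scaffold & set(sage_genes)) / len(scaffold)


# Placeholder IDs standing in for downloaded query results (assumption).
scaffold_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
sage_genes = ["g2", "g5", "g9"]

pct = percent_expressed(scaffold_genes, sage_genes)
print(f"{pct:.1f}% of scaffold genes have SAGE evidence")

# Placeholder "Predicted GO Function" column values (assumption).
go_function = {"g2": "nucleic acid binding", "g5": "DNA binding"}

# Step 4 asks for the exact function, so compare the whole string.
combined = sorted(set(scaffold_genes) & set(sage_genes))
exact = [g for g in combined if go_function.get(g) == "nucleic acid binding"]
print(exact)
```

The exact string comparison deliberately skips related functions such as "DNA binding"; those are what step 6 asks you to look for with the GO Term ID.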

Exercise 3: Examination of differential expression of SAGE-Tags falling outside of annotated gene models during the transition between Trophozoite and Early Encystation in GiardiaDB


  1. Find SAGE-Tag Alignments that show evidence of differential expression (R > 10) in the transition between Trophozoite and Early Encystation (Troph1, 4hr and 12hr encystation).
  2. How many results don't have a gene associated? Inspect each. Which of these disagrees dramatically with both annotated and deprecated gene boundaries?
  3. View the region surrounding the SAGE-Tag with the highest R score (from the previous query) in GBrowse. Add the tracks for "EST Alignments" and "EST Assemblies". How many ESTs are contained in the assembly for this gene?
  4. Perform the corresponding query at the gene level. What are possible reasons for the differences in the list of genes?

Exercise 4: Use of the differential expression query in ToxoDB


  1. How many total genes are differentially expressed in RH in High Glucose v. RH with No Glucose?
  2. How many genes are RH-specific, i.e., differentially expressed in RH v. Pru and RH v. VEG, but not VEG v. Pru?
  3. How many up-regulated genes in High Glucose v. No Glucose are RH-specific? How many down-regulated genes? How do these compare?
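
The combination asked for in steps 2 and 3 is set logic: intersect the two RH comparisons, subtract the VEG v. Pru comparison, then intersect with the glucose results. On the website this is done by combining query results; the Python sketch below shows the same logic on downloaded lists. The gene IDs are invented placeholders (not ToxoDB results), and the function name strain_specific is hypothetical.

```python
def strain_specific(a_vs_b, a_vs_c, c_vs_b):
    """Genes found in both comparisons involving strain A,
    excluding genes that also differ between the other two strains."""
    return (set(a_vs_b) & set(a_vs_c)) - set(c_vs_b)


# Placeholder result lists; real lists would come from the ToxoDB queries.
rh_vs_pru = {"gene_A", "gene_B", "gene_C", "gene_D"}
rh_vs_veg = {"gene_B", "gene_C", "gene_D", "gene_E"}
veg_vs_pru = {"gene_D", "gene_F"}

rh_specific = strain_specific(rh_vs_pru, rh_vs_veg, veg_vs_pru)
print(sorted(rh_specific))

# Step 3: split the RH-specific genes by direction of glucose regulation.
glucose_up = {"gene_B", "gene_G"}    # placeholder up-regulated list
glucose_down = {"gene_C", "gene_H"}  # placeholder down-regulated list

up_rh_specific = glucose_up & rh_specific
down_rh_specific = glucose_down & rh_specific
print(len(up_rh_specific), len(down_rh_specific))
```

Note that the order of operations matters: subtracting VEG v. Pru before intersecting the two RH lists would give the same answer here, but taking a union instead of an intersection in the first step would not.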

Exercise 5: Use of microarrays to show strain differences influencing expression in PlasmoDB


  1. "reticulocyte binding protein 2, homolog b" (pfRh2b) is included in a family of proteins that localize to the apical complex. However, the pfRh2b knock-out shows no measurable difference in invasion compared to wild type. Find this gene using the keyword search, and examine the expression graphs.
  2. The WT v. KO graph shows a large difference for this gene in D10 WT v. 3D7 WT at 48 hours. Using the KO v. WT query, find genes that are down-regulated in D10 WT v. 3D7 WT at 48 hours.
  3. This gene is naturally missing from strain D10. Find all genes that are down-regulated in 3D7 pfRh2b KO v. 3D7 WT at 48 hours. How does this list compare to the list from step 2?