ApiDB/EuPathDB Workshop

Expression I Exercises

John Brestelli
Tuesday, June 10th - 1:30 pm

Exercise 1: Introduction to EST expression evidence; examining repeat genes in TrichDB


  1. There are >99,000 annotated genes in TrichDB. Run a query that returns only genes that have EST evidence of expression.
    • 10,470
  2. The annotated genes for Trichomonas vaginalis contain over 38,000 repeated genes. Use the "Gene Attributes" query to get all repeated genes. How many repeat genes have EST evidence of expression?
    • 57

Exercise 2: Interrogation of a GiardiaDB scaffold for DNA-binding activity in a particular phase of the parasite's life cycle using SAGE-Tag expression evidence.


  1. Find all non-demoted genes for scaffold "CH991769".
    • 434
  2. Find genes that show evidence of expression by SAGE in any of the Encystation libraries. Allow the tag to align 20 bp from either end and align to only one place in the genome. Find only genes with a tag count >= 5.
    • 1318
  3. Roughly, what percentage of genes on this scaffold show evidence of expression by SAGE?
    • ~25% (roughly 1/4)
  4. Add a column for "Predicted GO Function" to this combined list. Which genes are shown to have the exact function of "nucleic acid binding"?
    • GL50803_2902, GL50803_93463 and GL50803_5942
  5. In which library (or libraries) are these genes maximally expressed?
    • 4 HR, 42 HR, and 12 HR respectively
  6. Are there additional genes which show any nucleic acid binding activity and evidence for expression by SAGE in an encystation library? (Hint: the GO Term ID for "nucleic acid binding" is GO:0003676.)
    • 250 genes with GO term "nucleic acid binding"
    • intersection of scaffold, SAGE, and GO gives 7 genes
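
The combinations in this exercise (scaffold genes with SAGE evidence, then narrowed by GO term) are set intersections. A minimal sketch, assuming each result list has been downloaded as plain gene IDs; the IDs below are hypothetical placeholders, not real GiardiaDB results:

```python
# Hypothetical gene IDs standing in for three downloaded result lists.
scaffold_genes = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
sage_genes = {"g2", "g5", "g9", "g10"}   # SAGE evidence, genome-wide
go_genes = {"g2", "g7", "g10"}           # GO:0003676, genome-wide

# Genes on the scaffold that also have SAGE evidence.
on_scaffold_with_sage = scaffold_genes & sage_genes

# Share of scaffold genes with SAGE evidence, as a percentage.
percent_expressed = 100 * len(on_scaffold_with_sage) / len(scaffold_genes)

# Genes satisfying all three criteria at once.
all_three = scaffold_genes & sage_genes & go_genes

print(sorted(on_scaffold_with_sage), percent_expressed, sorted(all_three))
```

The percentage must be computed from the scaffold intersection, not from the genome-wide SAGE count.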

Exercise 3: Examination of differential expression of SAGE-Tags falling outside of annotated gene models during the transition between Trophozoite and Early Encystation in GiardiaDB.


  1. Find SAGE-Tag Alignments that show evidence of differential expression (R > 10) in the transition between Trophozoite and early Encystation (Troph1, 4hr and 12hr encystation).
    • 17
  2. How many results don't have a gene associated? Inspect each. Which of these disagrees dramatically with both annotated and deprecated gene boundaries?
    • 4
    • CH991782-1061784-1061799.0
    • This tag shows consistently high counts during encystation but is not within or near an annotated or deprecated gene.
  3. View the region surrounding the SAGE-Tag with the highest R score (from previous query) in GBrowse. Add the tracks for "EST Alignments" and "EST Assemblies". How many ESTs are contained in the assembly for this gene?
    • The SAGE-Tag with the highest R score is CH991763-1355866-1355881.1
    • Assembly GlDT.36882.tmp contains 3 ESTs
  4. Perform the corresponding query at the gene level. What are possible reasons for the differences in the list of genes?
    • Many tags can align to one gene contributing to its R value.
    • One tag can align to multiple genes contributing to the R value of each.
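
The two reasons above describe a many-to-many relationship between tags and genes. A minimal sketch of how such a mapping could be inspected, assuming a list of (tag, gene) alignment pairs; the tag and gene names are hypothetical, and the sketch does not attempt to reproduce how GiardiaDB computes R:

```python
from collections import defaultdict

# Hypothetical alignments: (tag, gene); None marks a tag outside any gene model.
alignments = [
    ("tagA", "gene1"),
    ("tagB", "gene1"),
    ("tagC", "gene2"),
    ("tagC", "gene3"),
    ("tagD", None),
]

tags_per_gene = defaultdict(set)
genes_per_tag = defaultdict(set)
intergenic_tags = set()

for tag, gene in alignments:
    if gene is None:
        intergenic_tags.add(tag)
    else:
        tags_per_gene[gene].add(tag)
        genes_per_tag[tag].add(gene)

# Genes hit by several tags, and tags that hit several genes.
multi_tag_genes = sorted(g for g, tags in tags_per_gene.items() if len(tags) > 1)
multi_gene_tags = sorted(t for t, genes in genes_per_tag.items() if len(genes) > 1)

print(multi_tag_genes, multi_gene_tags, sorted(intergenic_tags))
```

Tags in `intergenic_tags` appear only in the tag-level result, which is a further reason the two lists can differ.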

Exercise 4: Use of the differential expression query in ToxoDB


  1. How many total genes are differentially expressed in RH in High Glucose v. RH with No Glucose?
    • 732
    • Union of 329 up-regulated and 403 down-regulated
  2. How many genes are RH-specific? i.e., differentially expressed in RH v. Pru and RH v. VEG, but not VEG v. Pru.
    • Query 1: Pru v. RH up-regulated
    • Query 2: Pru v. RH down-regulated
    • Query 3: RH v. VEG up-regulated
    • Query 4: RH v. VEG down-regulated
    • Query 5: Pru v. VEG up-regulated
    • Query 6: Pru v. VEG down-regulated
    • The History Query - ((1 or 2) and (3 or 4)) not (5 or 6)
    • 1245 Results
  3. How many up-regulated genes in High Glucose v. No Glucose are RH-specific? Down-regulated? How do these compare?
    • Intersection of 329 up-regulated with 1245 RH-specific = 46 genes
    • Intersection of 403 down-regulated with 1245 RH-specific = 152 genes
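
The RH-specific selection in question 2 (differentially expressed in RH v. Pru and in RH v. VEG, but not in Pru v. VEG) and the overlaps in question 3 can be expressed as set operations. A minimal sketch, assuming each of the six query results is available as a set of gene IDs; the function name `rh_specific` and all IDs are hypothetical:

```python
def rh_specific(q1, q2, q3, q4, q5, q6):
    """Combine six result sets, numbered as Query 1 to Query 6 above."""
    rh_vs_pru = q1 | q2    # up or down in Pru v. RH
    rh_vs_veg = q3 | q4    # up or down in RH v. VEG
    pru_vs_veg = q5 | q6   # up or down in Pru v. VEG
    # Keep genes found in both RH comparisons, then drop Pru v. VEG genes.
    return (rh_vs_pru & rh_vs_veg) - pru_vs_veg


# Toy data with hypothetical gene IDs.
specific = rh_specific(
    q1={"a", "b"}, q2={"c"},
    q3={"a", "d"}, q4={"c", "e"},
    q5={"c"}, q6={"f"},
)

glucose_up = {"a", "x"}
glucose_down = {"y", "z"}

up_and_specific = glucose_up & specific
down_and_specific = glucose_down & specific

print(sorted(specific), sorted(up_and_specific), sorted(down_and_specific))
```

In question 1 the union of 732 equals 329 + 403, which implies the up- and down-regulated lists share no genes.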

Exercise 5: Use of microarrays to show strain differences influencing expression in PlasmoDB.


  1. "reticulocyte binding protein 2, homolog b" (pfRh2b) is included in a family of proteins that localize to the apical complex. However, the pfRh2b knock-out shows no measurable difference in invasion compared to wild type. Find this gene using the keyword search, and examine the expression graphs.
    • Search using the exact string "reticulocyte binding protein 2, homolog b"
    • The expression profile peaks >40 hours (schizont stage).
  2. The WT v. KO graph shows a large difference for this gene in D10 WT v. 3D7 WT 48HR. Using the KO vs. WT query, find genes which are down-regulated in the D10 WT v. 3D7 WT at 48 hours.
    • 100 genes
  3. This gene is naturally missing from strain D10. Find all genes which are down-regulated in 3D7 PfRh2b KO v. 3D7 WT at 48 hours. How does this list compare to above?
    • 65 genes, 41 of which are also down-regulated in D10 v. 3D7
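
One way to answer "how does this list compare" is to put the overlap in proportion to each list. A small worked calculation using only the three counts reported above; expressing the overlap as percentages is an illustrative choice, not part of the PlasmoDB query:

```python
# Counts taken from the answers above.
d10_down = 100   # down-regulated in D10 WT v. 3D7 WT
ko_down = 65     # down-regulated in 3D7 KO v. 3D7 WT
shared = 41      # present in both lists

pct_of_ko = 100 * shared / ko_down     # share of the KO list also seen in D10
pct_of_d10 = 100 * shared / d10_down   # share of the D10 list also seen in the KO

ko_only = ko_down - shared     # genes unique to the KO comparison
d10_only = d10_down - shared   # genes unique to the D10 comparison

print(round(pct_of_ko, 1), pct_of_d10, ko_only, d10_only)
```

Roughly 63% of the genes down-regulated in the knock-out are also down-regulated in D10, while 59 of the D10 genes have no counterpart in the knock-out list.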