ApiDB/EuPathDB Workshop

Solutions to UI Exercises
Eileen Kraemer, Brian Brunk
Monday, June 9th - 10:15 am

Exercise 1: Secreted Kinases


  1. Begin at the site: ui.cryptodb.org. Making use of the available queries and search strategy functionality, perform the following tasks:
  2. Identify all genes in C. parvum that are putative kinases (hint: use the keyword search). How many genes did you find?

    Enter 'kinase' in the keyword search. The box labeled Step 1 shows "Results 274".
    Note, however, that this count includes both C. hominis and C. parvum genes.
    We'll filter out the C. hominis genes in the following steps.
  3. Now, determine the subset of these that have a signal peptide (and thus may be secreted). How many genes are in the result now?

    Click on 'Add Step'.
    See the list of queries that appears.
    Click on 'Protein Features'.
    Now click on 'Predicted Signal Peptide'.
    Note that C. parvum is selected; this will limit our results to C. parvum genes.
    Accept the other default parameters.
    In the box on the right, note that "Intersect" is selected. (This is what we want.)
    Now click on the "Run Step" box.
    Notice the pink box. It represents the result of combining the keyword query
    and the signal peptide query.
    We now have 6 genes in our result, and they are listed below.
    If you click on the Signal Peptide box (for now, you need to click on the
    text "Signal Pep"), it turns pink and the results of the Signal Peptide query
    are shown in the list below. Similarly, if you click on the "Keyword" text,
    the Keyword box turns pink and the results of that query are shown in the
    list below.
  4. Further reduce the result to contain only those genes that have evidence for expression from EST alignment data. How many genes are in the result now? What are their gene ids?

    Click on 'Add Step'.
    See the list of queries that appears.
    Click on 'Transcript Expression'.
    Now click on 'EST evidence'.
    Click on "select all" under EST library, but accept all other advanced parameters.
    In the box on the right, note that "Intersect" is selected. (This is what we want.)
    Now click on the "Run Step" box.
    We now have 1 result: cgd5_1470.
  5. Alter the search parameters to identify phosphatases rather than kinases. How many genes did you find? What are their gene ids?
  6. Edit the signal peptide query to match any of the advanced parameters rather than all of the advanced parameters. How does this affect the number of genes in the result?
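
The way "Intersect" combines the steps in this exercise can be sketched with ordinary sets. This is only an illustration of the idea, not how the site is implemented: the gene ids below (gene_A, gene_B, ...) are made-up placeholders rather than CryptoDB results, and the helper name run_strategy is hypothetical.

```python
# Placeholder gene ids for illustration only; these are NOT real CryptoDB results.
keyword_hits = {"gene_A", "gene_B", "gene_C", "gene_D"}  # Step 1: keyword 'kinase'
signal_peptide_hits = {"gene_B", "gene_C", "gene_E"}     # Step 2: Predicted Signal Peptide
est_hits = {"gene_C", "gene_F"}                          # Step 3: EST evidence


def run_strategy(first_step, *later_steps):
    """Intersect the first step's result with each later step, in order.

    Hypothetical helper: mirrors choosing "Intersect" each time a step is added.
    """
    result = set(first_step)
    for step in later_steps:
        result &= step  # keep only genes that are also in this step
    return result


step2_result = run_strategy(keyword_hits, signal_peptide_hits)
step3_result = run_strategy(keyword_hits, signal_peptide_hits, est_hits)

print(sorted(step2_result))  # ['gene_B', 'gene_C']
print(sorted(step3_result))  # ['gene_C']
```

Each added step can only keep or shrink the result, which is why the count never grows as steps are intersected. Revising an earlier step, as in tasks 5 and 6, corresponds to replacing one of the input sets and recomputing the steps after it.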

Exercise 2: Membrane phosphatases


  1. Making use of the available queries and search strategy functionality, find putative secreted or membrane bound enzymes that are between 10,000 and 50,000 molecular weight and have evidence for expression based on proteomics data.
  2. Identify all genes that contain a signal peptide OR (union) a transmembrane domain. How many genes did you find?
  3. Which of these have a GO term "catalytic activity"? How many genes are in the result now?
  4. Of these putative secreted enzymes, which have a molecular weight between 10,000 and 50,000 daltons? How many genes are in the result now?
  5. Finally, which of these also have evidence for expression in any experiment based on proteomics data? How many genes are in the result now? What are their gene ids?
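
As a rough sketch of the logic behind this exercise, the steps amount to one union followed by three intersections. Everything in the example is an assumption made for illustration: the Gene record, its field names, and the five toy genes are invented, and the molecular weight range is treated as inclusive at both ends, which may not match how the site handles the boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record; field names are illustrative, not CryptoDB's schema."""
    gene_id: str
    has_signal_peptide: bool
    has_tm_domain: bool
    go_terms: frozenset
    mol_weight: float  # daltons
    has_proteomics_evidence: bool


# Toy data for illustration only; NOT real CryptoDB genes.
GENES = [
    Gene("gene_A", True, False, frozenset({"catalytic activity"}), 32000.0, True),
    Gene("gene_B", False, True, frozenset({"catalytic activity"}), 75000.0, True),
    Gene("gene_C", False, False, frozenset({"catalytic activity"}), 20000.0, True),
    Gene("gene_D", True, True, frozenset({"binding"}), 15000.0, False),
    Gene("gene_E", False, True, frozenset({"catalytic activity"}), 48000.0, False),
]


def ids(predicate):
    """Return the ids of all toy genes that satisfy the predicate."""
    return {g.gene_id for g in GENES if predicate(g)}


# Step 1: signal peptide OR (union) transmembrane domain.
step1 = ids(lambda g: g.has_signal_peptide) | ids(lambda g: g.has_tm_domain)
# Step 2: intersect with genes annotated with the GO term "catalytic activity".
step2 = step1 & ids(lambda g: "catalytic activity" in g.go_terms)
# Step 3: intersect with genes in the molecular weight range (inclusive, by assumption).
step3 = step2 & ids(lambda g: 10_000 <= g.mol_weight <= 50_000)
# Step 4: intersect with genes that have proteomics evidence.
step4 = step3 & ids(lambda g: g.has_proteomics_evidence)

for name, result in [("step1", step1), ("step2", step2), ("step3", step3), ("step4", step4)]:
    print(name, len(result), sorted(result))
```

Note that gene_C passes every filter except the first: it has neither a signal peptide nor a transmembrane domain, so it never enters the strategy. The union in the first step decides which genes are candidates at all, and the later intersections only narrow that set.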