newtFire {upg dh|ds}
Creative Commons License Last modified: Thursday, 28-Feb-2019 00:08:30 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

For our first XQuery exercise we’ll be working with a special collection of Shakespeare’s plays coded in TEI that are part of our eXist XML database. Because the XML elements in this collection are coded in the TEI namespace, we need to begin by declaring TEI as our default element namespace (otherwise we will be unable to access the element nodes in the collection). Open eXide, open a new XQuery window, and paste in the following line, all the way to the semicolon, to establish that we are working in the TEI namespace:

declare default element namespace "http://www.tei-c.org/ns/1.0";

You can then access this collection:

collection('/db/apps/shakespeare/data/')
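
The two lines above belong in the same query window: the namespace declaration is part of the query prolog and has to come before the expression that uses it. As a minimal sketch, assuming the collection path given above, you could check that the collection is reachable before starting the tasks. The count() wrapper here is only an illustrative sanity check, not part of the assignment:

```xquery
(: Prolog: the declaration comes first and ends with a semicolon. :)
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Body: count the document nodes in the collection.
   A number (rather than an error) means the path is reachable. :)
count(collection('/db/apps/shakespeare/data/'))
```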

As you work on this, it will help to refer to our XQuery tutorial page to look up how to access files in a collection and to see examples of queries. Write XQuery expressions for each of the following tasks using the eXide window in our eXist database, and test them by hitting the Eval button. Then paste your XQuery expressions into a text file, adding comments as needed. You will be submitting your text file to Courseweb.

  1. Find the main title of each of the 42 Shakespeare plays in the collection, by stepping down the descendant axis from the collection. You will need to look at the TEI code of the collection first to see where the main titles are (hint: the play’s main title is coded near the top of the file in a special element called the titleStmt). The simplest answer is a single XPath expression starting with the collection function and descending to the nodes you want. The output should look something like:
    1
    <title xmlns="http://www.tei-c.org/ns/1.0">Love's Labour's Lost</title>
    2
    <title xmlns="http://www.tei-c.org/ns/1.0">Macbeth</title>
    3
    <title xmlns="http://www.tei-c.org/ns/1.0">A Lover's Complaint</title>
    4
    <title xmlns="http://www.tei-c.org/ns/1.0">Pericles, Prince of Tyre</title>
    5
    <title xmlns="http://www.tei-c.org/ns/1.0">Cymbeline</title>
    6
    <title xmlns="http://www.tei-c.org/ns/1.0">Romeo and Juliet</title>
    7
    <title xmlns="http://www.tei-c.org/ns/1.0">All's Well That Ends Well</title>
    ...
                
  2. Modify your XPath above to return just the text of the titles, without the tags. You can do that by using text(), data(), or string(). Your output should look something like:
    1
    Love's Labour's Lost
    2
    Macbeth
    3
    A Lover's Complaint
    4
    Pericles, Prince of Tyre
    5
    Cymbeline
    6
    Romeo and Juliet
    7
    All's Well That Ends Well
                
  3. Write an XPath expression that isolates the root element TEI of each play. Notice how you can page through the results using the arrows on top of the return window in eXide. We want to be able to isolate specific plays with interesting features, and to do that we will write filters on the root elements of each one.
  4. Speeches are coded in the Shakespeare plays like this:
    <sp who="ID"><speaker>Name</speaker> text of the speech</sp>
    Write an expression that locates a play holding a speaker named Ferdinand. Which play is it? Record your expression.
  5. Modify your expression to return only the main title of that play, and record your expression.
  6. Now, let’s see if we can find three very special plays that each contain more than 58 unique (distinct) speakers! First, see if you can find the plays, and then return only their main titles (recalling the code you wrote previously). You will need to use count() and distinct-values(), and you’ll need a construction involving a count(of something) greater than 58.

    Starting from the collection, drill down to the <TEI> elements in the collection (you know there are 42 TEI root elements, one for each play), then filter them based on whether or not they contain more than 58 distinct speakers. You will need to tinker a little to make a filter based on getting a count() of distinct-values(), either of the @who attribute on sp or of the contents of speaker elements (that is up to you; either sp/@who or speaker will work for our purposes). And you want to find out if that count() is greater than 58. Once you’re retrieving the three plays that meet that description, you can add path steps to retrieve just the main titles of those three plays.

  7. FLWOR Statement or XPath expression?: Did you write your XQuery for the plays with a count of more than 58 distinct speakers as one long XPath expression (from left to right)? Or did you write it up as a FLWOR statement? (Review our tutorial for details and examples on writing FLWOR statements using variables.) Whichever way you chose to write your XQuery in the previous steps, try the other way and see if you can duplicate your results. Record your XQuery expressions in your text file.
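
To illustrate the two styles that task 7 contrasts, here is a small sketch on made-up data rather than on the Shakespeare collection. The shelf, book, title, and author elements are hypothetical and exist only for this example. Both versions keep the books that have more than one distinct author and return the title as a string. The sketch is meant to show the shape of a count(distinct-values()) filter, not to supply the answer to the tasks above:

```xquery
(: Hypothetical sample data, not part of the Shakespeare collection. :)
let $shelf :=
  <shelf>
    <book><title>Alpha</title><author>Ann</author><author>Bo</author><author>Ann</author></book>
    <book><title>Beta</title><author>Cy</author><author>Cy</author></book>
  </shelf>

(: XPath style: a predicate in square brackets filters the book elements,
   then further path steps reach the title and take its string value. :)
let $byPath := $shelf/book[count(distinct-values(author)) > 1]/title/string()

(: FLWOR style: the same filter spelled out with for, let, where, return. :)
let $byFlwor :=
  for $book in $shelf/book
  let $authors := distinct-values($book/author)
  where count($authors) > 1
  return $book/title/string()

(: Return both results side by side so they can be compared. :)
return ($byPath, $byFlwor)
```

You can paste this into a fresh eXide window and hit Eval. It does not touch the database, so it runs with or without the namespace declaration. In this sample only the first book passes the filter, because repeated author names are counted once by distinct-values().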

When you have completed the assignment, copy and paste your expressions into a text file. Upload your text file containing your XQuery expressions to Courseweb.