Till Grallert
2 Jun 2015
The slides are based on those supplied by the various Digital Humanities Summer Schools at the University of Oxford under the Creative Commons Attribution license and have been adapted to the needs of the 2015 Introduction to TEI at DHSI.
Slides were produced using MultiMarkDown, Pandoc, Slidy JS, and the Snippet jQuery Syntax highlighter.
We will cover:
Every use of the TEI involves making use of a customisation of the TEI.
- <gi>) for the element, and optionally other names in other languages
- <schemaSpec>) is made by selecting modules or elements and (optionally) modifying their contents

| Module name | Chapter of the P5 |
|---|---|
| analysis | Simple analytical mechanisms |
| certainty | Certainty and responsibility |
| core | Elements available in ALL TEI documents |
| corpus | Language corpora |
| dictionaries | Dictionaries |
| drama | Performance texts |
| figure | Tables, formulae, and graphics |
| gaiji | Representation of non-standard characters and glyphs |
| header | The TEI header |
| iso-fs | Feature structures |
| linking | Linking, segmentation, and alignment |
| msdescription | Manuscript description |
| namesdates | Names, dates, people, and places |
| nets | Graphs, networks, and trees |
| spoken | Transcription of speech |
| tagdocs | Documentation elements |
| tei | The TEI infrastructure |
| textcrit | Critical apparatus |
| textstructure | Default text structure |
| transcr | Representation of primary sources |
| verse | Verse |
This is where Roma comes in: a command-line script with a web front end, designed to make this process much easier (http://www.tei-c.org/Roma/).

Screen shot: select a starting point

Screen shot: customise metadata

Screen shot: select a schema language for download

Screen shot: generate documentation
We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:
<schemaSpec ident="tei_bare" start="TEI">
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="abbr" mode="delete" module="core"/>
  <elementSpec ident="add" mode="delete" module="core"/>
  <!-- ... -->
  <elementSpec ident="trailer" mode="delete" module="textstructure"/>
  <elementSpec ident="title" mode="change" module="core">
    <attList>
      <attDef ident="level" mode="delete"/>
    </attList>
  </elementSpec>
  <!-- ... -->
</schemaSpec>
We selected four modules, deleted loads of elements, and also deleted an attribute.
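Deleting elements one by one is not the only way to trim a module. As a rough sketch, and assuming a TEI P5 release in which <moduleRef> accepts the @include attribute, a comparable selection can be written by naming only the elements to keep. The identifier `tei_bare_include` and the element lists below are illustrative assumptions, not the actual contents of the tei_bare customisation.

```xml
<!-- Hypothetical sketch: select elements by inclusion rather than deletion -->
<schemaSpec ident="tei_bare_include" start="TEI">
  <!-- Modules taken over in full -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <!-- Only the listed elements are taken from these modules
       (the lists are illustrative, not the real tei_bare selection) -->
  <moduleRef key="core" include="p title head list item"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
</schemaSpec>
```

The @except attribute works the other way round: it names the elements to leave out and keeps everything else in the module. Which of the two reads better depends on whether the customisation keeps most of a module or only a few elements from it.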

Screen shot: select modules

Screen shot: edit selected modules
A simple selection of elements, but also
- @type on <div>, for instance, “section”, “article”, “masthead”, “verse”, “bill”
- @xml:lang, such as “ar”, “ar-Latn-x-ijmes”, “ota”, “ota-Latn-x-ijmes”, “en”, “fr” etc.

Other constraints are possible: we might want to insist that a <div type="bill"> contains only <div type="section"> and <div type="article">, and that the latter should be numbered through an @n attribute.
We can express these constraints in our ODD meta-schema, and then generate a formal schema to enforce them using whichever schema language we like.
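To make the target of these constraints concrete, here is a minimal sketch of a document fragment that would satisfy them. The nesting, the @n values, and the choice of language tag are assumptions made for illustration; the paragraphs are left empty on purpose.

```xml
<!-- Hypothetical fragment of a TEI document; not taken from a real file -->
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- @type and @xml:lang use values from the lists above -->
  <div type="bill" xml:lang="ota-Latn-x-ijmes">
    <div type="section" n="1">
      <!-- every article carries a number in @n -->
      <div type="article" n="1">
        <p><!-- transcription of the first article --></p>
      </div>
      <div type="article" n="2">
        <p><!-- transcription of the second article --></p>
      </div>
    </div>
  </div>
</body>
```

A <div> with an unlisted @type value, or an article without @n, is what the generated schema should then reject.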

Screen shot: select and change attributes for selected elements

Screen shot: limit attributes to a list of values
Our ODD now includes something like this:
<elementSpec ident="div" mode="change" module="textstructure">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <valList mode="replace" type="closed">
        <valItem ident="section"/>
        <valItem ident="article"/>
        <valItem ident="verse"/>
        <valItem ident="masthead"/>
        <valItem ident="bill"/>
        <valItem ident="letter"/>
        <!-- ... -->
      </valList>
    </attDef>
  </attList>
</elementSpec>
Note that we can also add documentation to the ODD:
<valItem ident="verse">
  <gloss>contains (parts of) a poem</gloss>
</valItem>
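The same pattern can be applied to @xml:lang. Since that attribute is provided by the att.global class rather than by a single element, one plausible approach is to change the class specification instead of an element specification. The following is a sketch only: it follows the structure of the <div> example, and the glosses are assumed readings of the language tags, not definitions taken from the project.

```xml
<!-- Hypothetical sketch: restrict @xml:lang for all elements via att.global -->
<classSpec ident="att.global" type="atts" mode="change" module="tei">
  <attList>
    <attDef ident="xml:lang" mode="change">
      <valList mode="replace" type="closed">
        <!-- glosses below are assumptions added for illustration -->
        <valItem ident="ar">
          <gloss>Arabic</gloss>
        </valItem>
        <valItem ident="ar-Latn-x-ijmes">
          <gloss>Arabic transliterated into Latin script</gloss>
        </valItem>
        <valItem ident="ota">
          <gloss>Ottoman Turkish</gloss>
        </valItem>
        <valItem ident="en">
          <gloss>English</gloss>
        </valItem>
        <!-- ... -->
      </valList>
    </attDef>
  </attList>
</classSpec>
```

Because the change is made on the class, every element that is a member of att.global picks up the restricted list.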
When defining a new element, we need to consider
The TEI class system helps us answer all these questions (except the first).
- @key and @ref; all members of att.typed inherit from it @type and @subtype
- @type attribute, therefore, we add the element to the att.typed class, rather than define those attributes explicitly.

All elements are usually members of att.global; this class provides, among others:
- @xml:id: a unique identifier
- @xml:lang: the language of the element content
- @n: a number or name for an element
- @rend: how the element in question was rendered or presented in the source text.

- <bibl> is allowed, add it to the model.biblLike class
- model.pLike are all things that ‘behave like’ paragraphs, and are permitted in the same places as paragraphs
- model.pPart are all things which can appear within paragraphs. This class is subdivided into
  - model.pPart.edit: elements for simple editorial intervention such as <corr>, <del> etc.
  - model.pPart.data: ‘data-like’ elements such as <name>, <num>, <date> etc.
  - model.pPart.msdesc: extra elements for manuscript description such as <seal> or <origPlace>

Simplifying wildly, one may say that the TEI recognises three kinds of element:
There are ‘base model classes’ corresponding with each of these, and also with the following groupings:
And yes, there is a class model.global for elements that can appear anywhere inside a text — at any hierarchic level.

Screen shot: defining a new element

Screen shot: defining a new element
We added a new element specification to our ODD, like this:
<elementSpec ident="something" mode="add" ns="http://www.example.org/ns/nonTEI" xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <desc>contains something division-like.</desc>
  <classes>
    <memberOf key="model.divPart"/>
    <memberOf key="att.typed"/>
  </classes>
  <content>
    <rng:ref name="someThing"/>
    <rng:oneOrMore>
      <rng:ref name="model.pLike"/>
    </rng:oneOrMore>
  </content>
</elementSpec>
Note that this new element is not in the TEI namespace. It belongs to this specific project only!
@when attribute of the element <date> contains only a date)
- data.word: a single word or token
- data.name: an XML Name
- data.enumerated: a single XML name taken from a documented list
- data.temporal.w3c: a W3C date
- data.truthValue: a truth value (true/false)
- data.language: a human language
- data.sex: human or animal sex

An element specification can also contain a <constraintSpec> element, which contains rules about its content expressed as ISO Schematron constraints:
<elementSpec ident="div" mode="change" module="textstructure" xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraintSpec ident="div" scheme="isoschematron">
    <constraint>
      <s:assert test="not(@type='bill') or .//tei:div[@type='article']">a bill must include at least one article</s:assert>
    </constraint>
  </constraintSpec>
</elementSpec>
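The numbering requirement mentioned earlier (articles should carry an @n attribute) can be sketched in the same way. This version uses an explicit Schematron rule with its own context, so the assertion is only checked for articles. The constraint identifier `article-numbered` and the wording of the message are assumptions, and the `tei` prefix is declared explicitly in case the processing tools do not bind it by default.

```xml
<!-- Hypothetical sketch: require @n on every <div type="article"> -->
<elementSpec ident="div" mode="change" module="textstructure"
             xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraintSpec ident="article-numbered" scheme="isoschematron">
    <constraint>
      <!-- bind the tei: prefix used in the rule context -->
      <s:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
      <!-- the rule applies to articles only -->
      <s:rule context="tei:div[@type='article']">
        <s:assert test="@n">an article must be numbered with an @n attribute</s:assert>
      </s:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

In a real ODD there would be a single <elementSpec> for <div>, so this <constraintSpec> would sit next to any other constraint and the attribute changes for that element.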
However…
- You can only add such rules by editing your ODD file: Roma doesn’t know about them.
- Not all schema languages can implement these constraints.
Let’s try out Roma and define our own schema in order to validate our newly created TEI files.