The slides are based on those supplied by the various Digital Humanities Summer Schools at the University of Oxford under the Creative Commons Attribution license and have been adapted to the example of Arabic newspapers.
Slides were produced using MultiMarkDown, Pandoc, and the Slidy JS code of the W3C.
We will cover:
Every use of the TEI involves making use of a customisation of the TEI.
<gi>) for the element, and optionally other names in other languages

<schemaSpec>) is made by selecting modules or elements and (optionally) modifying their contents

| Module name | Chapter of the P5 |
|---|---|
| analysis | Simple analytical mechanisms |
| certainty | Certainty and responsibility |
| core | Elements available in ALL TEI documents |
| corpus | Language corpora |
| dictionaries | Dictionaries |
| drama | Performance texts |
| figure | Tables, formulae, and graphics |
| gaiji | Representation of non-standard characters and glyphs |
| header | The TEI header |
| iso-fs | Feature structures |
| linking | Linking, segmentation, and alignment |
| msdescription | Manuscript description |
| namesdates | Names, dates, people, and places |
| nets | Graphs, networks, and trees |
| spoken | Transcription of speech |
| tagdocs | Documentation elements |
| tei | The TEI infrastructure |
| textcrit | Critical apparatus |
| textstructure | Default text structure |
| transcr | Representation of primary sources |
| verse | Verse |
Here comes Roma: a command-line script with a web front end, designed to make this process much easier: http://www.tei-c.org/Roma/

Screen shot of Roma

We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:
<schemaSpec ident="tei_bare" start="TEI">
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="abbr" mode="delete" module="core"/>
  <elementSpec ident="add" mode="delete" module="core"/>
  <!-- ... -->
  <elementSpec ident="trailer" mode="delete" module="textstructure"/>
  <elementSpec ident="title" mode="change" module="core">
    <attList>
      <attDef ident="level" mode="delete"/>
    </attList>
  </elementSpec>
  <!-- ... -->
</schemaSpec>
We selected four modules, deleted loads of elements, and also deleted an attribute.
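
Deleting elements one by one is not the only way to express such a selection. More recent releases of the TEI also allow the include attribute on <moduleRef>, which lists the elements to keep from a module. The following is a minimal sketch of that alternative, not the ODD that Roma produced here; the ident value and the particular element lists are assumptions chosen for illustration.

```xml
<!-- Sketch only: ident and the element lists are illustrative assumptions -->
<schemaSpec ident="tei_bare_include" start="TEI">
  <!-- the tei module supplies classes, datatypes, and macros; keep all of it -->
  <moduleRef key="tei"/>
  <!-- keep only the named elements from each of the other modules -->
  <moduleRef key="core" include="p title head list item label"/>
  <moduleRef key="header" include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <!-- changes to an element that is kept are still made with elementSpec -->
  <elementSpec ident="title" mode="change" module="core">
    <attList>
      <attDef ident="level" mode="delete"/>
    </attList>
  </elementSpec>
</schemaSpec>
```

A companion attribute, except, works the other way round and lists the elements to leave out.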

Screen shot of Roma

A simple selection of elements, but also
<div>, for instance, "section", "article", "masthead", "verse", "bill"

Other constraints are possible: we might want to insist that a <div type="bill"> contains only <div type="section"> and <div type="article">, and that the latter should be numbered through an @n attribute.
We can express these constraints in our ODD meta-schema, and then generate a formal schema to enforce them using whichever schema language we like.

Screen shot of Roma

Our ODD now includes something like this:
<elementSpec ident="div" mode="change" module="textstructure">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <valList mode="replace" type="closed">
        <valItem ident="section"/>
        <valItem ident="article"/>
        <valItem ident="verse"/>
        <valItem ident="masthead"/>
        <valItem ident="bill"/>
        <valItem ident="letter"/>
        <!-- ... -->
      </valList>
    </attDef>
  </attList>
</elementSpec>
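
To give an idea of what the generated schema enforces, the closed value list corresponds roughly to the RELAX NG fragment below. This is a hand-written sketch in RELAX NG XML syntax, not actual Roma output: a generated schema would normally wrap this in named patterns and add documentation annotations, and it would also contain whatever values are elided in the ODD above.

```xml
<!-- Sketch only: approximates the effect of the closed valList on @type -->
<attribute name="type" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- not wrapped in <optional>, so the attribute is required (usage="req") -->
  <choice>
    <value>section</value>
    <value>article</value>
    <value>verse</value>
    <value>masthead</value>
    <value>bill</value>
    <value>letter</value>
  </choice>
</attribute>
```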
Note that we can also add documentation to the ODD:
<valItem ident="verse">
  <gloss>contains (parts of) a poem</gloss>
</valItem>
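
Seen from the document side, the customisation of <div> has the following effect. The fragment is an illustrative sketch with invented placeholder content; the value "editorial" is an assumption, picked only because it is not among the values shown in the list.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- valid: @type is present and its value is in the closed list -->
  <div type="article">
    <p>Text of the article.</p>
  </div>
  <!-- invalid: "editorial" is not one of the values shown in the closed list -->
  <div type="editorial">
    <p>Text of the editorial.</p>
  </div>
  <!-- invalid: @type is now required -->
  <div>
    <p>Some other text.</p>
  </div>
</body>
```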
When defining a new element, we need to consider
The TEI class system helps us answer all these questions (except the first).
Elements are usually members of att.global; this class provides, among others:
<bibl> is allowed, add it to the model.biblLike class

- model.pLike are all things that ‘behave like’ paragraphs, and are permitted in the same places as paragraphs
- model.pPart are all things which can appear within paragraphs. This class is subdivided into
    - model.pPart.edit elements for simple editorial intervention such as <corr>, <del> etc.
    - model.pPart.data ‘data-like’ elements such as <name>, <num>, <date> etc.
    - model.pPart.msdesc extra elements for manuscript description such as <seal> or <origPlace>

Simplifying wildly, one may say that the TEI recognises three kinds of element:
There are ‘base model classes’ corresponding with each of these, and also with the following groupings:
And yes, there is a class model.global for elements that can appear anywhere inside a text — at any hierarchic level.

Screen shot of Roma
We added a new element specification to our ODD, like this:
<elementSpec ident="something" mode="add" ns="http://www.example.org/ns/nonTEI">
  <desc>contains something division-like.</desc>
  <classes>
    <memberOf key="model.divPart"/>
    <memberOf key="att.typed"/>
  </classes>
  <content>
    <rng:ref name="someThing"/>
    <rng:oneOrMore>
      <rng:ref name="model.pLike"/>
    </rng:oneOrMore>
  </content>
</elementSpec>
Note that this new element is not in the TEI namespace. It belongs to this specific project only!
<date> contains only a date)

- data.word a single word or token
- data.name an XML Name
- data.enumerated a single XML name taken from a documented list
- data.temporal.w3c a W3C date
- data.truthValue a truth value (true/false)
- data.language a human language
- data.sex human or animal sex

An element specification can also contain a <constraintSpec> element which contains rules about its content expressed as ISO Schematron constraints:
<elementSpec ident="div" mode="change" module="textstructure" xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraintSpec ident="div" scheme="isoschematron">
    <constraint>
      <s:assert test="not(@type='bill') or .//tei:div[@type='article']">a bill must include at least one article</s:assert>
    </constraint>
  </constraintSpec>
</elementSpec>
However...

- You can only add such rules by editing your ODD file: Roma doesn't know about them.
- Not all schema languages can implement these constraints.
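
The other constraints mentioned earlier, that articles should be numbered through @n and that a bill should contain only sections and articles, could be written in the same way. The following is a sketch under several assumptions: the ident values are invented, the tei prefix is taken to be bound to the TEI namespace elsewhere in the ODD, and "contains only" is read as applying to the child <div> elements of a bill.

```xml
<!-- Sketch only: to be placed inside the elementSpec for div -->
<constraintSpec ident="div-article-numbered" scheme="isoschematron"
                xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <!-- every article must carry an @n attribute -->
    <s:rule context="tei:div[@type='article']">
      <s:assert test="@n">an article must be numbered with @n</s:assert>
    </s:rule>
  </constraint>
</constraintSpec>
<constraintSpec ident="div-bill-children" scheme="isoschematron"
                xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <!-- a bill may not have a child div of any other type -->
    <s:rule context="tei:div[@type='bill']">
      <s:assert test="not(tei:div[@type != 'section' and @type != 'article'])">a bill may contain only sections and articles</s:assert>
    </s:rule>
  </constraint>
</constraintSpec>
```

Each rule names its own context, so the assertion is only tested on the kind of <div> it is meant for.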