Abstract

This document describes the goals and principles of the Historical Event Markup and Linking Project. This is $Revision: 1.8 $ and it relates to version 0.7.1 of the project.


[Categories such as time and space] represent the most general relations which exist between things; surpassing all our other ideas in extension, they dominate all the details of our intellectual life. If men did not agree upon these essential ideas at every moment, if they did not have the same conception of time, space, cause, and number, all contact between their minds would be impossible, and with that all life together.

--Emile Durkheim

Like any other form of human communication, the web comprises a good deal of information about the past. Web sites of historical interest range in size and ambition from vast scholarly endeavours, such as Athenians, a record of all known people who lived in ancient Athens, to small-scale personal efforts, like Genjirou Inui's My Guadalcanal. Indeed, web materials in an historical mode are produced in and for nearly every discipline of the humanities and social sciences. As diverse as they are, these sites share one important similarity: as works of history, nearly all make assertions about events that took place in the past or argue about those events. The peculiarities of layout and presentation in each historical site obscure this similarity. For example, Le Musée virtuel de la Nouvelle-France provides a time-line with links, entitled Explorations européennes en Amérique; in contrast, the Prosopography of the Byzantine Empire collects articles on individuals, with significant dates included in parentheses.

As a result, though there exists a wealth of electronic resources which represent historical events, including course materials generated with Critical Tools or genealogical information collected with GeneWeb, these are each sui generis, and their data cannot be compared or combined. Indeed, typically their data are accessible only in the form of text, either as html data, or in bound volumes typeset from the database. This is unfortunate since data recording historical events are rich with information, such as dates and locations, which can be manipulated by computational means.

If we consider the web to be no more than a huge keyword-indexed repository, then this difference in presentation matters little. Searches on an historical person or event are, after all, sometimes remarkably effective. Yet this approach to historical documents omits much. Ideally an electronic historical document includes interfaces for posing other sorts of historical questions, most obviously, searches for evidence pertaining to a given date or range of dates and to a given spatial range. Today's web cannot effectively respond to the question, "what resources are available which describe historical events in Europe during the first century B.C.?"

However, if the underlying similarity of historical web materials were exposed and interchanged in the web of the future using a shared semantic markup scheme, such queries on the web itself would be possible. Text markup schemes for the humanities, chiefly the Text Encoding Initiative, have been in use for over a decade. Ideally, such schemes also provide a standard and well thought-out basis for encoding such material, thereby aiding the scholar who marks up a text with them, as well as assuring her that the results will interact well with other material so encoded and be presented in a clear and consistent manner.

The potential for doing the same with historical events has been shown by Hockey et al. (1998). Yet the scholar working with this material currently has no such standard data formats established for her, and so each struggles alone against the pitfalls of designing schemata suited to this purpose. (Townsend provides guidelines for the larger question of marking-up of historical texts as documents. The European Community's ACO*HUM Working Group on History and Historical Informatics suggests that such standards are required in their Principia, but, as far as I know, there is no effort under way to provide them.)

In the past, humanities markup was built around the Standard Generalized Markup Language, or SGML. However, SGML has proven too complex for direct presentation and manipulation on the web. Recognizing the power of generalized markup on the web, a group of experts have developed a new standard in markup called the eXtensible Markup Language, or XML. XML is a means of defining one's own mark-up and including, for instance, tags like <era> or <revision_number>. Excitement has built around XML's usefulness in business and literary studies, where a very rich mark-up language has already been composed by the TEI. The purpose of the current project is to define XML tags that describe historical events and to show how such markup can be transformed into useful and interesting presentations on the web.

An important extension of XML, known as XML Namespaces, allows parts of one markup scheme to be employed within another. With this advance, an entire document need not conform to one single schema; instead, relatively small XML elements pertinent to one domain can be embedded into a larger markup scheme, such as XHTML, the XML-compliant successor to HTML. A well-established example of this is MathML, the XML-based mathematics markup language. Elements prefixed with the MathML namespace can appear within XHTML documents, allowing the parser and display engine to treat these elements as mathematical notation.

It is the goal of this project to define XML elements that expose and outline historical events asserted in documents across the web and to parse and display these elements in interesting and useful ways.

How could these historical XML elements be used?

Consider, for instance, twelve diaries or memoirs, each a different view of the Siege of Sarajevo in 1994 and published on the web. They would include entries about events such as the stationing of troops or the evacuation of a certain region on such-and-such a day. If a standard form of mark-up were used to tag the text that recorded these events and their date and location, a computer could collect this information and associate it with the document. It would then be possible to search for descriptions of events on a certain day, or in a certain region or both, and retrieve references to the proper section of the pertinent memoir or memoirs.

Other alternative views of the events could be produced from these marked-up documents. It is common, for instance, for historical texts to include a chronological chart of events to aid the reader. From our twelve memoirs, each marked up in the same manner, we could create an exhaustive chronological chart detailing every event the texts' editors thought worthy of inclusion. Furthermore, the entries in our chronological chart would refer to their sources. Alternatively, if the mark-up scheme included information about the events' location, the documents could be used to produce historical maps or to search for events that took place in only a certain quarter of the city.

Consider now the usefulness of such a mark-up scheme if it were applied to even a fraction of the thousands of historical documents and web sites published on the web. It would be possible for a curious student to ask what happened worldwide between the years 1500 and 1200 BCE and to receive a list of events linked to scholarly sources or historical arguments. Time-lines and maps could be generated from disparate sources worldwide. Such a scheme would afford humanity a new and exciting means of communicating about its past.

The Heml project has been guided by the following goals, principles and plans.

This project provides a basic set of text mark-up and transformations for historical information. These include:

  1. A lightweight XML Schema for historical events for use in stand-alone documents and with elements embedded in XHTML,
  2. A sample set of XSLT style sheets and Java code that transform data encoded according to (1) above into useful representations, including:
    1. Ordered lists of historical events, linked to their sources and translated, if necessary, between calendrical systems
    2. Timelines generated in Scalable Vector Graphics
    3. Maps generated in Scalable Vector Graphics
  3. Means of joining and combining distributed documents that conform to (1), producing a non-centralized repository of historical event markup.

These constitute an integrated approach to the electronic representation of historical events; they make it possible to combine, compare and reuse sources for course materials, on-line resources and research project databases, providing this world with new views on its past.

The Heml Project's products are:

  1. Multi-lingual and multi-calendrical
  2. Free, specifically licensed under the LGPL
  3. Useful: markup elements are not introduced until transformation tools which exploit them are available, and we will endeavor to make them accessible to any person with an understanding of HTML
  4. Cooperative: as standard and usable XML namespaces develop in related fields, these are explored and adopted or integrated. Heml is not meant to be the language for marking up historical events; it does aim to be a most information-rich interchange format for historical data.

The Heml Project is exploring how best to make use of general metadata frameworks such as RDF and TopicMaps. Our first efforts in this will probably involve our jackdaw metadata files.

We have begun making use of Dublin Core metadata in our XHTML-extension documents, and notes relating to the applicability of Dublin Core elements appear throughout this document.

Heml markup is a reasonably lightweight description of things that happened in the human past. It might be helpful to imagine it emulating the back pages of historical monographs, wherein a simpler chronological and geographical outline, provided through maps, timelines and such, keeps the reader oriented. Such appendices make reference to the full text for complete discussion; similarly, Heml is a language to link a lightweight historical representation back to the full text or texts that discuss the event at length.

Accordingly in the Heml schema, an event comprises

Heml markup has some core terms for, and approaches to, the problems of internationalization and references to unique identities. An introduction to these is useful before examining the basic elements of a Heml event.

As shown in the examples, all elements have the namespace http://www.heml.org/schemas/2002-05-29/heml. The prefix heml: is customarily assigned to this namespace. With the exception of xml:lang and xlink:href, attributes are not namespaced.

Element names always begin with a capital letter, and words within their names are written run-on, each with a beginning capital letter, such as PhysicalSourceSet.

Attributes are written in minuscule; new words within their names are separated with an underscore.

The Chronology element records what can be known about the time within or at which an event took place. It does not aim to be anything like a complete ontology of time; rather it is developed in conjunction with Java XSLT extension elements that make sense of its markup, sorting and comparing events. The extension elements in turn are based on IBM's developerWorks' International Calendars in the ICU4J classes and on the Java Date class. At present, the only possible primitives are the XML Schema GYear and Date types, wrapped in similarly named tags. However, the previous version of Heml accepted primitives from all calendars supported by ICU4J, and that will be made possible once again quite soon.

These primitives express a point in time, called an AbsoluteDate and illustrated in Example 5, “Sample AbsoluteDate Element”, or a DateRange, which represents a span of time, as shown in Example 6, “Sample DateRange Element”.
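As a rough illustration of how such values might be sorted and compared, the following Python sketch turns the lexical forms of the XML Schema date and gYear types into comparable tuples. It is not the project's Java extension code: the helper name chronology_key is hypothetical, time-zone suffixes are not handled, and a bare year is assumed to sort before any full date within that year.

```python
import re

# Lexical forms handled: gYear such as "1770" or "-0044"; date such as "1770-12-16".
_PATTERN = re.compile(r"^(-?\d{4,})(?:-(\d{2})-(\d{2}))?$")

def chronology_key(value):
    """Return a (year, month, day) tuple usable as a sort key (hypothetical helper)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a gYear or date value: {value!r}")
    year = int(match.group(1))
    # Assumption: a bare year gets month 0 and day 0, so it sorts before
    # every full date that falls inside that year.
    month = int(match.group(2) or 0)
    day = int(match.group(3) or 0)
    return (year, month, day)

values = ["1770-12-16", "-0044", "1770", "0636"]
ordered = sorted(values, key=chronology_key)
```

Tuples are used here instead of Python's datetime.date because that type cannot represent years before 1.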

It therefore includes not only the year, month, day, etc. of the date, but also its calendrical scheme (e.g. Gregorian or Hebrew) and era (e.g. BC or AH). Dates deriving from diverse calendrical schemes are thus accurately sorted and translated from one scheme, likely the one in which the source material was written, to another, likely the one with which the reader is most familiar. For example:

      <!-- The Battle of al-Yarmuk -->
      <!-- RECORDED IN ISLAMIC CALENDAR -->
      <date calendar="islamicsacred" era="AH" xml:lang="en">
        <year>15</year>
      </date>
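To give a sense of what translating such a date involves, here is a minimal Python sketch that converts a date in the tabular (arithmetic) Islamic calendar to the proleptic Gregorian calendar by way of Julian Day Numbers. It is only an approximation of what a calendar library such as ICU4J does: the function names are hypothetical, and the tabular calendar can differ by a day or two from an observation-based religious calendar such as the one the islamicsacred attribute value presumably denotes.

```python
import math
from datetime import date

ISLAMIC_EPOCH_JDN = 1948440  # Julian Day Number of 1 Muharram AH 1 (civil epoch)
JDN_OFFSET = 1721425         # JDN minus Python's proleptic Gregorian ordinal

def islamic_to_jdn(year, month=1, day=1):
    """Julian Day Number of a date in the tabular Islamic calendar."""
    return (day
            + math.ceil(29.5 * (month - 1))   # days in the preceding months
            + (year - 1) * 354                # common-year days before this year
            + (3 + 11 * year) // 30           # leap days in the 30-year cycle
            + ISLAMIC_EPOCH_JDN - 1)

def islamic_to_gregorian(year, month=1, day=1):
    """Proleptic Gregorian date corresponding to a tabular Islamic date."""
    return date.fromordinal(islamic_to_jdn(year, month, day) - JDN_OFFSET)

# AH 15, the year recorded for the Battle of al-Yarmuk in the example above
start = islamic_to_gregorian(15)
end = islamic_to_gregorian(16)
```

Under these assumptions AH 15 runs from February 636 to February 637 in the proleptic Gregorian calendar, so a reader's view could label the event with the year 636.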
      

The University of Virginia's Temporal Modelling Project in the summer of 2001 was most helpful in thinking through this element.

Note

This concept should be distinguished from the Dublin Core Coverage element, which has different goals and admits different data. In all cases it should be possible to generate Coverage elements from DateRanges. Perhaps this should be a goal.

The heml:Participants element contains information about people or groups that participated in an event and the role or roles each played. Persons are represented with the heml:Person element, located in the heml:Definitions/heml:PersonDefinitions element and illustrated in Example 8, “Sample Person Element”. A corresponding PersonRef element is used in an Event/Participants element to indicate that the specified person participates in the event.

Persons in disparate documents will be reconciled in the database if they have the same uri.
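The reconciliation step can be pictured with a short Python sketch that indexes heml:Person elements from several documents by their uri. It makes several assumptions: that Person carries a uri attribute in the way Location does, that the 2003-09-17 namespace is in use, and that the document names and example.org URIs are purely illustrative. The Person elements are left empty for brevity, so the sample documents are not meant to be schema-valid.

```python
import xml.etree.ElementTree as ET

HEML = "http://www.heml.org/schemas/2003-09-17/heml"

def persons_by_uri(documents):
    """Map each Person uri to the names of the documents that define it."""
    index = {}
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for person in root.iter(f"{{{HEML}}}Person"):
            uri = person.get("uri")  # assumed attribute name
            if uri:
                index.setdefault(uri, []).append(name)
    return index

# Hypothetical documents; two of them define the same person.
DOC = """<heml:Definitions xmlns:heml="http://www.heml.org/schemas/2003-09-17/heml">
  <heml:PersonDefinitions>
    <heml:Person uri="{uri}"/>
  </heml:PersonDefinitions>
</heml:Definitions>"""

documents = {
    "memoir_a.xml": DOC.format(uri="http://example.org/people#beethoven"),
    "memoir_b.xml": DOC.format(uri="http://example.org/people#beethoven"),
    "memoir_c.xml": DOC.format(uri="http://example.org/people#haydn"),
}
index = persons_by_uri(documents)
```

Any uri that maps to more than one document marks a person who would be treated as a single identity.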

This element collects references to a resource that the editor considers evidence for the asserted historical event. Since the same piece of evidence might appear in both physical form and in a networked resource, this element can comprise one or both of the PhysicalSourceSet and NetworkedSourceSet elements. These use the approach described in the section called “'Sets'” in order to refer to the document in various languages.

As Example 10, “An Evidence Element” shows, the PhysicalSource element uses namespaced bibliographic elements from DocBook. In truth, I have not found an XML Schema for DocBook, so I defined a small schema for the necessary db:bibliomixed elements. When the DocBook authority publishes an XML Schema, I will import it. It is likely that the namespace will change then, too.

Before the display transformations, events can be transformed to conform to the interests of the viewer.[1] We refer to these as filter transformations. The obvious filtering parameters are language, chronological range, geographical range and person.

Version 0.5 of Heml includes an XSLT that filters Sets (see the section called “'Sets'”) according to their xml:lang attributes. It applies the following rules:

  1. If an element with the desired xml:lang attribute exists, that element alone among its siblings is passed to the output.
  2. Otherwise, the first of the siblings is passed to the output, with the understanding that the first of a set of label elements is the most authoritative, either because it is the first language of the editor, or the language of the text itself, or for some other reason.
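The two rules above can be sketched in a few lines of Python. This is an illustration of the selection logic only, not the project's XSLT; the function name filter_set and the sample labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def filter_set(set_element, lang):
    """Return the child to keep from a Set: the one in `lang`, else the first."""
    children = list(set_element)
    for child in children:
        if child.get(XML_LANG) == lang:
            return child                          # rule 1: requested language exists
    return children[0] if children else None      # rule 2: fall back to the first

label_set = ET.fromstring(
    '<heml:LocationLabelSet '
    'xmlns:heml="http://www.heml.org/schemas/2003-09-17/heml">'
    '<heml:Label xml:lang="de">Wien</heml:Label>'
    '<heml:Label xml:lang="en">Vienna</heml:Label>'
    '</heml:LocationLabelSet>'
)

english = filter_set(label_set, "en")   # matches the second label
fallback = filter_set(label_set, "fr")  # no French label, so the first is used
```

Here a request for English selects the second label, while a request for French falls back to the first, German, label.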

The data, encoded in the form of an XML document that conforms with the DTDs, will not be viewed in its raw form, but rather will be formatted in an interesting or useful way. This goal is met using the eXtensible Stylesheet Language (XSL) and related technologies such as Scalable Vector Graphics and the Document Object Model.

The following sub-sections discuss the views provided with Heml and future plans.

Whenever possible, the transformations performed to produce these outputs are written in XSLT. These in turn are chained together and associated with URLs through the Cocoon2 web publishing framework. Cocoon2 also automates the generation of JPEG images from dynamically produced SVG documents, among other conveniences.

Comparisons and conversions of dates are performed with an xslt extension to the xalan xslt engine. The IntDateCompare class matches impedances between IBM's International Calendars for Java and xalan.

The project includes a schema and transformations for XHTML 1.0 documents which may include heml:Event elements within blocks of text. (XHTML is HTML expressed in valid XML.) The heml:Event element here is meant as an historical signpost for the reader. This XHTML profile is suited for:

When viewed with an XHTML-savvy web browser -- such as Mozilla, Internet Explorer, Konqueror or Opera -- the raw XHTML+HEML document appears as a normal web page. However, when it is passed through the 0.5.6 Heml webapp, a sidebar is appended to the document, with links to the maps and timelines corresponding to the events included in the document. The events themselves are represented in-line by a blue 'H'. Mousing over this brings up the date and label of the event, as well as a link to a map, if appropriate. Upon clicking on the 'H', a popup window appears listing participants, references, etc. for this event.

Existing web pages can be retrofitted with Heml events thus:

  1. Convert HTML web page to XHTML using tidy. This command converts FILE_IN.html and saves the result as FILE_OUT.xhtml:
    tidy -asxml -n -o FILE_OUT.xhtml FILE_IN.html 
  2. Add namespace declarations for heml, geo and xlink to the <html> element:
    <html xmlns="http://www.w3.org/1999/xhtml" 
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    xmlns:heml="http://www.heml.org/schemas/2003-09-17/heml" 
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    >
    If you wish to validate your xhtml+heml document, you will also need to provide the schema location:
    <html xmlns="http://www.w3.org/1999/xhtml" 
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    xmlns:heml="http://www.heml.org/schemas/2003-09-17/heml" 
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.w3.org/1999/xhtml 
    http://heml.mta.ca/Schemas/2003-09-17/xhtml+heml.xsd"
    >
  3. Add definitions of Persons, Locations, Keywords, etc. in a heml:Definitions element, within the xhtml:head element.
     <heml:Definitions>
      <heml:LocationDefinitions>
        <heml:Location uri="#bonn">
          <heml:LocationLabelSet>
            <heml:Label xml:lang="en" >Bonn</heml:Label>
          </heml:LocationLabelSet>
          <geo:Point>
            <geo:lat>50.733333333</geo:lat>
            <geo:long>7.0666666666</geo:long>
          </geo:Point>
        </heml:Location>
        <!-- Further definitions continue here ...  -->
    
    
  4. Add Heml events where text usually appears:
    <b>He was born in the German town of Bonn</b> 
    on the 16th of December 1770.
      <heml:Event xmlns:heml="http://www.heml.org/schemas/2003-09-17/heml"
         uri="#beethoven_born">
        <heml:EventLabelSet>
          <heml:Label xml:lang="en">Beethoven Born</heml:Label>
        </heml:EventLabelSet>
        <heml:Chronology>
          <heml:Date>1770-12-16</heml:Date>
        </heml:Chronology>
        <heml:LocationRef uriRef="#bonn"/>
      </heml:Event>
    His grandfather Ludwig and his father Johann were
    both musicians.
    
  5. Check the document for validity and/or well-formedness.
  6. Test out your XHTML+HEML document by passing it through the Heml server. If your document is published on the web at http://mycompany.com/mydocument.xhtml, you can look at a transformed version at the URL: http://heml.mta.ca/heml-cocoon/text.html?url=http://mycompany.com/mydocument.xhtml. When using the Heml webapp as a proxy server in this way, the transformation is quite slow because the webapp does not cache any resources. For long-term use, download the webapp and install your documents on a local filesystem.
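Steps 5 and 6 lend themselves to a small script. The Python sketch below checks well-formedness only (schema validation would need a separate validator) and builds the proxy URL given in step 6. The function names are hypothetical, and it is an assumption that the Heml webapp decodes percent-escaped characters in its url parameter; for a simple address such as the one above, the result is identical to the URL shown in step 6.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

HEML_PROXY = "http://heml.mta.ca/heml-cocoon/text.html"

def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness, not validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

def heml_proxy_url(document_url):
    """Build the proxy URL from step 6 for a published XHTML+HEML document."""
    # Keep ':' and '/' readable; escape characters such as '&' that would
    # otherwise be read as part of the proxy's own query string.
    return f"{HEML_PROXY}?url={quote(document_url, safe=':/')}"

proxy = heml_proxy_url("http://mycompany.com/mydocument.xhtml")
```

Running is_well_formed over the file produced by tidy before publishing it catches unclosed elements and malformed comments early.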

A future version of Heml will include the concepts of chronological or chronographic uncertainty, relative offsets and recurring classes of events, such as the coronation of a line of kings.