Three SGML metadata formats: TEI, EAD, and CIMI
Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034)
SGML has features which make it a very suitable format in which to hold metadata that is intended to be long-lasting and system-independent. SGML is a well-established platform- and application-independent format, enforced and verifiable by an international standard, with an expanding user base, and is well respected and supported within the data processing industry. SGML-encoded metadata is likely to remain usable across different computing environments, without loss of information.
SGML is a powerful formalism, which can be used to model anything from very simple and constrained metadata (there is an SGML application for Dublin Core, for example) to rich and complex information structures, such as those made possible by all three of the schemes studied in this report. Metadata can be embedded in the document itself, as in integral TEI headers and CIMI data, or free-standing, as in independent TEI headers and EAD headers.
SGML can be used as an interchange format amongst non-SGML and SGML-aware software systems. In the extreme case, mapping each of n different formats to and from SGML (2n mappings in all) will be more cost-effective than the n*n mappings needed to support direct interoperability among n different formats. Even within a single institution, SGML can be adopted as a reference format, into and out of which system-specific representations of the metadata can be automatically translated, in situations where it is not convenient or cost-effective to use the SGML format directly.
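The arithmetic behind the interchange argument can be sketched briefly. The Python below is an illustration only: the two toy formats ("colon" and "equals"), the dictionary used as a stand-in for the SGML reference format, and all function names are assumptions made for this example, not part of any of the schemes studied. It counts n*(n-1) directed converters for the pairwise case, which is of the same order as the n*n figure quoted above.

```python
from typing import Callable, Dict

# Hypothetical pivot representation standing in for the SGML reference format.
Record = Dict[str, str]


def direct_mappings(n: int) -> int:
    """Converters needed if each format is mapped directly to every other one."""
    return n * (n - 1)


def pivot_mappings(n: int) -> int:
    """Converters needed if each format is mapped only to and from the pivot."""
    return 2 * n


# One converter into the pivot and one out of it per (toy) format.
to_pivot: Dict[str, Callable[[str], Record]] = {
    "colon": lambda text: dict(line.split(": ", 1) for line in text.splitlines()),
    "equals": lambda text: dict(line.split("=", 1) for line in text.splitlines()),
}
from_pivot: Dict[str, Callable[[Record], str]] = {
    "colon": lambda rec: "\n".join(f"{k}: {v}" for k, v in rec.items()),
    "equals": lambda rec: "\n".join(f"{k}={v}" for k, v in rec.items()),
}


def convert(text: str, source: str, target: str) -> str:
    """Translate between any two formats by passing through the pivot."""
    return from_pivot[target](to_pivot[source](text))
```

With ten formats, the pairwise approach would need 90 converters under these assumptions, against 20 through the pivot; the saving only appears once more than three formats are involved.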
This kind of hybrid approach is likely to become less attractive as the availability of low-cost SGML software tools increases. It should also be noted that the very richness and expressive power offered by SGML may pose problems in mapping into less sophisticated formats without information loss.
Each of the three schemes studied offers the possibility of an extremely rich set of metadata, well beyond the level, say, of Dublin Core. However, it is up to implementors to make effective use of these opportunities. There are very few mandatory elements in any of the schemes studied. Also, in the absence of syntax and vocabulary control, software cannot automatically extract or process useful metadata.
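The point about vocabulary control can be illustrated with a minimal sketch. The controlled list of language codes and the function name below are hypothetical, chosen only for this example; none of the three schemes is being quoted here.

```python
from typing import Optional

# Hypothetical controlled vocabulary for a "language" field (illustrative only).
CONTROLLED_LANGUAGES = {"eng", "fra", "lat"}


def extract_language(value: str) -> Optional[str]:
    """Return a usable code only when the field content is in the controlled list."""
    code = value.strip().lower()
    return code if code in CONTROLLED_LANGUAGES else None
```

A field holding a controlled code can be picked up automatically, whereas free text such as "Mostly Latin, some French" yields nothing that software can act on without human interpretation.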
There is a major difference in the degree of generality between TEI and the other two schemes. As previously noted, the TEI Header was originally designed to make feasible the recording of the information which a cataloguer would need to generate an ISBD-conformant catalogue record, but not necessarily without manual intervention and human intelligence. It was also designed to be extended for a wide range of less predictable applications, in fields where standardization is less well entrenched.
The CIMI and EAD schemes, designed for art historical and museum applications and for archival applications respectively, are more tightly customized to suit the needs of their respective communities. It is interesting to note how closely the basic structure and concepts of EAD overlap with those of the TEI, although the two were apparently developed independently. It is also noteworthy that the CIMI scheme was developed very specifically as an instance of the basic TEI architecture within a specific application field.
Even so, all three schemes remain very general. They provide the implementor with considerable flexibility - indeed, with quite enough rope to hang himself or herself! Simpler, more constrained solutions would not, however, provide anything like such a wide potential for expansion and customization to suit particular needs.
Another aspect of this flexibility worth comment is that the schemes need not be used in isolation from one another. For example, one might use the EAD scheme to describe individual archival holdings down to the item level and then use TEI headers to describe individual documents, where these were deemed of sufficient importance to warrant the effort. Equally, one could embed CIMI topic descriptors within an otherwise purely TEI-conformant document.
The Bodleian Library at Oxford is currently experimenting with the first approach in its catalogue of Western manuscripts. The EAD is used to describe the collection itself in the same way as it has been used for a variety of other special collections. Access to the individual EAD records for resource discovery purposes is provided over the World Wide Web, using specially written software to translate between the HTML required for the Web and the more general SGML used by EAD. In addition, very detailed metadata about each manuscript is stored as a TEI header, using a set of Bodley-defined extensions to the standard header. Further details with examples are available at the web site <URL:http://www.bodley.ox.ac.uk/mss/>.
Similarly, it is easy to imagine systems in which an SGML-encoded metadata scheme might effectively be used in conjunction with a non-SGML scheme.