Metadata Formats
Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034)
It is clear that the various communities involved in creating and using different metadata formats are strongly attached to their own formats. This is understandable if one keeps in mind such factors as the effort involved in reaching consensus on formats, the skill levels required to apply formats in a consistent way, and not least the heavy investment in existing systems. For these reasons alone it is unlikely that any one format will become dominant. Nor would such dominance seem desirable, as the existence of a variety of formats allows the optimum format to be chosen for use in a particular context.
The provision and exchange of trade and bibliographic data relating to publications has been part of the book world for a considerable time, and has increased with the availability of data in electronic form. Records are created at various stages and places in the process of supplying documents to the reader; they serve different requirements but have overlapping functions. The Newbury seminar held in 1987 (Bibliographic records in the book world: needs and capabilities) considered whether the 'information flow' could be improved so that the most appropriate record content and format would be sustained throughout the process to meet the requirements of the various users (publishers, suppliers, libraries).
One important strand to emerge was the idea of an evolving bibliographic record: current requirements might be met more efficiently through a more organised articulation of current record supply or, less likely, through the development of an all-through single record system.
In the book world there are still only the beginnings of a more organised evolution of the bibliographic record. The issue of 'a single record system' remains outstanding; at present there is limited integration between the records used in the publishing/supplier world and those of the national bibliography. For electronic publications we need to consider whether we aim to integrate record supply more fully, how far various needs and requirements can be met by a single record system, or whether we accept a more evolutionary system in which different record formats are available to various users to meet their different requirements.
A variety of formats have been placed in this table, positioned along a continuum from simple records (Band One) to complex, rich records (Band Four). The variety of record types identified in the bibliographic control process can be placed on this continuum as shown below.
| Band One | Band Two | Band Three | Band Four |
| Proprietary simple records: | Dublin Core | MARC | ICPSR |
| NetFirst | IAFA | TEI independent headers | FGDC |
| AltaVista | RFC 1807 | CIMI | |
| Infoseek | SOIF | EAD | |
| Publishers' CIP forms | CIP MARC | EDI messages | |
| SGML article headers |
It is possible to extend this model to associate other factors with the position of the format on the continuum. The simplest record formats are used to create relatively unstructured indexes for locating items whereas the most complex records can be used as the basis of sophisticated analysis and navigational tools. Records can be associated with more or less 'rich' retrieval and analysis processes (Z39.50, emerging query routing, text analysis). The bands of records typically have common characteristics in other aspects, for example:
This pattern of association is summarised in the following table:
| Simple | | | Rich |
| Location | Selection | Evaluation | Analysis |
| Robot generated | Robot plus manual input | Manually input | High level of manual input |
| Unstructured | Attribute value pairs | Subfields, qualifiers | Highly structured mark up |
| http with CGI form interface | directory service protocols (whois++) with query routing (Common Indexing Protocol) | Z39.50 | Z39.50 (in future with collection navigation) |
| proprietary | emerging standards | generic standards used in information world | standards used in specialist subject domains |
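The pattern in the table above can also be held as a small lookup structure. The Python sketch below is illustrative only: the field names (`use`, `creation`, `structure`, `access`, `status`) are labels assumed here for the five unlabelled rows of the table, and the mapping of the four columns onto Bands One to Four is likewise an assumption based on the surrounding text.

```python
from typing import List, NamedTuple


class BandProfile(NamedTuple):
    """Typical characteristics of one band (field names are assumed labels)."""
    use: str
    creation: str
    structure: str
    access: str
    status: str


# One profile per band, with values copied from the table columns left to right.
BAND_PROFILES = {
    1: BandProfile("Location", "Robot generated", "Unstructured",
                   "http with CGI form interface", "proprietary"),
    2: BandProfile("Selection", "Robot plus manual input",
                   "Attribute value pairs",
                   "directory service protocols (whois++) with query routing "
                   "(Common Indexing Protocol)",
                   "emerging standards"),
    3: BandProfile("Evaluation", "Manually input", "Subfields, qualifiers",
                   "Z39.50",
                   "generic standards used in information world"),
    4: BandProfile("Analysis", "High level of manual input",
                   "Highly structured mark up",
                   "Z39.50 (in future with collection navigation)",
                   "standards used in specialist subject domains"),
}


def bands_accessed_via(fragment: str) -> List[int]:
    """Return the bands whose access method mentions the given text."""
    needle = fragment.lower()
    return [band for band, profile in BAND_PROFILES.items()
            if needle in profile.access.lower()]
```

For instance, `bands_accessed_via("Z39.50")` picks out the two richer bands, which matches the association drawn in the text between record richness and retrieval protocol.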
Within the context of BIBLINK we need to consider which Band of record is appropriate for further consideration. The answer may well differ depending on whether the requirement is for a CIP-type record or a more detailed record.
Project partners need to consider these issues and refine the scoping of the project to allow criteria to be drawn up for decision making.
While accepting the need for consensus among partners on these issues, we will assume that previous practice and discussion will inform future decisions. For example, we assume that there will be constraints on the cost and staff available from national libraries and publishers; that we are attempting to identify standard formats controlled by authoritative agencies; and that the level of service provided by the national bibliographic agencies will be comparable to that provided for print material (whether at a CIP level or at a level consistent with the full record in the national bibliography).
Given these assumptions, the formats from Band One would be rejected as proprietary solutions. The formats in Band Four would also be rejected as too detailed for the service level required, too specialised for general use, and too costly to maintain, particularly because of the specialist staff needed to manipulate and interpret such records.
On these assumptions as to scope, it is recommended that BIBLINK should concentrate on formats in Bands Two and Three for the exchange format. This does not preclude conversion from formats outside these Bands, e.g. from more complex formats in Band Four into a simpler format; it means only that the formats used for data exchange would be located in Bands Two or Three.
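The scoping rule above amounts to a simple filter over the band table. The following Python sketch is an illustration, not part of the BIBLINK work: the names `FORMAT_BANDS`, `EXCHANGE_BANDS`, `is_exchange_candidate` and `needs_conversion` are hypothetical, and only those formats whose column is unambiguous in the band table as given are included.

```python
from enum import IntEnum
from typing import List


class Band(IntEnum):
    ONE = 1
    TWO = 2
    THREE = 3
    FOUR = 4


# Partial band assignments taken from the band table; entries whose column
# is unclear in the table as given are deliberately left out.
FORMAT_BANDS = {
    "NetFirst": Band.ONE,
    "AltaVista": Band.ONE,
    "Infoseek": Band.ONE,
    "Dublin Core": Band.TWO,
    "IAFA": Band.TWO,
    "RFC 1807": Band.TWO,
    "SOIF": Band.TWO,
    "MARC": Band.THREE,
    "TEI independent headers": Band.THREE,
    "CIMI": Band.THREE,
    "EAD": Band.THREE,
    "ICPSR": Band.FOUR,
    "FGDC": Band.FOUR,
}

# The recommendation: exchange formats come from Bands Two and Three only.
EXCHANGE_BANDS = {Band.TWO, Band.THREE}


def is_exchange_candidate(format_name: str) -> bool:
    """True if the format lies in a band recommended for data exchange."""
    return FORMAT_BANDS[format_name] in EXCHANGE_BANDS


def needs_conversion(format_name: str) -> bool:
    """True if records in this format would have to be converted first."""
    return not is_exchange_candidate(format_name)


def exchange_candidates() -> List[str]:
    """Alphabetical list of the formats that remain in scope."""
    return sorted(name for name in FORMAT_BANDS if is_exchange_candidate(name))
```

Under these assumptions a Band Four format such as FGDC is not excluded from the system altogether; it is simply flagged by `needs_conversion` as a source that would be mapped into a Band Two or Three format before exchange.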
One example of this is given by Bearman, who proposes a reference model for business acceptable communication (Bearman, David and Sochats, Ken. Metadata Requirements for evidence. Available at <URL: http://www.lis.pitt.edu/~nhprc/model.htm>). This defines clusters of data elements which would be required to fulfil the range of functions of a record. The functions of records are identified as the provision of access and use rights management, networked information discovery and retrieval, registration of intellectual property, and authenticity. The clusters of data elements are defined in six layers:
Within the UK, the BIC/BNBRF Book Product Information project has had as its main aim the identification of the content required for an EDI message to communicate product information through the book sector supply chain for non-serial items. This work has now developed into compiling exhaustive sets of data elements which might be used in this context.
| Line item: | the entity described in the Book Product message, in effect a tradable item |
| Work: | body of literary or intellectual content |
| Piece: | single indivisible physical published item. |
| Work: | distinct intellectual or artistic creation |
| Expression: | realisation of the work in text, sound, music, image etc |
| Manifestation: | physical embodiment of an expression in book, sound recording etc |
| Item: | single example of a manifestation |
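The four definitions from Work to Item above describe a chain running from abstract content to a single concrete copy. A minimal Python sketch of that chain follows. The containment relationships (a work holding its expressions, and so on), the class fields, and the helper `count_items` are assumptions made for illustration; the table itself gives only the definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Single example of a manifestation."""
    identifier: str


@dataclass
class Manifestation:
    """Physical embodiment of an expression (book, sound recording, etc.)."""
    carrier: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """Realisation of the work in text, sound, music, image, etc."""
    form: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """Distinct intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every item reachable from a work through the chain."""
    return sum(len(manifestation.items)
               for expression in work.expressions
               for manifestation in expression.manifestations)


# Hypothetical example data: one work, one textual expression,
# published as a printed book held in two copies.
example = Work(
    title="Example title",
    expressions=[
        Expression(
            form="text",
            manifestations=[
                Manifestation(carrier="book",
                              items=[Item("copy-1"), Item("copy-2")]),
            ],
        )
    ],
)
```

Making each level a separate type keeps the question raised in the recommendation explicit: a metadata record has to state which of these objects it describes and how that object relates to the others.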
Recommendation
As part of the consensus building process, publishers and national libraries should identify the objects and relationships which need to be represented in metadata describing electronic resources.