Metadata Formats
Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034)
The format of metadata used in BIBLINK depends on the requirements we are trying to fulfil. There seem to be two strands that can be followed and both have their merits:
In addition, as part of the consensus building process there are other outstanding questions that need to be answered. These are summarised as follows:
Recommendation
Project participants need to agree a clearer definition of
It seems the choice of metadata standard in this context cannot be divorced from the commercial environment. If publishers are to provide detailed information to national libraries, they may well wish to retain rights in the final record that appears in the national bibliography. Certainly some publishers would be unwilling to allow descriptions and tables of contents (TOCs) to be included without some commercial arrangement.
Recommendation
We need to make explicit assumptions about the business model for the provision and exploitation of metadata.
If we intend to use a standard that exploits the 'single record' produced by the publisher then the choice must be SGML syntax. This is the syntax with which publishers are familiar, and it will allow publishers to re-use the header information they are already creating. The choice of SGML also means that web-based documents will be compliant in so far as HTML conforms to SGML.
From experience we see that publishers cannot agree on a single DTD for article headers, so it would seem they are even less likely to agree on a DTD for all non-serial electronic publications.
In order to encompass the diverse range of publishers and material involved in the process, the use of a minimum set of data is attractive. If we are looking towards use of a core set then the Dublin Core element set is an obvious choice. There is international involvement in the consensus building, and project participants could influence the development of the format. It is a format that small publishers and web publishers could use without incurring significant overhead.
More detailed information could be provided using the data element set defined by the BIC SGML DTD for non-serial publications now under development, and the SSSH DTD for article headers. It is planned to have a minimum set defined in the non-serial DTD, and a minimum set is already defined in SSSH. It would seem useful if these minimum sets could encompass at least the Dublin Core elements. If this were the case then the more complex record could be mapped to Dublin Core, either to create a separate Dublin Core record for transmission or as a means to allow interoperability during transactions involving the record. Other services supplying detailed records (e.g. archives using TEI headers) could also map more complex information to Dublin Core.
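The mapping from a more complex record to Dublin Core described above might be sketched as follows. This is only an illustration under stated assumptions: the detailed field names (`article_title`, `cover_date` and so on) and the function `to_dublin_core` are hypothetical and are not taken from the BIC or SSSH DTDs; only the Dublin Core element names on the right-hand side come from the element set itself.

```python
# Hypothetical field names for a detailed publisher record. They are
# illustrative only and are NOT taken from the BIC or SSSH DTDs.
DETAILED_TO_DC = {
    "article_title": "Title",
    "author": "Creator",
    "keyword": "Subject",
    "publisher_name": "Publisher",
    "cover_date": "Date",
    "issn": "Identifier",
}


def to_dublin_core(detailed):
    """Map a detailed record (field -> list of values) to Dublin Core.

    Fields with no Dublin Core equivalent in the table are dropped,
    so the mapping is deliberately lossy.
    """
    dc = {}
    for field, values in detailed.items():
        element = DETAILED_TO_DC.get(field)
        if element is None:
            continue  # no core equivalent, e.g. a table of contents
        dc.setdefault(element, []).extend(values)
    return dc


# Example data invented for illustration.
record = {
    "article_title": ["An Example Article"],
    "author": ["A. Author", "B. Author"],
    "issn": ["0000-0000"],
    "table_of_contents": ["1. Introduction", "2. Method"],
}
dc_record = to_dublin_core(record)
```

Because every Dublin Core element is repeatable, the sketch keeps a list of values per element. Anything outside the mapping table stays only in the detailed record, which is why a minimum set that encompasses the Dublin Core elements would make such a mapping straightforward.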
Recommendation
It seems one metadata format may not be sufficient for the diverse body of publishers described in the scoping document. It would be more realistic to consider two formats to allow for the creation of a brief record and a more complex record.
It might be possible for more detailed records made available by the publisher (when such records exist) to be supplied as a separate physical record in addition to the Dublin Core set. It would seem an attractive opportunity to implement the Warwick Framework architecture to enable the simple Dublin Core record to be packaged together with the more detailed SGML header. The Warwick Framework might also offer possibilities for defining terms and conditions which could be applied to use of different metadata (e.g. to charge for use of a Table of Contents).
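The packaging idea can be sketched as a container holding separately typed metadata packages. The class names `Package` and `Container`, the scheme labels, and the `terms` attribute are all assumptions made for illustration; the sketch does not follow any published Warwick Framework syntax, and attaching terms and conditions to a package is only one possible reading of the suggestion above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Package:
    """One metadata set inside a container (hypothetical structure)."""
    scheme: str                  # label such as "dublin-core" (assumed)
    content: str                 # the serialised metadata record
    terms: Optional[str] = None  # terms and conditions, if any (assumed)


@dataclass
class Container:
    """Aggregates several packages describing the same publication."""
    packages: List[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def find(self, scheme: str) -> List[Package]:
        """Return every package recorded under the given scheme."""
        return [p for p in self.packages if p.scheme == scheme]

    def unrestricted(self) -> List[Package]:
        """Return packages that carry no terms and conditions."""
        return [p for p in self.packages if p.terms is None]


# Example data invented for illustration.
container = Container()
container.add(Package("dublin-core", "Title: An Example Article"))
container.add(
    Package("sgml-header", "<header>...</header>", terms="charge for TOC use")
)
free_schemes = [p.scheme for p in container.unrestricted()]
```

A recipient that understands only Dublin Core could call `find("dublin-core")` and ignore the rest, while a service with a commercial arrangement could also read the detailed SGML header.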
Recommendation
Consider use of Dublin Core as a minimum element set. Consider use of BIC non-serial DTD and SSSH for more complex records. Consider implementation of Warwick Framework to package more complex SGML records with Dublin Core records.
The provision of bibliographic services in the context of electronic publications is an unstable environment. The nature of the resources themselves is changing, the metadata formats are not mature, and the commercial and service model is uncertain. As regards use of metadata, it seems clear there is no one solution. New players in the 'information' world outside the traditional library world (e.g. browser manufacturers, commercial Internet search services) will influence future development of formats, just as technology affected development of MARC. In this environment it seems mapping and interoperability between formats will remain an issue for the foreseeable future.
In the context of BIBLINK we must recognise that none of the formats we might use is fully agreed. This includes UKMARC and UNIMARC, both of which are being updated to deal with electronic publications. UKMARC is in further flux because of the convergence programme. In this situation any solution must take into account the need for changes to particular fields and data elements and ensure these can be accommodated.
Recommendation
We accept that all metadata formats in this area are unstable. We need to define what level of maturity and stability is required in our format(s) of choice. At that stage we may wish to influence the development of the format(s).