Format Conversion Feasibility
Work Package 4 of Telematics for Libraries project BIBLINK (LB 4034)
Dublin Core (DC) was primarily designed to provide a simple description for networked resources. It has a specific relevance to the Web, and applications of DC exist that embed metadata in the headers of HTML documents. The examples in this section show DC elements embedded in HTML, but this is not meant to imply that BIBLINK will only receive Dublin Core data in this form. Dublin Core in the form of simple ASCII text might be more appropriate for publishers to provide, and any BIBLINK conversion tool would have to deal with that. Fortunately, Dublin Core was designed to be syntax independent, so the precise syntax used will not affect the mapping tables themselves.
These mappings will attempt to indicate the level of detail required within the Dublin Core record to achieve a minimal but viable UNIMARC record.
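As a rough illustration of the HTML form of Dublin Core described above, the following Python sketch collects DC elements from META tags. It assumes the `DC.element.qualifier` naming convention used in the examples of this section; the class and function names are hypothetical and not part of any BIBLINK tool.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect (element, qualifier, value) triples from DC META tags."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip()
        content = attributes.get("content")
        if not name.lower().startswith("dc.") or content is None:
            return
        # "DC.creator.personal" -> element "creator", qualifier "personal"
        parts = name.split(".")
        element = parts[1].lower()
        if not element:
            return
        qualifier = ".".join(parts[2:]).lower() or None
        self.elements.append((element, qualifier, content.strip()))


def extract_dublin_core(html_text):
    """Return the DC elements found in html_text, in document order."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements
```

Because Dublin Core is syntax independent, a plain ASCII input would need a different reader, but it could produce the same list of triples for the mapping stage.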
Table I: Summary Mapping from Dublin Core to UNIMARC
Dublin Core | UNIMARC
Title | 200 $a Title Proper
Creator | 700 $a Personal Name - Primary Intellectual Responsibility, or if more than one:
Subject | 610 $a Uncontrolled Subject Terms
Description | 330 $a Summary or Abstract
Publisher | 210 $c Name of Publisher, Distributor, etc.
Contributors | 701 $a Personal Name - Alternative Intellectual Responsibility
Date | 210 $d Date of Publication, Distribution, etc.
Type | 608 Form, Genre or Physical Characteristics Heading
Format | 336 $a Type of Computer File (provisional)
Identifier | 001 (mandatory for UNIMARC)
Source | 324 Original Version Note
Language | 101 Language of the Item
Relation | 300 General Note
Coverage | 300 General Note
Rights | 300 General Note
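The summary mapping in Table I can be held as a simple lookup structure. The Python sketch below transcribes the table into a dictionary and adds a helper that lists UNIMARC fields receiving more than one DC element; the names `DC_TO_UNIMARC`, `lookup` and `shared_targets` are illustrative assumptions, not part of the BIBLINK specification.

```python
# Summary mapping transcribed from Table I:
# DC element -> (UNIMARC tag, subfield). A subfield of None means the
# table names only the field, not a specific subfield.
DC_TO_UNIMARC = {
    "title": ("200", "a"),
    "creator": ("700", "a"),
    "subject": ("610", "a"),
    "description": ("330", "a"),
    "publisher": ("210", "c"),
    "contributors": ("701", "a"),
    "date": ("210", "d"),
    "type": ("608", None),
    "format": ("336", "a"),
    "identifier": ("001", None),
    "source": ("324", None),
    "language": ("101", None),
    "relation": ("300", None),
    "coverage": ("300", None),
    "rights": ("300", None),
}


def lookup(element):
    """Return (tag, subfield) for a DC element name, or None if unmapped."""
    return DC_TO_UNIMARC.get(element.strip().lower())


def shared_targets(mapping=DC_TO_UNIMARC):
    """Return UNIMARC tags that receive more than one DC element."""
    by_tag = {}
    for element, (tag, _subfield) in mapping.items():
        by_tag.setdefault(tag, []).append(element)
    return {tag: names for tag, names in by_tag.items() if len(names) > 1}
```

Listing the shared targets makes it easy to see where distinct DC elements are merged into one field, for example the three elements that all fall back to 300 General Note.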
Part of the reason for producing mapping tables between metadata formats is to discover where the important problems lie. These problem areas can be very significant when the mapping is from a relatively simple metadata format to a more complex one, as is certainly the case with this mapping from Dublin Core to UNIMARC. MARC formats, when they are used for bibliographic data, tend to be closely tied to particular cataloguing rules such as AACR2. For example, the distinction between main and added entries defined in AACR2 for choosing access points is formalised in USMARC as the distinction between fields 100 (Main Entry -- Personal Name) and 700 (Added Entry -- Personal Name). Caplan and Guenther, in their Dublin Core-USMARC mapping, point out that DC CREATOR, which does not embody the concepts of main and added entry, cannot be easily mapped to USMARC [5]. UNIMARC similarly contains fields for Primary Intellectual Responsibility (700, 710 and 720), Alternative Intellectual Responsibility (701, 711 and 721) and Secondary Intellectual Responsibility, but in practice it is more flexible than USMARC. The UNIMARC manual suggests that if the given cataloguing code does not embody the concept of main entry, "all persons, corporate bodies or families having equal responsibility may be coded as if they had alternative responsibility" [6].
In this section each of the Dublin Core (DC) metadata elements will be taken in turn and any difficulties noted. The definitions of the DC elements are taken from the Reference Description issued by OCLC [7].
6.3.1 Title
The name given to the resource by the CREATOR or PUBLISHER.
UNIMARC:
NOTES:
EXAMPLES:
200 1#$aDublin Core Metadata Element Set$eReference Description
200 1#$aOCLC/NCSA Metadata Workshop Report
517 1#$aDublin Core Report
6.3.2 Creator
The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.
Qualifier possible: TYPE.
UNIMARC:
NOTES:
EXAMPLES:
This is an example only. In practice the corporate bodies might better be described as DC CONTRIBUTOR rather than DC CREATOR:
<META NAME="DC.title" CONTENT="OCLC/NCSA Metadata Workshop Report">
<META NAME="DC.creator.corporate" CONTENT="Online Computer Library Center">
<META NAME="DC.creator.corporate" CONTENT="National Center for Supercomputing Applications">
<META NAME="DC.creator.personal" CONTENT="Stuart Weibel">
<META NAME="DC.creator.personal" CONTENT="Jean Godby">
<META NAME="DC.creator.personal" CONTENT="Eric Miller">
<META NAME="DC.creator.personal" CONTENT="Ron Daniel">
200 1#$aOCLC/NCSA Metadata Workshop Report
701 #0$aStuart Weibel
701 #0$aJean Godby
701 #0$aEric Miller
701 #0$aRon Daniel
711 02$aOnline Computer Library Center
711 02$aNational Center for Supercomputing Applications
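The example above codes every creator as having alternative responsibility: personal names go to 701 and corporate names to 711. A minimal Python sketch of that step follows. It assumes creators arrive as (qualifier, name) pairs, copies the indicator values from the example, and treats unqualified names as personal, which is an assumption of this sketch rather than a rule stated in the mapping.

```python
def creators_to_unimarc(creators):
    """Map (qualifier, name) pairs to 701/711 field lines.

    Personal names are emitted first, then corporate names, to match
    the field order of the example record.
    """
    personal, corporate = [], []
    for qualifier, name in creators:
        if qualifier == "corporate":
            corporate.append("711 02$a" + name)
        else:
            # Assumption: anything not marked corporate is a personal name.
            personal.append("701 #0$a" + name)
    return personal + corporate
```

Names are copied as given, so the output is only as good as the assumption that the DC data are written in natural order.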
6.3.3 Subject
The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as MEdical Subject Headings or Art and Architecture Thesaurus descriptors) as well.
Qualifier possible: SCHEME.
UNIMARC:
NOTES:
6.3.4 Description
A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.
UNIMARC:
EXAMPLE:
330 ##$aClassification schemes have a role in aiding information retrieval in a network environment, especially for providing browsing structures for subject-based information gateways on the Internet. Advantages of using classification schemes include improved subject browsing facilities, potential multi-lingual access and improved interoperability with other services. Classification schemes vary in scope and methodology, but can be divided into universal, national general, subject specific and home-grown schemes. What type of scheme is used, however, will depend upon the size and scope of the service being designed.
6.3.5 Publisher
The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.
UNIMARC:
NOTES:
EXAMPLE:
210 ##$cOnline Computer Library Center
6.3.6 Contributor
Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).
Qualifier possible: TYPE.
UNIMARC:
NOTES:
6.3.7 Date
The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X3.30-1985 or ISO 8601-1988. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schema are possible, but if used, they should be identified in an unambiguous manner.
Qualifier possible: TYPE
UNIMARC:
NOTES:
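The recommended YYYYMMDD form of DC DATE can be reduced to a year for 210 $d, as in the example record later in this section. The Python sketch below assumes that only the year is wanted and that any value not in YYYYMMDD form is left for manual handling; the function name is hypothetical.

```python
from datetime import datetime


def dc_date_to_210d(value):
    """Return '$d' plus the year for a YYYYMMDD date, else None."""
    text = value.strip()
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        # strptime rejects impossible dates such as month 13.
        parsed = datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return None
    return "$d" + f"{parsed.year:04d}"
```

Dates expressed in other schemes would need their own handling, which is why the function returns None rather than guessing.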
6.3.8 Type
The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. A preliminary set of such types can be found at the following: <URL:http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html>
UNIMARC:
NOTES:
6.3.9 Format
The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.
UNIMARC:
NOTES:
6.3.10 Identifier
String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.
Qualifier possible: SCHEME.
UNIMARC:
NOTES:
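Identifier values may carry a qualifier inside the content string, as in the `TYPE=URL: http://...` form used in the example record of this section. The following Python sketch separates the qualifier from the identifier itself before the value is placed in 001. The accepted prefix syntax is an assumption based on that single example, and the function name is hypothetical.

```python
import re

# Matches a leading "TYPE=URL:" or "SCHEME=ISBN:" style prefix.
_QUALIFIED = re.compile(
    r"^\s*(?:TYPE|SCHEME)\s*=\s*([A-Za-z0-9]+)\s*:\s*(.+?)\s*$"
)


def split_identifier(content):
    """Return (qualifier, identifier); qualifier is None if absent."""
    match = _QUALIFIED.match(content)
    if match:
        return match.group(1).upper(), match.group(2)
    return None, content.strip()
```

For instance, the qualifier could be used to build a note such as `Identifier: URL:...`, while the bare identifier goes to 001.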
6.3.11 Source
The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.
UNIMARC:
NOTES:
6.3.12 Language
Language of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages.
Qualifier possible: SCHEME.
UNIMARC:
NOTES:
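Where a DC record gives a two-letter language code rather than the three-character code suggested above, a small lookup table can normalise it before the value is placed in 101. The Python sketch below is illustrative only: the table is a deliberately tiny subset and the function name is hypothetical.

```python
# Illustrative subset only: two-letter codes mapped to the three-letter
# bibliographic language codes commonly used in MARC records.
TWO_TO_THREE_LETTER = {
    "en": "eng",
    "fr": "fre",
    "de": "ger",
    "es": "spa",
    "it": "ita",
}


def normalise_language(value):
    """Return a three-letter code, or None if the value is not recognised."""
    code = value.strip().lower()
    if len(code) == 3 and code.isalpha():
        # Assumption: a three-letter value is already in the target scheme.
        return code
    return TWO_TO_THREE_LETTER.get(code)
```

A production table would need to cover the full code list and would probably check three-letter values against it as well.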
6.3.13 Relation
Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.
Possible qualifiers: SCHEME, TYPE.
UNIMARC:
NOTES:
6.3.14 Coverage
The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.
Possible qualifier: TYPE.
UNIMARC:
6.3.15 Rights
The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.
Qualifiers possible: URL, URN.
UNIMARC:
NOTES:
6.3.16 An example
The Dublin Core record is for a WWW page and is encoded for use in the header of an HTML file:
<HTML>
<HEAD><TITLE>UKOLN metadata</TITLE>
<META NAME="DC.title" CONTENT="UKOLN metadata">
<META NAME="DC.creator.personal" CONTENT="Andy Powell">
<META NAME="DC.creator.email" CONTENT="A.Powell@ukoln.ac.uk">
<META NAME="DC.creator.personal" CONTENT="Michael Day">
<META NAME="DC.creator.email" CONTENT="M.Day@ukoln.ac.uk">
<META NAME="DC.subject.DDC" CONTENT="025.05">
<META NAME="DC.subject.LCSH" CONTENT="Library information networks">
<META NAME="DC.description" CONTENT="A web page that provides an introduction to metadata and describes the work of UKOLN: the UK Office for Library and Information Networking in the area of resource discovery">
<META NAME="DC.publisher" CONTENT="UKOLN The UK Office for Library and Information Networking">
<META NAME="DC.contributors.corporate" CONTENT="UKOLN Metadata Group">
<META NAME="DC.date" CONTENT="19970626">
<META NAME="DC.type" CONTENT="text/html">
<META NAME="DC.identifier" CONTENT="TYPE=URL: http://www.ukoln.ac.uk/metadata/intro.html">
<META NAME="DC.language" CONTENT="EN">
</HEAD>
<BODY> ... </BODY></HTML>
Using the mappings defined above, a UNIMARC record (of sorts) can be produced:
001 http://www.ukoln.ac.uk/metadata/intro.html
101 1#$aeng
200 1#$aUKOLN metadata
210 ##$cUKOLN The UK Office for Library and Information Networking$d1997
300 ##$aIdentifier: URL:http://www.ukoln.ac.uk/metadata/intro.html
330 ##$aA web page that provides an introduction to metadata and
describes the work of UKOLN the UK Office for Library and Information
Networking in the area of resource discovery
606 ##$aLibrary information networks$2lcsh
676 ##$a025.05
701 #0$aAndy Powell
701 #0$aMichael Day
711 02$aUKOLN Metadata Group
This record was created using the following mapping:
001 [DC.identifier]
The only real difficulty in this mapping is that both the 701 and 711 fields have to assume that the data in the DC.creator or DC.contributors elements are written in natural order. Because the Dublin Core record also contains TYPE qualifiers for the creator and contributors elements, the mapping is probably more consistent than it would be with less well-defined DC records.
The other major problem with this mapping is that the UNIMARC record is missing the mandatory 24-character Record Label and the 35-character 100 General Processing Data field.
6.3.17 Final Comments
6.3.17.1 The Validity of the UNIMARC record
A Dublin Core record might be able, given the right circumstances, to produce a reasonably comprehensive descriptive UNIMARC record. However, the record produced may not be a valid UNIMARC record because it is missing some of the mandatory fields:
Constructing the Record Label and the General Processing Data will be the biggest problems. The Record Label is always 24 characters long and contains general information which may be needed in processing the record. It is based on ISO 2709. The General Processing Data (100) field contains 35 characters of fixed-length data including the date entered on file, the publication date, character sets and the language of cataloguing.
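A converter could at least report which of these mandatory parts are absent from its output. The Python sketch below checks only the two items named above, using the lengths given in this report. The record representation (a label string plus a dictionary from tag to a list of data strings) is an assumption made for illustration, and the check does not validate the contents of either part.

```python
RECORD_LABEL_LENGTH = 24             # length stated in this report
GENERAL_PROCESSING_DATA_LENGTH = 35  # length stated in this report


def validity_problems(record_label, fields):
    """List problems with the Record Label and the 100 field.

    record_label: the label string, or None if none was built.
    fields: dict mapping a tag such as "100" to a list of data strings.
    """
    problems = []
    if record_label is None:
        problems.append("Record Label is missing")
    elif len(record_label) != RECORD_LABEL_LENGTH:
        problems.append("Record Label is not 24 characters long")

    values = fields.get("100", [])
    if not values:
        problems.append("100 General Processing Data is missing")
    elif any(len(v) != GENERAL_PROCESSING_DATA_LENGTH for v in values):
        problems.append("100 General Processing Data is not 35 characters long")
    return problems
```

Applied to the example record in this section, which has neither part, the function would report both as missing.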
6.3.17.2 Automated mapping vs. human cataloguing
With UNIMARC validity in mind, it might be possible to produce better UNIMARC records from Dublin Core format records if the conversion was mediated in some way by human beings. This could improve the quality of the UNIMARC records produced but with significant financial and temporal costs.
6.3.17.3 Extensibility of Dublin Core
Dublin Core elements are extensible, and it is possible that publisher DC records might include a variety of organisation-specific elements. This information would be lost during the transmission process unless specific ways of dealing with it are produced. The Warwick Framework, or some other software framework which holds together different types of metadata, might be a partial solution to this problem.
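One modest way to avoid losing such elements silently is to separate them from the mappable elements at conversion time so that they can be logged or stored alongside the record. The Python sketch below assumes elements arrive as (name, value) pairs and uses the element names from Table I; the function name and the example element `x-price` are hypothetical.

```python
# Element names covered by the summary mapping in Table I.
KNOWN_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributors", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def partition_elements(elements):
    """Split (name, value) pairs into mappable and unmapped lists."""
    mapped, unmapped = [], []
    for name, value in elements:
        target = mapped if name.strip().lower() in KNOWN_ELEMENTS else unmapped
        target.append((name, value))
    return mapped, unmapped
```

For example, a record containing `("x-price", "10.00")` would have that pair returned in the second list instead of being dropped.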
6.3.17.4 Evolution of UNIMARC
UNIMARC, like other MARC formats, is an evolving format. In the future it might become more useful for holding data originally held in DC or other similar formats. The proposed addition of an equivalent of USMARC 856 is one instance of this. Periodic re-mappings might be necessary in an operational environment.
[5] Caplan, P. and Guenther, R. Metadata for Internet resources: the Dublin Core Metadata Elements Set and its mapping to USMARC. Cataloguing and Classification Quarterly, 22 (3/4), 1996, 48.
[6] UNIMARC manual, 7-- Intellectual Responsibility Block.
[7] Dublin Core Metadata Element Set: Reference Description. <URL:http://purl.org/metadata/dublin_core_elements>