Identification
Work Package 2 of Telematics for Libraries project BIBLINK (LB 4034)
A preliminary study of existing and potentially interesting
identification schemes was done to ensure that all relevant schemes
were investigated. Based on this investigation a listing of identification
schemes follows and each scheme is briefly described. For each
scheme a recommendation is made as to whether the scheme should
be investigated further or not. The comparison of the identification
schemes is found in table 1.
Explanation of the columns in table 1:
| Status | Is the identification scheme a standard or not. |
| Used by | Who is using or is intended to use the identification scheme. |
| Assignment | When is the identification scheme assigned to a document and by whom. |
| Coverage | What kind of documents are covered by the identification scheme and on what level of granularity is the identifier assigned. |
| Usage | What documents is the identification scheme assigned to. |
| Recommendation | Is a further investigation of the identification scheme recommended. |
| Id-scheme | Status | Used by | Assignment | Coverage | Usage | Recommendation |
| ISSN | standard | publishers, libraries, book trade, subscription agents | during production process or after publication by national or international agency | journal title | identifier for journal titles | YES |
| ISBN | standard | publishers, libraries, book trade | during the production process or after publication by national agency or by the publisher | book title | identifier for monograph titles | YES |
| SICI | ANSI/NISO standard | subscription agents, libraries | after publication (or after item is assigned to journal) by publisher | journal issue; extensible to journal article and other fragments using the DPI | electronic transactions (1) | YES |
| PII | proprietary | STI publishers | at start of production process by publisher | article level | internal usage and exchange | YES |
| DOI | proposal | publishers | after publication by central DOI agency (Bowker/CNRI) | arbitrary | copyright management | YES |
| URN | proposal | Internet publishers | Dependent on encoded scheme? (2) | arbitrary | identifier | YES |
| PURL | open | Internet publishers | after publication by publisher | arbitrary | identifier for any object on the Web | YES |
| CAE / IP number (part of CIS) | ? | creators and publishers of music and literary texts | ? | creators and publishers of music and literary texts | copyright management | NO |
| ISMN (part of CIS) | standard | music industry | ? | edition of printed music | copyright management | NO |
| ISRC (part of CIS) | standard | music industry | ? | sound recordings | copyright management | NO |
| ISAN (part of CIS) | standard | audio-visual producers | ? | audio-visual works: films, TV programmes | copyright management | NO |
| EAN/UPC article number (part of CIS) | standard | music industry | ? | carrier of recorded music | cross-industry exchange / transactions | NO |
| ISWC (part of CIS) | proposed standard | music industry (and others) | by national agency | arbitrary | copyright management (initially intended to identify musical compositions but could be extended to other forms of digital material) | NO |
| SMPTE Universal Labels | proprietary | film/TV industry | by publisher (ISO delegation required) | arbitrary | initially intended to identify 'type and encoding of data within general-purpose data stream' but could be extended to other forms of digital material | NO |
Table 1: Brief Comparison of Unique Identifier Schemes
The following schemes from section 5.1 were taken
into the BIBLINK scope and recommended for further investigation:
ISSN, ISBN, SICI, PII, DOI, URN and PURL.
The schemes CAE / IP number, ISMN, ISRC, ISAN, EAN/UPC
article number, ISWC and SMPTE Universal Labels were not recommended
for further investigation. These identifiers are primarily linked
to non-textual documents, e.g. printed music and sound recordings.
These document types are mostly outside the BIBLINK scope, even
if some of the identifiers could also be assigned to other electronic
documents.
The chosen identification schemes are well known
schemes used (or initiated) by major parties in the publishing
industry or the Internet community. The schemes all cover the
documents in the BIBLINK scope.
In order to analyse the identification schemes a
template was drawn up. The template ensures that the identification
schemes are analysed in the same way. This also makes the comparison
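and evaluation of the different schemes easier. The template lists
the aspects which should be analysed for each identification scheme:
name of the identification scheme, overview, syntax, representation,
automation issues, uniqueness, persistence, extensibility, coverage,
assignment, usage and status.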
The following sections provide an analysis of the
identification schemes according to the template.
1. Name of the identification scheme
ISSN (International Standard Serial Number)
The abbreviation ISSN stands for both the singular and the
plural.
2. Overview
The ISSN is a standardised international numeric
code which allows the identification of any serial publication
independent of its medium. This concerns for instance titles of
periodicals, newspapers, newsletters, yearbooks, annuals and series;
these serials can be in printed form or on microform or on any
other medium (floppy disk, CD-ROM, CD-i) or can be accessible
online. The ISSN is linked to a standardised form of the title
of the identified serial, known as the 'key title'.
3. Syntax
In print (its explicit representation) the ISSN takes the form
of the acronym ISSN followed by two groups of four digits,
separated by a hyphen. The eighth character is a check digit
calculated from the preceding seven digits; the check digit
can be an "X".
Examples:
ISSN 0374-0536
ISSN 0244-433X
As stored in a computer system (its implicit representation) the
ISSN contains only the eight characters of the number.
4. Representation
The ISSN can easily be transcribed by humans for
citation purposes.
5. Automation issues
The ISSN can be parsed by computers and transported
by the common Internet protocols. The ISSN has a fixed number
of digits and a built-in check-digit and can be validated locally
by library systems.
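As an illustration of local validation, the following is a minimal Python sketch of the ISSN check-digit calculation. The study itself only states that the check digit is derived from the preceding seven digits; the weighting used here (weights 8 down to 2, modulus 11) is the commonly documented ISSN rule and should be confirmed against ISO 3297. The function names are hypothetical.

```python
import re


def issn_check_digit(first_seven: str) -> str:
    """Return the check character for the first seven digits of an ISSN."""
    if not re.fullmatch(r"[0-9]{7}", first_seven):
        raise ValueError("expected exactly seven digits")
    # Assumed rule: weights 8, 7, ..., 2 applied from left to right, modulus 11.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(text: str) -> bool:
    """Accept printed forms such as 'ISSN 0374-0536' as well as '03740536'."""
    compact = text.upper().replace("ISSN", "").replace("-", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        return False
    return issn_check_digit(compact[:7]) == compact[7]
```

Both examples given in the syntax section pass this check, including the one ending in "X".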
6. Uniqueness
Every assigned ISSN is globally unique.
In general an assigned ISSN will never be used again
for another title. Only one ISSN is assigned to a serial title.
The ISSN is unique for every specific form of the publication.
Documents issued in different versions, e.g. both on paper and
on the Internet, will be assigned different ISSNs.
There is a central ISSN database (ISSN Register)
in which every ISSN input is checked for consistency and uniqueness
by the International ISSN Centre. New blocks of unique ISSN are
only distributed by the International Centre to national centres.
The participants in the network, the national ISSN centres, are
responsible for the correct assignment of ISSN in their own countries.
The International Centre takes care of the ISSN assignment for
countries without a national ISSN centre.
7. Persistence
An assigned ISSN will remain unique. The scheme is expected
to remain in use for a long time.
8. Extensibility
According to the standard and syntax of ISSN there
is no possibility of extension to the scheme at the current time.
The ISSN has a fixed number of digits and consequently the number
of available numbers is finite, but for the foreseeable future
there will be enough numbers available. However, the guidelines
of the scheme have been extended to permit inclusion of new media,
for example electronic documents.
(Note: if serials appear in different physical formats
or manifestations, different editions or, for example, in different
versions a separate ISSN can be assigned without any need for
extension of the syntax).
9. Coverage
ISSN can be used only for serial publications, independent
of the medium. The identification is at the level of the title
of a serial publication.
10. Assignment
The authority responsible for uniqueness is the ISSN
International Centre (located in Paris). It is the registration
institution officially designated by ISO for the ISSN. It works
in collaboration with the national ISSN centres. The ISSN International
Centre compiles and maintains the central ISSN database, ensuring
that it is accurate, consistent and continually updated. On the
national (and regional) level the national (and regional) ISSN
centres are responsible. In cases where there is no existing ISSN
centre in a particular country, the ISSN International Centre
will take responsibility. The ISSN International Centre also takes
responsibility for all ISSN assignments concerning serials of
international organisations world-wide. An ISSN can be assigned
at any point in the publishing process.
11. Usage
The ISSN identification scheme is used, among others,
by: publishers, distributors, subscription agencies, libraries,
national bibliographic agencies, documentation centres and databases,
union catalogues, reproduction rights organisations (RROs), postal
services, (scientific) researchers, authors and library users.
The original purpose of the ISSN is to identify the title of a
specific serial publication by the application of an international
standard code, enabling the exchange of information about serials
between computers.
In practice the ISSN is still used mainly as a unique identification
code for serial publications, for example to find a specific serial
title in a database by searching on the eight-character number.
The ISSN can also be used in citations. The use of
ISSN is especially effective if titles of serials (world-wide)
resemble each other very closely. In such cases it can be difficult
to identify a title unless the ISSN is known. Without the ISSN,
far more bibliographic detail of the specific serial publication
is required. In general the records within the ISSN Register can
also be used to control, complete or create specific databases.
Cost of usage is in principle zero. In practice only a couple
of national ISSN centres are planning to charge for (part of the)
administration costs. The scale of usage is world-wide.
12. Status
The ISSN is defined by an international standard, ISO 3297, i.e. it is the object of a definition and of standardised application rules adopted internationally in the framework of ISO, which groups the official standardisation institutions throughout the world. ISO 3297 defines a serial as: "A publication, in any medium, issued in successive parts, usually having numerical or chronological designations and intended to be continued with no predetermined end." (This definition excludes works intended to be published in a finite number of parts.)
1. Name of the identification scheme
ISBN (International Standard Book Number)
2. Overview
The ISBN system was developed in 1967 (ISO standard
in 1970) as an international standard numbering system for books
and other monographic publications. It has traditionally been
used on books, but has expanded to include other «new»
media such as videocassettes and electronic media.
3. Syntax
An ISBN always has ten digits, preceded by the letters «ISBN». The digits are divided into four parts, separated by hyphens or spaces. The four parts are:
| Group identifier | Identifies a country (82=Norway) or a language area (3=Germany, Switzerland (German part) and Austria). May be 1-5 digits in length, depending on the number of documents issued in the country/area. |
| Publisher identifier | The number is assigned by the national ISBN agencies and may be 2-6 digits in length. Publishers issuing many «books» have short identifiers and publishers issuing few documents have longer identifiers. |
| Title number | Every title is given a unique title number by the publisher. |
| Check digit | Calculated by using the Modulus 11 algorithm. |
4. Representation
The ISBN can easily be transcribed by humans.
5. Automation issues
The ISBN can be parsed by computers and be transported
by the common Internet protocols. The ISBN has a fixed number
of digits and a built-in check-digit and can be validated locally
by library systems.
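The local validation mentioned above can be sketched in Python as follows. This is a minimal sketch for the ten-digit ISBN described in this section, assuming the usual Modulus 11 weighting (weights 10 down to 2 on the first nine digits, with "X" standing for a check value of 10). The function names are hypothetical, and the sample number in the usage note is an illustrative value, not one taken from this study.

```python
import re


def isbn10_check_digit(first_nine: str) -> str:
    """Return the Modulus 11 check character for the first nine ISBN digits."""
    if not re.fullmatch(r"[0-9]{9}", first_nine):
        raise ValueError("expected exactly nine digits")
    # Assumed rule: weights 10, 9, ..., 2 applied from left to right.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(text: str) -> bool:
    """Accept printed forms with hyphens or spaces and an optional 'ISBN' prefix."""
    compact = text.upper().replace("ISBN", "").replace("-", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        return False
    return isbn10_check_digit(compact[:9]) == compact[9]
```

For example, `is_valid_isbn10("ISBN 0-306-40615-2")` returns `True`, while changing the final digit makes the check fail. The check only detects transcription errors; it says nothing about whether the number has actually been assigned.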
6. Uniqueness
An ISBN is unique. The combination of a «country/area
number», a «publisher identifier» and a «title
number» is globally unique. According to the guidelines an
assigned ISBN should never be used for another title. If a publisher
has used up all available title numbers, a new publisher identifier
will be assigned by the national ISBN agency. But since title
numbers are assigned by individual publishers, uniqueness depends
on publishers following the guidelines. The ISBN is unique for
every specific form of the publication. Documents issued in different
versions, for example on both paper and CD-ROM, will be assigned
different ISBNs.
7. Persistence
An ISBN will remain unchanged. The scheme has no
specified lifetime.
8. Extensibility
There is no possibility of extension to the scheme.
The ISBN has a fixed number of digits and consequently the number
of available numbers is finite, but for the foreseeable future
there are expected to be enough numbers available. However, the
guidelines of the scheme have been extended to permit inclusion
of new media, for example electronic documents.
9. Coverage
ISBNs can be assigned to all printed publications
of at least 16 pages. ISBNs can also be assigned to spoken word
audiocassettes, microform publications, Braille publications,
calendars, floppy disks, CD-ROMs and videocassettes. New guidelines
from the International ISBN agency also include on-line publications.
ISBNs should not be given to printed music, newspapers, magazines,
art prints and art folders without title page of text, private
firms catalogues, price-lists, directions, loose-leaf systems,
theatre and exhibition programmes, colouring-books, games or sound
recordings. Serial titles are assigned an ISSN. An ISBN is assigned
at the title level.
10. Assignment
At the international level the Internationale ISBN-Agentur,
Staatsbibliothek, Berlin, is responsible for the identification
scheme and for assigning new group identifiers. Each country has
a national ISBN agency that is responsible for assigning new publisher
identifiers and for updating the Publishers' International
ISBN Directory, published by the Internationale ISBN-Agentur.
The national ISBN agencies also produce lists of
title numbers for publishers. The actual assignment of an ISBN
is done by individual publishers. So, each publisher is responsible
for ensuring that the same title number is not assigned to more
than one publication and for not reusing numbers.
An ISBN can be assigned to a document when
the decision to publish it is made, so that the ISBN can be used
throughout the production process, but this is up to the publisher
to decide.
11. Usage
The ISBN is used by publishers and booksellers in
ordering, retrieval and handling of books. Libraries use the ISBN
for retrieval, ordering and for citations. The ISBN is widely
used in most countries (129 countries in 1993).
The ISBN is used on nearly all printed books and
to some extent on electronic off-line documents, like CD-ROMs
and floppy disks and on multimedia. Traditional publishers who
normally assign an ISBN to books also tend to use them when they
issue electronic publications.
To date, the guidelines have not allowed for the
allocation of ISBNs to on-line documents.
12. Status
The ISBN is defined by ISO 2108: International Standard Book Numbering.
1. Name of the identification scheme
SICI (Serial Item and Contribution Identifier)
2. Overview
The SICI is a variable length code that uniquely
identifies serial items (e.g. issues) and each contribution (e.g.
article) contained in a serial. Work on the standard began
in the US Serials Industry Systems Advisory Committee (SISAC)
in 1983 and was taken over by the National Information Standards
Organisation (NISO) when the standard was published in 1991. The
standard has recently been revised (1996).
3. Syntax
A SICI is divided in three segments with the following
syntax:
Item segment<Contribution segment>Control segment
The Contribution segment is optional. The different
parts within the segments are separated by punctuation. There
is no restriction on the length of a SICI.
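The three-segment layout can be illustrated with a small Python sketch that splits a SICI string at the angle brackets. It assumes that the brackets are always present and are simply left empty when there is no contribution segment; that assumption, and the names `SiciSegments` and `split_sici`, are illustrative and not taken from the standard text quoted here.

```python
import re
from typing import NamedTuple, Optional


class SiciSegments(NamedTuple):
    item: str                    # ISSN, chronology and enumeration
    contribution: Optional[str]  # None when the segment is empty
    control: str                 # CSI, DPI, MFI, SVN and check character


def split_sici(sici: str) -> SiciSegments:
    """Split 'Item segment<Contribution segment>Control segment'."""
    match = re.fullmatch(r"([^<>]+)<([^<>]*)>([^<>]+)", sici)
    if match is None:
        raise ValueError("not in the form item<contribution>control")
    item, contribution, control = match.groups()
    return SiciSegments(item, contribution or None, control)
```

The sketch only separates the segments; interpreting the punctuation inside each segment would need the full rules of the standard.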
Example: «Needleman, Mark. "Computing Resources for an Online Catalog - 10 Years Later".
Information Technology and Libraries, 1992 Jun, v11n2:168-172» will be assigned the following SICI:
| Item segment | ISSN | All SICIs must have an ISSN. For serials which do not have an ISSN, there are mechanisms to request one. |
| Item segment | Chronology | The cover date for a serial title. |
| Item segment | Enumeration | The enumeration of a specific issue of a serial title. As many levels as needed are recorded, e.g. series, volume, number. The levels are separated by a colon. |
| Contribution segment | Location | The location of the contribution, normally page number, zero for electronic documents. |
| Contribution segment | Locally assigned numbers [not in the example] | The contribution segment also allows for alternative local numbers, e.g. numbers used by publishers during the production process. Locally assigned numbers are separated from the title code with a colon. (CSI=3) |
| Contribution segment | Titlecode | The first characters in the first six words of the title and subtitle. |
| Control segment | CSI (Code Structure Identifier) | Determines the coding level. |
| Control segment | DPI (Derivative Part Identifier) | Identifies parts of the serial other than articles. |
| Control segment | MFI (Medium/Format Identifier) | A two-letter alphabetic code used to indicate the physical format. |
| Control segment | SVN | Standard version number of the SICI standard used. |
| Control segment | Check character | Calculated by applying the Modulus 37 algorithm. |
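The title code rule in the table above (the first character of each of the first six words) can be sketched in Python as follows. This is a simplified illustration: the way words are separated and punctuation is skipped is an assumption, and the standard itself contains more detailed rules. The function name `title_code` is hypothetical.

```python
import re


def title_code(title: str, max_words: int = 6) -> str:
    """Build a title code from the first character of the first six words."""
    # Assumption: a "word" is a run of ASCII letters or digits; punctuation
    # such as hyphens and quotation marks is skipped.
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(word[0].upper() for word in words[:max_words])
```

Applied to the title of the example article, "Computing Resources for an Online Catalog - 10 Years Later", the sketch yields `CRFAOC`.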
4. Representation
The SICI can be transcribed by humans.
5. Automation issues
The SICI can be parsed by computers and can be transported
by the common Internet protocols.
The SICI has a built-in check character and can be validated
locally. The same contribution could be identified by different SICIs,
so local comparison of SICIs may require algorithms that are more
complex than a simple string comparison.
6. Uniqueness
A SICI is a unique identifier. Theoretically two
contributions can have identical contribution segments (if for instance
two articles in different serials start on the same page number
and have the same title code). The design
of the algorithm should minimise the occurrence of identical
values; tests indicate that duplicate values occur once per million
contributions. Because the item segments differ, the SICI as a whole
would still be unique.
A SICI can be constructed on the basis of different
sources, both from the serial in hand and from various forms of
citations. Therefore, depending on the information available in
the different sources, a contribution (article)
might be given more than one SICI.
A SICI is unique for every specific form of the publication.
Documents issued in different versions, e.g. both on paper and
CD-ROM, will be assigned different SICIs.
7. Persistence
Since the SICI is assigned on the basis of a specific
serial/article it will remain unique. The scheme has no specified
lifetime.
8. Extensibility
The SICI code has no length restriction. The latest
version (Z39.56-199X) is extended to include contributions other
than articles, e.g. tables of contents and indexes. In principle
the SICI code could be further extended if necessary.
9. Coverage
The SICI covers all serial items, including periodicals,
newspapers, annual works, reports, journals, proceedings, transactions
and numbered monographic series and articles in a serial. Book
Industry Communication (BIC) has drafted a non-serial
equivalent of the SICI, a «Book Item and Component Identifier»
(BICI). The draft is being offered to NISO for adoption and submission
to ISO alongside the SICI.
The numbering scheme does not cover electronic documents
which do not contain location numbers or enumeration.
10. Assignment
The SICI is a standard (Z39.56) under the responsibility
of the American National Standards Institute / National Information
Standards Organization (ANSI/NISO).
A SICI may be derived both from the serial in hand
and from citations of the serial and may be assigned by all «users»
in need of coding information about serials and contributions.
11. Usage
The SICI is intended for use by those members of the bibliographic community engaged in the functions associated with management of serials and the contributions they contain, such as ordering, accessioning, claiming, royalty collection, rights management, online retrieval, database linking, document delivery, etc.
12. Status
ANSI/NISO standard Z39.56
1. Name of the identification scheme
PII (Publisher Item Identifier)
2. Overview
The PII initiative started from the need to identify
journal articles independently from their packaging unit, because
they may be published in different ways (database, CD-ROM, paper,
World Wide Web, etc.). Even if an article is uniquely bound to
a journal title and issue number, it can only be identified by
its location on the medium used (page numbers on paper, record
number in database, URL on the World Wide Web). Medium independent
identification of journal articles is necessary for publishing
purposes. Elsevier Science developed the Publisher Item Identifier
(PII) to provide unique identification of article documents.
This scheme has been adopted for all articles published from 1-1-1996 by:
The proponents of the PII wish to encourage its use
by other publishers and by secondary information services. They
are also monitoring other initiatives in the publishing industry,
for example the Digital Object Identifier (DOI).
The PII identifies 'publication items', any unit
which a publisher may wish to use or offer for sale, for example
journal articles and book chapters. The PII is primarily intended
for document items of interest to scientific publishers.
3. Syntax
The PII can be represented (in print) using two formats,
one for serial publication items and one for book publication
items. Both optionally start with the acronym PII and are then
followed by a publication type (S for serial, B for book), the
ISSN or ISBN (x...x), the year of assignment in case of serials
(2 digits), the item number (5 digits) followed by a hyphen and
one check digit (modulo 11). For serial publication items the
syntax is:
PII: Sxxxx-xxxx(yy)iiiii-d
and for book publication items the syntax is:
PII: B x-xxx-xxxxx-x/iiiii-d
(the location of the hyphens may differ per book according to the structure of the ISBN)
The check digit can be an "X".
Example:
S0165-3806(96)00403-8
The PII as stored in a computer system (as implicit
representation) contains only a string of 17 alphanumeric characters.
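The relation between the printed and the stored form can be sketched in Python as follows. This is a minimal sketch, assuming that the stored form is obtained simply by dropping the optional "PII" prefix and all punctuation. The names `compact_pii`, `parse_serial_pii` and `SerialPii` are hypothetical, and the check digit is not verified because the exact weighting is not given in this section.

```python
import re
from typing import NamedTuple


class SerialPii(NamedTuple):
    issn: str   # ISSN of the serial the item is primarily assigned to
    year: str   # two-digit year of assignment
    item: str   # five-digit item number
    check: str  # check digit (not verified in this sketch)


def compact_pii(printed: str) -> str:
    """Reduce a printed PII to its 17-character stored form."""
    text = printed.strip()
    if text.upper().startswith("PII"):
        text = text[3:].lstrip(": ")
    # Drop hyphens, parentheses, slashes and spaces.
    compact = re.sub(r"[-()/\s]", "", text).upper()
    if not re.fullmatch(r"[SB][0-9X]{16}", compact):
        raise ValueError("not a 17-character PII")
    return compact


def parse_serial_pii(printed: str) -> SerialPii:
    """Split a serial PII into ISSN, year, item number and check digit."""
    compact = compact_pii(printed)
    if compact[0] != "S":
        raise ValueError("not a serial PII")
    issn = compact[1:5] + "-" + compact[5:9]
    return SerialPii(issn, compact[9:11], compact[11:16], compact[16])
```

For the example above, `compact_pii("S0165-3806(96)00403-8")` gives `S0165380696004038`, and `parse_serial_pii` separates it into ISSN 0165-3806, year 96, item number 00403 and check digit 8.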
4. Representation
The PII can be transcribed by humans for citation
purposes.
5. Automation issues
The PII can be parsed by computers. The PII can
be transported by the common Internet protocols. The PII has a
built-in check digit and can be validated locally by a library
system.
6. Uniqueness
One document has one identification code. Uniqueness
is readily guaranteed; a number cannot be accidentally created
identically from two sources. A PII contains the identification
code (ISSN or ISBN) of the publication type (serial or book) to
which the publication item is primarily assigned. The use of
ISSN and ISBN within the PII is solely to guarantee uniqueness,
because they are the most widely accepted current international
publication type identifiers. It is recognised that this may lead
to confusion when the same item, identified with a PII number,
appears in different publication types with different ISBN/ISSN
numbers. The primary concern however is the uniqueness of the
PII, not the consistency of ISSN or ISBN. If an item is re-used
in another publication type, the PII should not change even though
the ISSN or ISBN will change.
In general, once assigned a PII will never be used
again for another publication item. However, an item published
in different media will have the same PII.
7. Persistence
Identification by PII will stay unique. The expected
lifetime of the scheme is extensive.
8. Extensibility
There are enough numbering possibilities available
within the ISSN/ISBN namespace for the future. The PII has
a fixed number of digits and consequently the total number of
available numbers is finite.
The PII may, in principle, be extended to identify:
components of a publication item (abstract, figure, table etc.);
versions of a component (amended artwork); manifestations of an
item (SGML, PDF etc.). Such extensions are as yet undefined (the
likely uses and formats are not yet clearly defined). Therefore,
extensions should be confined to internal systems within an individual
publishing house, determined by the publisher.
9. Coverage
The PII can be used for items within serial publications
and books. The identification is on the item level (article, chapter
or other components) concerning a serial or book, independent
of the medium.
10. Assignment
There is no responsible authority for assignment
and uniqueness. The involved organisations (see above) have adopted
the PII for assignment from 1996 onwards. They encourage its use
by other publishers and by secondary information services.
A PII can be assigned at any point in the publishing
process but publishers need the PII mainly for identification
before publication. Elsevier Science has looked into other
schemes and, in their view, an identifier like the SICI is of
no use because it is assigned to document items after publication
and is dependent on the end product that contains the item. The
revised 1996 SICI offers the possibility of including locally
assigned identifiers like the PII, making both schemes compatible.
11. Usage
The PII has been in use by STM publishers (see above),
from 1996 onwards. Other users are unknown at the time of writing.
The PII may be used solely for identification purposes between
publishers and users of PII data.
12. Status
The PII is a proprietary scheme, not an international
standard. It is documented however, making the scheme accessible
for third parties to use.
1. Name of the identification scheme
URN (Uniform Resource Name)
2. Overview
Uniform Resource Names (URNs) are intended to serve
as persistent, globally unique resource identifiers that fit into
the larger Internet information architecture composed of, additionally,
Uniform Resource Characteristics (URCs) and Uniform Resource Locators
(URLs). URNs are for identification, URCs for including metadata
and URLs for locating resources. URNs are designed to make it
easy to map other namespaces (which share the properties of URNs)
into URN-space. RFC1737 gives the functional requirements for
URNs and defines them as follows:
"A URN identifies a resource or unit of information.
It may identify, for example, intellectual content, a particular
presentation of intellectual content, or whatever a name assignment
authority determines is a distinctly namable entity. A URL identifies
the location or a container for an instance of a resource identified
by a URN. The resource identified by a URN may reside in one or
more locations at any given time, may move, or may not be available
at all. Of course, not all resources will move during their lifetimes,
and not all resources, although identifiable and identified by
a URN will be instantiated at any given time. As such a URL is
identifying a place where a resource may reside, or a container,
as distinct from the resource itself identified by the URN."
The IETF URN Working Group is currently defining
a framework for URNs and an initial set of components. The framework
will define the mechanics for enabling global scope, persistence,
and the legacy support requirements of URNs. Requirements for
namespaces to support this structure will also be defined. In
addition, at least one resolution registry system, and at least
one namespace will be defined by the group.
3. Syntax
(This section is based on the URN Syntax Internet
Draft, December 1996).
URNs have the following syntax:
<URN> ::= "urn:" <NID> ":" <NSS>
where <NID> is the Namespace Identifier, and
<NSS> is the Namespace Specific String. The leading case-insensitive
"urn:" sequence is required. The Namespace ID is used
to determine the syntactic interpretation of the Namespace Specific
String. The Namespace ID is an alpha-numeric string (which may
include a hyphen ('-')) and is case insensitive. The Namespace
Specific String is made up of a wider set of characters known
as the URN character set. Where valid identifiers in a namespace
contain characters that are not in the URN character set they
must be translated by encoding them as a sequence of one to six
octets using UTF-8 encoding and then encoding them as '%' followed
by two characters giving a hexadecimal representation of the octet.
Some examples of URNs follow:
urn:isbn:1-23485-8-29
urn:hdl:cnri.dlib/august95
urn:lifn:some.domain:anything%20goes%20here
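The syntax described above can be illustrated with a short Python sketch that splits a URN into its Namespace Identifier and Namespace Specific String, and that %-encodes characters outside the permitted set via UTF-8. The exact URN character set is not listed in this section, so the set of characters left unencoded here (letters, digits, `_.-~` and the colon) is an assumption, as are the names `Urn`, `parse_urn` and `encode_nss`.

```python
import re
from typing import NamedTuple
from urllib.parse import quote


class Urn(NamedTuple):
    nid: str  # Namespace Identifier, lower-cased because it is case insensitive
    nss: str  # Namespace Specific String, kept exactly as given


def parse_urn(text: str) -> Urn:
    """Split a string of the form 'urn:<NID>:<NSS>'."""
    prefix, sep1, rest = text.partition(":")
    nid, sep2, nss = rest.partition(":")
    if prefix.lower() != "urn" or not sep1 or not sep2 or not nss:
        raise ValueError("expected 'urn:<NID>:<NSS>'")
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9-]*", nid):
        raise ValueError("NID must be alphanumeric, optionally with hyphens")
    return Urn(nid.lower(), nss)


def encode_nss(raw: str) -> str:
    """%-encode characters outside the (assumed) URN character set."""
    # quote() converts the text to UTF-8 and writes each octet as %XX.
    return quote(raw, safe=":")
```

With this sketch, `parse_urn("urn:isbn:1-23485-8-29")` yields the NID `isbn` and the NSS `1-23485-8-29`, and `encode_nss("anything goes here")` yields `anything%20goes%20here`, as in the third example above.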
4. Representation
RFC1737 states that it should be easy for URNs to
be transcribed by humans without error. However the URN Working
Group have acknowledged that ensuring the "user friendliness"
of all resultant identifiers may be beyond the scope of the group.
5. Automation issues
URNs can be parsed by computers and transported by
the common Internet protocols.
Validation of URNs will require reference to an external
resolution service. However, one of the functional requirements for
URNs is that they can be compared without reference to an external
resolution service. This is likely to be more complicated than
a simple string comparison but should be possible to do locally.
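A local comparison of this kind might look like the following Python sketch. It relies only on what the syntax section states, namely that the leading "urn:" and the Namespace ID are case insensitive; it treats the Namespace Specific String as case sensitive, which is an assumption, since the equivalence rules may differ per namespace. The function names are hypothetical.

```python
def normalise_urn(text: str) -> str:
    """Lower-case the 'urn:' prefix and the Namespace ID, keep the NSS as is."""
    prefix, _, rest = text.partition(":")
    nid, sep, nss = rest.partition(":")
    if prefix.lower() != "urn" or not sep:
        raise ValueError("expected 'urn:<NID>:<NSS>'")
    return f"urn:{nid.lower()}:{nss}"


def same_urn(first: str, second: str) -> bool:
    """Compare two URNs locally, without contacting a resolution service."""
    return normalise_urn(first) == normalise_urn(second)
```

Under these assumptions `URN:ISBN:1-23485-8-29` and `urn:isbn:1-23485-8-29` compare as equal, whereas two URNs that differ only in the case of the Namespace Specific String do not.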
6. Uniqueness
URNs are globally unique and are global in scope.
The URN is unique for every specific form of the publication.
Documents issued in different versions will be assigned different
URNs.
7. Persistence
URNs are permanent, i.e. a URN is globally unique
forever, and may well be used as a reference to a resource well
beyond the lifetime of the resource it identifies or of any naming
authority involved in the assignment of its name.
8. Extensibility
RFC1737 states that any scheme for URNs has to allow
for future extensions to the scheme.
9. Coverage
URNs can be used to identify any discrete electronic
publication at an arbitrary level of granularity. RFC1737 also
states that URNs should be able to be assigned to any resource
that might conceivably be available on the network, for hundreds
of years.
10. Assignment
The URN framework distinguishes between naming schemes
and resolution systems. A naming scheme is a procedure for creating
and assigning unique URNs that conform to the URN syntax described
above. A resolution system is a network accessible service that
resolves URNs. The two are independent. The criteria for acceptable
URN naming schemes are still to be identified by the URN Working
Group at the time of writing, however they will probably have
to demonstrate a verifiable management system to ensure the integrity
of the naming scheme and the URNs within it. This is likely to
restrict schemes to those established by an international standards
body (ISBN, ISSN, etc.), those established by an industry standards
body with broad participation (SMPTE, IEEE, etc.) and those established
by commercial organisations for the use of any organisation following
normal business practices (DUNS, bar-code registries, etc.).
11. Usage
At the time of writing URNs are still being developed;
however, it is anticipated that URNs will replace much of the
current usage of URLs.
12. Status
URNs are on the Internet standards track. The IETF URN Working Group plans to publish the various components of the URN framework as RFCs during 1997.
1. Name of the identification scheme
Digital Object Identifier (DOI)
2. Overview
The Digital Object Identifier (DOI) system is being
developed by the Corporation for National Research Initiatives
(CNRI) and R. R. Bowker, a division of Reed Elsevier, Inc. (Bowker)
on behalf of the Association of American Publishers (AAP). The
DOI system is based around a directory, which stores an object's
DOI and its associated location (URL). Queries sent to the directory
result in the DOI being looked up and the location returned to
the client. Any user who knows the DOI of an electronic publication
will be able to query the DOI system. Typically, however, DOIs
are likely to be embedded in Web pages, hidden behind clickable
buttons.
The DOI system will be based on CNRI's Handle System and will be distributed (and replicated) across the Internet. All the computers in the distributed system will be administered by CNRI. The development of the system includes:
The DOI project started in September 1996 and will
last five years.
3. Syntax
A DOI has two parts, a globally unique part called
the Publisher ID and a publisher assigned part called the Item
ID. For example, the DOI
10.153/34571
has a Publisher ID of "10.153" and an Item
ID of "34571". Publisher IDs will be assigned by the
DOI Agency. Separate publisher imprints will be identified by
extending the Publisher ID - Publisher ID "10.153" might
have imprints "10.153.2" and "10.153.11.4"
for example. The Item ID will be assigned by publishers and will
be unique to them. It can be any numbering system the publisher
wishes to use but in practice is likely to be based on identifiers
already in use, for example a SICI or PII.
The Publisher ID is always prefixed by a code - "10"
in the above example - to indicate the DOI Agency that allocated
the Publisher ID. For example, at some time in the future, DOIs
may be part of an international system in which "10"
indicates a DOI issued in the USA, "11" indicates an
identifier issued in the European Union and so forth.
The DOI numbering syntax is consistent with Internet
standards activities in that it complies with the syntax for a
URN.
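The two-part structure described above can be illustrated with a short sketch. It assumes that the first "/" separates the Publisher ID from the Item ID and that imprints extend the Publisher ID with further dot-separated parts, as in the examples; the names ParsedDoi, parse_doi and is_imprint_of are hypothetical and are not part of any DOI specification.

```python
from typing import NamedTuple


class ParsedDoi(NamedTuple):
    agency_code: str   # e.g. "10", the code of the allocating DOI Agency
    publisher_id: str  # e.g. "10.153", includes the agency code
    item_id: str       # e.g. "34571", assigned by the publisher


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI at the first '/' (an assumption) into its two parts."""
    publisher_id, sep, item_id = doi.partition("/")
    if not sep or not publisher_id or not item_id:
        raise ValueError(f"not of the form PublisherID/ItemID: {doi!r}")
    # The agency code is the part of the Publisher ID before the first dot.
    agency_code = publisher_id.split(".", 1)[0]
    return ParsedDoi(agency_code, publisher_id, item_id)


def is_imprint_of(imprint_id: str, publisher_id: str) -> bool:
    """True if imprint_id extends publisher_id with further dotted parts."""
    return imprint_id.startswith(publisher_id + ".")
```

Splitting at the first "/" only means that an Item ID based on an existing identifier, such as a SICI, could itself contain further "/" characters without confusing the split.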
4. Representation
A DOI is made up of a string of printable characters
and can therefore be easily transcribed.
5. Automation issues
DOIs can be parsed by computers and can be transported
by the common Internet protocols. The DOI might be compared locally
by string comparison, but library systems will depend on the external
resolution service for DOI validation.
6. Uniqueness
The combination of DOI Agency allocated unique Publisher
IDs and Item IDs that are unique to each publisher guarantees
that DOIs are globally unique.
7. Persistence
DOIs are intended to be globally unique in perpetuity
and are expected to continue to be valid over very long periods
of time - long enough that they do not depend on current computer
systems or networks. When a change of copyright ownership occurs,
the DOI remains the same but a new pointer (associated address)
is entered in the directory to ensure persistence.
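The directory behaviour described here can be sketched as a simple in-memory mapping from DOI to location. This is only an illustration under stated assumptions, not the Handle System's actual interface: the class DoiDirectory, its method names and the example.org addresses are all hypothetical.

```python
class DoiDirectory:
    """Toy directory: maps each DOI to its current location (URL)."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, doi: str, url: str) -> None:
        # A DOI is entered once; re-registering would break uniqueness.
        if doi in self._locations:
            raise ValueError(f"DOI already registered: {doi}")
        self._locations[doi] = url

    def update_location(self, doi: str, url: str) -> None:
        # The DOI itself never changes; only the pointer is replaced,
        # for example after a change of copyright ownership.
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = url

    def resolve(self, doi: str) -> str:
        # Look up the DOI and return its associated location.
        return self._locations[doi]


directory = DoiDirectory()
directory.register("10.153/34571", "http://old-owner.example.org/34571")
directory.update_location("10.153/34571", "http://new-owner.example.org/34571")
```

After the update, a client that holds only the DOI is still directed to the publication at its new address.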
8. Extensibility
The DOI numbering scheme is based on strings of printable
characters that are not limited in length, therefore the scheme
should be fairly easy to extend.
9. Coverage
Each DOI identifies a unique electronic publication.
Publication may mean something as simple as a single photograph
or as complex as an encyclopaedia. Typically a publisher will
give a separate DOI to each item that has different rights associated
with it, or that is marketed separately. DOIs are designed for on-line
documents but will probably also be used for off-line documents.
10. Assignment
The DOI Agency provided by Bowker/CNRI will ensure
the integrity of the DOI model, provide identifiers to publishers,
enter them into the directory, and provide quality control for
the entire DOI system. Publishers will assign DOIs to publications.
It will be up to publishers to decide at what stage in a publication's
life-cycle this will happen.
11. Usage
DOIs are intended primarily to allow publishers to
manage rights in digital information and to control the delivery
of that information to customers. The DOI system is currently
under development so there is no widespread use though the underlying
CNRI Handle System is already deployed.
There will be no charge to end-users of the DOI system.
There will be a charge on publishers for registering a Publisher
ID with the DOI Agency. At the time of writing it is not yet
clear what this charge will be but it is expected to be small
enough that DOIs will be used in the non-commercial areas of the
Web as well as by traditional publishers.
12. Status
The DOI system is currently under development. However,
the underlying Handle System is widely available and has open
interfaces. AAP is working closely with other industry identifier
systems to ensure cross-industry compatibility and will be introducing
the DOI to ISO in the near future. At the time of writing it seems
likely that the DOI system will be widely and reasonably rapidly
adopted by the publishing industry.
13. Other information
The DOI system is compatible with URNs and is very similar functionally to the PURL system.
1. Name of the identification scheme.
PURL (Persistent Uniform Resource Locator)
2. Overview
PURLs have been developed and deployed by OCLC as
a naming and resolution service for general Internet resources.
Functionally, a PURL is a URL. However, instead of pointing directly
to the location of an Internet resource, a PURL points to an intermediate
resolution service. The PURL Resolution Service associates the
PURL with the actual URL and returns that URL to the client. The
client can then complete the URL transaction in the normal fashion.
In Web terminology, this is a standard HyperText Transfer Protocol
(HTTP) redirect.
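The redirect step can be sketched as a lookup that returns an HTTP status code and a Location header. This is a minimal illustration, not OCLC's resolver software: the status code 302, the function name resolve and the target address on example.org are assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical resolver table: PURL name -> current URL of the resource.
PURL_TABLE: Dict[str, str] = {
    "/OCLC/PURL/INET96": "http://www.example.org/papers/inet96.html",
}


def resolve(name: str,
            table: Dict[str, str] = PURL_TABLE) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request for the given PURL name."""
    target = table.get(name)
    if target is None:
        # Unknown name: nothing to redirect to.
        return 404, {}
    # Known name: the client is told to fetch the actual URL instead.
    return 302, {"Location": target}
```

When a resource moves, only the entry in the table changes; the PURL that appears in citations and links stays the same.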
PURLs increase the probability of correct resolution
over that of URLs, thereby reducing the burden and expense of
maintaining viable, long-term access to electronic resources.
However, PURLs are very much a short term solution to the problems
of long term naming on the Internet that are being addressed more
fully by the IETF's URN working group.
3. Syntax
PURLs are simply URLs and are composed of three parts:
a 'protocol', a 'resolver address' and a 'name'. For example:
http://purl.oclc.org/OCLC/PURL/INET96
http://purl.bowker.com/isbn/1-56604-355-7
The 'resolver address' is the domain name or address
of the PURL resolver and is resolved using the DNS. The 'name'
is resolved by the PURL Resolver. The name space on a resolver
is sub-divided into top-level domains and subdomains. Subdomains
can exist within a top-level domain or within another subdomain
to any level of nesting. In the first example above, 'http' is
the protocol, 'purl.oclc.org' is the resolver address and 'PURL'
is a subdomain of the 'OCLC' top-level domain.
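The three-part decomposition can be checked with Python's standard urllib.parse module. The sketch below is illustrative only; the names PurlParts and parse_purl are hypothetical, and treating each "/"-separated segment of the name as one level of the resolver's name space is an assumption based on the example above.

```python
from typing import NamedTuple, Tuple
from urllib.parse import urlsplit


class PurlParts(NamedTuple):
    protocol: str               # e.g. "http"
    resolver_address: str       # resolved using the DNS
    name: str                   # resolved by the PURL resolver
    segments: Tuple[str, ...]   # levels of the name, outermost first


def parse_purl(purl: str) -> PurlParts:
    """Split a PURL into protocol, resolver address and name."""
    parts = urlsplit(purl)
    if not parts.scheme or not parts.netloc or not parts.path.strip("/"):
        raise ValueError(f"not a complete PURL: {purl!r}")
    segments = tuple(seg for seg in parts.path.split("/") if seg)
    return PurlParts(parts.scheme, parts.netloc, parts.path, segments)


example = parse_purl("http://purl.oclc.org/OCLC/PURL/INET96")
```

For the first example PURL the segments are "OCLC" (the top-level domain), "PURL" (its subdomain) and "INET96".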
4. Representation
PURLs can be easily transcribed by humans for citation
purposes.
5. Automation issues
PURLs can be parsed by computers and can be transported by the common Internet protocols. The PURL might be compared locally by string comparison, but will depend on an external central resolution service for validation.
6. Uniqueness
The domain name of the PURL resolver and the fact
that names are unique within resolvers guarantee that PURLs are
globally unique. The PURL is unique for every specific form of
the publication. Documents issued in different versions will
be assigned different PURLs.
7. Persistence
There are no restrictions on who can make a PURL
resolver available so the long term availability of any PURL resolver
may be unknown. However, the commitment shown by an organisation
in running a PURL resolver probably indicates a commitment to
offering long term resolution and uniqueness. In the PURL model
persistence is seen very much as an organisational rather than
a technological issue.
It is anticipated that PURLs will be replaced by
URNs once they become widely available.
8. Extensibility
PURLs are of arbitrary length so the scheme is easily
extensible.
9. Coverage
PURLs can be used to identify any discrete electronic
publication on the World Wide Web at an arbitrary level of granularity.
Typically this will be at the document level though PURLs could
also be used to identify images or, as with URLs, individual parts
of larger documents.
10. Assignment
There is no central authority responsible for PURLs
on a national or international level. OCLC maintain a PURL resolver
that is open for anyone to use and the software required to run
a resolver is freely available. The OCLC resolver is set up in
such a way that document authors/publishers are able to allocate
PURLs for their own documents.
11. Usage
PURLs are primarily used to provide a naming and
resolution service with a longer lifetime than that offered by
URLs. There is no cost associated with the use of PURLs. At the
time of writing the OCLC PURL resolver stores approximately 12,000
PURLs.
12. Status
PURLs do not form a de facto standard nor are they
on any standards track.