Identification
Work Package 2 of Telematics for Libraries project BIBLINK (LB 4034)

5. Existing Identification Schemes

5.1 Brief comparison of unique identification schemes.

A preliminary study of existing and potentially interesting identification schemes was carried out to ensure that all relevant schemes were considered. The schemes identified are listed below, each with a brief description and a recommendation on whether it should be investigated further. The comparison is summarised in Table 1.

Explanation of the columns in table 1:

Status: whether the identification scheme is a standard.
Used by: who uses, or intends to use, the identification scheme.
Assignment: when, and by whom, the identifier is assigned to a document.
Coverage: what kinds of documents the scheme covers, and at what level of granularity the identifier is assigned.
Usage: what the identifier is used for.
Recommendation: whether further investigation of the identification scheme is recommended.
Id-scheme | Status | Used by | Assignment | Coverage | Usage | Recommendation
--- | --- | --- | --- | --- | --- | ---
ISSN | standard | publishers, libraries, book trade, subscription agents | during the production process or after publication, by national or international agency | journal title | identifier for journal titles | YES
ISBN | standard | publishers, libraries, book trade | during the production process or after publication, by national agency or by the publisher | book title | identifier for monograph titles | YES
SICI | ANSI/NISO standard | subscription agents, libraries | after publication (or after item is assigned to journal), by publisher | journal issue; extensible to journal article and other fragments using the DPI | electronic transactions (1) | YES
PII | proprietary | STI publishers | at start of production process, by publisher | article level | internal usage and exchange | YES
DOI | proposal | publishers | after publication, by central DOI agency (Bowker/CNRI) | arbitrary | copyright management | YES
URN | proposal | Internet publishers | dependent on encoded scheme? (2) | arbitrary | identifier | YES
PURL | open | Internet publishers | after publication, by publisher | arbitrary | identifier for any object on the Web | YES
CAE / IP number (part of CIS) | ? | creators and publishers of music and literary texts | ? | creators and publishers of music and literary texts | copyright management | NO
ISMN (part of CIS) | standard | music industry | ? | edition of printed music | copyright management | NO
ISRC (part of CIS) | standard | music industry | ? | sound recordings | copyright management | NO
ISAN (part of CIS) | standard | audio-visual producers | ? | audio-visual works: films, TV programmes | copyright management | NO
EAN/UPC article number (part of CIS) | standard | music industry | ? | carrier of recorded music | cross-industry exchange / transactions | NO
ISWC (part of CIS) | proposed standard | music industry (and others) | by national agency | arbitrary | copyright management (initially intended to identify musical compositions but could be extended to other forms of digital material) | NO
SMPTE Universal Labels | proprietary | film/TV industry | by publisher (ISO delegation required) | arbitrary | initially intended to identify 'type and encoding of data within general-purpose data stream' but could be extended to other forms of digital material | NO

Table 1: Brief Comparison of Unique Identifier Schemes

  1. Electronic transactions = Abstracting/Indexing services, document delivery services, reserve rooms, serial check-in, serials claiming.
  2. Encoded schemes are often referred to as 'grandfathered' schemes.

5.2 Identification schemes taken into BIBLINK scope

The following schemes from section 5.1 were taken into the BIBLINK scope and recommended for further investigation: ISSN, ISBN, SICI, PII, DOI, URN and PURL.

The schemes CAE/IP number, ISMN, ISRC, ISAN, EAN/UPC article number, ISWC and SMPTE Universal Labels were not recommended for further investigation. These identifiers are primarily linked to non-textual documents, e.g. printed music and sound recordings, which are mostly outside the BIBLINK scope, even though some of the identifiers could also be assigned to other electronic documents.

The chosen identification schemes are well-known schemes used (or initiated) by major parties in the publishing industry or the Internet community, and all of them cover the documents in the BIBLINK scope.

5.3 Template for analysing the identification schemes.

In order to analyse the identification schemes a template was drawn up. The template ensures that the identification schemes are analysed in the same way. This also makes the comparison and evaluation of the different schemes easier. The template lists the aspects which should be analysed for each identification scheme:

  1. Name of the identification scheme
  2. Overview
  3. The syntax of the scheme with examples
  4. Representation
  5. Automation issues
  6. Uniqueness
  7. Persistence
  8. Extensibility
  9. Coverage
  10. Assignment
  11. Usage
  12. Status
  13. Other information

5.4 Analysis of the identification schemes

The following sections provide an analysis of the identification schemes according to the template.

5.4.1 ISSN (International Standard Serial Number)

1. Name of identification scheme

ISSN (International Standard Serial Number). The abbreviation ISSN stands for both the singular and the plural.

2. Overview

The ISSN is a standardised international numeric code that allows the identification of any serial publication, independent of its medium. This includes, for instance, periodicals, newspapers, newsletters, yearbooks, annuals and series, whether in printed form, on microform, on any other medium (floppy disk, CD-ROM, CD-i) or accessible online. The ISSN is linked to a standardised form of the title of the identified serial, known as the 'key title'.

3. Syntax

In print (its explicit representation), the ISSN takes the form of the acronym ISSN followed by two groups of four digits, separated by a hyphen. The eighth character is a check digit calculated from the preceding seven digits; it can be an "X".

Examples:

ISSN 0374-0536

ISSN 0244-433X

As stored in a computer system (its implicit representation), the ISSN contains only the eight-character number, without the "ISSN" prefix or the hyphen.
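The conversion between the printed and stored forms, together with the check-digit test, can be sketched as follows. The sketch assumes the weighted modulus 11 calculation of ISO 3297 (weights 8 down to 2 on the first seven digits, with a result of 10 written as "X"), which the text above mentions but does not spell out; the function names are illustrative only.

```python
import re

# Printed form: optional "ISSN" prefix, four digits, optional hyphen,
# three digits and a check character (digit or X).
_ISSN_PATTERN = re.compile(r"^(?:ISSN\s*)?(\d{4})-?(\d{3}[\dX])$", re.IGNORECASE)


def normalize_issn(text: str) -> str:
    """Reduce a printed ISSN to the bare eight-character stored form."""
    match = _ISSN_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an ISSN: {text!r}")
    return (match.group(1) + match.group(2)).upper()


def issn_check_char(first_seven: str) -> str:
    """Weighted modulus 11 check character (assumed weights 8 down to 2)."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(text: str) -> bool:
    """True if the text is a well-formed ISSN whose check character matches."""
    try:
        issn = normalize_issn(text)
    except ValueError:
        return False
    return issn_check_char(issn[:7]) == issn[7]


def format_issn(issn: str) -> str:
    """Turn the stored eight-character form back into the printed form."""
    return f"ISSN {issn[:4]}-{issn[4:]}"
```

Both example ISSNs above pass the test: for 0374-0536 the weighted sum of the first seven digits is 104, whose remainder modulo 11 is 5, giving the check digit 11 - 5 = 6.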

4. Representation

The ISSN can easily be transcribed by humans for citation purposes.

5. Automation issues

The ISSN can be parsed by computers and transported by the common Internet protocols. It has a fixed number of digits and a built-in check digit, so it can be validated locally by library systems.

6. Uniqueness

Every assigned ISSN is, in principle, globally unique: as a rule, an ISSN once assigned is never used again for another title. Only one ISSN is assigned to a serial title, and the ISSN is specific to each form of the publication: documents issued in different versions, e.g. both on paper and on the Internet, are assigned different ISSNs.

There is a central ISSN database (the ISSN Register), in which the International ISSN Centre checks every ISSN input for consistency and uniqueness. New blocks of ISSNs are distributed to the national centres by the International Centre only. The national ISSN centres, which are the participants in the network, are responsible for the correct assignment of ISSNs in their own countries. The International Centre takes care of ISSN assignment for countries without a national ISSN centre.

7. Persistence

An ISSN remains a unique identifier once assigned, and the scheme is expected to remain in use for a long time.

8. Extensibility

The current ISSN standard and syntax do not allow the scheme to be extended. Because the ISSN has a fixed number of digits, the supply of numbers is finite, but there will be enough numbers for the foreseeable future. The guidelines of the scheme have, however, been extended to cover new media, for example electronic documents.

(Note: if serials appear in different physical formats or manifestations, in different editions or, for example, in different versions, a separate ISSN can be assigned to each without any need to extend the syntax.)

9. Coverage

ISSN can be used only for serial publications, independent of the medium. The identification is at the level of the title of a serial publication.

10. Assignment

The authority responsible for uniqueness is the ISSN International Centre (located in Paris). It is the registration institution officially designated by ISO for the ISSN. It works in collaboration with the national ISSN centres. The ISSN International Centre compiles and maintains the central ISSN database, ensuring that it is accurate, consistent and continually updated. On the national (and regional) level the national (and regional) ISSN centres are responsible. In cases where there is no existing ISSN centre in a particular country, the ISSN International Centre will take responsibility. The ISSN International Centre also takes responsibility for all ISSN assignments concerning serials of international organisations world-wide. An ISSN can be assigned at any point in the publishing process.

11. Usage

The ISSN identification scheme is used, among others, by: publishers, distributors, subscription agencies, libraries, national bibliographic agencies, documentation centres and databases, union catalogues, reproduction rights organisations (RROs), postal services, (scientific) researchers, authors and library users. The original purpose of the ISSN is to identify the title of a specific serial publication by the application of an international standard code, enabling the exchange of information about serials between computers.

In practice, the ISSN is still used mainly as a unique identification code for serial publications, for example to find a specific serial title in a database by searching on its eight digits. The ISSN can also be used in citations. It is especially useful when the titles of serials (world-wide) resemble each other very closely; in such cases it can be difficult to identify a title unless the ISSN is known, and without it far more bibliographic detail about the serial is required. In general, the records in the ISSN Register can also be used to check, complete or create specific databases. Use is, in principle, free of charge; in practice, only a couple of national ISSN centres are planning to charge for (part of) their administration costs. The scale of usage is world-wide.

12. Status

The ISSN is defined by an international standard, ISO 3297, which sets out its definition and standardised application rules adopted internationally within ISO, the organisation that groups the official standardisation institutions throughout the world. ISO 3297 also defines a serial as "a publication, in any medium, issued in successive parts, usually having numerical or chronological designations and intended to be continued with no predetermined end." (This definition excludes works intended to be published in a finite number of parts.)

5.4.2 ISBN (International Standard Book Number)

1. Name of the identification scheme

ISBN (International Standard Book Number)

2. Overview

The ISBN system was developed in 1967 (ISO standard in 1970) as an international standard numbering system for books and other monographic publications. It has traditionally been used on books, but has expanded to include other «new» media such as videocassettes and electronic media.

3. Syntax

An ISBN consists of the letters «ISBN» followed by ten characters: nine decimal digits and a final check digit, which may be "X" (representing 10). The characters are divided into four parts, separated by hyphens or spaces:

Group identifier: Identifies a country (82 = Norway) or a language area (3 = Germany, Switzerland (German part) and Austria). May be 1-5 digits long, depending on the number of documents issued in the country or area.
Publisher identifier: Assigned by the national ISBN agencies; may be 2-6 digits long. Publishers issuing many «books» have short identifiers and publishers issuing few documents have longer ones.
Title number: Every title is given a unique title number by the publisher.
Check digit: Calculated using the Modulus 11 algorithm.
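The check-digit test can be sketched as follows. The sketch assumes the usual ISBN-10 modulus 11 weighting (weights 10 down to 1 across all ten characters, with "X" counting as 10, and the weighted sum divisible by 11), which the text names but does not spell out. The sample ISBNs in the tests are commonly cited test values, not taken from this report.

```python
def isbn10_digits(text: str) -> str:
    """Strip the 'ISBN' prefix, hyphens and spaces, leaving ten characters."""
    body = text.strip()
    if body.upper().startswith("ISBN"):
        body = body[4:]
    compact = body.replace("-", "").replace(" ", "").upper()
    if (
        len(compact) != 10
        or any(c not in "0123456789" for c in compact[:9])
        or compact[9] not in "0123456789X"
    ):
        raise ValueError(f"not a ten-character ISBN: {text!r}")
    return compact


def is_valid_isbn10(text: str) -> bool:
    """Modulus 11 check: weights 10 down to 1, 'X' counts as 10."""
    try:
        digits = isbn10_digits(text)
    except ValueError:
        return False
    values = [10 if c == "X" else int(c) for c in digits]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0
```

The sketch does not restore the hyphens: their positions depend on the lengths of the group and publisher identifiers, which a program can only determine from the agencies' range tables.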

4. Representation

The ISBN can easily be transcribed by humans.

5. Automation issues

The ISBN can be parsed by computers and be transported by the common Internet protocols. The ISBN has a fixed number of digits and a built-in check-digit and can be validated locally by library systems.

6. Uniqueness

An ISBN is unique: the combination of a «country/area number», a «publisher identifier» and a «title number» is globally unique. According to the guidelines, an assigned ISBN should never be used for another title. If a publisher has used up all available title numbers, the national ISBN agency assigns it a new publisher identifier. But since title numbers are assigned by individual publishers, uniqueness depends on publishers following the guidelines. The ISBN is specific to each form of the publication: documents issued in different versions, for example on both paper and CD-ROM, are assigned different ISBNs.

7. Persistence

An ISBN will remain unchanged. The scheme has no specified lifetime.

8. Extensibility

There is no possibility of extension to the scheme. The ISBN has a fixed number of digits and consequently the number of available numbers is finite, but for the foreseeable future there are expected to be enough numbers available. However, the guidelines of the scheme have been extended to permit inclusion of new media, for example electronic documents.

9. Coverage

ISBNs can be assigned to all printed publications of at least 16 pages. ISBNs can also be assigned to spoken-word audiocassettes, microform publications, Braille publications, calendars, floppy disks, CD-ROMs and videocassettes. New guidelines from the International ISBN Agency also include on-line publications. ISBNs should not be given to printed music, newspapers, magazines, art prints and art folders without a title page or text, private firms' catalogues, price lists, directions, loose-leaf systems, theatre and exhibition programmes, colouring books, games or sound recordings. Serial titles are assigned an ISSN instead. An ISBN is assigned at the title level.

10. Assignment

At the international level, the Internationale ISBN-Agentur at the Staatsbibliothek in Berlin is responsible for the identification scheme and for assigning new group identifiers. Each country has a national ISBN agency, which is responsible for assigning new publisher identifiers and for updating the Publisher's International ISBN Directory, published by the Internationale ISBN-Agentur.

The national ISBN agencies also produce lists of title numbers for publishers, but the actual assignment of an ISBN is done by individual publishers. Each publisher is therefore responsible for ensuring that the same title number is not assigned to more than one publication and that numbers are not reused.

An ISBN can be assigned to a document as soon as the decision to publish it is made, so that the ISBN can be used throughout the production process, but this is up to the publisher to decide.

11. Usage

The ISBN is used by publishers and booksellers in ordering, retrieval and handling of books. Libraries use the ISBN for retrieval, ordering and for citations. The ISBN is widely used in most countries (129 countries in 1993).

The ISBN is used on nearly all printed books and to some extent on electronic off-line documents, like CD-ROMs and floppy disks and on multimedia. Traditional publishers who normally assign an ISBN to books also tend to use them when they issue electronic publications.

Until recently, the guidelines did not allow ISBNs to be allocated to on-line documents.

12. Status

The ISBN is defined by ISO 2108: International Standard Book Numbering.

5.4.3 SICI (Serial Item and Contribution Identifier)

1. Name of the identification scheme

SICI (Serial Item and Contribution Identifier)

2. Overview

The SICI is a variable-length code that uniquely identifies serial items (e.g. issues) and each contribution (e.g. article) contained in a serial. Work on the standard began in 1983 in the US Serials Industry Systems Advisory Committee (SISAC) and was taken over by the National Information Standards Organization (NISO), which published the standard in 1991. The standard was recently revised (1996).

3. Syntax

A SICI is divided into three segments with the following syntax:

Item segment<Contribution segment>Control segment

The Contribution segment is optional. The different parts within the segments are separated by punctuation. There is no restriction on the length of a SICI.

Example: «Needleman, Mark. "Computing Resources for an Online Catalog - 10 Years Later". Information Technology and Libraries, 1992 Jun, v11n2:168-172» is assigned a SICI made up of the following elements:

Item segment
  • ISSN: All SICIs must contain an ISSN. For serials that do not have an ISSN, there are mechanisms to request one.
  • Chronology: The cover date of the serial issue.
  • Enumeration: The enumeration of a specific issue of a serial title. As many levels as needed are recorded (e.g. series, volume, number), separated by colons.
Contribution segment
  • Location: The location of the contribution, normally the page number; zero for electronic documents.
  • Locally assigned numbers (not in the example): The contribution segment also allows for alternative local numbers, e.g. numbers used by publishers during the production process (CSI = 3). Locally assigned numbers are separated from the title code by a colon.
  • Title code: The first character of each of the first six words of the title and subtitle.
Control segment
  • CSI (Code Structure Identifier): Determines the coding level.
      - CSI = 1: Assigned to an issue of a serial (SII, Serial Item Identifier).
      - CSI = 2: Assigned to a contribution within a serial (SCI, Serial Contribution Identifier).
      - CSI = 3: An alternative numbering scheme is included. Only used during the production process; a published document will have CSI 1 or CSI 2.
  • DPI (Derivative Part Identifier): Identifies parts of the serial other than articles.
      - DPI = 0: a serial item or a contribution
      - DPI = 1: a table of contents
      - DPI = 2: an index
      - DPI = 3: an abstract
  • MFI (Medium/Format Identifier): A two-letter alphabetic code indicating the physical format.
  • SVN (Standard Version Number): The version number of the SICI standard used.
  • Check character: Calculated by applying the Modulus 37 algorithm.
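The three-segment structure lends itself to a simple split. This is a sketch under stated assumptions: it relies on punctuation not spelled out above (the chronology in parentheses directly after the ISSN, and the angle brackets always present, enclosing nothing for an issue-level SICI). The code string in the tests is hypothetical, and its check character is not verified, since the Modulus 37 calculation is not reproduced here.

```python
import re
from typing import NamedTuple, Optional


class Sici(NamedTuple):
    issn: str
    chronology: str
    enumeration: list[str]
    contribution: Optional[str]  # None for an issue-level SICI
    control: str


# Assumed item-segment punctuation: ISSN(chronology)enumeration
_ITEM = re.compile(r"^(\d{4}-\d{3}[\dX])\(([^)]*)\)(.*)$")


def split_sici(code: str) -> Sici:
    """Split a SICI into its item, contribution and control segments."""
    # Assumption: '<' and '>' are always present, possibly enclosing nothing.
    item, sep, rest = code.partition("<")
    contribution, sep2, control = rest.partition(">")
    if not sep or not sep2:
        raise ValueError(f"missing '<' or '>' in {code!r}")
    match = _ITEM.match(item)
    if match is None:
        raise ValueError(f"unrecognised item segment {item!r}")
    issn, chronology, enumeration = match.groups()
    return Sici(
        issn=issn,
        chronology=chronology,
        # Enumeration levels are separated by colons.
        enumeration=enumeration.split(":") if enumeration else [],
        contribution=contribution or None,
        control=control,
    )
```

Splitting first on the segment delimiters keeps the item segment, which carries the ISSN, separate from the contribution segment, so each can be interpreted with its own rules.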

4. Representation

The SICI can be transcribed by humans.

5. Automation issues

The SICI can be parsed by computers and can be transported by the common Internet protocols.

The SICI has a built-in check character and can be validated locally. The same contribution could be described by different SICIs, so local comparison of SICIs may require algorithms that are more complex than a simple string comparison.

6. Uniqueness

A SICI is a unique identifier. Theoretically, two contributions can have identical contribution segments (if, for instance, two articles in different serials start on the same page number and have the same title code, formed from the first characters of the first six words of the title). The algorithms are designed to minimise such duplicates; tests indicate that they occur about once per million contributions. Because the item segments differ, the SICI as a whole would still be unique.
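The duplicates described above arise in the contribution segment, which keeps only the starting location and a short title code. The sketch below uses a simplified reading of the title-code rule (the first character of each of the first six words, upper-cased); the standard's treatment of punctuation, digits and special characters is not modelled, and the second title in the tests is invented to force a clash.

```python
import re


def title_code(title: str, words: int = 6) -> str:
    """First character of each of the first `words` words, upper-cased (simplified)."""
    tokens = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(token[0] for token in tokens[:words]).upper()


def contribution_segment(start_page: int, title: str) -> str:
    """Location and title code, joined by a colon."""
    return f"{start_page}:{title_code(title)}"
```

For the Needleman article, the title "Computing Resources for an Online Catalog - 10 Years Later" on page 168 gives "168:CRFAOC"; a different article titled "Cataloguing Rules for an Open Collection" starting on page 168 of another serial would produce the same contribution segment, and only the ISSN in the item segment would tell the two SICIs apart.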

A SICI can be constructed on the basis of different sources, both from the serial in hand and from various forms of citations. Therefore, depending on the information available in the different sources, a contribution (article) might be given more than one SICI.

A SICI is unique for every specific form of the publication: documents issued in different versions, e.g. both on paper and on CD-ROM, will be assigned different SICIs.

7. Persistence

Since the SICI is assigned on the basis of a specific serial/article it will remain unique. The scheme has no specified lifetime.

8. Extensibility

The SICI code has no length restriction. The latest version (Z39.56-199X) has been extended to include contributions other than articles, e.g. tables of contents and indexes. In principle, the SICI code could be extended further if necessary.

9. Coverage

The SICI covers all serial items, including periodicals, newspapers, annual works, reports, journals, proceedings, transactions and numbered monographic series, as well as articles in a serial. Book Industry Communication (BIC) has drafted a non-serial equivalent of the SICI, the «Book Item and Component Identifier» (BICI). The draft is being offered to NISO for adoption and submission to ISO alongside the SICI.

The numbering scheme does not cover electronic documents which do not contain location numbers or enumeration.

10. Assignment

The SICI is defined by a standard (Z39.56) under the responsibility of the American National Standards Institute / National Information Standards Organization (ANSI/NISO).

A SICI may be derived both from the serial in hand and from citations of the serial and may be assigned by all «users» in need of coding information about serials and contributions.

11. Usage

The SICI is intended for use by those members of the bibliographic community engaged in the functions associated with management of serials and the contributions they contain, such as ordering, accessioning, claiming, royalty collection, rights management, online retrieval, database linking, document delivery, etc.

12. Status

ANSI/NISO standard Z39.56

5.4.4 PII (Publisher Item Identifier)

1. Name of identification scheme

PII (Publisher Item Identifier)

2. Overview

The PII initiative arose from the need to identify journal articles independently of their packaging unit, because they may be published in different ways (database, CD-ROM, paper, World Wide Web, etc.). Although an article is uniquely bound to a journal title and issue number, within a given medium it can only be identified by its location (page numbers on paper, record number in a database, URL on the World Wide Web). Medium-independent identification of journal articles is necessary for publishing purposes. Elsevier Science developed the Publisher Item Identifier (PII) to provide unique identification of articles.

This scheme has been adopted for all articles published from 1-1-1996 by:

The proponents of the PII wish to encourage its use by other publishers and by secondary information services. They are also monitoring other initiatives in the publishing industry, for example the Digital Object Identifier (DOI).

The PII identifies 'publication items', any unit which a publisher may wish to use or offer for sale, for example journal articles and book chapters. The PII is primarily intended for document items of interest to scientific publishers.

3. Syntax

The PII can be represented (in print) using two formats, one for serial publication items and one for book publication items. Both optionally start with the acronym PII and are then followed by a publication type (S for serial, B for book), the ISSN or ISBN (x...x), the year of assignment in case of serials (2 digits), the item number (5 digits) followed by a hyphen and one check digit (modulo 11). For serial publication items the syntax is:

PII: Sxxxx-xxxx(yy)iiiii-d

and for book publication items the syntax is:

PII: B x-xxx-xxxxx-x/iiiii-d

(the location of the hyphens may differ per book according to the structure of the ISBN)

The control digit can be an "X".

Example:

S0165-3806(96)00403-8

The PII as stored in a computer system (as implicit representation) contains only a string of 17 alphanumeric characters.
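The relation between the printed serial form and the 17-character stored form can be sketched as follows. The sketch handles only the serial form (the book form, whose hyphenation follows the ISBN, is left out) and does not verify the modulus 11 check digit, because the weighting applied to the preceding characters is not given above. Function names are illustrative.

```python
import re

# Printed serial form: optional "PII:", then S, ISSN, (yy), five-digit item number,
# hyphen and check digit.
_PII_SERIAL = re.compile(
    r"^(?:PII:?\s*)?S(\d{4})-?(\d{3}[\dX])\((\d{2})\)(\d{5})-?([\dX])$",
    re.IGNORECASE,
)


def pii_stored_form(printed: str) -> str:
    """Strip punctuation from a serial PII, giving the 17-character stored form."""
    match = _PII_SERIAL.match(printed.strip())
    if match is None:
        raise ValueError(f"not a serial PII: {printed!r}")
    issn_a, issn_b, year, item, check = match.groups()
    return ("S" + issn_a + issn_b + year + item + check).upper()


def pii_parts(printed: str) -> dict:
    """Break a serial PII into the components named in the syntax."""
    stored = pii_stored_form(printed)
    return {
        "type": stored[0],                        # S = serial
        "issn": f"{stored[1:5]}-{stored[5:9]}",  # ISSN of the primary serial
        "year": stored[9:11],                     # year of assignment
        "item": stored[11:16],                    # five-digit item number
        "check": stored[16],                      # check digit (not verified here)
    }
```

Applied to the example above, S0165-3806(96)00403-8 becomes the stored string S0165380696004038.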

4. Representation

The PII can be transcribed by humans for citation purposes.

5. Automation issues

The PII can be parsed by computers. The PII can be transported by the common Internet protocols. The PII has a built-in check digit and can be validated locally by a library system.

6. Uniqueness

One document has one identification code. Uniqueness is readily guaranteed; a number cannot be accidentally created identically from two sources. A PII contains the identification code (ISSN or ISBN) of the publication type (serial or book) to which the publication item is primarily assigned. The use of ISSN and ISBN within the PII is solely to guarantee uniqueness, because they are the most widely accepted current international publication type identifiers. It is recognised that this may lead to confusion when the same item, identified with a PII number, appears in different publication types with different ISBN/ISSN numbers. The primary concern however is the uniqueness of the PII, not the consistency of ISSN or ISBN. If an item is re-used in another publication type, the PII should not change even though the ISSN or ISBN will change.

In general, once assigned a PII will never be used again for another publication item. However, an item published in different media will have the same PII.

7. Persistence

Identification by PII will stay unique. The expected lifetime of the scheme is extensive.

8. Extensibility

There are enough numbering possibilities available within the ISSN/ISBN namespace for the future, although the PII has a fixed number of digits and the total number of available numbers is therefore finite.

The PII may, in principle, be extended to identify: components of a publication item (abstract, figure, table etc.); versions of a component (amended artwork); manifestations of an item (SGML, PDF etc.). Such extensions are as yet undefined (the likely uses and formats are not yet clearly defined). Therefore, extensions should be confined to internal systems within an individual publishing house, determined by the publisher.

9. Coverage

The PII can be used for items within serial publications and books. The identification is on the item level (article, chapter or other components) concerning a serial or book, independent of the medium.

10. Assignment

There is no responsible authority for assignment and uniqueness. The involved organisations (see above) have adopted the PII for assignment from 1996 onwards. They encourage its use by other publishers and by secondary information services.

A PII can be assigned at any point in the publishing process but publishers need the PII mainly for identification before publication. Elsevier Science has looked into other schemes and, in their view, an identifier like the SICI is of no use because it is assigned to document items after publication and is dependent on the end product that contains the item. The revised 1996 SICI offers the possibility of including locally assigned identifiers like the PII, making both schemes compatible.

11. Usage

The PII has been in use by STM publishers (see above), from 1996 onwards. Other users are unknown at the time of writing. The PII may be used solely for identification purposes between publishers and users of PII data.

12. Status

The PII is a proprietary scheme, not an international standard. It is documented however, making the scheme accessible for third parties to use.

5.4.5 URN (Uniform Resource Name)

1. Name of the identification scheme

URN (Uniform Resource Name)

2. Overview

Uniform Resource Names (URNs) are intended to serve as persistent, globally unique resource identifiers that fit into the larger Internet information architecture composed of, additionally, Uniform Resource Characteristics (URCs) and Uniform Resource Locators (URLs). URNs are for identification, URCs for including metadata and URLs for locating resources. URNs are designed to make it easy to map other namespaces (which share the properties of URNs) into URN-space. RFC1737 gives the functional requirements for URNs and defines them as follows:

"A URN identifies a resource or unit of information. It may identify, for example, intellectual content, a particular presentation of intellectual content, or whatever a name assignment authority determines is a distinctly namable entity. A URL identifies the location or a container for an instance of a resource identified by a URN. The resource identified by a URN may reside in one or more locations at any given time, may move, or may not be available at all. Of course, not all resources will move during their lifetimes, and not all resources, although identifiable and identified by a URN will be instantiated at any given time. As such a URL is identifying a place where a resource may reside, or a container, as distinct from the resource itself identified by the URN."

The IETF URN Working Group is currently defining a framework for URNs and an initial set of components. The framework will define the mechanics for enabling global scope, persistence, and the legacy support requirements of URNs. Requirements for namespaces to support this structure will also be defined. In addition, at least one resolution registry system, and at least one namespace will be defined by the group.

3. Syntax

(This section is based on the URN Syntax Internet Draft, December 1996).

URNs have the following syntax:

<URN> ::= "urn:" <NID> ":" <NSS>

where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The leading case-insensitive "urn:" sequence is required. The Namespace ID is used to determine the syntactic interpretation of the Namespace Specific String. The Namespace ID is an alpha-numeric string (which may include a hyphen ('-')) and is case insensitive. The Namespace Specific String is made up of a wider set of characters known as the URN character set. Where valid identifiers in a namespace contain characters that are not in the URN character set they must be translated by encoding them as a sequence of one to six octets using UTF-8 encoding and then encoding them as '%' followed by two characters giving a hexadecimal representation of the octet.
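The construction and encoding rule can be sketched as follows. The URN character set is not listed above, so the sketch assumes the set later published in RFC 2141 (letters, digits and the punctuation "()+,-.:=@;$_!*'"), and it conservatively escapes everything else, including '%', '/', '?' and '#'; it would therefore escape the '/' that the hdl example above carries literally. Function names are hypothetical.

```python
import re
import string

# Assumed unescaped NSS characters: letters, digits and RFC 2141 "other" punctuation.
_URN_CHARS = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")
# Namespace ID: alphanumeric, may include hyphens, starting with a letter or digit.
_NID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*$")


def encode_nss(identifier: str) -> str:
    """Percent-encode characters outside the URN character set via UTF-8 octets."""
    out = []
    for ch in identifier:
        if ch in _URN_CHARS:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def make_urn(nid: str, identifier: str) -> str:
    """Build 'urn:<NID>:<NSS>' from a namespace ID and a raw identifier."""
    if not _NID.match(nid):
        raise ValueError(f"invalid namespace ID: {nid!r}")
    return f"urn:{nid}:{encode_nss(identifier)}"
```

make_urn("lifn", "some.domain:anything goes here") reproduces the third example above, and a non-ASCII character such as "é" becomes the two escaped octets "%C3%A9".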

Some examples of URNs follow:

urn:isbn:1-23485-8-29

urn:hdl:cnri.dlib/august95

urn:lifn:some.domain:anything%20goes%20here

4. Representation

RFC1737 states that it should be easy for URNs to be transcribed by humans without error. However the URN Working Group have acknowledged that ensuring the "user friendliness" of all resultant identifiers may be beyond the scope of the group.

5. Automation issues

URNs can be parsed by computers and transported by the common Internet protocols.

Validation of URNs will require reference to an external resolution service. However, one of the functional requirements for URNs is that two URNs can be compared for equality without reference to an external resolution service. This is likely to be more complicated than a simple string comparison, but it should be possible to do locally.
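A local comparison of this kind can be sketched using the lexical-equivalence rules later published in RFC 2141, which is an assumption relative to the December 1996 draft: the "urn:" prefix, the Namespace ID and the hexadecimal digits of %-escapes are compared case-insensitively, while the rest of the Namespace Specific String is compared exactly.

```python
import re

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_urn(urn: str) -> str:
    """Lexical normalisation following the RFC 2141 equivalence rules."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    # Prefix, NID and %-escape hex digits are case-insensitive; the rest is exact.
    nss = _ESCAPE.sub(lambda m: m.group(0).lower(), nss)
    return f"urn:{nid.lower()}:{nss}"


def urns_equivalent(a: str, b: str) -> bool:
    """True if the two URNs are lexically equivalent."""
    return normalize_urn(a) == normalize_urn(b)
```

Namespace-specific rules, such as treating ISBNs that differ only in hyphenation as equal, would have to be layered on top of this generic check.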

6. Uniqueness

URNs are globally unique and global in scope. A URN is unique for every specific form of the publication: documents issued in different versions will be assigned different URNs.

7. Persistence

URNs are permanent, i.e. a URN is globally unique forever and may well be used as a reference to a resource long beyond the lifetime of the resource it identifies or of any naming authority involved in assigning its name.

8. Extensibility

RFC1737 states that any scheme for URNs has to allow for future extensions to the scheme.

9. Coverage

URNs can be used to identify any discrete electronic publication at an arbitrary level of granularity. RFC1737 also states that URNs should be able to be assigned to any resource that might conceivably be available on the network, for hundreds of years.

10. Assignment

The URN framework distinguishes between naming schemes and resolution systems. A naming scheme is a procedure for creating and assigning unique URNs that conform to the URN syntax described above. A resolution system is a network-accessible service that resolves URNs. The two are independent. At the time of writing, the URN Working Group has still to identify the criteria for acceptable URN naming schemes; however, schemes will probably have to demonstrate a verifiable management system that ensures the integrity of the naming scheme and the URNs within it. This is likely to restrict schemes to those established by an international standards body (ISBN, ISSN, etc.), those established by an industry standards body with broad participation (SMPTE, IEEE, etc.) and those established by commercial organisations for the use of any organisation following normal business practices (DUNS, bar-code registries, etc.).

11. Usage

At the time of writing URNs are still being developed; however, it is anticipated that URNs will replace much of the current usage of URLs.

12. Status

URNs are on the Internet standards track. The IETF URN Working Group plan to publish the various components of the URN framework as RFCs during 1997.

5.4.6 DOI (Digital Object Identifier)

1. Name of the identification scheme

Digital Object Identifier (DOI)

2. Overview

The Digital Object Identifier (DOI) system is being developed by the Corporation for National Research Initiatives (CNRI) and R. R. Bowker, a division of Reed Elsevier, Inc. (Bowker), on behalf of the Association of American Publishers (AAP). The DOI system is based around a directory, which stores an object's DOI and its associated location (URL). When a query is sent to the directory, the DOI is looked up and its location is returned to the client. Any user who knows the DOI of an electronic publication will be able to query the DOI system. Typically, however, DOIs are likely to be embedded in Web pages, hidden behind clickable buttons.
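The directory model can be sketched as a simple mapping from DOI to current location. This in-memory class is a hypothetical stand-in. The real directory is to be distributed and replicated on CNRI's Handle System, and its interfaces are not shown here. The method names and URLs are assumptions.

```python
class DoiDirectory:
    """Hypothetical single-process stand-in for the DOI directory."""

    def __init__(self) -> None:
        self._locations: dict = {}  # DOI string -> current URL

    def register(self, doi: str, url: str) -> None:
        """Enter a new DOI together with its current location."""
        if doi in self._locations:
            raise ValueError(f"DOI already registered: {doi}")
        self._locations[doi] = url

    def repoint(self, doi: str, new_url: str) -> None:
        """Change where an existing DOI points; the DOI itself never changes."""
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = new_url

    def resolve(self, doi: str) -> str:
        """Look up a DOI and return its location, as the directory would to a client."""
        return self._locations[doi]
```

After `register("10.153/34571", ...)`, a client resolves the DOI to the publisher's URL. If the item later moves, for instance after a change of copyright ownership, `repoint` updates only the stored location. Every citation of `10.153/34571` therefore keeps working.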

The DOI system will be based on CNRI's Handle System and will be distributed (and replicated) across the Internet. All the computers in the distributed system will be administered by CNRI. The development of the system includes:

The DOI project started in September 1996 and will last five years.

3. Syntax

A DOI has two parts: a globally unique part called the Publisher ID and a publisher-assigned part called the Item ID. For example, the DOI

10.153/34571

has a Publisher ID of "10.153" and an Item ID of "34571". Publisher IDs will be assigned by the DOI Agency. Separate publisher imprints will be identified by extending the Publisher ID: Publisher ID "10.153" might, for example, have imprints "10.153.2" and "10.153.11.4". The Item ID will be assigned by the publisher and must be unique within that publisher. It can follow any numbering system the publisher wishes to use, but in practice it is likely to be based on identifiers already in use, for example a SICI or PII.
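Splitting a DOI into its two parts, and testing whether one Publisher ID is an imprint of another, might look like the sketch below. The sketch splits at the first "/" only. This is an assumption: because the Item ID is publisher-defined (a SICI, for instance, can itself contain "/"), the first slash is taken to mark the boundary. The names `parse_doi` and `is_imprint_of` are hypothetical.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    publisher_id: str  # assigned by the DOI Agency, e.g. "10.153"
    item_id: str       # assigned by the publisher, e.g. "34571"


def parse_doi(doi: str) -> Doi:
    """Split a DOI at its first "/" into Publisher ID and Item ID."""
    publisher_id, sep, item_id = doi.partition("/")
    if not sep or not publisher_id or not item_id:
        raise ValueError(f"not a DOI: {doi!r}")
    return Doi(publisher_id, item_id)


def is_imprint_of(imprint_id: str, publisher_id: str) -> bool:
    """True if imprint_id extends publisher_id with further dot-separated parts."""
    # Requiring the trailing "." stops "10.1534" from matching "10.153".
    return imprint_id.startswith(publisher_id + ".")
```

Under these assumptions, `parse_doi("10.153/34571")` gives Publisher ID `10.153` and Item ID `34571`. Both `10.153.2` and `10.153.11.4` count as imprints of `10.153`, but `10.1534` does not.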

The Publisher ID is always prefixed by a code ("10" in the above example) that indicates which DOI Agency allocated the Publisher ID. For example, DOIs may at some time in the future form part of an international system in which "10" indicates a DOI issued in the USA, "11" indicates one issued in the European Union, and so forth.
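Reading the agency code is a matter of taking the Publisher ID and keeping everything before its first ".". In the sketch below, the prefix table is entirely hypothetical. It simply encodes the "10"/"11" possibility floated above, not any actual allocation.

```python
# Hypothetical table reflecting the possible future arrangement described in
# the text; no such international allocation exists at the time of writing.
AGENCY_BY_CODE = {
    "10": "DOI Agency (USA)",
    "11": "DOI Agency (European Union)",
}


def agency_code(doi: str) -> str:
    """Return the agency code that prefixes a DOI's Publisher ID."""
    publisher_id = doi.split("/", 1)[0]
    code, dot, _ = publisher_id.partition(".")
    if not dot or not code.isdigit():
        raise ValueError(f"no agency code in {doi!r}")
    return code


def issuing_agency(doi: str) -> str:
    """Map a DOI to the (hypothetical) agency that allocated its Publisher ID."""
    return AGENCY_BY_CODE.get(agency_code(doi), "unknown agency")
```

Imprint extensions do not affect the result. `10.153.11.4/...` still yields agency code `10`.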

The DOI numbering syntax is consistent with Internet standards activities in that it complies with the syntax for a URN.
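A rough way to check this claim is to test whether every character of a DOI is allowed in a URN's namespace-specific string. The sketch assumes the character repertoire of RFC 2141: letters, digits, the "other" punctuation set and the reserved characters `% / ? #`, where `%` must begin a two-digit hex escape. One caveat applies. RFC 2141 asks namespaces not to use `/`, `?` or `#` unencoded, so a DOI placed inside a URN might need its "/" written as `%2F`.

```python
import re
import string

# Character repertoire of a URN namespace-specific string, as assumed from RFC 2141.
_OTHER = "()+,-.:=@;$_!*'"
_RESERVED = "%/?#"
_ALLOWED = set(string.ascii_letters + string.digits + _OTHER + _RESERVED)
# Every "%" must introduce a two-digit hexadecimal escape.
_ESCAPES_OK = re.compile(r"(?:[^%]|%[0-9A-Fa-f]{2})*")


def fits_urn_nss(s: str) -> bool:
    """True if s uses only characters permitted in a URN NSS."""
    if not s or any(ch not in _ALLOWED for ch in s):
        return False
    return _ESCAPES_OK.fullmatch(s) is not None
```

`fits_urn_nss("10.153/34571")` returns `True`. A DOI containing a space, or a bare `%` not followed by two hex digits, would fail.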

4. Representation

A DOI is made up of a string of printable characters and can therefore be easily transcribed.

5. Automation issues

DOIs can be parsed by computers and can be transported by the common Internet protocols. The DOI might be compared locally by string comparison, but library systems will depend on the external resolution service for DOI validation.

6. Uniqueness

The combination of DOI Agency allocated unique Publisher IDs and Item IDs that are unique to each publisher guarantees that DOIs are globally unique.

7. Persistence

DOIs are intended to be globally unique in perpetuity and are expected to continue to be valid over very long periods of time - long enough that they do not depend on current computer systems or networks. When a change of copyright ownership occurs, the DOI remains the same but a new pointer (associated address) is entered in the directory to ensure persistence.

8. Extensibility

The DOI numbering scheme is based on strings of printable characters that are not limited in length; therefore, the scheme should be fairly easy to extend.

9. Coverage

Each DOI identifies a unique electronic publication. A publication may be something as simple as a single photograph or as complex as an encyclopaedia. Typically, a publisher will give a separate DOI to each item that has different rights associated with it or that is marketed separately. DOIs are designed for on-line documents but will probably also be used for off-line documents.

10. Assignment

The DOI Agency provided by Bowker/CNRI will ensure the integrity of the DOI model, provide identifiers to publishers, enter them into the directory, and provide quality control for the entire DOI system. Publishers will assign DOIs to publications. It will be up to publishers to decide at what stage in a publication's life-cycle this will happen.

11. Usage

DOIs are intended primarily to allow publishers to manage rights in digital information and to control the delivery of that information to customers. The DOI system is currently under development, so it is not yet in widespread use, although the underlying CNRI Handle System is already deployed.

There will be no charge to end-users of the DOI system. There will be a charge on publishers for registering a Publisher ID with the DOI Agency. At the time of writing it is not yet clear what this charge will be but it is expected to be small enough that DOIs will be used in the non-commercial areas of the Web as well as by traditional publishers.

12. Status

The DOI system is currently under development. However, the underlying Handle System is widely available and has open interfaces. AAP is working closely with other industry identifier systems to ensure cross-industry compatibility and will be introducing the DOI to ISO in the near future. At the time of writing, it seems likely that the publishing industry will adopt the DOI system widely and reasonably rapidly.

13. Other information

The DOI system is compatible with URNs and is very similar functionally to the PURL system.

5.4.7 PURL (Persistent Uniform Resource Locator)

1. Name of the identification scheme

PURL (Persistent Uniform Resource Locator)

2. Overview

PURLs have been developed and deployed by OCLC as a naming and resolution service for general Internet resources. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL Resolution Service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web terminology, this is a standard HyperText Transfer Protocol (HTTP) redirect.
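The redirect mechanism can be sketched with Python's standard `http.server` module. Several details here are assumptions rather than a description of OCLC's resolver software: the redirect is taken to be a `302 Found`, the name table is a plain dictionary, and the target URL is hypothetical.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical name -> URL table; a real resolver keeps this in a maintained database.
PURL_TABLE = {
    "/OCLC/PURL/INET96": "http://www.example.org/inet96/paper.html",
}


def purl_response(table: dict, path: str):
    """Return (status, headers): a redirect if the name is known, else 404."""
    target = table.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # The client follows the Location header to the resource's current URL.
    return HTTPStatus.FOUND, {"Location": target}


class PurlHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, headers = purl_response(PURL_TABLE, self.path)
        self.send_response(int(status))
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the toy resolver on localhost until interrupted."""
    HTTPServer(("localhost", port), PurlHandler).serve_forever()
```

When a resource moves, only the entry in `PURL_TABLE` changes. Every published PURL keeps redirecting to the right place, which is where the gain in resolution reliability over plain URLs comes from.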

PURLs increase the probability of correct resolution over that of URLs, thereby reducing the burden and expense of maintaining viable, long-term access to electronic resources. However, PURLs are very much a short-term solution to the problems of long-term naming on the Internet, which are being addressed more fully by the IETF's URN Working Group.

3. Syntax

PURLs are simply URLs and are composed of three parts: a 'protocol', a 'resolver address' and a 'name'. For example:

http://purl.oclc.org/OCLC/PURL/INET96

http://purl.bowker.com/isbn/1-56604-355-7

The 'resolver address' is the domain name or address of the PURL resolver and is resolved using the DNS. The 'name' is resolved by the PURL resolver. The name space on a resolver is subdivided into top-level domains and subdomains. Subdomains can exist within a top-level domain or within other subdomains, to any level of nesting. In the first example above, 'http' is the protocol, 'purl.oclc.org' is the resolver address and 'PURL' is a subdomain of the 'OCLC' top-level domain.
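The three-part structure can be pulled apart with `urllib.parse.urlsplit`. The sketch also splits the name into its domain path and final item, and that step rests on an assumption: every path segment except the last is treated as a (sub)domain. The actual resolver decides where its domains end. `split_purl` and `PurlParts` are hypothetical names.

```python
from typing import List, NamedTuple
from urllib.parse import urlsplit


class PurlParts(NamedTuple):
    protocol: str          # e.g. "http"
    resolver_address: str  # e.g. "purl.oclc.org", resolved via the DNS
    name: str              # e.g. "/OCLC/PURL/INET96", resolved by the PURL resolver
    domains: List[str]     # top-level domain first, then nested subdomains
    item: str              # final name segment


def split_purl(purl: str) -> PurlParts:
    """Split a PURL into protocol, resolver address and name (assumed layout)."""
    parts = urlsplit(purl)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.scheme or not parts.netloc or not segments:
        raise ValueError(f"not a PURL: {purl!r}")
    return PurlParts(parts.scheme, parts.netloc, parts.path, segments[:-1], segments[-1])
```

For the first example, the domains come out as `["OCLC", "PURL"]` and the item as `INET96`. For the Bowker example, the only domain is `isbn` and the item is `1-56604-355-7`.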

4. Representation

PURLs can be easily transcribed by humans for citation purposes.

5. Automation issues

PURLs can be parsed by computers and can be transported by the common Internet protocols. The PURL might be compared locally by string comparison, but will depend on an external central resolution service for validation.

6. Uniqueness

The domain name of the PURL resolver, together with the fact that names are unique within a resolver, guarantees that PURLs are globally unique. The PURL is unique for every specific form of the publication. Documents issued in different versions will be assigned different PURLs.

7. Persistence

There are no restrictions on who can make a PURL resolver available, so the long-term availability of any given PURL resolver may be unknown. However, an organisation's willingness to run a PURL resolver probably indicates a commitment to offering long-term resolution and uniqueness. In the PURL model, persistence is seen very much as an organisational rather than a technological issue.

It is anticipated that PURLs will be replaced by URNs once they become widely available.

8. Extensibility

PURLs are of arbitrary length so the scheme is easily extensible.

9. Coverage

PURLs can be used to identify any discrete electronic publication on the World Wide Web at an arbitrary level of granularity. Typically this will be at the document level though PURLs could also be used to identify images or, as with URLs, individual parts of larger documents.

10. Assignment

There is no central authority responsible for PURLs on a national or international level. OCLC maintain a PURL resolver that is open for anyone to use and the software required to run a resolver is freely available. The OCLC resolver is set up in such a way that document authors/publishers are able to allocate PURLs for their own documents.

11. Usage

PURLs are primarily used to provide a naming and resolution service with a longer lifetime than that offered by URLs. There is no cost associated with the use of PURLs. At the time of writing, the OCLC PURL resolver stores approximately 12,000 PURLs.

12. Status

PURLs are neither a de facto standard nor on any standards track.