Authentication
Work Package 6 of Telematics for Libraries project BIBLINK (LB 4034)
A literature study
Literature used:
Graham:
Intellectual preservation and electronic intellectual property / by Peter S. Graham. - URL:http://www.nlc-bnc.ca/ifla/documents/infopol/copyright/graham.txt
RSA:
FAQ 3.0 on Cryptography / RSA. - URL:http://www.rsa.com/rsalabs/newfaq/home.html
Zimmermann:
PGP User's Guide, Volume I: Essential Topics / by Philip Zimmermann. - URL:http://www.pegasus.esprit.ec.org/people/arne/pgpdoc1/pgpdoc1.html
Berghel:
Digital Watermarking / Hal Berghel, University of Arkansas, Lawrence O'Gorman, Bell Laboratories. - URL:http://www.acm.org/~hlb/publications/dig_wtr/dig_watr.html
Zhao:
Look, it's not there : digital watermarking is the best way to protect intellectual property from illicit copying / Jian Zhao. - Byte 97/1. - URL:http://www.byte.com/art/9701/sec18/art1.htm
Introduction on encryption
RSA, Question 1:
«Cryptography, to most people, is concerned with keeping communications private. Indeed, the protection of sensitive communications has been the emphasis of cryptography throughout much of its history. [...] Encryption is the transformation of data into some unreadable form. Its purpose is to ensure privacy by keeping the information hidden from anyone for whom it is not intended, even those who can see the encrypted data. Decryption is the reverse of encryption; it is the transformation of encrypted data back into some intelligible form.
Encryption and decryption require the use of some secret information, usually referred to as a key. Depending on the encryption mechanism used, the same key might be used for both encryption and decryption, while for other mechanisms, the keys used for encryption and decryption might be different.
[...] Cryptography provides mechanisms for such procedures. A digital signature binds a document to the possessor of a particular key, while a digital timestamp binds a document to its creation at a particular time. These cryptographic mechanisms can be used to control access to a shared disk drive, a high security installation or to a pay-per-view TV channel.»
Graham:
«The two best-known forms of encryption are DES and RSA. DES is the Data Encryption Standard, first established about 1975 and adopted by many business and government agencies. RSA is an encryption process developed by three mathematicians from MIT (Rivest, Shamir and Adleman) at about the same time, and marketed privately. It is regarded by many as superior to the Data Encryption Standard.
Encryption depends upon mathematical transformation of a document. The transformation uses an algorithm requiring a particular number as the basis of the computation. This number, or key, is also required to decode the resulting encrypted text; the key is typically many digits long, perhaps 100 or more. Modern encryption depends upon the process being so complex that decoding by chance or merely human effort is impossible. It also depends upon the great difficulty of decoding by brute force. Computational trial-and-error methods would take unreasonably long periods of time, perhaps hundreds or thousands of years even using modern supercomputers.
Therefore the key is crucial to DES encryption. It is also the problem, for passing the key to authorized persons turns out to be the Achilles heel of the process. How is the key sent to someone -- on paper in the mail? By messenger? These introduce the usual vulnerabilities dramatized in thriller literature. Do you send the key electronically? Sending it as plain text doesn't seem like a good idea, and sending it in encrypted form -- well, you see the problem. This is a recognized flaw in the widely-used DES encryption method.
The RSA encryption technique is called public key encryption. The computational algorithm depends upon a specific pair of numbers, a public key and a private key; data encoded by one number cannot be decoded using the same number but can only be decoded by the other number, and vice versa ...»
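The key-pair behaviour described above can be tried out with a toy calculation. The following is only a minimal sketch, assuming Python 3.8 or later; the tiny primes, the exponent 17 and the helper name apply_key are illustrative choices that do not come from the quoted sources, and numbers this small offer no security whatsoever.

```python
# Toy RSA key pair built from tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, chosen coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e


def apply_key(value: int, exponent: int, modulus: int = n) -> int:
    """Transform a number with one half of the key pair."""
    return pow(value, exponent, modulus)


message = 65
encoded = apply_key(message, e)   # encode with the public key
decoded = apply_key(encoded, d)   # decode with the private key
wrong = apply_key(encoded, e)     # applying the same key again does not decode

print(decoded == message)   # the other key recovers the message
print(wrong == message)     # the same key does not
```

The roles can also be swapped: a value transformed with d is recovered with e, which is the property that digital signatures rely on.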
Primary application area
Graham:
«Hashing as a means of authentication is a topic of interest to the business and governmental communities [...]»
How the technique works
RSA, Question 94:
«A hash function H is a transformation that takes a variable-size input m and returns a fixed-size string, which is called the hash value h (that is, h = H(m)). Hash functions with just this property have a variety of general computational uses, but when employed in cryptography the hash functions are usually chosen to have some additional properties.
The basic requirements for a cryptographic hash function are:
- the input can be of any length,
- the output has a fixed length,
- H(x) is relatively easy to compute for any given x,
- H(x) is one-way,
- H(x) is collision-free.
A hash function H is said to be one-way if it is hard to invert, where "hard to invert" means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h.
If, given a message x, it is computationally infeasible to find a message y not equal to x such that H(x) = H(y) then H is said to be a weakly collision-free hash function.
A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).
[...] The hash value represents concisely the longer message or document from which it was computed; one can think of a message digest as a "digital fingerprint" of the larger document. [...]
Perhaps the main role of a cryptographic hash function is in the provision of digital signatures. Since hash functions are generally faster than digital signature algorithms, it is typical to compute the digital signature to some document by computing the signature on the document's hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital timestamping where, using hash functions, one can get a document timestamped without revealing its contents to the timestamping service.»
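The fixed-length property and the "digital fingerprint" idea can be observed directly. The sketch below assumes SHA-256 from Python's standard hashlib module as the hash function H; the quoted sources do not name this algorithm, and the helper name digest is purely illustrative.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return the SHA-256 hash value of a message as hexadecimal text."""
    return hashlib.sha256(message).hexdigest()


short = digest(b"a")                  # one-byte input
long = digest(b"a" * 1_000_000)       # one-million-byte input
changed = digest(b"b")                # a different one-byte input

print(len(short), len(long))          # both hash values have the same length
print(short == changed)               # different inputs give different values
```

Whatever the size of the input, the hash value has the same length, and even a minimal change to the input yields an unrelated value.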
Graham:
Hashing is «[...] means by which the uniqueness of a document may be established. Hashing depends upon the assignment of arbitrary values to each portion of the document, and thence upon the resulting computation of specific but contentless values called "hash totals" or "hashes". They are "contentless" because the specific computed hash totals have no value other than themselves. In particular, it is impossible or infeasible to compute backward from the hash to the original document. The hash may be a number of a hundred digits or so, but it is much shorter than the document it was computed from. Thus a hash has several virtues: it is much smaller than the original document; it preserves the privacy of the original document; and it uniquely describes the original document. [...] Using cryptographic techniques, it is easy for current computing technology to compute quite complex hashes for any kind of document; paradoxically, these hashes are beyond the reach of computers to phony up or break in the perceived future.
[...] each time a document or a draft is created or saved the hash is created and saved with it and is separately retrievable. If the document is electronically published, it is published with its hash; and if the document is cited, the hash is part of the citation. If a reader using the document then wishes to know if she has the unaltered form, she computes the hash easily on her own computer using the standard algorithm and compares it with the published hash. If they are the same, she has confidence she has the correct, untampered version of the document before her.»
Several hash function techniques exist for calculating the hash code.
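The reader's check that Graham describes, recomputing the hash and comparing it with the published one, might look as follows. This is a sketch only: the function names compute_hash and is_unaltered are hypothetical, and the algorithms named (SHA-1, SHA-256, SHA-512, as offered by Python's hashlib) are assumptions rather than techniques listed by the sources.

```python
import hashlib


def compute_hash(document: bytes, algorithm: str = "sha256") -> str:
    """Compute the hash of a document with the named standard algorithm."""
    return hashlib.new(algorithm, document).hexdigest()


def is_unaltered(document: bytes, published_hash: str,
                 algorithm: str = "sha256") -> bool:
    """Recompute the hash and compare it with the published value."""
    return compute_hash(document, algorithm) == published_hash


original = b"Document A, final draft"
published = compute_hash(original)           # stored and cited with the document

print(is_unaltered(original, published))             # True: same document
print(is_unaltered(original + b" ", published))      # False: one space added
```

The publisher and the reader must agree on the algorithm, which is why the algorithm name is passed along with the document in this sketch.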
Usage
RSA, Question 94:
«Perhaps the main role of a cryptographic hash function is in the provision of digital signatures. Since hash functions are generally faster than digital signature algorithms, it is typical to compute the digital signature to some document by computing the signature on the document's hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital timestamping where, using hash functions, one can get a document timestamped without revealing its contents to the timestamping service.»
How the technique works
A digital signature answers the question: who is the author of a certain document, that is, who wrote it, approved it or consented to it? Digital signatures can be based on the principle of public-key cryptography.
RSA, Question 3:
«[...] each person gets a pair of keys, one called the public key and the other called the private key. Each person's public key is published while the private key is kept secret. [...] The only requirement is that public keys are associated with their users in a trusted (authenticated) manner (for instance, in a trusted directory). Anyone can send a confidential message by just using public information, but the message can only be decrypted with a private key, which is in the sole possession of the intended recipient. [...] When Alice wishes to send a secret message to Bob, she looks up Bob's public key in a directory, uses it to encrypt the message and sends it off. Bob then uses his private key to decrypt the message and read it. No one listening in can decrypt the message. Anyone can send an encrypted message to Bob but only Bob can read it. Clearly, one requirement is that no one can figure out the private key from the corresponding public key.
Digital Signatures: To sign a message, Alice does a computation involving both her private key and the message itself; the output is called the digital signature and is attached to the message, which is then sent. Bob, to verify the signature, does some computation involving the message, the purported signature, and Alice's public key. If the result properly holds in a simple mathematical relation, the signature is verified as being genuine; otherwise, the signature may be fraudulent or the message might have been altered.»
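The signing and verifying computations can be sketched with textbook RSA applied to the hash of the message. Everything concrete here is an assumption made for illustration: the key is built from two small, publicly known Mersenne primes, SHA-256 stands in for the hash function, no padding scheme is used, and the names sign and verify are hypothetical. It is not a secure signature scheme.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes (far too small for real use).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def hash_as_number(message: bytes) -> int:
    """Hash the message and reduce it so that it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: combine her private key with the hash of the message."""
    return pow(hash_as_number(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: check the signature using only Alice's public key."""
    return pow(signature, E, N) == hash_as_number(message)


order = b"Please supply 100 copies"
signature = sign(order)
print(verify(order, signature))                          # genuine message
print(verify(b"Please supply 900 copies", signature))    # altered message
```

The "simple mathematical relation" checked by Bob is, in this sketch, that raising the signature to the public exponent gives back the hash of the message he received.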
RSA, Question 5:
«A digital signature is superior to a handwritten signature in that it attests to the contents of a message as well as to the identity of the signer. As long as a secure hash function is used, there is no way to take someone's signature from one document and attach it to another, or to alter a signed message in any way. The slightest change in a signed document will cause the digital signature verification process to fail. Thus, public-key authentication allows people to check the integrity of signed documents. If a signature verification fails, however, it will generally be difficult to determine whether there was an attempted forgery or simply a transmission error.»
Digital signatures could be refined by digital timestamping.
Usage
Graham:
For instance, the authentication of e-mail or of electronic purchase orders.
How the technique works
RSA, Question 108:
«Consider two questions that may be asked by a computer user as he or she views a digital document or on-line record. (1) Who is the author of this record - who wrote it, approved it, or consented to it? (2) When was this record created or last modified? [...] A system for answering the first question is called a digital signature scheme. [...] A system for answering the second question is called a digital timestamping scheme.»
Graham:
«[...] Time-stamping is a means of authenticating not only a document but its existence at a specific time. [...] Their technique depends on a mathematical procedure involving the entire specific contents of the document, which means they have provided a tool for determining change as well as for fixing the date of the document. A great advantage of their procedure is that it is entirely public, except (if desired) for the contents of the document itself. [...]»
RSA, Question 108:
«First, there must be a certification procedure with which (1) the author of a record can "sign" the record, or (2) any user can fix a record in time. The result of this procedure is a small certifying file, a certificate if you will, that captures the result of this procedure. Second, there must be a verification procedure by which any user can check a record and its accompanying certificate to make sure it correctly answers (1) who and what? or (2) when and what? about the record in question.
The "certificate" returned by the certification procedure of a digital signature system is usually called a signature; it is a signature for a particular signer (specifying whom) and for a particular record (specifying what). In order to be able to "sign" documents, a user registers with the system by using special software to compute a pair of numbers called keys - a public key and a corresponding private key. The private key should only be available to the user to whom it belongs, and is used (by the certification or "signing" procedure) in order to sign documents; it is by employing the user's private key that the signature and the record are tied to that particular user.
The public key may be available to many users of the system, and is used by the verification procedure. That is, the verification procedure takes a particular record, a particular user's public key, and a putative signature for that record and that user, and uses this information to check whether the would-be signature was correctly computed using that record and the corresponding private key.
[...] The procedure works by mathematically linking the bits of the record to a "summary number" that is widely witnessed by and widely available to members of the public - including, of course, users of the system. The computational methods employed ensure that only the record in question can be linked, according to the "instructions" contained in its timestamp certificate, to this widely witnessed summary number; this is how the particular record is tied to a particular moment in time. The verification procedure takes a particular record and a putative timestamp certificate for that record and a particular time, and uses this information to validate whether that record was indeed certified at the time claimed by checking it against the widely available summary number for that moment.»
An example from Graham:
«Assume,[...] that Author A creates Document A and wishes to establish it as of a certain time. First he creates a hash for Document A using a standard, publicly-available program. He then sends this hash over the network to a time-stamping server. Note that he has thus preserved the privacy of his document for as long as he wishes, as it is only the hash that is sent to the server. The time-stamping server uses standard, publicly-available software to combine this hash with two other numbers: a hash from the just-previous document that it has authenticated, and a hash derived from the current time and date. The resulting number is called a certificate, and the server returns this certificate to Author A. The author now preserves this certificate, a number, and transmits it with Document A and uses it when referring to Document A (e.g. in a bibliography) in order to distinguish it from other versions of the document.
The time-stamping server has one other important function: It combines the certificate hash with others for that week into a number which, once a week, is now published in the personals column of The New York Times ("Commercial and Public Notices") [...]. The public nature of this number (what Stornetta calls an example of a "widely-witnessed event") assures that it cannot be tampered with.
[...]Now let us consider Reader C, who wishes to determine the authenticity of the electronic document before her. [...] Reader C has available the certificate for Document A. If she can validate that number from the document she can be sure she has the authenticated contents. Using the standard software, she recreates the hash for the document and sends the hash over the network, with the certificate, to the time-stamping server. The server reports back on the validity of the certificate for that document.
But let us suppose that it is the year 2093 and the server is nowhere to be found. Reader C then searches out the microfilm of The New York Times for the putative date of the document in question and determines the published hash number; using that number and the standard software she tests the authenticity of her document just as the server would.»
One could also imagine that the published hash number is stored by a national library.
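The linking of certificates in Graham's example can be sketched as a small hash chain. This is a simplified illustration under stated assumptions, not the actual procedure of the time-stamping service: the class TimestampServer and its methods are hypothetical, SHA-256 is assumed as the hash, each certificate is linked to the previous certificate rather than to the previous document's hash, and the "published" summary is simply a hash over all certificates issued so far.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class TimestampServer:
    """Hypothetical in-memory server chaining each certificate to the last."""

    def __init__(self) -> None:
        self.previous = bytes(32)   # placeholder link for the first document
        self.log = []               # (certificate, previous link, time text)

    def certify(self, document_hash: bytes, when: str) -> bytes:
        """Combine the document hash, the previous link and a hash of the time."""
        certificate = sha256(document_hash + self.previous + sha256(when.encode()))
        self.log.append((certificate, self.previous, when))
        self.previous = certificate
        return certificate

    def is_valid(self, document_hash: bytes, certificate: bytes) -> bool:
        """Recompute the certificate from the logged link and time."""
        for issued, previous, when in self.log:
            if issued == certificate:
                expected = sha256(document_hash + previous + sha256(when.encode()))
                return expected == issued
        return False

    def summary(self) -> bytes:
        """Number to publish periodically; it depends on every certificate."""
        return sha256(b"".join(cert for cert, _, _ in self.log))


server = TimestampServer()
hash_a = sha256(b"Document A")      # only the hash leaves the author's machine
cert_a = server.certify(hash_a, "1997-01-06T10:00")
cert_b = server.certify(sha256(b"Document B"), "1997-01-06T10:05")

print(server.is_valid(hash_a, cert_a))                        # True
print(server.is_valid(sha256(b"Document A, altered"), cert_a))  # False
```

Because every certificate feeds into the next one and into the summary, altering an old entry would change the published number, which is what makes the widely witnessed publication useful.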
Usage
Graham:
[...] it is very useful for the library community, which wishes to keep documents available rather than hide them, and which needs to do so over periods of time beyond those it can immediately control. It is also likely to be useful for segments of the publishing community which will want to provide a means for buyers to authenticate what they have purchased.
Primary application area
PGP (short for Pretty Good Privacy) is a highly secure public key encryption program originally written by Philip Zimmermann. [...] has become a de-facto standard for encryption of email on the Internet.
How the technique works
PGP performs encryption in two stages: (1) a single-key encryption and (2) an encryption based on a public-key cryptosystem. The message is decrypted in the same two stages.
Zimmermann:
«Because the public key encryption algorithm is much slower than conventional single-key encryption, encryption is better accomplished by using a high-quality fast conventional single-key encryption algorithm to encipher the message. This original unenciphered message is called plaintext. In a process invisible to the user, a temporary random key, created just for this one session, is used to conventionally encipher the plaintext file. Then the recipient's public key is used to encipher this temporary random conventional key. This public-key-enciphered conventional session key is sent along with the enciphered text (called ciphertext) to the recipient. The recipient uses her own secret key to recover this temporary session key, and then uses that key to run the fast conventional single-key algorithm to decipher the large ciphertext message.»
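The two-stage scheme Zimmermann describes can be sketched with standard-library tools only. The sketch does not use PGP's actual algorithms: the conventional cipher is replaced by a simple keystream derived from SHA-256, and the public-key stage by textbook RSA on a toy key built from two known Mersenne primes. All function names are hypothetical, and nothing here is secure.

```python
import hashlib
import secrets

# Toy recipient key pair (illustrative only).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # Python 3.8+


def conventional_cipher(key: bytes, data: bytes) -> bytes:
    """Fast single-key stage: XOR the data with a key-derived keystream.
    The same call both enciphers and deciphers."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt(plaintext: bytes):
    session_key = secrets.token_bytes(16)             # temporary random key
    ciphertext = conventional_cipher(session_key, plaintext)
    wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)  # public key
    return wrapped_key, ciphertext


def decrypt(wrapped_key: int, ciphertext: bytes) -> bytes:
    session_key = pow(wrapped_key, D, N).to_bytes(16, "big")     # secret key
    return conventional_cipher(session_key, ciphertext)


text = b"A confidential message for the recipient only."
wrapped, cipher = encrypt(text)
print(decrypt(wrapped, cipher) == text)
```

Only the 16-byte session key passes through the slow public-key operation; the message itself, however long, is handled by the fast conventional stage.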
Primary application area
Copyright protection and protection against forgery, for images, audio, video and text alike.
How the technique works
Berghel:
«[...] Watermarking can be applied to text images as well. Three proposed methods are: text line coding, word space coding, and character encoding. For text line coding, the text lines of a document page are shifted imperceptibly up or down. For a 40-line text page, for instance, this yields 2^20 possible codewords. For word-shift coding, the spacing between words in a line of justified text is altered [...]. For character coding, a feature such as the endline at the top of a letter, "t" is imperceptibly extended. An advantage of these methods over those applied to picture images is that, by combining two or three of these to one document, two documents with different watermarks cannot be spatially registered to extract the watermark. Of course, the watermark can be defeated by retyping the text.»
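Text line coding can be sketched on a list of baseline positions. The details are assumptions made for illustration and are not taken from Berghel: every second line carries one bit by moving slightly down or up, the lines in between stay in place as references, and the spacing and shift values are arbitrary. Under these assumptions a 40-line page carries 20 bits.

```python
LINE_SPACING = 12.0   # nominal distance between baselines (assumed, in points)
SHIFT = 0.3           # imperceptible displacement (assumed, in points)


def embed(bits, line_count=40):
    """Return baseline positions with every second line shifted by one bit."""
    if len(bits) != line_count // 2:
        raise ValueError("one bit is needed for every second line")
    baselines = [i * LINE_SPACING for i in range(line_count)]
    for k, bit in enumerate(bits):
        line = 2 * k + 1                      # odd lines carry the codeword
        baselines[line] += SHIFT if bit else -SHIFT
    return baselines


def extract(baselines):
    """Read the bits back by comparing each odd line with its reference line."""
    spacing = (baselines[2] - baselines[0]) / 2   # measured on unshifted lines
    bits = []
    for line in range(1, len(baselines), 2):
        expected = baselines[line - 1] + spacing
        bits.append(1 if baselines[line] > expected else 0)
    return bits


codeword = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
page = embed(codeword)
print(extract(page) == codeword)
```

Measuring the shift relative to neighbouring lines, rather than to absolute page positions, is what would let such a mark survive uniform scaling or a small offset of the whole page.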
Zhao:
«In contrast to a traditional watermark on an invoice, for example, a digital watermark can be detected only by appropriate software. Rather than ensuring the authenticity or integrity of documents, as a digital signature or a digital seal does, a digital watermark aims to identify the origin, author, owner, usage rights, distributor, or authorized user of an image, video clip, or audio clip, even if the image or clip has been processed and distorted (via analog-to-digital conversion, low-pass filtering, resampling, lossy compression, cropping, or rotation). [...] There are two different categories of watermarking tools available. The first is based on fingerprinted binary information (FBI), as exemplified by an eponymous product from the U.K. company HighWater FBI. The other, based on watermarking techniques developed at NEC Research and the University Catholique de Louvain (Belgium), identifies documents by hidden numbers (fingerprints). Other approaches, such as SysCoP (System for Copyright Protection), developed at the Fraunhofer Institute for Computer Graphics; Digimarc, from Digimarc (Portland, OR); and Argent, from DICE, can encode additional identification information such as the author's name or the ISBN number of a book.
Direct-sequence and frequency-hopping spread-spectrum techniques are the major watermark embedding methods used in existing tools. Both modify the noise value of the target documents. The direct-sequence technique adds noise to every element of the document, whereas the frequency-hopping method selects a pseudorandom subset of the data to be watermarked. Digimarc and FBI, for example, use direct-sequence methods to superimpose a watermark over an image by modulating a noise pattern of the same size onto the image. SysCoP, however, uses a secret key to pseudorandomly select blocks and frequencies that are modulated within the block.
Other systems use secret keys to determine which lines or words of a text will be slightly shifted vertically or horizontally. Hiding secret messages in the least-significant bits of some pseudorandom frequencies or pixels of an image, which is a common approach employed in many steganographic tools, can also be considered a simple example of frequency hopping. Because frequency hopping modifies only a subset of pixels or other elements of a document, it tends to be much faster than direct-sequence methods. It is, however, less robust and more vulnerable to attack.
A watermark must be extractable even from degraded documents that might have been photocopied, scanned, or manipulated by imaging programs. If a degraded document does not have the same format, resolution, or physical size as the original, it has to be normalized to the original format before the watermark can be extracted. Typical normalization processes include format conversion, resampling, enlarging a cropped part to full size, and scaling of the signal level.
Watermark extraction includes two main steps: selecting the locations where the watermark has been inserted (only in frequency hopping) and retrieving the watermark from those locations. The retrieval process normally needs either the original, unwatermarked data or the added noise for comparison with the watermarked document. It is also possible to extract the watermark without the original data. In this case the algorithm detects specific properties and patterns from the watermarked document. These patterns can be represented as signal shapes or the cross-correlation between certain document elements. This retrieval method is generally more efficient and enables, for example, SysCoP to retrieve watermarks in real time.»
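The simple case Zhao mentions, hiding bits in the least-significant bits of pseudorandomly chosen pixels, can be sketched as follows. This is not the method of any product named above: the function names are hypothetical, the image is reduced to a flat list of 8-bit grey values, and Python's random module seeded with the secret key stands in for a proper keyed selection.

```python
import random


def select_positions(key: int, pixel_count: int, bit_count: int):
    """Step 1: the secret key determines which pixels carry the watermark."""
    return random.Random(key).sample(range(pixel_count), bit_count)


def embed(pixels, bits, key):
    """Overwrite the least-significant bit of each selected pixel."""
    marked = list(pixels)
    for position, bit in zip(select_positions(key, len(pixels), len(bits)), bits):
        marked[position] = (marked[position] & ~1) | bit
    return marked


def extract(pixels, bit_count, key):
    """Step 2: read the bits back; the original image is not needed."""
    return [pixels[p] & 1 for p in select_positions(key, len(pixels), bit_count)]


image = [(7 * i) % 256 for i in range(1024)]    # stand-in for grey values
watermark = [1, 0, 1, 1, 0, 1, 0, 0]
marked = embed(image, watermark, key=1997)

print(extract(marked, len(watermark), key=1997) == watermark)
```

Only a handful of pixels change, each by at most one grey level, which keeps the mark invisible; as the quoted text notes for this family of methods, it is also fragile, since resampling or lossy compression would destroy least-significant bits.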