6. Radix-64 Conversions
As stated in the introduction, OpenPGP's underlying native
representation for objects is a stream of arbitrary octets, and some
systems desire these objects to be immune to damage caused by
character set translation, data conversions, etc.
In principle, any printable encoding scheme that met the requirements
of the unsafe channel would suffice, since it would not change the
underlying binary bit streams of the native OpenPGP data structures.
The OpenPGP standard specifies one such printable encoding scheme to
ensure interoperability.
OpenPGP's Radix-64 encoding is composed of two parts: a base64
encoding of the binary data, and a checksum. The base64 encoding is
identical to the MIME base64 content-transfer-encoding [RFC2045,
Section 6.8]. An OpenPGP implementation MAY use ASCII Armor to
protect the raw binary data.
The checksum is a 24-bit CRC converted to four characters of radix-64
encoding by the same MIME base64 transformation, preceded by an
equals sign (=). The CRC is computed by using the generator 0x864CFB
and an initialization of 0xB704CE. The accumulation is done on the
data before it is converted to radix-64, rather than on the converted
data. A sample implementation of this algorithm is in the next
section.
The checksum with its leading equal sign MAY appear on the first line
after the Base64 encoded data.
Rationale for CRC-24: The size of 24 bits fits evenly into printable
base64. The nonzero initialization can detect more errors than a
zero initialization.
6.1. An Implementation of the CRC-24 in "C"
#include <stddef.h>   /* size_t */

#define CRC24_INIT 0xb704ceL
#define CRC24_POLY 0x1864cfbL

typedef long crc24;

crc24 crc_octets(unsigned char *octets, size_t len)
{
    crc24 crc = CRC24_INIT;
    int i;

    while (len--) {
        /* Fold the next octet into the top 8 bits of the register. */
        crc ^= (crc24)(*octets++) << 16;
        for (i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000)
                crc ^= CRC24_POLY;
        }
    }
    return crc & 0xffffffL;
}
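The checksum line itself can be formed from the CRC value as sketched below. This is only an illustration under stated assumptions: crc24_octets restates the routine above with fixed-width types, and armor_checksum is a hypothetical helper name that is not part of the standard. It splits the 24-bit CRC into four 6-bit groups, most significant first, and prefixes the equals sign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-24 with the parameters given in the text: generator 0x864CFB
 * (written with its implicit top bit as 0x1864CFB), initial value 0xB704CE. */
static uint32_t crc24_octets(const unsigned char *octets, size_t len)
{
    uint32_t crc = 0xB704CEu;

    while (len--) {
        crc ^= (uint32_t)(*octets++) << 16;
        for (int i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000u)
                crc ^= 0x1864CFBu;
        }
    }
    return crc & 0xFFFFFFu;
}

/* Hypothetical helper: writes "=XXXX" plus a terminating NUL into out,
 * which must have room for at least 6 bytes. */
static void armor_checksum(const unsigned char *data, size_t len, char out[6])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    uint32_t crc;

    assert(out != NULL);

    /* The CRC is taken over the binary data, before radix-64 conversion. */
    crc = crc24_octets(data, len);

    out[0] = '=';
    out[1] = alphabet[(crc >> 18) & 0x3F];
    out[2] = alphabet[(crc >> 12) & 0x3F];
    out[3] = alphabet[(crc >> 6) & 0x3F];
    out[4] = alphabet[crc & 0x3F];
    out[5] = '\0';
}
```

With these parameters, empty input leaves the CRC at its initial value 0xB704CE, so the resulting checksum line is "=twTO".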
6.2. Forming ASCII Armor
When OpenPGP encodes data into ASCII Armor, it puts specific headers
around the data, so OpenPGP can reconstruct the data later. OpenPGP
informs the user what kind of data is encoded in the ASCII armor
through the use of the headers.
Concatenating the following data creates ASCII Armor:
- An Armor Header Line, appropriate for the type of data
- Armor Headers
- A blank (zero-length, or containing only whitespace) line
- The ASCII-Armored data
- An Armor Checksum
- The Armor Tail, which depends on the Armor Header Line.
An Armor Header Line consists of the appropriate header line text
surrounded by five (5) dashes ('-', 0x2D) on either side of the
header line text. The header line text is chosen based upon the type
of data that is being encoded in Armor, and how it is being encoded.
Header line texts include the following strings:
BEGIN PGP MESSAGE
Used for signed, encrypted, or compressed files.
BEGIN PGP PUBLIC KEY BLOCK
Used for armoring public keys.
BEGIN PGP PRIVATE KEY BLOCK
Used for armoring private keys.
BEGIN PGP MESSAGE, PART X/Y
Used for multi-part messages, where the armor is split amongst Y
parts, and this is the Xth part out of Y.
BEGIN PGP MESSAGE, PART X
Used for multi-part messages, where this is the Xth part of an
unspecified number of parts. Requires the MessageID Armor Header
to be used.
BEGIN PGP SIGNATURE
Used for detached signatures, OpenPGP/MIME signatures, and
signatures following clearsigned messages. Note that PGP 2.x uses
BEGIN PGP MESSAGE for detached signatures.
The Armor Headers are pairs of strings that can give the user or the
receiving OpenPGP implementation some information about how to decode
or use the message. The Armor Headers are a part of the armor, not a
part of the message, and hence are not protected by any signatures
applied to the message.
The format of an Armor Header is that of a key-value pair. A colon
(':' 0x3A) and a single space (0x20) separate the key and value.
OpenPGP should consider improperly formatted Armor Headers to be
corruption of the ASCII Armor. Unknown keys should be reported to
the user, but OpenPGP should continue to process the message.
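The key-value format described above could be parsed along the following lines. This is a minimal sketch, not a definitive implementation: parse_armor_header and its buffer parameters are hypothetical names, and the choice to split at the first colon-space pair is an assumption. A line without that separator, or with an empty key, is reported as malformed so the caller can treat it as corruption of the armor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "Key: Value" at the first ": " separator.
 * Returns 0 on success, -1 if the line is not a well-formed Armor Header
 * or one of the output buffers is too small. */
static int parse_armor_header(const char *line,
                              char *key, size_t key_size,
                              char *value, size_t value_size)
{
    const char *sep;
    size_t key_len, value_len;

    assert(line != NULL && key != NULL && value != NULL);

    sep = strstr(line, ": ");
    if (sep == NULL || sep == line)
        return -1;                  /* no separator, or empty key */

    key_len = (size_t)(sep - line);
    value_len = strlen(sep + 2);
    if (key_len >= key_size || value_len >= value_size)
        return -1;                  /* would not fit with the NUL */

    memcpy(key, line, key_len);
    key[key_len] = '\0';
    memcpy(value, sep + 2, value_len + 1);  /* copies the NUL as well */
    return 0;
}
```

A caller would then compare the key against the keys it knows, report an unknown key to the user, and continue processing.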
Currently defined Armor Header Keys are:
- "Version", which states the OpenPGP Version used to encode the
message.
- "Comment", a user-defined comment.
- "MessageID", a 32-character string of printable characters. The
string must be the same for all parts of a multi-part message
that uses the "PART X" Armor Header Line. MessageID strings should
be unique enough that the recipient of the mail can associate all
the parts of a message with each other. A good checksum or
cryptographic hash function is sufficient.
- "Hash", a comma-separated list of hash algorithms used in this
message. This is used only in clear-signed messages.
- "Charset", a description of the character set that the plaintext
is in. Please note that OpenPGP defines text to be in UTF-8 by
default. An implementation will get best results by translating
into and out of UTF-8. However, there are many instances where
this is easier said than done. Also, there are communities of
users who have no need for UTF-8 because they are all happy with
a character set like ISO Latin-5 or a Japanese character set. In
such instances, an implementation MAY override the UTF-8 default
by using this header key. An implementation MAY implement this
key and any translations it cares to; an implementation MAY
ignore it and assume all text is UTF-8.
The MessageID SHOULD NOT appear unless it is in a multi-part
message. If it appears at all, it MUST be computed from the
finished (encrypted, signed, etc.) message in a deterministic
fashion, rather than contain a purely random value. This is to
allow the legitimate recipient to determine that the MessageID
cannot serve as a covert means of leaking cryptographic key
information.
The Armor Tail Line is composed in the same manner as the Armor
Header Line, except the string "BEGIN" is replaced by the string
"END".
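The relationship between the Armor Header Line and the Armor Tail Line can be illustrated with a short sketch. The helper name armor_lines and its convention of taking the header line text without the leading "BEGIN " (for example "PGP MESSAGE") are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the Armor Header Line and the Armor Tail
 * Line for a header line text given without its leading "BEGIN ".
 * Returns 0 on success, -1 if either buffer is too small. */
static int armor_lines(const char *type,
                       char *head, size_t head_size,
                       char *tail, size_t tail_size)
{
    size_t type_len;

    assert(type != NULL && head != NULL && tail != NULL);

    /* "-----BEGIN " is 11 characters and "-----END " is 9; each line
     * ends with 5 dashes plus the terminating NUL. */
    type_len = strlen(type);
    if (head_size < 11 + type_len + 5 + 1 ||
        tail_size < 9 + type_len + 5 + 1)
        return -1;

    snprintf(head, head_size, "-----BEGIN %s-----", type);
    snprintf(tail, tail_size, "-----END %s-----", type);
    return 0;
}
```

Called with "PGP MESSAGE", this sketch produces "-----BEGIN PGP MESSAGE-----" and "-----END PGP MESSAGE-----".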
6.3. Encoding Binary in Radix-64
The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating three 8-bit input
groups. These 24 bits are then treated as four concatenated 6-bit
groups, each of which is translated into a single digit in the
Radix-64 alphabet. When encoding a bit stream with the Radix-64
encoding, the bit stream must be presumed to be ordered with the
most-significant-bit first. That is, the first bit in the stream will
be the high-order bit in the first 8-bit octet, and the eighth bit
will be the low-order bit in the first 8-bit octet, and so on.
+--first octet--+-second octet--+--third octet--+
|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
+-----------+---+-------+-------+---+-----------+
|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
+--1.index--+--2.index--+--3.index--+--4.index--+
Each 6-bit group is used as an index into an array of 64 printable
characters from the table below. The character referenced by the
index is placed in the output string.
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
The encoded output stream must be represented in lines of no more
than 76 characters each.
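The line-length limit can be applied after encoding, as in the sketch below. The names radix64_wrap and RADIX64_MAX_LINE are hypothetical, and ending every line (including the last) with a single newline character is an assumption of this example. An implementation may choose shorter lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RADIX64_MAX_LINE ((size_t)76)   /* limit stated in the text */

/* Hypothetical helper: copies the NUL-terminated string in to out, ending
 * each line with '\n' so that no line exceeds RADIX64_MAX_LINE characters.
 * out must have room for strlen(in) + strlen(in) / RADIX64_MAX_LINE + 2
 * bytes. Returns the number of characters written, not counting the NUL. */
static size_t radix64_wrap(const char *in, char *out)
{
    size_t remaining, o = 0;

    assert(in != NULL && out != NULL);

    remaining = strlen(in);
    while (remaining > 0) {
        size_t n = remaining < RADIX64_MAX_LINE ? remaining
                                                : RADIX64_MAX_LINE;
        memcpy(out + o, in, n);
        o += n;
        out[o++] = '\n';
        in += n;
        remaining -= n;
    }
    out[o] = '\0';
    return o;
}
```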
Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. There are three possibilities:
1. The last data group has 24 bits (3 octets). No special
processing is needed.
2. The last data group has 16 bits (2 octets). The first two 6-bit
groups are processed as above. The third (incomplete) data group
has two zero-value bits added to it, and is processed as above.
A pad character (=) is added to the output.
3. The last data group has 8 bits (1 octet). The first 6-bit group
is processed as above. The second (incomplete) data group has
four zero-value bits added to it, and is processed as above. Two
pad characters (=) are added to the output.
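The three cases above can be sketched in C as follows. This is an illustrative encoder under stated assumptions rather than a reference implementation: radix64_encode is a hypothetical name, the output is a single NUL-terminated string, and line breaking is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

static const char radix64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encodes len octets from in into out as a
 * NUL-terminated string without line breaks. out must have room for
 * 4 * ((len + 2) / 3) + 1 bytes. Returns the number of characters
 * written, not counting the NUL. */
static size_t radix64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;
    unsigned long g;

    assert(out != NULL);

    /* Case 1: complete 24-bit groups become four characters each. */
    while (len - i >= 3) {
        g = ((unsigned long)in[i] << 16) |
            ((unsigned long)in[i + 1] << 8) |
            (unsigned long)in[i + 2];
        out[o++] = radix64_alphabet[(g >> 18) & 0x3F];
        out[o++] = radix64_alphabet[(g >> 12) & 0x3F];
        out[o++] = radix64_alphabet[(g >> 6) & 0x3F];
        out[o++] = radix64_alphabet[g & 0x3F];
        i += 3;
    }

    if (len - i == 2) {
        /* Case 2: 16 bits left; the third group is zero-padded. */
        g = ((unsigned long)in[i] << 16) | ((unsigned long)in[i + 1] << 8);
        out[o++] = radix64_alphabet[(g >> 18) & 0x3F];
        out[o++] = radix64_alphabet[(g >> 12) & 0x3F];
        out[o++] = radix64_alphabet[(g >> 6) & 0x3F];
        out[o++] = '=';
    } else if (len - i == 1) {
        /* Case 3: 8 bits left; the second group is zero-padded. */
        g = (unsigned long)in[i] << 16;
        out[o++] = radix64_alphabet[(g >> 18) & 0x3F];
        out[o++] = radix64_alphabet[(g >> 12) & 0x3F];
        out[o++] = '=';
        out[o++] = '=';
    }

    out[o] = '\0';
    return o;
}
```

For the octets 0x14 0xfb 0x9c 0x03 0xd9 this sketch yields "FPucA9k=", and for 0x14 0xfb 0x9c 0x03 it yields "FPucAw==".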
6.4. Decoding Radix-64
Any characters outside of the base64 alphabet are ignored in Radix-64
data. Decoding software must ignore all line breaks or other
characters not found in the table above.
In Radix-64 data, characters other than those in the table, line
breaks, and other white space probably indicate a transmission error,
about which a warning message or even a message rejection might be
appropriate under some circumstances.
Because it is used only for padding at the end of the data, the
occurrence of any "=" characters may be taken as evidence that the
end of the data has been reached (without truncation in transit). No
such assurance is possible, however, when the number of octets
transmitted was a multiple of three and no "=" characters are
present.
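A decoder following the rules of this section might look like the sketch below. It is a simplified illustration: radix64_decode is a hypothetical name, characters outside the alphabet are skipped silently rather than reported, and the first "=" is taken as the end of the data, which is an assumption that also stops the decoder before an Armor Checksum line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decodes NUL-terminated Radix-64 text from in into
 * out, skipping any character outside the alphabet and stopping at the
 * first '='. out must have room for 3 * (strlen(in) / 4) + 2 bytes.
 * Returns the number of octets written. */
static size_t radix64_decode(const char *in, unsigned char *out)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned long acc = 0;  /* bit accumulator, newest bits at the bottom */
    int bits = 0;           /* number of not yet emitted bits in acc */
    size_t o = 0;

    assert(in != NULL && out != NULL);

    for (; *in != '\0' && *in != '='; in++) {
        const char *p = strchr(alphabet, *in);

        if (p == NULL)
            continue;       /* line break or other foreign character */

        acc = (acc << 6) | (unsigned long)(p - alphabet);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[o++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return o;
}
```

Any zero-value padding bits added by the encoder remain in the accumulator and are discarded when decoding stops.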
6.5. Examples of Radix-64
Input data: 0x14fb9c03d97e
Hex: 1 4 f b 9 c | 0 3 d 9 7 e
8-bit: 00010100 11111011 10011100 | 00000011 11011001
01111110
6-bit: 000101 001111 101110 011100 | 000000 111101 100101
111110
Decimal: 5 15 46 28 0 61 37 62
Output: F P u c A 9 l +
Input data: 0x14fb9c03d9
Hex: 1 4 f b 9 c | 0 3 d 9
8-bit: 00010100 11111011 10011100 | 00000011 11011001
pad with 00
6-bit: 000101 001111 101110 011100 | 000000 111101 100100
Decimal: 5 15 46 28 0 61 36
pad with =
Output: F P u c A 9 k =
Input data: 0x14fb9c03
Hex: 1 4 f b 9 c | 0 3
8-bit: 00010100 11111011 10011100 | 00000011
pad with 0000
6-bit: 000101 001111 101110 011100 | 000000 110000
Decimal: 5 15 46 28 0 48
pad with = =
Output: F P u c A w = =
6.6. Example of an ASCII Armored Message
  -----BEGIN PGP MESSAGE-----
  Version: OpenPrivacy 0.99

  yDgBO22WxBHv7O8X7O/jygAEzol56iUKiXmV+XmpCtmpqQUKiQrFqclFqUDBovzS
  vBSFjNSiVHsuAA==
  =njUN
  -----END PGP MESSAGE-----
Note that this example is indented by two spaces.
Updated: 1999-05-23 wkoch