Densely Packed Decimal encoding (original) (raw)

A Summary of Densely Packed Decimal encoding

This document summarizes the decimal data encoding presented in the paper:

Densely Packed Decimal Encoding, by Mike Cowlishaw, in_IEE Proceedings – Computers and Digital Techniques_,ISSN 1350-2387, Vol. 149, No. 3, pp102–104, IEE, May 2002
Abstract: Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations. An improvement to the encoding which has the same advantages but is not limited to multiples of three digits is described. The new encoding allows arbitrary-length decimal numbers to be coded efficiently while keeping decimal digit boundaries accessible. This in turn permits efficient decimal arithmetic and makes the best use of available resources such as storage or hardware registers.

Densely Packed Decimal Encoding, by Mike Cowlishaw, in_IEE Proceedings – Computers and Digital Techniques_,ISSN 1350-2387, Vol. 149, No. 3, pp102–104, IEE, May 2002

Abstract: Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations. An improvement to the encoding which has the same advantages but is not limited to multiples of three digits is described. The new encoding allows arbitrary-length decimal numbers to be coded efficiently while keeping decimal digit boundaries accessible. This in turn permits efficient decimal arithmetic and makes the best use of available resources such as storage or hardware registers.

Chen-Ho encoding is described in the paper Storage-Efficient Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho,in CACM (18)1, pp.49–52, January 1975. Their encoding scheme is summarized inhttp://speleotrove.com/decimal/chen-ho.html.

Briefly, Chen-Ho encoding compresses three decimal digits into 10 bits with a 0.34% wastage, giving a 20% more efficient encoding than simple BCD (one digit in 4 bits). The primary advantage of the encoding over a pure binary representation in ten bits is that no arithmetic is needed for conversion to or from BCD. Only a very few boolean operations are needed for conversions – in hardware, encoding or decoding can be achieved with only 2–3 gate delays; in software, a simple table look-up suffices. In addition, the encoding has other advantages, for example, the least-significant bit of each digit remains unencoded, which allows bit-per-digit operations to be effected directly.

The new encoding described here uses an equivalent encoding to the Chen-Ho scheme, but instead of a Huffman code it uses a novel arrangement of bits which give further advantages:

Compression of one or two decimal digits (into the optimal four or seven bits respectively) is achieved as a subset of the 3-digit encoding. This means that arbitrary numbers of decimal digits can be encoded efficiently. For example, 38 decimal digits can be encoded in 127 bits, or 71 digits can be encoded in 237.
In contrast, Chen-Ho encoding requires that the compressed representation be a multiple of 10 bits and hence that the number of decimal digits always be a multiple of three.
The encodings for one or two decimal digits are right-aligned in the ten bits (the remaining bits being 0). This means that encoded decimal numbers can be expanded into a longer field simply by padding with zero bits; no re-encoding is necessary.
In contrast, Chen-Ho encoding requires that expanding an encoded two digits into a three-digit field (for example) would need a re-encoding instead of simple padding.
The positioning and choice of the indicator bits allow all single-digit numbers (indeed, all numbers in the range 0 through 79) to have the same right-aligned encoding as in BCD. This makes conversions of common small numbers trivial.
In contrast, Chen-Ho encoding only maps the numbers 0 through 7 unchanged.

These advantages make the new encoding a better choice than Chen-Ho encoding for both hardware and software representations of decimal numbers.

Here are some examples:

BCD	Chen-Ho	Densely Packed
005	000 000 0101	000 000 0101
009	110 000 0001	000 000 1001
055	000 010 1101	000 101 0101
099	111 000 1001	000 101 1111
555	010 110 1101	101 101 0101
999	111 111 1001	001 111 1111

Note that, necessarily, neither encoding is directly sortable. For example, if the encoded values are sorted in ascending order as though they were binary strings and then decoded to BCD, the BCD results will not be in ascending order.

Details

The new encoding, like Chen-Ho encoding, considers each of the three digits as either being small (0–7, requiring 3 bits to distinguish) or large (8 or 9, requiring one bit). The top bit of each BCD digit is 0 for small values, and 1 for the two large values.

The possible combinations of these ranges are then:

a)	All digits are small (51.2% of the possibilities).This requires 3+3+3 bits for the digits, leaving 1 bit to indicate this combination.
b)	Two digits are small (38.4%).This requires 3+3+1 bits for the digits, leaving 3 to indicate this combination.
c)	One digit is small (9.6%).This requires 3+1+1 bits for the digits, leaving 5 to indicate this combination.
d)	No digits are small (0.8%).This requires 1+1+1 bits for the digits, leaving 7 (only 5 are needed) for the indication.

The following tables fully describe the encoding (compression from BCD) and decoding (expansion to BCD); the letters a–k and m represent the 12 bits of three BCD digits, and p–y represent the 10 bits of the encoded digits.

Compression: (abcd)(efgh)(ijkm) becomes (pqr)(stu)(v)(wxy)

aei	pqr stu v wxy	Comments
000 001 010 100 110 101 011 111	bcd fgh 0 jkm bcd fgh 1 00m bcd jkh 1 01m jkd fgh 1 10m jkd 00h 1 11m fgd 01h 1 11m bcd 10h 1 11m 00d 11h 1 11m	All digits are small Right digit is large [this keeps 0-9 unchanged] Middle digit is large Left digit is large Right digit is small [L & M are large] Middle digit is small [L & R are large] Left digit is small [M & R are large] All digits are large; two bits are unused
(24 of the 1024 values are redundant and could be used for future extensions; however they are generally mapped to specific combinations of 8s and 9s as defined by the equations in the DPD paper.)

Expansion: (pqr)(stu)(v)(wxy) becomes (abcd)(efgh)(ijkm)

vwxst	abcd efgh ijkm
0.... 100.. 101.. 110.. 11100 11101 11110 11111	0pqr 0stu 0wxy 0pqr 0stu 100y 0pqr 100u 0sty 100r 0stu 0pqy 100r 100u 0pqy 100r 0pqu 100y 0pqr 100u 100y 100r 100u 100y
('.' means don't-care.)

In hardware, simple boolean operations on input bits define each output bit (for example, during compression, r=d andv=a+e+i). The operations can be derived from the tables, and also appear as boolean expressions in the sample code below.

For software, table lookup is a simple and efficient solution. Below is a Rexx program which is easy to modify for generating table constants (or it can be ported to a different language).

| /* dpdGenerate.rex -- display Densely Packed Decimal table */ /* mfc 2000.10.03; Rexx version with new equations 2007.02.01 */ do i=0 to 999 bcd=right(i, 3, 0) -- make three-digit hexadecimal string bit10=bcd2dpd(x2b(bcd)) -- compress bit12=dpd2bcd(bit10) -- expand say bcd bit10 bit12 -- display end i exit /* bcd2dpd -- Compress BCD to Densely Packed Decimal argument is a string of 12 characters, each 0 or 1, being 3 digits of 4 bits, each being a valid BCD digit (0000-1001) (for example, 923 is 100100100011) result is a string of 10 characters, each 0 or 1 (for the example, this would be 0110101101) */ bcd2dpd: procedure -- assign each bit to a variable, named as in the description parse arg a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 m +1 -- derive the result bits, using boolean expressions only -- [the operators are: '&'=AND, '|'=OR, '\'=NOT.] p=b | (a & j) | (a & f & i) q=c | (a & k) | (a & g & i) r=d s=(f & (\a | \i)) | (\a & e & j) | (e & i) t=g | (\a & e &k) | (a & i) u=h v=a | e | i w=a | (e & i) | (\e & j) x=e | (a & i) | (\a & k) y=m -- concatenate the bits and return return p||q||r||s||t||u||v||w||x||y /* dpd2bcd -- Expand Densely Packed Decimal to BCD argument is a string of 10 characters, each 0 or 1; all 1024 possibilities are accepted (non-canonicals -> various) (for example, 0110101101) result is a string of 12 characters, each 0 or 1 (for the example, 100100100011 -- 923) */ dpd2bcd: procedure -- assign each bit to a variable, named as in the description parse arg p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1 -- derive the result bits, using boolean expressions only a= (v & w) & (\s | t | \x) b=p & (\v | \w | (s & \t & x)) c=q & (\v | \w | (s & \t & x)) d=r e=v & ((\w & x) | (\t & x) | (s & x)) f=(s & (\v | \x)) | (p & \s & t & v & w & x) g=(t & (\v | \x)) | (q & \s & t & w) h=u i=v & ((\w & \x) | (w & x & (s | t))) j=(\v & w) | (s & v & \w & x) | (p & w & (\x | (\s & \t))) k=(\v & x) | (t & \w & x) | (q & v & w & (\x | (\s & \t))) m=y -- concatenate the bits and return return a||b||c||d||e||f||g||h||i||j||k||m | | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ----------------- | ------- | ------------------------------- | ----- | ------------- | ------------- | ------------ | ------------------- | - | ------- | ------- | --------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | - | ----------------- | --- | --------------------------- | --- | --------------------------------------- | --------- | ---------------------- | ----- | --------------------------------------- | ----- | -------------------------------------------- | ----------- | ------------------ | ----------------- | ------------- | --------------------------- | ------------- | ----------------- | ----------------------------------------------------------------------------------------------------- | | Note: versions of this web page prior to 2007.02.01 (with the NetRexx code example) used equations for dpd2bcd which assumed the output was “don’t care” for the 24 redundant (non-canonical) codes. The equations here result in specific sequences of 8s and 9s for each redundant code, as specified in the DPD paper. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |

Acknowledgement
Many thanks to Mark Erle and Brian Hickmann for suggestions for improvements to this page and for their careful checking.

Summary byMike Cowlishaw(mfc@speleotrove.com), 3 October 2000, updated 20 May 2001, 26 April 2002, 6 July 2002, 8 January 2003, 16 July 2005, 13 February 2007.
Copyright © IBM Corporation and Mike Cowlishaw, 2000, 2007. All rights reserved.