
Data Representation - Data Types

While the basic memory unit in a digital computer is the bit, holding a value of 1 or 0 (equivalently, ON or OFF), most information of interest to the user requires more than a simple two-valued representation. It is therefore practical to form groups of bits, each consisting of some carefully chosen number of bits, and to design the computer hardware to deal with these groups as complete units. In modern computers the most basic group consists of 8 bits and is called the byte. Most computers also implement groupings of 2, 4, and 8 bytes for purposes discussed below. Figure 1.2 illustrates the most frequently used word sizes.

Figure 1.2: The bit, holding a value of 1 or 0, and typical groups of bits of practical use in computer memory and for computation.
[Image: words.eps]
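As a minimal sketch of this grouping (the bit pattern and variable names are chosen purely for illustration), the following C fragment assembles eight individual bits into a single byte using shifts and the bitwise OR operation; the resulting value, 65, happens to be the ASCII code for the letter 'A':

#include <stdio.h>

int main(void)
{
    /* Eight individual bits, most significant first: 01000001. */
    unsigned char bits[8] = {0, 1, 0, 0, 0, 0, 0, 1};
    unsigned char byte = 0;
    int i;

    /* Shift the accumulated value left by one bit and OR in the next bit. */
    for (i = 0; i < 8; i++)
        byte = (unsigned char) ((byte << 1) | bits[i]);

    /* Prints: byte value = 65, as a character = A */
    printf("byte value = %u, as a character = %c\n", (unsigned) byte, byte);
    return 0;
}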

This is a useful approach from the users' point of view, and also from the technical implementation point of view. If the user requires information in chunks of 8, 16, or 32 bits, then there is no reason not to build the hardware so that it can transfer 8, 16, or 32 bits in one cycle rather than one bit at a time, and a similar advantage can be gained by building the basic arithmetic and logic unit so that it can perform the basic operations of arithmetic and logic on these larger chunks as well. This is the first and by far the most significant step towards implementing parallelism in computers. Most computers are now designed to handle long words consisting of 32 bits.

The task of the computer is to process information, which it can represent only by various bit patterns, in a manner defined by the user's computer program. The significance of these bit patterns as data lies with the user and programmer, who have considerable freedom in choosing the representations. In fact, in most programs, different memory locations will be defined to have completely different interpretations for any given bit pattern. This leads to the possibility of assigning a data type to each variable in a program. The usual data types include (we will use the names familiar from the C language - other languages have equivalents) the unsigned integer (non-negative), the signed integer, the float (a finite subset of the rational numbers), as well as the char for single characters. Most numeric data types exist in at least two sizes. The integer data type usually has a short (usually 16 bits) and a long size. Similarly, the float type usually contains 32 bits, and its larger counterpart is the double, with 64 bits.
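As a small illustration (a sketch only; the exact sizes are implementation dependent, and the values given in the comment assume a typical 32-bit machine of the kind described above), the C sizeof operator reports how many bytes each of these data types occupies:

#include <stdio.h>

int main(void)
{
    /* Sizes are implementation dependent; on a typical 32-bit machine
       the output would be 1, 2, 4, 4, 4 and 8 bytes respectively. */
    printf("char          : %lu bytes\n", (unsigned long) sizeof(char));
    printf("short int     : %lu bytes\n", (unsigned long) sizeof(short int));
    printf("long int      : %lu bytes\n", (unsigned long) sizeof(long int));
    printf("unsigned int  : %lu bytes\n", (unsigned long) sizeof(unsigned int));
    printf("float         : %lu bytes\n", (unsigned long) sizeof(float));
    printf("double        : %lu bytes\n", (unsigned long) sizeof(double));
    return 0;
}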

Figure 1.3: A simple array of bytes in memory. This example illustrates an array as a sequence of cells of the same data type, in this case interpreted as a vector, since it is a one-dimensional array. This vector is also an example of a character string, since the basic data type is the byte, used here to hold one character, with the additional property that the array is terminated by a special character, the null character, which has the value 0 and is written '\0'. The array is shown twice, the first copy showing the numerical value (in decimal) of each byte, and the second the interpretation of these values as characters in the ASCII alphabet. Above the array of cells the address of each byte is shown, and it is clear that the bytes are contiguous in memory. The starting address of 123 is of no particular interest, and could lie anywhere in memory.
[Image: barrays.eps]

Of course, it is straightforward to extend this idea to a sequence of memory cells holding the same data type and sharing a common name, with only a sequential index to distinguish them. This allows the use of arrays, vectors, and character strings, as illustrated in Figure 1.3. In scientific computation this makes the use of matrices in calculations quite easy to implement.
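The following C sketch (the string used here is arbitrary, not the one shown in the figure) walks through such an array and prints the address, the decimal value, and the character interpretation of each byte, including the terminating null character, much as laid out in Figure 1.3:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* A character string: an array of bytes terminated by '\0'.
       The contents are arbitrary, chosen only for illustration. */
    char s[] = "An array";
    size_t i;

    /* Print the address, decimal value and character interpretation
       of every byte, including the terminating null character. */
    for (i = 0; i <= strlen(s); i++)
        printf("address %p  value %4d  character '%c'\n",
               (void *) &s[i], (int) s[i], s[i] != '\0' ? s[i] : ' ');

    return 0;
}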

While the implications of the finite number of bits used in each data type are easy to understand for the integer types, they are quite complicated for the float and double data types. In particular, there is always a temptation to treat the float and double data types as if they were real numbers, which they most definitely are not. In fact some languages use the data type name real, which adds to the confusion.
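A small C sketch illustrates the point: the value 0.1 has no exact binary floating point representation, so adding it to itself ten times does not give exactly 1:

#include <stdio.h>

int main(void)
{
    /* 0.1 cannot be represented exactly in binary floating point,
       so each addition introduces a small rounding error. */
    float sum = 0.0f;
    int i;

    for (i = 0; i < 10; i++)
        sum += 0.1f;

    /* Mathematically the sum is exactly 1, but the float result is not. */
    if (sum == 1.0f)
        printf("sum is exactly 1\n");
    else
        printf("sum = %.9f, which is not exactly 1\n", sum);

    return 0;
}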


Charles Dyer
2002-04-24