Data Representation. Topics Bit patterns Binary numbers Data type formats Character representation...
Topics
• Bit patterns
• Binary numbers
• Data type formats
• Character representation
• Integer representation
• Floating point number representation
Data Representation
• Data representation refers to the manner in which data is stored in the computer
• There are several different formats for data storage
• It is important for computer problem solvers to understand the basic formats
Why is it Important?
• As an example:
• We will learn that since we have finite storage, it is possible to overflow a storage location by trying to store too large a number
• Most programming languages provide multiple data types each providing different length storage for variables
• It is up to the programmer to choose the data type with a length that won’t overflow
• Knowing how numbers are represented in storage helps one to understand this
Bit Pattern
• As you recall from an earlier presentation, data may take various forms: characters, numbers, graphical, etc.
• All data is stored in the computer as a sequence of bits, that is, binary digits
• This is a universal storage format for all data types, and it is called a bit pattern
Bits and Bytes
• A bit is the smallest unit of data stored in a computer and it has a value of 0 or 1
• It’s like a switch, on (1) or off (0)
• In computers, bits are stored electronically in RAM and auxiliary storage devices by two-state digital circuits
• The storage device itself doesn’t know what the bit pattern represents, but software (application software, operating system, and I/O device firmware) stores and interprets the pattern
• That is, data is coded then stored and when retrieved it is decoded
• A byte is a string of 8 bits and is called a character when the data is text
Binary Numbers
• Each bit pattern is a binary number, that is, a number represented by 0’s and 1’s rather than 0, 1, 2, …, 9 as decimal numbers are
• For example, bit patterns like 1010 and 101001 are also binary numbers
• Binary numbers are based on powers of 2 rather than powers of 10 as decimal numbers are
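As a quick check of the powers-of-2 idea, the sketch below expands the two bit patterns mentioned above into sums of powers of 2. It is written in Python, which the slides do not use, and the helper name `expand` is made up for illustration.

```python
def expand(pattern):
    """Return the powers of 2 selected by the 1 bits of a bit pattern."""
    width = len(pattern)
    return [2 ** (width - 1 - i) for i, bit in enumerate(pattern) if bit == "1"]

for pattern in ("1010", "101001"):
    terms = expand(pattern)
    # int(pattern, 2) is Python's built-in base-2 conversion
    print(pattern, "=", " + ".join(map(str, terms)), "=", int(pattern, 2))
```

The sum of the selected powers should agree with the built-in conversion for any pattern of 0's and 1's.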
Data Type Formats
• As we have learned, fundamentally all data is stored as a bit pattern
• But the different data types have different bit pattern formats
• We want to learn the formats for:
– Characters (for example, left, Lane, a, ?, \)
– Integers (for example, 1, 453, -10, 0)
– Floating point numbers (for example, 3.14159, 1.2, -567.235, 0.009)
Character Representation
• The American Standard Code for Information Interchange (ASCII) is the scheme used to assign a bit pattern to each of the characters
• ASCII charts come in different flavors:
– Some have 7 bit strings, some 8 or more
– Some show the binary code for the various characters as well as the code represented in other number systems, e.g., decimal, hex, octal
• For example, the letter A has the ASCII code:
– 1000001 in binary for a 7-bit chart
– 65 in decimal
– 41 in hex
– 101 in octal
• Note: All four of these numbers represent the same value but using different number systems
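The four codes for the letter A can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary, and `ord` and `format` are Python built-ins rather than anything from the slides.

```python
code = ord("A")                  # the character's numeric code
binary = format(code, "07b")     # 7-bit binary string
hexadecimal = format(code, "x")  # base 16
octal = format(code, "o")        # base 8

print(code, binary, hexadecimal, octal)  # 65 1000001 41 101
```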
ASCII Chart
• Uppercase characters have different ASCII codes than lowercase
• Uppercase characters come before lowercase
• Numbers come before letters
• The special characters are spread around
• The numbers and upper and lowercase characters are in adjacent groupings, so that their codes increment by one
ASCII Chart
• The only difference between the binary codes for the upper and lowercase characters is the sixth bit from the right (the bit worth 32), that is, the decimal code for a lowercase character is 32 greater than the uppercase character’s decimal code
• ASCII codes before decimal 32 are control characters (nonprintable) like bell, backspace, and carriage return
• The final ASCII code in the 7-bit chart is the control character DEL with decimal code 127
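The chart properties listed above can be checked directly. The following Python sketch is an illustration only; the name `case_bit` is an assumption introduced here for the bit worth 32.

```python
upper, lower = ord("A"), ord("a")
print(format(upper, "07b"), format(lower, "07b"))  # 1000001 1100001
print(lower - upper)                               # 32

case_bit = 0b0100000                  # the single bit that differs (value 32)
to_lower = chr(ord("G") | case_bit)   # turn the bit on  -> lowercase
to_upper = chr(ord("g") & ~case_bit)  # turn the bit off -> uppercase

# Digits come before uppercase letters, which come before lowercase letters
ordering = "0" < "A" < "a"

# Bell, backspace, and carriage return are control characters below 32
control = [ord(c) for c in "\a\b\r"]
print(to_lower, to_upper, ordering, control)
```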
Extended ASCII & Unicode
• The eight-bit ASCII chart is sometimes called Extended ASCII
• The seven-bit ASCII codes are the same in the eight-bit chart except that they have a zero at the left
• Some manufacturers use the extra bit to create additional special characters; these, however, are nonstandard, e.g., using decimal 171 for ½, or 246 for ÷
• Unicode is another scheme, developed so that the many symbols in international languages may be represented. It also uses bit patterns. Its UTF-32 encoding uses 32 bits for each character.
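To see why the extended codes are called nonstandard, the sketch below decodes bytes 171 and 246 under two different 8-bit charts. The choice of code page 437 (the original IBM PC chart) and Latin-1 is an assumption made for illustration; the slides do not name a particular chart.

```python
byte_171 = bytes([171])
byte_246 = bytes([246])

# Code page 437 maps these bytes to the symbols mentioned above
print(byte_171.decode("cp437"), byte_246.decode("cp437"))      # ½ ÷

# Latin-1, another 8-bit chart, assigns different characters to the same bytes
print(byte_171.decode("latin-1"), byte_246.decode("latin-1"))  # « ö

# The 7-bit codes are unchanged in both charts
same_in_both = bytes([65]).decode("cp437") == bytes([65]).decode("latin-1") == "A"

# UTF-32 spends 32 bits (4 bytes) on every character
utf32_length = len("÷".encode("utf-32-be"))
print(same_in_both, utf32_length)
```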
Numeric Representation
• ASCII codes are an inefficient method for representing numbers
• For example, the number 1,024 using 8-bit ASCII would require four bytes or 32 bits of storage
• Arithmetic operations on numbers represented in ASCII are very complicated
• Representing the precision of a number, that is, the number of digits stored, may require large amounts of space when stored in ASCII
• There are more efficient schemes for numbers
• There are more efficient schemes for numbers
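The storage cost can be compared with a small Python sketch. It assumes a 2-byte unsigned integer as the "more efficient scheme", which is one possibility among several.

```python
as_text = "1024".encode("ascii")                  # one byte per digit character
as_integer = (1024).to_bytes(2, byteorder="big")  # the value itself in 2 bytes

print(len(as_text), len(as_integer))    # 4 2
print(as_text.hex(), as_integer.hex())  # 31303234 0400
```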
Integer Representation
• An integer is a whole number, that is, a number without a decimal portion
• Integers may be positive, negative, or zero
• A plus-sign or minus-sign in front of the number is used to represent positive and negative numbers
• The plus-sign is not required for positive numbers and zero
• There are two categories of integer representation: unsigned and signed
Unsigned Integer
• An unsigned integer is an integer without a sign, that is, a non-negative integer
• They range from zero to infinity, but no computer can store all the integers in that range
• So, a maximum unsigned integer is defined
• This maximum is based on the number of bits used to store an integer
• Let’s use 8-bit and 16-bit (1-byte and 2-byte) storage locations in our examples
• The length of storage is set by the data type the programmer specifies for a variable
Unsigned Integer
• An unsigned integer is stored as its value when represented as a binary number
• Leading zeros are added to fill out the storage location
• For example, the decimal number 9 is represented as 00001001 when stored in 1-byte because 00001001₂ = 9₁₀
• When stored in a 2-byte location, 9 would be represented as 0000000000001001
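The zero padding can be reproduced with Python's `format` built-in, where the width in the format string plays the role of the storage length. This is a minimal sketch, not a description of how any particular language lays out memory.

```python
one_byte = format(9, "08b")    # pad to 8 bits
two_bytes = format(9, "016b")  # pad to 16 bits

print(one_byte)   # 00001001
print(two_bytes)  # 0000000000001001
```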
Unsigned Integer
• One may use the following table of place values (powers of 2) to work with binary numbers:
  128   64   32   16    8    4    2    1
• For example, given 00001001, what decimal number does it represent?
    0    0    0    0    1    0    0    1
• Add the powers of two whose bits are 1, that is, 8 + 1 = 9
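The table lookup can be sketched in Python as follows. The names `PLACE_VALUES` and `to_decimal` are hypothetical, and the sketch assumes an 8-bit pattern.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]  # powers of 2 for an 8-bit pattern

def to_decimal(pattern):
    """Add the place values whose bit is 1."""
    return sum(place for place, bit in zip(PLACE_VALUES, pattern) if bit == "1")

print(to_decimal("00001001"))  # 9
```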
Unsigned Integer
• One may use the same table to go the other way, that is, given the decimal number 13, what is its binary representation?
• Find the largest power of 2 that doesn’t exceed the number and place a 1 in that cell:
• Subtract that power of 2 from the number and use this as the new number: 13 – 8 = 5
Unsigned Integer
• Then continue in this way until the sum of the powers of two equals the number:
• Now, 5 – 4 = 1, and so finally:
• Note that 8 + 4 + 1 = 13
Unsigned Integer
• Then fill in the remaining cells with zeros:
• So, the unsigned integer representation of decimal 13 is 00001101 when stored in 1-byte
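The subtract-the-largest-power procedure walked through above can be sketched in Python. The function name `to_binary` and the choice to raise `OverflowError` when the number does not fit are assumptions made for this illustration.

```python
def to_binary(number, bits=8):
    """Convert a non-negative integer to a bit pattern by repeatedly
    subtracting the largest power of 2 that does not exceed it."""
    cells = []
    for power in range(bits - 1, -1, -1):  # 128, 64, ..., 2, 1 for 8 bits
        place = 2 ** power
        if place <= number:
            cells.append("1")   # place a 1 in this cell
            number -= place     # and continue with what is left
        else:
            cells.append("0")   # remaining cells are filled with zeros
    if number != 0:
        raise OverflowError("number does not fit in the storage location")
    return "".join(cells)

print(to_binary(13))  # 00001101
```

For 13 the loop places 1s in the cells for 8, 4, and 1, matching 8 + 4 + 1 = 13.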
Unsigned Integer
• If one tries to store a number in a memory location that is not large enough we have what is called overflow
• In this case, depending on the system, one may or may not receive an error message
• So, one must not store a number that is larger than the maximum for a given length of storage
• The maximum number storable in 1-byte is 255
Unsigned Integer
• For example, if one tries to store 256 in 1-byte there is overflow because the largest value storable in 8 bits is 255 as one can see from the following table:
• Note that 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255
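Python's own integers grow as needed, so to illustrate overflow this sketch borrows the fixed-size `c_uint8` type from the standard `ctypes` module. That type silently keeps only the low 8 bits, which is one example of a system that gives no error message; other systems may behave differently.

```python
import ctypes

largest = ctypes.c_uint8(255).value  # 255 fits in 8 bits
wrapped = ctypes.c_uint8(256).value  # 256 needs 9 bits; only the low 8 are kept

print(largest, wrapped)       # 255 0
print((256).bit_length())     # 9
```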
Signed Integer
• A sign-and-magnitude format is used to allow for positive and negative numbers (and zero)
• The leading bit is designated as the sign bit: 0 for positive or zero, 1 for negative
• The remaining bits represent the value
• So, in 1-byte of storage the maximum number storable is not 255 as it was for the unsigned integer representation, but 127:
• Note that 64 + 32 + 16 + 8 + 4 + 2 + 1 = 127
Signed Positive Integer
• To determine what the sign-and-magnitude representation of a positive decimal number is:
– Convert the decimal number to binary
– If needed add leading zeros to fill the storage location
• For example, decimal 12 is represented in 1-byte as 00001100 because 8 + 4 = 12:
Signed Positive Integer
• Going the other way, given a sign-and-magnitude representation for a positive number, one can interpret it as follows:
– Leftmost bit will be 0 indicating positive
– Convert the remaining bits to a decimal number
• For example, 00010001 is decimal 17:
• Because 16 + 1 = 17
Signed Negative Integer
• For negative numbers, two’s complement format is used
• Two’s complement keeps the sign bit (1 for negative), but, unlike pure sign-and-magnitude, the remaining bits are not simply the magnitude
• In two’s complement, some of the magnitude bits are flipped from 0 to 1 or 1 to 0
Signed Negative Integer
• To determine what the two’s complement representation of a negative decimal number is:
– Ignore the sign and convert the decimal number to binary
– If needed add leading zeros to fill the storage location
– Leave all the rightmost 0’s and first 1 unchanged, but flip the remaining bits
– Make the sign bit 1
• For example, decimal -14 is represented in 1-byte as 11110010 because (see next slide)
Signed Negative Integer
• Convert 14 to binary (8 + 4 + 2 = 14) and make leading bits zero: 00001110
• Leave the rightmost 0’s and first 1 as is, but flip the remaining bits:
• Make sign bit 1:
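The steps for -14 can be sketched in Python as below. The function name `encode_negative` is hypothetical, and the range check is an added assumption so the sketch refuses values that do not fit.

```python
def encode_negative(value, bits=8):
    """Two's complement of a negative integer using the
    'keep the rightmost 1 and the 0s after it, flip the rest' rule."""
    if value >= 0 or value < -(2 ** (bits - 1)):
        raise ValueError("value must be negative and fit in the storage location")
    magnitude = format(abs(value), "0{}b".format(bits))  # e.g. 00001110 for 14
    pivot = magnitude.rfind("1")                         # position of the rightmost 1
    flipped = "".join("1" if b == "0" else "0" for b in magnitude[:pivot])
    pattern = flipped + magnitude[pivot:]                # unchanged tail: the 1 and trailing 0s
    return "1" + pattern[1:]                             # make the sign bit 1

print(encode_negative(-14))  # 11110010
```

As a cross-check, masking the value to 8 bits with `-14 & 0xFF` in Python gives the same pattern.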
Signed Negative Integer
• Going the other way, given a two’s complement representation for a negative number, one can interpret it as follows:
– Leave the rightmost bits up to and including the first 1 unchanged, but flip the remaining bits
– Convert the binary number to decimal
– Put a minus-sign in front
• For example, 11101010 is decimal –22 because (see next slide)
Signed Negative Integer
• Flip all but the rightmost 1 and any following 0’s:
• Convert the binary number to decimal:
• We get 22 because 16 + 4 + 2 = 22
• Put a minus-sign in front yielding -22
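The decoding steps can be sketched the same way. The function name `decode` is made up for illustration, and treating patterns with a leading 0 as ordinary non-negative numbers is an assumption added so the sketch handles both cases.

```python
def decode(pattern):
    """Interpret a two's complement bit pattern as a signed integer."""
    if pattern[0] == "0":
        return int(pattern, 2)        # sign bit 0: positive or zero
    pivot = pattern.rfind("1")        # rightmost 1 stays, along with the 0s after it
    flipped = "".join("1" if b == "0" else "0" for b in pattern[:pivot])
    magnitude = int(flipped + pattern[pivot:], 2)
    return -magnitude                 # put a minus-sign in front

print(decode("11101010"))  # -22
```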
Signed Negative Integer
• Two’s complement is the standard representation for negative integers in modern computers
• This is because arithmetic operations are simple to implement when integers are stored this way (but this concept is beyond the scope of the course)
• Although on the surface it seems complicated, at a deeper level it allows for simplicity of operations
Signed Negative Integer
• An alternative but equivalent method for converting a negative number to its two’s complement representation is:
– Ignore the sign and convert the decimal number to binary
– If needed add leading zeros to fill the storage location
– Flip all the bits
– Add 1 to the result of the last step
– Make the sign bit 1
• Some people find this easier
Signed Negative Integer
• For example, -14. First, convert 14 to binary (8 + 4 + 2 = 14) and make leading bits zero: 00001110
• Flip all the bits: 11110001
• Add 1: 11110010
• Make the sign bit 1: 11110010 (it already is 1)
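The flip-and-add-1 method is short enough to sketch with bitwise operators. The function name `flip_and_add_one` is hypothetical; the mask is there because Python integers are not limited to 8 bits.

```python
def flip_and_add_one(value, bits=8):
    """Two's complement of a negative integer: flip all bits, then add 1."""
    mask = (1 << bits) - 1          # 11111111 for 8 bits
    magnitude = abs(value)          # ignore the sign: 14 -> 00001110
    flipped = ~magnitude & mask     # flip all the bits: 11110001
    result = (flipped + 1) & mask   # add 1: 11110010
    return format(result, "0{}b".format(bits))

print(flip_and_add_one(-14))  # 11110010
```

For negative values that fit in the storage location, the sign bit already comes out as 1 after the addition.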
Floating Point Number Representation
• Floating point numbers are those that have a decimal portion (mathematicians call these real numbers)
• Numbers like 3.14159, 50000.3, and 0.000005
• The method that is used allows for very large or very small numbers to be stored using the same format
Floating Point
• The main idea in this format is that the decimal point is allowed to “float”
• That is, there is an “actual” decimal location in the original number, and there is a “stored” decimal location that is usually different
• The original number is normalized by moving the decimal place so there is only one digit to the left
Floating Point
• The basic idea can be seen from an example although this description glosses over many details
• The number 102.39 is normalized by moving its decimal point two places to the left to become 1.0239, and this number is stored and is called the mantissa
• Also, the fact that the decimal point was moved left by 2 is stored so that the original number may be reconstructed and this is called the exponent
• The sign of the number is also stored (0 for positive or zero, 1 for negative)
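The decimal version of this idea can be sketched with Python's `decimal` module. This only illustrates normalization in base 10; it is not how the number is actually stored, and the function name `normalize` is made up for the example.

```python
from decimal import Decimal

def normalize(text):
    """Split a decimal number into (sign, mantissa, exponent) so that the
    mantissa has exactly one digit to the left of the decimal point."""
    number = Decimal(text)
    sign = 1 if number.is_signed() else 0      # 0 for positive or zero, 1 for negative
    exponent = number.adjusted()               # places the decimal point moves left
    mantissa = abs(number).scaleb(-exponent)   # move the decimal point
    return sign, mantissa, exponent

sign, mantissa, exponent = normalize("102.39")
print(sign, mantissa, exponent)    # 0 1.0239 2
print(mantissa.scaleb(exponent))   # 102.39, the original number reconstructed
```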
Floating Point
• However, it is actually more complicated than that
• The exponent and mantissa are actually stored in binary
• And the value stored as the mantissa is only the fractional part of the binary number once the decimal point has been moved so that there is a binary 1 at the left, that is, 1.101001 is stored as 101001 and the leading 1 is assumed
Floating Point
• The representation of numbers in floating point involves a couple of procedures that are complicated and beyond the scope of the course
• These are “repetitive multiplication of a decimal fraction by 2,” and the “excess system” for storing positive and negative numbers
• So, we won’t be converting the numbers manually ourselves
Floating Point
• However, the procedure used to store a number in floating point representation is:
– Store a 0 (positive) or 1 (negative) in the sign field
– Convert the integer part to binary
– Convert the decimal part (fraction) to binary by using “repetitive multiplication by 2”
– Combine the two binary numbers with a decimal point between
– Move the decimal point so that there is a 1 bit at the left and store the remaining bits in the mantissa field
– Store the number of places moved using the “excess system” in the exponent field
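For readers who want to see the fields, the following Python sketch inspects a number stored in the IEEE single-precision layout (1 sign bit, 8 exponent bits in excess-127, 23 mantissa bits). That layout is an assumption brought in here, since the slides mention the IEEE formats only by name. The value 6.5625 is chosen because its binary form, 110.1001, normalizes to 1.101001 × 2². The helper names are hypothetical.

```python
import struct

def fraction_to_binary(fraction, max_bits=23):
    """Repetitive multiplication by 2: each step yields one bit of the fraction."""
    bits = ""
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        if fraction >= 1:
            bits += "1"
            fraction -= 1
        else:
            bits += "0"
    return bits

def float32_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x stored in 4 bytes."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

print(format(6, "b") + "." + fraction_to_binary(0.5625))  # 110.1001

sign, exponent, mantissa = float32_fields(6.5625)
print(sign)                    # 0
print(exponent)                # 10000001, that is 129 = 2 + 127
print(mantissa)                # 101001 followed by zeros; the leading 1 is assumed
print(int(exponent, 2) - 127)  # 2 places moved
```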
Floating Point
• Computers store data in binary and in finite space, i.e., they are discrete, finite systems
• However real numbers form a continuous, infinite system
• Hence, computers can only approximate real numbers
• The precision of a floating point number is how close the stored number is to the original number
Floating Point
• Small Basic Example:
• Mathematically c should be 0 but what does the program display for c?
a = 2 / 3
b = 2 * (1 / 3)
c = a - b
TextWindow.WriteLine(c)
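The same experiment can be tried in Python. This sketch assumes that Small Basic carries out its arithmetic in a decimal format, so `Decimal` with 28 significant digits is used to imitate it; that correspondence is an assumption, not something the slides state. Python's ordinary binary floats are shown as well, because for this particular expression they happen to give exactly 0, while a different expression exposes the approximation.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28                         # 28 significant decimal digits
    a = Decimal(2) / Decimal(3)           # rounded up in the last digit
    b = 2 * (Decimal(1) / Decimal(3))     # rounded down, then doubled
    c = a - b
print(c)  # 1E-28

# Binary floating point: doubling is exact, so this difference is exactly 0
binary_c = 2 / 3 - 2 * (1 / 3)
print(binary_c)  # 0.0

# But other "obvious" identities fail in binary
d = 0.1 + 0.2 - 0.3
print(d)  # 5.551115123125783e-17
```

Whether the result is zero depends on both the number format and the expression, which is why results like this should not be compared for exact equality.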
Floating Point
• The more bits available for the mantissa field the more digits of the original number may be stored
• Programming languages normally allow the programmer to define the precision by the data type chosen
Floating Point
• Institute of Electrical and Electronics Engineers (IEEE) standards:
• Single-Precision (4 bytes)
• Double-Precision (8 bytes)
Floating Point
• Trade off:
• Double precision numbers require more space and therefore programs using them may run slower
• But operations using double precision numbers will be more precise
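The space and precision trade-off can be observed by storing the same value in both formats. This Python sketch uses the standard `struct` module, whose `f` and `d` codes correspond to the 4-byte and 8-byte IEEE formats; it says nothing about speed, which depends on the hardware.

```python
import struct

value = 0.1

# Round-trip the value through 4-byte and 8-byte storage
single = struct.unpack("<f", struct.pack("<f", value))[0]
double = struct.unpack("<d", struct.pack("<d", value))[0]

print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8
print(single)  # 0.10000000149011612
print(double)  # 0.1
```

The single-precision copy drifts from 0.1 after about seven significant digits, while the double-precision copy matches what Python itself stores.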
Summary
• Data are stored as bit patterns
• A bit pattern is a binary number
• There are various data type formats
• Characters are represented in ASCII
Summary
• Integers are represented as either
– Unsigned – stored as the binary number equivalent to the original
– Signed Positive – stored with a sign bit of 0 followed by the binary equivalent of the magnitude
– Signed Negative – stored in the two’s complement format, which gives a sign bit of 1
• Floating point numbers are represented using sign, exponent, and mantissa
Terminology
• Data representation
• Bit
• Byte
• Bit pattern
• Binary number
• Character
• Integer
• Floating point number
• ASCII
• Control characters
• Extended ASCII
• Unicode
• Unsigned integer representation
• Overflow
• Sign-and-magnitude representation
• Sign
• Two’s complement representation
• Floating point representation
• Normalize
• Exponent
• Mantissa
• Precision
• Single-precision
• Double-precision