Bytes & Thoughts
And there was a byte.
In the previous post, we asked: what is a computer? And how did it come to be? We learned that a computer is a device that can process information using ideas, symbols, rules, and systems. We also saw some examples of how people can use computers to create, communicate, and explore. But how do computers actually store and manipulate information? And what is a byte?
A byte is a basic unit of information in computer storage and processing. It consists of eight bits: binary digits that can each be either 0 or 1. For example, 01000001 is a byte. A byte can represent different kinds of information, such as a letter, a number, one channel of a color, or one sample of a sound. The byte 01000001, for instance, can stand for the letter A, the number 65, or one component of a color, depending on how the computer interprets it.
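As a concrete illustration, here is a tiny Python sketch (plain standard Python, nothing assumed beyond that) that reads the same eight bits three different ways:

```python
# The same byte, 01000001, interpreted three ways.
byte = 0b01000001
print(byte)           # 65   - read as an unsigned integer
print(chr(byte))      # A    - read as an ASCII/Unicode character
print(bytes([byte]))  # b'A' - read as a one-byte sequence
```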
But why eight bits? Why not four, or ten, or twenty? The answer lies in the history of the computer and the innovation that led to the byte and modern data processing.
The concept of a bit, or binary digit, was introduced to a wide audience by Claude Shannon, a mathematician and engineer, in his 1948 paper “A Mathematical Theory of Communication”, in which he credited his colleague John Tukey with coining the word. ¹ Shannon showed that any information can be encoded using only two symbols, such as 0 and 1, and that this simplifies the design of communication systems such as telegraphs and radios. He also showed that the amount of information in a message can be measured by the number of bits required to encode it.
The term byte, however, was not coined until 1956, by Werner Buchholz, an engineer at IBM, during the early design phase of the IBM Stretch computer. ²³ The Stretch was a supercomputer with bit-level addressing and variable field length instructions, in which the byte size was encoded in the instruction itself. ⁴ Buchholz chose the term byte as a deliberate respelling of bite, to avoid confusion with bit. ⁵
The Stretch allowed bytes of variable length, up to eight bits, and helped establish eight bits as the natural size; the IBM System/360, announced in 1964, then fixed the eight-bit byte and made it the industry standard. At the time, however, there was no universal convention, and different machines grouped bits differently depending on their hardware and software design. The UNIVAC I, the first commercial computer produced in the United States, used six-bit character codes. ⁶ The PDP-8, one of the first minicomputers, used 12-bit words. ⁷ And the CDC 6600, the fastest computer in the world in the 1960s, used 60-bit words. ⁸
This diversity of character and word sizes created compatibility problems among different computers and systems. To address it, standards were developed to fix the size and meaning of these groups of bits. One of the most influential was the American Standard Code for Information Interchange (ASCII), first published in 1963. ASCII defined a set of 128 characters, each represented by a seven-bit code, for text communication. For example, the letter A was assigned the code 01000001, the digit 2 the code 00110010, and the symbol $ the code 00100100 (written here as eight-bit bytes, with a leading zero). Because ASCII itself uses only seven bits, the eighth bit of an eight-bit byte was left free for other purposes, such as parity checking for error detection or, later, extending the character set.
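Those codes are easy to verify. The short Python sketch below prints the three characters mentioned above together with their ASCII values in decimal and in eight-bit binary:

```python
# ASCII codes for the examples above, shown in decimal and 8-bit binary.
for ch in "A2$":
    print(ch, ord(ch), format(ord(ch), "08b"))
# A 65 01000001
# 2 50 00110010
# $ 36 00100100
```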
ASCII became widely adopted across computers and systems, and helped to popularize eight bits as the standard size for a byte. However, ASCII could not represent all the characters and symbols used in different languages and cultures. To accommodate the diversity of human writing, other standards were developed to extend ASCII or to define new character sets, most notably Unicode, first published in 1991. Unicode is a universal standard that can encode over a million characters, covering most of the world’s writing systems along with emoji, mathematical notation, and many other symbols.
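Unicode characters still have to become bytes eventually, which is the job of an encoding; UTF-8 is the most common one, using one to four bytes per character. Here is a small Python sketch, with the example characters chosen arbitrarily:

```python
# UTF-8 turns each Unicode code point into one to four bytes.
for ch in "A", "é", "€", "😀":
    encoded = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", encoded, f"{len(encoded)} byte(s)")
# A U+0041 b'A' 1 byte(s)
# é U+00E9 b'\xc3\xa9' 2 byte(s)
# € U+20AC b'\xe2\x82\xac' 3 byte(s)
# 😀 U+1F600 b'\xf0\x9f\x98\x80' 4 byte(s)
```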
The development of standards for bytes and characters enabled more efficient and consistent data processing and communication among different computers and systems. However, bytes are not only used for text information. They can also be used for other types of information, such as numbers, colors, sounds, images, and videos. To process these types of information, computers need to use different methods and formats to encode and decode them using bytes.
For example, to process numbers, computers store everything in binary, the system that uses only two symbols, 0 and 1. People reading or writing those values often use other notations: decimal, which uses ten symbols, 0 to 9; hexadecimal, which uses 16 symbols, 0 to 9 and A to F; and octal, which uses eight symbols, 0 to 7. Hexadecimal is especially convenient because each hexadecimal digit corresponds to exactly four bits, so a byte is always two hexadecimal digits. Each notation has its own advantages and disadvantages, and computers can convert among them with simple algorithms.
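Python’s built-in conversion functions make these relationships easy to see; a short sketch:

```python
# The same value written in four bases, then parsed back from strings.
n = 65
print(bin(n), oct(n), hex(n))                           # 0b1000001 0o101 0x41
print(int("1000001", 2), int("101", 8), int("41", 16))  # 65 65 65
print(f"{n:08b}")                                       # 01000001 - the byte from earlier
```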
To process colors, computers use different color models, such as RGB, CMYK, and HSV. RGB describes a color as a mix of three channels of light, red, green, and blue, each commonly stored as one byte with a value from 0 to 255. CMYK describes a color by four ink components, cyan, magenta, yellow, and black, and is used mainly for printing. HSV describes a color by three attributes, hue, saturation, and value, which is often more intuitive for picking and adjusting colors. Each model has its own applications and limitations, and computers can convert among them using formulas.
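As a sketch of that byte-level view, the snippet below prints one example color as three bytes, as a hexadecimal string, and as HSV, using Python’s standard-library colorsys module; the particular color is an arbitrary choice:

```python
import colorsys

# One pixel as three bytes (red, green, blue), its hexadecimal form, and
# its HSV equivalent. colorsys works on floats in the 0-1 range.
r, g, b = 255, 200, 0             # an example orange-yellow, one byte per channel
print(f"#{r:02X}{g:02X}{b:02X}")  # #FFC800 - the same three bytes in hex
h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
print(round(h * 360), round(s * 100), round(v * 100))  # 47 100 100 (hue in degrees, saturation and value in percent)
```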
To process sounds, computers use different audio formats, such as WAV, MP3, and MIDI. WAV usually stores the raw waveform: the sound is sampled thousands of times per second, and each sample is written out as a small number of bytes. MP3 applies lossy compression to shrink sound files at some cost in fidelity. MIDI stores not sound itself but a sequence of instructions (notes, instruments, timing) that a synthesizer turns into sound. Each format has its own quality and performance trade-offs, and computers can convert among them using codecs.
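Here is a minimal sketch of that sampling idea, using only Python’s standard library: one second of a 440 Hz tone, with every sample stored as two bytes in a WAV file. The output filename is just an example.

```python
import math
import struct
import wave

# One second of a 440 Hz sine wave: each sample is a signed 16-bit integer (two bytes).
sample_rate = 44100
samples = [
    int(32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
    for n in range(sample_rate)
]

with wave.open("tone.wav", "wb") as wav_file:
    wav_file.setnchannels(1)           # mono
    wav_file.setsampwidth(2)           # 2 bytes per sample
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```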
To process images, computers use different image formats, such as BMP, JPEG, and PNG. BMP stores a bitmap: a grid of pixels written out more or less directly as bytes. JPEG applies lossy compression to reduce the size of image files. PNG uses lossless compression, which shrinks files while preserving every pixel exactly. Each format has its own features and trade-offs, and computers can convert among them with image-processing tools.
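To make “a grid of pixels written out as bytes” concrete, the sketch below hand-builds a two-by-two, 24-bit BMP with Python’s struct module. BMP stores rows bottom-up, each pixel in blue-green-red order, with every row padded to a multiple of four bytes; the pixel colors and filename are arbitrary choices.

```python
import struct

width, height = 2, 2

def bmp_row(*pixels):
    # Each pixel is (blue, green, red); pad the 6-byte row to 8 bytes.
    return b"".join(bytes(p) for p in pixels).ljust(8, b"\x00")

# Rows are stored bottom-up, so the bottom row comes first.
pixel_data = (
    bmp_row((255, 0, 0), (0, 255, 0)) +        # bottom row: blue, green
    bmp_row((0, 0, 255), (255, 255, 255))      # top row: red, white
)

# 40-byte BITMAPINFOHEADER, preceded by the 14-byte file header.
info_header = struct.pack("<IiiHHIIiiII",
                          40, width, height, 1, 24, 0, len(pixel_data),
                          2835, 2835, 0, 0)
file_header = struct.pack("<2sIHHI", b"BM",
                          14 + len(info_header) + len(pixel_data), 0, 0,
                          14 + len(info_header))

with open("tiny.bmp", "wb") as f:
    f.write(file_header + info_header + pixel_data)
```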
To process videos, computers use different video formats, such as AVI, MP4, and GIF. AVI is an older container format that bundles video and audio streams in a single file. MP4 is also a container, typically holding more efficiently compressed video and audio. GIF stores a sequence of images to create simple animations. Each format has its own advantages and disadvantages, and computers can convert among them using codecs.
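One practical consequence of these formats being specific byte layouts is that files can be told apart by the “magic numbers” each format reserves at the very start. The sketch below checks for the standard AVI, MP4, and GIF signatures; the file path is a placeholder:

```python
def sniff_format(path):
    """Guess AVI / MP4 / GIF from the first few bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI"
    if head[4:8] == b"ftyp":            # MP4: 4-byte box size, then 'ftyp'
        return "MP4"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    return "unknown"

print(sniff_format("clip.mp4"))         # e.g. "MP4"
```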
As you can see, bytes are the building blocks of information in computer storage and processing. They can represent different types of information, such as text, numbers, colors, sounds, images, and videos, using different methods and formats. However, bytes are not the end of the story. There are more levels of abstraction and complexity in data processing, such as data structures, algorithms, databases, and artificial intelligence. We will explore these topics in the next posts. Stay tuned!
Sources:
(1) Byte – Wikipedia. https://en.wikipedia.org/wiki/Byte.
(2) Byte | Definition & Facts | Britannica. https://www.britannica.com/technology/byte.
(3) Byte (magazine) – Wikipedia. https://en.wikipedia.org/wiki/Byte_%28magazine%29.
(4) Byte – Simple English Wikipedia, the free encyclopedia. https://simple.wikipedia.org/wiki/Byte.
(5) How to build a data architecture to drive innovation—today and tomorrow …. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/how-to-build-a-data-architecture-to-drive-innovation-today-and-tomorrow.
(6) 15 innovative uses of AI and data – Fast Company. https://www.fastcompany.com/90696567/best-ai-data-innovations.
(7) Five leading trends of database innovation: Cloud-native, self-driving …. https://www.cloudcomputing-news.net/news/2021/apr/30/five-leading-trends-of-database-innovation-cloud-native-self-driving-and-more/.
(8) Top 7 Innovations in Data Science [2023] – Analytics Vidhya. https://www.analyticsvidhya.com/blog/2023/04/innovations-in-data-science/.