The History of Computing: Claude Shannon and the Origins of Information Theory

Sep 9, 2020

The name Claude Shannon has come up 8 times so far in this podcast. More than any other single person. We covered George Boole and the concept that a Boolean is a 0 or a 1 and that, using Boolean algebra, you can abstract simple circuits into practically any higher level concept. And Boolean algebra had been used by a number of mathematicians to perform some complex tasks. Including by Lewis Carroll in Through The Looking Glass to make words into math.

And binary had effectively been used in Morse code to enable communications over the telegraph.

But it was Claude Shannon who laid the foundation for a theory that took the concept of communicating over the telegraph and applied Boolean algebra to it, making a higher level of communication possible. And it all starts with bits, which we can thank Shannon for.

Shannon grew up in Gaylord, Michigan. His mother was a high school principal and his grandfather had been an inventor. He built a telegraph as a child, using a barbed wire fence. But barbed wire isn’t the greatest conductor of electricity and so… noise. And thus information theory began to germinate in his mind. He went off to the University of Michigan and got a bachelor’s in electrical engineering and another in math. A perfect combination for laying the foundation of the future.

And he got a job as a research assistant to Vannevar Bush, who wrote the seminal paper, As We May Think. At that time, Bush was working at MIT on The Thinking Machine, or Differential Analyzer. This was before World War II and they had no idea, but their work was about to reshape everything. At the time, what we think of as computers today were electro-mechanical. They had gears that were used for the more complicated tasks, and switches, used for simpler tasks.

Shannon devoted his master’s thesis to applying Boolean algebra to those switches, thus getting rid of the wheels, which moved slowly, and allowing the computer to go much faster. He broke down Boole’s Laws of Thought into a form that could be applied to parallel circuitry. That paper, written in 1937, was called A Symbolic Analysis of Relay and Switching Circuits and helped set the stage for the Hackers revolution that came shortly thereafter at MIT.
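To make that idea a little more concrete, here is a minimal sketch in Python. It assumes the modern convention that a closed switch is True, that switches in series behave like AND, and that switches in parallel behave like OR. The function names are made up for illustration and the circuit is not taken from the thesis itself. It checks, by brute-force truth table, that a circuit with four switches can be swapped for one with three.

```python
from itertools import product

def series(*switches):
    """Current flows through switches in series only if all are closed (AND)."""
    return all(switches)

def parallel(*branches):
    """Current flows through parallel branches if any one is closed (OR)."""
    return any(branches)

def original_circuit(a, b, c):
    # Four switches: (A in series with B) in parallel with (A in series with C).
    return parallel(series(a, b), series(a, c))

def simplified_circuit(a, b, c):
    # Three switches: A in series with (B in parallel with C).
    return series(a, parallel(b, c))

def equivalent(f, g, inputs=3):
    """Compare two circuits on every combination of open/closed switches."""
    return all(
        f(*bits) == g(*bits)
        for bits in product([False, True], repeat=inputs)
    )

if __name__ == "__main__":
    print(equivalent(original_circuit, simplified_circuit))
```

The algebra (A AND B) OR (A AND C) = A AND (B OR C) does the same job as the truth table, just without having to wire anything up.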

At the urging of Vannevar Bush, he wrote his PhD thesis on genetics, An Algebra for Theoretical Genetics, pushing the field forward by theorizing that you could break the genetic code down into a matrix. The PhD itself was in mathematics. Watson and Crick would describe the double helix structure of DNA in 1953, drawing on the X-ray crystallography Rosalind Franklin used to capture the first clear photo of the structure, and George Gamow would shortly afterward propose how that structure might carry a code.

He headed off to Princeton in 1940 to work at the Institute for Advanced Study, where Einstein and von Neumann were. He quickly moved over to the National Defense Research Committee, as the world was moving towards World War II. A lot of computing was going into making projectiles, or bombs, more accurate. He co-wrote a paper called Data Smoothing and Prediction in Fire-Control Systems during the war.

He’d gotten a primer in early cryptography, reading The Gold-Bug by Edgar Allan Poe as a kid. And it struck his fancy. So he started working on theories around cryptography, everything he’d learned forming into a single theory. He would have lunch with Alan Turing during the war. And it was around this work that he first coined the term “information theory” in 1945.

A universal theory of communication gnawed at him and formed during this time, from the Institute, to the National Defense Research Committee, to Bell Labs, where he helped encrypt communications between world leaders. He hid it from everyone, including failed relationships. He broke information down into the smallest possible unit, a bit, short for a binary digit. He worked out how to compress the information that was most repetitive. Similar to how Morse code reduced the number of taps on the electrical wire by making the most common letters the shortest to send. Eliminating redundant communications established what we now call compression.
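Here is a rough sketch of that "common letters get the shortest codes" idea. One loud caveat: this uses Huffman coding, a later technique from David Huffman in 1952, not Shannon's own scheme and not Morse code. The function name and the sample message are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each symbol in text to a string of 0s and 1s."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # lightest subtree
        w2, _, codes2 = heapq.heappop(heap)  # next lightest subtree
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical message: 'e' is common, 'q' is rare.
message = "eeeeeeeettttaaq"
codes = huffman_code(message)
encoded = "".join(codes[ch] for ch in message)

if __name__ == "__main__":
    print(codes)
    print(len(encoded), "bits instead of", 8 * len(message))
```

The common 'e' ends up with a one bit code while the rare 'q' gets three bits, so the whole message fits in far fewer bits than one byte per letter.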

Today we use the term lossless compression frequently in computing. He worked out that the minimum average number of bits needed to send each symbol would be $H = -\sum_i p_i \log_2 p_i$, or entropy, where $p_i$ is the probability of the i-th symbol.
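As a quick worked example of the entropy formula, here is a small sketch in Python. The function names are illustrative, and estimating probabilities from letter counts in a single string is a simplifying assumption rather than how Shannon measured English.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """H = -sum(p * log2(p)), in bits per symbol. Zero probabilities are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def entropy_of_text(text):
    """Estimate entropy from how often each character appears in text."""
    counts = Counter(text)
    total = len(text)
    return shannon_entropy(count / total for count in counts.values())

if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))   # a fair coin flip carries 1 bit
    print(shannon_entropy([0.9, 0.1]))   # a lopsided coin carries less
    print(entropy_of_text("aabb"))       # two equally likely letters: 1 bit each
```

The more predictable the source, the lower the entropy, and the fewer bits you need on average to send it.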

His paper, put out while he was at Bell, was called “A Mathematical Theory of Communication” and came out in 1948. You could now change any data to a zero or a one and then compress it. Further, he had to find a way to calculate the maximum amount of information that could be sent over a communication channel before it became garbled, due to noise. We now call this the Shannon Limit. And once he had that, he worked out how to analyze information with math to correct for noise. That barbed wire fence could finally be useful. This would be used in all modern information connectivity. For example, when I took my Network+ we spent an inordinate amount of time learning about Carrier-sense multiple access with collision detection (CSMA/CD) - a media access control (MAC) method that uses carrier-sensing to defer transmissions until no other stations are transmitting.
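For a feel of what the Shannon Limit looks like as a number, here is a minimal sketch using the Shannon-Hartley formula, C = B log2(1 + S/N), which applies to a band-limited channel with Gaussian noise. The formula is not spelled out in the episode, and the 3,000 Hz bandwidth and 30 dB signal-to-noise ratio are assumed figures for a voice-grade phone line, picked only for illustration.

```python
import math

def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a plain power ratio."""
    return 10 ** (snr_db / 10)

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bits per second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

if __name__ == "__main__":
    # Assumed example values, not measurements from any real line.
    capacity = channel_capacity(3000, db_to_linear(30))
    print(round(capacity), "bits per second")
```

With those assumed numbers the ceiling lands at roughly 30 kilobits per second. No amount of clever encoding gets past it, though better error correction can get closer to it.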

And as his employer, Bell Labs helped shape the future of computing. Along with Unix, C, C++, the transistor, and the laser, information theory is a less tangible discovery and yet, given what we all have in our pockets or on our wrists these days, a more tangible one. Having mapped the limits, Bell started looking to reach them. And so the digital communication age was born when the first modem came out of his former employer, Bell Labs, in 1958. And just across the way in Boston, ARPA would begin working on the first Interface Message Processor in 1967, the humble beginnings of the Internet.

His work done, he went back to MIT. His theories were applied to all sorts of disciplines. But he came in less and less. Over time we started placing bits on devices. We started retrieving those bits. We started compressing data. Digital images, audio, and more. It would take 35 or so years

He consulted with the NSA on cryptography. In 1949 he published Communication Theory of Secrecy Systems, which pushed cryptography to the next level. His paper Prediction and Entropy of Printed English in 1951 practically created the field of natural language processing, which evolved into various branches of machine learning. He helped give us the Nyquist–Shannon sampling theorem, used in avoiding aliasing, deriving maximum throughput, RGB, and of course signal to noise.
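The sampling theorem says you need to sample at more than twice the highest frequency in a signal, or that frequency gets mistaken for a lower one. Here is a small sketch of that aliasing effect. The function names and the 7 Hz and 10 Hz figures are just illustrative choices.

```python
import math

def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling at sample_rate_hz."""
    nearest_multiple = sample_rate_hz * round(signal_hz / sample_rate_hz)
    return abs(signal_hz - nearest_multiple)

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Take count samples of cos(2*pi*f*t) at the given sample rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]

if __name__ == "__main__":
    # A 7 Hz tone sampled at 10 Hz is above the 5 Hz Nyquist frequency,
    # so its samples are indistinguishable from those of a 3 Hz tone.
    print(alias_frequency(7, 10))
    fast = sample_cosine(7, 10, 20)
    slow = sample_cosine(3, 10, 20)
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow)))
```

Sample that same 7 Hz tone at anything above 14 Hz and the ambiguity goes away.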

He loved games. In 1949 he presented Programming a Computer for Playing Chess, one of the first serious looks at how a computer could play the game. That paper is where we get the Shannon Number, or the game-tree complexity of chess. In case you’re curious, it comes out to about 10 to the 120th power, far too many games to brute force outright, so a machine like Deep Blue wins by searching a small slice of that tree very quickly. And he’d have a standing bet that a computer would beat a human grand master at chess by 2001. Garry Kasparov lost to Deep Blue in 1997.
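The arithmetic behind the Shannon Number is short enough to sketch. Shannon's estimate assumed roughly 10 to the 3rd possibilities for each pair of moves (one by White, one by Black) and a typical game of about 40 such pairs. The search speed below, about 200 million positions per second, is the commonly quoted figure for Deep Blue and is used here only as a rough assumption.

```python
# Shannon's rough estimate of the chess game tree.
possibilities_per_move_pair = 10 ** 3   # one White move plus one Black reply
move_pairs_per_game = 40                # a typical game length in his estimate

shannon_number = possibilities_per_move_pair ** move_pairs_per_game

# Assumed search speed, for scale only.
positions_per_second = 200_000_000
seconds_per_year = 60 * 60 * 24 * 365

years_to_enumerate = shannon_number // (positions_per_second * seconds_per_year)

if __name__ == "__main__":
    print(shannon_number == 10 ** 120)
    print(len(str(years_to_enumerate)), "digits in the number of years needed")
```

Even at that speed, walking the whole tree would take a number of years with more than a hundred digits, which is why chess programs prune and evaluate instead of enumerating.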

That curiosity extended far beyond chess. He would make Theseus in 1950 - a maze with a mouse that learned how to escape, using relays from phone switches. One of the earliest forms of machine learning. In 1961 he would co-invent the first wearable computer to help win a game of roulette. That same year he designed the Minivac 601 to help teach how computers worked.

So we’ll leave you with one last bit of information. Shannon’s maxim is that “the enemy knows the system.” I used to think it was just a shortened version of Kerckhoffs's principle, which is that a cryptographic system, for example a modern public key cipher, should stay secure even when everything about it except the key is known. Thing is, the more I know about Shannon the more I suspect that what he was really doing was giving the principle a broader meaning. So think about that as you try to decipher what is and what is not disinformation in such a noisy world.

Lots and lots of people would carry on the great work in information theory. Like Kullback and Leibler, who gave us Kullback–Leibler divergence, or relative entropy. And we owe them all our thanks. But here’s the thing about Shannon: math. He took things that could have easily been theorized - and he proved them. Because science can refute disinformation. If you let it.