
Encoding vs. Encryption

July 27, 2009 | Posted by Cody


Encoding and encryption are both routines performed on data, but the end results are quite different. The purpose of encryption is to disguise the data so that it can’t be read by anyone except the intended recipient. Encoding, on the other hand, merely converts the data into a more suitable format. Sometimes the two are used in conjunction, as we’ll see later in this article, but developers frequently make the mistake of using encoding where encryption is needed, which can cause very serious security issues.

While there are many different encoding algorithms, one of the most widely used in web development is base64. As the name suggests, base64 maps each 6-bit block of binary data to one of 64 printable characters. The phrase “hello world!” in base64 encoding appears as “aGVsbG8gd29ybGQh”, a somewhat random-looking set of characters. However, if we examine the string in more detail we see right away that only a very limited set of characters is in use, and applying base64 decoding gives the original “hello world!” message back.

<?php 
echo base64_encode("hello world!"); // prints aGVsbG8gd29ybGQh
?>
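
To make the 6-bit mapping concrete, here is a minimal sketch that encodes the first three bytes of our phrase by hand and compares the result with PHP’s built-in function. The variable names are only illustrative, and the sketch ignores padding, which the real algorithm needs whenever the input length isn’t a multiple of three.

<?php
// The standard base64 alphabet: index 0-63 maps to exactly one character.
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

$chunk = "hel"; // the first three bytes of "hello world!"

// Pack the three bytes into a single 24-bit integer.
$bits = (ord($chunk[0]) << 16) | (ord($chunk[1]) << 8) | ord($chunk[2]);

// Split the 24 bits into four 6-bit indexes and look each one up.
$encoded = '';
for ($shift = 18; $shift >= 0; $shift -= 6) {
    $encoded .= $alphabet[($bits >> $shift) & 0x3F];
}

echo $encoded, "\n";              // prints aGVs
echo base64_encode($chunk), "\n"; // prints aGVs
?>

Three input bytes always become four output characters, which is why base64 output is roughly a third larger than its input.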

A typical application for encoding is transmitting binary data across the Internet. If not encoded, the binary data will likely become corrupted, because systems built to handle text may alter, strip or misinterpret bytes that don’t correspond to printable characters. To ensure this doesn’t happen we can encode the data before sending it, and decode it upon arrival.
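
A minimal sketch of that round trip, assuming 16 random bytes as a stand-in for whatever binary data we need to send (an image, a file, and so on):

<?php
// 16 random bytes stand in for arbitrary binary data.
$binary = random_bytes(16);

// Encode before sending: the result contains only A-Z, a-z, 0-9, "+", "/" and "=".
$encoded = base64_encode($binary);

// Decode on arrival. Strict mode (second argument) returns false
// if the string contains characters outside the base64 alphabet.
$decoded = base64_decode($encoded, true);

var_dump($decoded === $binary); // prints bool(true)
?>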

It’s important to point out that encoding should never be used in place of encryption. The reason lies in the very nature of encoding, which is designed to let data be converted easily from one representation to another. Only the algorithm is needed; no key is required. To an attacker, it’s like coming across a front door with a dozen knobs and no lock. The only thing standing between them and what’s inside is finding the right knob to turn.
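
As a rough sketch of how little effort the “dozen knobs” require, the snippet below simply tries a handful of keyless decoders on a string an attacker might come across. The list of decoders and the variable names are illustrative assumptions, not an exhaustive toolkit.

<?php
$found = "aGVsbG8gd29ybGQh"; // a string the attacker came across

// Each "knob" is a decoder that needs no key, only the algorithm.
$knobs = array(
    'base64' => function ($s) { return base64_decode($s, true); },
    'hex'    => function ($s) {
        // hex2bin() only makes sense for an even number of hex digits.
        return (ctype_xdigit($s) && strlen($s) % 2 === 0) ? hex2bin($s) : false;
    },
    'rot13'  => function ($s) { return str_rot13($s); },
    'url'    => function ($s) { return rawurldecode($s); },
);

$results = array();
foreach ($knobs as $name => $decode) {
    $results[$name] = $decode($found);
    echo $name, ': ', ($results[$name] === false ? '(not valid)' : $results[$name]), "\n";
}
// The base64 line reads "base64: hello world!"
?>

Nothing here involves a secret. Whoever holds the string can run the same loop.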

When storing or transmitting sensitive information, encryption should always be used. As with encoding, there are many different encryption algorithms (ciphers), perhaps even more than there are encodings. It’s worth noting that ciphers tend to have limited life spans, and while the popular ciphers in use today have withstood rigorous attacks, it’s likely that this will not always be the case.

A good cipher produces output with high entropy. The more random the encrypted string appears, the less it reveals about the original data and the more difficult it is to crack. Because of this, many encrypted strings contain unreadable characters, which can easily be lost or corrupted when transmitting or reading. To prevent that from happening we can encode the encrypted string into a readable format before storing or transmitting it. Let’s take a look at an example.

Our “hello world!” phrase, encrypted using the AES cipher, looks like “R4OuDkO7P5Z6fLHzpuC8ZQ==” in base64 encoding, or “4783ae0e43bb3f967a7cb1f3a6e0bc65” in hexadecimal (I’ve left out the raw representation here because it contains almost no readable characters). The only way to convert this data back into its original form is to decrypt it using the same algorithm and key that were used to encrypt it. Since the key is kept private, an attacker will have a very difficult time recovering the plain text even if they know the algorithm used.
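
Here is a minimal sketch of that encrypt-then-encode flow using PHP’s OpenSSL extension. The cipher mode (AES-256-CBC) and the randomly generated key and IV are assumptions made for illustration, so the output will differ on every run and won’t match the strings quoted above.

<?php
$plaintext = "hello world!";

// Assumed for illustration: AES-256 in CBC mode with a fresh random key and IV.
// A real application has to store the key securely and keep the IV with the ciphertext.
$cipher = 'aes-256-cbc';
$key    = random_bytes(32);
$iv     = random_bytes(openssl_cipher_iv_length($cipher));

// OPENSSL_RAW_DATA asks for the raw ciphertext bytes instead of a base64 string.
$raw = openssl_encrypt($plaintext, $cipher, $key, OPENSSL_RAW_DATA, $iv);

// Encode the unreadable bytes so they survive storage and transport.
echo base64_encode($raw), "\n"; // 24 base64 characters
echo bin2hex($raw), "\n";       // 32 hexadecimal digits

// Decryption needs the same algorithm, key and IV.
$decrypted = openssl_decrypt($raw, $cipher, $key, OPENSSL_RAW_DATA, $iv);
echo $decrypted, "\n";          // prints hello world!
?>

Unlike the encoding examples, decoding the base64 or hexadecimal string only gets an attacker back to the raw ciphertext. Without the key, that is where they stay.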

It’s easy to mistake an encoded string for an encrypted string, especially if we assume an attacker has no idea what encoding is being used. For example, if we take our base64 encoded “hello world!” string and reverse it we get “hQGby92dg8GbsVGa”. An attacker may recognize this as a base64 encoded string, and it is a valid one. However, simply decoding it without first reversing it results in what appears to be a random collection of characters. While this may deter some attackers, it’s absolutely no substitute for encryption.

<?php 
$a = strrev(base64_encode("hello world!")); // $a = hQGby92dg8GbsVGa
echo base64_decode($a), "\n";         // prints unreadable characters
echo base64_decode(strrev($a)), "\n"; // prints hello world!
?>
