Post Ad Area

Thursday, October 20, 2011

Optimizing Long Lists Of Yes/No Values With JavaScript

Very frequently in Web development (and programming in general), you need to store a long list of boolean values (yes/no, true/false, checked/unchecked… you get the idea) into something that accepts only strings. Maybe it’s because you want to store them in localStorage or in a cookie, or send them through the body of an HTTP request. I’ve needed to do this countless times.

The last time I stumbled on such a case wasn’t with my own code. It was when Christian Heilmann showed me his then new slide deck, with a cool feature where you could toggle the visibility of individual slides in and out of the presentation. On seeing it, I was impressed. Looking more closely, though, I realized that the checkbox states did not persist after the page reloaded. So, someone could spend a long time carefully tweaking their slides, only to accidentally hit F5 or crash their browser, and then — boom! — all their work would be lost. Christian told me that he was already working on storing the checkbox states in localStorage. Then, naturally, we endlessly debated the storage format. That debate inspired me to write this article, to explore the various approaches in depth.

[Editor's note: Have you already got your copy of the Smashing Book #2? The book shares valuable practical insight into design, usability and coding. Have a look at the contents.]

Using An Array

We have two (reasonable) ways to model our data in an array. One is to store true/false values, like so:

1[false, true, true, false, false, true, true]

The other is to store an array of 0s and 1s, like so:

1[0, 1, 1, 0, 0, 1, 1]

Whichever solution we go with, we will ultimately have to convert it to a string, and then convert it back to an array when it is read. We have two ways to proceed: either with the old Array#join() (or Array#toString()) and String#split(), or with the fancier JSON.stringify() and JSON.parse().

With the JSON way, the code will be somewhat shorter, although it is the JavaScript equivalent of slicing bread with a chainsaw. The impact on performance doesn’t seem to be significant, but you’d be cutting down browser support quite a bit.

The main drawback of using array-based strings is their size in bytes. If you go with the number method, you would use almost 2 bytes (or characters) per number (or, more precisely, 2N − 1, since you’d need one delimiter per number, except for the last one):

1[0, 1, 1, 0, 0, 1, 1].toString().length // 13, for 7 values

So, for 512 numbers, that would be 1023 bytes (or 1 KB). If you go with the boolean method, it’s even worse:

1[false, true, true, false, false, true, true].toString().length // 37, also for 7 values

That’s around 5 to 6 characters per value, so 2560 to 3072 bytes for 512 numbers (which is 2.5 to 3 KB). JSON.stringify() even wastes 2 more characters in each case, for the opening and closing brackets, but its advantage is that you get your original value types back with JSON.parse() instead of strings.

Using A String

Using a string saves some space, because no delimiters are involved. For example, if you go with the number approach and store strings like '01001101010111', you are essentially storing one character per value, which is 100% better than the better of the two previous approaches. You can then get the values into an array by using String#split:

1'01001101010111'.split(''); // ['0','1','0','0','1','1','0','1','0','1','0','1','1','1']

Or you could just loop over the string using string.charAt(i) — or even the string indexes (string[i]), if you don’t care about older browsers.

Using Bitfields

Did the previous method make you think of binary numbers? It’s not just you. The concept of bitfields is quite popular in other programming languages, but not so much in JavaScript. In a nutshell, bitfields are used to pack a lot of boolean values into the bits of the boolean representation of a number. For example, if you have eight values (true, false, false, true, false, true, true, false), the number would be 10010110 in binary; so, 150 in decimal and 96 in hex. That’s 2 characters instead of 8, so 75% saved. In general, 1 digit in the hex representation corresponds to exactly 4 bits. (That’s because 16 = 24. In general, in a base2n system, you can pack n bits into every base2n digit.) So, we weren’t lucky with that 75%; it’s always that much.

Thus, instead of storing that string as a string and using 1 character per value, we can be smarter and convert it to a (hex) number first. How do we do that? It’s no more than a line of code:

1parseInt('10010110', 2).toString(16); // returns '96'

And how do we read it back? That’s just as simple:

1parseInt('96', 16).toString(2); // returns  '10010110'

From this point on, we can follow the same process as the previous method to loop over the values and do something useful with them.

Can We Do Better?

In fact, we can! Why convert it to a hex (base 16) number, which uses only 6 of the 26 alphabet letters? The Number#toString() method allows us to go up to base 36 (throwing a RangeError for >= 37), which effectively uses all letters in the alphabet, all the way up to z! This way, we can have a compression of up to 6 characters for 32 values, which means saving up to 81.25% compared to the plain string method! And the code is just as simple:

1parseInt( '1001011000', 2).toString(36); // returns 'go' (instead of '258', which would be the hex version)
2parseInt('go', 36).toString(2); // returns  '1001011000'

Base 64

The previous method allows us to go up only to base 36. But in modern JavaScript, we can use the methods atob() and btoa() to utilize a base 64 encoding system. That’s 64 = 26, which means we can pack 6 yes/no characters into a single character, saving us 83.3% compared to the plain string method. Let’s look at the code:

1atob(parseInt('1001011000', 2)); // returns 'ëM'
2parseInt(btoa('ëM')).toString(2) // returns '1001011000'

For some of you, this will be enough. But I can almost hear the more inquisitive minds out there shouting, “But we have 255 symbols, we are still not using strings to their full potential!” And you’d be right. There is a reason why every time you open a binary file in a text editor, you get weird symbols mixed with numbers, uppercase letters, lowercase letters and whatnot. Every character in an ASCII string is a byte (8 bits), which means that if we use the right compression algorithm, we should be able to store 8 yes/no values in it (saving 87.5%).

The problem is that JavaScript doesn’t offer a built-in way to do that, so the code becomes a bit more complicated.

Packing 8 Values Into One Character

You can use String.fromCharCode to get the individual characters. It accepts a numerical value of up to 65,535 and returns a character (and for values greater than that, it returns an empty string). In this case, however, we’re going to use character codes of up to 255 (8 bits).

(Aside: Unicode characters are variable width, so they take up from 1 up to 4 bytes per character. Thus, if you pack your data in characters beyond the first 255, you will be consuming 2 bytes instead of 1, so you might as well have used 2 ASCII characters instead — the file size would be the same. Only if, for some reason, you have a limit that is set in number of (Unicode) characters, then using the full range might be a good idea. But in most cases, the main concern is byte size, so it’s entirely pointless.)

So, we have to split our string into chunks of 8 characters in size. We can do that through .match(/.{1,8}/g). To sum up, the full solution would look like this:

01function pack(/* string */ values) {
02    var chunks = values.match(/.{1,8}/g), packed = '';
03    for (var i=0; i < chunks.length; i++) {
04        packed += String.fromCharCode(parseInt(chunks[i], 2));
05    }
06    return packed;
09function unpack(/* string */ packed) {
10    var values = '';
11    for (var i=0; i < packed.length; i++) {
12        values += packed.charCodeAt(i).toString(2);
13    }
14    return values;

It wasn’t that hard, was it?

With these few lines of code, you can pack the aforementioned 512 values into — drum roll, please — 64 characters!

Quite an improvement over our original 1 KB (with the array method), isn’t it?


Numbers in JavaScript have limits. For the methods discussed here that involve an intermediate state of converting to a number, the limit appears to be 1023 yes/no values, because parseInt('1111…1111', 2) returns Infinity when the number of aces is bigger than 1023. This limit does not apply to the last method, because we’re only converting blocks of bits instead of the whole thing. And, of course, it doesn’t apply to the first two methods (array and string) because they don’t involve packing the values into an integer.

“I Think You Took It A Bit Too Far”

This might be overkill for some cases. But it will definitely come in handy when you want to store a lot of boolean values in any limited space that can only store strings. And no optimization is overkill for things that go through the wire frequently. For example, cookies are sent on every single request, so they should be as tiny as possible. Another use case would be online multiplayer games, for which response times should be lightning-fast, otherwise the games wouldn’t be fun.
And even if this kind of optimization isn’t your thing, I hope you’ve found the thought process and the code involved educational.

Start typing and press Enter to search