Very frequently in Web development (and programming in general), you
need to store a long list of boolean values (yes/no, true/false,
checked/unchecked… you get the idea) into something that accepts only
strings. Maybe it’s because you want to store them in
localStorage
or in a cookie, or send them through the body of an HTTP request. I’ve needed to do this countless times.
The last time I stumbled on such a case wasn’t with my own code. It was when
Christian Heilmann showed me his
then new slide deck,
with a cool feature where you could toggle the visibility of individual
slides in and out of the presentation. On seeing it, I was impressed.
Looking more closely, though, I realized that the checkbox states did
not persist after the page reloaded. So, someone could spend a long time
carefully tweaking their slides, only to accidentally hit F5 or crash
their browser, and then — boom! — all their work would be lost.
Christian told me that he was already working on storing the checkbox
states in
localStorage
. Then, naturally, we endlessly
debated the storage format. That debate inspired me to write this
article, to explore the various approaches in depth.
[Editor's note: Have you already got your copy of the
Smashing Book #2? The book shares valuable practical insight into design, usability and coding.
Have a look at the contents.]
Using An Array
We have two (reasonable) ways to model our data in an array. One is to store true/false values, like so:
1 | [false, true, true, false, false, true, true] |
The other is to store an array of 0s and 1s, like so:
Whichever solution we go with, we will ultimately have to convert it
to a string, and then convert it back to an array when it is read. We
have two ways to proceed: either with the old
Array#join()
(or
Array#toString()
) and
String#split()
, or with the fancier
JSON.stringify()
and
JSON.parse()
.
With the JSON way, the code will be somewhat shorter, although it is
the JavaScript equivalent of slicing bread with a chainsaw.
The impact on performance doesn’t seem to be significant, but you’d be cutting down browser support quite a bit.
The main drawback of using array-based strings is their size in
bytes. If you go with the number method, you would use almost 2 bytes
(or characters) per number (or, more precisely,
2N − 1
, since you’d need one delimiter per number, except for the last one):
1 | [0, 1, 1, 0, 0, 1, 1].toString().length // 13, for 7 values |
So, for 512 numbers, that would be 1023 bytes (or 1 KB). If you go with the boolean method, it’s even worse:
1 | [false, true, true, false, false, true, true].toString().length // 37, also for 7 values |
That’s around 5 to 6 characters per value, so 2560 to 3072 bytes for 512 numbers (which is 2.5 to 3 KB).
JSON.stringify()
even wastes 2 more characters in each case, for the opening and closing
brackets, but its advantage is that you get your original value types
back with
JSON.parse()
instead of strings.
Using A String
Using a string saves some space, because no delimiters are involved.
For example, if you go with the number approach and store strings like
'01001101010111'
,
you are essentially storing one character per value, which is 100%
better than the better of the two previous approaches. You can then get
the values into an array by using
String#split
:
1 | '01001101010111'.split(''); // ['0','1','0','0','1','1','0','1','0','1','0','1','1','1'] |
Or you could just loop over the string using
string.charAt(i)
— or even the string indexes (
string[i]
), if you don’t care about older browsers.
Using Bitfields
Did the previous method make you think of binary numbers? It’s not just you. The concept of
bitfields
is quite popular in other programming languages, but not so much in
JavaScript. In a nutshell, bitfields are used to pack a lot of boolean
values into
the bits of the boolean representation of a
number. For example, if you have eight values (true, false, false,
true, false, true, true, false), the number would be 10010110 in binary;
so, 150 in decimal and 96 in hex. That’s 2 characters instead of 8, so
75% saved. In general, 1 digit in the hex representation corresponds to exactly 4 bits. (That’s because
16 = 24
. In general, in a
base2n
system, you can pack
n
bits into every
base2n
digit.) So,
we weren’t lucky with that 75%; it’s always that much.
Thus, instead of storing that string as a string and using 1
character per value, we can be smarter and convert it to a (hex) number
first. How do we do that? It’s no more than a line of code:
1 | parseInt('10010110', 2).toString(16); // returns '96' |
And how do we read it back? That’s just as simple:
1 | parseInt('96', 16).toString(2); // returns '10010110' |
From this point on, we can follow the same process as the previous
method to loop over the values and do something useful with them.
Can We Do Better?
In fact, we can! Why convert it to a hex (base 16) number, which uses only 6 of the 26 alphabet letters? The
Number#toString()
method allows us to go up to
base 36 (throwing a
RangeError
for
>= 37
), which effectively uses
all
letters in the alphabet, all the way up to z! This way, we can have a
compression of up to 6 characters for 32 values, which means saving up
to 81.25% compared to the plain string method! And the code is just as
simple:
1 | parseInt( '1001011000', 2).toString(36); // returns 'go' (instead of '258', which would be the hex version) |
2 | parseInt('go', 36).toString(2); // returns '1001011000' |
Base 64
The previous method allows us to go up only to base 36. But in modern JavaScript, we can use the methods
atob()
and
btoa()
to utilize a
base 64 encoding system. That’s
64 = 26
,
which means we can pack 6 yes/no characters into a single character,
saving us 83.3% compared to the plain string method. Let’s look at the
code:
1 | atob(parseInt('1001011000', 2)); // returns 'ëM' |
2 | parseInt(btoa('ëM')).toString(2) // returns '1001011000' |
For some of you, this will be enough. But I can almost hear the more
inquisitive minds out there shouting, “But we have 255 symbols, we are
still not using strings to their full potential!” And you’d be right.
There is a reason why every time you open a binary file in a text
editor, you get weird symbols mixed with numbers, uppercase letters,
lowercase letters and whatnot. Every character in an ASCII string is a
byte (8 bits), which means that if we use the right compression
algorithm, we should be able to store 8 yes/no values in it (saving
87.5%).
The problem is that JavaScript doesn’t offer a built-in way to do that, so the code becomes a bit more complicated.
Packing 8 Values Into One Character
You can use
String.fromCharCode
to get the individual
characters. It accepts a numerical value of up to 65,535 and returns a
character (and for values greater than that, it returns an empty
string). In this case, however, we’re going to use character codes of up
to 255 (8 bits).
(Aside: Unicode characters are variable width, so they take up from 1
up to 4 bytes per character. Thus, if you pack your data in characters
beyond the first 255, you will be consuming 2 bytes instead of 1, so you
might as well have used 2 ASCII characters instead — the file size
would be the same. Only if, for some reason, you have a limit that is
set in number of (Unicode) characters, then using the full range might
be a good idea. But in most cases, the main concern is byte size, so
it’s entirely pointless.)
So, we have to split our string into chunks of 8 characters in size. We can
do that through .match(/.{1,8}/g)
. To sum up, the full solution would look like this:
01 | function pack(/* string */ values) { |
02 | var chunks = values.match(/.{1,8}/g), packed = ''; |
03 | for (var i=0; i < chunks.length; i++) { |
04 | packed += String.fromCharCode(parseInt(chunks[i], 2)); |
09 | function unpack(/* string */ packed) { |
11 | for (var i=0; i < packed.length; i++) { |
12 | values += packed.charCodeAt(i).toString(2); |
It wasn’t that hard, was it?
With these few lines of code, you can pack the aforementioned 512 values into — drum roll, please —
64 characters!
Quite an improvement over our original 1 KB (with the array method), isn’t it?
Limitations
Numbers in JavaScript have limits. For the methods discussed here
that involve an intermediate state of converting to a number, the limit
appears to be
1023 yes/no values, because
parseInt('1111…1111', 2)
returns
Infinity
when the number of aces is bigger than 1023. This limit does not apply
to the last method, because we’re only converting blocks of bits instead
of the whole thing. And, of course, it doesn’t apply to the first two
methods (array and string) because they don’t involve packing the values
into an integer.
“I Think You Took It A Bit Too Far”
This might be overkill for some cases. But it will definitely come in
handy when you want to store a lot of boolean values in any limited
space that can only store strings. And no optimization is overkill for
things that go through the wire frequently. For example, cookies are
sent on every single request, so they should be as tiny as possible.
Another use case would be online multiplayer games, for which response
times should be lightning-fast, otherwise the games wouldn’t be fun.
And even if this kind of optimization isn’t your thing, I hope you’ve
found the thought process and the code involved educational.
Hey thanks for share useful information
ReplyDeletewebsite development company in bangalore
Thank you for this valuable information. It was really helpful.
ReplyDeleteweb designing course in chennai