Let’s start with some brain food for you:
In other words, values of types like null and internal pointers are all ‘boxed’ into double precision floating point values, or ‘doubles’.
What do we mean by ‘boxed’?
We mean we’re storing a given value (say a boolean true) inside another type internally. In this case, a floating point value.
But why? And how does it work?
Time for a deep dive into this magic! ✨🦄
Let me explain “NaN boxing”…
About “double precision floating point” numbers
Floating point numbers are sometimes misunderstood & feared, but are actually not that complicated to get the basics of.
Generally, we recognise floating point numbers in our code by the decimal point in the number, say 1.32 (though this is not always the case).
So what exactly do they represent?
They can represent rational numbers, but often only approximately. A rational number is a number that we can also specify exactly as the ratio of two integers. For example, 0.5 is rational as we can also write it as 1/2. 4/115 (roughly 0.0347826087) is also rational, as is 323, which is 323/1.
They can represent whole numbers (integer values) within a certain range.
Finally, they can approximate irrational numbers. These are numbers that cannot be represented exactly as a ratio of two integers. They have an infinite, non-repeating sequence of digits after the decimal point. A famous example of an irrational number is Pi.
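To make those three cases a bit more concrete, here is a small sketch in JS (where every Number is a double). The variable names are just for illustration.

```javascript
// Whole numbers are stored exactly, but only up to Number.MAX_SAFE_INTEGER (2^53 - 1).
const maxSafe = Number.MAX_SAFE_INTEGER;       // 9007199254740991
const exactInteger = Number.isSafeInteger(323); // true
const beyondRange = 2 ** 53 + 1 === 2 ** 53;    // true: 2^53 + 1 has no exact double

// 0.5 is 1/2, a power of two, so it is stored exactly.
const halfIsExact = 0.5 === 1 / 2;              // true

// Pi is irrational, so Math.PI is only the nearest double to it.
const piAsText = Math.PI.toString();            // "3.141592653589793"

console.log(maxSafe, exactInteger, beyondRange, halfIsExact, piAsText);
```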
In computer systems, such as programming languages or in computer hardware, floating point numbers are generally implemented to follow the IEEE 754 standard.
Exactly how the floating point number type works, and the secret behind the notorious fact that 0.1 + 0.2 != 0.3, is a story for another day… but if you are keen to know more, sign up to get early access to this course.
Doubles vs Singles
Surely there is a joke about tennis here, but I’m not going to even try.
Singles are often simply called floats and are 32 bits wide in their representation.
A double, or double precision floating point number, is twice the size in bits of a ‘single’.
A double is 64 bits wide and can therefore represent a larger range of values with higher precision, but of course it requires twice as much space to store.
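If you want to see the difference from JS, typed arrays and Math.fround give a quick look. This is only a sketch; the variable names are made up for the example.

```javascript
// Typed arrays expose the storage size of each format in bytes.
const singleBytes = Float32Array.BYTES_PER_ELEMENT; // 4 bytes = 32 bits
const doubleBytes = Float64Array.BYTES_PER_ELEMENT; // 8 bytes = 64 bits

// Math.fround rounds a double to the nearest single precision value.
const asSingle = Math.fround(0.1);
const lostPrecision = asSingle !== 0.1; // true: the single is a coarser approximation of 0.1

console.log(singleBytes, doubleBytes, asSingle, lostPrecision);
```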
The structure of a float
Let us investigate the structure of floating point numbers. By that I mean look at the representation at the binary level.
So a double is 64 bits wide, and is composed of 3 parts:
- An ‘exponent’: the power to raise the base by (base 2, incidentally), to scale the value
- A ‘significand’ or the fraction if you like, in other words, the value itself
- A ‘sign’ bit which represents whether the number is positive or negative
In a double the significand part of the representation is 52 bits wide, the exponent is 11 bits wide and the sign is a single bit. (For completeness: we are not counting the ‘implicit bit’ here, since it’s not part of the actual representation.)
By setting these 3 parts to specific values we can represent a large range of numbers. The standard also includes certain values which are reserved to mean specific things.
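Here is a minimal sketch of pulling those 3 parts out of a double in JS, using a DataView to get at the raw bits. `decodeDouble` is a hypothetical helper name, not something built in.

```javascript
// Split a double into its sign, exponent and significand fields.
function decodeDouble(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);          // write the number as 64 raw bits
  const bits = view.getBigUint64(0);  // read the same 64 bits back as an unsigned integer

  return {
    sign: Number(bits >> 63n),                 // top bit
    exponent: Number((bits >> 52n) & 0x7ffn),  // next 11 bits (stored with a bias of 1023)
    significand: bits & ((1n << 52n) - 1n),    // low 52 bits
  };
}

console.log(decodeDouble(1.0));  // sign 0, exponent 1023, significand 0n
console.log(decodeDouble(-2.5)); // sign 1, exponent 1024, significand 2^50
```

The exponent is kept as the raw stored field, so 1023 means ‘2 to the power 0’.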
One of those special values is called NaN (‘not a number’)…
Not a number: NaN
You may have seen people poke fun at the fact that, in JS, the type of NaN is ‘number’. But the fact is… it is correct.
As mentioned, the floating point standard specifies a number of special values, including Infinity, negative zero, and NaN, standing for ‘not a number’.
(Note if you are interested to also deep dive into the other special values sign up to the newsletter, I’ll do negative zero soon!)
NaN is a floating point value that is used to represent states that are invalid. For example, when a mathematical calculation is performed that doesn’t make any sense, say 0 ÷ 0.
The value is conceptually ‘not a number’. But we are looking at types here!
No “#wtfjs” here… just a confusing overlap of naming and terminology across different domains.
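You can check this in any JS console. A quick sketch (the variable name is just for illustration):

```javascript
const result = 0 / 0;               // an invalid calculation

console.log(result);                // NaN
console.log(typeof result);         // "number": NaN is a value of the floating point type
console.log(Number.isNaN(result));  // true
console.log(result === result);     // false: NaN never compares equal, not even to itself
```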
Anyway, what we need to know is that to represent a NaN value in a double:
- The exponent part of the floating point number is set to its max value (all 11 bits are set to ‘1’)
- The significand is anything but 0.
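Those two rules translate almost directly into bit masks. Below is a rough sketch, where `toBits` and `isNaNPattern` are hypothetical helper names.

```javascript
const EXPONENT_MASK = 0x7ffn << 52n;         // the 11 exponent bits
const SIGNIFICAND_MASK = (1n << 52n) - 1n;   // the 52 significand bits

// Read the raw 64 bits of a double as an unsigned BigInt.
function toBits(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);
  return view.getBigUint64(0);
}

// NaN: exponent at its max value AND a non-zero significand.
function isNaNPattern(bits) {
  return (bits & EXPONENT_MASK) === EXPONENT_MASK && (bits & SIGNIFICAND_MASK) !== 0n;
}

console.log(isNaNPattern(toBits(0 / 0)));    // true
console.log(isNaNPattern(toBits(Infinity))); // false: max exponent, but the significand is 0
console.log(isNaNPattern(toBits(1.5)));      // false
```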
Now for the start of the magic of NaN boxing…
There are A LOT of NaN values
Note our definition of a NaN value: all non-zero values of the significand are NaNs when the exponent is at its max value.
But the significand is 52 bits wide, so there is a huge number of possible NaN values (
2^52 - 1 …
We only need a few of those values to represent actual NaNs in our programming language. So we can think of all the other possible NaN values as being ‘unused’.
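To convince yourself, here is a small sketch that builds two different bit patterns and checks that both read back as NaN. `fromBits` and `NAN_BASE` are names I made up for the example. I also set the top significand bit (which commonly marks a ‘quiet’ NaN), and note that an engine is free to normalise the pattern once the value becomes a regular JS number, so we only check that the result is NaN.

```javascript
const view = new DataView(new ArrayBuffer(8));

// Reinterpret a 64-bit pattern as a double.
function fromBits(bits) {
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

const NAN_BASE = 0x7ff8n << 48n; // exponent all ones, plus the top significand bit

const nanA = fromBits(NAN_BASE | 1n);
const nanB = fromBits(NAN_BASE | 0xdeadbeefn);
console.log(Number.isNaN(nanA), Number.isNaN(nanB)); // true true

// Number of non-zero significands available (per sign bit value).
const nonZeroSignificands = 2n ** 52n - 1n;
console.log(nonZeroSignificands); // 4503599627370495n
```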
Maybe we can use these unused values to our advantage?
JS has a number of data types: numeric types, booleans, undefined and so on. A variable can hold a value of any of them.
To make that possible the engine needs to know what data type each value has at all times.
How does it know that?
Enter ‘NaN boxing’!
The idea is to shoehorn everything into those ‘unused’ NaNs in floats.
This method of encoding values and their type inside some other representation is called ‘boxing’ 📦 (🥊)
Let’s look at the size in bits of data types in JS:
- Numbers in JS are double precision floats, so they fit neatly into doubles… because they are doubles!
- unsigned integers, used for operations like bitwise AND, bit shifts etc, are 32 bits (think about the
- pointers used internally. Pointers are used to reference the memory location of data that is not a primitive type (like an Object). In practice they are at most 48 bits wide, even on common 64-bit machines.
- booleans, undefined and null all have limited values, for example there is only one ‘undefined’ value for the Undefined type.
So we can see that the largest non-float type is 48 bits wide & we know the ‘double’ significand is 52 bits wide…
So the significand can contain the values of all other data types, and have bits left over to indicate the type. 🎉
The exact way the data type tag or boxed value is represented inside the significand of the NaN is different in each implementation. I have linked to a few examples at the end of the article.
And remember, with the ‘exponent’ of the double at its max value, all these boxed values are valid ‘NaN’s!
In fact this NaN boxing technique is used in other dynamically typed languages too.
Unboxing values from NaNs
Unboxing is the process of reversing the ‘NaN boxing’…
I.e. everything is a double, but we ‘unbox’ the actual type and value when we encounter a NaN.
- Any non-NaN floating point value is just that, a floating point value
- If the value is a ‘NaN’, then we look at the top bits of the significand to find what type it is
- Sitting neatly inside the remaining bits is the value of that type. We can extract the value by masking off the bits needed for the given type.
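The steps above can be sketched in JS, using BigInts to stand in for raw 64-bit words. Keep in mind the layout here (a 3-bit tag sitting above a 48-bit payload) and all the names (`TAG`, `boxTagged`, `unbox`…) are assumptions I made up for illustration. Real engines each pick their own layout.

```javascript
// Hypothetical layout: [sign:1][exponent:11, all ones][quiet bit:1][tag:3][payload:48]
const NAN_BASE = 0x7ff8n << 48n;        // exponent at max + top significand bit: a valid NaN
const TAG_SHIFT = 48n;
const TAG_MASK = 0x7n << TAG_SHIFT;     // the 3 significand bits just below the top one
const PAYLOAD_MASK = (1n << 48n) - 1n;  // the low 48 bits
const TAG = { BOOLEAN: 1n, NULL: 2n, UNDEFINED: 3n, POINTER: 4n }; // tag 0 is left for real NaNs

const view = new DataView(new ArrayBuffer(8));

// Box a plain number: its 64 bits are used unchanged.
// (A real engine would also normalise NaNs here so they can't collide with a tag.)
function boxNumber(n) {
  view.setFloat64(0, n);
  return view.getBigUint64(0);
}

// Box any other value: tag and payload go inside the NaN's significand.
function boxTagged(tag, payload = 0n) {
  return NAN_BASE | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK);
}

function unbox(bits) {
  const tag = (bits & TAG_MASK) >> TAG_SHIFT;
  const isBoxed = (bits & NAN_BASE) === NAN_BASE && tag !== 0n;
  if (!isBoxed) {
    // Any other pattern is just a floating point value.
    view.setBigUint64(0, bits);
    return { type: "number", value: view.getFloat64(0) };
  }
  const payload = bits & PAYLOAD_MASK; // mask off the bits that hold the value
  switch (tag) {
    case TAG.BOOLEAN: return { type: "boolean", value: payload === 1n };
    case TAG.NULL: return { type: "null", value: null };
    case TAG.UNDEFINED: return { type: "undefined", value: undefined };
    case TAG.POINTER: return { type: "pointer", value: payload };
    default: throw new Error("unknown tag " + tag);
  }
}

console.log(unbox(boxNumber(-1.5)));                      // type "number", value -1.5
console.log(unbox(boxTagged(TAG.BOOLEAN, 1n)));           // type "boolean", value true
console.log(unbox(boxTagged(TAG.POINTER, 0x7fffdeadbeefn))); // type "pointer", same 48-bit payload
```

Tag 0 is deliberately unused in this sketch so that the plain NaN produced by something like 0 ÷ 0 still unboxes as a number.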
Why use NaN boxing?
Firstly, the representation of types and values with NaN boxing is compact & predictable in size. All type values and type tags fit into 64 bits. This aids performance and caching.
64-bit floats are directly represented (and integers up to MAX_SAFE_INTEGER) so there is no performance penalty associated with encoding and decoding them.
However, note that other techniques for representing value and type in dynamic languages do exist and offer their own set of pros and cons.
An example of NaN boxing
A primitive type is a data type that is supported directly by the programming language itself, and which is not composed of any other data types.
boolean. In other languages like C++ it means
Let’s take an example of a primitive data type: an unsigned integer which, as mentioned, is used internally for bitwise operations.
We are encoding a 32-bit number, but we have 48 bits remaining.
So that means there are 16 bits unused and we mask off the lower 32 bits to get the actual value.
Remember this is still a valid NaN value in IEEE 754 floating point terms.
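Here is how that example could look in JS, again using a BigInt to stand in for the raw 64-bit word. The tag value and the helper names (`boxUint32`, `unboxUint32`) are hypothetical; an actual engine will choose its own.

```javascript
const NAN_BASE = 0x7ff8n << 48n;  // exponent at max + top significand bit
const UINT32_TAG = 5n << 48n;     // hypothetical type tag, stored above the 48 value bits
const UINT32_MASK = 0xffffffffn;  // the lower 32 bits

// Place the 32-bit value in the low bits; bits 32 to 47 stay unused (zero).
function boxUint32(n) {
  return NAN_BASE | UINT32_TAG | BigInt(n >>> 0);
}

// Mask off the lower 32 bits to get the actual value back.
function unboxUint32(bits) {
  return Number(bits & UINT32_MASK);
}

const boxed = boxUint32(0xcafebabe);
console.log(boxed.toString(16));                 // "7ffd0000cafebabe"
console.log(unboxUint32(boxed) === 0xcafebabe);  // true
```

In the hex output you can read the three parts off directly: `7ffd` (the NaN bits plus the tag), `0000` (the 16 unused bits) and `cafebabe` (the value).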
Summary & further reading
So NaN boxing is a clever way to get data type info & values into a standard type without affecting its precision or range!
- NaNs are special floating point values
- dynamic languages need to keep data type information along with a value at runtime
- ‘NaN boxing’ lets the interpreters of these languages encode any data type by embedding information inside unused floating point ‘NaN’s
Want to learn more? 🧠
Also, here is an article on how the programming language Charly uses this technique: https://leonardschuetz.ch/blog/nan-boxing/ and another on how it is used in the Sink language: https://sean.cm/a/nan-boxing. You also have an example C implementation here.
Computer number systems: a complete guide for programmers
Found this article on NaN boxing interesting?
Want to feel more comfortable around numbers in computers? Want to ace any interview questions that touch numeric types? Want to avoid being the developer that introduces a bug which calculates prices off by one cent (“Total price: $1000.01” … “😱 where did 0.01 come from??”)? Are number systems and floating point a mystery to you?
If so, sign up to my newsletter as I will be releasing more articles like this one! I’m hoping to put together an e-book on the topic too: ‘Computer number systems: a complete guide for programmers’.