This is a look at ‘NaN boxing’ for JavaScript devs.

Let’s start with some brain food for you:

some JavaScript engines represent all data types internally using just floating point values!

In other words, values of types like boolean, undefined, null and internal pointers are all ‘boxed’ into double precision floating point values or ‘double’s.

What do we mean by ‘boxed’?

We mean we’re storing a given value (say a boolean true) inside another type internally. In this case, a floating point value.

But why? And how does it work?

Time for a deep dive into this magic! ✨🦄

Let me explain “NaN boxing”…

'Let me explain' meme

“It’s simple really, it’s NaN boxing…”

About “double precision floating point” numbers

Floating point numbers are sometimes misunderstood & feared, but are actually not that complicated to get the basics of.

Generally, we recognise floating point numbers in our code by the decimal point in the number, say 1.32 (though this is not always the case).

So what exactly do they represent?

They can represent rational numbers, but often only approximately. A rational number is a number that we can also specify exactly as the ratio of two integers. For example, 0.5 is rational as we can also write it as 1/2. 4/115 is also rational (it is roughly 0.0347826087), as is 323, which is 323/1.

They can represent whole numbers (integer values) within a certain range.

Finally, they can approximate irrational numbers. These are numbers that cannot be written as a ratio of two integers. They have an infinite, non-repeating sequence of digits after the decimal point. A famous example of an irrational number is Pi (π).

In computer systems, such as programming languages or in computer hardware, floating point numbers are generally implemented to follow the IEEE 754 standard.

Exactly how the floating point number type works, and what is the secret behind that notorious fact that 0.1 + 0.2 != 0.3, is a story for another day… but if you are keen to know more, sign up to get early access to this course.
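As a quick check of the claims above, here is a minimal sketch you can paste into any JavaScript console. The variable names are just for illustration.

```javascript
// 0.5 is 1/2, a power of two, so a binary float stores it exactly.
const halfIsExact = 0.5 === 1 / 2;

// 0.1, 0.2 and 0.3 can only be approximated in binary, so the sum drifts.
const sum = 0.1 + 0.2;

// Whole numbers within the safe range are stored exactly too.
const wholeIsExact = Number.isInteger(323);

console.log(halfIsExact);  // true
console.log(sum === 0.3);  // false
console.log(sum);          // 0.30000000000000004
console.log(wholeIsExact); // true
```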

Doubles vs Singles

Surely there is a joke about tennis here, but I’m not going to even try.

Singles are often simply called floats and are 32 bits wide in their representation.

A double, or double precision floating point number, is twice the size in bits of a ‘single’.

Thus, a double is 64 bits wide and can therefore represent a larger range of values with higher precision. But of course it requires twice as much space to store.
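To see the difference in precision, here is a small sketch using Math.fround, which rounds a value to the nearest single and hands it back as a regular JavaScript number. The variable names are just for illustration.

```javascript
// JavaScript numbers are doubles; Math.fround simulates storing one as a single.
const asDouble = 0.1;
const asSingle = Math.fround(0.1);

// The single is a coarser approximation of 0.1, so the two values differ.
console.log(asDouble === asSingle); // false

// Typed arrays expose the two widths directly.
console.log(Float32Array.BYTES_PER_ELEMENT * 8); // 32
console.log(Float64Array.BYTES_PER_ELEMENT * 8); // 64
```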

The structure of a float

Let us investigate the structure of floating point numbers. By that I mean look at the representation at the binary level.

So a double is 64 bits wide, and is composed of 3 parts:

  • An ‘exponent’: the power to raise the base by (base 2, incidentally), to scale the value
  • A ‘significand’ or the fraction if you like, in other words, the value itself
  • A ‘sign’ bit which represents whether the number is positive or negative

Structure of a double precision floating point value


In a double the significand part of the representation is 52 bits wide, the exponent takes 11 bits and the sign takes 1 bit. To keep things simple we do not consider the ‘implicit bit’, since it’s not part of the actual representation.
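If you want to poke at these three parts yourself, here is a minimal sketch. The function name decompose is my own, and the sketch assumes an environment with BigInt support (any modern browser or Node.js). It also shows the stored exponent, which carries a bias of 1023, something we don’t go into in this article.

```javascript
// Split a double into its three fields by reading its raw 64 bits.
function decompose(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);         // write the double...
  const bits = view.getBigUint64(0); // ...and read the same 8 bytes back as one integer
  return {
    sign: Number(bits >> 63n),                // top bit
    exponent: Number((bits >> 52n) & 0x7ffn), // next 11 bits, stored with a bias of 1023
    significand: bits & ((1n << 52n) - 1n),   // low 52 bits
  };
}

const one = decompose(1);
console.log(one.sign, one.exponent, one.significand); // 0 1023 0n

const minusTwo = decompose(-2);
console.log(minusTwo.sign, minusTwo.exponent, minusTwo.significand); // 1 1024 0n
```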

By setting these 3 parts to specific values we can represent a large range of numbers. The standard also includes certain values which are reserved to mean specific things.

One of those special values is called NaN (‘not a number’)…

Not a number: NaN

The internet is full of memes like this, pandering to the ‘wtfjs’ or ‘JavaScript sucks’ crowd:

A NaN in JavaScript is a type of number

Well it technically is correct! NaN is a floating point value

But the fact is… it is correct.

As mentioned, the floating point standard specifies a number of special values, including Infinity, negative zero, and NaN, standing for ‘not a number’.

(Note: if you would like a deep dive into the other special values too, sign up to the newsletter. I’ll do negative zero soon!)

NaN is a floating point value that is used to represent states that are invalid. For example, when a mathematical calculation is performed that doesn’t make any sense, say 0 ÷ 0.

The value is conceptually ‘not a number’. But we are looking at types here!

And, yes, NaN is a value of the floating point numeric type, and in JavaScript that type is simply called number.

typeof NaN = number

A NaN in JavaScript is a type of number because a NaN is actually a floating point value!

No “#wtfjs” here… just a confusing overlap of naming and terminology across different domains.
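You can confirm all of this in a console. A minimal sketch (the variable name is just for illustration):

```javascript
// An invalid calculation produces NaN...
const result = 0 / 0;
console.log(result);        // NaN

// ...and NaN is a value of the floating point type, which JavaScript calls "number".
console.log(typeof result); // number

// NaN never compares equal to anything, itself included, so test for it with Number.isNaN.
console.log(result === result);    // false
console.log(Number.isNaN(result)); // true
```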

Anyway, what we need to know is that to represent a NaN value in a double, we set:

  • The exponent part of the floating point number to its max value (all 11 bits set to ‘1’)
  • The significand to anything but 0

The bit pattern for a floating point NaN

The bit pattern or structure of a NaN in a double precision floating point value.
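Here is a rough sketch of that rule applied to raw 64-bit patterns. The helper names isNaNPattern and bitsToDouble are my own. I work on BigInt bit patterns rather than numbers because JavaScript does not promise to preserve the exact bits of a NaN once it becomes a number.

```javascript
const EXPONENT_MASK = 0x7ffn << 52n;       // the 11 exponent bits
const SIGNIFICAND_MASK = (1n << 52n) - 1n; // the 52 significand bits

// A pattern is a NaN when the exponent is all ones and the significand is non-zero.
function isNaNPattern(bits) {
  return (bits & EXPONENT_MASK) === EXPONENT_MASK && (bits & SIGNIFICAND_MASK) !== 0n;
}

// Reinterpret a 64-bit pattern as a double.
function bitsToDouble(bits) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

const quietNaN = 0x7ff8000000000000n; // exponent all ones, top significand bit set
const infinity = 0x7ff0000000000000n; // exponent all ones, significand zero

console.log(isNaNPattern(quietNaN), Number.isNaN(bitsToDouble(quietNaN))); // true true
console.log(isNaNPattern(infinity), bitsToDouble(infinity));               // false Infinity
```

Note how a zero significand with the same exponent is not a NaN at all: it is Infinity.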

Now for the start of the magic of NaN boxing…

There are A LOT of NaN values

Note our definition of a NaN value

all non-zero values of the significand are NaNs when the exponent is at its max value.

But the significand is 52 bits wide, so there is a huge number of possible NaN values (2^52 - 1 = 4,503,599,627,370,495 values)!

We only need a few of those values to represent actual NaNs in our programming language. So we can think of all the other possible NaN values as being ‘unused’.
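To make that concrete, here is a small sketch that computes the count and checks that a handful of arbitrary significand values really do all read back as NaN. The payload values and helper names are made up for illustration.

```javascript
// Number of non-zero significand values, computed with BigInt to stay exact.
const nanCount = 2n ** 52n - 1n;
console.log(nanCount); // 4503599627370495n

const EXPONENT_ALL_ONES = 0x7ffn << 52n;

// Reinterpret a 64-bit pattern as a double.
function bitsToDouble(bits) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

// Arbitrary non-zero significands: every one of them is a NaN.
const payloads = [1n, 42n, 0xdeadbeefn, nanCount];
const allNaN = payloads.every((payload) =>
  Number.isNaN(bitsToDouble(EXPONENT_ALL_ONES | payload))
);
console.log(allNaN); // true
```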

Maybe we can use these unused values to our advantage?

Let’s go back to JavaScript.

JavaScript’s data types

JS has a number of data types. Numeric types, booleans, undefined and so on.

Since JavaScript is a dynamic language, it performs type conversions at runtime, i.e. it converts values between data types as needed.
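A few everyday conversions, as a quick reminder of what that looks like in practice (the variable names are just for illustration):

```javascript
// The string "5" is converted to a number before multiplying.
const product = "5" * 2;
console.log(product, typeof product); // 10 number

// With +, the number is converted to a string and concatenated instead.
const joined = 1 + "1";
console.log(joined, typeof joined);   // 11 string

// Booleans are converted to 0 or 1 in arithmetic.
const bumped = true + 1;
console.log(bumped);                  // 2
```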

To make that possible the engine needs to know what data type each value has at all times.

How does it know that?

NaN boxing

Enter ‘NaN boxing’!

Instead of inventing a new format to remember and encode the type and value… some JavaScript engines use the fact that this large range of NaN values exists!

The idea is to shoehorn everything into those ‘unused’ NaNs in floats.

This method of encoding values and their type inside some other representation is called ‘boxing’ 📦 (🥊)

It relies on one important fact: all the other data types in JavaScript need fewer than 52 bits in practice.

Let’s look at the size in bits of data types in JS:

So we can see that the largest non-float type is 48 bits wide & we know the ‘double’ significand is 52 bits wide…

So the significand can contain the values of all other data types, and have bits left over to indicate the type. 🎉

The exact way the data type tag or boxed value is represented inside the significand of the NaN is different in each implementation. I have linked to a few examples at the end of the article.

And remember, with the ‘exponent’ of the double at its max value, all these boxed values are valid ‘NaN’s!

In fact this NaN boxing technique is used in other dynamically typed languages too.
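Here is a minimal sketch of the boxing step. Everything specific in it is an assumption made for illustration: the 4-bit tag at the top of the significand, the tag values, and the name box. Real engines each pick their own layout, and they do this on raw 64-bit words in native code, which is why the sketch works on BigInt bit patterns.

```javascript
const NAN_BITS = 0x7ffn << 52n;         // exponent at its max value
const TAG_SHIFT = 48n;                  // hypothetical: tag lives in the top 4 significand bits
const PAYLOAD_MASK = (1n << 48n) - 1n;  // the remaining 48 bits hold the value

// Hypothetical tags. 0b1000 is left alone so the ordinary NaN keeps its usual pattern.
const TAG = { BOOLEAN: 0b1001n, NULL: 0b1010n, UNDEFINED: 0b1011n };

// Combine the NaN exponent, the type tag and the value into one 64-bit pattern.
function box(tag, payload) {
  return NAN_BITS | (tag << TAG_SHIFT) | (BigInt(payload) & PAYLOAD_MASK);
}

const boxedTrue = box(TAG.BOOLEAN, 1);
const boxedUndefined = box(TAG.UNDEFINED, 0);

console.log(boxedTrue.toString(16));      // 7ff9000000000001
console.log(boxedUndefined.toString(16)); // 7ffb000000000000
```

Because every tag is non-zero, the significand can never be zero, so none of these patterns can be mistaken for Infinity.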

Unboxing values from NaNs

Unboxing is the process of reversing the ‘NaN boxing’…

I.e. everything is a double, but we ‘unbox’ the actual type and value when we encounter a NaN.

The rules for decoding a JavaScript value boxed in a float are:

  • Any non-NaN floating point value is just that, a floating point value
  • If the value is a ‘NaN’, then we look at the top bits of the significand to find what type it is
  • Sitting neatly inside the remaining bits is the value of that type. We can extract the value by masking off the bits needed for the given type.
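The three rules above can be sketched as follows. As before, the layout is an assumption for illustration only (a 4-bit tag in the top of the significand, with made-up tag values), and unbox and bitsToDouble are names of my own.

```javascript
const EXPONENT_MASK = 0x7ffn << 52n;
const SIGNIFICAND_MASK = (1n << 52n) - 1n;
const PAYLOAD_MASK = (1n << 48n) - 1n;

// Hypothetical tag values; each real implementation chooses its own.
const TAG_NAMES = new Map([[0b1001n, "boolean"], [0b1100n, "uint32"]]);

// Reinterpret a 64-bit pattern as a double.
function bitsToDouble(bits) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

function unbox(bits) {
  // Rule 1: anything that is not a NaN is simply a double.
  const isNaNPattern =
    (bits & EXPONENT_MASK) === EXPONENT_MASK && (bits & SIGNIFICAND_MASK) !== 0n;
  if (!isNaNPattern) return { type: "double", value: bitsToDouble(bits) };

  // Rule 2: the top bits of the significand say which type is boxed.
  const type = TAG_NAMES.get((bits >> 48n) & 0xfn);
  if (type === undefined) return { type: "double", value: NaN }; // a genuine NaN

  // Rule 3: mask out the bits that hold the value for that type.
  const payload = bits & PAYLOAD_MASK;
  if (type === "boolean") return { type, value: payload === 1n };
  return { type, value: Number(payload & 0xffffffffn) };
}

const plainDouble = unbox(0x3ff8000000000000n);
const boxedBoolean = unbox(0x7ff9000000000001n);
const genuineNaN = unbox(0x7ff8000000000000n);

console.log(plainDouble.type, plainDouble.value);   // double 1.5
console.log(boxedBoolean.type, boxedBoolean.value); // boolean true
console.log(genuineNaN.type, genuineNaN.value);     // double NaN
```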

Why use NaN boxing?

Firstly, the representation of types and values with NaN boxing is compact & predictable in size. Every value, together with its type tag, fits into 64 bits. This aids performance and caching.

64-bit floats (and therefore integers up to MAX_SAFE_INTEGER) are represented directly, so there is no performance penalty associated with encoding or decoding them.
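For reference, this is where that integer limit sits. A quick sketch (variable names are just for illustration):

```javascript
// The largest integer a double can hold without gaps between neighbours.
const limit = Number.MAX_SAFE_INTEGER;
console.log(limit);               // 9007199254740991
console.log(limit === 2 ** 53 - 1); // true

// Past the limit, neighbouring integers start to collapse into one value.
const collapsed = 2 ** 53 === 2 ** 53 + 1;
console.log(collapsed);           // true
```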

However, note that other techniques for representing value and type in dynamic languages do exist and offer their own set of pros and cons.

An example of NaN boxing

A primitive type is a data type that is supported directly by the programming language itself, and which is not composed of any other data types.

In JavaScript these include numbers, strings, booleans, null and undefined (plus the newer bigint and symbol). In other languages like C++ it means int, long, char, bool, float, double.

Let’s take an example of a primitive data type: an unsigned integer, which as mentioned is used internally for bitwise operations.

The following diagram shows how we might represent it in our JavaScript interpreter using NaN boxing:


An unsigned integer boxed inside a NaN floating point value.

Imagine we choose to implement an interpreter for JavaScript, and decide the type tag for unsigned integers is ‘1100’ (in binary). The top four bits of our significand are set to that.

We are encoding a 32-bit number, but we have 48 bits remaining.

So that means there are 16 bits unused, and we keep only the lower 32 bits (masking off everything above them) to get the actual value.

Remember this is still a valid NaN value in IEEE 754 floating point terms.
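Here is that example worked through in code, as a sketch rather than how any particular engine does it. The value 0xcafebabe is an arbitrary 32-bit number picked for illustration, and the bit patterns are handled as BigInt so nothing gets rounded.

```javascript
const NAN_BITS = 0x7ffn << 52n; // exponent at its max value
const UINT32_TAG = 0b1100n;     // the type tag chosen in the example above
const value = 0xcafebabe;       // arbitrary unsigned 32-bit integer

// Box: NaN exponent, then the 4-bit tag, then 16 unused bits, then the 32-bit value.
const boxed = NAN_BITS | (UINT32_TAG << 48n) | BigInt(value);
console.log(boxed.toString(16)); // 7ffc0000cafebabe

// Unbox: read the tag, then keep only the lower 32 bits.
const tag = (boxed >> 48n) & 0xfn;
const unusedBits = (boxed >> 32n) & 0xffffn;
const unboxed = Number(boxed & 0xffffffffn);
console.log(tag === UINT32_TAG, unusedBits, unboxed === value); // true 0n true

// Reinterpreted as a double, the boxed pattern is still a NaN.
const view = new DataView(new ArrayBuffer(8));
view.setBigUint64(0, boxed);
const stillNaN = Number.isNaN(view.getFloat64(0));
console.log(stillNaN); // true
```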

Summary & further reading

So NaN boxing is a clever way to get data type info & values into a standard type without affecting its precision or range!

To summarise:

  • NaNs are special floating point values
  • dynamic languages need to keep data type information along with a value at runtime
  • ‘NaN boxing’ lets the interpreters of these languages encode any data type by embedding information inside unused floating point ‘NaN’s

Nice right?!

Want to learn more? 🧠

Have a look at Annie Cherkaev, Brion Vibber & Piotr Duperas’s awesome articles about this. 💯 recommended. Also, Robert Nystrom’s Crafting Interpreters covers NaN boxing.

Also, here is an article on how the programming language Charly uses this technique, and it is used in the Sink language too. You also have an example C implementation here.

Computer number systems: a complete guide for programmers

Found this article on NaN boxing interesting?

Want to feel more comfortable around numbers in computers? Want to ace any interview questions that touch numeric types? Want to avoid being the developer that introduces a bug which calculates prices off by one cent (“Total price: $1000.01” … “😱 where did 0.01 come from??”)? Are number systems and floating point a mystery to you?

If so, sign up to my newsletter as I will be releasing more articles like this one! I’m hoping to put together an e-book on the topic too: ‘Computer number systems: a complete guide for programmers’.
