Chris Doran, co-author of Geometric Algebra for Physicists and founder/director of Geomerics, has written an **excellent** blog post on the mathematics behind quaternion compression. His approach provides significantly better compression than the delta smallest three approach described in my article.

**Link**: Quaternions, Rotations and Compression

Download as PDF

Hey everybody, my GDC 2015 talk is now free to view in the GDC vault!

Unfortunately, the talk videos in the slides seem to have been recorded at 20fps.

You can get the original slides (with HD 60fps videos) here. I might see if I can take the audio track from the GDC vault and splice it in with the original HD videos in the Keynote.

There is also an article series covering this same material in more detail here:

http://gafferongames.com/networked-physics/introduction-to-networked-physics/

cheers

Hey everybody, I have great news! My GDC 2013 talk “Virtual Go” is now free to watch in the GDC Vault.

Many thanks to Meggan Scavio for making this talk free for everybody to watch!

If you would like more detail on my project to simulate a go board and stones, there is an article series here, and some source code for this project available here on github.

I hope to return to this project one day so let me know if you are still interested in it. With the latest developments in Virtual Reality these days I think this project is as relevant as ever. Still massively niche of course, but how cool would it be to be able to play go with someone on the other side of the planet on a physically simulated go board inside Virtual Reality?

Some folks out there think they can do better than the compression described in the article. Now I don’t doubt for one second that better compression approaches exist, but exactly how much better is your encoding than the one described in the article? Don’t just tell me your encoding is better. PROVE IT.

It’s time for a good old-fashioned walk-off.

Here are the terms of the competition. Old school rules.

If you can beat my best encoding over this dataset by 10% I’ll link to your blog post describing your compression technique from within the article. Your blog post must fully describe your encoding and include source code in the public domain that proves the result. Compression must be lossless.

The data set to be compressed is position and orientation data (smallest 3) in integer format post-quantize with delta encoding baseline. Your job, if you choose to accept it, is to encode each frame snapshot into packets using the least number of bits.

When reporting your results please include the following: number of packets encoded, average packet size in bytes, average kbps across the entire data set. To calculate this add up the number of bytes for all of your packets (if you have encoded bits not bytes, please round up to the next byte per-packet as you can’t send fractional bytes over UDP). Divide the total bytes by the number of packets to get the average packet size in bytes. To calculate bytes per-second multiply average packet size by 60 (60 snapshots per-second). To convert bytes/sec to kbps multiply bytes per-second by 8 and divide by 1000 (*not* 1024!). My best result so far is below 256kbps.

**IMPORTANT**: You __must__ encode and decode each frame individually into packets (e.g. 901 cubes at a time). You may not encode the entire data set at once. You cannot assume that all packets will be received by the receiver. You cannot assume that all packets will be received in-order. You may not rely on information inferred from previously encoded/decoded packets when decoding a packet unless you embed that information in the packet or can infer it even in the presence of packet loss and out of order packets. You may not write any decompression scheme that relies on putting packets in a queue and processing them in-order, as this delays realtime delivery of snapshots and defeats the purpose. Compression and decompression must be plausibly realtime. Static dictionaries trained on the data set are acceptable. You may not claim a win based on a dictionary trained on the same data set that is being compressed. Final results will be judged against an unreleased data set. If necessary, separate datasets for training and encoding will be provided for the final judging.

Data format is fixed records. 901 cubes per-frame. Frames 0..5 are duplicates of initial state. Start your encoding at frame 6 and encode frame n relative to frame n-6.

Each cube struct is as follows:

```cpp
struct DeltaData
{
    int orientation_largest;
    int orientation_a;
    int orientation_b;
    int orientation_c;
    int position_x;
    int position_y;
    int position_z;
    int interacting;
};
```

**Best of luck!**

Thanks to everybody who attended my talk today. I had a great time presenting for you.

The final slides for my talk are available here (850MB Keynote, HD video)

I have also written an article series on this subject: Networked Physics


In the previous article we sent snapshots of the entire simulation state 10 times per-second and interpolated between them to reconstruct the simulation. The problem is that to handle packet loss and jitter you have to add 350ms of delay on top of network latency. Unfortunately this is too much delay for most interactive games.

The solution is to increase the packet send rate so we get the same protection against dropped packets with less delay. For example, at 60 packets per-second, protection against two dropped packets in a row and jitter can be obtained with just 85ms of delay.

Now we have a new problem though. At 60 packets per-second bandwidth is extremely high!

So what we’re going to do in this article is work through every bandwidth optimization necessary until we get the bandwidth under control. Our target bandwidth is **256 kilobits per-second**.

Let’s look at the snapshot state being sent. Each cube in the snapshot has the following properties:

- quat orientation: **128 bits**
- vec3 linear_velocity: **96 bits**
- vec3 position: **96 bits**
- bool interacting: **1 bit**

That’s a total of 321 bits per-cube (roughly 40 bytes).

Orientation is a quaternion value (4 floats) and is the largest field, so let’s start there.

When compressing a quaternion, many people think: OK, it’s a normalized quaternion, so each component is in [-1,+1], so I’ll just pack it into 8.8.8.8 with one 8 bit signed integer per-component. Sure, this works. But with a bit of math you can get much better accuracy with fewer bits using a trick called the “smallest three”.

Since we know that the length of a quaternion that represents a rotation must be 1 (unit length), x^2+y^2+z^2+w^2 = 1. We can use this identity to drop one component and not send it, reconstructing that component on the other side. For example, send xyz and reconstruct w = sqrt(1 - x^2 - y^2 - z^2). You might think you need to send a sign bit for w in case it is negative, but in fact you don’t, because you can always make w positive by negating the entire quaternion. (In quaternion space (x,y,z,w) and (-x,-y,-z,-w) represent the same rotation.)

You don’t want to always drop the same component, due to numerical precision issues. Instead, find the component with the largest absolute value, encode its index using two bits [0,3] (0=x, 1=y, 2=z, 3=w), and send that index followed by the smallest three components, omitting the largest one.

Also, there’s a bound on the smallest three components: the largest possible value for any of them occurs when two components have the same absolute value and the other two components are zero. The length of that quaternion (v,v,0,0) is 1, therefore v^2 + v^2 = 1, 2v^2 = 1, v = 1/sqrt(2). This means you get to encode the smallest three components in [-0.707107,+0.707107] instead of [-1,+1], giving you more precision with the same number of bits.

With this technique I’ve found that minimum sufficient precision for my simulation is 9 bits per-smallest component. This gives a result of 2 + 9 + 9 + 9 = 29 bits per-orientation (originally 128!).
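A rough sketch of the smallest three encode/decode described above, packing 2 + 9 + 9 + 9 = 29 bits into a single word. Names and the exact quantization rounding are illustrative, not the article’s actual source:

```cpp
#include <cmath>
#include <cstdint>

const float kMaxComponent = 0.70710678f;    // 1/sqrt(2)

uint32_t quantize_component( float f )      // [-1/sqrt(2),+1/sqrt(2)] -> [0,511]
{
    const float normalized = ( f + kMaxComponent ) / ( 2.0f * kMaxComponent );
    return (uint32_t) ( normalized * 511.0f + 0.5f );
}

float dequantize_component( uint32_t q )
{
    return ( q / 511.0f ) * 2.0f * kMaxComponent - kMaxComponent;
}

uint32_t encode_smallest_three( const float q[4] )
{
    int largest = 0;
    for ( int i = 1; i < 4; ++i )
        if ( std::fabs( q[i] ) > std::fabs( q[largest] ) )
            largest = i;
    // negate the whole quaternion if needed so the omitted component is positive
    const float sign = ( q[largest] < 0.0f ) ? -1.0f : 1.0f;
    uint32_t bits = (uint32_t) largest;     // 2 bits for the largest component index
    for ( int i = 0; i < 4; ++i )
        if ( i != largest )
            bits = ( bits << 9 ) | quantize_component( q[i] * sign );
    return bits;                            // index at bits 27-28, components below
}

void decode_smallest_three( uint32_t bits, float q[4] )
{
    const int largest = ( bits >> 27 ) & 0x3;
    float sum = 0.0f;
    int shift = 18;
    for ( int i = 0; i < 4; ++i )
    {
        if ( i == largest )
            continue;
        q[i] = dequantize_component( ( bits >> shift ) & 0x1FF );
        sum += q[i] * q[i];
        shift -= 9;
    }
    q[largest] = std::sqrt( 1.0f - sum );   // always non-negative by convention
}
```

Note the decoded quaternion may be the negation of the input; both represent the same rotation.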

What should we optimize next? It’s a tie between linear velocity and position (96 bits).

In my experience position is the harder quantity to compress, so let’s start with linear velocity.

To compress linear velocity we first need to determine a maximum linear speed that is reasonable and doesn’t visually affect the simulation. This allows us to bound the linear velocity in some range per-component so we don’t need to send a full float. I found that a maximum speed of 32 meters per-second is a nice power of two and doesn’t negatively affect the player experience in the cube simulation. Since we are really only using the linear velocity as a __hint__ to improve interpolation between position sample points we can be pretty rough with compression. I found that 32 distinct values per meter per-second provides acceptable precision.

So what we have now is linear velocity as three components, each represented by an integer in the range [-1024,+1023]. I hate messing around with sign bits, so I just add 1024 to get into the range [0,2047] and send that, subtracting 1024 on receive to get back to the signed integer range before converting to float.

At 11 bits per-component that gives 33 bits total per linear velocity. Just over 1/3 of the original uncompressed size!
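An illustrative sketch of this quantization (function names are mine): bound to ±32 m/s at 32 steps per meter per-second, then bias by 1024 so no sign bit is needed:

```cpp
#include <cmath>
#include <cstdint>

uint32_t compress_velocity_component( float v )
{
    int q = (int) std::floor( v * 32.0f + 0.5f );   // quantize at 1/32 m/s resolution
    if ( q < -1024 ) q = -1024;                     // clamp to representable range
    if ( q > +1023 ) q = +1023;
    return (uint32_t) ( q + 1024 );                 // [0,2047] fits in 11 bits
}

float decompress_velocity_component( uint32_t bits )
{
    return ( (int) bits - 1024 ) / 32.0f;
}
```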

We can do even better than this, because most cubes are stationary.

To take advantage of this we write a single “at rest” bit. If this bit is 1, the velocity is known to be zero and is not sent. Otherwise, the compressed velocity follows after the bit (33 bits). Cubes at rest now cost just 127 bits, while moving cubes cost one bit more than they previously did: 159 + 1 = 160 bits.

But why are we sending linear velocity at all? In the previous article we decided to send it because it significantly improved the quality of interpolation at 10 packets per-second. But now that we’re sending 60 packets per-second, is it still necessary? The answer is __no__. Linear interpolation is good enough at high send rates.

Now we have only position to compress. We’ll use the same trick that we used for linear velocity: bound and quantize. Most game worlds are reasonably big so I chose a position bound of [-256,255] meters in the horizontal plane (xy) and since in my cube simulation the floor is at z=0, I chose for z a range of [0,32] meters.

Now we need to work out how much precision is required. With some experimentation I found that 512 values per-meter (roughly 0.5mm precision) provides sufficient precision. This gives position x and y components in [-131072,+131071] and z components in range [0,16383]. That’s 18 bits for x, 18 bits for y and 14 bits for z giving a total of 50 bits per-position (originally 96).
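A sketch of this position quantization, biasing xy by +256m into unsigned 18 bit ranges and keeping z in 14 bits, all at 512 steps per-meter. The struct and names are hypothetical:

```cpp
#include <cstdint>

struct QuantizedPosition
{
    uint32_t x;     // [0,262143] (18 bits)
    uint32_t y;     // [0,262143] (18 bits)
    uint32_t z;     // [0,16383]  (14 bits)
};

QuantizedPosition quantize_position( float x, float y, float z )
{
    QuantizedPosition q;
    q.x = (uint32_t) ( ( x + 256.0f ) * 512.0f );
    q.y = (uint32_t) ( ( y + 256.0f ) * 512.0f );
    q.z = (uint32_t) ( z * 512.0f );
    return q;
}

void dequantize_position( const QuantizedPosition & q, float & x, float & y, float & z )
{
    x = q.x / 512.0f - 256.0f;
    y = q.y / 512.0f - 256.0f;
    z = q.z / 512.0f;
}
```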

This reduces our cube state to 80 bits, or just 10 bytes per-cube (4X improvement. Originally 40 bytes per-cube).

Now that we’ve compressed position and orientation we’ve run out of simple wins from reducing the precision of the values we send. Any further reduction in precision results in unacceptable artifacts. Can we optimize further? The answer is yes, but only if we embrace a completely new technique: **delta compression**.

Delta compression sounds mysterious. Magical. Hard. Actually, it’s not hard at all. Here is how it works: the left side starts sending packets to the right that look like: hey, this is snapshot 110 encoded relative to snapshot 100. The snapshot being encoded relative to is called the baseline. How you do this encoding is up to you, there are many fancy tricks, but the basic, big order of magnitude win comes when you say, hey this cube hasn’t changed from the value in the baseline. I’m going to encode it with just one bit. Not changed!

To implement delta encoding it is of course essential that the sender only encodes snapshots relative to baselines that it knows the receiver has definitely received (because, packet loss). Therefore, the receiver continually sends “ack” packets to the sender saying: hey, the most recent snapshot I have received is n (ack). It doesn’t need to send back any more detailed information than this, because the sender would never want to encode a packet relative to an older baseline than the most recent one received.

There is one slight wrinkle: for the initial RTT time past connection the sender doesn’t have any baseline to encode against because it hasn’t received an ack from the receiver yet. In my case I handle this by adding a single flag to the packet that says: “this snapshot is encoded relative to the initial state of the simulation” (which is known on both sides). This avoids having to send non-relative delta snapshots ever, making the code simpler, and avoids an initial spike in bandwidth.
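A minimal sketch of the sender-side logic described above. Names are hypothetical, and the sequence comparison ignores 16 bit wrap-around for brevity (a real implementation must handle it):

```cpp
#include <cstdint>

struct SenderDeltaState
{
    bool received_ack = false;
    uint16_t most_recent_ack = 0;
};

void process_ack( SenderDeltaState & state, uint16_t ack_sequence )
{
    // only the most recent ack matters: the sender never wants to encode
    // relative to an older baseline than the latest one received
    if ( !state.received_ack || ack_sequence > state.most_recent_ack )
    {
        state.most_recent_ack = ack_sequence;
        state.received_ack = true;
    }
}

// Returns true if the snapshot must be encoded relative to the initial
// simulation state (known on both sides); otherwise writes the baseline.
bool encode_relative_to_initial_state( const SenderDeltaState & state, uint16_t & baseline_sequence )
{
    if ( !state.received_ack )
        return true;
    baseline_sequence = state.most_recent_ack;
    return false;
}
```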

We can refine this approach and lock in more gains but we’re not going to get another order of magnitude improvement past this point. We’re going to have to work pretty hard to get a number of small, cumulative gains to reach our goal of 256 kilobits per-second.

First small improvement. Each cube that isn’t sent costs 1 bit (not changed). There are 901 cubes, so we send 901 bits in each packet even if no cubes have changed. At 60 packets per-second this adds up to 54kbps of bandwidth. Seeing as there are usually significantly fewer than 901 changed cubes per-snapshot, we can reduce bandwidth by sending only the changed cubes, adding a 10 bit index [0,900] per-cube to identify which cube it is.

There is a cross-over point where it is actually more expensive to send indices than not-changed bits. With 10 bit indices, the cost of indexing is 10*n bits. Therefore it’s more efficient to use indices if we are sending 90 cubes or fewer (900 bits). We can evaluate this per-snapshot and send a single bit in the header indicating which encoding we are using: 0 = indexing, 1 = changed bits. This way we can use the most efficient encoding for the number of changed cubes in the snapshot.
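The per-snapshot choice can be sketched as follows (constants from the article, function name is mine):

```cpp
// Explicit 10 bit indices beat one changed bit per cube
// when 90 or fewer cubes have changed.
const int NumCubes = 901;
const int IndexBits = 10;

bool use_indices( int num_changed_cubes )
{
    return num_changed_cubes * IndexBits < NumCubes;
}
```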

This reduces the steady state bandwidth when all objects are stationary to around 15 kilobits per-second. This bandwidth is composed entirely of our own packet header (uint16 sequence, uint16 base, bool initial) plus IP and UDP headers (28 bytes).

Next small gain. What if we encoded each cube index relative to the previous one? Since we iterate across the changed cubes and send their indices in-order (cube 0, cube 10, cube 11, 50, 52, 55 and so on), we can easily encode the second and subsequent cube indices relative to the previous changed index, e.g.: +10, +1, +39, +2, +3. If we are smart about how we encode this index offset we should be able to, on average, represent a cube index with fewer than 10 bits.

The best encoding depends on the set of objects you interact with. If you spend a lot of time moving horizontally while blowing cubes out of the initial cube grid then you hit lots of +1s. If you move vertically from the initial state you hit lots of +30s (the grid is 30×30, sqrt(900) = 30). What we need then is a general purpose encoding capable of representing statistically common index offsets with fewer bits.

After a small amount of experimentation here’s the encoding I came up with:

- [1,8] => 1 + 3 (4 bits)
- [9,40] => 1 + 1 + 5 (7 bits)
- [41,900] => 1 + 1 + 10 (12 bits)

Notice how large relative offsets are actually more expensive than 10 bits. It’s a statistical game. The bet is that I’ll get enough small offsets to make up for the increased cost of the large ones. It works. With this encoding I was able to get an average of 5.5 bits per relative index.
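One way to sketch this variable-length encoding is below. For brevity the decoder is told how many bits were used; a real decoder would read the one or two prefix bits from the bit stream instead. The function names are mine:

```cpp
#include <cstdint>

int relative_index_bits( int offset )
{
    if ( offset <= 8 )  return 4;       // prefix 1  + 3 bits, offsets [1,8]
    if ( offset <= 40 ) return 7;       // prefix 01 + 5 bits, offsets [9,40]
    return 12;                          // prefix 00 + 10 bits, offsets [41,900]
}

uint32_t encode_relative_index( int offset, int & num_bits )
{
    num_bits = relative_index_bits( offset );
    if ( num_bits == 4 )  return ( 1u << 3 ) | (uint32_t) ( offset - 1 );
    if ( num_bits == 7 )  return ( 1u << 5 ) | (uint32_t) ( offset - 9 );
    return (uint32_t) ( offset - 41 );
}

int decode_relative_index( uint32_t bits, int num_bits )
{
    if ( num_bits == 4 )  return (int) ( bits & 0x7 ) + 1;
    if ( num_bits == 7 )  return (int) ( bits & 0x1F ) + 9;
    return (int) ( bits & 0x3FF ) + 41;
}
```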

Now we have a slight problem. We can no longer easily determine whether changed bits or relative indices is the cheaper encoding. The solution I used is to run a mock encoding of all changed cubes on packet write and count the number of bits required for relative indices. If that number is larger than 901, fall back to changed bits.

Next let’s try encoding position relative to (offset from) the baseline position. There are a lot of different options here. You can do the obvious thing, e.g. 1 bit relative position, then say 8-10 bits per-component if all component deltas fit the range provided by those bits, otherwise send the absolute position (50 bits).

This gives a decent encoding but we can do better. If you think about it, there will be situations where one position component delta is large while the others are small. It would be nice to take advantage of this and send the small components using fewer bits.

It’s a statistical game, and the best selection of small and large ranges per-component depends on the data set. I couldn’t really tell from a noisy bandwidth meter whether I was making any gains, so I captured the position vs. base position data and wrote it to a text file for analysis. The format is x,y,z,base_x,base_y,base_z with one cube per-line. The goal is to encode x,y,z relative to base x,y,z for each line. If you are interested, you can download this data set here.

I wrote a short Ruby script to find the best encoding with a greedy search. The optimal encoding I found works like this: 1 bit “small” per delta component, followed by 5 bits if small ([-16,+15] range); otherwise the delta component is in the [-256,+255] range and is sent with 9 bits. If any component delta is outside the large range, fall back to absolute position. Using this encoding I was able to obtain on average 26.1 bits for each changed position.
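The bit cost of this scheme can be sketched as follows (note the [-16,+15] small range needs 5 bits per component; the function names are mine):

```cpp
// Per-component cost for delta-encoded positions: 1 "small" bit, then
// 5 bits for [-16,+15] or 9 bits for [-256,+255]. delta_position_bits
// returns -1 when the encoder must fall back to a 50 bit absolute position.

int delta_component_bits( int d )
{
    if ( d >= -16 && d <= 15 ) return 1 + 5;
    return 1 + 9;
}

int delta_position_bits( int dx, int dy, int dz )
{
    const int deltas[3] = { dx, dy, dz };
    int bits = 0;
    for ( int i = 0; i < 3; ++i )
    {
        if ( deltas[i] < -256 || deltas[i] > 255 )
            return -1;                      // absolute position fallback
        bits += delta_component_bits( deltas[i] );
    }
    return bits;
}
```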

Next I figured that relative orientation would be a similar easy big win. Problem is that unlike position where the range of the position offset is quite small relative to the total position space, the change in orientation in 100ms is a much larger percentage of total quaternion space.

I tried a bunch of stuff without good results. I tried encoding the 4D vector of the delta orientation directly and recomposing the largest component post delta using the same trick as smallest 3. I tried calculating the relative quaternion between orientation and base orientation, and since I knew that w would be large for this (rotation relative to identity) I could avoid sending 2 bits to identify the largest component, but in turn would need to send one bit for the sign of w because I don’t want to negate the quaternion. The best compression I could find using this scheme was only 90% of the smallest three. Not very good.

I was about to give up, but then I ran some analysis over the smallest three representation. I found that 90% of orientations in the smallest three format had the same largest component index as their base orientation 100ms ago. This meant it could be profitable to delta encode the smallest three format directly. What’s more, I found there would be no additional precision loss with this method when reconstructing the orientation from its base. I exported the quaternion values from a typical run as a data set in smallest three format (available here) and got to work trying the same multi-level small/large range per-component greedy search that I used for position.

The best encoding found was: 5-8, meaning [-16,+15] small and [-128,+127] large. One final thing: as with position the large range can be extended a bit further by knowing that if the component value is not small the value cannot be in the [-16,+15] range. I leave the calculation of how to do this as an exercise for the reader. Be careful not to collapse two values onto zero.

The end result is an average of 23.3 bits per-relative quaternion. That’s 80.3% of the absolute smallest three.

It’s quite likely that a better encoding for relative quaternions exists. The Geomerics guys tell me they have some cool ideas for compression using rotors/bivectors. It’s entirely possible a better encoding exists even with something as simple as a delta on the axis/angle representation or on the relative quaternion itself.

For this reason I am providing the uncompressed quaternion data set here. This text file contains uncompressed orientation quaternions in the format: x,y,z,w,base_x,base_y,base_z,base_w. The goal of this exercise is to encode x,y,z,w on each line relative to base on the same line in the least number of bits. You may not quantize the orientations significantly more than 2-9-9-9 smallest three encoding. If you think you have a better relative quaternion encoding within this precision please try it on this data set and report your results in the comments section. If you have a technique that beats 23.3 bits per-orientation on average I am very interested in hearing from you!

That’s just about it but there is one small win left.

Doing one final analysis pass over the position and orientation data sets, I noticed that 5% of positions are unchanged from the base position after being quantized to 0.5mm resolution, and 5% of orientations in smallest three format are also unchanged from base.

These two cases are mutually exclusive, because if both position and orientation were unchanged the cube itself would be unchanged and therefore not sent. This means a small statistical win exists for 10% of cube state if we send one bit for position changed and one bit for orientation changed. Yes, 90% of cubes have 2 bits of overhead added, but the 10% of cubes that save 20+ bits by sending 2 bits instead of a 23.3 bit orientation or a 26.1 bit position make up for that, providing an overall win of roughly 2 bits per-cube.

And that’s as far as I can take it using traditional hand-rolled bitpacking techniques. Think you have a better encoding for the delta compressed data? Why not enter The Networked Physics Data Compression Walk-Off and PROVE your encoding is the best. If you can beat the encoding described in this article by 10% I’ll link to your blog post and source code from within this article, right here.

I hope this article showed you that there are many options for bandwidth optimization, and that with a bit of work the seemingly impossible is in fact possible. We just took 20mbit down to 0.25mbit on average. That’s a reduction to just 1.25% of the original uncompressed bandwidth. Consider: most consumer internet connections can easily sustain 256kbit/sec (with a lot of headroom) right now in 2015. Over the next few years these same internet connections will easily sustain 1, 2, 5, 10, 20 mbit download…

How many physics objects do you think you’ll be able to include in your delta compressed snapshot then?

Up next: State Synchronization

If you are enjoying the networked physics article series, please consider supporting my work with a small patreon donation.

If you have enjoyed the articles on this site over the past ten years, please consider supporting gafferongames.com with a small patreon donation. It takes only a small amount of money to host this website, but a whole truckload of effort to research and write the articles that make it worth reading. Many of these articles are only possible because I spend large portions of my spare time, holidays, and time between jobs researching and developing the techniques I write about here.

My aim on this site is to always write clearly and explain game networking concepts that you won’t read about anywhere else. If you have read any of my game networking articles over the past 10 years I’m sure you’ll agree. Many people have learned game network programming from the articles posted on this website, and this is something I’m very proud of.

Lately I’ve started including videos with my articles to explain concepts. It costs money to host these videos at high quality, and a great deal of work and effort to research and develop the game networking techniques I write about on this website. You can trust that I *only* ever write about things I have implemented, and this implementation takes a LOT of time. This is why my articles are so full of concrete examples rather than theoretical explanation, but it’s also why it’s so much work to write articles for this website!

If you have enjoyed the articles on this website, please consider showing your support for my work by making a small patreon donation.

**Networked Physics article series resumes once my hosting costs for 2015 are covered!**

*Wishing all my readers a happy new year and a great 2015. I hope to see you with more articles in the new year!*

In this article we’ll spend some time exploring the physics simulation we’re going to network.

First we need an object controlled by the player. Here I’ve setup a simple simulation of a cube in the open source physics engine ODE. The player can use the arrow keys to make the cube move around by applying force at its center of mass. The physics simulation takes this linear motion and calculates friction as the cube collides with the ground, inducing a rolling and tumbling motion.

This tumbling is quite intentional and is why I chose a cube for this simulation instead of a sphere. I want this complex, unpredictable motion because rigid bodies in general move in interesting non-linear ways according to their shape! It’s not possible to accurately predict the motion of this tumbling cube using a simple linear extrapolation or the ballistic equations of motion. If you want to know where a physics object is at a future time, you have to run the whole physics simulation with collision detection, collision response and friction in order to find out.

Moving forward. Networking a physics simulation is just __too easy__ if there is only one object interacting with a static world. It starts to get interesting when the player controls an object that interacts with other physically simulated objects, especially if those objects push back and affect the motion of the player.

Here I’ve added a grid of 900 small cubes to the simulation so the player has something to interact with. When the player interacts with a cube it turns red. When a non-player cube comes to rest it returns to grey (non-interacting). Interactions aren’t just direct – so if a cube is red, it turns other cubes it interacts with red as well. This way player interactions fan-out recursively covering all affected objects.

It’s cool to roll around and push cubes, but I really wanted a way to interact with lots of objects and push them around. What I came up with is this:

To implement this I raycast below the cube to find the intersection point with the ground below the center of mass of the player cube, then apply a spring force (Hooke’s law) to the player cube such that it floats in the air at a certain height above this point. Then all non-player cubes within a certain distance of that intersection point have a force applied, proportional to their distance from the point and directed away from it, so they are pushed away from the cube like it’s a leaf blower.
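A hypothetical sketch of these two forces. The names and constants are illustrative, not the article’s actual source, and a real hover spring would also include a damping term:

```cpp
#include <cmath>

float hover_spring_force( float height_above_ground, float target_height, float stiffness )
{
    // Hooke's law: force proportional to displacement from the rest height
    return stiffness * ( target_height - height_above_ground );
}

// Push applied to a non-player cube at offset (dx,dy) from the ground
// intersection point: proportional to distance, directed away from it,
// and only applied within the given radius.
void blower_force( float dx, float dy, float radius, float strength, float & fx, float & fy )
{
    const float distance = std::sqrt( dx * dx + dy * dy );
    if ( distance == 0.0f || distance >= radius )
    {
        fx = fy = 0.0f;
        return;
    }
    const float magnitude = strength * ( distance / radius );
    fx = magnitude * dx / distance;
    fy = magnitude * dy / distance;
}
```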

I also wanted a very complex coupled motion between the player and non-player cubes, where the player interacts with lots of other physics objects so closely that the player and the objects it’s interacting with become a single system: a group of rigid bodies joined together by constraints. To implement this I thought it would be cool if the player could roll around and create a ball of cubes, like in one of my favorite games, Katamari Damacy.

To implement this effect cubes within a certain distance of the player have a force applied towards the center of the cube. Note that these cubes remain physically simulated while in the katamari ball, they are not just “stuck” to the player like in the original game. This means that the cubes in the katamari are continually pushing and interacting with each other and the player cube via collision constraints, a very difficult situation for networked physics.

That’s it for the exploration of the physics simulation. Let’s get busy networking it!

Up first: Deterministic Lockstep

If you enjoyed this article please consider making a small donation. __Donations encourage me to write more articles!__