The Ultimate Immersive Experience

Do you remember your first experience with any VR headset?

Let me guess: apart from the inevitable nausea, it must have brought back memories of the 480p videos recorded by the Nokia 3130. Obviously I'm exaggerating, but most user-generated VR content doesn't look a whole lot better today. Needless to say, we're on a journey to change this.

In this article I'm going to dissect the matter, identify the top three issues with user-generated VR content, and finally offer a potential definition of the Ultimate Immersive Experience.

This is going to be fairly technical, so if you don't feel like reading about resolution, frame rate, field of view, and video compression standards, feel free to watch the Introduction video and then skip to the Bottom Line section at the end of this page.

An Introduction

In May 2017, our CEO, Shahar Bin-Nun, gave a nice presentation on the topic of the Ultimate Immersive Experience at the Google Street View Summit 2017 in Tokyo, where we announced our partnership with Google. It summarizes the essence of this post without getting too technical.

The Gap

The world of user-generated content has changed a lot since the Nokia 3130, and I think it would be fair to say that it all started with the introduction of the iPhone, and especially the iPhone 4, first unveiled in June 2010. It had a 960×640 screen, slightly taller than quarter HD (qHD, 960×540), with a pixel density of 326 pixels per inch (PPI). Steve Jobs christened it the "Retina Display", because humans could not see its individual pixels from a normal viewing distance of 10 to 12 inches. Images and videos looked great, which is perhaps the main reason why mobile screens did not develop much further for some time afterwards.
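To put the "Retina" claim into numbers, here is a minimal sketch that converts pixel density and viewing distance into pixels per degree of visual angle. The threshold of 60 pixels per degree (one pixel per arcminute, the figure usually quoted for 20/20 vision) is my assumption, and the function name is purely illustrative.

```python
import math

# Assumption: 20/20 vision resolves roughly 1 arcminute, i.e. ~60 pixels per degree.
ACUITY_PX_PER_DEG = 60


def pixels_per_degree(ppi: float, distance_in: float) -> float:
    """Pixels covered by one degree of visual angle at the center of a flat screen."""
    # Width (in inches) of the screen area that spans one degree at this distance.
    inches_per_degree = 2 * distance_in * math.tan(math.radians(0.5))
    return ppi * inches_per_degree


for distance in (10, 12):
    ppd = pixels_per_degree(326, distance)
    print(f"{distance} in: {ppd:.1f} px/deg (assumed threshold: {ACUITY_PX_PER_DEG})")
```

Under these assumptions, 326 PPI works out to roughly 57 pixels per degree at 10 inches and 68 at 12 inches, which is right around the threshold.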

Since then, the world of mobile displays has moved on to HD Ready (1280×720), then Full HD (1920×1080), Quad HD (2560×1440) and UHD (3840×2160), aka 4K, aka 2160p. The latter resolution is still rare and, at the time of writing and to the best of my knowledge, it is the best that smartphones can offer. The list of phones that support UHD is short; the first in that category, and thus its most prominent example, is the Sony Xperia Z5 Premium, sporting a 5.5" UHD display with a pixel density of 806 PPI.

Does anyone really need it? Personally, I agree with CNET that for probably every day-to-day application:

"Phones with ultra high-res 4K screens are serious overkill. Seriously."

This is true, of course, assuming that you use it to view regular content, not VR. The quality of Virtual Reality content is still very limited, even on state-of-the-art displays such as that of the Sony Xperia Z5 Premium.

This got me thinking about what the criteria for the Ultimate Immersive Experience in VR would be. Is it just the resolution? Probably not... So what ingredients does one need in order to make the VR experience indistinguishable from the real world?

Here's the list I came up with:
1. 3D 360
2. VR Retina Resolution
3. Ultra high frame rate

Let's review these one by one.

3D 360

First of all, 3D 360 content is basically two fully spherical panoramic photos or videos, one for the left eye and another for the right one, which together make up a single 3D 360 photo or video. These are usually stored in Over-Under, aka Top-Bottom, format, where the upper half of each frame is a fully spherical image for one eye and the lower half is a fully spherical image for the other eye. Sometimes they are stored in Side-by-Side format, with a similar separation of the two spheres, but in this case into the left and right halves of each frame.
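As a rough illustration of the two packings, the sketch below returns the crop rectangle of each eye's image inside a packed frame. The names `Rect` and `eye_regions` and the 3840×3840 frame size are hypothetical, and which eye comes first is deliberately left open, since that varies between tools.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def eye_regions(frame_w: int, frame_h: int, layout: str) -> Tuple[Rect, Rect]:
    """Return the crop rectangles of the two per-eye images in a packed stereo frame."""
    if layout == "over-under":
        half = frame_h // 2
        # Upper half holds one eye, lower half holds the other.
        return Rect(0, 0, frame_w, half), Rect(0, half, frame_w, half)
    if layout == "side-by-side":
        half = frame_w // 2
        # Left half holds one eye, right half holds the other.
        return Rect(0, 0, half, frame_h), Rect(half, 0, half, frame_h)
    raise ValueError(f"unknown layout: {layout!r}")


# Hypothetical 3840x3840 Over-Under frame: each eye gets a 3840x1920 image.
first_eye, second_eye = eye_regions(3840, 3840, "over-under")
print(first_eye, second_eye)
```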

Why would one need it? Well, that's super easy - the world is not flat, so there's no reason why your VR videos should be! We see the world with two eyes, with an average distance of 5-6 cm or 2-2.4 inches between them. The resulting difference between the two views is called stereo parallax, and it's probably the main reason why humans perceive depth at relatively short distances.
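To see why parallax matters mostly at short distances, here is a small sketch that computes the angle between the two eyes' lines of sight when both look at the same point. The 6 cm eye separation is taken from the range above; the sample distances and the function name are my own illustrative choices.

```python
import math


def parallax_angle_deg(eye_separation_cm: float, distance_cm: float) -> float:
    """Angle between the two lines of sight converging on a point straight ahead."""
    return math.degrees(2 * math.atan((eye_separation_cm / 2) / distance_cm))


# Illustrative distances: arm's length, across a room, across a street, far away.
for distance_cm in (50, 300, 1000, 10000):
    angle = parallax_angle_deg(6.0, distance_cm)
    print(f"{distance_cm / 100:6.1f} m: {angle:.3f} degrees")
```

The angle drops from several degrees at arm's length to a small fraction of a degree at 100 m, which is consistent with depth from parallax fading out at a distance.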

Unfortunately, camera rigs with just one camera module looking in each direction cannot mimic human vision. Simply put, they "see the world" with one eye, like prey, not a predator. In order to get 3D 360 you would need several sets of stereo pairs, or potentially a more complex Stereo Panorama rig. By the way, Stereo Panorama was the invention our company was founded and funded on, and the inventors, Prof. S. Peleg, Dr. M. Ben-Ezra and Dr. Y. Pritch, are our company's founding mother and fathers :)

But that's a topic for another blog post, which will compare the different camera rigs, their pros and cons, the importance of the nodal point, the optical compression and the artifacts of fish-eye lenses etc.

For now, let's just say that 3D 360 is essential for the Ultimate Immersive Experience.

VR Retina Resolution

The regular Retina Resolution is insufficient for VR for two reasons:

  1. VR displays are located just about 5 cm or 2 inches from a viewer's eyes.

  2. VR displays use lenses that let us focus on a screen that's so close to the eyes.

Let's estimate the minimum requirements of what I call VR Retina Resolution. In other words, how many pixels are enough for us not to notice them in a VR headset?

First of all, let's define the maximum theoretical human field of view. As shown in the diagram below, our maximum binocular vision spans up to about 120 degrees, and the vertical visual field seems to be between 55 and 120 degrees; let's call it 90 to simplify the calculations.
[Figure: Human Field of View]

So our field of view is up to 120 degrees horizontally by 90 degrees vertically. The full sphere, by the way, is 360×180 degrees, not 360×360 as some claim: 360 degrees covers everything around you, and 180 degrees covers top to bottom. Hence 120×90 is 1/6 of the full sphere (a third of its width and half of its height). This means that even if the visible part were only 4K, which is not enough for VR for the reasons mentioned above, the full sphere would need a whopping 11520×4320 pixels at the very least (3840 × 3 = 11520 and 2160 × 2 = 4320). Note that neither H.264 (up to 4K) nor H.265 (up to 8K) supports this, and these are by far the most common standards in today's video compression chips. VP9 seems to be the only option, but give me a holler if you stumble upon a chip that supports it and fits a camera of a reasonable size at a reasonable price.
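The arithmetic above can be sketched as a tiny helper. It assumes pixels are spread evenly per degree across the sphere, which is a simplification, and the function name and the `stereo_over_under` flag are mine, added only for illustration.

```python
from typing import Tuple


def full_sphere_resolution(
    view_w_px: int,
    view_h_px: int,
    fov_h_deg: float = 120,
    fov_v_deg: float = 90,
    stereo_over_under: bool = False,
) -> Tuple[int, int]:
    """Scale the visible window up to a 360x180 degree sphere.

    Assumes a uniform number of pixels per degree (a simplification).
    """
    sphere_w = round(view_w_px * 360 / fov_h_deg)
    sphere_h = round(view_h_px * 180 / fov_v_deg)
    if stereo_over_under:
        # Two spheres stacked vertically, one per eye.
        sphere_h *= 2
    return sphere_w, sphere_h


print(full_sphere_resolution(3840, 2160))
print(full_sphere_resolution(3840, 2160, stereo_over_under=True))
```

For a 3840×2160 window this gives 11520×4320 per eye, and 11520×8640 if both eyes are stacked Over-Under in one frame.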

Frame Rate

To this very day, the two most prominent video standards are PAL and NTSC. A quote from Wikipedia:

"NTSC is used with a frame rate of 60i or 30p whereas PAL generally uses 50i or 25p; both use a high enough frame rate to give the illusion of fluid motion."

Do they really?

Check out the following two examples from the LI Blog - the Logical Increments PC Building Guide, from the point of view of PC Gaming. I think it's pretty clear that in this particular case, even at 50 frames per second (FPS) the movement is not smooth enough.
[Figure: Frame Rate]

To make it even clearer, check out the difference between 60fps and 30fps at varying speed:
[Figure: Frame Rate]

An important note: in regular video it's only the content that moves. In VR videos the viewer moves as well, potentially very fast, and potentially against the direction in which the content is moving. All this can easily quadruple the problem.
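To get a feel for the numbers, here is a minimal sketch of how far the picture jumps between two consecutive frames when the viewer turns their head. The 90 degrees per second head speed is a made-up illustrative value, and the 11520-pixel sphere width is borrowed from the resolution estimate earlier in this post.

```python
def pixels_per_frame(
    head_speed_deg_s: float, fps: float, sphere_width_px: int = 11520
) -> float:
    """Horizontal image shift, in pixels, between two consecutive frames."""
    px_per_degree = sphere_width_px / 360
    degrees_per_frame = head_speed_deg_s / fps
    return degrees_per_frame * px_per_degree


# Hypothetical head turn of 90 degrees per second at several frame rates.
for fps in (30, 60, 200):
    print(f"{fps:3d} fps: {pixels_per_frame(90, fps):.1f} px per frame")
```

With these assumptions the jump is 96 pixels per frame at 30 fps, 48 at 60 fps and about 14 at 200 fps, before any motion of the content itself is added on top.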

The industry is well aware of this nowadays. Sony acknowledged it at one of their developer events:

"You cannot drop below 60 fps. Period. Ever."

Academia seems to agree as well... The PC Gamer website interviewed Prof. DeLong, who said:

"They have to be very specific and special, but you could see an artifact at 500 fps if you wanted to... I think typically, once you get up above 200 fps it just looks like regular, real-life motion."

They also interviewed Prof. Busey, who said:

"Certainly 60 Hz is better than 30 Hz, demonstrably better. Whether that plateaus at 120 Hz or whether you get an additional boost up to 180 Hz, I just don’t know."

And it's not just the frame rate of the content. The video decoding chip must support it too. And the frame rate of the content needs to match that of the display. Here's an explanation of "Judder", an artifact that can happen when the frame rates do not match.
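A simplified way to picture a frame rate mismatch is to count how many display refreshes each content frame stays on screen. The sketch below assumes the display simply shows the most recent content frame at every refresh; real players and displays may behave differently, and the function name is hypothetical.

```python
from typing import List


def repeat_pattern(content_fps: int, display_hz: int, frames: int = 8) -> List[int]:
    """Number of display refreshes each of the first `frames` content frames is shown.

    Assumes the display shows the latest available content frame at each refresh.
    """
    counts = []
    for i in range(frames):
        # First refresh at or after the start of frame i and of frame i + 1
        # (integer ceiling division avoids floating-point rounding).
        first = -(-i * display_hz // content_fps)
        next_first = -(-(i + 1) * display_hz // content_fps)
        counts.append(next_first - first)
    return counts


print(repeat_pattern(30, 60))  # even cadence
print(repeat_pattern(24, 60))  # uneven cadence
print(repeat_pattern(50, 60))  # uneven cadence
```

In this model, 30 fps on a 60 Hz display shows every frame exactly twice, while 24 fps alternates between three and two refreshes per frame; that uneven cadence is one way the mismatch can show up.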

Bottom Line

To summarize the above, I think it's fair to say that the Ultimate Immersive Experience requires all the aforementioned issues to be resolved. In particular, we need 12288p resolution at 200fps. Today I feel comfortable calling that "The Ultimate Immersive Experience". Until this happens, I suggest we lower our expectations and enjoy what we've got :)