John Daro

3D - A Beginner’s Guide to Stereoscopic Understanding

Early Days

My interest in stereoscopic imaging started in 2006. One of my close friends, Trevor Enoch, showed me a stereograph that was taken of him out at Burning Man. I was blown away and immediately hooked. I spent the next four years experimenting with techniques to create the best, most comfortable, and most immersive 3D I could. In 2007, I worked on “Hannah Montana and Miley Cyrus: Best of Both Worlds Concert,” directed by Bruce Hendricks and shot with camera systems provided by Pace. Jim Cameron and Vince Pace were already developing the capture systems for the first “Avatar” film. The challenge was that no software package had yet been created to post stereo footage. To work around this limitation, Bill Schultz and I slaved two Quantel IQ machines to a Bufbox to control the two color correctors simultaneously. The solution was totally inelegant, but it was enough to win us the job from Disney. Later in the production, Quantel added stereo support, eliminating the need to color each eye on independent machines.

We did what we had to in those early days. When I look back at that film, there is a lot I would do differently now. It was truly the wild west of 3D post, and we were writing the rules (and the code for the software) as we went. Over the next few pages I’m going to lay out some basics of 3D stereo imaging. The goal is for you to have a working understanding of the process and the technical jargon by the end. Hopefully I can help other post professionals avoid a lot of the pitfalls and mistakes I made as we blazed the trail all those years ago.

Camera 1, Camera 2

Stereopsis is the term that describes how we collect depth information from our surroundings using our sight. Most everyone is familiar with stereo sound: two separate audio tracks played simultaneously out of two different speakers. We take that information in using both of our ears (binaural hearing) and form a reasonable approximation of where that sound is coming from in space. This approximation is calculated from the offset in time of the sound hitting one ear vs. the other.

Stereoscopic vision works much the same way. Our eyes have a point of interest. When that point of interest is very far away, our eyes are parallel to one another. As we focus on objects that are closer to us, our eyes converge. Try this simple experiment right now. Hold up your finger as far away from your face as you can. Now slowly bring that finger toward your nose, noting the angle of your eyes as it gets closer to your face. Once your finger is about three inches away, alternately close one eye and then the other. Notice the view as you alternate between your eyes: camera 1, camera 2, camera 1, camera 2. Your finger jumps from left to right, and you also see “around” your finger more in one eye than in the other. This offset between your two eyes is how your brain makes sense of the 3D world around you. To capture this depth for films, we need to recreate the system using two cameras spaced roughly the same distance apart as your eyes.

Camera Rigs

The average interpupillary distance is 64mm. Since most feature-grade cinema cameras are rather large, special rigs are needed to align them together. Side-by-side rigs are an option when your cameras are small, but when they are not, you need to use a beam splitter configuration.

Beam splitter rig in an “over” configuration.


Essentially, a beam splitter rig uses a half-silvered mirror to “split” the view into two. This allows the cameras to shoot at a much closer interaxial distance than they could on a parallel side-by-side rig. Both of these capture systems are for the practical shooting of 3D films. Fortunately or unfortunately, most 3D films today use a technique called Stereo Conversion.


The image comes in from position 1. It passes through to the camera at position 2 and is also reflected to the camera at position 3. The reflected eye will need to be flipped in post since the image is mirrored.
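If you are rolling your own tools, that un-mirroring is a one-line flip. Here is a minimal NumPy sketch; which axis you flip depends on how the mirror and camera are mounted, so the default below is just an assumption.

```python
import numpy as np

def unflip_reflected_eye(eye: np.ndarray, axis: int = 1) -> np.ndarray:
    """Un-mirror the eye recorded off the half-silvered mirror.

    axis=1 flips left/right, axis=0 flips top/bottom; which one is
    correct depends on the rig's mirror orientation.
    """
    return np.flip(eye, axis=axis)
```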

Conversion

There are three main techniques for Stereo Conversion.

Roto and Shift

In this technique, characters and objects in the frame are roto’d out and placed in a 3D composite in virtual space. The scene is then re-photographed using a pair of virtual cameras. The downside is that the layers often lack volume, and the overall effect can feel like a grade-school diorama.

Projection

For this method, the 2D shot is modeled in 3D space. Then the original 2D footage is projected onto the 3D models and re-photographed using a pair of virtual cameras. This yields very convincing stereo and looks great, but generating the assets needed for complex scenes can be expensive.

Virtual World

Stupid name, but I can’t really think of anything better. In this technique, scenes are created entirely in 3D programs like Maya or 3ds Max. Since this is how most high-end VFX are created for larger films, some of this work is already done. It is the best way to “create” stereo images, since the volumes, depth, and occlusions mimic the real world. The downside is that if your 2D VFX shot took a week to render in all of its ray-traced glory, the extra “eye” will take just as long.

Cartesian Plane

No matter how you acquire your stereo images, eventually you are going to take them into post-production. In Post, I make sure the eyes are balanced for color against one another. I also “set depth,” both for comfort and to creatively promote the narrative.

In order to set depth, we offset one eye against the other. Objects in space gain their depth from their relative offset in the other eye/view. To have a consistent language, we describe this depth as a number of pixels of offset.


When we discuss 2D images, we use pixel values that are parallel with the screen. A given coordinate pair locates a pixel along the screen’s surface.

Once we add the third axis, we need to think of a Cartesian plane laid down perpendicular to the screen. Positive numbers recede away from the viewer into the screen. Negative numbers come off the screen toward the viewer.

The two views are combined for the viewing system. The three major systems are Dolby, RealD, and Xpand. There are others, but these are the most prevalent in theatrical exhibition.

In Post we control the relative offset between the two views using a “HIT,” or horizontal image transform. That is a very complicated way of saying we move one eye left or right along the X axis.


The value of the offset dictates where in space the object will appear. This rectangle is traveling from a +3 pixel offset to a -6 pixel offset.

Often we will apply this move symmetrically to both eyes. In other words, to achieve a -6 pixel offset, we may move each view 3 pixels in opposite directions instead of moving one view the full -6.
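As a minimal sketch of the idea (plain NumPy, not any particular grading system’s API, and assuming the sign convention above where positive offsets recede into the screen):

```python
import numpy as np

def hit(eye: np.ndarray, shift_px: int) -> np.ndarray:
    """Horizontal Image Transform: slide one view along the X axis.

    Vacated columns are padded with black here; a real system would
    crop, scale, or fill the edges more gracefully.
    """
    out = np.zeros_like(eye)
    if shift_px > 0:
        out[:, shift_px:] = eye[:, :-shift_px]
    elif shift_px < 0:
        out[:, :shift_px] = eye[:, -shift_px:]
    else:
        out[:] = eye
    return out

def symmetric_hit(left: np.ndarray, right: np.ndarray, offset_px: int):
    """Split a relative offset across both eyes instead of moving one eye the full amount.

    Shifting the left eye left and the right eye right increases positive
    parallax (pushing the world behind the screen); a negative offset does
    the opposite and brings it toward the viewer.
    """
    half = offset_px // 2
    rest = offset_px - half
    return hit(left, -half), hit(right, rest)
```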

Using this offset we can begin to move comped elements, or the entire “world,” in Z space. This is called depth grading. Much like color, the goal is to make the picture feel consistent without big jumps in depth; too many large jumps can cause eye strain and headaches. My first rule of depth grading is “do no harm.” Pain should be avoided at all costs. There is another aspect of depth grading beyond the technical side, though. Often we use depth to promote the narrative. For example, you may pull action forward so the audience is more immersed in the chaos, or you can play quiet drama scenes at screen plane so that you don’t take away from the performance. Establishing shots are best played deep for a sense of scale. All of these examples are just suggestions, not rules. Just my approach.

Once you know the rules, you are allowed to break them, as long as it’s motivated by what’s on screen. I remember one particular shot in Jackass 3D where Bam gets his junk whacked. I popped the offset toward the audience just for that frame. I doubt anybody noticed other than a select circle of 3D nerds (I’m looking at you, Captain 3D), but I felt it was effective in making the pain on screen “felt” by the viewer.

Floating Windows

Floating windows are another tool we have at our disposal while working on the depth grade. When we “float the window,” what we are actually doing is controlling the proscenium in depth, just like we were moving the “world” while depth grading. Much like depth offsets, floating windows can be used for technical and creative reasons. They are most commonly used to fix edge violations. An edge violation is when an object is “in front” of the screen in Z space but is being occluded by the edge of the screen. Our brains are smarter than our eyeballs and kick into override mode: the edge of the broken picture feels uncomfortable, and all sense of depth is lost. The fix is to move the edge of the screen forward into the theater using a negative offset. This floats the “window” we are looking through in front of the offending object, and our eyes and brain are happy again.

We achieve a floating window through a crop or by using the software’s “window” tool.
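In code terms, the crop version is just blanking a few columns on opposite edges of the two eyes. A minimal sketch, assuming one image array per eye (real window tools also feather the edge and can set each side, or even each frame, independently):

```python
import numpy as np

def float_window(left: np.ndarray, right: np.ndarray, px: int):
    """Float the stereo window toward the viewer by `px` pixels.

    Masking the left edge of the left eye and the right edge of the right
    eye gives the frame itself negative parallax, so the "window" sits in
    front of an object that would otherwise violate the screen edge.
    """
    if px <= 0:
        return left, right
    l, r = left.copy(), right.copy()
    l[:, :px] = 0    # blank the left edge of the left eye
    r[:, -px:] = 0   # blank the right edge of the right eye
    return l, r
```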

The fish is at +1, behind the screen, but in front of the +3 proscenium. The fish will feel as if it is off the screen even though it is behind it.

Another use for controlling the depth of the proscenium is to creatively enhance the perceived depth. Often you need to keep a shot at a certain depth because of what is on either side of the cut, but creatively you want it to feel more forward. A great workaround is to keep your subject at the depth that feels comfortable with the surrounding shots and move the “screen” back into positive space. This can make the subject feel as if it is in negative space without actually having to place it there. Conversely, you can float the window into negative space on both sides to create a feeling of distance even if your character or scene sits at screen plane with a zero offset.

Stereo Color Grading

Stereo color grading is an additional step compared to standard 2D finishing, and it needs to happen after the depth grade is complete. With natively shot 3D footage it is much more challenging to match color from one eye to the other. Reflections or flares may appear in one eye and not the other; we call this retinal conflict. One fix for such problems is to steal the “clean” information from one eye and comp it over the offending one, paying mind to offset it for the correct depth.
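A rough sketch of that comp in NumPy terms (the mask and the single parallax value are assumptions for illustration; in practice this is done in a compositor with a proper depth-aware offset and some edge blending):

```python
import numpy as np

def patch_retinal_conflict(bad_eye: np.ndarray, clean_eye: np.ndarray,
                           mask: np.ndarray, parallax_px: int) -> np.ndarray:
    """Replace a flare or reflection in one eye with pixels from the other.

    `mask` is a boolean array marking the offending region in `bad_eye`.
    The clean eye is shifted horizontally by the local parallax so the
    borrowed pixels land at the correct depth before being comped over.
    """
    shifted = np.roll(clean_eye, parallax_px, axis=1)  # crude stand-in for a real reposition
    out = bad_eye.copy()
    out[mask] = shifted[mask]
    return out
```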

Additionally, any shapes that were used in the 2D grade will have to be offset for depth. Most professional color grading software has automated ways to do this. In rare instances, an overall color correction is not enough to balance the eyes. When this occurs, you may need a localized, block-based color match like the one found in The Foundry’s Ocula plugin for Nuke.

Typically, a 4.5 foot-lambert (4.5FL) master and a 7FL master are created with different trim values. In recent years, a 14FL version is also created for stereo laser projection and Dolby’s HDR projector. In most cases this trim is as simple as a gamma curve and a saturation boost.
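As a back-of-the-envelope illustration of such a trim (the gamma and saturation numbers here are placeholders; the real values are set by eye, per trim, on a calibrated projector):

```python
import numpy as np

def brightness_trim(rgb: np.ndarray, gamma: float = 0.9, sat: float = 1.1) -> np.ndarray:
    """Simple trim between luminance targets: a gamma curve plus a saturation boost.

    `rgb` is expected to be a float image normalized to 0..1.
    """
    out = np.clip(rgb, 0.0, 1.0) ** gamma       # gamma curve lifts or darkens the midtones
    luma = out.mean(axis=-1, keepdims=True)     # crude luma proxy
    out = luma + (out - luma) * sat             # push colors away from gray for the sat boost
    return np.clip(out, 0.0, 1.0)
```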

The Future of Stereo Exhibition

The future of 3D resides in even deeper immersive experiences. VR screens are getting higher in resolution and, paired with accelerometers, are providing a true “be there” experience. I feel that the glasses and apparatus required for stereo viewing contributed to its falling out of vogue in recent years. I’m hopeful that new technological enhancements and a better, more easily accessible user experience will lead to another resurgence in the coming years. Ultimately, creating the most immersive content is a worthy goal. Thanks for reading, and please leave a comment with any questions or differing views. They are always welcome.

