Volumetric Video

Cost-efficient Volumetric Video by Combining RGB Video with Depth Data Captured by Microsoft Kinect Cameras


The objective was to test the conceptualisation and production of content that combines video and depth-camera material shot inwards from an outer rim, so-called 360° outside-in content (as opposed to inside-out 360° video, which is more common at the moment). At the same time, we experimented with presenting the results on an HMD (interface and editing), with immersion (storytelling), and with the technical implementation of the whole project.

A successful outcome was defined in advance as an HMD content demo based on material shot simultaneously with several (three or more) depth cameras and stitched into a seamless experience for a viewer wearing an HMD, on the HTC Vive and, if possible, on other interfaces.


According to many experts, volumetric video is the next big thing in virtual storytelling. The technology lets us film 360° video in 3D, giving viewers the chance to step into experiences, even ones being filmed live. This enables a whole new way of storytelling, and volumetric video is currently one of the hottest research and development focuses in the field of VR.

At the moment, volumetric video can be produced at Microsoft's studio in California and with the similar technology of 8i, another California-based company. Both options are very expensive. By testing what devices like Kinect, or other prototypes built from existing consumer technology, can offer for storytelling, we will be better prepared to adopt more advanced technology once it matures and becomes more affordable.


The material for the project was filmed at YLE Studio 1 on 28 and 29 March 2017. Each Kinect camera was firmly fastened to its RGB camera (a DSLR) with a rig designed by Jouni Weckmann. The purpose of the custom-made rig was to prevent any change in the relative angle and position of the DSLR and the Kinect. Once they were rigged together, the RGB video from the DSLR was calibrated to match the depth data captured by the Kinect.
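The article does not say how the calibration was carried out. A common way to register a colour camera to a depth camera is to back-project each depth pixel to a 3D point using the depth camera's intrinsics, move that point into the colour camera's frame with the rig's fixed rotation and translation, and project it into the colour image. The sketch below assumes an ideal pinhole model with lens distortion ignored, and assumes the intrinsics (`K_depth`, `K_rgb`) and extrinsics (`R`, `t`) have already been estimated, for example with a checkerboard; the function names are hypothetical and this is not Delicode's code.

```python
import numpy as np

def depth_to_rgb_pixels(depth_m, K_depth, K_rgb, R, t):
    """Project every valid depth pixel into the RGB image (hypothetical helper).

    depth_m: (H, W) depth in metres, 0 where the sensor gave no reading.
    K_depth, K_rgb: 3x3 pinhole intrinsic matrices (lens distortion ignored).
    R, t: rotation (3x3) and translation (3,) from the depth frame to the RGB frame.
    Returns (uv_rgb, uv_depth): (N, 2) float pixel positions in the RGB image
    and the (N, 2) integer (column, row) depth pixels they came from.
    """
    rows, cols = np.nonzero(depth_m > 0)
    z = depth_m[rows, cols]
    fx, fy = K_depth[0, 0], K_depth[1, 1]
    cx, cy = K_depth[0, 2], K_depth[1, 2]
    # Back-project to 3D points in the depth camera's coordinate frame
    pts = np.stack([(cols - cx) * z / fx, (rows - cy) * z / fy, z], axis=1)
    # Rigid transform into the RGB camera's frame, then pinhole projection
    pts_rgb = pts @ R.T + t
    proj = pts_rgb @ K_rgb.T
    uv_rgb = proj[:, :2] / proj[:, 2:3]
    return uv_rgb, np.stack([cols, rows], axis=1)

def sample_colours(rgb, uv_rgb):
    """Nearest-neighbour colour lookup; points outside the RGB frame get black."""
    h, w = rgb.shape[:2]
    u = np.round(uv_rgb[:, 0]).astype(int)
    v = np.round(uv_rgb[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colours = np.zeros((len(uv_rgb), 3), dtype=rgb.dtype)
    colours[inside] = rgb[v[inside], u[inside]]
    return colours, inside
```

Because the rig keeps `R` and `t` constant, they only need to be estimated once per Kinect and DSLR pair, which is why a rigid mount matters. The sketch performs no occlusion test, so a real pipeline would still need to handle surfaces that the DSLR cannot see.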

Delicode has been developing its own RGBD (RGB + depth) file format. The format stores all the data recorded by the sensor along with separate video-specific metadata (for example, timecode). During the project, Delicode also spent several days examining whether the phase of the frame rate of a Kinect for Xbox One device connected to a PC could be predicted and manipulated. If it could, this would partly compensate for the genlock function that the sensors lack, as noted in the background section. Despite promising preliminary results, reaching a definitive solution calls for more research, so testing the prediction and manipulation of the frame-rate phase will have to be done in the next project.
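How Delicode approached phase prediction is not described. As one generic illustration, the frame clock of a free-running sensor can be modelled as t_k ≈ t0 + k * T and fitted by least squares to the frame arrival times seen on the PC; the fitted clock then predicts when the next frame will arrive and gives the phase offset between two sensors. Everything here is an assumption: the nominal 30 fps rate, the function names, and the premise that all timestamps come from a single PC clock.

```python
import numpy as np

NOMINAL_PERIOD = 1 / 30  # assumed nominal frame interval in seconds

def fit_frame_clock(timestamps, nominal_period=NOMINAL_PERIOD):
    """Least-squares fit of t_k ~ t0 + k * period to frame arrival times (s).

    Frame indices are assigned by rounding against the nominal period,
    so a few dropped frames do not shift the indices of later frames.
    """
    t = np.asarray(timestamps, dtype=float)
    k = np.round((t - t[0]) / nominal_period)
    period, t0 = np.polyfit(k, t, 1)  # polyfit returns [slope, intercept]
    return t0, period

def predict_next(t0, period, now):
    """Predicted time of the first frame strictly after `now`."""
    k = np.floor((now - t0) / period) + 1
    return t0 + k * period

def phase_offset(clock_a, clock_b):
    """Where sensor B's frames fall within sensor A's frame interval, in seconds.

    Assumes both sensors run at (nearly) the same period; clock drift is ignored.
    """
    t0_a, period_a = clock_a
    t0_b, _ = clock_b
    return (t0_b - t0_a) % period_a
```

An offset near zero would mean the two sensors already capture almost simultaneously; without genlock the offset also drifts over time, so in practice the fit would have to be refreshed periodically.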

During the first day of filming, we recorded some material with the Finnish electro duo Phantom (Tommi and Hanna Toivonen). On the second day, we filmed dancers Satu Lähdeoja and Teemu Korjuslommi and pair acrobats Jenni Lehtinen and Sasu Peistola.

Post-production began with the development of a whole new software library for editing and manually synchronising depth images. The library makes it possible to post-edit the filmed material and to edit several RGBD files simultaneously while keeping them in sync. It was integrated into a tool that facilitated the editing work, and for this tool we also developed automatic and manual noise-reduction filters.
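The library's API is not described here, so the following is only a hypothetical sketch of what manually synchronising several RGBD files can amount to: each stream keeps its own frame timestamps, the editor sets a per-stream offset by hand (for example by lining up a clap), and for any moment on the shared timeline the nearest frame in each stream is chosen, or none if no frame lies within a tolerance. The `Stream` class, `offset`, and the 1/60 s default tolerance are illustrative assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Stream:
    """One RGBD recording placed on a shared editing timeline (hypothetical)."""
    name: str
    timestamps: List[float]  # frame times in seconds on the file's own clock, sorted
    offset: float = 0.0      # manual correction: timeline time = file time + offset

    def frame_at(self, t: float, tolerance: float) -> Optional[int]:
        """Index of the frame nearest timeline time `t`, or None if none is close enough."""
        local = t - self.offset
        i = bisect_left(self.timestamps, local)
        # The nearest frame is either just before or just after the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.timestamps)]
        if not candidates:
            return None
        best = min(candidates, key=lambda j: abs(self.timestamps[j] - local))
        return best if abs(self.timestamps[best] - local) <= tolerance else None

def synchronised_frames(streams: List[Stream], t: float,
                        tolerance: float = 1 / 60) -> Dict[str, Optional[int]]:
    """Frame index to show from each stream at timeline time `t`."""
    return {s.name: s.frame_at(t, tolerance) for s in streams}
```

Returning `None` instead of a distant frame lets playback hold or hide a stream that has no matching frame, rather than showing one that is visibly out of sync.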

Because of the nature of the project, Delicode chose the Unity game engine, which is widely popular among VR game developers, as the development environment. The shortcuts provided by the engine enabled fast post-production and cost-efficient implementation of visual ideas.


Volumetric capture, and the creation of 3D images on top of it, can also be achieved with consumer electronics. It is not completely straightforward, however: at the moment, a satisfactory result requires carefully planned lighting and a lot of manual labour.

Because we were aiming for multiple interfaces, Unity's wide support for various VR HMDs (head-mounted displays) was one reason to use the engine in this project. When the captured material was fed into Unity, we noticed that the compression used when capturing the depth material had been configured too aggressively, causing an unnecessary loss of depth-data resolution. Removing artefacts from the compressed video material is possible in theory, but because the end result would quite likely not be true to reality, we decided not to fix the issue in this demo.
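The exact compression settings are not given, but a simple worked example shows why squeezing depth into too few levels costs resolution. Assuming a depth range of 0.5 to 4.5 m (the commonly quoted Kinect for Xbox One range) mapped linearly into an 8-bit channel, each code step covers (4500 − 500) / 255 ≈ 15.7 mm; with 12 bits the step shrinks to under 1 mm. Lossy video codecs add further error on top of this quantisation, which the sketch does not model; the range and bit depths are assumptions for illustration.

```python
import numpy as np

def step_mm(near_mm=500.0, far_mm=4500.0, bits=8):
    """Depth covered by one code value when the range is mapped linearly to `bits` bits."""
    return (far_mm - near_mm) / (2**bits - 1)

def quantise_depth(depth_mm, near_mm=500.0, far_mm=4500.0, bits=8):
    """Round depth (mm) to the nearest representable level and decode it back to mm."""
    levels = 2**bits - 1
    clipped = np.clip(np.asarray(depth_mm, dtype=float), near_mm, far_mm)
    codes = np.round((clipped - near_mm) / (far_mm - near_mm) * levels)
    return near_mm + codes / levels * (far_mm - near_mm)

for bits in (8, 10, 12):
    print(f"{bits:2d} bits: {step_mm(bits=bits):6.2f} mm per step")
```

This prints about 15.69, 3.91 and 0.98 mm per step. The worst-case rounding error is half a step, so at 8 bits a smooth surface would be reconstructed as terraces roughly 16 mm apart.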

The end result of the project, the music video, will be published in its final form on various VR platforms during 2017. The HTC Vive (Windows PC) and Gear VR (Android) versions of the experiment will be released at a suitable point during the year.