Christian-Albrechts-University Kiel
Computer Science Department
Hermann-Rodewald-Str. 3, 24118 Kiel, Germany
{africk,fkellner,bartczak,rk}@mip.informatik.uni-kiel.de
978-1-4244-4318-5/09/$25.00 ©2009 IEEE
3DTV-CON 2009
nr. 3) are aligned through a mirror, which allows a very short baseline of 60 mm. The two outer cameras (cameras nr. 1 and nr. 4) are placed to the left and right of the middle pair at 250 mm distance (see Figure (1)). The ToF-Camera is an active range sensor, which is positioned on top of camera 3. The ToF-Camera emits modulated infrared light at 20 MHz frequency and measures, per pixel, the time the light needs to travel back, using a special correlation element. The measured time together with the speed of light then gives the depth information for each pixel. In addition, a reflectance image measures the amount of reflected IR amplitude.

(a) Photo of the camera rig (b) A schematic representation

Fig. 1. Capturing system

3. CONTENT ACQUISITION AND PROCESSING

For the data acquisition, four streams are simultaneously captured from the CCD-Cameras at 30 frames per second. The ToF-Camera takes pictures at 15 frames per second. Unfortunately, there is currently no possibility to synchronize the ToF- and CCD-Cameras on the hardware level, so a clap in front of the cameras at the beginning of the recording is used to manually synchronize the streams for content postprocessing.

3.1. Calibration

For the further data processing, a reliable calibration of the complete camera rig is required. Due to the bad SNR of the reflectance image, the low pixel resolution, and systematic errors in the depth measurement, ToF-Cameras are difficult to calibrate with traditional calibration methods. In [9] the authors therefore propose to calibrate the ToF-Camera together with multiple CCD-Cameras in a joint method. Here we briefly describe the main idea.
In the first calibration step, 2D-3D correspondences are established between a checkerboard pattern and the reflectance images of the cameras. Using these correspondences, the initial internal and external camera parameters are estimated with the standard methods of OpenCV. In a second step, an analysis-by-synthesis approach is used in which the internal and external camera parameters are further refined in a non-linear optimization. For the correction of the systematic depth measurement error, which is not only a constant offset but a higher-order function, the authors model the correct depth through a third-order polynomial, whose parameters are also estimated in the optimization process [9].

3.2. Mean Filtering and Mixture of Gaussians

Unfortunately, the depth images provided by the ToF-Camera are very noisy. Therefore we apply mean filtering in time directly after the capturing process. Using the Mixture of Gaussians method for movement detection [10], slightly adapted to the use on depth images, we are able to constrain the filter to the areas without movement and thus prevent the filtering errors that would be caused by moving objects. The idea behind the Mixture of Gaussians method is to represent all possible backgrounds for a pixel through a mixture of N Gaussian distributions. The probability that a value v_t is observed for a pixel at time t is then given by

P(v_t) = Σ_{i=1}^{N} ω_{i,t} G(v_t, μ_{i,t}, σ_{i,t}),    (1)

where G is the Gaussian kernel of the i-th distribution, with mean value μ_{i,t}, variance σ_{i,t}, and weight ω_{i,t}, which reflects the amount of data belonging to this distribution.
For each pixel in the depth image, the value v_t at time t is then tested against all Gaussian distributions. If a distribution i is matched, its weight ω_{i,t} is increased and the mean value and variance are updated according to

μ_{i,t} = (1 − φ) μ_{i,t−1} + φ v_t,    (2)

σ²_{i,t} = (1 − φ) σ²_{i,t−1} + φ ‖v_t − μ_{i,t}‖²,    (3)

where φ is the control parameter for the update rate. For all unmatched distributions, only the weight ω_{i,t} is decreased. If no distribution is matched at all, a new distribution is created and, if the number of distributions is smaller than N, added to the already existing distributions. If the number of distributions is already N, the distribution with the smallest weight ω_{i,t} is replaced by the newly created one. At time t, the distribution with the highest weight ω_{i,t} is considered to be the background model.
One of the advantages of the Mixture of Gaussians model is that if a moving object remains stationary for a certain time, it will be adapted as background and included in the mean filtering. The other point is that if the background is shortly occluded by a moving object, its distribution will still be maintained and can be adapted quickly when the object moves away. Figure (2) shows an example of applying the mean filtering in time with integrated Mixture of Gaussians motion detection to a raw Time-of-Flight depth image.

(a) Original depth image (b) Filtered depth image

Fig. 2. Original and mean filtered ToF depth images; depth values inverted for better visibility [dark = far, light = near]

3.3. Fusion of depth and color images

The ToF-Camera provides low-resolution depth images only from its own viewpoint. To compute depth data for the high-resolution color images, the depth image is warped into the position of the color camera, taking into account the known relative calibration and the scene depth. A 3D surface mesh is generated for each depth image and rendered into the color camera's view [11]. Figure (3) shows the color image (a) and the resulting warped depth map (b) for camera 3.

(a) Color image (b) Depth image

Fig. 3. Result from warping the ToF depth map to camera 3; depth values inverted for better visibility [dark = far, light = near]

Due to occlusions at object boundaries and the very different image resolutions (176 × 144 vs. 1920 × 1080), boundary artifacts will occur in the warped depth map. To correct such artifacts we perform a color alignment between the warped depth map and the corresponding color image, based on bilateral filtering, as proposed in [12]. In the following we shortly describe the applied method and refer to the paper for more details.
In the first step, a coarse cost volume is constructed from the warped depth image:

C(d, x, y) = min(n · L, (d − D(x, y))²),    (4)

where L and n are constants for the cost truncation, D(x, y) is the warped depth image, and d stands for the current depth candidate. In the second step, this cost volume is iteratively refined as follows:

1. filter each d-slice of the cost volume with the bilateral filter constructed from the corresponding color image; for each position (x, y) find d̂ = argmin_d C(d, x, y)

2. perform quadratic polynomial interpolation of the filtered cost between d̂ − 1, d̂, and d̂ + 1 for subpixel estimation

3. update D(x, y) with the found polynomial minimum; build a new cost volume based on the updated depth image and start a new iteration

The bilateral filtering is expensive and depends strongly on the image size and on the filter mask size. Therefore we use the approximation of the bilateral filter proposed in [13] and implement the refinement process on the GPU using fragment shaders. To further improve the refinement results, we apply the above described approach multiple times with varying filter sizes, starting with very large filters and decreasing the filter size stepwise (typically 3 or 4 refinement steps).
Figure (4) shows (a) the color image with segmented foreground and (b) the refined depth map. The foreground segmentation was performed by marking a pixel as background if its depth value in the refined depth map lies above a certain background threshold. This segmentation demonstrates the high quality of the refined depth map.

4. OCCLUSION LAYER GENERATION

After the color alignment, all pixels with missing depth values are assigned the background depth from their neighborhood. Here we currently make the simplifying assumption of a planar background, but for more sophisticated depth estimation, stereo matching can be used.
By warping the pixels of the color image according to the depth values, new views can then be generated to the left and right of the current view. By moving to the side of the current view, new areas become visible where no texture information is available in the original view (see Figure (5), left). Such areas are called occlusions and can be handled either by interpolation of the original texture, which produces an undesirable "rubber sheet" effect at object boundaries, or by using a layered depth image, where occlusions are represented through an additional occlusion layer consisting of texture and depth information for the background. Here we propose a straightforward approach for occlusion layer generation.
We first warp the depth map to the outermost left and right views relative to camera 3, according to the depth values. All areas in the outer views which have no depth values assigned are then marked as occluded in the original view. For these areas, we currently estimate depth through background extrapolation from the neighborhood; here again, a stereo matching method could be used for more accurate depth estimation. The filled occlusion areas are then warped back from the outer views to the main view, according to the estimated depth values, and merged into a depth map with an occlusion layer. In the second step we use backward mapping, according to the available depth values, from the main view into the outer cameras to obtain the corresponding texture information.
Figure (5) shows a new view, generated to the left of the main view, with and without the estimated occlusion layer. Note that texture errors on the object and occlusion layer boundaries might occur in the new view due to alignment errors between depth and color edges in the main view. Such errors can be minimized through further refinement of the depth map by stereo matching before the occlusion layer calculation. However, they cannot be completely corrected because of the aliasing in the color images, so blending and image compositing techniques are required during the generation of new views [14, 15]. This will be the focus of further research.

Fig. 5. Generated view, without (left) and with (right) the occlusion layer.

5. CONCLUSIONS

We present an approach to LDV content creation for 3D-TV production using a ToF-Camera integrated into a rig of four CCD-Cameras. We have shown that, using only the ToF-Camera for depth estimation, one is able to generate a reliable dense depth map for one of the CCD-Cameras and to use this depth map for occlusion layer generation. Future work will include the combination of the ToF data with stereo matching to further improve the content quality.

6. REFERENCES

[1] C. Girdwood and Ph. Chiwy, "MIRAGE: An ACTS project in virtual production and stereoscopy," in International Broadcasting Convention, 12-16 Sep 1996, pp. 155-160.

[2] M. Ziegler et al., "Evolution of stereoscopic and three-dimensional video," Image Communication, vol. 14, November 1998.

[3] P. Kauff et al., "Depth map creation and image-based rendering for advanced 3DTV services providing interoperability and scalability," Image Communication, vol. 22, no. 2, pp. 217-234, 2007.

[4] A. Redert et al., "Advanced three-dimensional television system technologies," in First International Symposium on 3D Data Processing Visualization and Transmission, 2002, pp. 313-319.

[5] Jonathan Shade et al., "Layered depth images," in SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 1998, pp. 231-242, ACM.

[6] A. Kolb et al., "Time-of-Flight sensors in computer graphics," Computer Graphics Forum, vol. 28, 2009.

[7] J.J. Zhu et al., "Fusion of time-of-flight depth and stereo for high accuracy depth maps," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[8] Swissranger SR3000 Manual, version 1.02, 2006.

[9] Ingo Schiller et al., "Calibration of a PMD camera using a planar calibration object together with a multi-camera setup," in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Beijing, China, 2008, vol. XXXVII, part B3a, pp. 297-302, XXI. ISPRS Congress.

[10] Ming Xu and Tim Ellis, "Illumination-invariant motion detection using colour mixture models," in British Machine Vision Conference (BMVC), 2001, pp. 163-172.

[11] Bogumil Bartczak et al., "Integration of a time-of-flight camera into a mixed reality system for handling dynamic scenes, moving viewpoints and occlusions in real-time," in Proceedings of the 3DPVT Workshop, Atlanta, GA, USA, June 2008.

[12] Qingxiong Yang et al., "Spatial-depth super resolution for range images," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2007, pp. 1-8.

[13] T.Q. Pham and L.J. van Vliet, "Separable bilateral filtering for fast video preprocessing," in IEEE International Conference on Multimedia and Expo (ICME), July 2005.

[14] B. Barenburg, "Declipse 2: Multi-layer image-and-depth with transparency made practical," in Stereoscopic Displays and Applications, SPIE and IS&T, 2009, vol. 7237.

[15] Peter J. Burt and Edward H. Adelson, "A multiresolution spline with application to image mosaics," ACM Trans. Graph., vol. 2, no. 4, pp. 217-236, 1983.
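The 20 MHz modulation frequency of the ToF sensor fixes an unambiguous measurement range of c/(2 f_mod). The following minimal sketch illustrates this relation for a continuous-wave phase measurement as used by sensors like the SR3000 [8]; the phase-to-depth formula and function names are our illustration, not code from the system described here.

```python
# Sketch of the continuous-wave ToF depth relation (illustrative, not the
# authors' code): a phase shift phi in [0, 2*pi) of the modulation signal
# corresponds to depth = c * phi / (4 * pi * f_mod); the unambiguous range
# (maximum depth before phase wrapping) is c / (2 * f_mod).
import math

C = 299_792_458.0   # speed of light [m/s]
F_MOD = 20e6        # modulation frequency [Hz], as stated in the text

def unambiguous_range(f_mod: float = F_MOD) -> float:
    """Maximum depth measurable without phase wrapping."""
    return C / (2.0 * f_mod)

def depth_from_phase(phi: float, f_mod: float = F_MOD) -> float:
    """Depth for a measured correlation phase shift phi (radians)."""
    return C * phi / (4.0 * math.pi * f_mod)

if __name__ == "__main__":
    print(round(unambiguous_range(), 3))        # ~7.495 m at 20 MHz
    print(round(depth_from_phase(math.pi), 3))  # half the range: ~3.747 m
```

At 20 MHz this gives roughly 7.5 m of unambiguous range, which is consistent with an indoor studio capture scenario.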
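The per-pixel background test and update rules of Section 3.2 (Eqs. (1)-(3)) can be sketched as below. This is a minimal single-pixel version for clarity; N, φ, the matching threshold, and the initial variance are illustrative choices, not the authors' settings.

```python
# Minimal per-pixel sketch of the Mixture-of-Gaussians update (Eqs. 2-3)
# gating a temporal mean filter, as described in Sec. 3.2. N, PHI and the
# matching threshold are illustrative assumptions.
import math

N, PHI, MATCH_SIGMA = 3, 0.05, 2.5

class PixelMoG:
    def __init__(self):
        self.modes = []  # list of [weight, mean, variance]

    def update(self, v):
        """Update the mixture with depth value v; True if v is background."""
        matched = None
        for m in self.modes:
            if abs(v - m[1]) <= MATCH_SIGMA * math.sqrt(m[2]):
                matched = m
                break
        # matched weight increased, all others decreased
        for m in self.modes:
            m[0] = (1 - PHI) * m[0] + (PHI if m is matched else 0.0)
        if matched is not None:
            matched[1] = (1 - PHI) * matched[1] + PHI * v                      # Eq. (2)
            matched[2] = (1 - PHI) * matched[2] + PHI * (v - matched[1]) ** 2  # Eq. (3)
        else:
            mode = [PHI, v, 25.0]  # fresh mode with a broad initial variance
            if len(self.modes) < N:
                self.modes.append(mode)
            else:  # replace the mode with the smallest weight
                self.modes[min(range(N), key=lambda i: self.modes[i][0])] = mode
        background = max(self.modes, key=lambda m: m[0])  # highest weight
        return matched is not None and matched is background

def filtered_depth(history, is_background, v):
    """Temporal mean filter, applied only where the pixel is background."""
    if is_background:
        history.append(v)
        return sum(history) / len(history)
    history.clear()  # movement detected: reset, pass value through
    return v
```

A real implementation would run this for every pixel of the 176 × 144 depth image, resetting the mean filter wherever movement is detected.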
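The warping of Section 3.3 can be illustrated point-wise: unproject a ToF pixel to 3D using its depth, then project it into the color camera with the relative calibration. The paper renders a full 3D surface mesh [11]; this sketch only forward-projects a single pixel, and the intrinsics K and pose (R, t) below are made-up illustrative values, not the rig's calibration.

```python
# Point-wise sketch of warping a ToF depth pixel into the color camera's
# view (Sec. 3.3). The system renders a 3D surface mesh [11]; here each
# pixel is projected individually. All matrices are illustrative values.
import numpy as np

K_tof = np.array([[90.0, 0, 88.0],            # 176x144 sensor, assumed focal length
                  [0, 90.0, 72.0],
                  [0, 0, 1]])
K_color = np.array([[1500.0, 0, 960.0],       # 1920x1080 color camera (assumed)
                    [0, 1500.0, 540.0],
                    [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, -0.1, 0.0])  # assumed relative pose (ToF above camera 3)

def warp_pixel(u, v, depth):
    """Map ToF pixel (u, v) with metric depth to color-image coordinates."""
    ray = np.linalg.inv(K_tof) @ np.array([u, v, 1.0])
    X = depth * ray                # 3D point in ToF camera coordinates
    x = K_color @ (R @ X + t)      # transform and project into the color camera
    return x[:2] / x[2]            # dehomogenize

uc, vc = warp_pixel(88.0, 72.0, 3.0)  # principal ray of the ToF camera at 3 m
```

Because many low-resolution pixels scatter into a 1920 × 1080 target, per-pixel splatting leaves holes; rendering a triangulated surface mesh, as done in the paper, fills the target view densely.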
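One iteration of the cost-volume refinement (Eq. (4) plus the three steps above) can be sketched as follows. A plain separable box filter stands in for the color-weighted bilateral filter, which the paper instead approximates per [13] and runs in GPU fragment shaders; the truncation constants and filter radius are assumptions for illustration.

```python
# Sketch of one cost-volume refinement iteration (Sec. 3.3, Eq. 4). A box
# filter replaces the color-weighted bilateral filter used in the paper;
# L, N_TRUNC and the radius r are illustrative assumptions.
import numpy as np

L, N_TRUNC = 1.0, 4.0  # cost truncation constants of Eq. (4)

def box_blur(img, r):
    """Separable box filter, a cheap stand-in for the bilateral filter."""
    k = np.ones(2 * r + 1) / (2 * r + 1)
    img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)

def build_cost_volume(D, d_candidates):
    """C(d, x, y) = min(n * L, (d - D(x, y))^2)  -- Eq. (4)."""
    d = np.asarray(d_candidates, dtype=float)[:, None, None]
    return np.minimum(N_TRUNC * L, (d - D[None]) ** 2)

def refine_once(D, d_candidates, r=2):
    """Filter the d-slices, take the argmin, apply quadratic subpixel update."""
    d = np.asarray(d_candidates, dtype=float)
    C = np.stack([box_blur(s, r) for s in build_cost_volume(D, d)])  # step 1
    i = np.clip(C.argmin(axis=0), 1, len(d) - 2)  # index of d_hat per pixel
    gather = lambda k: np.take_along_axis(C, k[None], axis=0)[0]
    c0, c1, c2 = gather(i - 1), gather(i), gather(i + 1)
    # step 2: vertex of the parabola through the three filtered costs
    denom = c0 - 2.0 * c1 + c2
    offset = np.where(denom > 1e-12, 0.5 * (c0 - c2) / np.maximum(denom, 1e-12), 0.0)
    return d[i] + offset * (d[1] - d[0])  # step 3 input; uniform candidate spacing
```

Repeating `refine_once` with decreasing filter radius mirrors the coarse-to-fine schedule of 3 or 4 refinement steps described in the text.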
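The occlusion detection idea of Section 4 reduces, per scanline, to forward-warping the depth map toward an outer view and marking every target pixel that receives no value. The sketch below is a 1D illustration; the inverse-depth disparity model and the baseline constant are simplifying assumptions, not the system's actual warping.

```python
# 1D scanline sketch of the occlusion detection in Sec. 4: forward-warp a
# depth row to an outer view; pixels that no source pixel lands on are the
# occlusions. The depth-to-disparity factor is an illustrative assumption.
import numpy as np

def occlusion_mask(depth_row, baseline_px=8.0):
    """Bool mask of target-view pixels left uncovered by the forward warp."""
    w = len(depth_row)
    covered = np.zeros(w, dtype=bool)
    disparity = baseline_px / depth_row  # nearer pixels shift further
    for x in range(w):
        xt = int(round(x + disparity[x]))
        if 0 <= xt < w:
            covered[xt] = True
    return ~covered  # uncovered pixels = occlusions to fill from the layer

# A near object (depth 1) in front of a far background (depth 8): the
# region uncovered next to the object shows up as occluded.
row = np.array([8.0] * 6 + [1.0] * 4 + [8.0] * 6)
mask = occlusion_mask(row)
```

In the full pipeline these uncovered regions would then be filled with extrapolated background depth, warped back to the main view, and textured via backward mapping into the outer cameras, as described above.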