True 3D Object Reconstruction with A RGB-D Camera Surrounding Array


This paper presents research findings on calibrating RGB-D information from an array of RGB-D sensors to construct a 3D model of a human bust and a puppet. RGB-D information from each sensor is first collected at a centralized PC, and point clouds are generated. The point clouds are aligned using a color-based duplicate-removal iterative closest point algorithm. Two noise removal algorithms are applied, one before and one after the alignment, to remove outliers. An 8-neighbor 3D super-resolution algorithm is introduced to increase the point cloud quality. Next, a small hole filling mechanism and a large hole filling mechanism in 3D based on 2D inpainting are proposed. Finally, a 3D Poisson surface is created. The main contributions of this work are the color-based duplicate-removal iterative closest point algorithm, noise removal, super-resolution, and 3D inpainting based on 2D inpainting. Experimental results demonstrate that the proposed strategic steps provide a better 3D model.


3D reconstruction, Kinect, ICP, noise removal, inpainting, hole filling

Proposed Method

Strategic steps and modules of the proposed system.

Data Collection

RGB-D sensors capture color and depth images of the scene. We first project the depth pixel values into the color image space. The depth values are then projected into the camera space, that is, the 3D space. Next, the corresponding color values are copied from the color image to the point cloud. The result is a colored 3D point cloud. As we focus only on 3D reconstruction of objects or a human bust, we first need to extract the object from the complete point cloud. A predefined depth threshold is used for this extraction.
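The back-projection and thresholding steps can be illustrated with a minimal sketch. It assumes a pinhole camera model and a color image that has already been registered to the depth image. The function name, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `max_depth` parameter are hypothetical and do not come from the original system.

```python
import numpy as np

def depth_to_point_cloud(depth, color, fx, fy, cx, cy, max_depth):
    """Back-project a depth image (in metres) into a colored point cloud.

    Assumptions (not from the source): `color` has the same height and
    width as `depth`, and fx, fy, cx, cy are pinhole intrinsics.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    # Keep pixels with a valid depth that lies closer than the threshold.
    valid = (depth > 0) & (depth < max_depth)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)  # (N, 3) camera-space coordinates
    colors = color[valid]                 # (N, 3) colors copied per point
    return points, colors
```

Pixels with zero depth are treated as missing measurements here, which is a common convention but still an assumption about the sensor output.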

Alignment and Object Refinement

Noise removal of the point clouds

Depth values from an RGB-D sensor are not always correct. The depth image may contain noise due to several factors, such as inherent camera problems, projection issues, interference from the surroundings, and interference from the other sensors. We need to remove noise points from the point clouds before proceeding. Two noise removal algorithms are proposed here in order to obtain a noise-free point cloud. Noise points are dense around the edges of the objects, and their density is proportional to the distance from the sensor. Hence, the first algorithm, called adaptive distance-based noise removal, removes noise points from the point cloud of each sensor individually. The second noise removal is performed after the point clouds from all sensors are aligned. After alignment, points are independent of their original sensor. Hence, the second algorithm, called adaptive density-based noise removal, is performed after the merging of the point clouds.

Adaptive distance-based noise removal.
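This page does not give the formula of the adaptive distance-based algorithm, so the following is only a generic illustration of the idea of a per-sensor outlier filter whose tolerance depends on the distance from the sensor. The rule used (mean k-nearest-neighbor distance compared against `base_thresh + gain * range`), the function name, and all parameter values are assumptions, not the authors' method.

```python
import numpy as np

def remove_noise_distance_adaptive(points, k=8, base_thresh=0.01, gain=0.01):
    """Keep points whose mean k-NN distance is below a range-dependent threshold.

    Hypothetical rule: threshold = base_thresh + gain * distance_to_sensor,
    with the sensor assumed to sit at the origin of the cloud's coordinates.
    Brute-force O(N^2) distances, so only suitable for small clouds.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)       # (N, N) pairwise distances
    knn = np.sort(dist, axis=1)[:, 1:k + 1]   # skip column 0 (distance to itself)
    mean_knn = knn.mean(axis=1)
    range_to_sensor = np.linalg.norm(points, axis=1)
    keep = mean_knn < base_thresh + gain * range_to_sensor
    return points[keep], keep
```

For clouds of tens of thousands of points, a spatial index such as a k-d tree would replace the brute-force distance matrix.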

8-neighbor 3D super-resolution

After noise removal, a 3D super-resolution method is applied to the point cloud to increase the number of points. The points newly introduced in the super-resolution step help fill small holes in the point cloud. The color and depth image resolutions of the RGB-D sensor are different. The point cloud is generated from the depth image, and only the corresponding color values are copied. Since the color image resolution is higher than the depth image resolution, the number of points in the point cloud can easily be increased by bringing in extra pixels from the color image as new points.

8-neighbor 3D super-resolution of point clouds. The color values of the north (cn), north-east (cne), east (ce), south-east (cse), south (cs), south-west (csw), west (cw), and north-west (cnw) neighboring pixels of the corresponding pixel c of point p ∈ pc are used as the color values of the newly introduced points, north (pn), north-east (pne), east (pe), south-east (pse), south (ps), south-west (psw), west (pw), and north-west (pnw) (p, pn, pne, pe, pse, ps, psw, pw, pnw ∈ ℝ³), at a distance dsr = 0.001 from point p.
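The 8-neighbor scheme described above can be sketched as follows. The sketch assumes that the new points are offset within the camera x-y plane at the same depth as p, and that diagonal neighbors use the same per-axis step dsr. Neither detail is stated on this page. The function name and the `pixels` array (the color-image column and row of each point) are hypothetical.

```python
import numpy as np

# (dx, dy) offsets for N, NE, E, SE, S, SW, W, NW; image y grows downward.
OFFSETS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]

def super_resolve_8(points, pixels, color_img, dsr=0.001):
    """Add 8 neighbor points around every point of a cloud.

    points:    (N, 3) point coordinates.
    pixels:    (N, 2) integer (column, row) of each point's color pixel c.
    color_img: (H, W, 3) color image.
    Assumption: new points are shifted by dsr in x and/or y, depth unchanged.
    """
    h, w = color_img.shape[:2]
    new_pts, new_cols = [], []
    for dx, dy in OFFSETS:
        # Clamp neighbor pixel indices at the image border.
        cols = np.clip(pixels[:, 0] + dx, 0, w - 1)
        rows = np.clip(pixels[:, 1] + dy, 0, h - 1)
        new_pts.append(points + np.array([dx * dsr, dy * dsr, 0.0]))
        new_cols.append(color_img[rows, cols])
    return np.vstack(new_pts), np.vstack(new_cols)
```

The returned arrays hold only the new points, so they would be concatenated with the original cloud, which grows to nine times its original size.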

Point cloud alignment with color-based duplicate-removal ICP

The coordinates of a surface point captured by two RGB-D sensors at two locations differ, because each sensor uses its own coordinate system. In order to construct a 3D object from multiple point clouds, the point clouds must first be aligned correctly. The iterative closest point (ICP) algorithm is used to align two point clouds. ICP works only if the two point clouds are already in close proximity. Since adjacent sensors are about 45° apart in our setup, the two clouds must first be brought into a coarse alignment. We placed a sheet of paper with four rectangular markers of different colors in the middle of the setup and recorded it once from all the sensors. The user then marks three corresponding point pairs in both color views (Figure 4(a)). At least three point pairs are required to find the transformation matrix in Euclidean space. The points marked in red are the points with available depth values from the depth image. The 3D coordinates corresponding to the user-marked points are looked up in the point clouds. The 3D coordinates found in the two clouds are then used to compute the transformation matrix of the coarse alignment phase using singular value decomposition. In Figure 4(b), red circles are the 3 user-marked points in the reference point cloud (pcr) (A), green circles are the 3 corresponding user-marked points in the source point cloud (pcs) (B), and blue crosses are the coarse-aligned corresponding source points (B2). ICP is then applied to the coarse-aligned point clouds to find a fine alignment.
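The coarse alignment from marked point pairs can be illustrated with the standard SVD-based least-squares rigid fit (often called the Kabsch method). This is a minimal sketch of that general technique, not the authors' code. The function name is hypothetical.

```python
import numpy as np

def rigid_transform_svd(source, reference):
    """Least-squares rotation R and translation t with R @ s + t ≈ r.

    source, reference: (N, 3) arrays of corresponding points, N >= 3 and
    not collinear.
    """
    src_c = source.mean(axis=0)
    ref_c = reference.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (source - src_c).T @ (reference - ref_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - R @ src_c
    return R, t

# Usage: three source points, rotated by 45 degrees about the y axis and shifted.
a = np.deg2rad(45.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, 0.0, -0.3])
src = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.1], [0.0, 0.3, 0.9]])
ref = src @ R_true.T + t_true
R_est, t_est = rigid_transform_svd(src, ref)
```

The determinant check matters with only three pairs, because the cross-covariance matrix then has rank two and the unconstrained solution may be a reflection.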

Coarse alignment. (a) User marked 3 corresponding point pairs in source and reference point clouds (depth values transformed into color space are marked in red), (b) Result of the coarse alignment of the user marked points (A: reference, B: source, B2: coarse aligned source points).

Inpainting and Final Object Construction

Large holes may remain in the point cloud due to occlusion, such as the chin hole in Figure 6(a). The proposed large hole filling consists of two steps: finding the large holes and filling them. The purpose is to fill holes in a 3D model using 2D image inpainting.

Finding the color and depth images for large holes. (a) original point cloud with the large hole. (b) detected hole boundary and filtered surrounding points of the point cloud. (c) new axes along the best matching PCA plane of the filtered points and the boundary points. (d) generated grid. (e) color and depth image with hole mask for 2D inpainting.
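Steps (c) to (e) of the caption above can be sketched as a PCA plane fit followed by rasterization onto a regular grid. This is a simplified illustration under stated assumptions. It builds only the depth image and a mask of empty cells, and it does not distinguish the hole from empty cells outside the surface outline. The function name and the `cell` size parameter are hypothetical.

```python
import numpy as np

def project_to_pca_grid(points, cell):
    """Rasterize points onto a grid laid over their best-fit PCA plane.

    Returns (depth, mask): `depth` holds the out-of-plane coordinate per
    grid cell (NaN where empty) and `mask` is True for every empty cell.
    If several points fall into one cell, the last one written wins.
    """
    centroid = points.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    local = (points - centroid) @ vt.T  # columns: in-plane u, in-plane v, depth
    uv = local[:, :2]
    ij = np.rint((uv - uv.min(axis=0)) / cell).astype(int)
    shape = tuple(ij.max(axis=0) + 1)
    depth = np.full(shape, np.nan)
    depth[ij[:, 0], ij[:, 1]] = local[:, 2]
    mask = np.isnan(depth)
    return depth, mask
```

A color image can be built the same way by writing each point's color into its cell. The masked cells are then the region handed to a 2D inpainting routine, and the inpainted depth values are mapped back to 3D through the inverse of the PCA transform.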


Experimental Results


We used two setups during the experiments. In the first setup, 8 RGB-D sensors were placed at a height of 1 m, at 45° intervals on the circumference of a 1.5 m diameter circle, to record a person sitting in the center of the circle, as in Figure 8.

The first setup. 8 RGB-D sensors were placed at a height of 1 m, at 45° intervals on the circumference of a 1.5 m diameter circle, to record a person sitting in the center of the circle. Each RGB-D sensor is connected to a separate PC. Four monitors displaying a common timer were placed in the vicinity of all RGB-D sensors.
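The geometry of this setup can be written down in a few lines, which is useful, for example, for checking the spacing between adjacent sensors. The axis convention (y up, circle in the x-z plane) and the function name are assumptions made for illustration.

```python
import math

def sensor_positions(n=8, diameter=1.5, height=1.0):
    """Return (x, y, z, yaw) for n sensors evenly spaced on a circle.

    Assumed convention: y is up, the circle lies in the x-z plane, and
    yaw is the angle of the sensor position around the circle, in radians.
    """
    r = diameter / 2.0
    out = []
    for i in range(n):
        yaw = 2.0 * math.pi * i / n  # 45-degree steps when n = 8
        out.append((r * math.cos(yaw), height, r * math.sin(yaw), yaw))
    return out

positions = sensor_positions()
# Straight-line distance between two adjacent sensors: 2 * r * sin(pi / n).
adjacent = math.dist(positions[0][:3], positions[1][:3])
```

With the values of the first setup, adjacent sensors are about 0.57 m apart, and each sensor is 0.75 m from the vertical axis through the center of the circle.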

Adaptive distance-based noise removal

The person's bust was then separated from the original point clouds of the synchronized frames using distance thresholds along the x, y, and z axes. Next, adaptive distance-based noise removal was performed to remove noise points in each point cloud. The effect of different values of the outlier probability op is shown in Table 2. For the rest of the examples, op = 0.15 was used.

Adaptive distance-based noise removal example (op: outlier probability). (a) Original point cloud (no. of points = 37,980 and 47,726). Yellow: good points, red: detected outlier points. (b) op = 0.05, (c) op = 0.10, (d) op = 0.15, (e) op = 0.20.
An example result of the adaptive distance-based noise removal algorithm. (a) Original point cloud. (b) Marked noise points in the point cloud. (c) After noise removal. Rows 1-2 show results for persons 1 and 2.

8-neighbor 3D super-resolution

An example result of 8-neighbor 3D super-resolution. (a) Original point cloud, (b)-(c) resultant point cloud for dsr = 0.01 and dsr = 0.001.

Small hole finding

An example small hole finding result for person 2. (a) Original point cloud, (b) detected boundary points, number of neighbors considered n = 20, neighborhood radius neirad = 0.01, angle criterion weight weightangle = 0.2, half-disk criterion weight weighthalfdisk = 0.4, boundary criterion weight weightboundary = 0.4, and probability threshold to filter boundary points probTh > 0.5.
The effect of different parameter values in small hole finding for person 2 (number of neighbors considered n = 20, neighborhood radius neirad = 0.01). (a) angle criterion weight weightangle = 1/3, half-disk criterion weight weighthalfdisk = 1/3, boundary criterion weight weightboundary = 1/3, and probability threshold to filter boundary points probTh > 0.5, (b) weightangle = 1/3, weighthalfdisk = 1/3, weightboundary = 1/3, probTh > 0.6, (c) weightangle = 0.2, weighthalfdisk = 0.3, weightboundary = 0.5, probTh > 0.5, (d) weightangle = 0.2, weighthalfdisk = 0.4, weightboundary = 0.4, probTh > 0.5, (e) weightangle = 0.2, weighthalfdisk = 0.5, weightboundary = 0.3, probTh > 0.5.
Table 1: The effect of different parameter values in small hole finding for person 2.
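The role of the three weights and the threshold in the captions above can be sketched as a weighted sum of per-point criterion probabilities. The weighted-sum form is an assumption inferred from the parameter names. The page does not state how the criteria are combined or how each criterion probability is computed, so the inputs are taken as given and the function name is hypothetical.

```python
import numpy as np

def boundary_mask(p_angle, p_halfdisk, p_boundary,
                  w_angle=0.2, w_halfdisk=0.4, w_boundary=0.4, prob_th=0.5):
    """Flag boundary points from three per-point criterion probabilities.

    Assumed combination rule: a weighted sum compared against prob_th.
    The default weights and threshold are the values quoted for case (d).
    """
    prob = (w_angle * np.asarray(p_angle)
            + w_halfdisk * np.asarray(p_halfdisk)
            + w_boundary * np.asarray(p_boundary))
    return prob > prob_th
```

Under this reading, raising probTh from 0.5 to 0.6, as in case (b), can only shrink the set of detected boundary points.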

Results: Setup 1

Steps till merging for person 1. Row 1: RGB image, row 2: original point cloud, row 3: extracted object, row 4: marked noise points, row 5: after noise removal. (a)-(h): 8 RGB-D camera views.
Surface-reconstructed human busts. (a) Alignment result with a separate color for each point cloud, (b) front view of the merged point cloud, (c)-(f) front, right, rear, and left views of the surface reconstruction of the human busts, (g) original color image captured by the RGB-D sensor.

Results: Setup 2

Surface-reconstructed BuDaShi puppets. (a)-(d) front, right, rear, and left views of the surface reconstruction of the BuDaShi puppets.


Surface Interpenetration Measure (SIM), Mean Squared Error (MSE)

Binary representation of the detected points considered for SIM value calculation for person 1. Row 1: reference point cloud, row 2: source point cloud, row 3: set of inter-penetrating points (CA,B).
Table 2: The effect of different parameter values in small hole finding for person 1.
Binary representation of the detected points considered for SIM value calculation for person 2. Row 1: reference point cloud, row 2: source point cloud, row 3: set of inter-penetrating points (CA,B).
Table 3: The effect of different parameter values in small hole finding for person 2.
Binary representation of the detected points considered for SIM value calculation for person 3. Row 1: reference point cloud, row 2: source point cloud, row 3: set of inter-penetrating points (CA,B).
Table 4: The effect of different parameter values in small hole finding for person 3.
Binary representation of the detected points considered for SIM value calculation for person 4. Row 1: reference point cloud, row 2: source point cloud, row 3: set of inter-penetrating points (CA,B).
Table 5: The effect of different parameter values in small hole finding for person 4.







Source Code

Please contact Timothy K. Shih for the password.


Contact Us

For any questions or comments regarding this content please contact:
Timothy K. Shih

Timothy K. Shih is a Distinguished Professor at the National Central University, Taiwan. He was the Dean of the College of Computer Science, Asia University, Taiwan, and the Chairman of the CSIE Department at Tamkang University, Taiwan. Prof. Shih is a Fellow of the Institution of Engineering and Technology (IET). He was also the founding Chairman Emeritus of the IET Taipei Local Network. In addition, he is a senior member of ACM and a senior member of IEEE. Prof. Shih joined the Educational Activities Board of the IEEE Computer Society. He was the founder and co-editor-in-chief of the International Journal of Distance Education Technologies, USA. He is the Associate Editor of IEEE Computing Now. He was also an associate editor of the IEEE Transactions on Learning Technologies, the ACM Transactions on Internet Technology, and the IEEE Transactions on Multimedia. Prof. Shih was the Conference Co-Chair of the 2004 IEEE International Conference on Multimedia and Expo (ICME'2004). He has been invited to give more than 50 keynote speeches and plenary talks at international conferences, as well as tutorials at IEEE ICME 2001 and 2006, and ACM Multimedia 2002 and 2007. Prof. Shih's current research interests include Multimedia Computing, Computer-Human Interaction, and Distance Learning. He has edited many books and published over 500 papers and book chapters. Prof. Shih has received many research awards, including research awards from the National Science Council of Taiwan, the IIAS research award from Germany, the HSSS award from Greece, the Brandon Hall award from the USA, the 2015 Google MOOC Focused Research Award, and several best paper awards from international conferences. Professor Shih was named the 2014 Outstanding Alumnus by Santa Clara University.
