Three-dimensional (3D) profile measurement is of paramount importance in numerous industrial applications, including quality inspection, reverse engineering, and process control. Conventionally, the Coordinate Measuring Machine (CMM) is widely utilized for its high-precision inspection of workpieces. However, its contact-based nature results in a very low measurement speed. In recent decades, high-speed, non-contact optical measurement methods have seen rapid development, including Fringe Projection Profilometry (FPP) [1,2], photometric stereo [3,4], and stereo vision, among others.
Among these optical techniques, FPP stands out as particularly promising owing to its flexibility, high speed, and high precision [5,6,7,8,9]. Nevertheless, both the completeness and the accuracy of FPP measurements are adversely affected by the motion of the target and by the optical properties of its surface. In principle, FPP requires that the target remain stationary during the measurement period and present a Lambertian (i.e., diffuse) surface. In industrial settings, however, many machined workpieces are not only non-Lambertian but are also frequently measured while in motion, such as on conveyor belts. The combination of these two challenges often leads to significant defects in the reconstructed point cloud, such as missing data and geometric distortions.
To address the challenges of 3D reconstruction in dynamic scenes, numerous methods have been proposed to compensate for motion-induced errors. According to a survey by Lu et al. [10], these approaches can be broadly categorized into four groups: object tracking, Fourier-assisted methods, motion prediction, and others. Object tracking methods acquire the object's trajectory to correct the phase, either by placing markers on the object [11] or by leveraging feature matching algorithms like SIFT [12]; however, these methods often struggle with complex non-rigid or 6-DoF movements. Motion prediction techniques, on the other hand, compensate for inter-frame phase changes by estimating parameters such as the object's velocity [13], but their application is often limited by the assumption of uniform motion. Among these strategies, single-shot methods, exemplified by Fourier Transform Profilometry (FTP), have garnered significant attention for their ability to “freeze” motion [14]. Such methods require only a single fringe pattern to recover the phase, making them inherently well-suited for high-speed dynamic measurements. Their primary drawback, however, lies in limited accuracy. Spectrum leakage and aliasing, caused by the global nature of the Fourier transform, result in poor performance when processing surfaces with rapid height variations or discontinuities. To enhance the accuracy of single-shot methods, researchers have introduced more advanced signal processing tools like the Windowed Fourier Transform (WFT), Wavelet Transform (WT), and even the Shearlet Transform (ST), aiming to optimize phase extraction through more refined time-frequency analysis [15]. Despite these advancements, the inherent trade-off between speed and accuracy, especially in regions of steep topography, remains a central challenge in the field.
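As an illustration of the single-shot principle, the following is a minimal sketch of classical FTP phase extraction on a synthetic fringe image. The function name `ftp_wrapped_phase`, the rectangular band-pass filter, and all numerical values are illustrative assumptions and are not taken from the cited works; the sketch assumes a known carrier frequency along the image columns and a smooth surface.

```python
import numpy as np

def ftp_wrapped_phase(fringe, carrier_freq, half_width):
    """Recover the wrapped phase from a single fringe image.

    fringe       : (rows, cols) intensity image, carrier along the columns
    carrier_freq : carrier frequency in cycles per pixel (assumed known)
    half_width   : half-width of the rectangular band-pass filter
    """
    cols = fringe.shape[1]
    spectrum = np.fft.fft(fringe, axis=1)
    freqs = np.fft.fftfreq(cols)  # cycles per pixel
    # Keep only the positive first-order lobe around the carrier.
    mask = np.abs(freqs - carrier_freq) <= half_width
    analytic = np.fft.ifft(spectrum * mask[np.newaxis, :], axis=1)
    # Remove the carrier so that only the surface-induced phase remains.
    x = np.arange(cols)
    return np.angle(analytic * np.exp(-2j * np.pi * carrier_freq * x))

# Synthetic test: a smooth Gaussian phase bump modulating the carrier.
rows, cols = 64, 256
x = np.arange(cols)
f0 = 16 / cols  # 16 fringe periods across the image
bump = np.exp(-((x - cols / 2) ** 2) / (2 * 30.0 ** 2))
true_phase = np.tile(bump, (rows, 1))
fringe = 0.5 + 0.4 * np.cos(2 * np.pi * f0 * x + true_phase)

recovered = ftp_wrapped_phase(fringe, f0, half_width=f0 / 2)
max_err = np.max(np.abs(recovered - true_phase))
print(f"max phase error: {max_err:.2e} rad")
```

The smooth bump is recovered well because its spectrum stays inside the filter band. A sharp step in `true_phase` would spread energy beyond the band and into the neighboring lobes, which is the leakage and aliasing limitation noted above.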
Separately, photometric stereo (PS) offers an alternative pathway for acquiring fine geometric details of an object's surface. The technique determines surface normal vectors by analyzing intensity variations under different illumination conditions and is particularly effective at recovering high-frequency details. However, its reconstruction accuracy depends heavily on precise calibration of the illumination system, which involves two main aspects: geometric position calibration of the light sources and calibration of their Radiance Intensity Distribution (RID). For geometric position calibration, research is mainly divided into reflection-based and shadow-based methods. Reflection-based approaches typically use specular highlights on a mirrored sphere to triangulate the light source position, as demonstrated by Powell et al. [16] and Ackermann et al. [17]. Shadow-based methods, conversely, infer the source position from the geometric relationship between a known object and its cast shadow [18]. A common challenge for both is that non-uniform illumination can severely impair the stable extraction of highlight or shadow features, thereby degrading calibration accuracy. Regarding RID calibration, early studies often treated the light source as isotropic or used simple parametric models (e.g., the cosine-power model) [19]. Such simplifications deviate significantly from the complex, anisotropic radiation characteristics of real-world LEDs and introduce reconstruction errors [20]. To resolve this issue, more recent work has adopted flexible, non-parametric models, such as Spherical Harmonics (SH) fitted to the true radiation field of the source [21]. Furthermore, traditional PS requires capturing multiple images, which makes the technique itself susceptible to motion artifacts.
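The way photometric stereo recovers normals from intensity variations can be sketched with the classical Lambertian least-squares formulation. This is a simplified illustration under stated assumptions (distant directional lights, a Lambertian surface, no shadows, perfectly calibrated light directions); it is not the near-field or RID-calibrated model discussed above, and the function name `lambertian_photometric_stereo` and the test values are hypothetical.

```python
import numpy as np

def lambertian_photometric_stereo(intensities, light_dirs):
    """Estimate per-pixel normals and albedo from k >= 3 images.

    intensities : (k, h, w) images, one per light direction
    light_dirs  : (k, 3) unit light direction vectors (assumed calibrated)
    Lambertian model: I = albedo * dot(light_dir, normal)
    """
    k, h, w = intensities.shape
    obs = intensities.reshape(k, -1)                       # (k, h*w)
    # Solve light_dirs @ G = obs for G = albedo * normal at every pixel.
    G, *_ = np.linalg.lstsq(light_dirs, obs, rcond=None)   # (3, h*w)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-12)
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)

# Synthetic test: a tilted plane with constant albedo under four lights.
n_true = np.array([0.2, -0.1, 0.97])
n_true /= np.linalg.norm(n_true)
albedo_true = 0.8
lights = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.866],
    [0.0, 0.5, 0.866],
    [-0.5, -0.5, 0.707],
])
lights /= np.linalg.norm(lights, axis=1, keepdims=True)

h, w = 4, 4
shading = albedo_true * (lights @ n_true)                  # (4,), all positive
images = np.tile(shading[:, None, None], (1, h, w))

normals, albedo = lambertian_photometric_stereo(images, lights)
print(normals[0, 0], albedo[0, 0])
```

Because the solution depends linearly on `light_dirs`, any error in the assumed light geometry or intensity propagates directly into the normals, which is why the calibration issues described above matter.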
Accurate 3D profiling of non-Lambertian surfaces, particularly those exhibiting high specularity, presents considerable challenges for conventional optical metrology. Methods such as FPP, while effective for diffuse surfaces, often suffer from image saturation in high-gloss regions, leading to data voids and incomplete reconstructions [22,23]. Photometric stereo, conversely, operates on surface reflectance properties, enabling the recovery of high-fidelity surface normals and fine texture details even on non-diffuse surfaces by analyzing images captured under varying illumination conditions [24]. Although traditional photometric stereo methodologies were constrained by the Lambertian reflectance assumption [25], recent advances leveraging deep learning have demonstrated the capability to infer surface normals directly from observed images for general non-Lambertian surfaces with unknown isotropic Bidirectional Reflectance Distribution Functions (BRDFs) [26,27,28]. Concurrently, Multi-View Stereo (MVS), a fundamental problem in computer vision, aims to recover dense 3D structure from calibrated image sets [25,29]. However, conventional MVS algorithms, which often rely on hand-crafted photo-consistency metrics, degrade in regions characterized by low texture, specularity, or high reflectance, where local features lack discriminative power [30,31]. Significant progress has been made through end-to-end deep learning frameworks like MVSNet, which construct a cost volume from deep features warped via differentiable homography and subsequently regularize it using 3D convolutions to infer depth, markedly improving reconstruction quality [32]. For complex non-Lambertian objects, although FPP establishes a reliable global geometric structure, its spatial resolution remains dependent on the fringe frequency, which makes the detailed recovery of micro-textures challenging [33,34].
In comparison, photometric stereo effectively captures pixel-wise surface normals and high-frequency details from shading information, serving as a complementary constraint for the depth data. Since photometric stereo resolves local normals on challenging surfaces while FPP determines the global geometry, their integration offers a viable pathway for achieving high-accuracy surface reconstructions [35,36]. Recent contributions exemplify advancements in related domains: Wang et al. integrated FPP with near-field PS (NPS) within a robotic system, employing self-supervision and a novel geometric descriptor incorporating surface normals for robust point cloud registration on reflective surfaces [37]. Jian et al. developed a task-specific NPS approach, utilizing pre-training on synthetic data and fine-tuning on high-precision real data to achieve micro- and sub-micrometric texture measurement on metallic surfaces [33]. Within the MVS domain, the seminal MVSNet architecture [32] demonstrated scalable, high-accuracy reconstruction through differentiable homography warping and 3D regularization, while subsequent work like Fast-MVSNet [38] introduced sparse-to-dense strategies and learned propagation to enhance computational efficiency without compromising accuracy. These developments underscore the potential for synergistic approaches combining multi-view geometry and reflectance-based analysis for comprehensive 3D measurement.
To address object motion and specular reflections simultaneously, this paper presents a method that integrates single-shot structured light projection with single-source, dynamic Near-field Photometric Stereo (NFPS) for 3D reconstruction. The workflow begins with the projection of a single structured light pattern onto the moving object. This step yields an initial, albeit incomplete, surface geometry via FPP, primarily covering the non-specular regions of the surface. The structured light projector is then deactivated, and a single point light source provides illumination. Under this condition, a sequence of at least three photometric images is captured while the object continues its motion. The SuperGlue feature matching algorithm then establishes sparse correspondences between the acquired photometric images, from which the object's relative pose between sequential frames is estimated. This relative motion is essential to the reconstruction, so the method is restricted to dynamic scenes. The recovered motion information subsequently serves as a constraint for a temporal depth-photometric stereo network. The network is designed to directly fuse the initial depth map (from FPP) with the temporal photometric image sequence, thereby resolving depth ambiguities within the specular regions, while enforcing consistency with the depth measurements from the initial structured light projection at the boundaries of the valid FPP data regions. Finally, the network generates a complete and globally consistent point cloud.
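To make the pose-estimation step concrete, the following is a minimal sketch of recovering a rigid transform from matched points with the Kabsch (SVD-based) algorithm. It is an illustration only: the text above does not state how the pose is computed from the SuperGlue matches, so the sketch assumes the matched keypoints have already been lifted to 3D coordinates (for example, using the initial FPP depth), and the function name `rigid_transform_from_matches` and the test values are hypothetical.

```python
import numpy as np

def rigid_transform_from_matches(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t.

    src, dst : (n, 3) arrays of matched 3D points, n >= 3, not collinear
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the SVD solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic test: rotate 10 degrees about z and translate slightly.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
angle = np.deg2rad(10.0)
c, s = np.cos(angle), np.sin(angle)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.005])
dst = src @ R_true.T + t_true

R_est, t_est = rigid_transform_from_matches(src, dst)
print(np.round(R_est, 4), np.round(t_est, 4))
```

With real matches, outlier correspondences would need to be rejected first (for example, with a RANSAC loop around this solver); that step is omitted here for brevity.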