Snapshot HoloSpec: dispersion-coded 4D feature learning for waterborne particle monitoring

Plastic pollution has emerged as one of humanity’s most pressing environmental challenges. Global plastic consumption reached 460 million tons in 2019 and is projected to surge to 1.31 billion tons by 2060 [1]. Over time, plastic debris undergoes fragmentation into smaller particles, forming microplastics (MPs, 1 µm–5 mm) and nanoplastics [2]. These particles, alongside natural and anthropogenic particulates such as mineral sediments, organic debris, and aquatic organisms like water fleas, collectively contribute to the complex matrix of water-suspended particles. Within this matrix, MPs are particularly concerning because they degrade slowly, are highly mobile, and have strong adsorption, enabling pervasive distribution across terrestrial, aquatic, and atmospheric ecosystems [3].

MP-biota interactions amplify these risks: MPs adsorb onto cell surfaces, disrupting energy and nutrient transfer [4], while ingestion by organisms ranging from invertebrates to marine mammals impairs physiological functions. Consequently, MP pollution has escalated into a global crisis, demanding urgent advancements in detection methodologies capable of differentiating MPs from co-occurring particles (e.g. mineral grains, organic aggregates, or zooplankton) and quantifying their abundance, morphology, size distribution, and chemical composition. This precise differentiation and classification are critical, as misidentification can lead to significant over- or underestimation of pollution levels. Ultimately, robust data on these parameters is fundamental to understanding the origin, transport routes, and environmental consequences of MP pollution [5].

Conventional spectroscopic techniques, such as Fourier-transform infrared and Raman spectroscopy, remain the gold standard for MP identification [6–8]. However, they suffer from limitations in field applicability, throughput, and multimodal analysis. Laboratory-based approaches require costly equipment and prolonged scanning times, and they lack simultaneous spatial-spectral resolution, rendering them impractical for environmental samples containing thousands of particles [9]. This challenge is especially pronounced in natural waters, where MPs coexist with a diverse array of suspended particulates. Consequently, researchers have highlighted the need for low-cost, high-throughput systems that integrate morphological, spatial, and spectral data to address this complexity, aiming to reduce operational workload, cost, and identification time while maximizing information quality [10–13].

Spectral imaging offers a promising solution by combining digital imaging with spectroscopic analysis to concurrently quantify particle quantity, size, shape, and composition [3, 14, 15]. However, conventional systems trade spatial or spectral resolution for speed, relying on intricate optical components, such as filter arrays and coded apertures, that inflate costs and complexity [16, 17]. Moreover, handling the large volumes of data generated by spectral imaging poses significant challenges. Additionally, traditional intensity-based spectral imaging systems inherently overlook the phase information of incident light, leading to the loss of potentially critical structural and material details of the specimens.

To overcome these limitations, we develop a dispersion-based snapshot hyperspectral imaging system integrated with digital holography. Digital holography encodes the rich phase information of the object within the intensity pattern of the hologram, which contains interference fringes [18, 19]. It enables the simultaneous acquisition of both intensity and phase information using conventional sensors, a capability that proves instrumental for materials monitoring [20–23]. The combination of holography with other techniques, such as polarization imaging [24, 25] or Raman spectroscopy [26], has demonstrated feasibility in multimodal MP identification. However, reliance solely on polarization data, or the use of dual sensors with an external spectrometer, limits the precision, efficiency, and practicality of these approaches. This underscores the need for advancements that enhance high-throughput capabilities and enable comprehensive polymer identification.

In this paper, we present a dispersion-based snapshot method combining digital holography with spectroscopy for real-time material monitoring. The system uses a single detector to capture the compressed four-dimensional (4D) signal (3D spatial and 1D spectral) encoded by the simplified optical system called HoloSpec [27]. By analyzing the raw images using a learning-based method, we can obtain 3D spatial information and classification of the sample, enabling the monitoring of material size, number, shape, and chemical composition. This system offers several important advantages, including significant cost savings, enhanced system portability, and the ability to rapidly recover the 3D holographic intensity and class identity of the target object through snapshot computational methods. The system is adaptable for real-time, field use in particle monitoring, and it combines deep learning analysis. This positions it as a novel tool for environmental diagnostics and a practical bridge from the lab to in situ applications. By unifying morphological and chemical insights, this platform advances the study of water-suspended particles, prioritizing MPs while accounting for the ecological context of coexisting contaminants.

2.1. Schematic of snapshot HoloSpec

The HoloSpec setup, as shown in figure 1(a), uses a compact design for multi-dimensional imaging. Two orthogonally aligned light sources, combined by a beam splitter (BS), are directed onto the specimen. The first beam is a 532 nm single-longitudinal-mode continuous-wave laser that forms the lensless in-line digital holography channel [23]; its interference with the unscattered reference wave produces a hologram that encodes the sample’s phase information. The second beam originates from a broadband white light (350–800 nm LED) and furnishes a spectral dimension: wavelength-dependent variations in the transmitted intensity provide chemical contrast that augments the classification capability of the system. After passing through the sample, the superimposed optical field is angularly dispersed by a miniature triangular prism (BRP1112, $12.7\,\times\,$ mm), and the resulting spatial-spectral distribution is recorded by a 2D complementary metal-oxide semiconductor (CMOS) sensor (MV-CB120-10UM-C). Figure 1(b) summarizes the proposed workflow: following preprocessing of the raw snapshot, multimodal information of the sample is reconstructed by jointly applying physics-based back-propagation and inference from a trained neural network. The classification prowess of the HoloSpec system arises from its unique dispersive-encoding mechanism coupled with deep learning. The prism maps the sample’s broadband transmission spectrum into wavelength dependent spatial shifts on the sensor. Consequently, the material’s spectral profile is not recorded as a conventional 1D curve but is embedded as a distinctive 2D spatial-intensity texture. Concurrently, the same snapshot captures the holographic interference pattern, encoding the particle’s 3D morphology and refractive-index variations. 
The neural network is trained to jointly decode these fused cues, learning to associate the composite patterns that arise from both spectral absorption/scattering and physical structure with specific material identities.

Figure 1. The proposed monitoring method. (a) Experimental setup. A 532 nm single-longitudinal-mode laser and a broadband white LED simultaneously illuminate samples placed on a glass slide; the transmitted light is dispersed by a prism, and the dispersed image is captured by a monochrome CMOS sensor. (b) Schematic diagram of our snapshot HoloSpec system.

2.2. Samples

Experiments were conducted using well-labeled water fleas, environmental impurities, and MP films, including polymethyl methacrylate (PMMA), polycarbonate (PC), polypropylene (PP), and polyvinyl chloride (PVC). The number of samples prepared for each type is shown in table 1. All samples ranged in size from 1 to 5 mm; the MP films were cut into squares with dimensions in this range. These MPs represent the most common types of plastics found in the environment, particularly in aquatic ecosystems [28, 29].

Table 1. Samples used in experiments.

| Particle type | Description | Number of samples |
|---|---|---|
| Organic carbon | Water flea | 20 |
| Impurity | Gravel, soil, leaf | 20 |
| Microplastics | PP | 30 |
| Microplastics | PC | 30 |
| Microplastics | PMMA | 30 |
| Microplastics | PVC | 30 |

To simulate in situ MPs detection in aquatic environments, the experimental samples were supplemented with water fleas and environmental impurities, including gravel, soil particles, and leaf debris. As a highly abundant zooplankton species in freshwater ecosystems, water fleas are widely distributed across various lentic and slow-flowing water bodies [30]. Moreover, their body size range (0.2–5 mm) closely overlaps with that of MPs and environmental impurities.

2.3. Data acquisition and pre-processing

We utilized the described optical system to collect data for all sample types and constructed a well-labeled HoloSpec dataset for neural network training and particle classification. Specifically, a total of 165, 172, 152, 140, 134, and 132 raw images were captured for PC, PMMA, PP, PVC, water flea, and environmental impurity samples, respectively. To enable a comparative evaluation of the proposed method, corresponding in-line holographic images (Holo) were also acquired by deactivating the LED light source while maintaining the samples in identical positions.

Representative examples of both HoloSpec and Holo images for each sample type are provided in figures 2(a) and (c), respectively. Zooplankton, such as water fleas, and environmental impurities are relatively easy to classify owing to their distinct morphological features and optical transmission properties. However, the four types of MP particles show no significant visual differences, making their classification very challenging.

Figure 2. Examples of (a) HoloSpec raw images, (b) pre-processed images and (c) holograms for comparison for each particle type.


In the HoloSpec imaging mode, white-light illumination introduces background signals that elevate overall image brightness, thereby reducing the sharpness of interference fringes and impairing image clarity. To address these effects and ensure a fair comparative assessment, all images underwent a pre-processing procedure in which brightness and contrast were normalized within each category. This adjustment harmonizes intensity distributions across images and mitigates illumination-related artifacts, promoting consistency and robustness in downstream analysis. Specifically, each HoloSpec frame was linearly normalized to match the global luminance statistics of its corresponding Holo image pair. We compute the mean and standard deviation of pixel intensities for the source (HoloSpec) and reference (Holo) images, and the brightness and contrast of the source image are adjusted via the formula:

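$$I_{\mathrm{src}}^{\mathrm{norm}} = \frac{I_{\mathrm{src}} - \mu_{\mathrm{src}}}{\sigma_{\mathrm{src}}}\,\sigma_{\mathrm{ref}} + \mu_{\mathrm{ref}} \qquad (1)$$

where $I_{\mathrm{src}}$ and $I_{\mathrm{ref}}$ represent the pixel intensity arrays of the source images and reference images, while $\mu$ and $\sigma$ (with the corresponding subscripts) are their respective means and standard deviations. The pre-processed images $I_{\mathrm{src}}^{\mathrm{norm}}$ are used for the later training. Examples of pre-processed images are shown in figure 2(b).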

This method ensures robust pre-processing of images, enabling more accurate analysis in environments with variable lighting conditions.
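The mean and standard deviation matching described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `match_mean_std`, the `eps` guard, and the synthetic stand-in images are assumptions introduced here.

```python
import numpy as np


def match_mean_std(source, reference, eps=1e-12):
    """Rescale `source` so its global mean/std match those of `reference`."""
    source = np.asarray(source, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mu_src, sigma_src = source.mean(), source.std()
    mu_ref, sigma_ref = reference.mean(), reference.std()
    # Standardize the source, then apply the reference contrast and brightness.
    # `eps` (an assumption) guards against division by zero for a flat image.
    return (source - mu_src) / (sigma_src + eps) * sigma_ref + mu_ref


# Synthetic stand-ins for a Holo frame and its brighter, lower-contrast
# HoloSpec counterpart (illustrative values, not measured data).
rng = np.random.default_rng(0)
holo = rng.normal(0.40, 0.10, size=(64, 64))
holospec = 0.5 * holo + 0.45 + rng.normal(0.0, 0.01, size=(64, 64))

normalized = match_mean_std(holospec, holo)
print(normalized.mean() - holo.mean(), normalized.std() - holo.std())
```

After the call, the global mean and standard deviation of `normalized` agree with those of the reference frame up to floating-point error, which is the sense in which brightness and contrast are harmonized.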

2.4. Joint characterization

This method facilitates the reverse propagation of HoloSpec images containing dispersive holograms using the optical system model in conjunction with the angular spectrum formula. The retrieved optical field information enables the 3D reconstruction of samples, providing a comprehensive characterization of their spatial features.

Additionally, raw images can be directly classified through a neural network, further obtaining the 1D class information. To validate the effectiveness of our method, we employed both supervised and unsupervised learning strategies to classify and analyze water-suspended particles in the HoloSpec and conventional Holo datasets. In this study, the feature extraction strategies for supervised and unsupervised learning were designed based on the specific methodological requirements of each approach. For supervised learning, we trained a multilayer perceptron (MLP) classification model using labeled data to quantitatively evaluate the classification performance across different datasets. For unsupervised learning, approaches such as k-means clustering were used to explore the intrinsic feature distributions within the data, demonstrating the capability of HoloSpec’s spectral information to automatically distinguish unknown particles. On a standard consumer-grade GPU (NVIDIA GeForce RTX 4090), the average processing time per image is approximately 1.1 s, enabling near real-time monitoring capabilities in field applications.

2.4.1. Supervised learning

For supervised learning, we used a two-stage pipeline to classify six types of microparticles: PC, PMMA, PP, PVC, impurities, and water fleas, based on the preprocessed HoloSpec and Holo datasets. The pipeline consisted of a fixed deep feature extractor followed by a lightweight classifier.

We employed a ResNet50 [31] model pre-trained on ImageNet as a frozen backbone to extract high-dimensional features. Specifically, the global average pooling layer of ResNet50 was used to generate 2048-dimensional feature representations for each image without any fine-tuning. This approach retained the strong representational power of the pre-trained model while focusing on feature selection for the classification task. The extracted features were subsequently fed into an MLP classifier implemented in scikit-learn, which consisted of two hidden layers with 256 and 128 units, respectively, and utilized ReLU activations. While deep learning has expanded into diverse architectures [32], MLPs remain widely adopted for high-dimensional feature classification due to their interpretability and efficiency [33]. The MLP was trained using the Adam optimizer with a learning rate of 0.001 and early stopping to prevent overfitting. The training process was capped at 50–100 epochs, with a fixed random seed to ensure reproducibility.

Model evaluation was conducted using stratified five-fold cross-validation with shuffled splits. To standardize feature scales and prevent data leakage, a standard scaler was fit on the training partition of each fold and applied to both the training and test data within that fold. For each fold, class probabilities and predictions were generated on the held-out split, and the model with the best performance on the test set was retained. This workflow not only maximized the retention of fine-grained features essential for microparticle classification but also leveraged the regularization mechanisms in the supervised learning framework to mitigate overfitting risks associated with high-dimensional features.
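The evaluation protocol can be illustrated with a short scikit-learn sketch. It is hedged in several ways: random synthetic vectors stand in for the 2048-dimensional ResNet50 features, the sizes and `batch_size=32` are placeholders chosen for a fast demonstration, and the epoch cap is set to 100. Only the layer sizes, ReLU activation, Adam optimizer, learning rate, early stopping, and stratified five-fold splitting follow the description above.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features (assumption): six well-separated classes.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 6, 40, 64
centers = rng.normal(0.0, 3.0, size=(n_classes, n_features))
X = np.vstack(
    [c + rng.normal(0.0, 1.0, size=(n_per_class, n_features)) for c in centers]
)
y = np.repeat(np.arange(n_classes), n_per_class)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = np.empty_like(y)
for train_idx, test_idx in cv.split(X, y):
    # The scaler sits inside the pipeline, so it is fit on the training
    # partition only and then applied to the held-out partition.
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(256, 128),
            activation="relu",
            solver="adam",
            learning_rate_init=0.001,
            early_stopping=True,
            batch_size=32,  # assumption, not stated in the text
            max_iter=100,
            random_state=0,
        ),
    )
    model.fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = model.predict(X[test_idx])

macro_f1 = f1_score(y, y_pred, average="macro")
print(f"macro F1 on synthetic data: {macro_f1:.4f}")
```

Wrapping the scaler and classifier in one pipeline is a design choice that makes leakage from the held-out fold structurally impossible, since `fit` never sees the test indices.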

2.4.2. Unsupervised learning

For unsupervised learning, we employed AlexNet [34] and ResNet18 as pre-trained convolutional neural network (CNN) feature extractors, followed by k-means clustering to analyze the intrinsic structure of the data. These architectures were selected to extract low-dimensional features suitable for clustering, considering the sensitivity of clustering algorithms to high-dimensional data. By reducing the feature dimensionality, we mitigated the ‘curse of dimensionality’, enhanced computational efficiency, and leveraged shallow network features to better reveal the underlying clustering structure of the data.

The AlexNet model was truncated to retain only its convolutional subnetwork, consisting of five convolutional layers interleaved with ReLU activations and max-pooling layers. The final convolutional feature map was processed using adaptive average pooling to compress it into a $1 \times 1$ spatial resolution, resulting in a 256-dimensional feature vector for each image. For ResNet18, the network was modified by removing the dense layers, retaining eight residual blocks, each containing two $3 \times 3$ convolutional layers with identity shortcut connections. The output of the last residual block, a 512-channel activation map, was subsequently reduced into a 512-dimensional feature vector using global adaptive average pooling.

All images were resized to $512 \times 512$ pixels prior to feature extraction. Using the frozen weights of the pre-trained networks, each image was transformed into a high-dimensional feature vector. These feature vectors were then clustered using k-means, with the number of clusters k = 6, corresponding to the six target categories: PC, PMMA, PP, PVC, impurities, and water fleas. To quantitatively evaluate the clustering results, the Hungarian algorithm was applied to optimally match the clustering outputs with ground-truth labels.

To ensure reproducibility, a fixed random seed of 42 was used throughout the analysis. Additionally, a batch size of 32 was adopted for processing, balancing computational efficiency and memory usage. This approach effectively demonstrated the ability of shallow and compact feature representations to uncover the data’s inherent clustering structures while maintaining computational feasibility.
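The clustering and label-matching step can be sketched as follows. The example is an illustration under stated assumptions: synthetic Gaussian blobs replace the CNN feature vectors, and the helper name `map_clusters_to_labels` is hypothetical. The Hungarian algorithm is taken from SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix, f1_score


def map_clusters_to_labels(y_true, cluster_ids, n_classes):
    """Relabel clusters so that the total number of matched samples is maximal."""
    # Contingency table: rows are true classes, columns are cluster indices.
    table = confusion_matrix(y_true, cluster_ids, labels=np.arange(n_classes))
    # The Hungarian algorithm minimizes cost, so negate the counts.
    true_idx, cluster_idx = linear_sum_assignment(-table)
    mapping = {int(c): int(t) for t, c in zip(true_idx, cluster_idx)}
    return np.array([mapping[int(c)] for c in cluster_ids])


# Synthetic stand-in features (assumption): six separated blobs.
rng = np.random.default_rng(42)
n_classes, n_per_class, n_features = 6, 50, 16
centers = rng.normal(0.0, 5.0, size=(n_classes, n_features))
features = np.vstack(
    [c + rng.normal(0.0, 1.0, size=(n_per_class, n_features)) for c in centers]
)
y_true = np.repeat(np.arange(n_classes), n_per_class)

kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(features)
y_pred = map_clusters_to_labels(y_true, cluster_ids, n_classes)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro F1 after Hungarian matching: {macro_f1:.4f}")
```

Because k-means cluster indices are arbitrary, the one-to-one matching is what makes precision, recall, and F1 meaningful for an unsupervised result.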

2.4.3. Dispersive hologram reconstruction

During holographic reconstruction, the spectrally dispersed signal has a negligible effect. First, the optical power of the 532 nm laser used for holography significantly outweighs that of the broadband LED, ensuring that the recorded interference fringes dominate the intensity pattern. Second, due to the linear dispersion of the prism, the spectral information is smeared into a smooth, low frequency background across the sensor; it does not introduce high spatial frequency modulations that would corrupt the fine fringe structure essential for phase retrieval. Third, the preprocessing step (equation (1)) actively suppresses this uniform background and enhances fringe contrast. Therefore, the dispersive holographic interference fringes captured by the CMOS sensor can be expressed as:

$$I_{\mathrm{H}}(x, y) = \left|E_{\mathrm{o}}(x, y+\delta y) + E_{\mathrm{r}}(x, y+\delta y)\right|^2 \qquad (2)$$

where $I_{\mathrm{H}}(x, y)$ represents the intensity distribution of the hologram recorded on the imaging plane with coordinates $(x, y)$. Here, $\delta y$ denotes the dispersion displacement caused by the prism, $E_{\mathrm{o}}$ is the diffracted wave of the object, and $E_{\mathrm{r}}$ corresponds to the un-diffracted plane reference wave.

After deriving $I_{\mathrm{H}}(x, y-\delta y)$ from the modeled optical system based on $I_{\mathrm{H}}(x, y)$ and the pre-processing mentioned above, the hologram can be numerically back-propagated to a specific reconstruction plane with coordinates $(x, y, z)$, allowing the complex optical field of the target object, $\Gamma(x, y, z)$, to be retrieved. In this paper, we adopt the angular spectrum algorithm [19, 35], a widely used holographic reconstruction method, to reconstruct the optical field. It is mathematically described as:

$$\Gamma(x, y, z) = \mathcal{F}^{-1}\left\{\mathcal{F}\left\{I_{\mathrm{H}}(x, y-\delta y)\right\}\exp\left[\mathrm{i}\,\frac{2\pi z}{\lambda}\sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2}\right]\right\} \qquad (3)$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the Fourier transform and its inverse, respectively, $f_x$ and $f_y$ are the transverse spatial frequencies, $\lambda$ is the illumination wavelength, and $z$ is the signed propagation distance to the reconstruction plane. From the retrieved complex optical field $\Gamma(x,y,z)$, both the intensity $I = |\Gamma|^2$ and the phase $\varphi(x,y,z) = \arg(\Gamma)$ can be obtained. The phase is calculated as $\varphi = \tan^{-1}\frac{\mathrm{Im}(\Gamma)}{\mathrm{Re}(\Gamma)}$, where ‘Im’ and ‘Re’ denote the imaginary and real parts, respectively. The phase map directly encodes the optical path length difference induced by the specimen, thereby providing quantitative information about its three-dimensional structure and refractive index distribution. In this way, we can reconstruct the 3D spatial information of the specimen.
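A minimal NumPy sketch of angular spectrum propagation is given below to make the reconstruction step concrete. It is an illustration under stated assumptions rather than the authors' implementation: the pixel pitch, propagation distance, grid size, and synthetic test object are placeholders, the dispersion shift is taken as already corrected, and evanescent components are simply discarded.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pixel_pitch, z):
    """Propagate a sampled 2D field by a signed distance z (metres)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    propagating = arg > 0
    kz = 2.0 * np.pi / wavelength * np.sqrt(np.where(propagating, arg, 0.0))
    # Transfer function; evanescent components are set to zero.
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


wavelength = 532e-9   # laser wavelength given in the text
pixel_pitch = 1.85e-6  # assumed sensor pixel pitch
z = 5e-3               # assumed sample-to-sensor distance

# Synthetic object (assumption): an absorbing, phase-shifting disc.
n, half = 128, 64
yy, xx = np.mgrid[-half:half, -half:half]
obj = np.ones((n, n), dtype=complex)
obj[xx**2 + yy**2 < 10**2] = 0.2 * np.exp(1j * 0.5)

# Simulate the in-line hologram, then back-propagate it.
sensor_field = angular_spectrum_propagate(obj, wavelength, pixel_pitch, z)
hologram = np.abs(sensor_field) ** 2
gamma = angular_spectrum_propagate(hologram, wavelength, pixel_pitch, -z)
intensity = np.abs(gamma) ** 2
phase = np.angle(gamma)

# Sanity check: propagating the complex field back recovers the object.
roundtrip = angular_spectrum_propagate(sensor_field, wavelength, pixel_pitch, -z)
print(np.max(np.abs(roundtrip - obj)))
```

Back-propagating the recorded intensity alone, as done for `gamma`, also reconstructs the out-of-focus twin image that is inherent to in-line holography; the round-trip check uses the full complex field and is therefore exact up to numerical error.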

3.1. Supervised classification

To validate the material recognition capability of our system, we conducted supervised classification experiments on the HoloSpec dataset using a ResNet50-based pipeline and performed the same procedure on the Holo dataset for comparison.

Figure 3 compares the results between the two datasets across three metrics: (a) feature-wise significance, (b) intra-class Euclidean distances, and (c) confusion matrices. In figure 3(a), HoloSpec has higher feature significance values in the leading dimensions, suggesting stronger alignment between the extracted ResNet50 features and class labels, which enhances discriminability. The intra-class distance analysis (figure 3(b)) reveals that HoloSpec features are more compact, with lower median distances and tighter interquartile ranges across most classes, though Holo shows marginally shorter extreme tails in some cases. This indicates better intra-class cohesion for HoloSpec. Besides, the confusion matrices in figure 3(c), generated using the ResNet50-MLP pipeline, demonstrate superior performance for HoloSpec, with fewer off-diagonal misclassifications and higher per-class precision and recall. These results suggest that ResNet50 effectively captures a small set of highly discriminative features, enabling rapid overall model convergence.

Figure 3. (a) Feature-wise significance ($-\log_{10} p$), (b) intra-class distance comparison and (c) confusion matrix for HoloSpec vs. Holo.


To further evaluate and quantify the classification performance of the supervised network on different sample types, we calculated the macro F1 score [36], as well as the precision, recall, and F1 score for each class in the HoloSpec dataset (table 2). The same analysis was performed on the Holo dataset for comparison. HoloSpec achieves a higher recall and F1 score for every class, especially for samples with indistinct shape features, such as MP particles; the only metric on which Holo scores higher is the precision for impurities (0.9841 versus 0.9699). It is noteworthy that the system remains effective for semi-transparent or opaque samples, such as water fleas and environmental impurities. Although these specimens may exhibit weak spectral transmission in the visible band, the in-line holographic geometry records their scattered wavefronts, producing interference patterns that encode distinct morphological features (e.g. sharp edges, surface texture, and shadowing effects). These holographically captured morphological signatures provide robust discriminative information, enabling accurate classification even in the absence of strong spectral contrast. Quantitatively, overall accuracy rises from 96.54% (864/895) on Holo to 98.10% (878/895) on HoloSpec, and macro F1 improves from 96.59% to 98.10%. Taken together, these converging results indicate that HoloSpec yields more separable representations and stronger downstream classification performance than Holo.
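As a worked check of the reported figures, the snippet below recomputes the overall accuracies from the quoted counts and the macro F1 as the unweighted mean of the per-class F1 values listed in table 2. The variable and function names are introduced here for illustration only.

```python
# Per-class F1 scores as listed in table 2.
holospec_f1 = {
    "PC": 0.9789, "PMMA": 0.9798, "PP": 0.9834,
    "PVC": 0.9964, "impurity": 0.9699, "water flea": 0.9773,
}
holo_f1 = {
    "PC": 0.9641, "PMMA": 0.9435, "PP": 0.9733,
    "PVC": 0.9893, "impurity": 0.9612, "water flea": 0.9658,
}


def macro_f1(per_class):
    """Macro F1: unweighted mean of the per-class F1 scores."""
    return sum(per_class.values()) / len(per_class)


# Overall accuracy from the counts quoted in the text.
acc_holo = 864 / 895
acc_holospec = 878 / 895

print(f"accuracy: Holo {100 * acc_holo:.2f}%, HoloSpec {100 * acc_holospec:.2f}%")
print(f"macro F1: Holo {macro_f1(holo_f1):.4f}, HoloSpec {macro_f1(holospec_f1):.4f}")
```

The mean of the tabulated values reproduces the reported macro F1 closely; a small residual difference can remain because the reported value may have been computed from unrounded or per-fold scores.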

Table 2. Supervised classification performance (precision $\mathcal{P}$, recall $\mathcal{R}$, and F1 score) of ResNet50 for each class on HoloSpec and Holo datasets.

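| Class | HoloSpec $\mathcal{P}$ | HoloSpec $\mathcal{R}$ | HoloSpec F1 | Holo $\mathcal{P}$ | Holo $\mathcal{R}$ | Holo F1 |
|---|---|---|---|---|---|---|
| PC | 0.9759 | 0.9818 | 0.9789 | 0.9527 | 0.9758 | 0.9641 |
| PMMA | 0.9714 | 0.9884 | 0.9798 | 0.9176 | 0.9709 | 0.9435 |
| PP | 0.9933 | 0.9737 | 0.9834 | 0.9865 | 0.9605 | 0.9733 |
| PVC | 0.9929 | 1.0000 | 0.9964 | 0.9858 | 0.9929 | 0.9893 |
| Impurity | 0.9699 | 0.9699 | 0.9699 | 0.9841 | 0.9394 | 0.9612 |
| Water flea | 0.9847 | 0.9699 | 0.9773 | 0.9845 | 0.9478 | 0.9658 |
| Macro F1 | | | 0.9810 | | | 0.9659 |

3.2. Unsupervised classification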

We further selected two classic pre-trained CNN architectures, AlexNet and ResNet18, to perform feature extraction on both datasets, followed by k-means clustering for unsupervised classification. A series of visual analyses was conducted on the clustering results. The t-SNE projections [37] shown in figure 4(a) reveal that the clusters formed by HoloSpec samples in the embedded space are more distinct and better separated than those formed by Holo samples. Further feature-significance analysis (figure 4(b)) indicates that, on the HoloSpec dataset, both CNN models extract a greater number of high-amplitude, statistically significant features, demonstrating that multi-channel inputs facilitate the capture of more discriminative representations. In contrast, under identical significance thresholds, the Holo dataset yields substantially fewer significant features, highlighting its limitations in feature expressiveness.

Figure 4. (a) t-SNE projection of extracted features, (b) feature-wise significance ($-\log_{10} p$) and (c) intra-class distance comparison for HoloSpec vs. Holo datasets.


The compactness analysis of the feature space further supports this conclusion (figure 4(c)). Across both AlexNet and ResNet18, the intra-class feature distributions learned from HoloSpec are consistently more compact than those learned from the original Holo dataset. In the violin plots, the median distance to the class centroid for HoloSpec is roughly 10% lower, the interquartile range is 15%–20% narrower, and the long-tail extent is markedly reduced, indicating fewer hard or aberrant samples. Taken together, these results show that HoloSpec offers a cleaner class structure and better separability, making it a more suitable dataset for representation-learning tasks.
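The compactness statistics discussed above (distance to class centroid, median, and interquartile range) can be computed as in the following sketch. The function names and the synthetic "loose" and "tight" feature sets are assumptions made for illustration; they do not reproduce the measured distributions.

```python
import numpy as np


def intra_class_distances(features, labels):
    """Euclidean distance of every sample to its own class centroid."""
    distances = {}
    for c in np.unique(labels):
        members = features[labels == c]
        centroid = members.mean(axis=0)
        distances[int(c)] = np.linalg.norm(members - centroid, axis=1)
    return distances


def summarize(d):
    """Median and interquartile range of a 1D array of distances."""
    q1, median, q3 = np.percentile(d, [25, 50, 75])
    return {"median": float(median), "iqr": float(q3 - q1)}


# Synthetic features (assumption): identical class centres, with the
# "tight" set having 20% less within-class spread than the "loose" set.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 6, 100, 32
centers = np.repeat(rng.normal(0.0, 5.0, size=(n_classes, n_features)),
                    n_per_class, axis=0)
noise = rng.normal(0.0, 1.0, size=centers.shape)
labels = np.repeat(np.arange(n_classes), n_per_class)
loose = centers + noise
tight = centers + 0.8 * noise

loose_stats = {c: summarize(d) for c, d in intra_class_distances(loose, labels).items()}
tight_stats = {c: summarize(d) for c, d in intra_class_distances(tight, labels).items()}
for c in range(n_classes):
    print(c, tight_stats[c], loose_stats[c])
```

A lower median and a narrower interquartile range for one feature set, class by class, is the quantitative counterpart of the tighter violin plots described in the text.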

Table 3 provides a detailed evaluation of clustering performance metrics. Using the same clustering procedure, the macro F1 scores for HoloSpec reached 65.62% (AlexNet) and 65.91% (ResNet18), representing improvements of 15.22 and 13.33 percentage points, respectively, over the Holo dataset scores of 50.40% and 52.58%. Furthermore, the per-class F1 scores in HoloSpec are higher than those in Holo in all cases except PVC with ResNet18 features (0.2885 versus 0.3444). The corresponding confusion matrices (figure 5) demonstrate that HoloSpec achieves 15–25 additional correctly classified samples on the diagonal, with significantly fewer misclassifications, further validating its superior performance in material classification. These results indicate that HoloSpec not only generates more discriminative features and effectively compresses intra-class variance but also significantly improves clustering accuracy.

It is important to clarify that common MPs do not exhibit sharp, fingerprint-like absorption lines in the visible range. Instead, our method leverages their subtle, material-specific variations in broadband absorption and scattering profiles. These continuous spectral contrasts are optically encoded into 2D spatial intensity patterns via dispersion. Crucially, this spectral information is analyzed in conjunction with the rich morphological and refractive index data simultaneously captured in the holographic channel. This dual-modality fusion provides the comprehensive feature set that enables the system to accurately differentiate between particle types, as demonstrated by its superior performance over Holo baselines.

Figure 5. Confusion matrices on the HoloSpec and Holo datasets under unsupervised learning.


Table 3. Comparison of F1 scores for HoloSpec and Holo datasets using AlexNet and ResNet18 features.

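| | HoloSpec (AlexNet) | HoloSpec (ResNet18) | Holo (AlexNet) | Holo (ResNet18) |
|---|---|---|---|---|
| Macro F1 | 0.6562 | 0.6591 | 0.5040 | 0.5258 |
| PC | 0.3260 | 0.5456 | 0.2980 | 0.000 |
| PMMA | 0.6115 | 0.5568 | 0.4133 | 0.5518 |
| PP | 0.8874 | 0.8532 | 0.8262 | 0.6134 |
| PVC | 0.6076 | 0.2885 | 0.6040 | 0.3444 |
| Impurity | 0.7704 | 0.8055 | 0.5822 | 0.7595 |
| Water flea | 0.7344 | 0.9017 | 0.3002 | 0.8855 |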

To provide a more intuitive and quantitative comparison of the performance of the two datasets across supervised and unsupervised learning using three different network architectures, we present the overall accuracy ($\mathcal{A}$), precision ($\mathcal{P}$), recall ($\mathcal{R}$), and F1 scores in table 4. It can be observed that regardless of the network model used for training and testing, HoloSpec consistently demonstrates superior accuracy, precision, recall, and F1 scores. Moreover, the metrics are stable, indicating strong robustness. Notably, for unsupervised feature learning, HoloSpec achieves a significant performance improvement exceeding 10%.
