基于轻型自限制注意力的结构光相位及深度估计混合网络

朱新军 赵浩淼 王红一 宋丽梅 孙瑞群

引用本文: 朱新军, 赵浩淼, 王红一, 宋丽梅, 孙瑞群. 基于轻型自限制注意力的结构光相位及深度估计混合网络[J]. 中国光学(中英文), 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066
Citation: ZHU Xin-jun, ZHAO Hao-miao, WANG Hong-yi, SONG Li-mei, SUN Rui-qun. A hybrid network based on light self-limited attention for structured light phase and depth estimation[J]. Chinese Optics, 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066

基于轻型自限制注意力的结构光相位及深度估计混合网络

doi: 10.37188/CO.2023-0066
基金项目: 国家自然科学基金 (No. 61905178);天津市教委科研计划项目 (No. 2019KJ021)
详细信息
    作者简介:

    朱新军(1985—),男,山东临沂人,博士,副教授,硕士生导师,2008年于临沂师范学院获得学士学位, 2011年于山东理工大学获得硕士学位,2015年于天津大学获得博士学位,主要从事光学三维测量与智能计算成像的研究。E-mail:xinjunzhu@tiangong.edu.cn

  • 中图分类号: TP394.1;TH691.9

A hybrid network based on light self-limited attention for structured light phase and depth estimation

Funds: Supported by National Natural Science Foundation of China (No. 61905178); Science & Technology Development Fund of Tianjin Education Commission for Higher Education (No. 2019KJ021)
More Information
    Corresponding author: xinjunzhu@tiangong.edu.cn
  • 摘要:

    相位提取与深度估计是结构光三维测量中的重点环节,目前传统方法在结构光相位提取与深度估计方面存在效率不高、结果不够鲁棒等问题。为了提高基于深度学习的结构光重建效果,本文提出了一种基于轻型自限制注意力(Light Self-Limited-Attention,LSLA)的结构光相位及深度估计混合网络:构建一种CNN-Transformer混合模块,并将其嵌入U型架构中,实现CNN与Transformer的优势互补。将所提出的网络在结构光相位估计和结构光深度估计两个任务上进行实验,并与其他网络进行对比。实验结果表明:相比其他网络,本文所提出的网络在相位估计和深度估计的细节处理上更加精细;在结构光相位估计实验中,精度最高提升31%;在结构光深度估计实验中,精度最高提升26%。该方法提高了深度神经网络在结构光相位估计及深度估计中的准确性。
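    编者注:为便于理解摘要中“构建CNN-Transformer混合模块并嵌入U型架构”的组织方式,下面给出一个极简的示意性草图(假设使用PyTorch;其中 SimpleAttention 仅为标准自注意力的简化替代,并非论文中LSLA的精确实现;通道数、层数等均为假设值,不代表论文的实际配置):

```python
# 示意性草图(假设使用 PyTorch):仅演示“CNN 局部特征 + 注意力全局建模 + U 型编解码”的组合方式。
# SimpleAttention 是标准自注意力的简化替代,并非论文中 LSLA 的精确实现;各超参数均为假设值。
import torch
import torch.nn as nn


class SimpleAttention(nn.Module):
    """把特征图的每个像素视为一个 token 的单头自注意力(简化示意)。"""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        out = self.proj(out)
        return out.transpose(1, 2).reshape(b, c, h, w)


class HybridBlock(nn.Module):
    """CNN-Transformer 混合模块示意:深度卷积提取局部特征,注意力建模全局关系,FFN 融合。"""

    def __init__(self, dim):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # 深度卷积(局部)
        self.attn = SimpleAttention(dim)                             # 注意力(全局)
        self.ffn = nn.Sequential(nn.Conv2d(dim, dim * 2, 1), nn.GELU(),
                                 nn.Conv2d(dim * 2, dim, 1))         # 前馈网络

    def forward(self, x):
        x = x + self.local(x)
        x = x + self.attn(x)
        x = x + self.ffn(x)
        return x


class TinyHybridUNet(nn.Module):
    """只有一级下采样/上采样的 U 型结构示意,编码特征经跳跃连接与解码特征相加。"""

    def __init__(self, in_ch=1, out_ch=1, dim=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, dim, 3, padding=1)
        self.enc = HybridBlock(dim)
        self.down = nn.Conv2d(dim, dim * 2, 2, stride=2)
        self.mid = HybridBlock(dim * 2)
        self.up = nn.ConvTranspose2d(dim * 2, dim, 2, stride=2)
        self.dec = HybridBlock(dim)
        self.head = nn.Conv2d(dim, out_ch, 1)

    def forward(self, x):                            # x: 单通道条纹图
        e = self.enc(self.stem(x))
        m = self.mid(self.down(e))
        d = self.dec(self.up(m) + e)                 # 跳跃连接
        return self.head(d)                          # 输出与输入同分辨率,如相位分量或深度图


if __name__ == "__main__":
    net = TinyHybridUNet()
    y = net(torch.randn(1, 1, 64, 64))
    print(y.shape)                                   # torch.Size([1, 1, 64, 64])
```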

     

  • 图 1  FPP系统原理图

    Figure 1.  Schematic diagram of the FPP system

    图 2  网络结构图

    Figure 2.  Network structure diagram

    图 3  CNN-Transformer模块结构图

    Figure 3.  Structure of the CNN-Transformer module

    图 4  部分数据示例图。第一行为仿真数据,第二行为真实数据。(a)仿真条纹图;(b)仿真条纹图D;(c)仿真条纹图M;(d)仿真条纹图包裹相位;(e)真实条纹图;(f)真实条纹图D;(g)真实条纹图M;(h)真实条纹图包裹相位

    Figure 4.  Sample maps from the datasets. The first row shows simulation data and the second row shows real data. (a) Simulated fringe map; (b) D of the simulated fringe; (c) M of the simulated fringe; (d) wrapped phase of the simulated fringe; (e) real fringe map; (f) D of the real fringe; (g) M of the real fringe; (h) wrapped phase of the real fringe

    图 5  不同网络仿真和真实数据包裹相位对比。蓝色框为仿真数据,橙色框为真实数据。(a)UNet;(b)DPH;(c)R2UNet;(d)SUNet;(e)Ours;(f)标签

    Figure 5.  Comparison of different network simulation and real data wrapped phases. The blue boxes are the simulation data, and the orange boxes are the real data. (a) UNet; (b) DPH; (c) R2UNet; (d) SUNet; (e) Ours; (f) Label

    图 6  包裹相位结果曲线图。(a)仿真数据结果比较;(b)真实数据结果比较

    Figure 6.  Wrapped phase curves. (a) Comparison of simulation data results; (b) comparison of real data results

    图 7  生成数据集流程图。(a) 模型导入;(b) 调整大小;(c) 投影条纹

    Figure 7.  Flowchart of dataset generation. (a) Model import; (b) adjustment of the model size; (c) fringe projection
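    编者注:作为图7中“投影条纹”一步的补充说明,下面给出一段生成正弦条纹图的示意代码(假设条纹形式为 I = A + B·cos(2πfx + δ),其中分辨率、频率 f、相移 δ 等参数均为假设值,并非论文所用配置):

```python
import numpy as np

def fringe_pattern(width=640, height=480, frequency=1/32.0, phase_shift=0.0,
                   A=0.5, B=0.5):
    """生成沿 x 方向变化的正弦条纹图 I = A + B*cos(2*pi*f*x + delta)(参数均为假设值)。"""
    x = np.arange(width)
    row = A + B * np.cos(2 * np.pi * frequency * x + phase_shift)
    return np.tile(row, (height, 1))    # 每一行相同,形成竖直条纹
```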

    图 8  部分数据示例图。(a)仿真条纹图;(b)真实条纹图;(c)仿真深度图;(d)真实深度图

    Figure 8.  Sample maps in the dataset. (a) Simulated fringe map; (b) real fringe map; (c) simulated depth map; (d) real depth map

    图 9  不同方法深度估计视觉结果比较。蓝色框为仿真数据,橙色框为真实数据。(a) 输入数据; (b) UNet;(c) DPH;(d) R2UNet;(e) Ours;(f)标签

    Figure 9.  Comparison of the visual results of depth estimation by different methods. The blue boxes are the simulation data, and the orange boxes are the real data. (a) Input data; (b) UNet; (c) DPH; (d) R2UNet; (e) Ours; (f) Label

    表  1  不同包裹相位计算方法比较

    Table  1.   Comparison of the different wrapped phase calculation methods

    方法               MSE       时间t/s
    直接预测包裹相位     0.2833    5.89
    分别预测D、M        0.1739    11.7
    同时预测D、M        0.16806   7.54
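    编者注:表1中的 D、M 分别对应反正切函数的分母项与分子项(此处沿用条纹分析深度学习方法中常见的记号约定,属注释性假设):网络由条纹图预测出 D、M 后,包裹相位可按 φ = arctan(M/D) 求得,示意如下:

```python
import numpy as np

def wrapped_phase(M: np.ndarray, D: np.ndarray) -> np.ndarray:
    """由预测的分子项 M 与分母项 D 计算包裹相位(四象限反正切,取值范围 (-pi, pi])。"""
    return np.arctan2(M, D)
```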

    表  2  包裹相位预测方法比较

    Table  2.   Comparison of the wrapped phase prediction methods

    方法       仿真数据               真实数据
               MSE       时间t/s     MSE       时间t/s
    UNet       0.02658   6.67        0.16806   7.54
    DPH        0.02710   11.65       0.12974   11.78
    R2UNet     0.02734   13.69       0.12905   14.30
    SUNet      0.02717   7.95        0.14350   8.29
    Ours       0.02395   11.06       0.11622   11.67
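    编者注:各表中的 MSE 若按逐像素均方误差理解(这是注释性假设,原文未给出具体定义),其计算方式可示意如下:

```python
import numpy as np

def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """预测结果(包裹相位或深度图)与标签的逐像素均方误差。"""
    return float(np.mean((pred - gt) ** 2))
```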

    表  3  消融实验结果比较

    Table  3.   Comparison of ablation experiment results

    模型配置           MSE      时间t/s
    CMT               11.32    6.89
    CMT替换LSLA        9.17     6.45
    CMT替换FFN         11.34    5.54
    CMT+U形结构        8.94     9.68

    表  4  不同方法深度估计结果比较

    Table  4.   Comparison of the depth estimation results by different methods

    方法       仿真数据            真实数据
               MSE     时间t/s     MSE     时间t/s
    UNet       8.78    5.98        9.97    6.44
    DPH        8.03    8.66        9.86    10.59
    R2UNet     7.57    8.73        8.72    10.92
    Ours       6.43    8.09        7.64    8.44
  • [1] 左超, 张晓磊, 胡岩, 等. 3D真的来了吗?—三维结构光传感器漫谈[J]. 红外与激光工程,2020,49(3):0303001. doi: 10.3788/IRLA202049.0303001

    ZUO CH, ZHANG X L, HU Y, et al. Has 3D finally come of age?——An introduction to 3D structured-light sensor[J]. Infrared and Laser Engineering, 2020, 49(3): 0303001. (in Chinese) doi: 10.3788/IRLA202049.0303001
    [2] 王永红, 张倩, 胡寅, 等. 显微条纹投影小视场三维表面成像技术综述[J]. 中国光学,2021,14(3):447-457. doi: 10.37188/CO.2020-0199

    WANG Y H, ZHANG Q, HU Y, et al. 3D small-field surface imaging based on microscopic fringe projection profilometry: a review[J]. Chinese Optics, 2021, 14(3): 447-457. (in Chinese) doi: 10.37188/CO.2020-0199
    [3] 冯世杰, 左超, 尹维, 等. 深度学习技术在条纹投影三维成像中的应用[J]. 红外与激光工程,2020,49(3):0303018. doi: 10.3788/IRLA202049.0303018

    FENG SH J, ZUO CH, YIN W, et al. Application of deep learning technology to fringe projection 3D imaging[J]. Infrared and Laser Engineering, 2020, 49(3): 0303018. (in Chinese) doi: 10.3788/IRLA202049.0303018
    [4] SU X Y, CHEN W J. Fourier transform profilometry: a review[J]. Optics and Lasers in Engineering, 2001, 35(5): 263-284. doi: 10.1016/S0143-8166(01)00023-9
    [5] ZHENG D L, DA F P, KEMAO Q, et al. Phase-shifting profilometry combined with Gray-code patterns projection: unwrapping error removal by an adaptive median filter[J]. Optics Express, 2017, 25(5): 4700-4713. doi: 10.1364/OE.25.004700
    [6] AN Y T, HYUN J S, ZHANG S. Pixel-wise absolute phase unwrapping using geometric constraints of structured light system[J]. Optics Express, 2016, 24(16): 18445-18459. doi: 10.1364/OE.24.018445
    [7] GHIGLIA D C, ROMERO L A. Robust two-dimensional weighted and unweighted phase unwrapping that uses fast transforms and iterative methods[J]. Journal of the Optical Society of America A, 1994, 11(1): 107-117. doi: 10.1364/JOSAA.11.000107
    [8] FENG SH J, CHEN Q, GU G H, et al. Fringe pattern analysis using deep learning[J]. Advanced Photonics, 2019, 1(2): 025001.
    [9] NGUYEN H, WANG Y Z, WANG ZH Y. Single-shot 3D shape reconstruction using structured light and deep convolutional neural networks[J]. Sensors, 2020, 20(13): 3718. doi: 10.3390/s20133718
    [10] VAN DER JEUGHT S, DIRCKX J J J. Deep neural networks for single shot structured light profilometry[J]. Optics Express, 2019, 27(12): 17091-17101. doi: 10.1364/OE.27.017091
    [11] 张钊, 韩博文, 于浩天, 等. 多阶段深度学习单帧条纹投影三维测量方法[J]. 红外与激光工程,2020,49(6):20200023. doi: 10.3788/irla.12_2020-0023

    ZHANG ZH, HAN B W, YU H T, et al. Multi-stage deep learning based single-frame fringe projection 3D measurement method[J]. Infrared and Laser Engineering, 2020, 49(6): 20200023. (in Chinese) doi: 10.3788/irla.12_2020-0023
    [12] RANFTL R, BOCHKOVSKIY A, KOLTUN V. Vision transformers for dense prediction[C]. Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, 2021.
    [13] YANG G L, TANG H, DING M L, et al. Transformer-based attention networks for continuous pixel-wise prediction[C]. Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, IEEE, 2021.
    [14] QI F, ZHAI J Z, DANG G H. Building height estimation using Google Earth[J]. Energy and Buildings, 2016, 118: 123-132. doi: 10.1016/j.enbuild.2016.02.044
    [15] ZHU X J, HAN ZH Q, YUAN M K, et al. Hformer: hybrid CNN-transformer for fringe order prediction in phase unwrapping of fringe projection[J]. Optical Engineering, 2022, 61(9): 093107.
    [16] GENG J. Structured-light 3D surface imaging: a tutorial[J]. Advances in Optics and Photonics, 2011, 3(2): 128-160. doi: 10.1364/AOP.3.000128
    [17] GUO J Y, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers[C]. Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2022.
    [18] CHEN ZH ZH, HANG W, ZHAO Y X. ViT-LSLA: Vision Transformer with Light Self-Limited-Attention[J]. arXiv:2210.17115.
    [19] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.
    [20] WANG L, LU D Q, QIU R W, et al. 3D reconstruction from structured-light profilometry with dual-path hybrid network[J]. EURASIP Journal on Advances in Signal Processing, 2022, 2022(1): 14. doi: 10.1186/s13634-022-00848-5
    [21] 袁梦凯, 朱新军, 侯林鹏. 基于R2U-Net的单帧投影条纹图深度估计[J]. 激光与光电子学进展,2022,59(16):1610001.

    YUAN M K, ZHU X J, HOU L P. Depth estimation from single-frame fringe projection patterns based on R2U-Net[J]. Laser & Optoelectronics Progress, 2022, 59(16): 1610001. (in Chinese)
    [22] FAN CH M, LIU T J, LIU K H. SUNet: swin transformer UNet for image denoising[C]. Proceedings of 2022 IEEE International Symposium on Circuits and Systems, IEEE, 2022.
    [23] ZHU X J, ZHANG ZH ZH, HOU L P, et al. Light field structured light projection data generation with Blender[C]. Proceedings of 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications, IEEE, 2022.
出版历程
  • 收稿日期:  2023-04-14
  • 修回日期:  2023-05-15
  • 网络出版日期:  2023-09-18
