Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (5): 1160-1181. DOI: 10.3778/j.issn.1673-9418.2306024
ZENG Fanzhi, FENG Wenjie, ZHOU Yan
Online: 2024-05-01
Published: 2024-04-29
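Abstract: Natural scene text recognition is valuable in both academic research and practical applications, and has become one of the active research topics in computer vision. However, recognition faces challenges such as diverse text styles and complex backgrounds, which lead to unsatisfactory efficiency and accuracy. Traditional text recognition methods based on hand-crafted features have limited representational capacity and cannot cope effectively with complex natural scene text recognition tasks. In recent years, deep learning methods have brought major progress to natural scene text recognition, and this paper systematically reviews the related work. First, according to whether individual characters must be segmented, natural scene text recognition methods are divided into segmentation-based and segmentation-free methods; the segmentation-free methods are further subdivided by their technical characteristics, and the working principles of the most representative methods in each category are described. Next, commonly used datasets and evaluation metrics are introduced, the performance of the various methods is compared on these datasets, and their advantages and limitations are discussed from several perspectives. Finally, the shortcomings and difficulties of deep-learning-based natural scene text recognition research are pointed out, and future development trends are discussed.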
ZENG Fanzhi, FENG Wenjie, ZHOU Yan. Survey on Natural Scene Text Recognition Methods of Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1160-1181.