Review of Visual Question Answering Technology

doi:10.3778/j.issn.1673-9418.2303025

Abstract

Visual question answering (VQA) is a popular image-text cross-modal task that combines natural language processing and computer vision techniques. Its main objective is to enable computers to intelligently recognize and retrieve image content and provide accurate answers. VQA integrates multiple technologies, such as object recognition and detection, intelligent question answering, image attribute classification, and scene analysis. It can support a wide range of cutting-edge, high-level interactive AI tasks, such as visual dialogue and visual navigation, and has broad application prospects and great practical value. In recent years, advances in computer vision, natural language processing, and image-text cross-modal AI models have provided many new technologies and methods for the VQA task. This paper mainly summarizes the mainstream models and specialized datasets in the VQA field from 2019 to 2022. First, following the modular framework used to implement the VQA task, it reviews and discusses the mainstream technical methods used in each key step. Next, it subdivides the models in this field according to the technical methods they adopt, and briefly introduces their improvement focus and limitations. Then, it summarizes the commonly used VQA datasets and evaluation metrics, and compares the performance of several typical models. Finally, it highlights the key problems that urgently need to be addressed in the current VQA field and offers an outlook on future applications and technological development.

Key words: visual question answering (VQA), modal fusion, visual dialogue, intelligent question answering, cross-modal technology


WANG Yu, SUN Haichun. Review of Visual Question Answering Technology[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1487-1505.

