Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (9): 2011-2029. DOI: 10.3778/j.issn.1673-9418.2110073
• Surveys and Frontiers •
ZHANG Xiangping1,2, LIU Jianxun1,2,+
Received: 2021-10-28
Revised: 2022-04-21
Online: 2022-09-01
Published: 2022-09-15
About author: ZHANG Xiangping, born in 1993 in Sanming, Fujian, Ph.D. candidate. His research interests include code representation and code clone detection.
Supported by:
Corresponding author: + E-mail: ljx529@gmail.com
CLC Number:
ZHANG Xiangping, LIU Jianxun. Overview of Deep Learning-Based Code Representation and Its Applications[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(9): 2011-2029.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2110073
Programming language | Tool name | Tool URL |
---|---|---|
Java | Javaparser | |
Python | astor | |
TypeScript | TypeScript AST Viewer | |
JavaScript | Javascript-astar | |
C | pycparser | |
C++ | cppast | |
Table 1 Tools of AST generation for different programming languages
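As a small illustration of what the tools in Table 1 produce, the sketch below parses a Python snippet into an abstract syntax tree and lists the node types it contains. It uses Python's standard-library `ast` module rather than `astor`, an assumption made so the example has no third-party dependency. The helper `node_type_sequence` and the sample source are hypothetical names introduced here.

```python
import ast
from typing import List


def node_type_sequence(source: str) -> List[str]:
    """Parse `source` and return the AST node type names in ast.walk order."""
    tree = ast.parse(source)
    # ast.walk yields the root first, then descendants (no guaranteed order
    # beyond that), which is enough to inspect which node kinds occur.
    return [type(node).__name__ for node in ast.walk(tree)]


# Hypothetical sample input, not taken from the surveyed paper.
SAMPLE = "def add(a, b):\n    return a + b\n"

if __name__ == "__main__":
    # Textual dump of the tree, then the flat list of node types.
    print(ast.dump(ast.parse(SAMPLE)))
    print(node_type_sequence(SAMPLE))
```

A flat sequence like this discards tree structure; it is shown only to make the notion of an AST concrete, not as a code representation proposed by the survey.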
Model | Clone types detected | Neural network model | Languages | Year |
---|---|---|---|---|
CCLearner | Type-1,2,3(ST) | Deep neural network | Java | 2017 |
CDLH | Type-1,2,3,4 | Long short-term memory network | Java, C | 2017 |
DeepSim | Type-1,2,3,4 | Feedforward neural network | Java | 2018 |
ASTNN | Type-1,2,3,4 | Gated recurrent unit | Java, C | 2019 |
TECCD | Type-1,2,3 | Graph neural network | Java | 2019 |
FCCA | Type-1,2,3,4 | Long short-term memory network, graph neural network | Java | 2020 |
FCDetector | Type-4 | Deep neural network | C | 2020 |
At-biLSTM | Type-1,2,3,4 | Bidirectional long short-term memory network | Java, C | 2020 |
Rsharer+ | Type-1,2,3,4 | Convolutional neural network | Java | 2020 |
MISIM | Type-1,2,3,4 | Graph neural network | C, C++ | 2020 |
CodeAli | Type-1,2,3,4 | Convolutional neural network | Java, C | 2021 |
CACCD | Type-1,2,3,4 | Bidirectional long short-term memory network | Java | 2021 |
Table 2 Deep learning-based code clone detection methods
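The clone types in Table 2 follow the usual taxonomy: Type-1 clones are identical up to whitespace and comments, and Type-2 clones additionally allow renamed identifiers and changed literals. The sketch below makes those two categories concrete with a simple token comparison. It is not the method of any model listed in the table, and it does not attempt Type-3 or Type-4 detection. The function names and sample snippets are hypothetical.

```python
import io
import keyword
import tokenize
from typing import List

# Tokens that carry no meaning for clone comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens kept as fixed markers so indentation width does not matter.
_LAYOUT = {tokenize.NEWLINE: "<NL>", tokenize.INDENT: "<IN>", tokenize.DEDENT: "<DE>"}


def normalized_tokens(source: str, abstract: bool) -> List[str]:
    """Tokenize Python source, ignoring comments and blank lines.

    With abstract=True, every identifier becomes "ID" and every number or
    string literal becomes "LIT". This is a loose normalization: it does not
    check that identifiers were renamed consistently.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _LAYOUT:
            out.append(_LAYOUT[tok.type])
        elif abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(a: str, b: str) -> str:
    """Classify a pair of snippets as Type-1, Type-2, or neither."""
    if normalized_tokens(a, False) == normalized_tokens(b, False):
        return "Type-1"
    if normalized_tokens(a, True) == normalized_tokens(b, True):
        return "Type-2"
    return "neither"


# Hypothetical sample snippets.
A = "def add(a, b):\n    return a + b\n"
B = "def add(a, b):\n    # sum two values\n    return a  +  b\n"
C = "def plus(x, y):\n    return x + y\n"
D = "def add(a, b):\n    return a * b\n"
```

Here `A` and `B` differ only in a comment and spacing, `A` and `C` differ only in names, and `A` and `D` differ in an operator, so a token-level check separates the three cases. Type-3 and Type-4 clones need the statement-level and semantic comparisons that the learned models in the table target.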
[1] | LIU F, LI G, HU X. Program comprehension based on deep learning[J]. Journal of Computer Research and Development, 2019, 56(8): 1605-1620. |
[2] | HINDLE A, BARR E T, SU Z, et al. On the naturalness of software[C]// Proceedings of the 2012 34th International Conference on Software Engineering, Zurich, Jun 2-9, 2012. Washington: IEEE Computer Society, 2012: 837-847. |
[3] | ROBBES R, LANZA M. How program history can improve code completion[C]// Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, L'Aquila, Sep 15-19, 2008. Washington: IEEE Computer Society, 2008: 317-326. |
[4] | PROKSCH S, LERCH J, MEZINI M. Intelligent code completion with Bayesian networks[J]. ACM Transactions on Software Engineering and Methodology, 2015, 25(1): 1-31. |
[5] | BIELIK P, RAYCHEV V, VECHEV M T. PHOG: probabilistic model for code[C]// Proceedings of the 33rd International Conference on Machine Learning, New York, Jun 19-24, 2016: 2933-2942. |
[6] | OMORI T, KUWABARA H, MARUYAMA K. A study on repetitiveness of code completion operations[C]// Proceedings of the 2012 28th IEEE International Conference on Software Maintenance, Trento, Sep 23-28, 2012. Washington: IEEE Computer Society, 2012: 584-587. |
[7] | TU Z P, SU Z D, DEVANBU P T. On the localness of software[C]// Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, Hong Kong, China, Nov 16-22, 2014. New York: ACM, 2014: 269-280. |
[8] | OSCAR K. TF-IDF inspired detection for cross-language source code plagiarism and collusion[J]. Computer Science, 2020, 21: 113-134. |
[9] | LE T H, CHEN H, BABAR M A. Deep learning for source code modeling and generation: models, applications, and challenges[J]. ACM Computing Surveys, 2020, 53(3): 1-38. |
[10] | ZHANG J, WANG X, ZHANG H Y. A novel neural source code representation based on abstract syntax tree[C]// Proceedings of the 41st International Conference on Software Engineering, Montreal, May 25-31, 2019. Piscataway: IEEE, 2019: 783-794. |
[11] | LIU B B, DONG W, WANG J. Survey on intelligent search and construction methods of program[J]. Journal of Software, 2018, 29(8): 2180-2197. |
[12] | WHITE M, TUFANO M, VENDOME C. Deep learning fragments for code clone detection[C]// Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, Sep 3-7, 2016. New York: ACM, 2016: 87-98. |
[13] | WHITE M, VENDOME C. Toward deep learning software repositories[C]// Proceedings of the 12th IEEE/ACM Working Conference on Mining Software Repositories, Florence, May 16-17, 2015. Washington: IEEE Computer Society, 2015: 334-345. |
[14] | WHITE M, TUFANO M, MARTINEZ M, et al. Sorting and transforming program repair ingredients via deep learning code similarities[C]// Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, Hangzhou, Feb 24-27, 2019. Piscataway: IEEE, 2019: 479-490. |
[15] | WANG P P, SVAJLENKO J, WU Y Z, et al. CCAligner: a token based large-gap clone detector[C]// Proceedings of the 40th International Conference on Software Engineering, Gothenburg, May 27-Jun 3, 2018. New York: ACM, 2018: 1066-1077. |
[16] | GU X D, ZHANG H Y, KIM S H. Deep code search[C]// Proceedings of the 40th International Conference on Software Engineering, Gothenburg, May 27-Jun 3, 2018. New York: ACM, 2018: 933-944. |
[17] | ALON U, ZILBERSTEIN M, LEVY O. A general path-based representation for predicting program properties[C]// Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Philadelphia, Jun 18-22, 2018. New York: ACM, 2018: 404-419. |
[18] | ALON U, ZILBERSTEIN M, LEVY O, et al. Code2vec: learning distributed representations of code[C]// Proceedings of the 2019 ACM on Programming Languages, Cascais, Jan 13-19, 2019. New York: ACM, 2019: 1-29. |
[19] | MOU L L, LI G, ZHANG L. Convolutional neural network over tree structures for programming language processing[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, Feb 12-17, 2016. Menlo Park: AAAI, 2016: 1287-1293. |
[20] | BÜCH L, ANDRZEJAK A. Learning-based recursive aggregation of abstract syntax trees for code clone detection[C]// Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering, Hangzhou, Feb 24-27, 2019. Piscataway: IEEE, 2019: 95-104. |
[21] | SAHLGREN M. The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces[D]. Stockholm: Institutionen för Lingvistik, 2006. |
[22] | DUMAIS S T. Latent semantic analysis[J]. Annual Review of Information Science and Technology, 2004, 38(1): 188-230. |
[23] | BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3(1): 993-1022. |
[24] | ŘEHŮŘEK R, SOJKA P. Software framework for topic modelling with large corpora[C]// Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Malta, May 22, 2010. Valletta: University of Malta, 2010: 45-50. |
[25] | LE Q V, MIKOLOV T. Distributed representations of sentences and documents[C]// Proceedings of the 31st International Conference on Machine Learning, Beijing, Jun 21-26, 2014: 1188-1196. |
[26] | JIAN S L. Research on the representation learning of complex heterogeneous data[D]. Changsha: National University of Defense Technology, 2019. |
[27] | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013. |
[28] | KAUR A, NAYYAR R. A comparative study of static code analysis tools for vulnerability detection in C/C++ and JAVA source code[J]. Procedia Computer Science, 2020, 171: 2023-2029. |
[29] | HARER J, KIM L, RUSSELL R, et al. Automated software vulnerability detection with machine learning[J]. arXiv:1803.04497, 2018. |
[30] | CHEN Z M, MONPERRUS M. The remarkable role of similarity in redundancy-based program repair[J]. arXiv:1811.05703, 2018. |
[31] | HENKEL J, LAHIRI S, LIBLIT B, et al. Code vectors: understanding programs through embedded abstracted symbolic traces[C]// Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, Nov 4-9, 2018. New York: ACM, 2018: 163-174. |
[32] | NGUYEN T D, NGUYEN A T, PHAN H D, et al. Exploring API embedding for API usages and applications[C]// Proceedings of the 39th International Conference on Software Engineering, Buenos Aires, May 20-28, 2017. Piscataway: IEEE, 2017: 438-449. |
[33] | PRADEL M, SEN K. DeepBugs: a learning approach to name-based bug detection[J]. Proceedings of the ACM on Programming Languages, 2018, 2: 1-25. |
[34] | IYER S, KONSTAS I, CHEUNG A. Summarizing source code using a neural attention model[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Aug 7-12, 2016. Stroudsburg: ACL, 2016: 2073-2083. |
[35] | ALLAMANIS M, PENG H, SUTTON C. A convolutional attention network for extreme summarization of source code[C]// Proceedings of the 33rd International Conference on Machine Learning, New York, Jun 19-24, 2016: 2091-2100. |
[36] | LI J, WANG Y, LYU M R, et al. Code completion with neural attention and pointer networks[J]. arXiv:1711.09573, 2017. |
[37] | BHOOPCHAND A, ROCKTÄSCHEL T, BARR E. Learning Python code suggestion with a sparse pointer network[J]. arXiv:1611.08307, 2016. |
[38] | SHUAI J, XU L, LIU C, et al. Improving code search with co-attentive representation learning[C]// Proceedings of the 28th International Conference on Program Comprehension, Seoul, Jul 13-15, 2020. New York: ACM, 2020: 196-207. |
[39] | GU X D, ZHANG H Y, ZHANG D M, et al. Deep API learning[C]// Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, Nov 13-18, 2016. New York: ACM, 2016: 631-642. |
[40] | LU X F, JIANG F S, ZHOU X, et al. ASSCA: API sequence and statistics features combined architecture for malware detection[J]. Computer Networks, 2019, 157: 99-111. |
[41] | SAIFULLAH C M. Learning APIs through mining code snippet examples[D]. Saskatoon: University of Saskatchewan, 2020. |
[42] | HU X, LI G, XIA X. Summarizing source code with transferred API knowledge[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Jul 13-19, 2018: 2269-2275. |
[43] | SVAJLENKO J, ISLAM J F, KEIVANLOO I, et al. Towards a big data curated benchmark of inter-project code clones[C]// Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, Sep 29-Oct 3, 2014. Washington: IEEE Computer Society, 2014: 476-480. |
[44] | WEI H H, LI M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Aug 19-25, 2017: 3034-3040. |
[45] | CHEN L, YE W, ZHANG S K. Capturing source code semantics via tree-based convolution over API-enhanced AST[C]// Proceedings of the 16th ACM International Conference on Computing Frontiers, Alghero, Apr 30-May 2, 2019. New York: ACM, 2019: 174-182. |
[46] | WANG W H, LI G, MA B, et al. Detecting code clones with graph neural network and flow-augmented abstract syntax tree[C]// Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, London, Feb 18-21, 2020. Piscataway: IEEE, 2020: 261-271. |
[47] | HU X, LI G, XIA X. Deep code comment generation[C]// Proceedings of the 26th Conference on Program Comprehension, Gothenburg, May 27-28, 2018. New York: ACM, 2018: 200-210. |
[48] | ALON U, BRODY S, LEVY O, et al. Code2seq: generating sequences from structured representations of code[J]. arXiv:1808.01400, 2018. |
[49] | ALON U, SADAKA R, LEVY O, et al. Structural language models of code[C]// Proceedings of the 2020 International Conference on Machine Learning. New York: ACM, 2020: 245-256. |
[50] | ALLAMANIS M, BROCKSCHMIDT M, KHADEMI M. Learning to represent programs with graphs[J]. arXiv:1711.00740, 2017. |
[51] | LU M M, TAN D W, XIONG N X, et al. Program classification using gated graph attention neural network for online programming service[J]. arXiv:1903.03804, 2019. |
[52] | BROCKSCHMIDT M, ALLAMANIS M, GAUNT A L. Generative code modeling with graphs[J]. arXiv:1805.08490, 2018. |
[53] | BEN-NUN T, JAKOBOVITS A S, HOEFLER T. Neural code comprehension: a learnable representation of code semantics[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Dec 3-8, 2018: 3589-3601. |
[54] | LI Z M, LU S, MYAGMAR S, et al. CP-Miner: finding copy-paste and related bugs in large-scale software code[J]. IEEE Transactions on Software Engineering, 2006, 32(3): 176-192. |
[55] | CHEN W K, LI B G, GUPTA R. Code compaction of matching single-entry multiple-exit regions[C]// LNCS 2694: Proceedings of the 10th International Symposium Static Analysis. Berlin, Heidelberg: Springer, 2003: 401-417. |
[56] | KIM M, SAZAWAL V, NOTKIN D, et al. An empirical study of code clone genealogies[C]// Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Lisbon, Sep 5-9, 2005. New York: ACM, 2005: 187-196. |
[57] | PATENAUDE J, MERLO E, DAGENAIS M, et al. Extending software quality assessment techniques to Java systems[C]// Proceedings of the 7th International Workshop on Program Comprehension, Pittsburgh, May 5-7, 1999. Washington: IEEE Computer Society, 1999: 49-56. |
[58] | SHENEAMER A, KALITA J. A survey of software clone detection techniques[J]. International Journal of Computer Applications, 2016, 137(10): 1-21. |
[59] | BAKER B. On finding duplication and near-duplication in large software systems[C]// Proceedings of the 2nd Working Conference on Reverse Engineering, Toronto, Jul 14-16, 1995. Piscataway: IEEE, 1995: 86-95. |
[60] | ROY C K, CORDY J R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization[C]// Proceedings of the 16th IEEE International Conference on Program Comprehension, Amsterdam, Jun 10-13, 2008. Washington: IEEE Computer Society, 2008: 172-181. |
[61] | MONDAL M, RAHMAN M S, ROY C K, et al. Is cloned code really stable[J]. Empirical Software Engineering, 2018, 23(2): 693-770. |
[62] | JÜRGENS E, DEISSENBOECK F, HUMMEL B, et al. Do code clones matter[C]// Proceedings of the 31st International Conference on Software Engineering, Vancouver, May 16-24, 2009. Piscataway: IEEE, 2009: 485-495. |
[63] | MONDAL M, ROY C, SCHNEIDER K. Dispersion of changes in cloned and non-cloned code[C]// Proceedings of the 6th International Workshop on Software Clones, Zurich, Jun 4, 2012. Washington: IEEE Computer Society, 2012: 29-35. |
[64] | LOZANO A, WERMELINGER M. Tracking clones' imprint[C]// Proceedings of the 4th ICSE International Workshop on Software Clones, Cape Town. New York: ACM, 2010: 65-72. |
[65] | CHEN Q Y, LI S P, YAN M, et al. Code clone detection: a literature review[J]. Journal of Software, 2019, 30(4): 962-980. |
[66] | BELLON S, KOSCHKE R, ANTONIOL G, et al. Comparison and evaluation of clone detection tools[J]. IEEE Transactions on Software Engineering, 2007, 33(9): 577-591. |
[67] | KAMIYA T, KUSUMOTO S, INOUE K. CCFinder: a multilinguistic token-based code clone detection system for large scale source code[J]. IEEE Transactions on Software Engineering, 2002, 28(7): 654-670. |
[68] | DUCASSE S, RIEGER M, DEMEYER S. A language independent approach for detecting duplicated code[C]// Proceedings of the 1999 International Conference on Software Maintenance, Oxford, Aug 30-Sep 3, 1999. Washington: IEEE Computer Society, 1999: 109-118. |
[69] | LEE S, JEONG I. SDD: high performance code clone detection system for large scale source code[C]// Proceedings of the Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, San Diego, Oct 16-20, 2005. New York: ACM, 2005: 140-141. |
[70] | MURAKAMI H, HOTTA K, HIGO Y, et al. Gapped code clone detection with lightweight source code analysis[C]// Proceedings of the IEEE 21st International Conference on Program Comprehension, San Francisco, May 20-21, 2013. Washington: IEEE Computer Society, 2013: 93-102. |
[71] | DANG Y N, ZHANG D M, GE S, et al. XIAO: tuning code clones at hands of engineers in practice[C]// Proceedings of the 28th Annual Computer Security Applications, Orlando, Dec 3-7, 2012. New York: ACM, 2012: 369-378. |
[72] | ALOMARI H, MATTHEW S. Clone detection through srcClone: a program slicing based approach[J]. Journal of Systems and Software, 2022, 184: 111115. |
[73] | LI L, FENG H, ZHUANG W. CCLearner: a deep learning-based clone detection approach[C]// Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution, Shanghai, Sep 17-22, 2017. Washington: IEEE Computer Society, 2017: 249-260. |
[74] | ZHAO G, HUANG J. DeepSim: deep learning code functional similarity[C]// Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, Nov 4-9, 2018. New York: ACM, 2018: 141-151. |
[75] | GAO Y, WANG Z, LIU S. TECCD: a tree embedding approach for code clone detection[C]// Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution, Cleveland, Sep 29-Oct 4, 2019. Piscataway: IEEE, 2019: 145-156. |
[76] | HUA W, SUI Y, WAN Y. FCCA: hybrid code representation for functional clone detection using attention networks[J]. IEEE Transactions on Reliability, 2020, 70(1): 304-318. |
[77] | FANG C, LIU Z, SHI Y, et al. Functional code clone detection with syntax and semantics fusion learning[C]// Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2020: 516-527. |
[78] | MENG Y, LIU L. A deep learning approach for a source code detection model using self-attention[J]. Complexity, 2020, 9: 1-15. |
[79] | GUO C, YANG H, HUANG D. Review sharing via deep semi-supervised code clone detection[J]. IEEE Access, 2020, 8: 24948-24965. |
[80] | YE F, ZHOU S, VENKAT A. MISIM: an end-to-end neural code similarity system[J]. arXiv:2006.05265, 2020. |
[81] | ZHANG A, LIU K, FANG L, et al. Learn to align: a code alignment network for code clone detection[C]// Proceedings of the 28th Asia-Pacific Software Engineering Conference, Taipei, China, Dec 6-9, 2021. Piscataway: IEEE, 2021: 1-11. |
[82] | LIANG H, AI L. AST-path based compare-aggregate network for code clone detection[C]// Proceedings of the 2021 International Joint Conference on Neural Networks, Shenzhen, Jul 18-22, 2021. Piscataway: IEEE, 2021: 1-8. |
[83] | SINGER J, LETHBRIDGE T C, VINSON N G, et al. An examination of software engineering work practices[C]// Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, Nov 10-13: 21. |
[84] | ZHONG H, XIE T, ZHANG L, et al. MAPO: mining and recommending API usage patterns[C]// LNCS 5653: Proceedings of the 23rd European Conference on Object-Oriented Programming, Genoa, Jul 6-10, 2009. Berlin, Heidelberg: Springer, 2009: 318-343. |
[85] | ZHANG F Y, PENG X, CHEN C. Research on code analysis based on deep learning[J]. Computer Applications and Software, 2018, 35(6): 9-17. |
[86] | SUBRAMANIAN S, INOZEMTSEVA L, HOLMES R. Live API documentation[C]// Proceedings of the 36th International Conference on Software Engineering, Hyderabad, May 31-Jun 7, 2014. New York: ACM, 2014: 643-652. |
[87] | KIM K, KIM D, BISSYANDÉ T F, et al. FaCoY: a code-to-code search engine[C]// Proceedings of the 40th International Conference on Software Engineering, Gothenburg, May 27-Jun 3, 2018. New York: ACM, 2018: 946-957. |
[88] | DEERWESTER S C, DUMAIS S T, LANDAUER T K, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407. |
[89] | BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3(2): 1137-1155. |
[90] | EGOZI O, MARKOVITCH S, GABRILOVICH E. Concept-based information retrieval using explicit semantic analysis[J]. ACM Transactions on Information Systems, 2011, 29(2): 1-34. |
[91] | SCHUHMACHER M, PONZETTO S P. Knowledge-based graph document modeling[C]// Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, Feb 24-28, 2014. New York: ACM, 2014: 543-552. |
[92] | LIU X T, FANG H. Latent entity space: a novel retrieval approach for entity-bearing queries[J]. Information Retrieval Journal, 2015, 18(6): 473-503. |
[93] | XIONG C Y, CALLAN J. EsdRank: connecting query and documents through external semi-structured data[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, Oct 19-23, 2015. New York: ACM, 2015: 951-960. |
[94] | RAVIV H, KURLAND O, CARMEL D. Document retrieval using entity-based language models[C]// Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Jul 17-21, 2016. New York: ACM, 2016: 65-74. |
[95] | NI Y, XU Q K, CAO F, et al. Semantic documents relatedness using concept graph representation[C]// Proceedings of the 9th ACM International Conference on Web Search and Data Mining, San Francisco, Feb 22-25, 2016. New York: ACM, 2016: 635-644. |
[96] | GABRILOVICH E, MARKOVITCH S. Computing semantic relatedness using Wikipedia-based explicit semantic analysis[C]// Proceedings of the 2007 International Joint Conference on Artificial Intelligence, Hyderabad, Jan 6-12, 2007. San Mateo: Morgan Kaufmann, 2007: 1606-1611. |
[97] | SACHDEV S, LI H Y, LUAN S F, et al. Retrieval on source code: a neural code search[C]// Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, Philadelphia, Jun 18-22, 2018. New York: ACM, 2018: 31-41. |
[98] | LV F, ZHANG H Y, LOU J G, et al. CodeHow: effective code search based on API understanding and extended Boolean model[C]// Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, Lincoln, Nov 9-13, 2015. Washington: IEEE Computer Society, 2015: 260-270. |
[99] | FANG S, TAN Y, ZHANG T, et al. Self-attention networks for code search[J]. Information and Software Technology, 2021, 134: 106542-106553. |
[100] | GU J, CHEN Z, MONPERRUS M. Multimodal representation for neural code search[C]// Proceedings of the 2021 International Conference on Software Maintenance and Evolution, Luxembourg, Sep 27-Oct 1, 2021. Piscataway: IEEE, 2021: 483-494. |
[101] | MENG Y. An intelligent code search approach using hybrid encoders[J]. Wireless Communications and Mobile Computing, 2021: 9990988. |
[102] | XU L, YANG H, LIU C, et al. Two-stage attention-based model for code search with textual and structural features[C]// Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Mar 9-12, 2021. Piscataway: IEEE, 2021: 342-353. |
[103] | ZOU Y Z, LING C Y, LIN Z Q, et al. Graph embedding based code search in software project[C]// Proceedings of the 10th Asia-Pacific Symposium on Internetware, Beijing, Sep 16, 2018. New York: ACM, 2018: 1-10. |
[104] | GORIN R E. SPELL: a spelling checking and correction program[J]. Online Documentation for the DEC-10 Computer, 1971: 147-160. |
[105] | BRUCH M, MONPERRUS M, MEZINI M. Learning from examples to improve code completion systems[C]// Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering, Amsterdam, Aug 24-28, 2009. New York: ACM, 2009: 213-222. |
[106] | HOU D Q, PLETCHER D M. An evaluation of the strategies of sorting, filtering, and grouping API methods for code completion[C]// Proceedings of the IEEE 27th International Conference on Software Maintenance, Williamsburg, Sep 25-30, 2011. Washington: IEEE Computer Society, 2011: 233-242. |
[107] | LEE Y Y, HARWELL S, KHURSHID S, et al. Temporal code completion and navigation[C]// Proceedings of the 35th International Conference on Software Engineering, San Francisco, May 18-26, 2013. Washington: IEEE Computer Society, 2013: 1181-1184. |
[108] | NGUYEN A T, NGUYEN H A, NGUYEN T T, et al. GraPacc: a graph-based pattern-oriented, context-sensitive code completion tool[C]// Proceedings of the 34th International Conference on Software Engineering, Zurich, Jun 2-9, 2012. Washington: IEEE Computer Society, 2012: 1407-1410. |
[109] | JIN X H, SERVANT F. The hidden cost of code completion: understanding the impact of the recommendation-list length on its efficiency[C]// Proceedings of the 15th International Conference on Mining Software Repositories, Gothenburg, May 28-29, 2018. New York: ACM, 2018: 70-73. |
[110] | ZHONG H, WANG X Y. Boosting complete-code tool for partial program[C]// Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, Urbana, Oct 30-Nov 3, 2017. Washington: IEEE Computer Society, 2017: 671-681. |
[111] | NGUYEN T T, NGUYEN A T, NGUYEN H A, et al. A statistical semantic language model for source code[C]// Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, Saint Petersburg, Aug 18-26, 2013. New York: ACM, 2013: 532-542. |
[112] | DE SOUZA AMORIM L E, ERDWEG S, WACHSMUTH G, et al. Principled syntactic code completion using placeholders[C]// Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering, Amsterdam, Oct 31-Nov 1, 2016. New York: ACM, 2016: 163-175. |
[113] | HOU D Q, PLETCHER D M. Towards a better code completion system by API grouping, filtering, and popularity-based ranking[C]// Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering, Cape Town, May 4, 2010. New York: ACM, 2010: 26-30. |
[114] | JACOBELLIS J, MENG N, KIM M. Cookbook: in Situ code completion using edit recipes learned from examples[C]// Companion Proceedings of the 36th International Conference on Software Engineering, Hyderabad, May 31-Jun 7, 2014. New York: ACM, 2014: 584-587. |
[115] | NGUYEN T T, PHAM H V, VU P M, et al. Recommending API usages for mobile Apps with hidden Markov model[C]// Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, Lincoln, Nov 9-13, 2015. Washington: IEEE Computer Society, 2015: 795-800. |
[116] | GVERO T, KUNCAK V, KURAJ I, et al. Complete completion using types and weights[J]. ACM SIGPLAN Notices, 2013, 48(6): 27-38. |
[117] | FERNANDES P, ALLAMANIS M, BROCKSCHMIDT M. Structured neural summarization[J]. arXiv:1811.01824, 2018. |
[118] | KARAMPATSIS R, BABII H, ROBBES R, et al. Big code != big vocabulary: open-vocabulary models for source code[C]// Proceedings of the 42nd International Conference on Software Engineering, Seoul, Jun 27-Jul 19, 2020. New York: ACM, 2020: 1073-1085. |
[119] | YANG B, ZHANG N, LI S P, et al. Survey of intelligent code completion[J]. Journal of Software, 2020, 31(5): 1435-1453. |
[120] | HAN S, WALLACE D R, MILLER R C. Code completion of multiple keywords from abbreviated input[J]. Automated Software Engineering, 2011, 18(3/4): 363-398. |
[121] | HAN S, WALLACE D R, MILLER R C. Code completion from abbreviated input[C]// Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering, Auckland, Nov 16-20, 2009. Washington: IEEE Computer Society, 2009: 332-343. |
[122] | RAYCHEV V, BIELIK P, VECHEV M. Probabilistic model for code with decision trees[J]. ACM SIGPLAN Notices, 2016, 51(10): 731-747. |
[123] | HELLENDOORN V J, DEVANBU P. Are deep neural networks the best choice for modeling source code?[C]// Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Sep 4-8, 2017. New York: ACM, 2017: 763-773. |
[124] | BAZZI I. Modelling out-of-vocabulary words for robust speech recognition[D]. Massachusetts Institute of Technology, 2002. |
[125] | LUONG M T, SOCHER R, MANNING C D. Better word representations with recursive neural networks for morphology[C]// Proceedings of the 17th Conference on Computational Natural Language Learning, Sofia, Aug 8-9, 2013. Stroudsburg: ACL, 2013: 104-113. |
[126] | HARRIS Z. Distributional structure[J]. Word, 1981, 10(2/3): 146-162. |
[127] | BABII H, JANES A, ROBBES R. Modeling vocabulary for big code machine learning[J]. arXiv:1904.01873, 2019. |
[128] | DEVLIN J, CHANG M W, LEE K. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018. |
[129] | RADFORD A, NARASIMHAN K, SALIMANS T. Improving language understanding by generative pre-training[EB/OL]. [2021-07-06]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf. |
[130] | PETERS M, NEUMANN M, IYYER M. Deep contextualized word representations[J]. arXiv:1802.05365, 2018. |
[131] | KANG H J, BISSYANDÉ T F, LO D. Assessing the generalizability of code2vec token embeddings[C]// Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, San Diego, Nov 11-15, 2019. Piscataway: IEEE, 2019: 1-12. |