References
1. Fu, M., Tantithamthavorn, C.: Linevul: A transformer-based line-level vulnerability prediction. In: Proceedings of the 19th International Conference on Mining Software Repositories, pp. 608–620 (2022)
2. Fan, J., Li, Y., Wang, S., Nguyen, T.N.: A C/C++ code vulnerability dataset with code changes and CVE summaries. In: Proceedings of the 17th International Conference on Mining Software Repositories, pp. 508–512 (2020)
3. Li, Z., Zou, D., Xu, S., Ou, X., Jin, H., Wang, S., Deng, Z., Zhong, Y.: Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681 (2018)
4. Russell, R., Kim, L., Hamilton, L., Lazovich, T., Harer, J., Ozdemir, O., Ellingwood, P., McConley, M.: Automated vulnerability detection in source code using deep representation learning. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 757–762. IEEE (2018)
5. Pornprasit, C., Tantithamthavorn, C.K.: Jitline: A simpler, better, faster, finer-grained just-in-time defect prediction. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp. 369–379. IEEE (2021)
6. Wattanakriengkrai, S., Thongtanunam, P., Tantithamthavorn, C., Hata, H., Matsumoto, K.: Predicting defective lines using a model-agnostic technique. IEEE Transactions on Software Engineering, 48(5), 1480–1496 (2020)
7. Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y.: Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Advances in Neural Information Processing Systems, 32 (2019)
8. Li, Z., Zou, D., Xu, S., Jin, H., Zhu, Y., Chen, Z.: Sysevr: A framework for using deep learning to detect software vulnerabilities. IEEE Transactions on Dependable and Secure Computing, 19(4), 2244–2258 (2021)
9. Chakraborty, S., Krishna, R., Ding, Y., Ray, B.: Deep learning based vulnerability detection: Are we there yet? IEEE Transactions on Software Engineering, 48(9), 3280–3296 (2021)
10. Li, Y., Wang, S., Nguyen, T.N.: Vulnerability detection with fine-grained interpretations. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 292–303 (2021)
11. Lu, S., Barthe, G., Bieber, D., et al.: CodeGPT-small-py-adaptedGPT2. Hugging Face [Online]. (2020). Available: https://huggingface.co/microsoft/CodeGPT-small-py-adaptedGPT2 (Accessed: 2025-06-10)
12. Meta AI: Code LLaMA: Open Foundation Models for Code. Hugging Face [Online]. (2023). Available: https://huggingface.co/codellama (Accessed: 2025-06-10)
13. Guo, D., Lu, S., Duan, N., Wang, Y., Zhou, M., Yin, J.: Unixcoder: Unified cross-modal pre-training for code representation. arXiv preprint arXiv:2203.03850 (2022)
14. Microsoft: UniXcoder-base-nine. Hugging Face [Online]. (2022). Available: https://huggingface.co/microsoft/unixcoder-base-nine (Accessed: 2025-06-10)
15. CVE Details: CVE Details - Vulnerability Database [Online]. (2024). Available: https://www.cvedetails.com/ (Accessed: 2025-06-10)
16. U.S. House of Representatives Committee on Oversight and Government Reform: Report of Investigation: Equifax Inc. Data Breach. U.S. Government Publishing Office [Online]. (2018). Available: https://oversight.house.gov/wp-content/uploads/2018/12/Equifax-Report.pdf (Accessed May 2025)
17. CVE Details: 2024 Vulnerabilities [Online]. (2025). Available: https://www.cvedetails.com
18. Bessey, A., et al.: A few billion lines of code later: Using static analysis to find bugs in the real world. Communications of the ACM, 53(2) (2010)
19. Micro Focus: Fortify Static Code Analyzer [Online]. (2020). Available: https://www.microfocus.com/documentation/fortify-static-code-analyzer/
20. LLVM Project: Clang Static Analyzer [Online]. (2024). Available: https://clang-analyzer.llvm.org/ (Accessed May 2025)
21. Marjamäki, D.: Cppcheck - A tool for static C/C++ code analysis [Online]. (2024). Available: http://cppcheck.sourceforge.net/ (Accessed May 2025)
22. Cadar, C., Dunbar, D., Engler, D.: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2008)
23. Li, Z., Vijaykumar, T.N., Snavely, A., Falsafi, B.: Vulcan: Binary transformation in a distributed environment. In: International Symposium on Code Generation and Optimization (CGO), pp. 271–283. IEEE (2005). doi:10.1109/CGO.2005.39
24. Zalewski, M.: American Fuzzy Lop (AFL) - Security-oriented fuzzer [Online]. (2014). Available: https://lcamtuf.coredump.cx/afl/ (Accessed May 2025)
25. Cheng, W., Zhang, J., Zhou, Z., Zhu, S., Zou, W., Gong, X.: TaintTrace: Efficient flow tracing with dynamic binary rewriting. In: IEEE Symposium on Computers and Communications (ISCC), pp. 807–812. IEEE (2009). doi:10.1109/ISCC.2009.5202251
26. Chen, M., et al.: Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)
27. Chakraborty, S., et al.: Are Large Language Models Capable of Vulnerability Detection? In: IEEE S&P (2023)
28. Niu, L., Lin, Q., Han, K., Xu, T., Liu, Y., Liu, Z.: Safe: Self-attentive function embeddings for binary similarity. In: Proceedings of the 31st USENIX Security Symposium (USENIX Security), pp. 4979–4996 (2022)
29. Zhou, S., Han, S., Duan, M., Wang, Y., Xue, J., Zhang, D.: Devil is in the Details: Evaluating Large Language Models for Vulnerability Detection and Localization. In: Proceedings of the 32nd USENIX Security Symposium (USENIX Security) (2023)
30. Lu, F., Tunstall, L., Rabe, M., Xu, J., et al.: Code Llama: Open Foundation Models for Code. arXiv preprint arXiv:2308.12950 (2023)
31. GitHub: GitHub Copilot: Your AI pair programmer [Online]. (2023). Available: https://github.com/features/copilot (Accessed May 2025)
32. Cursor: Cursor: The AI-first Code Editor [Online]. (2024). Available: https://www.cursor.sh (Accessed May 2025)
33. Ding, Y., Fu, Y., Ibrahim, O., Sitawarin, C., Chen, X., Alomair, B., Wagner, D., Ray, B., Chen, Y.: Vulnerability detection with code language models: How far are we? arXiv preprint arXiv:2403.18624 (2024)
34. Feng, Z., Guo, D., Tang, D., et al.: CodeBERT: A pre-trained model for programming and natural languages. In: Findings of EMNLP (2020)
35. Guo, D., Ren, S., et al.: GraphCodeBERT: Pre-training Code Representations with Data Flow. In: ICLR (2021)
36. Bhandari, G., Naseer, A., Moonen, L.: CVEfixes: automated collection of vulnerabilities and their fixes from open-source software. In: Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 30–39 (2021)
37. Ni, C., Shen, L., Yang, X., Zhu, Y., Wang, S.: MegaVul: A C/C++ vulnerability dataset with comprehensive code representations. In: Proceedings of the 21st International Conference on Mining Software Repositories, pp. 738–742 (2024)
38. Chen, Y., Ding, Z., Alowain, L., Chen, X., Wagner, D.: Diversevul: A new vulnerable source code dataset for deep learning based vulnerability detection. In: Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, pp. 654–668 (2023)
39. Nikitopoulos, G., Dritsa, K., Louridas, P., Mitropoulos, D.: CrossVul: a cross-language vulnerability dataset with commit data. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1565–1569 (2021)
40. Zheng, Y., Pujar, S., Lewis, B., Buratti, L., Epstein, E., Yang, B., Laredo, J., Morari, A., Su, Z.: D2a: A dataset built for ai-based vulnerability detection methods using differential analysis. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 111–120. IEEE (2021)
41. Anthropic: Claude Language Model [Online]. (2024). Available: https://www.anthropic.com/claude (Accessed: July 2025)
42. Facebook AI Research: Infer: Static Analyzer for Java, C, C++, and Objective-C [Online]. (2024). Available: https://fbinfer.com/ (Accessed: 2025-07-28)
43. Liu, X., Zheng, J., Yang, G., Wen, S., Liu, Q., Wang, X.: Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes. arXiv preprint arXiv:2503.22935 (2025)
44. Tymchuk, Y.: The False False Positives of Static Analysis. In: Seminar Series on Advanced Techniques and Tools for Software Evolution (SATToSE), pp. 07–09 (2017)
45. Cui, H., Xie, M., Su, T., Zhang, C., Tan, S.H.: An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues. arXiv preprint arXiv:2408.13855 (2024)
46. Murali, A., Mathews, N., Alfadel, M., Nagappan, M., Xu, M.: FuzzSlice: Pruning false positives in static analysis warnings through function-level fuzzing. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, pp. 1–13 (2024)
47. Shields, P.: Hybrid testing: Combining static analysis and directed fuzzing. PhD thesis, Massachusetts Institute of Technology (2023)
48. Bessler, G., Cordova, J., Cullen-Baratloo, S., Dissem, S., Lu, E., Devin, S., Abughararh, I., Bang, L.: Metrinome: Path complexity predicts symbolic execution path explosion. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 29–32. IEEE (2021)
49. Ding, Y., Suneja, S., Zheng, Y., Laredo, J., Morari, A., Kaiser, G., Ray, B.: VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements. In: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 959–970. IEEE (2022)
50. Risse, N., Böhme, M.: Uncovering the limits of machine learning for automatic vulnerability detection. In: 33rd USENIX Security Symposium (USENIX Security 24), pp. 4247–4264 (2024)
51. Li, Y., Bui, N.T., Zhang, T., Weyssow, M., Yang, C., Zhou, X., Jiang, J., Chen, J., Huang, H., Nguyen, H.H., et al.: Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses? arXiv preprint arXiv:2507.21817 (2025)
52. Croft, R., Babar, M.A., Kholoosi, M.M.: Data quality for software vulnerability datasets. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 121–133. IEEE (2023)
53. Yadav, A.S., Wilson, J.N.: R + R: Security Vulnerability Dataset Quality Is Critical. In: 2024 Annual Computer Security Applications Conference (ACSAC), pp. 1047–1061. IEEE (2024)
54. Gao, Z., Zhou, J., Zhang, B., He, Y., Zhang, C., Cui, Y., Wang, H.: Mono: Is Your Clean Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond. arXiv preprint arXiv:2506.03651 (2025)
55. Wang, Z., Li, G., Li, J., Xiong, Y., Li, J., Jin, Z.: M2CVD: Multi-model collaboration for code vulnerability detection. arXiv e-prints (2024)
56. Le, T.H.M., Babar, M.A.: Automatic data labeling for software vulnerability prediction models: How far are we? In: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 131–142 (2024)
57. Gao, Z., Wang, H., Zhou, Y., Zhu, W., Zhang, C.: How far have we gone in vulnerability detection using large language models. arXiv preprint arXiv:2311.12420 (2023)
58. Dong, H., Lin, J., Wang, Y., Leng, Y., Chen, J., Xie, Y.: Improving Code Search with Hard Negative Sampling Based on Fine-tuning. In: 2024 31st Asia-Pacific Software Engineering Conference (APSEC), pp. 221–230. IEEE (2024)
59. Robinson, J., Chuang, C.-Y., Sra, S., Jegelka, S.: Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592 (2020)
60. Shi, W., Chen, J., Feng, F., Zhang, J., Wu, J., Gao, C., He, X.: On the theories behind hard negative sampling for recommendation. In: Proceedings of the ACM Web Conference 2023, pp. 812–822 (2023)
61. Kalantidis, Y., Sariyildiz, M.B., Pion, N., Weinzaepfel, P., Larlus, D.: Hard negative mixing for contrastive learning. Advances in Neural Information Processing Systems, 33, 21798–21809 (2020)
62. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: A matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, vol. 5, p. 10, Lake Tahoe, NV (2011)
63. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al.: Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)
64. Rasley, J., Rajbhandari, S., Ruwase, O., He, Y.: Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3505–3506 (2020)
65. Shestov, A., Levichev, R., Mussabayev, R., Maslov, E., Zadorozhny, P., Cheshkov, A., Toleu, A., Tolegen, G., Krassovitskiy, A.: Finetuning large language models for vulnerability detection. IEEE Access (2025)
66. Sheng, Z., Chen, Z., Gu, S., Huang, H., Gu, G., Huang, J.: Large language models in software security: A survey of vulnerability detection techniques and insights. arXiv preprint arXiv:2502.07049 (2025)
67. Steenhoek, B., Rahman, M.M., Jiles, R., Le, W.: An empirical study of deep learning models for vulnerability detection. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 2237–2248. IEEE (2023)
68. Li, H., Zhou, X., Tuan, L.A., Miao, C.: Rethinking negative pairs in code search. arXiv preprint arXiv:2310.08069 (2023)
69. Baran, G.: 40,000+ CVEs Published in 2024, Marking a 38% Increase From 2023. Cyber Security News [Online]. (2025). Available: https://cybersecuritynews.com/40000-cves-published-in-2024
70. Akhavani, S.A., Kharraz, B.O.A.: Open Source, Open Threats? Investigating Security Challenges in Open-Source Software. arXiv preprint arXiv:2506.12995 [Online]. (2025). Available: https://arxiv.org/abs/2506.12995
71. Techzine Global: An average of 131 CVE reports per day [Online]. (2025). Available: https://www.techzine.eu/news/security/133037/an-average-of-131-cve-reports-per-day (Accessed 2 Aug. 2025)
72. Coker, J.: Software Vulnerabilities Take Almost Nine Months to Patch. Infosecurity Magazine [Online]. (2025). Available: https://www.infosecurity-magazine.com/news/software-vulnerabilities-nine/
73. McDade, M.: Discover key statistics on common software vulnerabilities, the market, and predicted trends. Expert Insights [Online]. (2025). Available: https://expertinsights.com/network-management/software-vulnerability-statistics-and-trends-2025
74. VulnCheck: Trends in Vulnerability Exploitation. VulnCheck Blog [Online]. (2024). Available: https://www.vulncheck.com/blog/2024-exploitation-trends (Accessed 2 Aug. 2025)
75. Shen, M., Pillai, A., Yuan, B.A., Davis, J.C., Machiry, A.: An empirical study on the use of static analysis tools in open source embedded software. arXiv preprint arXiv:2310.00205 (2023). Available: https://arxiv.org/abs/2310.00205
76. Kaniewski, S., Schmidt, F., Enzweiler, M., Menth, M., Heer, T.: A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models. arXiv preprint arXiv:2507.22659 (2025). Available: https://arxiv.org/abs/2507.22659