Much of this needs further movement into other areas of the garden.

Reading

Articles, Blogs, Essays

"MLOps"

Part 1: Designing a Production-Grade Agentic MLOps System: A practical guide to combining LSTM forecasting, transfer learning, and AI agents in real-world ML systems

Papers

Type Inference

Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving (2023), OOPSLA'23, Ye, Fangke, et al. [pdf]
Learning Type Inference for Enhanced Dataflow Analysis (2023), ESORICS'23, Seidel, Lukas, et al. [pdf]
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors (2023), ICSE'24, Peng, Yun, et al. [pdf]
DeepInfer: Deep Type Inference from Smart Contract Bytecode (2023), ESEC/FSE '23, Zhao, Kunsong, et al. [pdf]
Statistical Type Inference for Incomplete Programs (2023), ESEC/FSE '23, Peng, Yaohui, et al. [pdf]
DeMinify: Neural Variable Name Recovery and Type Inference (2023), ESEC/FSE '23, Li, Yi, et al. [pdf]
FQN Inference in Partial Code by Prompt-tuned Language Model of Code (2023), TOSEM journal, Huang, Qing, et al.
Generative Type Inference for Python (2023), ASE'23, Peng, Yun, et al. [pdf]
Type Prediction With Program Decomposition and Fill-in-the-Type Training (2023), arxiv, Cassano, Federico, et al. [pdf]
TypeT5: Seq2seq Type Inference using Static Analysis (2023), ICLR'23, Wei, Jiayi, et al. [pdf]
Do Machine Learning Models Produce TypeScript Types that Type Check? (2023), arxiv, Yee, M., and Guha, A. [pdf]
Cross-Domain Evaluation of a Deep Learning-Based Type Inference System (2022), arxiv, Gruner, Bernd, et al. [pdf] [code]
Learning To Predict User-Defined Types (2022), TSE'22, Jesse, Kevin, et al. [pdf]
Recovering Container Class Types in C++ Binaries (2022), CGO'22, Wang, Xudong, et al.
Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries (2022), PLDI'22, Lehmann, Daniel and Pradel, Michael [pdf]
Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python (2022), ICSE'22, Mir, Amir, et al. [pdf][code]
Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python (2022), ICSE'22, Peng, Yun, et al. [pdf]

Older papers

StateFormer: Fine-grained Type Recovery from Binaries Using Generative State Modeling (2021), FSE'21, Pei, Kexin, et al. [pdf][code]
Type Inference as Optimization (2021), NeurIPS'21 AIPLANS, Pandi, Irene Vlassi, et al. [pdf]
SimTyper: Sound Type Inference for Ruby using Type Equality Prediction (2021), OOPSLA'21, Kazerounian, Milod, et al.
Learning type annotation: is big data enough? (2021), FSE 2021, Jesse, Kevin, et al. [pdf][code]
Cross-Lingual Adaptation for Type Inference (2021), arxiv 2021, Li, Zhiming, et al. [pdf]
PYInfer: Deep Learning Semantic Type Inference for Python Variables (2021), arxiv 2021, Cui, Siwei, et al. [pdf]
Advanced Graph-Based Deep Learning for Probabilistic Type Inference (2020), arxiv 2020, Ye, Fangke, et al. [pdf]
Typilus: Neural Type Hints (2020), PLDI 2020, Allamanis, Miltiadis, et al. [pdf][code]
LambdaNet: Probabilistic Type Inference using Graph Neural Networks (2020), arxiv 2020, Wei, Jiayi, et al. [pdf]
TypeWriter: Neural Type Prediction with Search-based Validation (2019), arxiv 2019, Pradel, Michael, et al. [pdf]
NL2Type: Inferring JavaScript Function Types from Natural Language Information (2019), ICSE 2019, Malik, Rabee S., et al. [pdf][code]
Deep Learning Type Inference (2018), ESEC/FSE 2018, Hellendoorn, Vincent J., et al. [pdf][code]
Python Probabilistic Type Inference with Natural Language Support (2016), FSE 2016, Xu, Zhaogui, et al.
Predicting Program Properties from “Big Code” (2015), ACM SIGPLAN 2015, Raychev, Veselin, et al. [pdf]

Code Completion

REPOFUSE: Repository-Level Code Completion with Fused Dual Context (2024), arxiv, Liang, Ming, et al. [pdf]
Non-Autoregressive Line-Level Code Completion (2024), TOSEM, Liu, Fang, et al.
IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion (2024), arxiv, Li, Bolun, et al. [pdf]
Language Models for Code Completion: A Practical Evaluation (2024), ICSE'24, Izadi et al. [pdf]
Context Composing for Full Line Code Completion (2024), IDE'24, Semenkin et al. [pdf]
De-Hallucinator: Iterative Grounding for LLM-Based Code Completion (2024), arxiv, Eghbali, A., & Pradel, M. [pdf]
When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference (2024), ICSE'24, Sun, Zhensu, et al. [pdf]
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (2023), NeurIPS'23, Ding, Yangruibo, et al. [pdf]
Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context (2023), NeurIPS'23, Agrawal, Lakshya A., et al. [pdf]
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation (2023), NeurIPS'23, Liu, Jiawei, et al. [pdf]
Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases (2023), arxiv, Tang, Ze, et al. [pdf]
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (2023), arxiv, Liu, T., et al. [pdf]
A Static Evaluation of Code Completion by Large Language Models (2023), arxiv, Ding, Hantian, et al. [pdf]
Large Language Models of Code Fail at Completing Code with Potential Bugs (2023), NeurIPS'23, Dinh, Tuan, et al. [pdf]
RepoFusion: Training Code Models to Understand Your Repository (2023), arxiv, Shrivastava, Disha, et al. [pdf]
LongCoder: A Long-Range Pre-trained Language Model for Code Completion (2023), ICML'23, Guo, Daya, et al. [pdf]
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents (2023), arxiv, Johnson, Daniel D, et al. [pdf]
Optimized Tokenization Process for Open-Vocabulary Code Completion: An Empirical Study (2023), EASE'23, Hussain, Yasir, et al.
Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study (2023), MSR'23, van Dam, Tim, et al. [pdf]
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation (2023), arxiv, Zhang, Fengji, et al. [pdf]

Older

COCOMIC: Code Completion By Jointly Modeling In-file and Cross-file Context (2022), Ding, Yangruibo, et al. [pdf]
Boosting source code suggestion with self-supervised Transformer Gated Highway (2022), JSS, Hussain, Yasir, et al.
Syntax-Aware On-the-Fly Code Completion (2022), arxiv, Takerngsaksiri, W., et al. [pdf]
Learning to Prevent Profitless Neural Code Completion (2022), arxiv, Sun, Z., et al. [pdf]
All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs (2022), arxiv, Bibaev, Vitaliy, et al. [pdf]
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences (2022), ICSE'22, Izadi, Maliheh, et al. [pdf] [code]
Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs (2021), AAAI'21, Wang, Yanlin, et al. [pdf]
Code Prediction by Feeding Trees to Transformers (2021), ICSE'21, Kim, Seohyun, et al. [pdf]
Fast and Memory-Efficient Neural Code Completion (2020), arxiv 2020, Svyatkovskiy, Alexey, et al. [pdf]
Pythia: AI-assisted Code Completion System (2019), KDD'19, Svyatkovskiy, Alexey, et al. [pdf]
Code Completion with Neural Attention and Pointer Networks (2018), arxiv 2018, Li, Jian, et al. [pdf]

Code Generation

Knowledge-Aware Code Generation with Large Language Models (2024), ICPC'24, Huang et al. [pdf]
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models (2024), arxiv, Chen, Simin, et al. [pdf]
Ocassionally Secure: A Comparative Analysis of Code Generation Assistants (2024), arxiv, Elgedawy et al. [pdf]
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (2024), arxiv, [pdf]
Grounding Data Science Code Generation with Input-Output Specifications (2024), arxiv, Wen, Yeming, et al. [pdf]
MPIrigen: MPI Code Generation through Domain-Specific Language Models (2024), arxiv, Schneider, Nadav, et al. [pdf]
Instruction Tuning for Secure Code Generation (2024), arxiv, He, Jingxuan, et al. [pdf]
Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS (2024), arxiv, DeLorenzo, Matthew, et al. [pdf]
ARKS: Active Retrieval in Knowledge Soup for Code Generation (2024), arxiv, Su, Hongjin, et al. [pdf]
Test-Driven Development for Code Generation (2024), arxiv, Mathews, N. S., & M. Nagappan [pdf]
RRGcode: Deep hierarchical search-based code generation (2024), JSS, Gou, Qianwen, et al.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step (2024), arxiv, Zhong et al. [pdf]
Ansible Lightspeed: A Code Generation Service for IT Automation (2024), arxiv, Sahoo, Priyam, et al. [pdf]
DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions (2024), arxiv, Wu et al. [pdf]
Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models (2024), arxiv, Yang, Guang, et al. [pdf]
DevEval: Evaluating Code Generation in Practical Software Projects (2024), arxiv, Li, Jia, et al. [pdf]
Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation (2024), arxiv, Wang, Chong, et al. [pdf]
CODEAGENT: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges (2024), arxiv, Zhang, Kechi, et al. [pdf]
On the Reliability and Explainability of Language Models for Program Generation (2024), TOSEM, Liu, Yue, et al. [pdf]
AgentCoder: Multiagent-Code Generation with Iterative Testing and Optimisation (2024), arxiv, Huang, Dong, et al. [pdf]
Dynamic Retrieval-Augmented Generation (2024), arxiv, Shapkin et al. [pdf]
Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation (2024), arxiv, Tian, Z., & Chen, J. [pdf]

Older

Context-Aware Code Generation Framework for Code Repositories: Local, Global, and Third-Party Library Awareness (2023), arxiv, Liao, Dianshu, et al. [pdf]
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules (2024), ICLR'24, Le, Hung, et al. [pdf]
Bias Testing and Mitigation in LLM-based Code Generation (2024), arxiv, Huang, Dong, et al. [pdf]
Magicoder: Source Code Is All You Need (2023), arxiv, Wei, Yuxiang, et al. [pdf]
Structured Chain-of-Thought Prompting for Code Generation (2023), arxiv, Li, Jia, et al. [pdf]
Evaluating In-Context Learning of Libraries for Code Generation (2023), arxiv, Patel, Arkil, et al. [pdf]
Neural Rankers for Code Generation via Inter-Cluster Modeling (2023), arxiv, To, Hung Quoc, et al. [pdf]
Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation (2023), ICSE'24, Wang, Jiexin, et al. [pdf]
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis (2023), arxiv, Gorinski, P. J., et al. [pdf]
ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification (2023), arxiv, Mu, Fangwen, et al. [pdf]
Large Language Model-Aware In-Context Learning for Code Generation (2023), arxiv, Li, Jia, et al. [pdf]
From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven AI Chaining (2023), ASE'23, Ren, Xiaoxue, et al. [pdf]
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models (2023), arxiv, Weyssow, Martin, et al. [pdf]
CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation (2023), arxiv, Liu, Mingwei, et al. [pdf]
Is Model Attention Aligned with Human Attention?: An Empirical Study on LLMs for Code Generation (2023), arxiv, Kou, Bonan, et al. [pdf]
Demystifying GPT Self-Repair for Code Generation (2023), arxiv, Olausson, Theo X., et al. [pdf]
Exploring Continual Learning for Code Generation Models (2023), arxiv, Yadav, Prateek, et al. [pdf]
CodePrompt: Task-Agnostic Prefix Tuning for Program and Language Generation (2023), ACL'23, Choi, Y., & Lee, J. H. [pdf]
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models (2023), ACL'23, Dibia, Victor, et al. [pdf]
RLTF: Reinforcement Learning from Unit Test Feedback (2023), arxiv, Liu, Jiate, et al. [pdf]
A Lightweight Framework for High-Quality Code Generation (2023), arxiv, Siddiq, M. L., et al. [pdf]
Large Language Models for Code: Security Hardening and Adversarial Testing (2023), ICML'23 workshop, He, J., & Vechev, M. [pdf]
Reinforcement Learning for Syntax-Guided Synthesis (2023), arxiv, Parsert, J., and E. Polgreen [pdf]
Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues, arxiv, Liu, Yue, et al. [pdf]
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis, arxiv, Shi, Kensen, et al. [pdf]
Private-Library-Oriented Code Generation with Large Language Models (2023), arxiv, Zan, Daoguang, et al. [pdf]
LLM is Like a Box of Chocolates: the Non-determinism of ChatGPT in Code Generation (2023), arxiv, Ouyang, Shuyin, et al. [pdf]
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT (2023), arxiv, Liu, Zhijie, et al. [pdf]
Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation (2023), arxiv, Li, Xin-Ye, et al. [pdf]
Neural Machine Translation for Code Generation (2023), arxiv, KC, Dharma, and Morrison, Clayton T. [pdf]
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X (2023), arxiv, Zheng, Qinkai, et al. [pdf]
Towards Enhancing In-Context Learning for Code Generation (2023), arxiv, Li, Jia, et al. [pdf]
Knowledge Transfer for Pseudo-code Generation from Low Resource Programming Language (2023), arxiv, Sontakke, Ankita, et al. [pdf]
MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation (2023), TSE, Cassano, Federico, et al.
Self-collaboration Code Generation via ChatGPT (2023), arxiv, Dong, Yihong, et al. [pdf]
Greener yet Powerful: Taming Large Code Generation Models with Quantization (2023), arxiv, Wei, Xiaokai, et al. [pdf]
A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation (2023), arxiv, Yang, Guang, et al. [pdf]
WikiCoder: Learning to Write Knowledge-Powered Code (2023), arxiv, Matricon, Théo, et al. [pdf]
Self-planning Code Generation with Large Language Model (2023), arxiv, Jiang, Xue, et al. [pdf]
Systematically Finding Security Vulnerabilities in Black-Box Code Generation Models. (2023), arxiv, Hajipour, Hossein, et al. [pdf]
Exploring Data Augmentation for Code Generation Tasks (2023), arxiv, Chen, Pinzhen, and G. Lampouras [pdf]
Controlling Large Language Models to Generate Secure and Vulnerable Code (2023), arxiv, He, J., and M. Vechev [pdf]
SKCODER: A Sketch-based Approach for Automatic Code Generation (2023), arxiv, Li, Jia, et al. [pdf]
LEVER: Learning to Verify Language-to-Code Generation with Execution (2023), arxiv, Ni, Ansong, et al. [pdf]
CodeScore: Evaluating Code Generation by Learning Code Execution (2023), arxiv, Dong, Yihong, et al. [pdf]
Program Generation from Diverse Video Demonstrations (2023), arxiv, Manchin, Anthony, et al. [pdf]
Execution-based Code Generation using Deep Reinforcement Learning (2023), arxiv, Shojaee, Parshin, et al. [pdf]
SantaCoder: don't reach for the stars! (2023), arxiv, Allal, Loubna Ben, et al. [pdf]
Exploring and Evaluating Personalized Models for Code Generation (2022), FSE'22, Zlotchevski, Andrei, et al.
Natural Language to Code Generation in Interactive Data Science Notebooks (2022), arxiv, Yin, Pengcheng, et al. [pdf]
Asking Clarification Questions for Code Generation in General-Purpose Programming Language (2022), arxiv, Li, Haau-Sing, et al. [pdf]
ExploitGen: Template-augmented exploit code generation based on CodeBERT (2022), JSS journal, Yang, Guang, et al.
Explicit Knowledge Transfer for Weakly-Supervised Code Generation (2022), arxiv, Azerbayev, Zhangir, et al. [pdf]
Program Generation from Diverse Video Demonstrations (2022), Manchin, Anthony, et al. [pdf]
Coder Reviewer Reranking for Code Generation (2022), arxiv, Zhang, Tianyi, et al. [pdf]
Execution-based Evaluation for Data Science Code Generation Models (2022), arxiv, Huang, Junjie, et al. [pdf]
Multi-lingual Evaluation of Code Generation Models (2022), arxiv, Athiwaratkun, Ben, et al. [pdf][code]
DocCoder: Generating Code by Retrieving and Reading Docs (2022), arxiv, Zhou, Shuyan, et al. [pdf]
Compilable Neural Code Generation with Compiler Feedback (2022), ACL'22, Wang, Xin, et al. [pdf]
T5QL: Taming language models for SQL generation (2022), arxiv, Arcadinho, S., et al. [pdf]
Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation (2022), arxiv, Shen, Sijie, et al. [pdf]
Language Models Can Teach Themselves to Program Better (2022), arxiv, Haluptzok, Patrick, et al. [pdf]
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (2022), arxiv, Le, Hung, et al. [pdf]
Repository-Level Prompt Generation for Large Language Models of Code (2022), arxiv, Shrivastava, Disha, et al. [pdf]
CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation (2022), arxiv, Zan, Daoguang, et al. [pdf]
NatGen: Generative pre-training by “Naturalizing” source code (2022), FSE'22, Chakraborty, Saikat, et al. [pdf]
StructCoder: Structure-Aware Transformer for Code Generation (2022), arxiv, Tipirneni, Sindhu, et al. [pdf]
Predictive Synthesis of API-Centric Code (2022), arxiv 2022, Nam, Daye, et al. [pdf]
Code Prediction by Feeding Trees to Transformers (2020), arxiv 2020, Kim, Seohyun, et al. [pdf]
TreeGen: A Tree-Based Transformer Architecture for Code Generation (2019), arxiv 2019, Zhu, Qihao, et al. [pdf]
A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation (2017), arxiv 2017, Barone, Antonio V. M., et al. [pdf]

Code Summarization

A Prompt Learning Framework for Source Code Summarization (2024), TOSEM, Sun et al.
Evaluating Code Summarization Techniques: A New Metric and an Empirical Characterization (2024), arxiv, Mastropaolo, Antonio, et al. [pdf]
SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization (2024), arxiv, Wang et al. [pdf]
Towards Summarizing Code Snippets Using Pre-Trained Transformers (2024), ICPC'24, Mastropaolo et al. [pdf]
Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization (2024), ICPC'24, Li, Jiliang, et al. [pdf]
EyeTrans: Merging Human and Machine Attention for Neural Code Summarization (2024), arxiv, Zhang, Yifan, et al. [pdf]
Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization (2024), TOSEM, Zhu, Tingwei, et al.
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models (2023), arxiv, Jin, Xin, et al. [pdf]
Revisiting File Context for Source Code Summarization (2023), arxiv, Bansal, Aakash, et al. [pdf]
Distilled GPT for Source Code Summarization (2023), arxiv, Su, C. Y., & McMillan, C. [pdf]
A data augmentation method for source code summarization (2023), Journal of Neurocomputing, Song, Zixuan, et al.
Multilingual Adapter-based Knowledge Aggregation on Code Summarization for Low-Resource Languages (2023), arxiv, Saberi, Iman, et al. [pdf]
Statement-based Memory for Neural Source Code Summarization (2023), arxiv, Bansal, Aakash, et al. [pdf]
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization (2023), arxiv, Ye, Tong, et al. [pdf]
Automatic Code Summarization via ChatGPT: How Far Are We? (2023), arxiv, Sun, Weisong, et al.
Function Call Graph Context Encoding for Neural Source Code Summarization (2023), TSE, Bansal, Aakash, et al.
Label Smoothing Improves Neural Source Code Summarization (2023), arxiv, Haque, Sakib, et al. [pdf]
Demystifying What Code Summarization Models Learned (2023), arxiv, Wang, Yu, and Ke Wang. [pdf]
CoSS: Leveraging Statement Semantics for Code Summarization (2023), TSE, Shi, Chaochen, et al.
An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization (2023), RG, Yang, Kang, et al. [pdf]
Interpretation-based Code Summarization (2023), arxiv, Geng, Mingyang, et al. [pdf]
Towards Retrieval-Based Neural Code Summarization: A Meta-Learning Approach (2023), TSE, Zhou, Ziyi, et al.
CLG-Trans: Contrastive Learning for Code Summarization via Graph Attention-based Transformer (2023), SCP journal, Zeng, Jianwei, et al.

Older

ClassSum: a deep learning model for class-level code summarization (2022), Springer NCA, Li, Mingchen, et al. [code]
Boosting Code Summarization by Embedding Code Structures (2022), COLING'22, Son, Jikyoeng, et al. [pdf]
Low-Resources Project-Specific Code Summarization (2022), ASE'22, Xie, Rui, et al. [pdf]
Few-shot training LLMs for project-specific code-summarization (2022), arxiv, Ahmed, Toufique, and Devanbu, P. [pdf]
Are We Building on the Rock? On the Importance of Data Preprocessing for Code Summarization (2022), FSE'22, Shi, Lin, et al. [pdf]
Learning code summarization from a small and local dataset (2022), arxiv, Ahmed, Toufique, and Devanbu, P. [pdf]
Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization (2022), ACL'22, Guo, Juncai, et al. [pdf]
AST-Trans: Code Summarization with Efficient Tree-Structured Attention (2022), ICSE'22, Tang, Ze, et al. [pdf]
GypSum: Learning Hybrid Representations for Code Summarization (2022), ICPC'22, Wang, Yu, et al. [pdf]
M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization (2022), ICPC'22, Gao, Yuexiu and Lyu, Chen [pdf]
Project-Level Encoding for Neural Source Code Summarization of Subroutines (2021), ICPC'21, Bansal, Aakash, et al. [pdf]
Code Structure Guided Transformer for Source Code Summarization (2021), arxiv 2021, Gao, Shuzheng, et al. [pdf]
Source Code Summarization Using Attention-Based Keyword Memory Networks (2020), IEEE BigComp 2020, Choi, YunSeok, et al.
A Transformer-based Approach for Source Code Summarization (2020), arxiv 2020, Ahmad, Wasi Uddin, et al. [pdf]
Learning to Represent Programs with Graphs (2018), ICLR'18, Allamanis, Miltiadis, et al. [pdf]
A Convolutional Attention Network for Extreme Summarization of Source Code (2016), ICML 2016, Allamanis, Miltiadis, et al. [pdf]

Code Embeddings/Representation

CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision (2024), ISSTA'24, Wang, Hao, et al. [pdf] [code]
CONCORD: Towards a DSL for Configurable Graph Code Representation (2024), arxiv, Saad, M., & Sharma, T. [pdf]
Code Representation Learning at Scale (2024), ICLR'24, Zhang et al. [pdf]
Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models (2024), arxiv, Agarwal, Mayank, et al. [pdf]
Pass-Tuning: Towards Structure-Aware Parameter-Efficient Tuning for Code Representation Learning (2023), EMNLP'23, Chen, Nuo, et al. [pdf]
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation (2023), arxiv, Xian, Zixiang, et al. [pdf]
CoCoAST: Representing Source Code via Hierarchical Splitting and Reconstruction of Abstract Syntax Trees (2023), EMSE, Shi, Ensheng, et al.
Language Agnostic Code Embeddings (2023), arxiv, Utpala, Saiteja, et al. [pdf]
Code Representation Pre-training with Complements from Program Executions (2023), arxiv, Huang, Jiabo, et al. [pdf]
FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations (2023), ICSE'24, Niu, Changan, et al. [pdf]
CombTransformers: Statement-Wise Transformers for Statement-Wise Representations (2023), TSE, Bertolotti, F., & Cazzola, W.
kTrans: Knowledge-Aware Transformer for Binary Code Embedding (2023), arxiv, Zhu, Wenyu, et al. [pdf][code]
TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills (2023), arxiv, Sun, Qiushi, et al. [pdf]
CodeGrid: A Grid Representation of Code (2023), ISSTA'23, Kaboré, Abdoul Kader, et al.
Symmetry-Preserving Program Representations for Learning Code Semantics (2023), arxiv, Pei, Kexin, et al. [pdf]
PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis (2023), NeurIPS'23, TehraniJamsaz, Ali, et al. [pdf]
xASTNN: Improved Code Representations for Industrial Practice (2023), arxiv, Xu, Zhiwei, et al. [pdf]
Toward Interpretable Graph Tensor Convolution Neural Network for Code Semantics Embedding (2023), TOSEM, Yang, Jia, et al.

Older

jTrans: Jump-Aware Transformer for Binary Code Similarity Detection (2022), ISSTA'22, Wang, Hao, et al. [pdf][code]
Trex: Learning Approximate Execution Semantics from Traces for Binary Function Similarity (2022), TSE, Pei, Kexin, et al. [pdf][code]
Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning (2022), ACSAC'22, Ahn, Sunwoo, et al.
CLAWSAT: Towards Both Robust and Accurate Code Models (2022), arxiv, Jia, Jinghan, et al. [pdf]
sem2vec: Semantics-Aware Assembly Tracelet Embedding (2022), TSE, Wang, Huaijin, et al.
COMBO: Pre-Training Representations of Binary Code Using Contrastive Learning (2022), arxiv, Zhang, Yifan, et al. [pdf]
Soft-Labeled Contrastive Pre-training for Function-level Code Representation (2022), arxiv, Li, Xiaonan, et al. [pdf]
A Tree-structured Transformer for Program Representation Learning (2022), arxiv, Wang, Wenhan, et al. [pdf]
What does Transformer learn about source code? (2022), arxiv, Zhang, Kechi, et al. [pdf]
Diet Code is Healthy: Simplifying Programs for Pre-Trained Models of Code (2022), arxiv, Zhang, Zhaowei, et al. [pdf]
MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning (2022), arxiv, Pian, Weiguo, et al. [pdf]
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts (2022), ACL'22, Ding, Yangruibo, et al. [pdf]
Towards Learning Generalizable Code Embeddings using Task-agnostic Graph Convolutional Networks (2022), TOSEM, Ding, Zishuo, et al.
Learning to Represent Programs with Code Hierarchies (2022), arxiv, Nguyen, Minh, and Bui, Nghi DQ [pdf]
CV4Code: Sourcecode Understanding via Visual Code Representations (2022), arxiv, Shi, Ruibo, et al. [pdf]
Hyperbolic Representations of Source Code (2022), AAAI'22, Khan, Raiyan, et al. [pdf]
Unified Abstract Syntax Tree Representation Learning for Cross-Language Program Classification (2022), ICPC'22, Wang, Kesu, et al. [pdf]
Hierarchical Semantic-Aware Neural Code Representation (2022), JSS'22, Jiang, Yuan, et al.
CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training (2022), arxiv 2022, Wang, Xin, et al. [pdf]
Hierarchical Heterogeneous Graph Attention Network for Syntax-Aware Summarization (2022), AAAI'22, Song, Z., and King, I. [pdf]
Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings (2022), ICSE'22, Li, Zongjie, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM'22, Lin, Zehao, et al.
Fold2Vec: Towards a Statement Based Representation of Code for Code Comprehension (2022), TOSEM'22, Bertolotti, Francesco and Cazzola, Walter
HELoC: Hierarchical Contrastive Learning of Source Code Representation (2022), ICPC'22, Wang, Xiao, et al. [pdf]
Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection (2022), AAAI'22, Long, Ting, et al. [pdf]
UniXcoder: Unified Cross-Modal Pre-training for Code Representation (2022), arxiv 2022, Guo, Daya, et al. [pdf]
SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations (2022), ICSE'22, Niu, Changan, et al. [pdf]
GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses (2022), MSR'22, Ma, Wei, et al.
OSCAR: How could Neural Networks understand Programs? (2021), ICML'21, Peng, Dinglan, et al. [pdf]
PROGRAML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations (2021), ICML'21, Cummins, Chris, et al. [pdf]
CoTexT: Multi-task Learning with Code-Text Transformer (2021), arxiv, Phan, Long, et al. [pdf]
TreeCaps: Tree-Based Capsule Networks for Source Code Processing (2021), AAAI'21, Bui, Nghi DQ, et al. [pdf] [code]
Language-Agnostic Representation Learning of Source Code from Structure and Context (2021), ICLR'21, Zügner, Daniel, et al. [pdf]
IR2Vec: LLVM IR Based Scalable Program Embeddings (2020), TACO journal, VenkataKeerthy, S., et al.
Compiler-Based Graph Representations for Deep Learning Models of Code (2020), CC'20, Brauckmann, Alexander, et al.
Learning and Evaluating Contextual Embedding of Source Code (2020), ICML 2020, Kanade, Aditya, et al. [pdf]
Learning Semantic Program Embeddings with Graph Interval Neural Network (2020), OOPSLA'20, Wang, Yu, et al.
Contrastive Code Representation Learning (2020), arxiv 2020, Jain, Paras, et al. [pdf]
SCELMo: Source Code Embeddings from Language Models (2020), arxiv 2020, Karampatsis, Rafael-Michael, et al. [pdf]
code2vec: Learning Distributed Representations of Code (2019), ACM POPL 2019, Alon, Uri, et al. [pdf]
COSET: A Benchmark for Evaluating Neural Program Embeddings (2019), arxiv 2019, Wang, Ke, et al. [pdf]
A Literature Study of Embeddings on Source Code (2019), arxiv 2019, Chen, Zimin, et al. [pdf]
code2seq: Generating Sequences from Structured Representations of Code (2018), arxiv 2018, Alon, Uri, et al. [pdf]
Neural Code Comprehension: A Learnable Representation of Code Semantics (2018), NIPS 2018, Ben-Nun, Tal, et al. [pdf]
Convolutional Neural Networks over Tree Structures for Programming Language Processing (2016), AAAI'16, Mou, Lili, et al. [pdf]

Code Changes/Editing

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions (2023), arxiv, Cassano, Federico, et al. [pdf]
Grace: Language Models Meet Code Edits (2023), FSE'23, Gupta, Priyanshu, et al.
AdaptivePaste: Intelligent Copy-Paste in IDE (2023), FSE'23, Liu, Xiaoyu, et al.
Learning to Represent Patches (2023), ICSE'24, Tang, Xunzhu, et al. [pdf]
InstructCoder: Empowering Language Models to Edit Code (2023), arxiv, Hu, Qisheng, et al. [pdf]
CCBERT: Self-Supervised Code Change Representation Learning (2023), ICSME'23, Zhou, Xin, et al. [pdf]
Automated Code Editing with Search-Generate-Modify (2023), arxiv, Liu, Changshu, et al. [pdf]
Multilingual Code Co-Evolution Using Large Language Models (2023), arxiv, Zhang, Jiyang, et al. [pdf]
Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing (2023), arxiv, Wei, Jiayi, et al. [pdf]
CCT5: A Code-Change-Oriented Pre-Trained Model (2023), arxiv, Lin, Bo, et al. [pdf]
GrACE: Generation using Associated Code Edits (2023), arxiv, Gupta, Priyanshu, et al. [pdf]
Slice-Based Code Change Representation Learning (2023), arxiv, Zhang, Fengyi, et al. [pdf]
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions (2023), arxiv, Fakhoury, Sarah, et al. [pdf]
CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back (2023), arxiv, Liu, Zhongxin, et al. [pdf]
CoditT5: Pretraining for Source Code and Natural Language Editing (2022), ASE 2022, Zhang, Jiyang, et al. [pdf]
Commit2Vec: Learning Distributed Representations of Code Changes (2021), SN Computer Science, Lozoya, Rocío Cabrera, et al.
CODIT: Code Editing with Tree-Based Neural Models (2020), TSE 2020, Chakraborty, Saikat, et al.
On learning meaningful code changes via neural machine translation (2019), ICSE 2019, Tufano, Michele, et al.

Code Comments

CupCleaner: A Data Cleaning Approach for Comment Updating (2023), arxiv, Liang, Qingyuan, et al. [pdf]
Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning (2023), ICSE'24, Geng, Mingyang, et al. [pdf]
Snippet Comment Generation Based on Code Context Expansion (2023), arxiv, Guo, Hanyang, et al.
An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation (2023), arxiv, Geng, Mingyang, et al. [pdf]
An Intra-Class Relation Guided Approach for Code Comment Generation (2023), EACL'23, Wang, Zhenni, et al. [pdf]
APIContext2Com: Code Comment Generation by Incorporating Pre-Defined API Documentation (2023), arxiv, Shahbazi, R., and Fard F. [pdf]
Developer-Intent Driven Code Comment Generation (2023), arxiv, Mu, Fangwen, et al. [pdf]
ALSI-Transformer: Transformer-Based Code Comment Generation With Aligned Lexical and Syntactic Information (2023), IEEE Access, Park, Youngmi, et al.

Bug/Vulnerability Detection

Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks (2024), ICSE'24, Liu et al. [pdf]
JITGNN: A deep graph neural network framework for Just-In-Time bug prediction (2024), JSS, Keshavarz, H., and G. Rodríguez-Pérez
DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models (2024), arxiv, Berabi, Berkay, et al. [pdf]
Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT (2024), EMSE, Pujar, Saurabh, et al.
Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities (2024), arxiv, Nong, Yu, et al. [pdf]
Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models (2024), arxiv, N. T. Islam & P. Najafirad [pdf]
Vision Transformer Inspired Automated Vulnerability Repair (2024), TOSEM, Fu, Michael, et al.
Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet (2023), arxiv, Ullah, Saad, et al. [pdf]
BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning (2023), AsiaCCS'24, He, Xu, et al. [pdf]
Commit-Level, Neural Vulnerability Detection and Assessment (2023), FSE'23, Li, Yi, et al.
Learning Defect Prediction from Unrealistic Data (2023), arxiv, Alrashedy, Kamel, et al. [pdf]
SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning (2023), arxiv, Yang, Xueqi, et al. [pdf]
How Far Have We Gone in Vulnerability Detection Using Large Language Models (2023), arxiv, Gao, Zeyu, et al. [pdf]
Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization (2023), arxiv, Du, Y., & Yu, Z. [pdf]
PrAIoritize: Learning to Prioritize Smart Contract Bugs and Vulnerabilities (2023), arxiv, Soud, Majd, et al. [pdf]
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? (2023), arxiv, Chan, Aaron, et al. [pdf]
LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types (2023), arxiv, Wen, Xin-Cheng, et al. [pdf]
Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities (2023), arxiv, Fu, Michael, et al. [pdf]
CPVD: Cross Project Vulnerability Detection Based on Graph Attention Network and Domain Adaptation (2023), TSE, Zhang, Chunyong, et al.
FLAG: Finding Line Anomalies (in code) with Generative AI (2023), arxiv, Ahmad, Baleegh, et al. [pdf]
A Novel Approach to Identify Security Controls in Source Code (2023), arxiv, Okutan, Ahmet, et al. [pdf]
Limits of Machine Learning for Automatic Vulnerability Detection (2023), arxiv, Risse, N., & Böhme, M. [pdf]
Detecting Condition-Related Bugs with Control Flow Graph Neural Network (2023), ISSTA'23, Zhang, Jian, et al.
A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification (2023), arxiv, Charalambous, Yiannis, et al. [pdf]
An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph (2023), arxiv, Islam, Nafis Tanveer, et al. [pdf]
Large Language Models and Simple, Stupid Bugs (2023), arxiv, Jesse, Kevin, et al. [pdf]
Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning (2023), arxiv, Wen, Xin-Cheng, et al. [pdf]
DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection (2023), arxiv, Wang, Wenbo, et al. [pdf]
CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection (2023), JSS journal, Tang, Wei, et al.
Fixing Hardware Security Bugs with Large Language Models (2023), arxiv, Ahmad, Baleegh, et al. [pdf]
VulEye: A Novel Graph Neural Network Vulnerability Detection Approach for PHP Application (2023), Applied Sciences journal, Lin, Chun, et al. [pdf]

Older:

VDGraph2Vec: Vulnerability Detection in Assembly Code using Message Passing Neural Networks (2022), ICMLA'22, Diwan, Ashita, et al. [pdf]
VulChecker: Graph-based Vulnerability Localization in Source Code (2022), Usenix, Mirsky, Yisroel, et al. [pdf]
DeepVulSeeker: A Novel Vulnerability Identification Framework via Code Graph Structure and Pre-training Mechanism (2022), arxiv, Wang, Jin, et al. [pdf]
Compact Abstract Graphs for Detecting Code Vulnerability with GNN Models (2022), ACSAC'22, Luo, Yu, et al.
An Empirical Study of Deep Learning Models for Vulnerability Detection (2022), arxiv, Steenhoek, Benjamin, et al. [pdf]
Variable-Based Fault Localization via Enhanced Decision Tree (2022), arxiv, Jiang, Jiajun, et al. [pdf]
SPVF: security property assisted vulnerability fixing via attention-based models (2022), EMSE, Zhou, Zhou, et al.
Modeling function-level interactions for file-level bug localization (2022), EMSE, Liang, H., et al.
Practical Automated Detection of Malicious npm Packages (2022), ICSE'22, Sejfia, A., and M. Schäfer [pdf]
Machine Learning for Source Code Vulnerability Detection: What Works and What Isn't There Yet (2022), IEEE Security & Privacy, Marjanov, Tina, et al.
Path-sensitive code embedding via contrastive learning for software vulnerability detection (2022), ISSTA'22, Cheng, Xiao, et al.
VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection (2022), arxiv 2022, Hanif, H. and Maffeis, S. [pdf]
Katana: Dual Slicing-Based Context for Learning Bug Fixes (2022), arxiv 2022, Sintaha, Mifta, et al. [pdf]
LineVul: A Transformer-based Line-Level Vulnerability Prediction (2022), MSR'22, Fu, M., & Tantithamthavorn, C. [pdf][code]
Transformer-Based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms (2022), arxiv 2022, Thapa, Chandra, et al. [pdf]
LineVD: Statement-level Vulnerability Detection using Graph Neural Networks (2022), MSR'22, Hin, David, et al. [pdf]
Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks (2022), ICSE'22, Patra, Jibesh, et al. [pdf]
Hoppity: Learning graph transformations to detect and fix bugs in programs (2020), ICLR 2020, Dinella, Elizabeth, et al. [pdf]
Deep Learning based Software Defect Prediction (2020), Neurocomputing, Qiao, Lei, et al.
Software Vulnerability Discovery via Learning Multi-domain Knowledge Bases (2019), IEEE TDSC, Lin, Guanjun, et al.
Neural Bug Finding: A Study of Opportunities and Challenges (2019), arxiv 2019, Habib, Andrew, et al. [pdf]
Automated Vulnerability Detection in Source Code Using Deep Representation Learning (2018), ICMLA 2018, Russell, Rebecca, et al.
DeepBugs: A Learning Approach to Name-based Bug Detection (2018), ACM PL 2018, Pradel, Michael, et al. [pdf]
Automatically Learning Semantic Features for Defect Prediction (2016), ICSE 2016, Wang, Song, et al.

Source Code Modeling

Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models (2024), ICSE'24, Gao, Shuzheng, et al. [pdf]
CONCORD: Clone-aware Contrastive Learning for Source Code (2023), ISSTA'23, Ding, Yangruibo, et al. [pdf]
TRACED: Execution-aware Pre-training for Source Code (2023), ICSE'24, Ding, Yangruibo, et al. [pdf]
ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning (2023), arxiv, Liu, Shangqing, et al. [pdf]
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages (2022), arxiv, Chai, Yekun, et al. [pdf]
Do Bugs Lead to Unnaturalness of Source Code? (2022), FSE'22, Jiang, Yanjie, et al.
On the Naturalness of Bytecode Instructions (2022), ASE'22, Choi, Y., and J. Nam. [pdf]
CodeBERT-nt: code naturalness via CodeBERT (2022), arxiv, Khanfir, Ahmed, et al. [pdf]
CommitBART: A Large Pre-trained Model for GitHub Commits (2022), arxiv, Liu, S., et al. [pdf]
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts (2022), ACL'22, Ding, Yangruibo, et al. [pdf]
Multilingual training for Software Engineering (2022), ICSE'22, Ahmed, Toufique, et al. [pdf]
Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code (2020), ICSE'20, Karampatsis, Rafael-Michael, et al.
Maybe Deep Neural Networks are the Best Choice for Modeling Source Code (2019), arxiv 2019, Karampatsis, Rafael-Michael, et al. [pdf]
Are Deep Neural Networks the Best Choice for Modeling Source Code? (2017), FSE 2017, Hellendoorn, Vincent J., et al. [pdf]

Program Repair

T5APR: Empowering Automated Program Repair across Languages through Checkpoint Ensemble (2024), JSS, Gharibi, Reza, et al. [pdf]
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair (2024), arxiv, Silva, André et al. [pdf]
On Repairing Quantum Programs Using ChatGPT (2024), Q-SE'24, Guo et al. [pdf]
CigaR: Cost-efficient Program Repair with LLMs (2024), arxiv, Hidvégi, Dávid, et al. [pdf]
PyTy: Repairing Static Type Errors in Python (2024), ICSE'24, Chow, Yiu W., et al. [pdf]
A Novel Approach for Automated Program Repair using Round-Trip Translation with Large Language Models (2024), arxiv, Ruiz, F. Vallecillos, et al. [pdf]
APPT: Boosting Automated Patch Correctness Prediction via Fine-tuning Pre-trained Models (2024), TSE, Zhang, Quanjun, et al. [pdf]
Towards Low-Resource Automatic Program Repair with Meta-Learning and Pretrained Language Models (2023), EMNLP'23, Wang, Weishi, et al. [pdf]
GPT-3-Powered Type Error Debugging: Investigating the Use of Large Language Models for Code Repair (2023), SLE'23, Ribeiro, Francisco, et al.
Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering (2023), arxiv, Paul, Rishov, et al. [pdf]
Code Similarity and Location-Awareness Automatic Program Repair (2023), Applied Sciences, Cao, Heling, et al. [pdf]
The Future Can’t Help Fix The Past: Assessing Program Repair In The Wild (2023), RG, Kabadi, Vinay, et al. [pdf]
Revisiting the Plastic Surgery Hypothesis via Large Language Models (2023), arxiv, Xia, Chunqiu Steven et al. [pdf]
A Survey on Automated Program Repair Techniques (2023), arxiv, Huang, Kai, et al. [pdf]
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT (2023), arxiv, Xia, C. S., and Lingming Z. [pdf]
MUFIN: Improving Neural Repair Models with Back-Translation (2023), arxiv, Silva, André, et al. [pdf]
Explainable Automated Debugging via Large Language Model-driven Scientific Debugging (2023), arxiv, Kang, Sungmin, et al. [pdf]
A study on Prompt Design, Advantages and Limitations of ChatGPT for Deep Learning Program Repair (2023), arxiv, Cao, Jialun, et al. [pdf]
ITER: Iterative Neural Repair for Multi-Location Patches (2023), arxiv, Ye, He, and Martin M. [pdf]
TraceFixer: Execution Trace-Guided Program Repair (2023), arxiv, Bouzenia, Islem, et al. [pdf]
PatchZero: Zero-Shot Automatic Patch Correctness Assessment (2023), arxiv, Zhou, Xin, et al. [pdf]
Rete: Learning Namespace Representation for Program Repair (2023), ICSE'23, Parasaram, Nikhil et al. [pdf]
InferFix: End-to-End Program Repair with LLMs over Retrieval-Augmented Prompts (2023), arxiv, Jin, Matthew, et al. [pdf]
Automated Program Repair in the Era of Large Pre-trained Language Models (2023), arxiv, Xia, C. S. et al. [pdf]
KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair (2023), ICSE'23, Jiang, Nan, et al. [pdf]
Impact of Code Language Models on Automated Program Repair (2023), ICSE'23, Jiang, Nan, et al. [pdf]
Embedding Context as Code Dependencies for Neural Program Repair (2023), ICST'23, Nashid, Noor, et al. [pdf]
Tare: Type-Aware Neural Program Repair (2023), arxiv, Zhu, Qihao, et al. [pdf]
Conversational Automated Program Repair (2023), arxiv, Xia, Chunqiu Steven et al. [pdf]
An Analysis of the Automatic Bug Fixing Performance of ChatGPT (2023), arxiv, Sobania, Dominik, et al. [pdf]
Improving Automated Program Repair with Domain Adaptation (2023), arxiv, Zirak, A., and Hemati, H. [pdf]
A Survey of Learning-based Automated Program Repair (2023), arxiv, Zhang, Quanjun, et al. [pdf]
TransplantFix: Graph Differencing-based Code Transplantation for Automated Program Repair (2022), ASE'22, Yang, Deheng, et al. [pdf]

Older:

Program Repair: Survey (2022), arxiv, Gao, Xiang, et al. [pdf]
SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics (2022), ASE'22, Ye, He, et al. [pdf]
Neural Program Repair using Execution-based Backpropagation (2022), ICSE'22, Ye, He, et al. [pdf]
Practical Program Repair in the Era of Large Pre-trained Language Models (2022), arxiv, Xia, C. S. et al. [pdf]
SYNSHINE: improved fixing of Syntax Errors (2022), IEEE TSE, Ahmed, T. et al.
TransRepair: Context-aware Program Repair for Compilation Errors (2022), ASE'22, Li, Xueyang, et al. [pdf]
Repairing Bugs in Python Assignments Using Large Language Models (2022), arxiv, Zhang, Jialu, et al. [pdf]
Repair Is Nearly Generation: Multilingual Program Repair with LLMs (2022), arxiv, Joshi, Harshit, et al. [pdf]
VulRepair: A T5-Based Automated Software Vulnerability Repair (2022), FSE'22, Fu, Michael, et al. [pdf]
Less Training, More Repairing Please: Revisiting Automated Program Repair via Zero-shot Learning (2022), FSE'22, Xia, Chunqiu Steven, and Lingming Z. [pdf]
Can we learn from developer mistakes? Learning to localize and repair real bugs from real bug fixes (2022), arxiv, Richter, Cedric, and Heike W. [pdf]
AdaptivePaste: Code Adaptation through Learning Semantics-aware Variable Usage Representations (2022), arxiv 2022, Liu, Xiaoyu, et al. [pdf]
DEAR: A Novel Deep Learning-based Approach for Automated Program Repair (2022), ICSE'22, Li, Yi, et al. [pdf]
TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer (2021), ICML'21, Berabi, Berkay, et al. [pdf]
Neural Transfer Learning for Repairing Security Vulnerabilities in C Code (2021), Chen, Zimin, et al. [pdf]
Generating Bug-Fixes Using Pretrained Transformers (2021), arxiv 2021, Drain, Dawn, et al. [pdf]
Global Relational Models of Source Code (2020), ICLR'20, Hellendoorn, Vincent J., et al. [pdf]
Neural Program Repair by Jointly Learning to Localize and Repair (2019), arxiv 2019, Vasic, Marko, et al. [pdf]

Program Translation

Few-shot code translation via task-adapted prompt learning (2024), JSS, Li, Xuan, et al.
Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery (2023), EMNLP'23, Ahmad, I., & Luo, L. [pdf]
TransMap: Pinpointing Mistakes in Neural Code Translation (2023), FSE'23, Wang, Bo, et al.
On the Evaluation of Neural Code Translation: Taxonomy and Benchmark (2023), arxiv, Jiao, Mingsheng, et al. [pdf]
Attention, Compilation, and Solver-based Symbolic Analysis are All You Need (2023), arxiv, Jana, Prithwish, et al. [pdf]
Understanding the Effectiveness of Large Language Models in Code Translation (2023), arxiv, Pan, Rangeet, et al. [pdf]
On ML-Based Program Translation: Perils and Promises (2023), arxiv, Malyala, Aniketh, et al. [pdf]
Boosting Neural Networks to Decompile Optimized Binaries (2022), ACSAC'22, Cao, Ying, et al.
The Effectiveness of Transformer Models for Analyzing Low-Level Programs (2022), MIT Primes, Zifan Guo [pdf]
Code Translation with Compiler Representations (2022), arxiv, Szafraniec, Marc, et al. [pdf]
BabelTower: Learning to Auto-parallelized Program Translation (2022), ICML'22, Wen, Yuanbo, et al. [pdf]
Multilingual Code Snippets Training for Program Translation (2022), AAAI'22, Zhu, Ming, et al. [pdf]
Semantics-Recovering Decompilation through Neural Machine Translation (2021), arxiv 2021, Liang, Ruigang, et al. [pdf]
Unsupervised Translation of Programming Languages (2020), arxiv 2020, Lachaux, Marie-Anne et al. [pdf]

Program Analysis

Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning (2024), FSE'24, Yadavally, Aashish, et al. [pdf]
A Learning-Based Approach to Static Program Slicing (2024), OOPSLA'24, Yadavally, Aashish, et al. [pdf][code]
On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study (2024), MSR'24, Mir, Amir et al. [pdf]
Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis (2023), arxiv, Fan, Gang, et al. [pdf]
(Partial) Program Dependence Analysis (2023), ICSE'23, Yadavally, Aashish, et al. [pdf][code]
Precise Data-Driven Approximation for Program Analysis via Fuzzing (2023), ASE'23, Parasaram, Nikhil, et al. [pdf]
The Hitchhiker’s Guide to Program Analysis: A Journey with Large Language Models (2023), arxiv, Li, Haonan, et al. [pdf]
AutoPruner: Transformer-Based Call Graph Pruning (2022), FSE'22, Le-Cong, Thanh, et al. [pdf][code]
Striking a Balance: Pruning False-Positives from Static Call Graphs (2022), ICSE'22, Utture, Akshay, et al. [pdf][code]

Software Testing

Automated Test Case Repair Using Language Models (2024), arxiv, Yaraghi, A. S., et al. [pdf]
Using GitHub Copilot for Test Generation in Python: An Empirical Study (2024), AST'24, El Haji, Khalid et al. [pdf]
Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents (2024), arxiv, Yoon, Juyeon et al. [pdf]
Enhancing Large Language Models for Text-to-Testcase Generation (2024), arxiv, Alagarsamy, Saranya, et al. [pdf]
CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation (2024), arxiv, Eom, Jueon et al. [pdf]
Code-Aware Prompting: A study of Coverage guided Test Generation in Regression Setting using LLM (2024), arxiv, Ryan, Gabriel, et al. [pdf]
LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models (2024), arxiv, Shou, Chaofan, et al. [pdf]
Fuzz4All: Universal Fuzzing with Large Language Models (2024), ICSE'24, Xia, C., et al. [pdf]
TDD Without Tears: Towards Test Case Generation from Requirements through Deep Reinforcement Learning (2024), arxiv, Takerngsaksiri, Wannita, et al. [pdf]
Unit Test Generation using Generative AI: A Comparative Performance Analysis of Autogeneration Tools (2024), arxiv, Bhatia, Shreya, et al. [pdf]
CAT-LM: Training Language Models on Aligned Code And Tests (2023), ASE'23, Rao, Nikitha, et al. [pdf]
LLM4TDD: Best Practices for Test Driven Development Using Large Language Models (2023), arxiv, Piya, S., & Sullivan, A. [pdf]
Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing (2023), arxiv, Yoon, Juyeon, et al. [pdf]
White-box Compiler Fuzzing Empowered by Large Language Models (2023), arxiv, Yang, Chenyuan, et al. [pdf]
Test Case Recommendations with Distributed Representation of Code Syntactic Features (2023), ASEW'23, Rezaei, M. et al. [pdf]
Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models (2023), arxiv, Plein, Laura, et al. [pdf]
The Program Testing Ability of Large Language Models for Code (2023), arxiv, Xiong, W. et al. [pdf]
Revisiting Neural Program Smoothing for Fuzzing (2023), FSE'23, Bansal, Aakash et al. [pdf]
An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (2023), arxiv, Schäfer, Max, et al. [pdf]
Automated Test Case Generation Using Code Models and Domain Adaptation (2023), arxiv, Hashtroudi, Sepehr, et al. [pdf]
Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing (2023), arxiv, Dakhel, A. M., et al. [pdf]
Automatic Unit Test Generation for Deep Learning Frameworks based on API Knowledge (2023), arxiv, Narayanan, A., et al. [pdf]
Black-Box Prediction of Flaky Test Fix Categories Using Language Models (2023), arxiv, Fatima, S., et al. [pdf]
Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models (2023), ISSTA'23, Deng, Yinlin, et al. [pdf]
Understanding Large Language Model Based Fuzz Driver Generation (2023), arxiv, Zhang, Cen, et al. [pdf]
Universal Fuzzing via Large Language Models (2023), arxiv, Xia, Chunqiu Steven, et al. [pdf]
SAGA: Summarization-Guided Assert Statement Generation (2023), arxiv, Zhang, Yuwei, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA'23, Liu, Zhongxin, et al. [pdf]
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models (2023), arxiv, Pan, Rongqi, et al. [pdf]
ChatGPT and Software Testing Education: Promises & Perils (2023), arxiv, Jalil, Sajed, et al. [pdf]
Adaptive Test Generation Using a Large Language Model (2023), arxiv, Schäfer, Max, et al. [pdf]
CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models (2023), ICSE'23, Lemieux, Caroline, et al. [pdf]
Learning Deep Semantics for Test Completion (2023), arxiv, Nie, Pengyu, et al. [pdf]
A3Test: Assertion-Augmented Automated Test Case Generation (2023), arxiv, Alagarsamy, Saranya, et al. [pdf]
Efficient Mutation Testing via Pre-Trained Language Models (2023), arxiv, Khanfir, Ahmed, et al. [pdf]

Older:

Test2Vec: An Execution Trace Embedding for Test Case Prioritization (2022), arxiv, Jabbar, Emad, et al. [pdf]
Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers (2022), AST'22, Tufano, Michele, et al.
On Learning Meaningful Assert Statements for Unit Test Cases (2020), ICSE'20, Watson, Cody, et al.

Code Clone Detection

CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection (2024), ISSTA'24, Wang, Hao, et al. [pdf] [code]
Investigating the Efficacy of Large Language Models for Code Clone Detection (2024), ICPC'24, Khajezade, Mohamad, et al. [pdf]
Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks (2023), arxiv, Mehrotra, Nikita, et al.
ZC3: Zero-Shot Cross-Language Code Clone Detection (2023), arxiv, Li, Jia, et al. [pdf]
Comparison and Evaluation of Clone Detection Techniques with Different Code Representations (2023), ICSE'23, Wang, Yuekun, et al. [pdf]
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey (2023), arxiv, Dou, Shihan, et al. [pdf]
CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search (2023), arxiv, Sorokin, Nikita, et al. [pdf]
Neuro-symbolic Zero-Shot Code Cloning with Cross-Language Intermediate Representation (2023), arxiv, Hasija, Krishnam, et al. [pdf]
Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection (2023), arxiv, Pinku, Subroto Nag et al. [pdf]
Graph-based code semantics learning for efficient semantic code clone detection (2022), IST journal, Yu, Dongjin, et al.
Efficient transformer with code token learner for code clone detection (2022), arxiv, Zhang, Aiping, et al.
Evaluation of Contrastive Learning with Various Code Representations for Code Clone Detection (2022), arxiv, Zubkov, Maksim, et al. [pdf]
Cross-Language Source Code Clone Detection Using Deep Learning with InferCode (2022), arxiv 2022, Yahya, M., and Kim, D. [pdf]
funcGNN: A Graph Neural Network Approach to Program Similarity (2020), ESEM'20, Nair, Aravind, et al. [pdf]
Cross-Language Clone Detection by Learning Over Abstract Syntax Trees (2019), MSR'19, Perez, Daniel, et al.
The Adverse Effects of Code Duplication in Machine Learning Models of Code (2019), Onward! 2019, Allamanis, Miltiadis [pdf]

Code Search

Rapid: Zero-shot Domain Adaptation for Code Search with Pre-trained Models (2024), TOSEM, Fan, Guodong, et al.
Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search (2024), arxiv, Li, Haochen, et al. [pdf]
Intervention-Based Alignment of Code Search with Execution Feedback (2023), EMNLP'23, Han, Hojae, et al. [pdf]
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search (2023), ICSME'23, Wang, Yanlin, et al. [pdf]
Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models (2023), FSE'23, Gotmare, A., et al.
GraphSearchNet: Enhancing GNNs via capturing global dependencies for semantic code search (2023), TSE, Liu, Shangqing, et al. [pdf]
KAPE: kNN-based Performance Testing for Deep Code Search (2023), TOSEM, Guo, Yuejun, et al. [pdf]
Two Birds with One Stone: Boosting Code Generation and Code Search via a Generative Adversarial Network (2023), OOPSLA'23, Wang, Shangwen, et al. [pdf]
Rethinking Negative Pairs in Code Search (2023), EMNLP'23, Li, Haochen, et al. [pdf][code]
Hyperbolic Code Retrieval: A Novel Approach for Efficient Code Search Using Hyperbolic Space Embeddings (2023), AAAI'24, Tang, Xunzhu, et al. [pdf]
Self-Supervised Query Reformulation for Code Search (2023), FSE'23, Mao, Yuetian, et al. [pdf]
Evaluating and Optimizing the Effectiveness of Neural Machine Translation in Supporting Code Retrieval Models: A Study on the CAT Benchmark (2023), CIKM'23, P. Hung, and A. Jannesari. [pdf]
CoCoSoDa: Effective Contrastive Learning for Code Search (2023), ICSE'23, Shi, Ensheng, et al. [pdf]
Improving Code Search with Multi-Modal Momentum Contrastive Learning (2023), ICPC'23, Shi, Zejian, et al. [pdf]
MulCS: Towards a Unified Deep Representation for Multilingual Code Search (2023), SANER'23, Ma, Yingwei, et al. [pdf]
A mutual embedded self-attention network model for code search (2023), JSS, Hu, Haize, et al.

Older:

You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search (2022), FSE'22, Wan, Yao, et al.
How to Better Utilize Code Graphs in Semantic Code Search? (2022), FSE'22, Shi, Yucen, et al.
Exploring Representation-Level Augmentation for Code Search (2022), EMNLP'22, Li, Haochen, et al. [pdf][code]
A code search engine for software ecosystems (2022), CEUR, Pfaff, Chris, et al. [pdf]
Cross-Domain Deep Code Search with Meta Learning (2022), ICSE'22, Chai, Yitian, et al. [pdf]

Code Language Models

CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model (2023), arxiv, Di, Peng, et al. [pdf]
Code Llama: Open Foundation Models for Code (2023), Meta AI, Rozière et al. [pdf]
Gorilla: Large Language Model Connected with Massive APIs (2023), arxiv, Patil, Shishir G., et al. [pdf]
CodeT5+: Open Code Large Language Models for Code Understanding and Generation (2023), arxiv, Wang, Yue, et al. [pdf]
Better Language Models of Code through Self-Improvement (2023), arxiv, To, Hung Quoc, et al. [pdf]
A Systematic Evaluation of Large Language Models of Code (2022), arxiv 2022, Xu, Frank F., et al. [pdf][code]
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation (2021), EMNLP'21, Wang, Yue, et al. [pdf]
JavaBERT: Training a Transformer-Based Model for the Java Programming Language (2021), ASEW'21, De Sousa, N. T., and W. Hasselbring
TreeBERT: A Tree-Based Pre-Trained Model for Programming Language (2021), UAI'21, Jiang, Xue, et al. [pdf]
PLBART: Unified Pre-training for Program Understanding and Generation (2021), NAACL'21, Ahmad, Wasi Uddin, et al. [pdf]
Evaluating Large Language Models Trained on Code (2021), arxiv 2021, Chen, Mark, et al. [pdf] [code]
GraphCodeBERT: Pre-training Code Representations with Data Flow (2021), arxiv, Guo, Daya, et al. [pdf]
C-BERT: Exploring Software Naturalness through Neural Language Models (2020), arxiv, Buratti, Luca, et al. [pdf]
CodeBERT: A Pre-trained Model for Programming and Natural Languages (2020), arxiv 2020, Feng, Zhangyin, et al. [pdf]

Code Review

Code Review Automation: Strengths and Weaknesses of the State of the Art (2024), TSE'24, Tufano, et al.
Improving Automated Code Reviews: Learning from Experience (2024), MSR'24, Hong Yi Lin et al. [pdf]
GPT-3.5 for Code Review Automation: How Do Few-Shot Learning, Prompt Design, and Model Fine-Tuning Impact Their Performance? (2024), arxiv, Pornprasit, C., & Tantithamthavorn, C. [pdf]
Security Code Review by LLMs: A Deep Dive into Responses (2024), arxiv, Yu et al. [pdf]
Resolving Code Review Comments with Machine Learning (2023), ICSE'24, Frömmgen, et al. [pdf]
Team-related Features in Code Review Prediction Models (2023), arxiv, Witter, Eduardo et al. [pdf]
Unity is Strength: Cross-Task Knowledge Distillation to Improve Code Review Generation (2023), arxiv, Sghaier et al. [pdf]
LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (2023), ISSRE'23, Lu, Junyi, et al. [pdf]
Learning to Predict Code Review Completion Time In Modern Code Review (2023), EMSE journal, Chouchen, Moataz, et al.
ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation (2023), arxiv, Mahbub, Saifullah, et al. [pdf]
ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments (2023), arxiv, Sarker, Jaydeb, et al. [pdf]
Generation-based Code Review Automation: How Far Are We? (2023), arxiv, Zhou, Xin, et al. [pdf]
D-ACT: Towards Diff-Aware Code Transformation for Code Review Under a Time-Wise Evaluation (2023), arxiv, Pornprasit, Chanathip, et al. [pdf]
AUGER: Automatically Generating Review Comments with Pre-training Models (2022), FSE'22, Li, Lingwei, et al. [pdf]
Automating Code Review Activities by Large-Scale Pre-training (2022), FSE'22, Li, Zhiyu, et al. [pdf] [code]
Using Pre-Trained Models to Boost Code Review Automation (2022), ICSE'22, Tufano, et al. [pdf]
Towards Automating Code Review Activities (2021), ICSE'21, Tufano, et al. [pdf]

Code Documentation

APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation (2024), arxiv, Yang, Chengran, et al. [pdf]
Evaluating Transfer Learning for Simplifying GitHub READMEs (2023), FSE'23, Gao, Haoyu, et al. [pdf]
Too long; didn’t read: Automatic summarization of GitHub README.MD with Transformers (2023), EASE'23, Doan, Thu TH, et al. [pdf]
HotGPT: How to Make Software Documentation More Useful with a Large Language Model? (2023), HOTOS'23, Su, Yiming, et al.
Automatic Code Documentation Generation Using GPT-3 (2022), ASE'22, Khan, J. Y., and G. Uddin. [pdf]
Learning-based Identification of Coding Best Practices from Software Documentation (2022), ICSME'22, Sawant, N., and S. H. Sengamedu [pdf]

Empirical Studies

Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code (2024), arxiv, Honarvar, Shahin, et al. [pdf]
An Empirical Study on Distilling ChatGPT for Advancing Code Intelligence Tasks (2024), arxiv, Yang et al. [pdf]
How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations (2024), arxiv, AlOmar, Eman Abdullah, et al. [pdf]
Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An Empirical Study (2024), arxiv, Liu, Shuo, et al. [pdf]
Do Large Code Models Understand Programming Concepts? A Black-box Approach (2024), arxiv, Hooda, Ashish, et al. [pdf]
Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants (2024), ICPC'24, Corso, Vincenzo, et al. [pdf](https://arxiv.org/pdf/2402.08431)
On the Reliability and Explainability of Language Models for Program Generation (2024), TSE, Liu, Yue, et al.
Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects (2024), arxiv, Grewal, Balreet, et al. [pdf]
Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation (2024), arxiv, Jin, Kailun, et al. [pdf]
Studying LLM Performance on Closed- and Open-source Data (2024), arxiv, Ahmed, Toufique, et al. [pdf]
On Trojan Signatures in Large Language Models of Code (2024), arxiv, Hussain et al. [pdf]
Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? (2024), arxiv, Velasco, Alejandro, et al. [pdf]
An empirical assessment of different word embedding and deep learning models for bug assignment (2024), JSS, Wang, Rongcun, et al.
On Extracting Specialized Code Abilities from Large Language Models: A Feasibility Study (2024), ICSE'24, Li, Zongjie, et al. [pdf]
Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot (2024), MSR'24, Koyanagi, Kei, et al.
Boosting Source Code Learning with Text-Oriented Data Augmentation: An Empirical Study (2023), QRS-C'23, [pdf]
How to get better embeddings with code pre-trained models? An empirical study (2023), arxiv, Zhao, Yu, et al. [pdf]
Evaluating Pre-trained Language Models for Repairing API Misuses (2023), arxiv, Zhang, Ting, et al. [pdf]
Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks (2023), arxiv, Shin, Jiho, et al. [pdf]
Natural Language to Code: How Far Are We? (2023), FSE'23, Wang, Shangwen, et al. [pdf]
Prompt Tuning in Code Intelligence: An Experimental Evaluation (2023), TSE, Wang, Chaozheng, et al.
Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names? (2023), arxiv, Zhuo, Terry Yue, et al. [pdf]
How are We Detecting Inconsistent Method Names? An Empirical Study from Code Review Perspective (2023), arxiv, Kim, Kisub, et al. [pdf]
Benchmarking Causal Study to Interpret Large Language Models for Source Code (2023), arxiv, Rodriguez-Cardenas, D., et al. [pdf]
On the Impact of Language Selection for Training and Evaluating Programming Language Models (2023), SCAM'23, Katzy, J., et al. [pdf]
What Do Code Models Memorize? An Empirical Study on Large Language Models of Code (2023), arxiv, Yang, Zhou, et al. [pdf]
Are Code Pre-trained Models Powerful to Learn Code Syntax and Semantics? (2023), arxiv, Ma, Wei, et al. [pdf]
Can Transformers Learn to Solve Problems Recursively? (2023), arxiv, Zhang, S. D., et al. [pdf]
CODEIPPROMPT: Intellectual Property Infringement Assessment of Code Language Models (2023), ICML'23, Yu, Zhiyuan, et al. [pdf]
Towards Understanding What Code Language Models Learned (2023), arxiv, Ahmed, Toufique, et al. [pdf]
Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study (2023), arxiv, Li, Yichen, et al. [pdf]
Is this Snippet Written by ChatGPT? An Empirical Study with a CodeBERT-Based Classifier (2023), arxiv, Nguyen, Phuong T., et al. [pdf]
An Empirical Study on the Effectiveness of Noisy Label Learning for Program Understanding (2023), arxiv, Wang, Wenhan, et al. [pdf]
Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions (2023), arxiv, Kabir, Samia, et al. [pdf]
Adaptive Intellect Unleashed: The Feasibility of Knowledge Transfer in Large Language Models (2023), arxiv, Huang, Qing, et al. [pdf]
Can Large Language Models Reason about Program Invariants? (2023), ICML'23, Pei, Kexin, et al.
The Scope of ChatGPT in Software Engineering: A Thorough Investigation (2023), arxiv, Ma, Wei, et al. [pdf]
Evaluating AIGC Detectors on Code Content (2023), arxiv, Wang, Jian, et al. [pdf]
“What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models (2023), CHI'23, Liu, Michael Xieyang, et al. [pdf]
Constructing Effective In-Context Demonstration for Code Intelligence Tasks: An Empirical Study (2023), arxiv, Gao, Shuzheng, et al. [pdf]
Automated Program Repair Based on Code Review: How do Pre-trained Transformer Models Perform? (2023), arxiv, Paul, Rishov, et al. [pdf]
Investigating Code Generation Performance of ChatGPT with Crowdsourcing Social Data (2023), COMPSAC'23, Feng, Yunhe, et al. [pdf]
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT (2023), arxiv, Yetiştiren, Burak, et al. [pdf]
Is ChatGPT the Ultimate Programming Assistant - How far is it? (2023), arxiv, Tian, Haoye, et al. [pdf]
Study of Distractors in Neural Models of Code (2023), InteNSE'23, Rabin, Md Rafiqul Islam, et al. [pdf]
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks (2023), arxiv, Pasechnyuk, Dmitry, et al. [pdf]
Boosting Source Code Learning with Data Augmentation: An Empirical Study (2023), arxiv, Dong, Zeming, et al. [pdf]
Source Code Recommender Systems: The Practitioners’ Perspective (2023), arxiv, Ciniselli, Matteo, et al. [pdf]
An Empirical Comparison of Pre-Trained Models of Source Code (2023), arxiv, Niu, Changan, et al. [pdf]
On the Reliability and Explainability of Automated Code Generation Approaches (2023), arxiv, Liu, Yue, et al. [pdf]
On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot (2023), arxiv, Mastropaolo, Antonio, et al. [pdf]
Practitioners’ Expectations on Code Completion (2023), arxiv, Wang, Chaozheng, et al. [pdf]

Older:

Is Self-Attention Powerful to Learn Code Syntax and Semantics? (2022), arxiv, Ma, Wei, et al. [pdf]
Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? (2022), arxiv, Döderlein et al. [pdf]
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? (2022), arxiv, Mohammadkhani, Ahmad Haji, et al. [pdf]
How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective (2022), arxiv, Yang, Guang, et al. [pdf]
“It would work for me too”: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools (2022), arxiv, Cheng, Ruijia, et al. [pdf]
Are Neural Bug Detectors Comparable to Software Developers on Variable Misuse Bugs? (2022), ASE'22, Richter, Cedric, et al. [pdf]
Do Pre-trained Language Models Indeed Understand Software Engineering Tasks? (2022), arxiv, Li, Yao, et al. [pdf]
A large-scale empirical study of commit message generation: models, datasets and evaluation (2022), EMSE, Tao, Wei, et al.
Examining Zero-Shot Vulnerability Repair with Large Language Models (2022), IEEE SP, Pearce, H., et al.
Extracting Meaningful Attention on Source Code: An Empirical Study of Developer and Neural Model Code Exploration (2022), arxiv, Paltenghi, M., et al. [pdf]
SimSCOOD: Systematic Analysis of Out-of-Distribution Behavior of Source Code Models (2022), arxiv, Hajipour, H., et al. [pdf]
Open Science in Software Engineering: A Study on Deep Learning-Based Vulnerability Detection (2022), TSE, Nong, Yu, et al. [pdf]
A controlled experiment of different code representations for learning-based program repair (2022), EMSE, Namavar, M., et al.
What is it like to program with artificial intelligence? (2022), arxiv, Sarkar, Advait, et al. [pdf]
Security Implications of Large Language Model Code Assistants: A User Study (2022), arxiv, Sandoval, Gustavo, et al. [pdf]
An Empirical Study of Code Smells in Transformer-based Code Generation Techniques (2022), arxiv, Siddiq, M. L. et al. [pdf]
No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence (2022), FSE'22, Wang, Chaozheng, et al. [pdf]
Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study (2022), FSE'22, Nong, Yu, et al. [pdf]
GitHub Copilot AI pair programmer: Asset or Liability? (2022), arxiv, Dakhel, Arghavan Moradi, et al. [pdf]
Evaluating the Impact of Source Code Parsers on ML4SE Models (2022), arxiv, Utkin, Ilya, et al. [pdf]
An extensive study on pre-trained models for program understanding and generation (2022), ISSTA'22, Zeng, Zhengran, et al.
Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code (2022), arxiv, Bareiß, Patrick, et al. [pdf]
Assessing Project-Level Fine-Tuning of ML4SE Models (2022), arxiv, Bogomolov, Egor, et al. [pdf]
On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages (2022), ICPC'22, Chen, Fuxiang, et al. [pdf]
Learning Program Semantics with Code Representations: An Empirical Study (2022), SANER'22, Siow, Jing Kai, et al. [pdf] [code]
Assessing Generalizability of CodeBERT (2021), ICSME'21, Zhou, Xin, et al.
Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE'21, Paltenghi, M. & Pradel, M.
An Empirical Study of Transformers for Source Code (2021), FSE'21, Chirkova, N., & Troshin, S.
An Empirical Study on the Usage of Transformer Models for Code Completion (2021), MSR'21, Ciniselli, Matteo, et al.

Surveys

A Survey on Machine Learning Techniques Applied to Source Code (2024), JSS, Sharma, Tushar, et al. [pdf]
A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends (2024), TOSEM, Zheng, Zibin, et al. [pdf]
A Survey on Large Language Models for Software Engineering (2023), arxiv, Zhang, Quanjun, et al. [pdf]
Large Language Models for Software Engineering: A Systematic Literature Review (2023), arxiv, Hou, Xinyi, et al. [pdf]
When Neural Model Meets NL2Code: A Survey (2023), ACL'23, Zan, Daoguang, et al. [pdf]
Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code (2022), arxiv, Niu, Changan, et al. [pdf]
A Survey of Deep Learning Models for Structural Code Understanding (2022), arxiv, Wu, Ruoting, et al. [pdf]
Deep Learning & Software Engineering: State of Research and Future Directions (2020), arxiv, Devanbu, Prem, et al. [pdf]
A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research (2020), arxiv, Watson, Cody, et al. [pdf]
Machine Learning for Software Engineering: A Systematic Mapping (2020), arxiv, Shafiq, Saad, et al. [pdf]
Synergy between Machine/Deep Learning and Software Engineering: How Far Are We? (2020), arxiv, Wang, Simin, et al. [pdf]
Software Engineering Meets Deep Learning: A Literature Review (2020), arxiv, Ferreira, Fabio, et al. [pdf]
Software Vulnerability Detection Using Deep Neural Networks: A Survey (2020), Proceedings of the IEEE, Lin, Guanjun, et al.
Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges (2020), arxiv, Le, Triet HM, et al. [pdf]
A Survey of Machine Learning for Big Code and Naturalness (2018), ACM Computing Surveys, Allamanis, Miltiadis, et al. [pdf]

Misc

CodeScholar: Growing Idiomatic Code Examples (2024), arxiv, Shetty, Manish et al. [pdf]
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models (2024), arxiv, Pourreza, M., & Rafiei, D. [pdf]
Calibration and Correctness of Language Models for Code (2024), arxiv, Spiess et al. [pdf]
Pix2Code: Learning to Compose Neural Visual Concepts as Programs (2024), arxiv, Wüst, Antonia, et al. [pdf]
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness (2024), arxiv, Allamanis, Miltiadis et al. [pdf]
Can Large Language Models Write Parallel Code? (2024), arxiv, Nichols, Daniel, et al. [pdf]
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP (2024), arxiv, Chen, Le, et al. [pdf]
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking (2024), arxiv, Su, Zian, et al. [pdf]
ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT (2024), arxiv, Kabir, Azmain, et al. [pdf]
Scaling Laws Behind Code Understanding Model (2024), arxiv, Lin, Jiayi, et al. [pdf]
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation (2024), arxiv, Song, Demin, et al. [pdf]
LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models (2024), arxiv, Liu, Zhijie, et al. [pdf]
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness (2024), arxiv, Singhal, Manav, et al. [pdf]
Importance Guided Data Augmentation for Neural-Based Code Understanding (2024), arxiv, Dong, Zeming, et al. [pdf]
CodeS: Towards Building Open-source Language Models for Text-to-SQL (2024), arxiv, Li, Haoyang, et al. [pdf]
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents (2024), arxiv, Yang, Ke, et al. [pdf]
Experimenting a New Programming Practice with LLMs (2024), arxiv, Zhang, Simiao, et al. [pdf]
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching (2024), ICSE'24, Jiang, Ling, et al. [pdf]
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers (2024), arxiv, Shi, Yuling, et al. [pdf]
LILO: Learning Interpretable Libraries by Compressing and Documenting Code (2024), ICLR'24, Grand, Gabriel, et al. [pdf]
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain (2024), ICLR'24, Min, Marcus J., et al. [pdf]
Large Language Models for Test-Free Fault Localization (2024), ICSE'24, Yang, Aidan ZH, et al. [pdf]
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks (2023), arxiv, Zou, Wentao, et al. [pdf]
Lampr: Boosting the Effectiveness of Language-Generic Program Reduction via Large Language Models (2023), arxiv, Zhang, Mengxiao, et al. [pdf]
Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation (2023), EMNLP'23, Chen, Nuo, et al. [pdf]
Nova+: Generative Language Models for Binaries (2023), arxiv, Jiang, Nan, et al. [pdf]
Naturalness of Attention: Revisiting Attention in Code Language Models (2023), arxiv, Saad, M., & Sharma, T. [pdf]
Refactoring Programs Using Large Language Models with Few-Shot Examples (2023), arxiv, Shirafuji, Atsushi, et al. [pdf]
Learning Transfers over Several Programming Languages (2023), arxiv, Baltaji, Razan, et al. [pdf]
RefactorScore: Evaluating Refactor Prone Code (2023), TSE, Jesse et al.
How Well Can Masked Language Models Spot Identifiers That Violate Naming Guidelines? (2023), SCAM'23, Villmow, Johannes, et al. [pdf]
An Explanation Method for Models of Code (2023), OOPSLA'23, Wang, Yu, et al.
Automated Bug Generation in the era of Large Language Models (2023), arxiv, Ibrahimzada, A., et al. [pdf]
Refining Decompiled C Code with Large Language Models (2023), arxiv, Wong, Wai Kin, et al. [pdf]
SUPERSONIC: Learning to Generate Source Code Optimizations in C/C++ (2023), arxiv, Chen, Z. et al. [pdf]
Method-Level Bug Severity Prediction using Source Code Metrics and LLMs (2023), ISSRE'23, Mashhadi, Ehsan, et al. [pdf]
Frustrated with Code Quality Issues? LLMs can Help! (2023), arxiv, Wadhwa, Nalin, et al. [pdf]
Generating Variable Explanations via Zero-shot Prompt Learning (2023), ASE'23, Wang, Chong, et al. [pdf]
Large Language Models for Compiler Optimization (2023), arxiv, Cummins, Chris, et al. [pdf]
Merge Conflict Resolution: Classification or Generation? (2023), ASE'23, Dong, Jinhao, et al. [pdf]
EPICURE: Distilling Sequence Model Predictions into Patterns (2023), arxiv, Allamanis, M., & Barr, E. T. [pdf]
FunProbe: Probing Functions from Binary Code through Probabilistic Analysis (2023), FSE'23, Kim, Soomin, et al. [pdf]
CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models (2023), FSE'23, Sun, Zhensu, et al. [pdf]
Toward Automatically Completing GitHub Workflows (2023), arxiv, Mastropaolo, Antonio, et al. [pdf]
CUPID: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection (2023), arxiv, Zhang, Ting, et al. [pdf]
Predicting Dynamic Properties of Heap Allocations using Neural Networks Trained on Static Code (2023), ISMM'23, Navasca, Christian, et al.
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models (2023), ICSE'24, Feng, S., & Chen, C. [pdf]
LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis (2023), arxiv, Xu, Xiangzhe, et al. [pdf]
Stack Over-Flowing with Results: The Case for Domain-Specific Pre-Training Over One-Size-Fits-All Models (2023), arxiv, Mukherjee, M. and Hellendoorn, V.J. [pdf]
Faster sorting algorithms discovered using deep reinforcement learning (2023), Nature, Mankowitz, Daniel J., et al. [pdf]
SELFEVOLVE: A Code Evolution Framework via Large Language Models (2023), arxiv, Jiang, S., et al. [pdf]
The “Code” of Ethics: A Holistic Audit of AI Code Generators (2023), arxiv, Ma, Wanlun, et al. [pdf]
ARIST: An Effective API Argument Recommendation Approach (2023), JSS, Nguyen, Son, et al. [pdf]
A statistical approach for finding property-access errors (2023), arxiv, Arteca, E., et al. [pdf]
A Chain of AI-based Solutions for Resolving FQNs and Fixing Syntax Errors in Partial Code (2023), arxiv, Huang, Qing, et al. [pdf]
Guiding Language Models of Code with Global Context using Monitors (2023), arxiv, Agrawal, Lakshya A., et al. [pdf]
Can Large Language Models Reason about Program Invariants? (2023), ICML'23, Pei, Kexin, et al. [pdf]
LLM4CBI: Taming LLMs to Generate Effective Test Programs for Compiler Bug Isolation (2023), arxiv, Tu, Haoxin, et al. [pdf]
Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis (2023), ISSTA'23, Xu, Xiangzhe, et al. [pdf]
Exploring and Characterizing Large Language Models For Embedded System Development and Debugging (2023), arxiv, Englhardt, Zachary, et al. [pdf]
Explaining Competitive-Level Programming Solutions using LLMs (2023), arxiv, Li, Jierui, et al. [pdf]
BTLink: automatic link recovery between issues and commits based on pre-trained BERT model (2023), EMSE journal, Lan, Jinpeng, et al.
In-IDE Generation-based Information Support with a Large Language Model (2023), arxiv, Nam, Daye, et al. [pdf]
Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering (2023), arxiv, Saberi, Iman, et al. [pdf]
Contrastive Learning for API Aspect Analysis (2023), arxiv, Shahariar, G. M., et al. [pdf]
Fixing Rust Compilation Errors using LLMs (2023), arxiv, Deligiannis, Pantazis, et al. [pdf]
CodeLens: An Interactive Tool for Visualizing Code Representations (2023), arxiv, Guo, Yuejun, et al. [pdf]
COME: Commit Message Generation with Modification Embedding (2023), ISSTA'23, He, Yichen, et al.
Predicting Bug Fix Time in Students’ Programming with Deep Language Models (2023), EDM'23, Tsabari, Stav, et al. [pdf]
LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning (2023), arxiv, Sun, Tiezhu, et al. [pdf]
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures (2023), arxiv, Palacio, David N., et al. [pdf]
Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback (2023), arxiv, Jain, Abhinav, et al. [pdf]
Evidence of Meaning in Language Models Trained on Programs (2023), arxiv, Jin, C., & Rinard, M. [pdf]
Neural Task Synthesis for Visual Programming (2023), arxiv, Pădurean, V. A., et al. [pdf]
AI for Low-Code for AI (2023), arxiv, Rao, Nikitha, et al. [pdf]
RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring (2023), ISSTA'23, Liu, Hao, et al. [pdf]
Towards Tracing Code Provenance with Code Watermarking (2023), arxiv, Li, Wei, et al. [pdf]
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembler (2023), arxiv, Armengol-Estapé, Jordi, et al. [pdf]
Text-to-SQL Error Correction with Language Models of Code (2023), arxiv, Chen, Ziru, et al. [pdf]
Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods (2023), ICSE'23, Nam, Daye, et al. [pdf]
Beryllium: Neural Search for Algorithm Implementations (2023), arxiv, Kulkarni, Adithya, et al. [pdf]
Zero-shot Prompting for Code Complexity Prediction Using GitHub Copilot (2023), arxiv, Siddiq, Mohammed Latif, et al. [pdf]
One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization (2023), arxiv, Wang, Deze, et al. [pdf]
GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching (2023), arxiv, TehraniJamsaz, Ali, et al. [pdf]
Teaching Large Language Models to Self-Debug (2023), arxiv, Chen, Xinyun, et al. [pdf]
Improving Few-shot Prompts with Relevant Static Analysis Products (2023), arxiv, Ahmed, Toufique, et al. [pdf]
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules (2023), TSE, Kommrusch, Steve, et al.
XCODEEVAL: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval (2023), arxiv, Khan, Mohammad Abdullah Matin, et al. [pdf]
BenchDirect: A Directed Language Model for Compiler Benchmarks (2023), arxiv, Tsimpourlas, Foivos, et al. [pdf]
Creating CREATE queries with multi-task deep neural networks (2023), KBS journal, Diker, S. N., and C. Okan Sakar
Representation Learning for Stack Overflow Posts: How Far are We? (2023), arxiv, He, Junda, et al. [pdf]
Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models (2023), arxiv, Saberi, I., and F. H. Fard [pdf]
Automating Method Naming with Context-Aware Prompt-Tuning (2023), arxiv, Zhu, Jie, et al. [pdf]
Knowledge Transfer for Pseudo-code Generation from Low Resource Programming Language (2023), arxiv, Sontakke, Ankita, et al. [pdf]
LExecutor: Learning-Guided Execution (2023), arxiv, Souza, B., and M. Pradel [pdf]
Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2023), arxiv, Gao, Shuzheng, et al. [pdf]
CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models (2023), arxiv, Niu, Changan, et al. [pdf]
On the Applicability of Language Models to Block-Based Programs (2023), arxiv, Griebl, Elisabeth, et al. [pdf]
AttSum: A Deep Attention-Based Summarization Model for Bug Report Title Generation (2023), IEEE TOR, Ma, Xiaoxue, et al.
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code (2023), arxiv, Zhou, Shuyan, et al. [pdf]
VULGEN: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning (2023), ICSE'23, Nong, Yu, et al. [pdf]
When to Say What: Learning to Find Condition-Message Inconsistencies (2023), ICSE'23, Bouzenia, I., and M. Pradel. [pdf]
Automated Summarization of Stack Overflow Posts (2023), ICSE'23, Kou, Bonan, et al. [pdf]
Learning Graph-based Code Representations for Source-level Functional Similarity Detection (2023), arxiv, Liu, Jiahao, et al. [pdf]
Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning (2023), ICSE'23, Nashid, Noor, et al. [pdf]
API Entity and Relation Joint Extraction from Text via Dynamic Prompt-tuned Language Model (2023), arxiv, Huang, Qing, et al. [pdf]
FLAME: A small language model for spreadsheet formulas (2023), arxiv, Joshi, Harshit, et al. [pdf]
Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning (2023), IEEE SP, Zhu, Wenyu, et al.
Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge (2023), arxiv, Yang, Shouguo, et al. [pdf]
Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries (2023), SANER'23, Al-Kaswan, Ali, et al. [pdf]
CFG2VEC: Hierarchical Graph Neural Network for Cross-Architectural Software Reverse Engineering (2023), arxiv, Yu, Shih-Yuan, et al. [pdf]
Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models (2023), ICSE'23, Ahmed, Toufique, et al. [pdf]

Older:

Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5 (2022), arxiv, Bui, Nghi DQ, et al. [pdf]
Unleashing the power of pseudo-code for binary code similarity analysis (2022), Cybersecurity journal, Zhang, Weiwei, et al.
Reinforcement Learning assisted Loop Distribution for Locality and Vectorization (2022), Jain, Shalini, et al. [pdf]
Learning to Parallelize Source Code via OpenMP with Transformers (2022), Harel, Re’em, et al. [pdf]
Codex Hacks HackerRank: Memorization Issues and a Framework for Code Synthesis Evaluation (2022), arxiv, Karmakar, Anjan, et al. [pdf]
BCGen: a comment generation method for bytecode (2022), ASE, Huang, Yuan, et al.
Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation (2022), arxiv, Mahbub, Parvez, et al. [pdf]
Neural Language Models for Code Quality Identification (2022), arxiv, Sengamedu, S., et al.
Detecting Security Patches in Java Projects Using NLP Technology (2022), ICNLSP'22, Stefanoni, Andrea, et al. [pdf]
Program Merge Conflict Resolution via Neural Transformers (2022), FSE'22, Svyatkovskiy, Alexey, et al.
Teaching Algorithmic Reasoning via In-context Learning (2022), arxiv, Zhou, Hattie, et al. [pdf]
Improved Evaluation of Automatic Source Code Summarisation (2022), arxiv, Phillips, Jesse, et al. [pdf]
Towards Generalizable and Robust Text-to-SQL Parsing (2022), arxiv, Gao, Chang, et al. [pdf]
CodeEditor: Learning to Edit Source Code with Pre-trained Models (2022), arxiv, Li, Jia, et al. [pdf]
Poison Attack and Defense on Deep Source Code Processing Models (2022), arxiv, Li, Jia, et al. [pdf]
NEUDEP: Neural Binary Memory Dependence Analysis (2022), FSE'22, Pei, Kexin, et al. [pdf]
Novice Type Error Diagnosis with Natural Language Models (2022), arxiv, Geng, Chuqin, et al. [pdf]
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure (2022), arxiv, Chen, Nuo, et al. [pdf]
Using Large Language Models to Enhance Programming Error Messages (2022), SIGCSE'22, Leinonen, J., et al. [pdf]
So Much in So Little: Creating Lightweight Embeddings of Python Libraries (2022), arxiv, Golubev, Yaroslav, et al. [pdf]
Code Compliance Assessment as a Learning Problem (2022), arxiv, Sawant, N., and S. H. Sengamedu [pdf]
Learning to Answer Semantic Queries over Code (2022), arxiv, Sahu, Surya Prakash, et al. [pdf]
XFL: Naming Functions in Binaries with Extreme Multi-label Learning (2022), arxiv, Patrick-Evans, J., et al. [pdf]
SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings (2022), Jin, Xin, et al. [pdf]
Out of the BLEU: how should we assess quality of the Code Generation models? (2022), arxiv, Evtikhiev, Mikhail, et al. [pdf]
Compressing Pre-trained Models of Code into 3 MB (2022), arxiv, Shi, Jieke, et al. [pdf]
A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages (2022), arxiv, Cassano, Federico, et al. [pdf]
Overwatch: Learning Patterns in Code Edit Sequences (2022), arxiv, Zhang, Yuhao, et al. [pdf]
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing (2022), KDD'22, Wang, Lihan, et al. [pdf]
DIRE and its Data: Neural Decompiled Variable Renamings with Respect to Software Class (2022), TOSEM, Dramko, Luke, et al.
Making Python Code Idiomatic by Automatic Refactoring Non-Idiomatic Python Code with Pythonic Idioms (2022), arxiv, Zhang, Zejun, et al. [pdf]
DeepPERF: A Deep Learning-Based Approach For Improving Software Performance (2022), arxiv, Garg, Spandan, et al. [pdf]
CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code (2022), ICSE ’22 Companion, Eghbali, Aryaz, and Pradel, Michael [pdf]
Impact of Evaluation Methodologies on Code Summarization (2022), ACL, Nie, Pengyu, et al. [pdf]
XDA: Accurate, Robust Disassembly with Transfer Learning (2021), NDSS'21, Pei, Kexin, et al. [pdf] [code]

PhD Theses

Beyond Natural Language Processing: Advancing Software Engineering Tasks through Code Structure (2024), Zishuo Ding [pdf]
Analyzing and Securing Software via Robust and Generalizable Learning (2023), Kexin Pei [pdf]
Deep Language Models for Software Testing and Optimisation (2023), Foivos Tsimpourlas [pdf]
Improving Programming Productivity with Statistical Models (2022), Tam Nguyen [pdf]
Learning to Find Bugs in Programs and their Documentation (2021), Andrew Habib [pdf]
Machine Learning and the Science of Software Engineering (2020), Vincent Hellendoorn
Deep learning for compilers (2020), Christopher E. Cummins [pdf]
Deep Learning in Software Engineering (2020), Cody Watson [pdf]
Learning Code Transformations via Neural Machine Translation (2019), Michele Tufano [pdf]
Improving the Usability of Static Analysis Tools Using Machine Learning (2019), Ugur Koc [pdf]
Learning Natural Coding Conventions (2016), Miltiadis Allamanis [pdf]

Talks

Machine Learning for Software Engineering: AMA, MSR 2020 [video]
Understanding Source Code with Deep Learning, FOSDEM 2019 [video]

Datasets

TACO - Topics in Algorithmic Code generation dataset
GitBug-Java - A Reproducible Benchmark of Recent Java Bugs
Archer - A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning
CodeLL - A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
CRUXEval - A Benchmark for Code Reasoning, Understanding and Execution
CodeComplex - A Time-Complexity Dataset for Bilingual Source Codes
BugsPHP - A dataset for Automated Program Repair in PHP
GenCodeSearchNet - A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding
CrossCodeEval - A Diverse and Multilingual Benchmark for Cross-File Code Completion
SWE-bench - An evaluation framework including software engineering problems drawn from real GitHub issues
CodeTransOcean - A Comprehensive Multilingual Benchmark for Code Translation
BioCoder - A benchmark for bioinformatics code generation with contextual pragmatic knowledge
VulBench - A benchmark of vulnerability detection with annotations for each vulnerable function detailing the vulnerability type and its root cause
StudentEval - A Benchmark of Student-Written Prompts for Large Language Models of Code
PySecDB - Exploring Security Commits in Python
DiverseVul - A Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
RunBugRun - An Executable Dataset for Automated Program Repair
ODEX - An open-domain execution-based natural language (NL) to code generation dataset
PI-Link - A Ground-Truth Dataset of Links Between Pull-Requests and Issues in GitHub
ml-Codesmell - A code smell prediction dataset for machine learning approaches
JEMMA - An Extensible Java Dataset for ML4Code Applications
CS1QA (2022) - A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course
XLCoST (2022) - A Benchmark Dataset for Cross-lingual Code Intelligence
CodeS (2022) - A Distribution Shift Benchmark Dataset for Source Code Learning
methods2test (2022) - A supervised dataset consisting of Test Cases and their corresponding Focal Methods from a set of Java repositories
ManyTypes4TypeScript (2022) - Type prediction dataset for TypeScript
HumanEval - Program synthesis from code comments
HumanEval+ - Augmented HumanEval with sufficient tests and corrected reference solutions
GitHub Code (2022) - 115M LoC in 32 programming languages
D2A (2021) - A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis
CodeXGLUE (2021)
ogbg-code2 (2021)
ManyTypes4Py (2021) - Type prediction dataset for Python
CodeSearchNet (2020)
ManySStuBs4J (2019)
150k Python Dataset (2016)
150k Javascript Dataset (2016)
GitHub Java Corpus (2013)

Tools

Source Code Analysis & Processing

COMEX - A Tool for Generating Customized Source Code Representations
LibSA4Py - Light-weight static analysis for extracting type hints and features
LibCST - A concrete syntax tree parser library for Python
python-graphs - A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.
Semantic - Parsing, analyzing, and comparing source code across many languages
GraphGen4Code - A toolkit for creating code knowledge graphs based on WALA code analysis and extraction of documentation
Joern - Code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs
NaturalCC - An Open-Source Toolkit for Code Intelligence
Scalpel - The Python Static Analysis Framework
WALA - T.J. Watson Libraries for Analysis, with frontends for Java, Android, and JavaScript
CodeGen - General toolkit to apply machine learning to code, from dataset creation to model training and evaluation (from Facebook AI Research)
PyCG - PyCG employs static analysis to generate call graphs for Python code
HeaderGen - HeaderGen improves PyCG's call graph analysis by supporting external libraries and flow-sensitivity

Machine Learning

CodeTF - One-stop Transformer Library for State-of-the-art Code LLM
SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation
Hugging Face - Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Code de-duplication

CD4Py - Code De-Duplication for Python
Near-duplicate Source Code Detector

Misc

Utilities by the DPU team of Microsoft
A set of tools to work with Big Code - Fetching GitHub repos, tokenizers, embeddings, etc.
cloc - Counts blank lines, comment lines, and physical lines of source code in many programming languages.

Research Groups

Software Engineering Research Group (SERG), Delft University of Technology
Secure, Reliable, and Intelligent Systems Lab (SRI), ETH Zurich
Software Lab (SOLA), University of Stuttgart
Machine Learning for the Analysis of Source Code Text (MAST), Edinburgh University
Deep Program Understanding, Microsoft Research
DECAL (Davis Excellent/Eclectic/Extreme Computational Analytics Lab), UC Davis
JetBrains Research
SMart software Analysis and Trustworthy computing Lab (SMAT), Monash University

Tags: ai reading

Last modified 15 January 2026
