Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false. Read more
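As a minimal sketch of that setting, assuming the site is built with Jekyll (whose site configuration file is conventionally named `_config.yml`), the relevant line might look like this:

```yaml
# _config.yml (assumed Jekyll site configuration file)
# With this set to false, posts dated in the future are skipped when the site is built.
future: false
```

The site typically needs to be rebuilt for a configuration change to take effect.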

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool. Read more

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool. Read more

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool. Read more

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool. Read more

portfolio

publications

Improving API Caveats Accessibility by Mining API Caveats Knowledge Graph

Paper

Published in ICSME 2018, 2018

We construct an API caveats knowledge graph for Android APIs from the API documentation on the Android Developers website. We study the abundance of different subcategories of API caveats and use a sampling method to manually evaluate the quality of the API caveats knowledge graph. We also conduct a user study to validate whether and how the API caveats knowledge graph may improve the accessibility of API caveats in API documentation. This paper received an IEEE TCSE Distinguished Paper Award at ICSME 2018. Read more

Recommended citation: Hongwei Li, Sirui Li, Jiamou Sun, Zhenchang Xing, Xin Peng, Mingwei Liu, Xuejiao Zhao: Improving API Caveats Accessibility by Mining API Caveats Knowledge Graph. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME 2018)

Automatic Generation of API Documentations for Open-Source Projects

Paper

Published in ICSME 2018 (Workshop), 2018

Open-source projects often have only incomplete and insufficient API documentations. To improve the efficiency of development and ensure the correctness of API usage, it is desired that the developers can be supported with automatically generated documentation based on a combination of knowledge from different sources. In this paper, we describe OpenAPIDocGen, a system that can automatically generate API Documentations for open-source projects, including an overview of the system and the data sources and techniques used to generate different parts of the documentation. Read more

Recommended citation: Xin Peng, Yifan Zhao, Mingwei Liu, Fengyi Zhang, Yang Liu, Xin Wang, Zhenchang Xing: Automatic Generation of API Documentations for Open-Source Projects. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME 2018 Workshop)

Searching StackOverflow Questions with Multi-Faceted Categorization

Paper

Published in Internetware 2018, 2018

We propose a multi-faceted and interactive approach for searching StackOverflow questions (called MFISSO), which leverages these attributes of the questions. Read more

Recommended citation: Mingwei Liu, Xin Peng, Qingtao Jiang, Andrian Marcus, Junwen Yang, Wenyun Zhao: Searching StackOverflow Questions with Multi-Faceted Categorization. Internetware 2018: 10:1-10:10

A learning-based approach for automatic construction of domain glossary from source code and documentation

Paper

Published in ESEC/FSE 2019, 2019

In this paper, we propose a learning-based approach for automatic construction of domain glossary from source code and software documentation. The approach uses a set of high-quality seed terms identified from code identifiers and natural language concept definitions to train a domain-specific prediction model to recognize glossary terms based on the lexical and semantic context of the sentences mentioning domain-specific concepts. Read more

Recommended citation: Chong Wang, Xin Peng, Mingwei Liu, Zhenchang Xing, Xuefang Bai, Bing Xie, Tuo Wang: A learning-based approach for automatic construction of domain glossary from source code and documentation. ESEC/SIGSOFT FSE 2019: 97-108

Generating query-specific class API summaries

Paper

Published in ESEC/FSE 2019, 2019

We propose an approach for generating on-demand, extrinsic hybrid summaries for API classes, relevant to a programming task, formulated as a natural language query. The summaries include the most relevant sentences extracted from the API reference documentation and the most relevant methods. Read more

Recommended citation: Mingwei Liu, Xin Peng, Andrian Marcus, Zhenchang Xing, Wenkai Xie, Shuangshuang Xing, Yang Liu: Generating query-specific class API summaries. ESEC/SIGSOFT FSE 2019: 120-130

API Method Recommendation via Explicit Matching of Functionality Verb Phrases

Paper

Published in ESEC/FSE 2020, 2020

We identified 356 different functionality verbs from the descriptions, which were grouped into 87 functionality categories, and we extracted 523 phrase patterns from the verb phrases of the descriptions. Building on these findings, we propose an API method recommendation approach based on explicit matching of functionality verb phrases in functionality descriptions and user queries, called PreMA. Read more

Recommended citation: Wenkai Xie, Xin Peng, Mingwei Liu, Christoph Treude, Zhenchang Xing, Xiaoxin Zhang, Wenyun Zhao: API Method Recommendation via Explicit Matching of Functionality Verb Phrases. Proceedings of the 2020 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020)

Generating Concept based API Element Comparison Using a Knowledge Graph

Paper

Published in ASE 2020, 2020

We propose APIComp, a knowledge graph based approach that automatically extracts API knowledge from API reference documentation to support the comparison of a pair of API classes or methods from different aspects. Read more

Recommended citation: Yang Liu, Mingwei Liu, Xin Peng, Zhenchang Xing, and Xiaoxin Zhang: Generating Concept based API Element Comparison Using a Knowledge Graph. 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020)

Source Code based On-demand Class Documentation Generation

Paper

Published in ICSME 2020 (Workshop), 2020

In this paper, we present OpenAPIDocGen2, a tool that generates on-demand class documentation based on source code and documentation analysis. For a given class, OpenAPIDocGen2 generates a combined documentation for it, which includes functionality descriptions, directives, domain concepts, usage examples, class/method roles, key methods, relevant classes/methods, characteristics and concepts classification, and usage scenarios. Read more

Recommended citation: Mingwei Liu, Xin Peng, Xiujie Meng, Huanjun Xu, Shuangshuang Xing, Xin Wang, Yang Liu, Gang Lv: Source Code based On-demand Class Documentation Generation. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME 2020 Workshop)

Learning based and Context Aware Non-Informative Comment Detection

Paper

Published in ICSME 2020 (Workshop), 2020

This report introduces the approach that we have designed and implemented for the DeClutter challenge of DocGen2, which detects non-informative code comments. The approach combines both comment based text classification and code context based prediction. Based on the approach, our “fduse” team achieved the best F1 score (0.847) in the competition. Read more

Recommended citation: Mingwei Liu, Yanjun Yang, Xin Peng, Chong Wang, Chengyuan Zhao, Xin Wang, Shuangshuang Xing: Learning based and Context Aware Non-Informative Comment Detection. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME 2020 Workshop)

Learning-Based Extraction of First-Order Logic Representations of API Directives

Paper

Published in ESEC/FSE 2021, 2021

In this paper, we propose LeadFOL, a learning based approach for extracting first-order logic representations of API directives (FOL directives for short). Read more

Recommended citation: Mingwei Liu, Xin Peng, Andrian Marcus, Christoph Treude, Xuefang Bai, Gang Lyu, Jiazhan Xie, Xiaoxin Zhang: Learning-Based Extraction of First-Order Logic Representations of API Directives. Proceedings of the 2021 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)

Automatic Code Semantic Tag Generation based on Software Knowledge Graph

Paper

Published in Journal of Software 2021, 2021

Code snippets in open-source and enterprise software projects and posted on various software development websites are important software development resources. However, developer needs for code search often reflect high-level intentions and topics, which are difficult to satisfy with information retrieval based code search techniques. It is thus highly desirable that code snippets be accompanied by semantic tags reflecting their high-level intentions and topics to facilitate code search and understanding. Existing tag generation technologies are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation and auxiliary code search and understanding. To address this problem, this paper proposes a software knowledge graph based approach (called KGCodeTagger) that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph based on concepts and relations extracted from API documentation and software development Q&A text and uses the knowledge graph as the basis of code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach further identifies other concepts related to the linked concepts as candidates and selects semantic tags from the relevant concepts based on diversity and representativeness. Read more

Recommended citation: Shuangshuang Xing, Mingwei Liu, Xin Peng: Automatic Code Semantic Tag Generation based on Software Knowledge Graph. Journal of Software, 2021 (in Chinese)

API-Related Developer Information Needs in Stack Overflow

Paper

Published in IEEE Transactions on Software Engineering 2021, 2021

Stack Overflow (SO) provides informal documentation for APIs in response to questions that express API related developer needs. Navigating the information available on SO and getting information related to a particular API and need is challenging due to the vast amount of questions and answers and the tag-driven structure of SO. In this paper we focus on identifying and classifying fine-grained developer needs expressed in sentences of API-related SO questions, as well as the specific information types used to express such needs, and the different roles APIs play in these questions and their answers. We derive a taxonomy, complementing existing ones, through an empirical study of 266 SO posts. We then develop and evaluate an approach for the automated identification of the fine-grained developer needs in SO threads, which takes a thread as input and outputs the corresponding developer needs, the types of information expressing them, and the roles of API elements relevant to the needs. Read more

Recommended citation: Mingwei Liu, Xin Peng, Andrian Marcus, Shuangshuang Xing, Christoph Treude, and Chengyuan Zhao: API-Related Developer Information Needs in Stack Overflow. IEEE Transactions on Software Engineering (TSE) 2021 January

How to Formulate Specific How-To Questions in Software Development?

Paper

Published in ESEC/FSE 2022, 2022

We propose an approach (TaskKG4Q) that interactively helps developers formulate a programming related how-to question. TaskKG4Q uses a programming task knowledge graph (task KG for short) mined from Stack Overflow questions, which provides a hierarchical conceptual structure for tasks in terms of [actions], [objects], and [constraints]. Read more

Recommended citation: Mingwei Liu, Xin Peng, Andrian Marcus, Christoph Treude, Jiazhan Xie, Huanjun Xu, Yanjun Yang: How to Formulate Specific How-To Questions in Software Development? Proceedings of the 2022 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022)

XCoS: Explainable Code Search based on Query Scoping and Knowledge Graph

Paper

Published in ACM Transactions on Software Engineering and Methodology 2023, 2023

In this paper, we conduct a developer survey to better understand and address these issues and derive some insights from the survey results. Based on the insights, we propose XCoS, an explainable code search approach based on query scoping and knowledge graph. XCoS extracts a background knowledge graph from general knowledge bases like Wikidata and Wikipedia. Given a code search query, XCoS identifies different parts (i.e., functionalities, functional constraints, nonfunctional constraints) from it and uses the expressions of functionalities and functional constraints to search the codebase. It then links both the query and the candidate code snippets to the concepts in the background knowledge graph and generates explanations based on the association paths between these two parts of concepts together with relevant descriptions. XCoS uses an interactive user interface that allows the user to better understand the associations between candidate code snippets and the query from different aspects and choose the desired results. Read more

Recommended citation: Chong Wang, Xin Peng, Zhenchang Xing, Yue Zhang, Mingwei Liu, Rong Luo, Xiujie Meng: XCoS: Explainable Code Search based on Query Scoping and Knowledge Graph. ACM Transactions on Software Engineering and Methodology, 2023

Research on Knowledge Graph Representation Learning Methods for Link Prediction: A Review

Paper

Published in Journal of Software 2023, 2023

In this paper, we introduce the latest research progress of link prediction-oriented knowledge graph representation learning methods systematically from the basic concepts of link prediction and representation learning. In particular, the research progress is discussed in detail in terms of knowledge representation and algorithmic modeling. The development of the knowledge representation is used as a clue to introduce the mathematical modelling of link prediction tasks in the form of binary, multi-relational and hyper-relational knowledge representations respectively. Based on the representation learning modelling approach, the existing methods are refined into four types of models: translational distance models, tensor decomposition models, traditional deep learning models and graph neural network models, and the implementation of each type of model is described in detail together with representative models for solving link prediction tasks with different relational metrics. Based on the presentation of common datasets and criteria for link prediction, the results of four types of knowledge representation learning models for link prediction tasks with three types of knowledge representations are presented in a comparative analysis. Finally, the future development trends are presented in terms of model optimization, knowledge representation and problem scope. Read more

Recommended citation: Xueying Du, Mingwei Liu, Liwei Sheng, Xin Peng: Research on Knowledge Graph Representation Learning Methods for Link Prediction: A Review. Journal of Software, 2023 (in Chinese)

Task-Oriented ML/DL Library Recommendation based on a Knowledge Graph

Paper

Published in IEEE Transactions on Software Engineering 2023, 2023

In this paper, we conduct an empirical study on ML/DL library seeking questions on Stack Overflow to understand the developers’ requirements for ML/DL libraries. Based on the findings of the study, we propose a task-oriented ML/DL library recommendation approach, called MLTaskKG. It constructs a knowledge graph that captures AI tasks, ML/DL models, model implementations, repositories, and their relationships by extracting knowledge from different sources such as ML/DL resource websites, papers, ML/DL frameworks, and repositories. Based on the knowledge graph, MLTaskKG recommends ML/DL libraries for developers by matching their requirements on tasks, model characteristics, and implementation information. Read more

Recommended citation: Mingwei Liu, Chengyuan Zhao, Xin Peng, Simin Yu, Haofen Wang, and Chaofeng Sha: Task-Oriented ML/DL Library Recommendation based on a Knowledge Graph. IEEE Transactions on Software Engineering (TSE) 2023

Knowledge Graph based Explainable Question Retrieval for Programming Tasks

Paper

Published in ICSME 2023, 2023

To bridge the knowledge gap and enhance the performance of question retrieval, KGXQR constructs a software development related concept knowledge graph and trains a question relevance prediction model to re-rank the candidate questions. The model is trained based on a combined sentence representation of BERT-based sentence embedding and graph-based concept embedding. To help understand the relevance of the returned Stack Overflow questions, KGXQR further generates explanations based on the association paths between the concepts involved in the query and the Stack Overflow questions. Read more

Recommended citation: Mingwei Liu, Simin Yu, Xin Peng, Xueying Du, Tianyong Yang, Huanjun Xu, Gaoyang Zhang: Knowledge Graph based Explainable Question Retrieval for Programming Tasks. In 39th IEEE International Conference on Software Maintenance and Evolution (ICSME 2023)

CodeGen4Libs: A Two-stage Approach for Library-oriented Code Generation

Paper

Published in ASE 2023, 2023

We propose a novel library-oriented code generation technique, CodeGen4Libs, which incorporates two stages: import generation and code generation. The import generation stage generates import statements for the natural language query with the given third-party libraries, while the code generation stage generates concrete code based on the generated imports and the query. To evaluate the effectiveness of our approach, we conduct extensive experiments on a dataset of 403,780 data items. Our results demonstrate that CodeGen4Libs outperforms baseline models in both import generation and code generation stages, achieving improvements of up to 97.4% on EM (Exact Match), 54.5% on BLEU, and 53.5% on Hit@All. Read more

Recommended citation: Mingwei Liu, Tianyong Yang, Yiling Lou, Xueying Du, Ying Wang, and Xin Peng: CodeGen4Libs: A Two-stage Approach for Library-oriented Code Generation. 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

Recommending Analogical APIs via Knowledge Graph Embedding

Paper

Published in FSE 2023, 2023

In this study, we present KGE4AR, a novel documentation-based approach using knowledge graph (KG) embedding for recommending analogical APIs during library migration. KGE4AR introduces a unified API KG to comprehensively represent documentation knowledge, capturing high-level semantics. It further embeds this unified API KG into vectors for efficient, scalable similarity calculation. We assess KGE4AR with 35,773 Java libraries in two scenarios, with and without target libraries. KGE4AR notably outperforms state-of-the-art techniques (e.g., 47.1%-143.0% and 11.7%-80.6% MRR improvements), showcasing scalability with growing library counts. Read more

Recommended citation: Mingwei Liu, Yanjun Yang, Yiling Lou, Xin Peng, Zhong Zhou, Xueying Du, Tianyong Yang: Recommending Analogical APIs via Knowledge Graph Embedding. Proceedings of the 2023 31st ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

KG4CraSolver: Recommending Crash Solutions via Knowledge Graph

Paper

Published in FSE 2023, 2023

In this work, we propose a novel crash solution knowledge graph (KG) to summarize the complete crash context and its solution with a graph-structured representation. Read more

Recommended citation: Xueying Du, Yiling Lou, Mingwei Liu*, Xin Peng, Tianyong Yang: KG4CraSolver: Recommending Crash Solutions via Knowledge Graph. Proceedings of the 2023 31st ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

Paper

Published in arXiv 2023, 2023

In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code generation benchmark ClassEval of 100 class-level Python code generation tasks with approximately 500 person-hours. Read more

Recommended citation: Xueying Du, Mingwei Liu*, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou: ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation (arXiv 2023)

talks