Abstract: To address the problems of unreliable commit messages, caused by developers not consistently recording refactoring operations, and the single-language limitation of deep learning-based refactoring detection methods, a cross-language refactoring detection method named RefCode was proposed. First, refactoring collection tools were used to gather commit messages, code change information, and refactoring types from programs in different languages; edit sequences were generated from the code change information, and all the data were combined into a dataset. Second, the CodeBERT pre-trained model was combined with a BiLSTM-attention model for training and testing on this dataset. Finally, the effectiveness of the proposed method was evaluated from six perspectives. The results show that RefCode improves both precision and recall by about 50% compared with a refactoring detection method that feeds only commit messages into an LSTM model. The research realizes cross-language refactoring detection, effectively compensates for the unreliability of commit messages, and provides a reference for detecting refactorings in other programming languages and of other refactoring types.
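The abstract names the model pipeline but not its shape. As a rough illustration only, the sketch below shows one common way a CodeBERT encoder can be combined with a BiLSTM-attention classification head; the layer sizes, attention form, and the two-class output are assumptions for illustration, not details from the paper, and a random tensor stands in for CodeBERT's token embeddings so the sketch runs without downloading the model.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionHead(nn.Module):
    """Hypothetical BiLSTM-attention head over token embeddings.

    In a RefCode-style pipeline the embeddings would come from CodeBERT's
    last hidden states; here they are supplied directly as a tensor.
    """

    def __init__(self, embed_dim=768, hidden_dim=128, num_classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)      # one attention score per token
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim)
        states, _ = self.bilstm(embeddings)               # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1) # normalize over tokens
        context = (weights * states).sum(dim=1)           # attention-weighted sum
        return self.fc(context)                           # (batch, num_classes)

# Stand-in for CodeBERT output: 4 sequences of 32 tokens, 768-dim embeddings.
fake_codebert_out = torch.randn(4, 32, 768)
logits = BiLSTMAttentionHead()(fake_codebert_out)
print(logits.shape)  # torch.Size([4, 2])
```

In this arrangement the attention layer produces a weighted sum of the BiLSTM's per-token states, so the classifier sees a single fixed-size vector regardless of sequence length.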