Abstract: Little research has examined how well deep learning models predict software refactoring. To address this gap, a deep learning-based evaluation method for software refactoring prediction is proposed to assess the predictive performance of these models. First, instances labeled as refactoring or non-refactoring were collected from 303 Java projects using static analysis tools, and seven datasets of source code metrics were constructed, one for each of seven refactoring operations: extract class, extract subclass, extract superclass, extract interface, move class, rename class, and move and rename class. Second, a convolutional neural network (CNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU) model, a multilayer perceptron (MLP), and an autoencoder (AE) were trained and tested on these datasets. Finally, each model was evaluated in terms of accuracy, precision, recall, and F1-measure. The results show that the average accuracy, precision, recall, and F1-measure of the five deep learning models in predicting refactoring are all above 93%, that accuracy is highest for the extract subclass refactoring, and that the CNN model achieves a higher average accuracy than the other models. The CNN model is effective for software refactoring prediction, which provides a reference for the future use of deep learning models in refactoring recommendation tasks.
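The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 for a refactoring instance, 0 for a non-refactoring instance) and uses the standard confusion-matrix definitions; the function name `evaluate_predictions` and the `BinaryMetrics` container are hypothetical names introduced here for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    """Hypothetical container for the four measures named in the abstract."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_predictions(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: 1 marks a refactoring instance, 0 a non-refactoring instance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one instance is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)
```

In a setup like the one described, such a function would presumably be applied once per model and per refactoring dataset, with the results then averaged across datasets; the exact averaging procedure is not specified in the abstract.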