Analyzing and predicting the learning behavior data of students in blended teaching can provide a reference for teaching. To address the weak generalization ability of existing models in performance prediction, a BP neural network is introduced to classify and predict the grades of students in blended teaching. An L2 regularization term is added to the BP neural network model to reduce the risk of overfitting. Data mined from the Chao-Xing platform are filtered with the Pearson correlation coefficient, and the effective features are used as the validation dataset of the model. The performance of common machine learning algorithms and the BP neural network is compared on this dataset. Experiments show that the BP neural network model generalizes better than common machine learning models, and that the BP neural network with L2 regularization fits better than the original BP neural network and achieves higher accuracy.

The wide application of Chao-Xing, Jreenity and other high-quality online smart teaching platforms in university teaching has greatly promoted the development of blended teaching from concept to reality [

By mining and analyzing learning behavior data, hidden patterns can be exposed and used to classify and predict the grades of students. Students with learning difficulties can then be identified and given timely intervention, which helps to improve teaching quality [

To improve the generalizability of the model, a three-layer BP neural network with a single hidden layer is constructed. The sigmoid function is used as the activation function of the hidden layer, and the ReLU function is used as the activation function of the output layer. An L2 regularization term is added to reduce the risk of overfitting. To verify the generalizability of the BP neural network model, it is compared with Bayesian, AdaBoost and Logistic Regression models on the same dataset. Accuracy, recall, precision and F1 score are selected as the evaluation indexes. The fitting effect of the model is examined by comparing the accuracy and loss curves of the BP neural network before and after adding L2 regularization.

The paper is organized as follows. The dataset and data pre-processing are outlined first in the section Feature Engineering. The Implementation of Neural Network Prediction Model is described next, followed by the Experiment and Result Analysis used to evaluate the predictive models. The Summary and Future Work in Section 5 covers the work and significance of this paper and possible future lines of research.

This section introduces the feature engineering procedure, including data pre-processing, feature analysis and feature selection. Finally, all the features used in model training and evaluation are presented.

The dataset is collected from the Chao-Xing platform and the educational administration system. It contains 229 student instances, comprising nearly 60000 online learning records and 229 offline performance records. The student features have 16 dimensions, divided into two categories: basic information features and learning behavior features. The description of the student features is shown in

Serial number | Feature | Data type | Feature meaning |
---|---|---|---|
1 | student_id | varchar | Student ID |
2 | gender | varchar | Gender of the student |
3 | faculty | varchar | Faculty |
4 | class | varchar | Class of the student |
5 | semester | varchar | Semester |

Serial number | Feature | Data type | Feature meaning |
---|---|---|---|
1 | study_count | int | Number of logins to the platform to study the course content |
2 | video | double | Total duration of course videos viewed |
3 | discussion | int | Number of discussions |
4 | chapter_quiz | double | Chapter test grade |
5 | answer_mark | double | Rush answer score |
6 | group_task | double | Group task grade |
7 | course_point | double | Course points |
8 | class_test | double | In-class quiz score |
9 | homework | double | After-class homework grade |
10 | midterm | double | Mid-term examination result |
11 | final_mark | double | Final examination result |

Because blended learning feature data are diverse, the data must be pre-processed according to the characteristics of both the data and the algorithm models. Data pre-processing mainly includes normalizing feature data recorded under different assessment standards and discretizing continuous data.

Different classes have different task points, learning resources and classroom activities in Chao-Xing, which results in different ranges of feature values. The interval scaling method, a form of normalization, is therefore used to map the data to the same interval. The interval length is computed from the two extreme values (maximum and minimum) of the feature data, and each value is scaled proportionally over that interval. The scale equation is as follows.

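$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

In this equation, $x$ is the original feature value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature, and $x'$ is the scaled value, which lies in $[0, 1]$.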
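The interval scaling step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `min_max_scale` and the sample durations are hypothetical, and the target interval is assumed to be [0, 1].

```python
import numpy as np

def min_max_scale(values):
    """Map a 1-D array of feature values onto [0, 1] by interval scaling."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant feature has zero interval length; return zeros
        # instead of dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical video-viewing durations from two classes with different ranges.
class_a = min_max_scale([120.0, 300.0, 480.0])
class_b = min_max_scale([10.0, 25.0, 40.0])
print(class_a, class_b)
```

After scaling, both classes occupy the same interval, so a student in the middle of either class receives the same scaled value.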

The data types in the dataset are mainly varchar, int and double. The prediction algorithms used in this paper are implemented with sklearn and other libraries, and sklearn requires the model input to be numerical. To keep the data types consistent, the continuous feature data are discretized by binning. The function cut() in the pandas library is therefore used to divide and discretize the int-type and double-type data features in
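A minimal sketch of binning with `pandas.cut` is given below. The sample values, the number of bins, and the 60-point pass mark are assumptions made for illustration; the paper does not state the bin edges it used.

```python
import pandas as pd

# Hypothetical values of a continuous feature such as chapter_quiz.
quiz = pd.Series([12.0, 48.5, 63.0, 77.0, 95.0], name="chapter_quiz")

# Equal-width binning into 4 intervals; labels=False returns integer bin codes.
quiz_binned = pd.cut(quiz, bins=4, labels=False)

# Hypothetical final-exam scores turned into a binary target.
# The 60-point pass mark is an assumption, not a value stated in the paper.
final_mark = pd.Series([35.0, 59.9, 60.0, 72.5, 100.0], name="final_mark")
passed = pd.cut(
    final_mark,
    bins=[0, 60, float("inf")],  # [0, 60) -> fail, [60, inf) -> pass
    labels=[0, 1],
    right=False,
)
print(quiz_binned.tolist(), passed.astype(int).tolist())
```

Explicit bin edges keep the discretization reproducible across classes, whereas `bins=4` derives the edges from the data it is given.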

Feature selection is usually required before a model is used to predict the target. Feature selection removes irrelevant features, reduces computational complexity and improves the interpretability of the model [

The correlation coefficients between semester, gender, faculty, class and the target feature final_mark are all less than 0.2, which indicates weak correlation. For semester and faculty the coefficient with final_mark is 0, which indicates no correlation. The features whose correlation coefficient with final_mark is greater than 0.4 are midterm, study_count and group_task; the highest is midterm, at 0.46. The other 7 features, including answer_mark and chapter_quiz, show moderate correlation with final_mark. Therefore, the useless features semester and faculty, whose correlation is 0, are deleted, while 13 relatively important features, including midterm, group_task and class_test, are retained.
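The selection step can be sketched with `DataFrame.corr`. The data below are synthetic stand-ins, since the Chao-Xing records are not reproduced here, and the 0.05 cutoff for a "zero" coefficient is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 229  # number of student instances reported in the paper

# Synthetic stand-in data, built so that midterm is related to final_mark.
midterm = rng.uniform(0, 100, n)
df = pd.DataFrame({
    "midterm": midterm,
    "semester": np.ones(n),  # constant column: carries no information
    "final_mark": 0.5 * midterm + rng.normal(0, 10, n),
})

# Pearson coefficient of every feature with the target feature.
corr = df.corr(method="pearson")["final_mark"].drop("final_mark")

# A constant column has zero variance, so its coefficient is NaN; treat it as 0.
corr = corr.fillna(0.0)

# Keep only features whose absolute coefficient exceeds the assumed cutoff.
threshold = 0.05
selected = corr[corr.abs() > threshold].index.tolist()
print(corr.round(2).to_dict(), selected)
```

A feature that takes the same value for every student, as a single-semester column would, is one way a coefficient of 0 can arise in practice.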

The BP neural network is an algorithm based on error back propagation. It is composed of an input layer, an output layer and one or more hidden layers. Through continued iteration it can approximate any continuous function to arbitrary accuracy, and it is often used in function approximation, classification and pattern recognition. Its basic principle is to update the weights so that the output approaches the real value, through forward propagation of the signal and back propagation of the error.

Any continuous function on a closed interval can be approximated by a BP neural network with one hidden layer. Therefore, a three-layer BP neural network with an input layer, a hidden layer and an output layer can realize any mapping from n dimensions to m dimensions [

The input layer and output layer of the neural network are determined by the input features and target features of the training and test data. A hidden layer is needed when the data are not linearly separable. The number of hidden layers significantly affects the performance of the neural network [

The number of neurons in the input layer is determined by the number of input variables in the dataset, and the number of neurons in the output layer by the number of output variables. Many factors reflect the learning status of students and affect their final academic performance. By analyzing the learning behavior data, 13 relatively important features, such as midterm, group_task and class_test, are selected as the input of the model, so the input layer has 13 neurons. The final scores of students are divided into two categories, pass and fail, so the output layer has 2 neurons.

The number of hidden layers and neurons determines the accuracy of the neural network, and the number of hidden neurons depends on the number of neurons in the input and output layers. With too few hidden neurons, the model cannot capture the relationships in the data and tends to underfit. Too many hidden neurons also cause problems. First, they increase the number of nodes in the network; if the training set does not contain enough information to train all hidden neurons, the model will overfit. Second, even when the training set contains enough information, too many hidden nodes increase the training time, which makes it difficult to achieve the expected performance. It is therefore important to select an appropriate number of hidden layer neurons, which is generally done from experience. In this paper, the number of hidden layer neurons is selected according to

The activation function adds nonlinearity to the neurons, so that the output of each layer is a nonlinear transformation of the output of the previous layer. Common activation functions include sigmoid and ReLU. The sigmoid function, also known as the logistic function, is used for the output of the hidden layer; it maps a real number to the range (0, 1). The equation of the sigmoid function is as follows.

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The ReLU function is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged. Its gradient is cheap to compute during back propagation, and it helps to avoid the problems of gradient explosion and gradient vanishing. The ReLU function is as follows.

$$f(x) = \max(0, x)$$

In our model, the sigmoid function is used as the activation function of the hidden layer, and the ReLU function is used as the activation function of the output layer.
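The forward pass of such a three-layer network can be sketched in NumPy as below. The layer sizes 13 and 2 come from the text; the hidden size of 8, the random weights and the sample batch are hypothetical, and this is an illustration rather than the authors' keras implementation.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Sets negative values to 0 and leaves positive values unchanged."""
    return np.maximum(0.0, x)

def forward(x, w1, b1, w2, b2):
    """Input -> sigmoid hidden layer -> ReLU output layer."""
    hidden = sigmoid(x @ w1 + b1)
    return relu(hidden @ w2 + b2)

rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 13, 8, 2  # n_hidden = 8 is a hypothetical choice

# Small random weights and zero biases, as a stand-in for trained parameters.
w1 = rng.normal(0.0, 0.1, (n_input, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(0.0, 0.1, (n_hidden, n_output))
b2 = np.zeros(n_output)

batch = rng.uniform(0.0, 1.0, (4, n_input))  # 4 students, 13 scaled features
out = forward(batch, w1, b1, w2, b2)
print(out.shape)
```

Each row of `out` holds two non-negative scores, one per output neuron; training would adjust the weights by back propagating the loss through these two layers.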

In supervised learning, the gap between the predicted value and the real value is usually measured by a loss function, and the parameters of the prediction model are solved iteratively to minimize it. In the experiments, the binary cross-entropy (BCE) loss is adopted. The loss value is the average of the per-sample losses. The BCE loss function is defined as follows, where $y_i$ is the real label of sample $i$, $\hat{y}_i$ is the predicted probability, and $N$ is the number of samples.

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

When overfitting occurs, the fitted function curve often bends sharply, so its curvature is very high at some local positions. The curvature of the function is a linear or nonlinear combination of the function parameters. To reduce overfitting, the L2 norm is introduced as L2 regularization, so that the parameter values are small and densely distributed around 0. The L2 regular term refers to the square root of the sum of squares of the elements in the weight vector
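The regularized objective can be sketched as below. The paper's exact formula is not shown, so this assumes the common form in which the penalty is the squared L2 norm of the weights scaled by a coefficient; the function names and the value `lam=0.01` are hypothetical.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

def l2_penalty(weights, lam):
    """lam times the sum of squared weights over all weight arrays."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Data term plus L2 penalty; larger weights raise the total loss."""
    return bce_loss(y_true, y_pred) + l2_penalty(weights, lam)

weights = [np.array([[0.5, -0.5], [1.0, 0.0]])]
plain = bce_loss([1, 0, 1], [0.9, 0.2, 0.6])
total = regularized_loss([1, 0, 1], [0.9, 0.2, 0.6], weights)
print(round(plain, 4), round(total, 4))
```

Because the penalty grows with the square of each weight, minimizing the total loss pulls the weights toward 0, which smooths the fitted function.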

In this section, the effectiveness of the proposed approach is evaluated through experiments. The experiments are conducted on a computer with an Intel i7-8750u processor and 16GB of available main memory. All algorithms are implemented in PyCharm with Python 3.7.0. The Python libraries used include keras, numpy, pandas, matplotlib, seaborn and sklearn, among others. To train the machine learning models, the data are split into a training set and a test set, with 70% used for training and 30% for testing.
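The 70/30 split can be sketched with sklearn's `train_test_split`. The random data stand in for the real features, and the use of `stratify` and a fixed `random_state` is an assumption; the paper does not say whether the split was stratified or seeded.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data with the shape described in the paper: 229 students, 13 features.
X = rng.uniform(0.0, 1.0, (229, 13))
y = rng.integers(0, 2, 229)  # hypothetical pass/fail labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,    # 30% of the students are held out for testing
    random_state=42,  # fixed seed so the split is repeatable (assumption)
    stratify=y,       # keep the pass/fail ratio similar in both parts (assumption)
)
print(len(X_train), len(X_test))
```

With 229 instances, this leaves 160 students for training and 69 for testing.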

This paper aims to predict, from the relevant student features, whether students will pass the final exam. The prediction result is either pass or fail, which makes this a binary prediction problem. The confusion matrix for the binary prediction problem is shown in

Real value | Predicted: failed students | Predicted: pass students |
---|---|---|
Failed students | TP (truly failed students) | FN (falsely passed students) |
Pass students | FP (falsely failed students) | TN (truly passed students) |

Accuracy (A), recall (R), precision (P) and F1 score (F1) are used to evaluate the prediction results. Accuracy is the percentage of correct predictions in the test data. However, correctly identifying the students who actually fail matters more here than identifying those who pass, so recall is introduced as an evaluation indicator. Raising recall tends to lower precision, which can make the model look better than it is; recall and precision thus pull against each other. The F1 score balances recall and precision and evaluates the model more objectively. With the failed students as the positive class, the indicators are calculated as follows.

$$A = \frac{TP + TN}{TP + TN + FP + FN}, \qquad R = \frac{TP}{TP + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad F1 = \frac{2PR}{P + R}$$
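The four indicators can be computed directly from the confusion matrix counts, as in the sketch below. The function name and the example counts are hypothetical and are not results from the paper; failed students are taken as the positive class, as in the confusion matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall, precision and F1 from confusion matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard each ratio so an empty denominator yields 0 instead of an error.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Hypothetical counts for a 69-student test set.
m = classification_metrics(tp=20, fn=10, fp=5, tn=34)
print({k: round(v, 4) for k, v in m.items()})
```

In this example the model finds two thirds of the failing students (recall) while four fifths of its fail predictions are correct (precision); F1 sits between the two.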

In this paper, an L2 regular term is added to the BP neural network to reduce the risk of overfitting that grows as the number of iterations increases. The accuracy curves of the experiments are shown in

The accuracy curve of the BP neural network with L2 regularization is more stable than that of the control group without L2 regularization, which indicates that L2 regularization effectively reduces the risk of overfitting.

As shown in

To test the generalizability of the BP neural network with L2 regularization, we compare it with other machine learning algorithms, namely AdaBoost, Bayes and Logistic Regression, on the same dataset. The results of the empirical study in terms of accuracy, recall, precision and F1 score are shown in

Model | Category | Recall (R) | Precision (P) | F1 score (F1) | Accuracy (A) |
---|---|---|---|---|---|
Bayes | Failed | 0.58 | 0.67 | 0.62 | 0.71 |
Bayes | Pass | 0.80 | 0.73 | 0.76 | |
AdaBoost | Failed | 0.63 | 0.68 | 0.67 | 0.70 |
AdaBoost | Pass | 0.75 | 0.71 | 0.72 | |
Logistic Regression | Failed | 0.63 | 0.67 | 0.65 | 0.70 |
Logistic Regression | Pass | 0.75 | 0.72 | 0.73 | |
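A comparison of this kind can be sketched with sklearn as below. Several choices are assumptions: the data are synthetic, `GaussianNB` stands in for the Bayes model, and sklearn's `MLPClassifier` (sigmoid hidden layer, L2 penalty set by `alpha`) stands in for the keras BP network, so its output layer differs from the ReLU output described in the paper. The numbers it prints are not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 229 x 13 student dataset.
X, y = make_classification(n_samples=229, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # One hidden layer of 8 sigmoid units (hypothetical size); alpha is the L2 strength.
    "BP neural network (L2)": MLPClassifier(hidden_layer_sizes=(8,),
                                            activation="logistic", alpha=0.01,
                                            max_iter=2000, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Class 0 plays the role of "failed", the class of interest in the paper.
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "recall": recall_score(y_test, pred, pos_label=0, zero_division=0),
        "precision": precision_score(y_test, pred, pos_label=0, zero_division=0),
        "f1": f1_score(y_test, pred, pos_label=0, zero_division=0),
    }

for name, scores in results.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

Training every model on the same split and scoring it with the same indicators is what makes the rows of such a table comparable.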

In conclusion, the BP neural network is more effective than traditional machine learning algorithms such as AdaBoost, Bayes and Logistic Regression in classifying and predicting blended learning achievement. By introducing L2 regularization, the BP neural network model can effectively reduce overfitting.

A binary prediction method based on the BP neural network is presented for blended learning achievement. First, the student information and online learning data recorded on the Chao-Xing platform are pre-processed and mined. Then, effective features are selected by Pearson correlation analysis. Finally, the effectiveness of the algorithm is verified on the dataset. Compared with traditional machine learning algorithms, this model classifies and predicts the final grades of students better. In future work, we will build networks with multiple hidden layers to improve the fitting ability of the neural network and collect more data to train the model, which may help to improve its prediction accuracy.

We thank the editor and the anonymous reviewers for their helpful comments and suggestions in improving the paper.