Recently, online learning platforms have proven to help people gain knowledge more conveniently. Since the outbreak of COVID-19 in 2020, online learning has become a mainstream mode of education, as many schools have adopted it. These platforms capture substantial data on students’ learning activities, which can be analyzed to determine relationships between learning behaviors and study habits. As such, an intelligent analysis method is needed to efficiently process this high volume of information. Clustering is an effective data mining method that discovers data distributions and hidden characteristics in uncharacterized online learning data. This study proposes a clustering algorithm based on brain storm optimization (CBSO) to categorize students according to their learning behaviors and determine their characteristics. This enables teaching to be tailored to these results, thereby improving education quality over time. Specifically, each individual in CBSO represents a distribution of students into clusters, and the optimal individual is found through the operations of convergence and divergence. Experiments are performed on the online learning data of 104 students, and the results show that CBSO is feasible and efficient.

Online learning is becoming more popular because courses can be taken easily and repeatedly, from anywhere and at any time. As such, the number of students is virtually unlimited, since even individuals who are not enrolled in a learning institution can sign up. With the outbreak of COVID-19 in 2020, many educational institutions were compelled to embrace online learning. Because these platforms capture substantial data on the ongoing learning activities of each student, a considerable amount of information about learning behavior can be gathered. For example, analyzing the data can reveal relationships between learning behaviors and performance scores, enabling lectures to be tailored accordingly and improving educational quality and experience [

Students have different learning manners, habits, basic knowledge, and interests. Manual analysis methods have difficulty processing this complex information, so intelligent methods are needed. This paper designs a clustering algorithm based on brain storm optimization (CBSO) to analyze online learning behaviors [

Brain storm optimization (BSO) simulates the brainstorming process and is an optimization algorithm based on swarm intelligence [

BSO combines swarm intelligence techniques with data mining and data analysis. The distribution of generated solutions can be changed according to the properties of a problem. However, BSO tends to converge slowly and to fall easily into local optima. As such, many BSO variants have been proposed. For example, Zhu proposed a BSO algorithm that replaced k-means with k-medians [

The BSO algorithm is robust in solving clustering problems, whereas current clustering methods have some defects. For example, k-means depends on the selection of initial centers and is sensitive to noise and outliers. Density-based spatial clustering of applications with noise (DBSCAN) does not perform well on problems that are high-dimensional or have large variations in density.

Clustering is an important data mining technique for discovering data distributions and hidden patterns. Through clustering, the relationship between learning behavior and performance can be found in uncharacterized online learning behavior data, and thus the characteristics of students can be obtained.

In this paper, CBSO is applied to cluster and analyze students’ online learning data. Our objectives are as follows:

1) to construct a clustering optimization model based on BSO;

2) to improve the accuracy of clustering;

3) to apply the proposed model to data of students’ online learning behaviors;

4) to provide suggestions for improving students’ education quality.

The remainder of this paper is organized as follows. Section 2 provides background information. Section 3 describes the CBSO algorithm. Section 4 describes the experimental design. Section 5 presents the results and discussion, and Section 6 presents our conclusions.

Human brainstorming is a method proposed by Osborn in 1953 to enable a group of people to come up with as many ideas as possible. Everyone can present ideas freely and find inspiration by listening to others. After all ideas are evaluated, the best one is selected.

BSO simulates this process and has three corresponding steps: initializing individuals, generating new individuals, and selecting the best ones [

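To strengthen the search ability of BSO, Gaussian variation is used to update individuals in Step 2. The updating method is given by

$$x_{\text{new}}^{d} = x_{\text{old}}^{d} + \xi \cdot n(\mu, \sigma)$$

where $x_{\text{new}}^{d}$ is the $d^{th}$ dimension of the generated individual, $x_{\text{old}}^{d}$ is the $d^{th}$ dimension of the old individual, $n(\mu, \sigma)$ is a Gaussian random value with mean $\mu$ and standard deviation $\sigma$, and $\xi$ is the step-size coefficient.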
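A minimal Python sketch of the three BSO steps with the Gaussian update may help make the loop concrete. It is not the authors' CBSO code: it minimizes a continuous toy objective, omits the clustering of individuals, and assumes the step-size rule $\xi = \mathrm{logsig}((0.5T - t)/k) \cdot \mathrm{rand}()$ commonly used in BSO, with $\mu = 0$, $\sigma = 1$, and slope $k = 20$. The search bounds and all function names are illustrative.

```python
import numpy as np


def logsig(a: float) -> float:
    """Logistic sigmoid used to shrink the step size over time."""
    return 1.0 / (1.0 + np.exp(-a))


def gaussian_step(x_old, t, max_iter, k, rng, mu=0.0, sigma=1.0):
    """x_new^d = x_old^d + xi * n(mu, sigma), applied to every dimension d."""
    # Assumed step-size rule: large early (exploration), small late (exploitation).
    xi = logsig((0.5 * max_iter - t) / k) * rng.random()
    return x_old + xi * rng.normal(mu, sigma, size=x_old.shape)


def simple_bso(f, dim, pop_size=20, max_iter=200, k=20.0, seed=0):
    """Toy minimizer following the three steps: initialize, generate, select."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize individuals uniformly in [-5, 5]^dim (illustrative bounds).
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for t in range(max_iter):
        for i in range(pop_size):
            # Step 2: generate a new individual by Gaussian variation.
            cand = gaussian_step(pop[i], t, max_iter, k, rng)
            f_cand = f(cand)
            # Step 3: keep whichever of the old and new individuals is better.
            if f_cand < fit[i]:
                pop[i] = cand
                fit[i] = f_cand
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

For example, `simple_bso(lambda v: float(np.sum(v ** 2)), dim=3)` searches for the minimum of the sphere function. The greedy replacement in Step 3 means an individual's fitness never gets worse.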

Clustering divides samples with high similarity into the same cluster. Hence, samples in the same cluster are highly similar, and samples in different clusters are highly heterogeneous. The Euclidean distance is used as the fitness value to judge the similarity of samples. Because a division of samples can be represented well by a solution of a swarm intelligence algorithm, and BSO is a good global optimization method, CBSO is used to find the optimal solution for clustering samples [

Each element $x_i$ in individual $X = \{x_1, x_2, x_3, \ldots, x_n\}$ is the cluster label of the $i^{th}$ sample. The steps of CBSO are shown in the following algorithm.

[Algorithm: CBSO. Input: population size …]
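The label-vector encoding can be sketched as follows. This is an illustrative assumption rather than the authors' code: labels run from 0 to $K-1$, the hypothetical helper `random_individual` forces every cluster to contain at least one sample so that no cluster center is undefined, and $n = 100$ from the parameter table is assumed to be the population size.

```python
import numpy as np


def random_individual(n_samples: int, n_clusters: int, rng) -> np.ndarray:
    """Individual X = {x_1, ..., x_n}: element i is the cluster label of sample i."""
    labels = rng.integers(0, n_clusters, size=n_samples)
    # Assumption: give each cluster one randomly chosen sample so none is empty.
    seeds = rng.choice(n_samples, size=n_clusters, replace=False)
    labels[seeds] = np.arange(n_clusters)
    return labels


def decode(labels: np.ndarray, n_clusters: int) -> list:
    """Return, for each cluster j, the indices of the samples labeled j."""
    return [np.flatnonzero(labels == j) for j in range(n_clusters)]


rng = np.random.default_rng(0)
# Assumed setup: 100 individuals for 104 students and 4 categories.
population = np.array([random_individual(104, 4, rng) for _ in range(100)])
```

Because each individual is just an integer vector, standard array operations (distances, averaging, rounding) can be applied to it directly, which is what makes the population usable by the later BSO steps.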

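The Euclidean distance is used as the objective function to gather similar samples into the same cluster: the smaller the distance, the more similar the samples. The objective function, which CBSO minimizes, is given as

$$F = \sum_{j=1}^{k} \sum_{v_i \in C_j} \lVert v_i - c_j \rVert, \qquad c_j = \frac{1}{n_j} \sum_{v_i \in C_j} v_i$$

where $c_j$ is the center of the $j^{th}$ cluster $C_j$, $v_i$ is the value of each sample in the $j^{th}$ cluster, $n_j$ is the number of members in the $j^{th}$ cluster, and $k$ is the number of clusters.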
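The objective function can be evaluated directly from a label vector. The sketch below is one possible implementation under stated assumptions: `data` holds the standardized samples as rows, cluster centers are the cluster means, and an empty cluster simply contributes nothing (a real implementation might penalize it instead).

```python
import numpy as np


def clustering_objective(data: np.ndarray, labels: np.ndarray, n_clusters: int) -> float:
    """F = sum over clusters j of sum over members v_i of ||v_i - c_j||."""
    total = 0.0
    for j in range(n_clusters):
        members = data[labels == j]
        if len(members) == 0:
            continue  # assumption: empty clusters add no cost
        center = members.mean(axis=0)  # c_j = (1 / n_j) * sum of members
        total += float(np.linalg.norm(members - center, axis=1).sum())
    return total


data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = clustering_objective(data, np.array([0, 0, 1, 1]), 2)  # 4.0
bad = clustering_objective(data, np.array([0, 1, 0, 1]), 2)   # 20.0
```

Grouping the two nearby pairs gives every sample a distance of 1 to its center, whereas mixing them gives distances of 5, so the lower objective correctly identifies the better partition.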

To reduce the search space and accelerate convergence, individuals in BSO are clustered by k-means [

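Clustering the individuals themselves with k-means might look like the sketch below. The iteration limit, the random initialization, and the convention that the best-scoring individual of each cluster serves as its center (common in BSO variants) are assumptions, not details given in the source; lower objective values are treated as better, and the function names are hypothetical.

```python
import numpy as np


def kmeans(points: np.ndarray, m: int, rng, n_iter: int = 50) -> np.ndarray:
    """Plain k-means with Euclidean distance; returns a cluster index per point."""
    centers = points[rng.choice(len(points), size=m, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (P, m).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(m):
            if np.any(assign == j):  # keep the old center if a cluster empties
                new_centers[j] = points[assign == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def cluster_population(pop: np.ndarray, fitness: np.ndarray, m: int, rng):
    """Group individuals; the best (lowest-objective) member of each group is its center."""
    assign = kmeans(pop.astype(float), m, rng)
    centers = {}
    for j in np.unique(assign):
        idx = np.flatnonzero(assign == j)
        centers[int(j)] = int(idx[np.argmin(fitness[idx])])
    return assign, centers
```

Here `centers` maps each cluster index to the row of its center individual, which is the information the subsequent generation step needs.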

New individuals are generated according to probability parameters $p_1$, $p_2$, and $p_3$, which are used to avoid falling into a local optimum; the specific operations are described in this section. The global optimal solution can be found by enhancing the search capability with these parameters in iterative processes [. Probability parameter $p_1$ determines whether one or two clusters are selected: if the generated probability value is less than $p_1$, one cluster is selected; otherwise, two clusters are selected. Probability parameter $p_2$ is used to select either the central or an ordinary individual in a cluster to generate new individuals, and probability parameter $p_3$ is used to select either the central or an ordinary individual from each of the two clusters and combine them to generate new individuals. The steps are shown in the algorithm below.

When two clusters are selected, the two chosen individuals are combined dimension by dimension as

$$x_{\text{sel}}^{d} = r \cdot x_{c_1}^{d} + (1 - r) \cdot x_{c_2}^{d}$$

where $x_{\text{sel}}^{d}$ is the $d^{th}$ dimension of the selected individual, $x_{c_1}^{d}$ is the $d^{th}$ dimension of the individual selected from cluster 1, $x_{c_2}^{d}$ is the $d^{th}$ dimension of the individual selected from cluster 2, and $r$ is a random number in $[0, 1]$.

[Algorithm: generating new individuals. Input: probabilities $p_1$, $p_2$, $p_3$ …]
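One way to realize these selection and combination rules for label-vector individuals is sketched below. Several details are assumptions rather than statements from the source: a draw below $p_2$ (or $p_3$) picks cluster centers rather than ordinary members, the Gaussian variation uses a fixed step `xi`, and the continuous result is rounded and clipped back to valid labels, which is one simple way to keep individuals discrete. The hypothetical inputs are `assign` (the cluster index of each individual) and `centers` (a dict from cluster index to the row of its center individual).

```python
import numpy as np


def select_base(pop, assign, centers, p1, p2, p3, rng):
    """Pick (or combine) individuals according to probabilities p1, p2, p3."""
    cluster_ids = sorted(centers)
    if rng.random() < p1 or len(cluster_ids) < 2:
        # One cluster: its center (draw < p2, assumed) or an ordinary member.
        j = cluster_ids[rng.integers(len(cluster_ids))]
        if rng.random() < p2:
            return pop[centers[j]].astype(float)
        return pop[rng.choice(np.flatnonzero(assign == j))].astype(float)
    # Two clusters: their centers (draw < p3, assumed) or one member from each.
    j1, j2 = (int(c) for c in rng.choice(cluster_ids, size=2, replace=False))
    if rng.random() < p3:
        x1, x2 = pop[centers[j1]], pop[centers[j2]]
    else:
        x1 = pop[rng.choice(np.flatnonzero(assign == j1))]
        x2 = pop[rng.choice(np.flatnonzero(assign == j2))]
    r = rng.random()
    # x_sel^d = r * x_c1^d + (1 - r) * x_c2^d for every dimension d.
    return r * x1 + (1.0 - r) * x2


def new_individual(pop, assign, centers, n_clusters, p1, p2, p3, xi, rng):
    """Select a base individual, apply Gaussian variation, and re-discretize."""
    base = select_base(pop, assign, centers, p1, p2, p3, rng)
    cand = base + xi * rng.normal(0.0, 1.0, size=base.shape)
    # Assumption: round and clip so every element is again a label in 0..K-1.
    return np.clip(np.rint(cand), 0, n_clusters - 1).astype(int)
```

With the parameter values from the table, a call would look like `new_individual(pop, assign, centers, 4, 0.8, 0.4, 0.5, xi, rng)`, where `xi` would come from the step-size schedule.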

We collected the online learning data of students in the spring of 2020, including student number, student name, number of courses, days and times logged in to the web, days and times logged in to courses, number of posts, number of replies, post points, and total points. To avoid the impact of abnormal and useless information, abnormal samples were removed and the student number and name fields were dropped; the remaining 104 samples were retained for subsequent analysis.

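Since the features in the collected dataset differ greatly in scale, we standardized the data with the Z-score to reduce the impact of these differences on CBSO:

$$z_i = \frac{x_i - \mu}{\sigma}$$

where $z_i$ is the normalized value, $x_i$ is the original value, and $\mu$ and $\sigma$ are the mean and standard deviation of the corresponding feature.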
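Column-wise Z-score standardization takes only a few lines in NumPy. The sketch assumes the population standard deviation (`ddof=0`), since the source does not say which estimator was used, and leaves constant columns at zero instead of dividing by zero.

```python
import numpy as np


def zscore(X) -> np.ndarray:
    """Standardize each feature (column): z = (x - mu) / sigma."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # ddof=0 (population std); an assumption
    sigma[sigma == 0] = 1.0  # constant feature: avoid dividing by zero
    return (X - mu) / sigma
```

After this step every feature has mean 0 and standard deviation 1, so features such as login counts and points contribute on comparable scales to the Euclidean distances used by CBSO.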

Because the data have nine features, we used principal component analysis (PCA) to reduce them to two dimensions when plotting the clustering results.
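A minimal PCA projection to two dimensions via the singular value decomposition is sketched below; it is one standard way to compute PCA, not necessarily the tool the authors used. The input is assumed to be the standardized 104 × 9 matrix, and the result can be passed to a scatter plot (for example, `matplotlib.pyplot.scatter(Y[:, 0], Y[:, 1], c=labels)`) to color the points by cluster.

```python
import numpy as np


def pca_2d(X) -> np.ndarray:
    """Project the rows of X onto its first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA requires centered data
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T
```

The projection is used only for visualization; clustering itself operates on all nine standardized features.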

In the CBSO algorithm, the maximum number of iterations was 12,000, and the algorithm was run 30 times to ensure the stability and accuracy of the experiment. The probability parameters $p$, $p_1$, $p_2$, and $p_3$ of the four updating strategies and the initial number $n$ are listed in the following table.

$n$ | $p$ | $p_1$ | $p_2$ | $p_3$ |
---|---|---|---|---|
100 | 0.2 | 0.8 | 0.4 | 0.5 |

Students were clustered into four categories, and the average value of all student data in each category was calculated. The characteristics of each category can then be identified by analyzing these values, enabling lectures to be tailored for better educational quality and higher learning enthusiasm.
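The per-category averages and category sizes can be computed from the standardized data and one label vector. The sketch assumes labels 0 to $K-1$ and non-empty categories; the function name is hypothetical.

```python
import numpy as np


def category_profile(Z: np.ndarray, labels: np.ndarray, n_clusters: int):
    """Mean standardized feature vector and size of each category."""
    means = np.vstack([Z[labels == j].mean(axis=0) for j in range(n_clusters)])
    sizes = np.bincount(labels, minlength=n_clusters)
    return means, sizes


Z = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
means, sizes = category_profile(Z, np.array([0, 0, 1]), 2)
# means -> [[2.0, 3.0], [10.0, 10.0]], sizes -> [2, 1]
```

Because each feature was standardized, a positive category mean indicates that the category is above the overall average on that feature and a negative mean indicates it is below, which is how the signs in the category table can be read.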

The cluster results are shown below.

The same method is used to define the remaining three characteristics of students as follows: poor performance and high enthusiasm, good performance and low enthusiasm, and poor performance and low enthusiasm.

Category | Number of courses | Days logged in to web | Times logged in to web | Days logged in to courses | Times logged in to courses | Number of posts | Number of replies | Post points | Total points |
---|---|---|---|---|---|---|---|---|---|
1 | -0.3178 | 0.4574 | 0.9209 | 0.7142 | 0.6771 | -0.0177 | 1.1724 | 1.1665 | 1.4339 |
2 | -0.4268 | 0.5774 | 0.0216 | 0.4697 | -0.1387 | -0.5210 | -0.3456 | -0.4004 | -0.4265 |
3 | 0.0372 | -0.8082 | -0.2955 | -0.7079 | -0.1492 | 0.2675 | 0.0116 | 0.0402 | -0.0588 |
4 | 0.9088 | -0.4283 | -0.4743 | -0.5760 | -0.1741 | 0.5621 | -0.4252 | -0.3632 | -0.4345 |

Category | Number of students | Characteristic of students |
---|---|---|
1 | 23 | good performance and high enthusiasm |
2 | 25 | poor performance and high enthusiasm |
3 | 37 | good performance and low enthusiasm |
4 | 19 | poor performance and low enthusiasm |

Tailored lectures can be provided to help students achieve better learning results. For example, first-category students are highly motivated to learn and have strong learning capabilities, so they should be provided with more learning resources for further improvement. Although students in the second category are motivated to learn, they do not achieve good results; they need learning guidance so that they do not study aimlessly. Conversely, students in the third category have low learning enthusiasm, but their grades are acceptable. We should enhance teaching supervision and cultivate their interest in learning to realize their potential. Finally, students in the fourth category have neither learning enthusiasm nor good grades. Thus, we should strengthen teaching supervision, cultivate their interest in learning, and change the manner of teaching to improve their performance.

The BSO algorithm takes advantage of swarm intelligence and data analysis and can be used to solve clustering problems efficiently. Thus, this paper proposed a clustering algorithm based on BSO (CBSO). Individuals were discretized to represent the distribution of samples, and the Euclidean distance was used to calculate the similarity of samples. Experiments showed that CBSO is feasible and efficient for solving clustering problems. CBSO was then applied to analyze the relationship between learning data and performance, so as to provide tailored guidance to each category of students. However, BSO still falls easily into local optima. In future work, we plan to optimize the model to obtain more accurate clustering results.