The use of programming online judges (POJs) has risen dramatically in recent years because the automatic evaluation of code during practice motivates students to learn programming. Because POJs hold a large number of programming problems in their repositories, learners experience information overload. Recommender systems are a common solution to information overload. However, the recommender systems currently used in e-learning platforms are inadequate for POJs, since recommendations should consider the learner’s current context, such as learning goals and current skill level (topic knowledge and difficulty level). To overcome this issue, we propose a context-aware practice problem recommender system based on learners’ skill level navigation patterns. Our system first performs skill level navigation pattern mining to discover frequent skill level navigations in the POJ and to find learners’ learning goals. Collaborative filtering (CF) and content-based filtering are then employed to recommend problems in the current and next skill levels based on the frequent skill level navigation patterns. A sequence similarity measure is used to find the top k neighbors based on the sequence of problems solved by the learners. Experimental results on a real-world POJ dataset show that our approach, which considers learners’ current skill level and learning goals, outperforms the other approaches to practice problem recommendation.

Automation is currently gaining ground in a variety of fields like Digital Assistance [

The existing recommendation approaches in the field of e-learning are not suitable for recommending problems in a POJ. They require learner modeling, course modeling, and rating information, and current POJ platforms do not have such information. Moreover, in POJ platforms the recommendation engine should consider the learning goals and current context of the target learner (difficulty level and the topic currently being practiced), which the existing approaches do not do. To understand the problem, consider a learner who is interested in learning programming and whose goal is to crack the competitive programming contests of product companies. The learner is currently practicing problems in branching statements. The recommender system should consider the current topic, the difficulty level, and the learning path towards the learning goal. Even if the learner’s goal is to learn data structures, a learner who is currently solving medium-level problems in branching statements is unable to solve problems in arrays. At the same time, solving problems related to object-oriented programming is not meaningful for this goal. In addition to these two constraints, the recommender system should consider the difficulty level of the problem. Generally, problems are tagged with topics (operators, branching, looping, arrays, and more) and difficulty levels (school, basic, easy, medium, hard, and more). We use the term “skill level” to jointly represent both topic preference and difficulty level. Assume the problems are scattered in a 2D skill space where the x-axis and y-axis are the topics and difficulty levels, respectively. The recommender system should identify the learner’s current skill level and recommend problems from the current and next skill levels along a learning path suited to the learner’s learning goals.
The existing content-based e-learning recommender systems rely on the similarity between content items and on the quality of the content as expressed through learner ratings. CF approaches focus on similar learners and the content liked by similar learners. Neither traditional approach considers the learner’s learning goal and current skill level. As a result, target learners may face the following two issues:

The problem recommended to the target learner might be solved by other learners having same learning goals, but it may be very difficult to solve it with the learner’s current skill level.

The problem recommended to the target learner might be easy to solve, but it is irrelevant to the learner’s learning goal.

The POJ system also lacks rating information, since only one out of every ten learners gives explicit ratings. Most recommender systems therefore depend on implicit ratings. An implicit rating is derived from the frequency and duration of usage of the learning content in an e-learning platform, on the assumption that the most visited content is the most useful content. In POJ systems, because practice problems differ from learning content, it is hard to derive an implicit rating for a problem in this way. Learners often practice problems at a skill level until they feel comfortable with it, then move on to the next skill level in their learning path. Sequence Pattern Mining (SPM) can be used to mine the most interesting patterns in these skill level navigations. These patterns help identify learners who have similar goals, and they can be used to identify the next skill level towards the learning goals of a target learner. To address the lack of rating information, the number of attempts taken to solve a problem can be considered as the learner’s implicit rating of the problem. This makes sense because if a learner needs more attempts than usual to solve a problem, the problem is not suitable for them right now.

This work proposes a novel approach named “context-aware practice problem recommendation using learners’ skill level navigation patterns”. The proposed work performs skill level navigation pattern mining to derive frequent skill level navigation patterns, which give more insight into learning goals and learning paths. The top k similar learners are identified based on the similarity of the sequences of problems solved by learners, and implicit ratings are derived from the number of attempts taken to solve problems.

The proposed work uses this information to apply collaborative filtering and content-based filtering within suitable skill levels for effective recommendations. Finally, our approach is evaluated on a real-world POJ dataset and compared with other traditional approaches.

This paper is organized as follows: Section 2 covers the background needed to understand our research on practice problem recommendation. Section 3 describes the proposed approach to recommending problems in POJ platforms. Section 4 presents the experimental setup and the analysis of our proposed work. Section 5 gives the conclusion.

This section presents previous research that aids in understanding our work on practice problem recommendation in POJ platforms. Subsection 2.1 covers e-learning recommendation scenarios and research works. Subsection 2.2 covers the impact of sequence-based approaches employed in the recommendation of e-learning content. Subsection 2.3 covers existing methods in problem recommender systems. Subsection 2.4 covers topic sequence mining, which is the inspiration for our skill navigation pattern mining-based practice problem recommendation in POJ.

Recommender systems in e-learning recommend different learning objects such as materials, books, videos, challenges to solve, web pages, topics, and courses. Various research works have focused on a specific type of learning object because each type requires different features to be considered. One work focused on material recommendation based on material similarity and good learner ratings. It creates a content profile and a rating profile that are fed into a hybrid recommendation engine [

E-learning recommender systems are tasked with recommending learning objects to learners. The e-commerce recommendation system considers users’ preferences, but it is inadequate for e-learning recommender systems. An e-learning recommendation system should consider learner knowledge level, performance, pedagogical preference, the topic of interest, learning path, and more based on the type of object it recommends. The research focused on content recommendation systems based on learner activities and performance. The system used both content and CF to avoid cold-start problems [

Various sequence mining techniques have been used in e-learning recommendation systems, demonstrating that sequential patterns have a positive impact on the accuracy of recommender systems. The majority of successful e-commerce recommender systems have failed to recommend learning content in an e-learning system. The fundamental reason for this failure is that those systems do not take into account learners’ changing learning behavior. A recommender system proposed as a combination of ontology and SPM overcomes the problem of changing learning behavior. The system creates an ontology to represent learning items and learners, and then uses CF to generate the top N suggestions. Finally, SPM is used to generate the final recommendations. A weighted SPM approach is used in the system to identify the importance of each sequence. This hybrid system outperformed other existing methods in e-learning content recommendation [

An SPM approach is introduced to capture learners’ different learning styles. These learning styles help to build a personalized e-learning system that suggests learning objects based on each learner’s learning behavior. The algorithm proposed by the authors identifies the different learning styles by collecting data from learners, categorizing learners’ behavior based on the sequence of actions logged in the Massive Open Online Course portal, and applying SPM to identify frequent sequential patterns in all dimensions [

The extended matrix represents how a learner tries to solve a problem. Research works showed a better performance in recommendation of problems in an Online Judge portal. Still, learning paths and learning goals are not considered in the recommendation systems.

Research work | Objective | Shortcomings |
---|---|---|
[ | A simple recommender system recommends only one problem in POJ. The system is designed using stored procedures in a database. | The work is suitable only for programming contests and is a basic system whose accuracy is not guaranteed. |
[ | A CF-based problem recommender system was proposed based on evaluation of other traditional approaches. | CF only considers the rating. The systems do not consider the learning goal and current skill level of the target learner. |
[ | The three-layer CF approach works based on the user-preference-item method. | |
[ | The extended matrix used in the work gives more insights into learner-problem interaction. | |

The primary goal of this research work is to analyze the impact of considering learning paths across different topics and difficulty levels of problems. The idea is inspired by work on topic sequence mining in video recommendation and on utilizing sequence signals in user behavior sequences to recommend items in an e-commerce portal [

We propose a recommender system that considers learners’ learning goals and skill levels based on skill level navigation patterns, sequence similarity, and semantic relationships between problems to provide effective problem recommendations.

In this framework, we combine skill level navigation pattern mining, sequence similarity-based neighbors, CF, and content-based filtering to improve the accuracy of the POJ recommendation system. Skill level navigation patterns help to identify learners’ learning goals and the next skill levels (topic of interest and difficulty level) where the learner needs to solve problems. Nearest neighbors are identified based on similarities between sequences of solved problems. CF is applied to the problems in the learner’s current and next skill levels to generate candidate problems to recommend. Finally, the content-based approach derives new problems that are most similar to the problems solved by the learner and adds them to the recommendations. The workflow of the proposed framework is presented in

The learner’s current context (topic of interest and difficulty level) is especially important when recommending practice problems in the POJ. In our proposed system, the term “skill level” represents both the topic and the difficulty level. To make the scenario concrete, we project the practice problems into a two-dimensional skill space where each column indicates the topic of the problems and each row denotes the difficulty level of the problems. Each cell is denoted S_{ij} and contains p problems.

p_{k} indicates the number of problems in the cell k

k starts from S_{11} and ends at S_{nm}
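As a rough illustration of the skill space, the sketch below groups problems into cells keyed by a (topic, difficulty) pair and counts the problems in each cell. The problem ids, tag names, and the function name `build_skill_space` are assumptions made for this example and do not come from the dataset.

```python
from collections import Counter


def build_skill_space(problems):
    """Count the problems in each skill-level cell.

    `problems` maps a problem id to a (topic, difficulty) pair.
    The returned Counter maps each cell to its problem count p_k.
    """
    return Counter(problems.values())


# Hypothetical problems and tags, for illustration only.
problems = {
    "P1": ("operators", "school"),
    "P2": ("operators", "school"),
    "P5": ("operators", "basic"),
    "P32": ("branching", "school"),
}

cells = build_skill_space(problems)
total = sum(cells.values())  # summing p_k over all cells gives the repository size
```

Cells that hold no problems simply do not appear in the counter, so a sparse repository stays cheap to represent.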

Generally, learners navigate from one skill level to another (one cell to another) after practicing a target number of problems in each skill level. Common skill level navigation patterns represent successful learning paths; they are derived by sequence pattern mining and represented as graphs. Our proposed approach recommends problems from the current and next skill levels based on the learners’ navigation patterns. It recommends problems only from the current skill level until the target learner has solved the average number of problems solved by learners who successfully crossed at least the next two skill levels. After that, it recommends problems from both the current and next skill levels. Common skill level navigation patterns help the recommendation system identify the next skill levels where the target learner needs to solve problems, given the current skill level S_{ij}. Skill level navigation pattern mining also helps the system reduce memory usage while identifying similar learners for CF. Similar learners are identified based on the sequence of solved problems from the root node to the current skill level node of the target learner.

Sequential pattern mining discovers frequent sequences of skill level navigations, or learning paths, that learners follow while learning programming and solving problems in the POJ. Hereafter, learning paths are referred to as “skill level navigation patterns”. It has been shown that topic sequence mining helps scalable video recommendation systems [

Each problem P is projected into the 2D skill space S_{ij}: a problem in S_{ij} is on topic i and at difficulty level j. The problems solved by a user are presented in the submission history as follows.

P_{1}(S_{11})–P_{2}(S_{11})–P_{5}(S_{11})–P_{20}(S_{11})–P_{32}(S_{12})–P_{35}(S_{12})–P_{36}(S_{12})–P_{42}(S_{21})–P_{45}(S_{21})–P_{46}(S_{13}).

Our skill level navigation pattern mining derives interesting navigation patterns between the skill levels. For each user, the skill level navigation is extracted from the submission history. The skill level navigation of the above submission history is S_{11}–S_{12}–S_{21}–S_{13}. The skill level navigations of all users are given as input to the SPM algorithm, and the resulting frequent sequence patterns are stored as a sequence pattern graph that the system uses when applying the CF and content-based algorithms.
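To illustrate what mining frequent navigation patterns involves, the sketch below counts how many learners' navigations contain each contiguous run of skill levels and keeps the runs that reach a minimum support. It is a deliberately simplified stand-in and not the SPM algorithm used in this work: the restriction to contiguous runs, the `min_support` threshold, and the sample navigations are all assumptions of the example.

```python
from collections import Counter


def frequent_navigation_patterns(navigations, min_support, max_len=5):
    """Return contiguous skill-level runs shared by at least `min_support` learners.

    `navigations` is a list of skill-level sequences, one per learner.
    Each pattern is counted at most once per learner.
    """
    support = Counter()
    for nav in navigations:
        seen = set()
        for length in range(2, max_len + 1):
            for start in range(len(nav) - length + 1):
                seen.add(tuple(nav[start:start + length]))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical navigations of three learners.
navigations = [
    ["S11", "S12", "S21", "S13"],
    ["S11", "S12", "S21", "S22"],
    ["S11", "S21", "S12"],
]
patterns = frequent_navigation_patterns(navigations, min_support=2)
```

With this toy input, the frequent patterns are S11–S12, S12–S21, and S11–S12–S21, each supported by two learners. A full sequential pattern miner would also allow gaps between the skill levels of a pattern.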

However, skill level navigation pattern mining alone is not sufficient to predict problems. We use the steps below to clean noisy data before mining skill level navigation patterns.

Consider only solved problems of learner’s submission sequence.

Remove solved problems of skill level S_{ij} when problems of the same skill level S_{ij} were present earlier in the sequence submission.
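The two cleaning steps can be sketched as a single pass over a learner's submission history. The tuple layout `(problem_id, skill_level, solved)` and the sample records, including the failed attempt on P3 and the later return to S12, are assumptions made for illustration.

```python
def skill_level_navigation(submissions):
    """Derive a learner's skill-level navigation from time-ordered submissions.

    Step 1: keep only solved problems.
    Step 2: drop problems whose skill level already appeared earlier,
            so each skill level is kept at its first occurrence only.
    """
    navigation = []
    visited = set()
    for _problem, level, solved in submissions:
        if not solved:
            continue  # step 1: ignore failed attempts
        if level in visited:
            continue  # step 2: skill level already seen earlier
        visited.add(level)
        navigation.append(level)
    return navigation


# Hypothetical submission history (time-ordered).
submissions = [
    ("P1", "S11", True),
    ("P2", "S11", True),
    ("P3", "S11", False),   # failed attempt, removed in step 1
    ("P32", "S12", True),
    ("P42", "S21", True),
    ("P33", "S12", True),   # return to S12, removed in step 2
    ("P46", "S13", True),
]
navigation = skill_level_navigation(submissions)
# navigation is ["S11", "S12", "S21", "S13"]
```

Keeping only the first visit to each skill level makes the navigation reflect the order in which a learner first reached new skill levels, which is the signal the pattern mining step needs.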

A learner with the navigation pattern <S_{11}, S_{21}, S_{12}> can complete problems in skill levels S_{22} and S_{13}. Skill navigation pattern mining thus creates a boundary in the search space, which improves the accuracy of problem recommendations in a practice environment.

New learners are those who have recently signed up and have not yet completed enough problems in at least three skill levels. New learners are initially recommended problems with higher accuracy in the starting skill levels used by most other learners. Once a new learner solves enough problems at a skill level, the system recommends problems from the next possible skill levels based on the skill navigation patterns. A learner is no longer categorized as new once they have solved enough problems in at least three skill levels. For learners who are not categorized as new, we find the current position in the two-dimensional skill space based on the problems most recently solved by the learner. We use window w which projects recent

See _{31}.

Based on the sequence of problems successfully solved, we apply a top k-nearest neighbor approach to find learners similar to the target learner. We assume that learners who solve similar problems in a similar sequence have a similar ability to solve problems, and that the problems to be solved next depend on the problems solved in the past. A transition-based sequence matrix is used to represent the sequence structure, and a cosine similarity measure is used to calculate the similarity between learners based on their sequences of solved problems. Learners are at different skill levels: learners at earlier skill levels have solved fewer problems, and learners at the top skill levels have solved more. To balance the length of the sequences, we consider only the problems solved up to the target learner’s current skill level. For example, suppose the target learner is at skill level S_{24} and has navigated through S_{11}, S_{12}, S_{21}, S_{22}, S_{23}, and S_{24}. The problems solved by the target learner ls_{1} in those skill levels, together with the problems solved by a peer learner ls_{2} prior to solving problems in skill level S_{24}, are used to create the transition-based matrices. The solved problems belong to the previous and current skill levels of the learner and are selected using the skill level navigation of the learner identified in Section 3.2.1. If the two learners’ sequences are defined as ls_{1} and ls_{2}, then the similarity between the two learners is the cosine similarity of their transition-based matrices.

Several experiments have proven that transition-based representation of sequence data performs better in mining sequence data [
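One possible reading of the transition-based representation is sketched below. Each learner's sequence is turned into counts of consecutive problem pairs, which correspond to the non-zero cells of a problem-by-problem transition matrix, and two learners are compared with cosine similarity over those counts. Counting only immediately adjacent pairs is an assumption of this example, as are the learner ids and sequences.

```python
import math
from collections import Counter


def transition_counts(sequence):
    """Count transitions (a -> b) between consecutive solved problems."""
    return Counter(zip(sequence, sequence[1:]))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse transition-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(target_seq, peer_seqs, k):
    """Return the k peers most similar to the target as (learner, score) pairs."""
    target = transition_counts(target_seq)
    scored = [
        (cosine_similarity(target, transition_counts(seq)), learner)
        for learner, seq in peer_seqs.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(learner, score) for score, learner in scored[:k]]


# Hypothetical sequences of solved problems.
target_seq = ["P1", "P2", "P5", "P20"]
peer_seqs = {
    "l2": ["P1", "P2", "P5", "P32"],  # shares two of three transitions
    "l3": ["P5", "P2", "P1"],         # same problems, opposite order
    "l4": ["P1", "P2", "P5", "P20"],  # identical sequence
}
neighbors = top_k_neighbors(target_seq, peer_seqs, k=2)
```

Learner l3 solved some of the same problems as the target but in the reverse order, so no transitions match and the similarity is zero. This is the property that distinguishes a sequence-based measure from one based only on the set of solved problems.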

We built the recommendation system on memory-based CF, which calculates a score for each unsolved problem of the target learner from the implicit ratings of similar learners. We construct a p × q learner-problem interaction matrix M, where p is the number of similar learners, including the target learner, and q is the total number of problems in the current and next skill levels derived from the skill level navigation patterns. To construct the interaction matrix M, we use the functions f(i, j) and d(i, j), which calculate an implicit rating from how learner i solved problem j.

The functions f(i, j) and d(i, j) return 0 if learner i has not attempted problem j. In all other cases, f(i, j) and d(i, j) return the following:

where

n is the number of attempts taken by learner i to solve problem j.

α is the average number of attempts taken by a learner i to solve problems.

β is the average number of attempts taken to solve a problem j by learners.

f(i, j) is a function that calculates a rating based on how difficult the problem is when compared to other problems solved by user i.

d(i, j) is a function that calculates a rating based on how the user is skilled to solve a problem when compared to other users who solved the problem j.

We calculate a recommendation score for each unsolved problem j of the target learner.

Finally, the top N problems with the highest scores are selected for recommendation to the target learner u.
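As an illustration of memory-based scoring, the sketch below uses a similarity-weighted average of the neighbors' implicit ratings. This is a generic CF formulation chosen for the example and may differ from the scoring formula used in this work. The ratings, similarity values, and the function name `recommend_top_n` are likewise hypothetical.

```python
def recommend_top_n(ratings, similarities, target, n):
    """Score the target's unsolved problems and return the top n.

    `ratings` maps learner -> {problem: implicit rating (1 to 10)};
    a missing problem means the learner has not attempted it.
    `similarities` maps each neighbor to its similarity with the target.
    """
    solved = set(ratings.get(target, {}))
    weighted_sum = {}
    weight_total = {}
    for learner, sim in similarities.items():
        for problem, rating in ratings.get(learner, {}).items():
            if problem in solved:
                continue  # only unsolved problems are candidates
            weighted_sum[problem] = weighted_sum.get(problem, 0.0) + sim * rating
            weight_total[problem] = weight_total.get(problem, 0.0) + sim
    scores = {
        problem: weighted_sum[problem] / weight_total[problem]
        for problem in weighted_sum
        if weight_total[problem] > 0
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical implicit ratings and neighbor similarities.
ratings = {
    "u": {"P1": 8},
    "a": {"P1": 7, "P2": 9, "P3": 4},
    "b": {"P2": 6, "P4": 10},
}
similarities = {"a": 0.8, "b": 0.4}
top = recommend_top_n(ratings, similarities, target="u", n=2)
```

A plain weighted average lets a problem rated by a single neighbor reach the top of the list, as P4 does here. A production system would usually require a minimum number of raters or shrink such scores towards a prior.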

In addition to the problems from the CF approach, we recommend new problems in the current and next skill levels of the target learner using a content-based filtering approach. In the content-based approach, we consider attributes of problems such as the topic, difficulty level, author, editor, and contest. Each problem is represented as a vector whose size is equal to the number of non-unique attribute values of all problems. The vector is filled with Boolean values: an entry is one if the corresponding attribute value is present in the problem and zero otherwise. We use a cosine similarity measure to calculate the similarity between two problems. Offline, for each problem, the k most similar new problems are stored along with their similarity values. The content-based filtering algorithm works as follows to recommend new problems in the current and next skill levels to the target learner: the solved problems of the target learner with high implicit ratings (6 to 10) are considered as set P, and a set of candidate problems C is built as the union of the k most similar new problems for each problem

Finally, problems in
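The Boolean attribute vectors and their cosine similarity can be sketched with sets. For Boolean vectors, the cosine similarity equals the number of shared attribute values divided by the square root of the product of the two vectors' counts of ones. The catalog entries, attribute names, and function names below are illustrative assumptions.

```python
import math


def attribute_set(problem):
    """Represent a problem by the set of its (attribute, value) pairs.

    This is equivalent to a Boolean vector with a one for each
    attribute value the problem has.
    """
    return set(problem.items())


def boolean_cosine(a, b):
    """Cosine similarity between two Boolean vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def k_most_similar_new(problem_id, catalog, new_ids, k):
    """Return the k new problems most similar to `problem_id`."""
    base = attribute_set(catalog[problem_id])
    scored = [
        (boolean_cosine(base, attribute_set(catalog[new_id])), new_id)
        for new_id in new_ids
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(new_id, score) for score, new_id in scored[:k]]


# Hypothetical catalog: P1 is solved, N1 to N3 are new problems.
catalog = {
    "P1": {"topic": "branching", "difficulty": "easy", "author": "A"},
    "N1": {"topic": "branching", "difficulty": "easy", "author": "B"},
    "N2": {"topic": "arrays", "difficulty": "medium", "author": "A"},
    "N3": {"topic": "arrays", "difficulty": "hard", "author": "B"},
}
similar = k_most_similar_new("P1", catalog, ["N1", "N2", "N3"], k=2)
```

Because the similarity depends only on problem attributes and not on interaction data, new problems with no submissions can still be ranked.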

To evaluate the proposed method, an experiment was conducted using a real-world programming online judge (POJ) dataset consisting of 73 learners, 3925 problems, and 36235 submitted solutions. To implement skill level navigation pattern mining, find neighbors based on sequence similarity, and derive implicit ratings from the number of attempts as discussed in Section 3, we need the complete history of submissions, including all the attempts of the learners. The dataset was crawled from a real-world POJ and consists of the complete submission history of its 73 users, including all failed and successful submissions. The details of each program submission in the history give insight into learner and problem characteristics. The problems in the dataset are tagged with 22 different topics and 6 different difficulty levels. Hence, the problems in the dataset are scattered across 132 skill levels. The mean number of problems solved per learner is 140, with a standard deviation of 103. We derived an implicit rating for each problem a learner interacted with from the number of attempts taken to solve the problem. This implicit rating (1 to 10) shows how suitable the problem is to the learner’s current context. Problems that received a rating of 10 are the most suitable to the learner in the current context, while those that received a rating of 1 are highly unsuitable. The submission history of the 73 users was formatted as sequences suitable for input to the SPM algorithm. The offline module, skill level navigation pattern mining, is performed using the SPM Framework (SPMF: a Java open-source data mining library) with a maximum pattern length of 5. The navigation patterns over the 132 skill levels are derived as output. The online modules for the experiment were implemented using Python data science and machine learning libraries.
The length l of the latest window w is set to 6, and the current skill level of the target user is identified based on the last 6 problems. The k value is set from 6 to 16 in the neighbor identification process. Two common recommender system approaches, CF and SPM, were also evaluated using the same dataset.

Collaborative Filtering Approach: The CF-based approach uses the implicit rating matrix derived from the learner-problem interaction.

Sequential Pattern Mining Approach: The SPM-based approach uses the sequence of solved problems derived from the submission history of users.

Of the specified dataset, 80% is taken as the training set and the remaining 20% is taken as the test set to assess the performance of the recommendation approaches. The experiments showed that the proposed approach to sequence-enabled practice problem recommendation in POJ outperforms the CF-based and SPM-based recommendation approaches.

To evaluate the accuracy of the proposed approach, we use the F1 metric, which is derived from precision and recall. Precision is the fraction of recommended problems that are relevant, and recall is the fraction of relevant problems that are recommended. A recommended problem is considered relevant if the learner has solved it and the extracted rating is six or above. It is considered irrelevant if the extracted rating is below six, whether or not the learner has solved it.

where

ut is the number of learners in test set

TS(P_{i}) is the number of relevant problems solved by learner i in test set

R(P_{i}) is the number of recommended problems to learner i

Precision and recall are calculated based on experimental results using
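A minimal sketch of the evaluation is given below. It averages precision and recall over the learners in the test set and combines them with the standard F1 formula, F1 = 2 · precision · recall / (precision + recall). Averaging per learner is an assumption of this example, and the recommendation lists and relevant sets are made up.

```python
def evaluate(recommended, relevant):
    """Return (precision, recall, f1) averaged over test-set learners.

    `recommended` maps learner -> list of recommended problems.
    `relevant` maps learner -> set of relevant problems in the test set.
    """
    precisions, recalls = [], []
    for learner, recs in recommended.items():
        rel = relevant.get(learner, set())
        hits = len(set(recs) & rel)
        precisions.append(hits / len(recs) if recs else 0.0)
        recalls.append(hits / len(rel) if rel else 0.0)
    if not precisions:
        return 0.0, 0.0, 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical recommendations and relevant problems for two learners.
recommended = {"u1": ["P1", "P2", "P3", "P4"], "u2": ["P5", "P6"]}
relevant = {"u1": {"P1", "P2", "P9"}, "u2": {"P5"}}
precision, recall, f1 = evaluate(recommended, relevant)
```

For this toy input, both learners have a precision of 0.5, the recalls are 2/3 and 1, and the resulting F1 is 0.625.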

We evaluated the proposed and existing approaches using the F1 measure derived from precision and recall. We examined the accuracy of the proposed method in the following ways:

We tested the proposed approach with various numbers of nearest neighbors k and various Top N numbers of recommendations.

We compared our proposed approach and the previous approaches for a given number of nearest neighbors k and various Top N numbers of recommendations.

We tested how the proposed approach recommends new problems to address the cold-start problem.

Initially, we tested the proposed approach by varying the number of nearest neighbors k in the range of seven to twelve. The same test is conducted with six different Top N values (6, 8, 10, 12, 14, and 16).

Size of N / Nearest neighbor (k) | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|
6 | 0.48 | 0.50 | 0.52 | 0.53 | 0.53 | 0.54 |
8 | 0.47 | 0.48 | 0.48 | 0.51 | 0.51 | 0.51 |
10 | 0.43 | 0.45 | 0.45 | 0.46 | 0.47 | 0.47 |
12 | 0.42 | 0.43 | 0.44 | 0.46 | 0.46 | 0.46 |
14 | 0.39 | 0.42 | 0.42 | 0.44 | 0.44 | 0.44 |
16 | 0.34 | 0.34 | 0.37 | 0.38 | 0.39 | 0.39 |

Next, we compared the proposed approach with the CF approach and the SPM approach using the real-world dataset. The experiments are conducted by varying the number of recommendations from 6 to 16.

Approaches | SPM based recommendation | CF based recommendation | Proposed approach | ||||||
---|---|---|---|---|---|---|---|---|---|

Size of N | P | r | F1 | P | r | F1 | p | r | F1 |

6 | 0.4 | 0.32 | 0.5 | 0.56 | |||||

8 | 0.36 | 0.33 | 0.47 | 0.56 | |||||

10 | 0.30 | 0.37 | 0.39 | 0.57 | |||||

12 | 0.25 | 0.40 | 0.38 | 0.58 | |||||

14 | 0.23 | 0.41 | 0.35 | 0.58 | |||||

16 | 0.18 | 0.41 | 0.6 |

Finally, our evaluation focused on how the proposed approach recommends new problems to address the cold-start problem. Both of the familiar approaches, CF and SPM, need interaction data on items; therefore, neither can handle new items (problems). To evaluate the proposed approach, new problems at different skill levels are examined using a test dataset. In this evaluation, we consider only the recommendation of new problems to existing learners, since new learners are recommended the top-accuracy problems in the root skill levels.

No. of new problems | Total number of existing learners | No. of skill levels | Total number of problems recommended | Average number of new problems recommended | F1-Measure for new problems |
---|---|---|---|---|---|
41 | 20 | 33 | 65 | 3.25 | 0.39 |

This research work analyzed the impact of considering efficient learning paths when recommending practice problems in a POJ. The learning path is derived from skill level navigation patterns obtained through sequential pattern mining. These sequential patterns show the future skill levels where the target learner can solve problems according to the current context. The CF approach works only on these skill levels to recommend problems. Additionally, we have implemented a sequence-based nearest neighbor method, which is well suited to identifying similar learners in the POJ scenario. The research work also addresses the cold-start and data sparsity problems. To utilize the advantages of each approach, we integrate the content-based, CF, and SPM approaches. The proposed system outperforms the other traditional approaches in terms of accuracy. In future, we will work on heterogeneous learning object recommendations for the POJ platform, so that the system can recommend both personalized learning materials and personalized assignment problems and learners can learn and practice programming at the same time.