Machine learning (ML) is advancing rapidly, and its scope is no longer limited to computer science: its advancements are evident in the field of healthcare. Disease diagnosis, personalized medicine, and recommendation systems (RS) are among the promising applications that use ML at a higher level. A recommendation system helps in efficient decision-making and suggests personalized recommendations accordingly. Today people share their experiences through reviews, and hence designing a recommendation system based on users’ sentiments is a challenge. Recommendation systems have gained significant attention in different fields, but in healthcare little has been done from the perspective of drug, disease, and medical recommendations. This study focuses on designing a recommendation system based on the fusion of sentiment analysis and gradient boosting. The polarity of the sentiments is analyzed through user reviews, and the processed data is fed into the Extreme Gradient Boosting (XGBOOST) framework to generate the drug recommendation. To establish the applicability of the concept, a comparative study is performed between the proposed approach and existing approaches.

With each passing day, technology is transforming in astonishing ways; the ability to predict future outcomes based on past experiences has completely changed how we view our surroundings. We are talking about two of the most prominent emerging technologies of the era: Machine Learning and Deep Learning. These technologies have revolutionized the crucial field of healthcare by delivering excellent results. They are proficient at performing tasks otherwise carried out by humans in a cost-effective and time-saving manner, and hence have become a part of our health ecosystem. In the current scenario, the Internet of Medical Things and Artificial Intelligence are already helping individuals through virtual assistance, monitoring, and detection of probable life-threatening diseases at earlier stages. Current disease diagnosis and treatment are entirely based on the doctor’s knowledge and experience. There is an enormous amount of data available on patients, diseases, and the corresponding treatments, but there are no systems capable of analyzing this data as well as identifying patterns and associations between diseases and effective treatments. Thus, there is a need to provide doctors with systems that can make predictions and share medical knowledge at earlier stages. The vast amount of clinical data has given rise to the need for recommendation systems. A recommendation system in healthcare is designed to strike a balance between the continuous generation of data and real-time response for treatment. The integrated development of information technology with medical science has generated several research studies in different fields of healthcare, including disease diagnosis, disease treatment, and drug discovery. The arrival of Healthcare 4.0 has already accelerated the pace of digitization in healthcare. Industry 4.0 is automating the service and production industries.
The prominent technologies of Industry 4.0 such as Big data, Internet of Things (IoT), Cloud, and Fog Computing have transfigured the healthcare industry and directed its entire infrastructure towards Healthcare 4.0 [

Analysis of user reviews on the particular drug through identification of sentiment polarity.

Feeding the processed reviews into the base classifiers and the proposed method.

Recommendation of top drugs based on the integration of word2vec and XGBOOST.

Comparison of the proposed methodology with existing recommendation systems, presented as a case study.

During the first phase, several base classifiers are compared, and the performance measures of accuracy, precision, recall, and F1-score are observed. The proposed methodology, the Extreme Gradient Boosting Recommendation System (XGBRS), suggested top drugs for a given condition based on the sentiment of the user reviews, with an accuracy of 0.96, precision of 0.92 and 0.89, recall of 0.91 and 0.89, and F1-score of 0.94 and 0.89 for the positive and negative classes, respectively. In addition, each classifier is evaluated on two more metrics, mean average precision and coverage, for clarity of the concept. To establish the applicability of the method, a comparative analysis is drawn against existing recommendation systems that are based on machine learning frameworks. The flow of the research paper is depicted in

The advent of Healthcare 4.0 has opened up numerous ways in which recommendation systems can be designed to assist healthcare. A recommendation system makes predictions based on previously available data. The origin of the recommender system can be traced back to the 1990s, but current advancements in Artificial Intelligence, Big Data, Cloud and Fog Computing, IoT, and NLP have made it possible to design systems with high efficiency and accuracy for personalized healthcare. There are four basic categories of recommender systems: content-based, context-based, collaborative filtering, and hybrid recommender systems [

A content-based recommendation system using a neuro-fuzzy approach is designed to recommend processed items for the users. The researchers developed an Artificial Intelligence-based framework and a web application to give a glimpse of the user interface. This interface is designed to provide a simulation of real users and the proposed work is compared with a deep learning-based method [

One innovative study designed a recommender system that helps users quit smoking through motivational messages. The research uses a mobile application that transmits messages through a recommender system that considers users’ opinions and users’ profiles [

The evolution of computer-based technologies has significantly increased the volume of user-generated data in the form of text across different platforms. This large volume of data is still not completely processed by NLP techniques. In medical science, textual information represents the interaction between the patient and medical professionals, along with the treatments suggested by the professionals. It also provides insights into people’s emotions and reactions to these real-time situations. In this study, we describe a method based on the analysis of the sentiments of drug reviews. The proposed architecture is depicted in

In the algorithm, the similarity score is used to calculate the gain, a statistical measure for finding the best split, and the new prediction is calculated using [

To calculate the similarity score, XGBOOST multiplies the equation by −1, which flips the parabola over the horizontal axis.
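The role of the similarity score and the gain can be illustrated with a minimal sketch. It assumes the commonly used form for binary classification with log loss, where a leaf's similarity is the squared sum of residuals divided by the sum of p(1 − p) plus a regularization term λ, and it omits the constant factor 1/2 and the pruning penalty γ. The function names, the toy labels, and the value of `lam` are illustrative assumptions and are not taken from the study.

```python
def similarity(residuals, probs, lam=1.0):
    """Leaf similarity: (sum of residuals)^2 / (sum of p*(1-p) + lambda)."""
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1.0 - p) for p in probs) + lam
    return numerator / denominator


def leaf_output(residuals, probs, lam=1.0):
    """Output value of a leaf: sum of residuals / (sum of p*(1-p) + lambda)."""
    return sum(residuals) / (sum(p * (1.0 - p) for p in probs) + lam)


def split_gain(residuals, probs, split_index, lam=1.0):
    """Gain of splitting a node into [0, split_index) and [split_index, n)."""
    left = similarity(residuals[:split_index], probs[:split_index], lam)
    right = similarity(residuals[split_index:], probs[split_index:], lam)
    root = similarity(residuals, probs, lam)
    return left + right - root


# Toy data (assumed): two negative and two positive reviews, initial p = 0.5.
labels = [0, 0, 1, 1]
probs = [0.5] * len(labels)
residuals = [y - p for y, p in zip(labels, probs)]

gain = split_gain(residuals, probs, split_index=2, lam=0.0)
left_value = leaf_output(residuals[:2], probs[:2], lam=0.0)
print(gain, left_value)
```

A split that separates the two classes cleanly yields a large gain, because the residuals in each child no longer cancel each other out as they do in the parent node.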

Word2Vec is one of the promising techniques of natural language processing, developed by Google. The technique uses shallow neural networks to create embeddings and employs neural networks in both of its methods, namely Skip-gram and Continuous Bag of Words (CBOW). The training and testing split for the dataset is kept at a ratio of 75% to 25%. Word2Vec uses cosine similarity, which means that the cosine of the angle between two similar vectors should be close to one. The skip-gram architecture is used in the study, as it predicts the source context for a given center word. It calculates the probability of each word appearing without considering its distance from the center word. It describes two-dimensional word vectors (x,1). The vectorized softmax function is used for modeling the discrete probability distribution. Softmax is a soft version of the argmax function; the values are scaled so that the probabilities sum to 1.0. The logic of stochastic gradient descent is given by Algorithm 1, and Algorithm 2 describes the XGBOOST classifier.

$$\text{Probability}(\text{value}) = \frac{\exp(\text{value})}{\sum_{v \in \text{list}} \exp(v)}$$

For example, for the first value of the list [1, 3, 2]:

$$\text{Probability} = \frac{\exp(1)}{\exp(1) + \exp(3) + \exp(2)}$$
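The softmax computation above can be reproduced with a few lines of Python. This is a minimal sketch for illustration; the function name `softmax` and the subtraction of the maximum (a common numerical-stability step that does not change the result) are assumptions and not part of the study's implementation.

```python
import math


def softmax(values):
    """Scale a list of scores into probabilities that sum to 1.0."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1, 3, 2])
print(probs)                     # one probability per input value
print(probs.index(max(probs)))   # position that argmax would select
```

The largest input receives the largest probability, which is why softmax is often described as a smooth counterpart of argmax.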

A publicly available dataset from the UCI Repository is considered for the study. The dataset contains multiple records of drug reviews. Drug reviews are important for several reasons: they let people learn about the effects of a drug from other individuals who are using it, and they give insights into both the side effects and the positive results of a particular drug. The dataset contains 161297 records in total. The data provides reviews given by users on a particular drug for a certain condition, along with a 10-star rating reflecting the overall satisfaction of the user. The Python library ‘pandas’ is used for data cleaning, ‘numpy’ for mathematical functions, ‘matplotlib’ for data visualization, and ‘scikit-learn’ for sentiment analysis. After checking for duplicate values, missing values, and outliers, the data is preprocessed to reduce its dimensionality. First, tokenization is performed to break the textual information into a meaningful list of words and phrases. Second, words that occur in multiple forms in the reviews, such as “happen” and “happening”, are converted into a standard form. Moreover, for efficient analysis, conditions with only one drug are removed from the dataset.
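The preprocessing steps described above can be sketched as follows. This is an illustrative example that uses only the Python standard library rather than the pandas-based pipeline of the study; the toy records, the field names (`drug`, `condition`, `review`, `rating`), and the crude suffix-stripping rule are all assumptions made for the sketch, and a real pipeline would use a proper stemmer or lemmatizer.

```python
import re
from collections import defaultdict

# Hypothetical toy records; the real dataset has 161297 rows.
records = [
    {"drug": "DrugA", "condition": "Depression", "review": "It kept happening daily", "rating": 4},
    {"drug": "DrugB", "condition": "Depression", "review": "Side effects happen rarely", "rating": 9},
    {"drug": "DrugB", "condition": "Depression", "review": "Side effects happen rarely", "rating": 9},
    {"drug": "DrugC", "condition": "Acne", "review": "Worked well", "rating": 8},
    {"drug": "DrugD", "condition": None, "review": "No condition given", "rating": 5},
]


def tokenize(text):
    """Lower-case the review and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Illustrative suffix stripping only, e.g. 'happening' -> 'happen'."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(rows):
    # Drop duplicates and rows with missing values.
    seen, unique = set(), []
    for row in rows:
        key = (row["drug"], row["condition"], row["review"])
        if None in key or key in seen:
            continue
        seen.add(key)
        unique.append(row)
    # Count distinct drugs per condition and drop single-drug conditions.
    drugs_per_condition = defaultdict(set)
    for row in unique:
        drugs_per_condition[row["condition"]].add(row["drug"])
    return [
        {**row, "tokens": [crude_stem(t) for t in tokenize(row["review"])]}
        for row in unique
        if len(drugs_per_condition[row["condition"]]) > 1
    ]


cleaned = preprocess(records)
for row in cleaned:
    print(row["drug"], row["tokens"])
```

In this toy example the duplicate review, the record without a condition, and the condition with a single drug are removed, and “happening” and “happen” map to the same token.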

Sentiment analysis of data on the experience of a drug is a challenging research problem that is gaining attention. Sentiment analysis automatically identifies terms of significant importance from a large volume of data. It is a subfield of data mining and is built up of multiple processes. In the study, the data is converted into a feature vector representation using Word2Vec, followed by classification through machine learning classifiers. Since machine learning algorithms cannot be applied directly to textual data, the text must first be converted into a numerical format. Thus, to perform the classification task, we have applied vectorization of words using the Word2Vec technique. Each position is defined as p and center word at that position as

Here X1 and X2 are the input variables used in determining the slope of the line, and A0 is the intercept, which is found by the learning algorithm. This equation is responsible for the classification of new data points. SVM supports different kernel functions; in the study we have employed two of them: the linear kernel and the radial kernel. The linear kernel describes a measure of similarity between a new data point and the support vectors. A linear kernel is used in the study because it provides more accurate classifiers. The second kernel used is the radial kernel, which is based on the gamma parameter. The parameter is specified by the learning function given by
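For reference, the two kernels named above can be sketched in their textbook form: the linear kernel as a dot product and the radial (RBF) kernel as exp(−γ‖x − y‖²). The function names and the default value of `gamma` below are assumptions for illustration and do not reflect the parameter values used in the study.

```python
import math


def linear_kernel(x, y):
    """Dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(linear_kernel([1, 2], [3, 4]))
print(rbf_kernel([0, 0], [1, 1], gamma=0.5))
```

A larger gamma makes the radial kernel decay faster with distance, so each support vector influences a smaller neighborhood.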

The following classifiers are incorporated into the study: SVM (with both RBF and linear kernels), Multinomial Naive Bayes, Random Forest, K-Nearest Neighbor, Linear Regression, and XGBOOST. Because the dataset contains a large number of records, the primary objective of reduced time has also been taken into consideration. For the analysis of the predicted sentiments, four main metrics have been included, namely precision, recall, accuracy, and F1-score. Let TP stand for true positive, FP for false positive, TN for true negative, and FN for false negative. The equations for the performance measures are given by

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
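The standard definitions of these four measures in terms of TP, FP, TN, and FN can be sketched in Python as follows. The function name and the confusion-matrix counts in the usage line are made-up illustrative values, not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for the positive class.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Precision and recall are computed per class, which is why the results report separate values for the positive and negative sentiment classes, while accuracy is a single value per classifier.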

Another important metric that has been taken into consideration is mean average precision, a performance measure for recommendation systems. In the study, k recommendations are generated per iteration and labeled as follows: if the system generates 5 recommendations in one iteration, of which only three are relevant, then the relevant recommendations are denoted by (1) and the others by (0). Hence the order is, for example, (10101). Thus, recommendations are generated with each classifier, and the mean average precision is calculated as given by
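The relevance pattern (10101) from the example can be turned into an average precision value with a short sketch. It assumes one common definition, in which precision is evaluated at each relevant position and averaged over the number of relevant items; other variants divide by k instead, and the source does not state which one the study uses. The function names are illustrative.

```python
def average_precision(relevance):
    """Average of precision@rank taken at each relevant position."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_lists):
    """Mean of the average precision over several recommendation lists."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


ap = average_precision([1, 0, 1, 0, 1])  # pattern (10101) from the text
map_score = mean_average_precision([[1, 0, 1, 0, 1], [0, 0, 0, 0, 0]])
print(ap, map_score)
```

For the pattern (10101), precision is 1/1 at rank 1, 2/3 at rank 3, and 3/5 at rank 5, so the average precision is the mean of these three values.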

During the study it was noted that, in some cases, accuracy and precision cannot be considered ideal performance measures. Hence, to establish the concept of the proposed approach more rigorously, we have calculated the Matthews Correlation Coefficient (MCC). The measure was originally proposed for chemical structures and, since 2000, has been considered among the standard performance measures in machine learning. The significant difference between the F1-score and MCC is that MCC remains invariant if the positive and negative labels are swapped. It calculates the Pearson product-moment correlation coefficient between the actual values and the predicted values, and the results are shown in
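A minimal sketch of MCC in its standard confusion-matrix form illustrates the invariance property mentioned above. The function name and the counts are hypothetical and are not taken from the study's experiments.

```python
import math


def matthews_corrcoef_from_counts(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts.
mcc = matthews_corrcoef_from_counts(tp=40, fp=10, tn=45, fn=5)

# Swapping the positive and negative labels exchanges TP with TN and FP with FN.
mcc_swapped = matthews_corrcoef_from_counts(tp=45, fp=5, tn=40, fn=10)
print(mcc, mcc_swapped)
```

Both calls print the same value, whereas the F1-score computed from the same two sets of counts would differ.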

The best-predicted measures are identified and combined to produce the prediction. These results are then multiplied to generate an overall score for a drug related to a particular condition. The score is generated using a normalized count, i.e., the higher the count, the better the medicine. The top conditions identified during the study are;

Algorithm | Sentiment | Precision | Recall | F1 score | Accuracy |
---|---|---|---|---|---|
Support Vector Machine (L) | Positive | 0.85 | 0.57 | 0.45 | 0.80 |
Support Vector Machine (L) | Negative | 0.71 | 0.69 | 0.60 | |
Support Vector Machine (RBF) | Positive | 0.75 | 0.89 | 0.71 | 0.87 |
Support Vector Machine (RBF) | Negative | 0.65 | 0.67 | 0.55 | |
Multinomial Naive Bayes | Positive | 0.78 | 0.80 | 0.82 | 0.79 |
Multinomial Naive Bayes | Negative | 0.60 | 0.71 | 0.71 | |
Random Forest | Positive | 0.77 | 0.52 | 0.64 | 0.87 |
Random Forest | Negative | 0.62 | 0.79 | 0.72 | |
K-Nearest Neighbor | Positive | 0.56 | 0.63 | 0.57 | 0.69 |
K-Nearest Neighbor | Negative | 0.69 | 0.50 | 0.59 | |
Linear Regression | Positive | 0.54 | 0.65 | 0.68 | 0.71 |
Linear Regression | Negative | 0.47 | 0.59 | 0.51 | |

Classifier | Mean average precision | Matthews correlation coefficient |
---|---|---|
Support Vector Machine (L) | 0.72 | 0.82 |
Support Vector Machine (RBF) | 0.83 | 0.82 |
Multinomial Naive Bayes | 0.69 | 0.79 |
Random Forest | 0.88 | 0.90 |
K-Nearest Neighbor | 0.54 | 0.78 |
Linear Regression | 0.71 | 0.78 |
XGBOOST | 0.92 | 0.90 |

Condition | Recommended drug |
---|---|
Birth control | |
Birth control | Norethindrone |
Birth control | Levonorgestrel |
Depression | Bupropion |
Depression | Sertraline |
Depression | Venlafaxine |

In this section, we explore some of the studies that apply a machine learning framework to provide recommendations. These case studies are compared with the proposed method to establish its applicability in recommending drugs for a particular condition after considering the polarity of the sentiments identified in the reviews. An extensive study was carried out, and only those studies that employ machine learning techniques for recommendation are considered for comparison. For each of the studies, the input is the dataset that is fed to the recommendation system, and the system performs further evaluation based on the input provided.

Recognizing a drug that leaves no side effects is a challenging task in healthcare; thus, recommendation systems are becoming an active area of research that is improving with time. In this study, we have enhanced drug recommendations using the XGBOOST technique based on sentiment analysis of the collected reviews. To the best of the authors’ knowledge and survey, XGBOOST has delivered significant results compared to other existing techniques for recommending drugs. Moreover, we have also evaluated the proposed method on the performance measures of mean average precision and coverage for better clarity. These results are achieved through the tuning of the parameters of the algorithm and its regularization. The system recommends the top drug for a certain condition based on the sentiment polarity of the reviews given by patients. The system will be enhanced further to predict drugs in a real-time environment. Future objectives also include reducing time complexity and integrating a deep learning framework into the proposed XGBOOST classification technique. In addition, the focus will be on extending the system to multi-class analysis of diseases and on suggesting the required diagnosis and drugs based on the severity of the disease. Thus, the proposed system will evolve with time to make more intelligent decisions and could later be used for the recommendation of personalized medication.