The term sentiment analysis refers to classifying the sentiment of reviews posted by users on a social network. Sentiment classification accuracy is evaluated using various selection methods, especially those that deal with algorithm selection. In this work, every sentiment received through user expressions is ranked in order to categorise it as informative or non-informative. To do so, the work focuses on the Query Expansion Ranking (QER) algorithm, which takes user text as input, processes it for sentiment analysis, and finally labels the result as informative or non-informative. The challenge is to convert non-informative content into informative content using classifiers such as multinomial naïve Bayes and entropy modelling, along with traditional sentiment analysis algorithms such as the Support Vector Machine (SVM) and decision trees. The work also applies simulated annealing along with QER to classify data based on sentiment analysis. As the input arrives at high velocity, the work also addresses big data concepts for information retrieval and processing. The result comparison shows that the QER algorithm proves versatile when compared with the results of SVM. This work uses Twitter user comments for evaluating sentiment analysis.

Mind reading is the greatest challenge in sentiment analysis. Generally, the output of such reading is the information that a user actually wants. Rather than asking people what they feel, inferring it from what they write is an interesting task [

Sentiment analysis is a type of natural language processing in which review comments are categorised as either positive or negative. Generally, all observations in sentiment analysis are labelled in these two categories, with a third category being neutral [

Before the data is subjected to processing, irrelevant information is removed using traditional statistical methods such as the chi-square test. Such methods improve data efficacy. Supervised learning methods such as SVM, naïve Bayes, and decision trees support decision making on reviews.

Text-based classification is the key task to be addressed in sentiment analysis. Terms such as “Great”, “Super”, and “Awesome” are key terms used to describe positivity in review comments, and such comments assist positive review analysis based on what users post. When a term like “Great” is used, the remaining portion of the comment usually justifies the same expression. For example, in “The Mobile is Great”, the keyword “Great” alone is enough to signal a positive sentiment.
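As a rough illustration of this keyword-driven idea, a minimal classifier can flag a comment by its cue terms. The cue lists below are hypothetical samples for illustration, not the lexicon used in this work:

```python
# Minimal keyword-based polarity check; the cue lists are
# illustrative samples, not the lexicon used in this work.
POSITIVE_CUES = {"great", "super", "awesome", "good"}
NEGATIVE_CUES = {"bad", "poor", "worst"}

def keyword_polarity(comment: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from cue words."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    if words & POSITIVE_CUES:
        return "positive"
    if words & NEGATIVE_CUES:
        return "negative"
    return "neutral"

print(keyword_polarity("The Mobile is Great"))  # -> positive
print(keyword_polarity("Delivered on Monday"))  # -> neutral
```

A comment with no cue words at all falls into the neutral bucket, which is exactly the case this work later targets with QER.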

When no such terms are used, the review results in negative feedback; neutral comments arise when there is no comment or when there are no tag hints like “Good” or “Bad” to justify the review. In this work, reviews are evaluated using the QER algorithm, which reduces the number of factors considered for evaluating sentiments. QER classifies each review as informative or non-informative using a sentiment analysis algorithm. The work also applies the traditional chi-square statistical test to classify reviews into the same two categories, informative or non-informative. Reviews found to be non-informative are further subjected to QER. The results are compared with SVM for feature selection on different review-comment data sets. The objective of this paper is to reduce the overall data size along with the processing time, since time consumption is directly proportional to data volume. Such evaluation preserves accuracy in sentiment analysis as data volumes increase.
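The chi-square filtering step can be sketched as follows. This is the generic 2×2 chi-square statistic for term/class association, not the exact preprocessing code of this work:

```python
def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: docs containing the term with the class label
    n10: docs containing the term without the label
    n01: docs lacking the term with the label
    n00: docs lacking the term without the label
    """
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den

# A term concentrated in one class scores high and is kept;
# a term spread evenly scores 0 and can be dropped as irrelevant.
print(chi_square(40, 10, 10, 40))  # -> 36.0
print(chi_square(25, 25, 25, 25))  # -> 0.0
```

Terms whose score falls below a chosen significance threshold are treated as irrelevant and removed before classification.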

The rest of this paper is organised as follows. The survey reviews earlier works addressing the same problem from different perspectives. The algorithm section describes the different classification algorithms used for sentiment analysis. The existing and proposed methods section compares the present work with earlier algorithm versions. Results and discussion presents the results achieved with fixed and variable data sets, and finally, conclusion and future work covers the extensions to be made from the present state.

Sentiment Analysis (SA) plays a vital role in analysing sentiments using natural language processing. It also draws on artificial intelligence along with machine learning to automate the handling of sentiment analysis. When a comment is taken as a review, it is classified as positive or negative. Positive reviews need no further attention, whereas for negative reviews effort has to be taken to fine-tune the process based on the review comments.

Apart from negative comments, there is another zone to be focused on: comments that are neither positive nor negative. Such cases need additional attention in sentiment analysis to predict how to convert a neither-positive-nor-negative case into a positive one [

Utkin et al. [

Liu et al. [

Lakshmanaprabu et al. [

The major difference between naïve Bayes and multinomial naïve Bayes is that the former uses a simple independence assumption, whereas the latter models features with multinomial distributions.

Naïve Bayes treats every feature as conditionally independent given the class, rather than modelling joint distributions, and this conditional independence is applied across all classes. Probabilities are observed over the features f1 to fn. With the class denoted c, naïve Bayes makes the following assumption.
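In the standard notation, with features $f_1,\dots,f_n$ and class $c$, this conditional-independence assumption reads:

```latex
P(f_1, f_2, \dots, f_n \mid c) \;=\; \prod_{i=1}^{n} P(f_i \mid c)
```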

In naïve Bayes classification, the probability of a class given the features is defined by Bayes' rule.

The above function omits a few terms owing to its limitations; it is applied over the entire text, including word counts.

For any class Ck in the overall chain of functions, with x an individual variable and X the evidence vector over those functions, the posterior probability follows from Bayes' theorem; substituting the above probability class functions yields the naïve Bayes classification rule.
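Reconstructed in the standard notation, Bayes' theorem for class $C_k$ and the substitution of the independence assumption (with $P(x)$ constant across classes) give:

```latex
P(C_k \mid x) \;=\; \frac{P(C_k)\, P(x \mid C_k)}{P(x)}
\qquad\Longrightarrow\qquad
\hat{c} \;=\; \operatorname*{arg\,max}_{k}\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```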

For word counts, the multinomial distribution is used, which generalises the binomial distribution to more than two outcomes.

To observe the overall count, the observed word counts in a document are modelled by the multinomial likelihood.
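In the standard multinomial naïve Bayes form, with $x_i$ the count of word $i$ in the document and $p_{ki}$ the probability of word $i$ under class $C_k$:

```latex
P(x \mid C_k) \;=\; \frac{\left(\sum_{i} x_i\right)!}{\prod_{i} x_i!} \;\prod_{i} p_{ki}^{\,x_i}
```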

Text processing in general covers information retrieval, sentiment analysis, information extraction, machine translation, and question answering. Among these, the decision tree plays a vital role in sentiment analysis. Generally, sentiment analysis extracts features from textual data sets, and these data sets carry a large volume of user input. All such input has many dimensions, whether sending a comment or receiving feedback, and the biggest challenge in these approaches is data segregation.

The decision tree decides among the factors positive, negative, and neutral. The value for positive is set to 1, for negative to 0, and for neutral, being neither 1 nor 0, to −1. Consider the corpus (collection of texts) represented as C and the detection factors represented as D. The analysis is made within C for every D in the list. Observing the factors of D within C represents the overall text search, which leads to the Cartesian product D × C for every D of C.

Suppose the list has two entries representing the overall comments made by users for a product:

D1: The product is good

D2: Superb

Both cases describe the quality of the product, but one gives a clear picture in terms of the comment, whereas the other simply expresses approval. Combining these two factors to evaluate the overall comment on the product, the product corpus is described by the list

C = {PRODUCT, SPECIFICATION, PERFORMANCE}

Then D × C is represented by the following table.

| | Product | Specification | Performance |
| --- | --- | --- | --- |
| D1 | 1 | 1 | 0 |
| D2 | 1 | 0 | 1 |
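The D × C incidence table can be built with a short sketch. The cue mapping below is a hypothetical stand-in for the work's detection factors (for instance, D2's implicit reference to the product is not captured by this simple word check):

```python
# Hypothetical cue mapping from comment words to corpus categories;
# the actual detection rules of the work are not specified here.
CUES = {
    "product": "PRODUCT",
    "good": "SPECIFICATION",
    "superb": "PERFORMANCE",
}
CORPUS = ["PRODUCT", "SPECIFICATION", "PERFORMANCE"]

def incidence_row(comment: str) -> list[int]:
    """Mark 1 for every corpus category hit by a cue word."""
    hits = {CUES[w] for w in comment.lower().split() if w in CUES}
    return [1 if cat in hits else 0 for cat in CORPUS]

docs = {"D1": "The product is good", "D2": "Superb"}
for name, text in docs.items():
    print(name, incidence_row(text))
```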

DTF = the number of times a word occurs in the corpus

If a corpus consists of 200 words and the overall count of a word occurring in the document is 30, then the resulting term frequency is 30/200 = 0.15.

To obtain the reverse factor of term frequency (the inverse document frequency), assume the identified word accounts for 10,000 hits out of 1 lakh (100,000) words in the overall document; the inverse frequency is then calculated from the ratio of these counts.
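Putting the two worked figures together, and assuming the common base-10 logarithm convention for the inverse frequency (log10(100000/10000) = 1), a minimal TF-IDF sketch:

```python
from math import log10

def tf(term_count: int, total_terms: int) -> float:
    """Term frequency: occurrences of the word over corpus size."""
    return term_count / total_terms

def idf(total: int, containing: int) -> float:
    """Inverse frequency; base-10 log is a common convention."""
    return log10(total / containing)

tf_score = tf(30, 200)            # 0.15, as in the worked example
idf_score = idf(100_000, 10_000)  # log10(10) = 1.0
print(tf_score, idf_score, tf_score * idf_score)
```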

The next step in this evaluation is to separate the training set from the validation data.

The existing method uses SVM for evaluating customer preference and choice along with the review report. That work deals only with yes/no types and does not focus on neutral comments. It handles word correlation using an association miner; the fitness value is calculated from word frequency, with dendrograms used to measure the frequency distance. Simulated annealing is used along with SA-MSVM to turn the evaluation into a heuristic search. The following are the key observations and updates made in the proposed work based on the existing work. The overall process flow of the proposed work is shown in

Stage 1: Retrieve data and convert it into a prescribed format for data validation

Stage 2: Implement feature selection on the validated data

Stage 3: Preprocess the data to separate out a training data set

Stage 4: Classify the data with yes/no types as one division and neutral comments as the other

Stage 5: Compare the data sets

Stage 6: Implement QER to convert neutral comments into near-positive ones
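The six stages can be expressed as a skeleton pipeline. Every helper here is a toy placeholder standing in for the corresponding stage, not the authors' implementation:

```python
def run_pipeline(raw_comments):
    """Sketch of the six-stage flow; each step is a toy stand-in."""
    # Stage 1: retrieve and normalise into a prescribed format
    records = [c.strip().lower() for c in raw_comments]
    # Stage 2: feature selection (here: naive token features)
    features = [set(r.split()) for r in records]
    # Stage 3: preprocess / split off a training set (first half)
    split = len(features) // 2
    train, test = features[:split], features[split:]
    # Stage 4: divide into yes/no vs neutral comments
    cues = {"good", "bad", "great", "worst"}
    yes_no = [f for f in test if f & cues]
    neutral = [f for f in test if not f & cues]
    # Stage 5: data-set comparison (here: simple size comparison)
    comparison = {"yes_no": len(yes_no), "neutral": len(neutral)}
    # Stage 6: QER would rescore the neutral comments toward
    # positive; here they are only flagged for that step.
    return comparison, neutral

comparison, pending = run_pipeline(
    ["Great phone", "Arrived Monday", "Bad screen", "Okay I guess"]
)
print(comparison, len(pending))
```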

QER feature selection uses feature scores taken from two different tables that were evaluated and stored in earlier evaluations. The idea behind QER is that when the feature score is low, the positive and negative values are high. The process again yields two evaluation outputs: feature scores for positive and for negative. Assuming a low feature score marks a neutral comment, QER enhances this score to make it more positive. The classification process has the greatest impact on the sentiment analysis score, as the input parameters are the yes/no value on one side and the neutral score on the other.

Compared with other feature-selection algorithms such as IG, OCFS, DFD, and CHI2, which are good at classifying features, QER provides feature analysis along with probabilities and gives the summation of two probabilities. The analyses were done with yes/no as one set of probabilities and the neutral score as the other set.

The F-measure in QER is computed from precision and recall as its performance metrics: precision covers the correctly classified set, while recall covers the entire set of relevant classifications.

The F-measure in this analysis deals with the neutral comments and is evaluated using the standard F-measure equation.
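The standard F-measure combining precision P and recall R is F = 2PR / (P + R); as a quick sketch:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced precision and recall give the same F-measure;
# a large gap between them pulls the score down.
print(f_measure(0.5, 0.5))  # -> 0.5
print(f_measure(0.8, 0.2))
```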

The work deals with market basket analysis for evaluating and recommending products based on customer satisfaction. Generally, the analysis is done on a yes/no pattern. Such a pattern evaluates the product strictly on two values, and the result concludes whether the product is recommended or not. The confidence levels are evaluated over 49,122 rules using a scatter plot by setting the confidence level. The result of the above confidence level is represented in the

To focus on the yes/no type for separating key data and interpreting the results for QER, the work implements two key plots for overall observation based on the customer input for product recommendations and evaluations. The two key plots take the rules object and measure as parameters for result interpretation, and the below

| Category | Features | SVM (P) | QER (P) | Features | SVM (R) | QER (R) |
| --- | --- | --- | --- | --- | --- | --- |
| MOVIE | 15432 | 0.8161 | 0.7854 | 5463 | 0.2716 | 0.2534 |
| MOBILE | 48123 | 0.731 | 0.712 | 3829 | 0.078 | 0.0564 |
| TV | 22345 | 0.7708 | 0.6892 | 2837 | 0.12 | 0.092 |
| BOOK | 7854 | 0.7988 | 0.7245 | 1983 | 0.2534 | 0.18 |
| HOME APPLIANCES | 4456 | 0.6582 | 0.6234 | 847 | 0.19 | 0.1409 |

| Category | QER (pos/neg) | F-measure (pos/neg) | Size (pos/neg) | QER (neutral) | F-measure (neutral) | Size (neutral) |
| --- | --- | --- | --- | --- | --- | --- |
| MOVIE | 15432 | 0.8161 | 16000 | 5463 | 0.2716 | 6000 |
| MOBILE | 48123 | 0.731 | 50000 | 3829 | 0.078 | 5000 |
| TV | 22345 | 0.7708 | 25000 | 2837 | 0.12 | 5000 |
| BOOK | 7854 | 0.7988 | 10000 | 1983 | 0.2534 | 2500 |
| HOME APPLIANCES | 4456 | 0.6582 | 5000 | 847 | 0.19 | 2000 |

The work also examines the overall positive and negative comments in parallel with the neutral comments. Based on the observations, the F-measure scores of the neutral comments are very low compared with the probability scores of the positive and negative comments.

The comparison classifies text and provides the best classification of results based on the probability score. The classification separates the text from neutral content, makes associations with similar positive content, and eradicates negative associations. The F-measures take only filled content and skip comments that are empty. Words are compared one to one, F-measures are computed for the positive and neutral comments, and the optimised value is taken.

In this work, the QER algorithm is implemented to convert neutral (non-informative) comments into positive ones using probability factors. The F-measure scores are computed via feature selection by converting training data sets into validation data sets, which are taken as part of the result observations. The QER algorithm interprets the probability factors for simulated annealing in information processing and classification. This work shows a way of converting non-informative data into informative data using sentiment analysis. Such an initiative can be adapted to market analysis to convert missed-out observations into formidable outcomes.