After the outbreak of COVID-19, the global economy entered a deep freeze. This observation is supported by the Volatility Index (VIX), which reflects the market risk expected by investors. In the current study, we predicted the VIX using variables obtained from sentiment analysis of Twitter posts related to the keyword “COVID-19,” with a model that integrates a bidirectional long short-term memory (BiLSTM) network, the autoregressive integrated moving average (ARIMA) model, and the generalized autoregressive conditional heteroskedasticity (GARCH) model. The Linguistic Inquiry and Word Count (LIWC) program and the Valence Aware Dictionary and Sentiment Reasoner (VADER) model were used for sentiment analysis. The results revealed that, during COVID-19, the proposed integrated model, trained on both the Twitter sentiment values and historical VIX values, forecast the VIX better than other existing models in both time-series regression and direction prediction.

To meet the growing need to measure market fluctuations, the Chicago Board Options Exchange (CBOE) developed the Volatility Index (VIX) in 1993 [

Previous research has forecast trends in the VIX, suggesting ways to exploit arbitrage opportunities in VIX options trading and providing useful references for risk management in volatility derivative markets [

In the finance sector, several studies have used sentiment analysis of social media text to predict stock prices [

Moreover, the COVID-19 pandemic has had an unprecedented impact on people globally, leading to a high frequency of social media posts with COVID-19-related keywords that clearly express people's general sentiments. Multiple studies have used these posts to examine the correlation between public sentiment and financial market prices, such as stock or bitcoin prices [

In light of this change in the financial market, we extracted sentiment features from COVID-19-related Twitter posts and used them for VIX forecasting. We collected 1,000 Twitter posts per day between December 1, 2019, and August 5, 2020; collection began in December 2019, when the first case of COVID-19 was reported. After preprocessing, Linguistic Inquiry and Word Count (LIWC) and Valence Aware Dictionary and Sentiment Reasoner (VADER) were employed for sentiment analysis. LIWC has been used in existing research for the sentiment analysis of Twitter data [

In the current study, two analysis tasks were performed: time-series regression prediction and direction prediction. For the time-series prediction, several time-series neural networks, namely the bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent unit (BiGRU), long short-term memory (LSTM), gated recurrent unit (GRU), and attention-BiLSTM, were used as base models in the experiments. These models have been used to forecast financial time-series data [

The contributions of the current study are as follows:

First, this study shows that sentiment reflected in social media texts is an effective feature for predicting a financial volatility index in the early stage of a pandemic.

Second, the proposed model is efficient enough for daily prediction. Although it combines multiple models, including neural networks, it requires little computation for both training and inference and takes less than an hour without a GPU.

Third, accurately forecasting the steep rises and falls that followed the outbreak of the pandemic is challenging. We present a way to improve the prediction of such unexpected patterns: adding, to the neural network, statistical models that capture direction and volatility features from recent trends.

The rest of the paper is organized as follows. This section is followed by the Method section, which elaborates on the data collection, preprocessing, models, and analytical details in the study. Then, the Results section reports the outcomes of the predictive and comparative analyses. Finally, the Conclusion section summarizes the study and provides suggestions for future research.

We collected Twitter posts and CBOE VIX data. The Twitter posts were English-language posts hashtagged with COVID-19-related keywords, such as “COVID,” “COVID19,” “COVID-19,” “pandemic,” “corona,” “corona-virus,” and “covid-death.” The VIX data were collected from Google Finance for the same period as the Twitter data and cover 171 days, corresponding to the business days on which the financial market was open.

In preprocessing the Twitter posts, posts containing website links were removed because they were considered advertisements. To calculate sentiment scores accurately with LIWC and VADER, which are lexicon-based sentiment analysis methods, additional word-level preprocessing was performed. Words in each post that were not pronouns, nouns, verbs, adjectives, or adverbs were removed, and the remaining words were lemmatized before analysis. Some posts contained repeated characters, such as the “o” in “Things will get better sooooon”; these runs were replaced with a single instance of the character.
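The two character-level cleaning rules above can be sketched with Python's `re` module. This is a minimal illustration under assumptions: a post "with a website" is taken to be one containing an `http(s)://` or `www.` link, and only runs of three or more identical characters are collapsed so that ordinary double letters (such as "tt" in "better") survive. Part-of-speech filtering and lemmatization are omitted because they need an external tagger and lemmatizer (for example, NLTK or spaCy). The sample posts are hypothetical.

```python
import re

# Assumption: any http(s):// or www. link marks a post as an advertisement.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Assumption: 3+ repeats of the same word character count as elongation.
REPEAT_PATTERN = re.compile(r"(\w)\1{2,}")


def drop_posts_with_urls(posts):
    """Remove posts that contain a web link (treated as advertisements)."""
    return [p for p in posts if not URL_PATTERN.search(p)]


def collapse_repeats(text):
    """Replace each run of 3+ identical characters with a single character."""
    return REPEAT_PATTERN.sub(r"\1", text)


# Hypothetical sample posts.
posts = [
    "Nooooo, not another lockdown",
    "Best masks on sale at www.example.com",
]
cleaned = [collapse_repeats(p) for p in drop_posts_with_urls(posts)]
print(cleaned)  # → ['No, not another lockdown']
```

Applied literally, the single-character rule turns "sooooon" into "son"; collapsing to two characters instead is an alternative if such words matter for the lexicon lookup.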

The VIX time-series data used to train the BiLSTM were normalized. Based on trials with different timestep values, the BiLSTM used the sentiment features of the previous four days to predict the next day's VIX.
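The normalization and four-day windowing can be sketched as follows. Min-max scaling is an assumption (the text only says the data were normalized), and `features` may hold the daily sentiment scores alone or together with lagged VIX values. In practice the scaling bounds should be computed on the training portion only, so that no test information leaks into training.

```python
import numpy as np

TIMESTEP = 4  # four previous days, as stated in the text


def min_max_normalize(x):
    """Scale values to [0, 1]; min-max is an assumed choice of normalization."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi


def make_windows(features, target, timestep=TIMESTEP):
    """Pair each window of `timestep` days of features with the next day's target.

    features: array of shape (days, n_features); target: array of shape (days,).
    Returns X of shape (samples, timestep, n_features) and y of shape (samples,).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for t in range(timestep, len(target)):
        X.append(features[t - timestep:t])  # days t-4 .. t-1
        y.append(target[t])                 # day t
    return np.stack(X), np.array(y)
```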

Using the Twitter data, we generated sentiment analysis scores from the posts and used them as features for VIX prediction. LIWC and VADER were used for the analysis. LIWC is a text analysis program that computes scores for more than 80 categories of sentiment and other content features using a dictionary in which words are assigned to categories. The main categories include linguistic, emotional, grammatical, and psychological dimensions, and a score is provided for each category (e.g., “positive emotion,” “negative emotion,” “anxiety,” “anger,” “sad,” and “social”) [
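LIWC itself is proprietary, but the dictionary-lookup idea behind its category scores can be illustrated with a toy dictionary: each category score is the percentage of a text's words that belong to that category. The categories and word lists below are hypothetical stand-ins, far smaller than LIWC's, and the sketch ignores LIWC features such as wildcard word stems.

```python
from collections import Counter

# Hypothetical toy dictionary; the real LIWC dictionary is proprietary and much larger.
TOY_DICTIONARY = {
    "posemo": {"hope", "better", "safe", "good"},
    "negemo": {"fear", "sad", "death", "worse"},
    "anx": {"fear", "worried", "nervous"},
}


def category_scores(tokens, dictionary=TOY_DICTIONARY):
    """Percentage of tokens falling into each category, LIWC-style.

    A word may count toward several categories (e.g. "fear" is both
    negative emotion and anxiety).
    """
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in dictionary.items():
            if tok in words:
                counts[category] += 1
    return {c: 100.0 * counts[c] / total if total else 0.0 for c in dictionary}


print(category_scores(["fear", "of", "death", "but", "hope"]))
```

Daily features could then be obtained by averaging the per-post scores over each day's posts, though the aggregation used in the study is not stated here.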

Among neural networks, the recurrent neural network (RNN) is widely used as a sequential model for forecasting time-series data because it maps a sequence of input vectors to corresponding outputs while carrying a hidden state from one time step to the next. However, RNNs suffer from vanishing and exploding gradients. The LSTM model was devised to address these issues: its gated memory cell allows it to learn long-term dependencies in sequence data while largely avoiding the vanishing gradient problem.

The LSTM model consists of an input gate (

where,

In the BiLSTM, a second layer of LSTM units processes the input sequence in the reverse direction, as shown in
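The standard LSTM cell and its bidirectional extension can be sketched in NumPy as below. This is a minimal forward pass under stated assumptions: the gate order (input, forget, output, candidate), the weight shapes, and zero initial states are illustrative choices, and the notation may differ from the paper's own equations. The backward layer reads the sequence reversed; its outputs are flipped back so that both directions align in time before concatenation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x_seq, W, U, b, hidden):
    """Run one LSTM layer over x_seq of shape (T, d); return hidden states (T, hidden).

    W: (4*hidden, d), U: (4*hidden, hidden), b: (4*hidden,); gate order i, f, o, g.
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])       # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
        g = np.tanh(z[3 * hidden:])             # candidate cell state
        c = f * c + i * g                       # update the memory cell
        h = o * np.tanh(c)                      # expose part of the cell as output
        outputs.append(h)
    return np.stack(outputs)


def bilstm_forward(x_seq, fwd_params, bwd_params, hidden):
    """Concatenate a forward pass and a reversed pass, re-aligned in time."""
    h_fwd = lstm_forward(x_seq, *fwd_params, hidden)
    h_bwd = lstm_forward(x_seq[::-1], *bwd_params, hidden)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)  # shape (T, 2*hidden)
```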

Of the models used as the base neural network in the experiments (BiLSTM, BiGRU, LSTM, GRU, and Attention-BiLSTM), the BiLSTM was chosen for integration with the time-series prediction models ARIMA and GARCH because it performed better than the others. The model consists of two BiLSTM layers with 32 nodes each, followed by two dense layers with 16 nodes each. The number of nodes (
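To make the architecture concrete, the sketch below counts trainable parameters for two BiLSTM layers of 32 units and two dense layers of 16 units, using the parameter formulas of Keras' `LSTM` and `Dense` layers. It assumes that "32 nodes" means 32 units per direction, that a single-unit regression output follows the dense layers, and that the input feature dimension `n_features` is a free (hypothetical) parameter; the resulting total therefore depends on details this section does not state.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias (Keras convention)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units


def count_params(n_features, lstm_units=32, dense_units=16, n_outputs=1):
    total = 0
    d = n_features
    for _ in range(2):                            # two BiLSTM layers
        total += 2 * lstm_params(d, lstm_units)   # forward + backward directions
        d = 2 * lstm_units                        # concatenated output feeds the next layer
    for _ in range(2):                            # two dense layers
        total += dense_params(d, dense_units)
        d = dense_units
    total += dense_params(d, n_outputs)           # assumed single-value output layer
    return total
```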

The ARIMA model is a traditional statistical model that has been applied to time-series forecasting in financial fields [

The moving average model, MA(q), explains

The orders p and q are selected using the Akaike Information Criterion (AIC); a lower AIC indicates a better balance between goodness of fit and model complexity. The equation for the AIC is stated in
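The standard definition is AIC = 2k − 2 ln L̂, where k is the number of estimated parameters and L̂ is the maximized likelihood. A minimal sketch of order selection by lowest AIC follows; the log-likelihood values in the usage example are hypothetical, not the study's fitted values.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better model."""
    return 2 * n_params - 2 * log_likelihood


def select_order(candidates):
    """Return the (p, d, q) order with the lowest AIC.

    candidates maps an order tuple to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda order: aic(*candidates[order]))


# Hypothetical fitted results for three candidate orders.
candidates = {
    (1, 0, 1): (-150.0, 4),
    (1, 0, 2): (-145.0, 5),
    (2, 0, 2): (-145.0, 6),
}
best = select_order(candidates)
```

With equal likelihoods, the extra parameter of (2, 0, 2) raises its AIC, which is how the criterion penalizes unnecessary complexity.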

The mathematical expression of ARIMA (1, 0, 2) can be restated in
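ARIMA(1, 0, 2) is an ARMA(1, 2) model on the undifferenced series: y_t = c + φ1·y_{t−1} + ε_t + θ1·ε_{t−1} + θ2·ε_{t−2}. Its one-step-ahead forecast replaces the unknown future shock with its expected value of zero. The sketch below assumes this constant-plus-coefficients form with plus signs on the MA terms; some libraries parametrize the mean differently or write the MA terms with minus signs, and the coefficient values in the test are hypothetical.

```python
def arma12_forecast(y_t, eps_t, eps_tm1, c, phi1, theta1, theta2):
    """One-step-ahead forecast of an ARIMA(1, 0, 2) = ARMA(1, 2) process.

    y_t: latest observation; eps_t, eps_tm1: latest two residuals.
    The future shock eps_{t+1} has expectation zero and drops out.
    """
    return c + phi1 * y_t + theta1 * eps_t + theta2 * eps_tm1
```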

The GARCH model captures the time-varying variance (volatility) of time-series data. Because risk is central to finance, GARCH is commonly used in studies on financial markets. The model explains the volatility at the time
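The GARCH(1, 1) conditional-variance recursion, σ²_t = ω + α·ε²_{t−1} + β·σ²_{t−1}, can be sketched as follows. The residuals ε would typically come from the fitted ARIMA model; the values of ω, α, and β used in the test are hypothetical and would in practice be estimated by maximum likelihood. Starting the recursion at the unconditional variance ω / (1 − α − β) is a common but assumed initialization.

```python
import numpy as np


def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1, 1) model.

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
    Returns len(residuals) + 1 values; the last is the one-step-ahead forecast.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    eps = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = omega / (1 - alpha - beta)  # unconditional variance as the start value
    for t in range(1, len(eps) + 1):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

A large shock raises next-period variance through α, and β controls how slowly that elevated variance decays, which is why the model suits volatility clustering in series such as the VIX.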

According to the related studies [

The BiLSTM trained on sentiment features and the ARIMA model were combined to capture both the non-linear and linear patterns of the data when predicting the target. Although the specific models differ, integrations of ARIMA with non-linear models have been reported in previous studies in the finance sector [
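This section does not spell out how the BiLSTM, ARIMA, and GARCH outputs are combined. One widely used scheme for ARIMA and neural-network hybrids is additive: the linear model forecasts the series, a non-linear model forecasts the linear model's residuals, and the two forecasts are summed. The sketch below shows only that generic scheme under loud assumptions: an AR(1) fitted by least squares stands in for ARIMA(1, 0, 2), `residual_model` is a hypothetical placeholder for the BiLSTM, GARCH is omitted, and this is not claimed to be the authors' exact integration.

```python
import numpy as np


def fit_ar1(y):
    """Least-squares AR(1) with intercept: y[t] = c + phi * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return c, phi


def hybrid_forecast(y, residual_model):
    """Linear one-step forecast plus a non-linear forecast of the linear residuals.

    residual_model: hypothetical callable mapping the residual series to the
    next residual (a stand-in for the BiLSTM in this sketch).
    """
    y = np.asarray(y, dtype=float)
    c, phi = fit_ar1(y)
    residuals = y[1:] - (c + phi * y[:-1])   # what the linear model misses
    linear_part = c + phi * y[-1]
    return linear_part + residual_model(residuals)


# Example stand-in residual model: mean of the last four residuals.
forecast = hybrid_forecast([20.1, 22.4, 25.0, 23.8, 27.5, 26.2],
                           lambda r: float(np.mean(r[-4:])))
```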

Our model requires little computation, which makes daily prediction practical. The neural networks in our model have approximately 0.28M parameters in total, about 226× fewer than the base Transformer model [

As shown in recent studies on VIX forecasting [
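The regression results are reported as RMSE, MAE, and MAPE. Assuming their standard definitions, with MAPE expressed in percent, they can be computed as:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any y_true is 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100 * np.mean(np.abs((y_true - y_pred) / y_true)))
```

Because MAPE is scale-free, it is the natural basis for comparing models across periods in which the VIX level changes sharply; RMSE additionally penalizes large misses such as those around steep rises.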

For the direction prediction, the metrics precision, recall, and F1-score were utilized to measure the classification performance. Each metric can be calculated by
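The per-class metrics in the direction-prediction table can be computed by treating each direction in turn as the positive class. The F1-score is the harmonic mean of precision and recall; for the Increase class, 2 × 0.65 × 0.61 / (0.65 + 0.61) ≈ 0.63, matching the reported value. The labels in the test are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)
```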

Before fitting the ARIMA-GARCH model, we applied the augmented Dickey–Fuller (ADF) test to the VIX data to check whether the series was stationary. Based on the ADF statistic, p-value, and critical values, the series was treated as stationary, so an ARIMA of order (p, 0, q), that is, with no differencing, was used for prediction.

Considering the autocorrelation function (ACF) (
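The sample ACF that guides the choice of q (and, together with the partial autocorrelation function, p) can be sketched as below. A common rule of thumb treats autocorrelations outside roughly ±1.96/√n as significant; the short series in the test is hypothetical.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a 1-D series.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: len(x) - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])
```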

Comparing the AIC values of models with the candidate orders, we selected ARIMA(1, 0, 2), which had the lowest AIC (798.726). For the GARCH model, GARCH(1, 1) was adopted based on a recent study on the VIX [

In this section, the predictions of the single BiLSTM are compared with those of the integrated model. On the test data, the single BiLSTM returned MAPE values between 12 and 14, indicating a relatively low forecasting error. However, when predicting the VIX over the whole period, the model underfit the data (

Underfitting was the limitation of using only the non-linear model. To resolve this problem, the BiLSTM was combined with models that add the linear trends and features of the target data. With the statistical models ARIMA and GARCH added, the hybrid model fitted the overall data better, as shown in

Compared to the results of the existing studies that forecasted the VIX in the COVID-19 era [

We also built the hybrid model with other non-linear base models; the results are shown in

Base model | RMSE | MAE | MAPE | Improvement (%)
---|---|---|---|---
BiGRU | 5.070 | 4.003 | 13.989 | 33.09
LSTM | 5.339 | 4.596 | 17.548 | 46.66
GRU | 7.772 | 6.694 | 25.260 | 62.95
Attention-BiLSTM | 8.040 | 6.955 | 26.588 | 64.80

Note: Improvement values are the percentage decrease in MAPE obtained with our final model.

We also compared the integrated model with the ARIMA-GARCH model, which does not use sentiment features for training. The results shown in

Model | RMSE | MAE | MAPE | Improvement (%)
---|---|---|---|---
BiLSTM-ARIMA-GARCH | | | | –
ARIMA | 3.014 | 2.592 | 9.658 | 3.09
ARIMA-GARCH | 4.824 | 3.818 | 12.752 | 26.60

Note: Improvement values are the percentage decrease in MAPE obtained with our final model.
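Reading the Improvement column as the relative decrease in MAPE achieved by the final model, the relation can be computed and inverted as below. Back-solving each row of both comparison tables gives nearly the same reference MAPE, about 9.36, which is consistent with that reading.

```python
def improvement(mape_other, mape_ours):
    """Percentage decrease in MAPE when replacing another model with ours."""
    return 100 * (mape_other - mape_ours) / mape_other


def implied_mape(mape_other, improvement_pct):
    """Invert the improvement formula to recover the reference model's MAPE."""
    return mape_other * (1 - improvement_pct / 100)


# (MAPE, Improvement %) pairs copied from the two comparison tables.
rows = {
    "BiGRU": (13.989, 33.09),
    "LSTM": (17.548, 46.66),
    "GRU": (25.260, 62.95),
    "Attention-BiLSTM": (26.588, 64.80),
    "ARIMA": (9.658, 3.09),
    "ARIMA-GARCH": (12.752, 26.60),
}
for name, (m, imp) in rows.items():
    print(f"{name}: implied final-model MAPE = {implied_mape(m, imp):.2f}")
```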

Using the same trained model, we evaluated how well it classifies the future direction of the VIX as an increase or a decrease (

Direction | Precision | Recall | F1-score
---|---|---|---
Increase | 0.65 | 0.61 | 0.63
Decrease | 0.42 | 0.45 | 0.43
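How the regression forecasts were turned into increase/decrease labels is not stated here. A plausible scheme, assumed purely for illustration, compares each value with the previous day's actual VIX, labeling true and predicted directions the same way; the sample numbers are hypothetical, and labeling ties as Decrease is an arbitrary choice.

```python
def direction_labels(previous, values):
    """Label each value 'Increase' or 'Decrease' relative to the previous actual value.

    Assumption: ties (no change) are labeled 'Decrease'.
    """
    return ["Increase" if v > p else "Decrease" for p, v in zip(previous, values)]


# Hypothetical actual VIX values and one-step forecasts for days 1..3.
actual = [20.0, 22.0, 21.0, 25.0]
forecasts = [21.0, 23.0, 24.0]
true_dir = direction_labels(actual[:-1], actual[1:])
pred_dir = direction_labels(actual[:-1], forecasts)
```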

This study predicted the volatility index (VIX) in the early stage of the pandemic by using sentiments in social media texts. The sentiment information in the texts was extracted with two sentiment analysis methods: (1) LIWC, which extracts a variety of sentiment features from text, and (2) VADER, which is recognized for accurately analyzing sentiment in texts from various domains. The BiLSTM model trained on sentiment features proved effective for predicting the volatility index: the integrated model performed better than a single statistical model (i.e., ARIMA) or a combination of statistical models (i.e., ARIMA-GARCH).

Furthermore, by integrating the sequence neural network model with traditional statistical models, the non-linear features from the sentiment data and the linear trends of the target values were used simultaneously. Combining the three models resolved the underfitting problem, and the integrated model fitted the data patterns better over the entire period.

Even though the integrated model consists of multiple models, training and inference are completed quickly enough to support daily forecasting. Because the neural networks in the unified model have few parameters, optimization requires little computation and completes quickly without a GPU.

The integrated BiLSTM-ARIMA-GARCH model, which used only social media sentiment data and the historical values of the target, showed lower forecasting errors in regression prediction than those reported in a similar study on VIX prediction conducted during the COVID-19 pandemic [

Nevertheless, these results are promising, considering that prior studies used larger amounts of data, collected over multiple years, than the current study [

Because the outbreak of COVID-19 is relatively recent, the integrated model still needs more data to learn the dynamic patterns of the VIX during COVID-19. Such data will enable the model to make better predictions of extreme patterns in the future. However, this study showed that sentiment scores of social media data can be an advantageous independent variable for predicting volatility in the financial market. Additionally, social media posts related to global issues, such as the pandemic, also seemed to reflect people's sentiments toward the financial market, eventually affecting changes in the market itself.

However, future studies need to consider such potential changes. If global issues persist and the public becomes accustomed to them, the explanatory power of sentiments drawn from social media posts tied only to keywords about those issues might decline from its initial level. Therefore, using social media texts related to keywords on both the global issue and the research domain is expected to yield better prediction results. Social media sentiments can contribute to predictions in diverse areas during the pandemic.

It should be noted that contextualized text representations learned from pre-trained language models, such as ELMo and BERT [

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2020R1A2C1014957).

The authors declare that they have no conflicts of interest to report regarding the present study.