Cars are regarded as an indispensable means of transportation in Taiwan. Several studies have indicated that the automotive industry has witnessed remarkable advances and that the market of used cars has rapidly expanded. In this study, a price prediction system for used BMW cars was developed. Nine parameters of used cars, including their model, registration year, and transmission style, were analyzed. The data obtained were then divided into three subsets. The first subset was used to compare the results of each algorithm. The predicted values produced by the two algorithms with the most satisfactory results were used as the input of a fully connected neural network. The second subset was used with an optimization algorithm to modify the number of hidden layers in a fully connected neural network and modify the low, medium, and high parameters of the membership function (MF) to achieve model optimization. Finally, the third subset was used for the validation set during the prediction process. These three subsets were divided using k-fold cross-validation to avoid overfitting and selection bias. In conclusion, in this study, a model combining two optimal algorithms (i.e., random forest and k-nearest neighbors) with several optimization algorithms (i.e., gray wolf optimizer, multilayer perceptron, and MF) was successfully established. The prediction results obtained indicated a mean square error of 0.0978, a root-mean-square error of 0.3128, a mean absolute error of 0.1903, and a coefficient of determination of 0.9249.

In Taiwan, as in most developed and developing countries, cars have become an indispensable means of transportation. In several developed regions, such as the USA, Japan, and a number of European countries, the sales of used cars have surpassed those of new cars. Since Taiwan joined the World Trade Organization, its automobile sales market has expanded. Taiwan’s used car market has also developed rapidly, and several sales platforms for used cars have been established. People buy used cars for various reasons, such as a limited budget or a need for a vehicle for only a short period. Because many buyers are unfamiliar with how used cars are priced, it is difficult for them to judge whether an asking price is reasonable. For this reason, a range of used car parameters is employed in this research to predict the price of a used car so that consumers can quickly estimate the price of the car they intend to purchase. Because optimization algorithms are applied to the models used in this research, users can apply the models to data from varied scenarios without manually adjusting the model parameters. At the same time, the models can achieve higher accuracy and thus provide more reliable purchase information for consumers.

In this study, used car data were used to develop a used car price prediction system. Machine learning (ML) models [

This price prediction system can allow consumers to determine the price of a car before buying it and also determine whether this car is a suitable choice for them given its price.

With the rapid development of artificial intelligence (AI) technology, AI has been extensively applied in various fields. Problems that involve large amounts of data or that would require a large amount of time to solve experimentally can now be handled by ML, a branch of AI. Supervised learning, a type of ML, builds a model from a large number of labeled training data, and the trained model is then used to infer outputs for new instances. This technology has been used by numerous researchers to solve multiple problems. For example, Cao et al. [

In Chapter 2, we will introduce the source and the distribution status of the dataset used in this research. In Chapter 3, the theory of the model used in this research, the data pre-processing and the “

As shown in

Year | Number of ownership transfers of used cars |
---|---|
2012 | 801,366 |
2013 | 814,893 |
2014 | 848,517 |
2015 | 828,487 |
2016 | 751,963 |
2017 | 759,002 |
2018 | 741,488 |
2019 | 760,425 |
2020 | 780,163 |

The data used in this study were retrieved from [

Parameter | Description |
---|---|
Model | Model of a used BMW car |
Year | Registration year |
Price | Price in Euro |
Transmission | Transmission style (automatic, manual, or semiautomatic) |
Mileage | Total mileage at the sale (km) |
FuelType | Type of fuel (gasoline, diesel, electricity, hybrid, or other) |
Tax | Road tax in Euro |
Mpg | Fuel consumption (MPG) |
EngineSize | Engine size (L) |

Because multiple parameters were character strings rather than numerical values, the models could not process them directly. To solve this problem, the categorical data had to be encoded as numbers. Common encoding methods include label encoding and one-hot encoding. Label encoding maps each category in the data to an integer and does not add any extra columns. In this study, a neural network combining TL, random forest (RF) [

In this study, the algorithm of the ML model was developed on the basis of a neural network. A neural network consists of a large number of interconnected artificial neurons and is a mathematical or computational model that imitates how biological neurons transmit messages to each other. Such a model is used to evaluate or approximate functions to facilitate identification, decision-making, and prediction. It also has several advantages; for example, it tolerates different data types well, is highly adaptable, and can approximate nonlinear functions.

Supervised ML algorithms address two kinds of prediction tasks: regression and classification. Regression is mainly used to predict continuous data, such as stock and house prices. To predict a continuous value, mathematical functions are employed to combine different parameters. The goal of regression is to minimize the error between the predicted and actual values. Classification is typically used to predict data with noncontinuous values, such as in handwriting recognition and stamen classification. Several parameters are applied to create a decision boundary that separates the classes. The goal of classification is to minimize the misclassification error. In this study, the prediction was considered a regression problem.

To facilitate subsequent model building, raw data should first be preprocessed. This can accelerate model convergence, increase prediction accuracy, and avoid result distortion. Therefore, a min-max normalization preprocessing method was used [_{min}, _{max}, and
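As an illustration, min-max normalization in its usual form rescales each value $x$ to $(x - x_{min}) / (x_{max} - x_{min})$, so that every feature lies in [0, 1]. The Python sketch below assumes this standard form; the function name `min_max_normalize` and the mileage values are hypothetical and are not taken from the dataset.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # A constant column carries no information; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical mileage values in km, for illustration only.
mileage_km = [12000, 48000, 30000, 84000]
scaled = min_max_normalize(mileage_km)
```

The smallest value maps to 0 and the largest to 1, so features with very different ranges (for example, mileage and engine size) become comparable in magnitude.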

Following data preprocessing, cross-validation is performed. Before an ML model is trained, the dataset is typically divided into a training set and a validation set. The training set is used to train the ML model, whereas the validation set is used to verify whether the model has been trained well. When the sample size is small, the data extracted as the validation set are generally unrepresentative. This means that the verification results may be satisfactory for some extracted subsets and unsatisfactory for others. To avoid this problem and more effectively evaluate the quality of the model, cross-validation is adopted. Among the various cross-validation methods available, a 10-fold cross-validation method was selected in this study.
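The splitting step of 10-fold cross-validation can be sketched as follows. This is a minimal illustration, not the implementation used in this study; the function name `k_fold_indices`, the fixed seed, and the sample count of 25 are assumptions chosen for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Hypothetical dataset of 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one validation fold, so every observation is used for validation once and for training in the remaining nine rounds; the reported score is then the average over the ten rounds.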

Generally, the term “

In this study, a composite model combining decision tree (DT) [

Before training, the data were fuzzified by the membership function (MF) of a fuzzy set. The MF transforms a crisp input into fuzzy values that a fuzzy inference system can process. MFs can take several forms. The most widely accepted and applied form is the triangular membership function (TMF). The TMF represents each level as a triangle and transforms the input into degrees of membership in three levels (i.e., low, medium, and high). A linear representation of TMF is presented in
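A triangular MF can be sketched in a few lines of Python. The sketch assumes the usual three-point definition (left foot `a`, peak `b`, right foot `c`) and an input already normalized to [0, 1]; the function names and the breakpoints chosen for the low, medium, and high levels are hypothetical placeholders, not the optimized values obtained in this study.

```python
def triangular_mf(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def fuzzify(x, low, medium, high):
    """Map a crisp value to its degrees of membership in three levels."""
    return {
        "low": triangular_mf(x, *low),
        "medium": triangular_mf(x, *medium),
        "high": triangular_mf(x, *high),
    }


# Hypothetical breakpoints for an input normalized to [0, 1].
LOW, MEDIUM, HIGH = (-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)
degrees = fuzzify(0.25, LOW, MEDIUM, HIGH)
```

With these placeholder breakpoints, an input of 0.25 belongs equally to the low and medium levels and not at all to the high level. Shifting the breakpoints changes these degrees, which is why the breakpoints are natural targets for an optimization algorithm.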

In this study, these points were optimized using an optimization algorithm, which allowed the users to easily perform predictions with this model without having to thoroughly understand the TMF. Three optimization algorithms were compared: gray wolf optimization (GWO) [

GWO is an optimization algorithm deduced from the social class and hunting behavior of gray wolves (see

The hunting behavior can be divided into the following three steps: (1) stalking, chasing, and approaching the prey; (2) chasing and surrounding the prey until it stops moving; and (3) attacking the prey.

Surrounding the prey: Gray wolves surround their prey during hunting. To mathematically model this surrounding behavior, the following equations were proposed:

where

Both

where the component of

Hunting: Gray wolves first identify their prey and surround it. Hunting is generally led by

Attacking the prey: Gray wolves complete their hunting by attacking their prey after it stops moving. As a mathematical model of their approach to their prey, the value of

Searching for the prey: Gray wolves typically search depending on the locations of
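The stages described above can be sketched in Python under the commonly published form of the GWO update rules, which is assumed here: coefficients $A = 2a r_1 - a$ and $C = 2 r_2$ with random $r_1, r_2 \in [0, 1]$, a control value $a$ that decreases linearly from 2 to 0, and each wolf moving to the average of the positions suggested by the three best wolves. The function name `gwo_minimize`, its parameters, and the sphere test objective are illustrative assumptions; in this study the objective would instead be the prediction error of the network.

```python
import random


def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimize `objective` over a box with a basic gray wolf optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)
    for t in range(n_iter):
        # Rank the pack; the three best wolves lead the hunt.
        wolves.sort(key=objective)
        leaders = [w[:] for w in wolves[:3]]
        a = 2.0 * (1 - t / n_iter)  # shrinks from 2 towards 0 (explore -> attack)
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])   # distance to this leader
                    estimate += leader[d] - A * D   # position suggested by this leader
                # Average the three suggestions and keep the wolf inside the box.
                pos.append(min(max(estimate / 3, lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves
        candidate = min(wolves, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    return best


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_position = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

When the magnitude of $A$ exceeds 1 the wolves tend to move away from the leaders (searching), and when it falls below 1 they close in (attacking), which is how a single shrinking value of $a$ produces both behaviors.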

In CPSO, which is based on swarm search behavior, the center of the swarm is regarded as an extra particle. If the original population size contains _{c}

CSO is an optimization algorithm that imitates the natural behavior of cats. Generally, the behavior of cats of staying still and moving slowly corresponds to a seeking mode, whereas their behavior of quickly chasing their prey corresponds to a tracing mode. Both the seeking and tracing modes are included in each iteration. The proportion of cats assigned to each mode is fixed by a predefined mixture ratio (MR). In this scenario, the cats move in a solution space, and their locations represent the solution set. Each cat has a fitness value and a location and velocity in each dimension. Each cat also has a flag that identifies whether this cat is in the seeking or tracing mode. The process of CSO is presented in

MLP is a feedforward neural network that consists of at least three layers: an input layer, a hidden layer, and an output layer. It is trained by backpropagation in a supervised learning setting. The structure of MLP is presented in

In TL, the trained model is transferred to another new model to avoid retraining the new model from scratch. TL is mainly used to solve difficult data labeling and acquisition problems. Several methods can be used to construct a TL model. In this study, multitask learning was used as the model framework. In general, multitask learning improves generalization (i.e., reduces the generalization error) by using the information in the training signals of relevant tasks as inductive bias. The framework of the model proposed in this study is presented in

In this study, various optimization algorithms were compared. As shown in

To estimate the accuracy of the data predicted by the algorithms, several statistical measures, such as the mean square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), and coefficient of determination ($R^2$), were used. In addition, the performances of the algorithms were compared.

MSE, also referred to as L2 loss in mathematics, is the most commonly used regression evaluation indicator. It is obtained by calculating the mean of the squared differences between the predicted values and the true values. Because of this squaring, MSE penalizes large deviations from the true value more heavily than small ones, and it is differentiable everywhere, making it suitable for gradient calculation. A smaller MSE value indicates that the prediction model describes the experimental data with greater accuracy. MSE is calculated as follows:
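In standard notation, with the symbols assumed here ($n$ samples, true values $y_i$, and predicted values $\hat{y}_i$), the definition reads:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$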

RMSE is the square root of the mean of the squared deviations of the predicted values from the true values (i.e., the square root of MSE). Because of this square root, RMSE is expressed in the same units as the predicted quantity, which makes it convenient for evaluating data with high values, such as house prices. A larger RMSE value implies a less accurate prediction, whereas a smaller RMSE value implies a more accurate prediction. RMSE is calculated as follows:
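In standard notation, with the symbols assumed here ($n$ samples, true values $y_i$, and predicted values $\hat{y}_i$), the definition reads:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$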

MAE, also referred to as L1 loss in mathematics, is a loss function of regression. MAE is the mean of the absolute values of the errors in each measure and is used to estimate the accuracy of an algorithm. A smaller MAE value suggests that the predicted value is more accurate. MAE is calculated as follows:
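In standard notation, with the symbols assumed here ($n$ samples, true values $y_i$, and predicted values $\hat{y}_i$), the definition reads:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$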

$R^2$ is a statistical measure commonly used as an indicator to measure the performance of a regression model. The value of $R^2$ typically lies within the range [0, 1]. This value indicates the goodness of fit between the true and predicted values. A value closer to 1 implies greater accuracy. The following equation is used to calculate $R^2$:
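In its standard form, assumed here, the coefficient of determination is $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, where $y_i$ are the true values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the true values. The Python sketch below computes all four measures from these standard definitions; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical true and predicted values, for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
```

Because RMSE is the square root of MSE, the two can be cross-checked against each other: the reported RMSE of 0.3128 agrees, up to rounding, with the square root of the reported MSE of 0.0978.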

In this section, the algorithms introduced in Section 3 were individually trained. As shown in ^{2} offer more precise model accuracy results (

Model | MSE | RMSE | MAE | $R^2$ |
---|---|---|---|---|
DT | 0.1446 | 0.3803 | 0.2586 | 0.889 |
RF | 0.1116 | 0.334 |  | 0.9144 |
SVR | 0.4377 | 0.6616 | 0.5169 | 0.6641 |
KNN | 0.1235 | 0.3515 | 0.1943 | 0.9052 |
TL | 0.0978 | 0.3128 | 0.1903 | 0.9249 |

Overall, the model constructed in this study can be used to predict the selling prices of used BMW cars and is expected to be applicable to the valuation and price prediction of used cars of other brands. The model is also expected to adapt to other methods through iterations and improvements. Several parameters were evaluated with cross-validation during training and prediction to select the optimal model. Among the SVR, KNN, RF, and DT base models, RF and KNN provided the highest accuracy. Therefore, both models were combined with an MLP, to which the fuzzy MF and the optimization algorithms were added to carry out the TL training. The resulting model was more satisfactory than the other models in terms of MSE, RMSE, and $R^2$. Compared with a single model, TL can more effectively improve accuracy and achieve the desired training results. In the future, it is hoped that the models used in this research can help consumers who wish to buy used cars predict prices quickly and accurately. Before the models developed in this study are applied to another market, data on used cars in that market should be retrieved and used for training. In general, a larger dataset is expected to yield higher model accuracy.

This work was supported by the

The authors declare that they have no conflicts of interest to report regarding the present study.

As shown in _{m}_{j}_{m}_{j}

RF is a type of combination algorithm based on DTs, as shown in

SVR is an algorithm based on SVMs and is used to solve regression problems. An SVM maps the data into a higher-dimensional space to find a hyperplane that divides the data for classification, as shown in

The SVM involves three essential concepts: a kernel, a hyperplane, and a decision boundary. The kernel maps the data into a higher-dimensional space in which the hyperplane can be identified. The hyperplane is the dividing line between two classes of data points in the SVM and is used to predict continuous output in SVR. The decision boundaries are the lines on either side of the hyperplane that delimit the margin.

SVR first considers the sample points in the decision boundary and then fully utilizes these points. A schematic of SVR is presented in

where ^{2}. The vectors outside the ε-tube can be obtained by using the slack variables ξ_{i}

KNN is a simple and easy-to-use supervised algorithm. It can be used to solve classification and regression problems. However, although KNN is easy to use and understand, it has a major disadvantage: it considerably slows down when the size of the data increases. As shown in

The working principle of KNN is to estimate the distance between a query and all samples in the data, select the specified number of classification

As shown in _{u}_{3} group than to the members of the other groups. Therefore, _{u}_{3} group.
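For a regression task such as price prediction, the same nearest-neighbor principle can be sketched as follows: compute the distance from the query to every sample, keep the `k` closest, and average their target values. The sketch assumes Euclidean distance and an unweighted mean; the function name `knn_regress` and the one-dimensional data are hypothetical.

```python
import math


def knn_regress(query, samples, targets, k=3):
    """Predict a value for `query` as the mean target of its k nearest samples."""
    # Pair every sample's Euclidean distance to the query with its target value.
    distances = sorted(
        (math.dist(query, sample), target)
        for sample, target in zip(samples, targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical one-feature samples and their target values.
samples = [(1.0,), (2.0,), (3.0,), (10.0,)]
targets = [10.0, 20.0, 30.0, 100.0]
prediction = knn_regress((2.1,), samples, targets, k=3)
```

Every prediction scans the whole dataset, which illustrates why KNN slows down considerably as the size of the data increases.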