Because Global Positioning System (GPS) signals cannot penetrate roofs, walls, and other objects in indoor environments, numerous alternative methods for user positioning have been presented. Among these, the Wi-Fi fingerprinting method has gained considerable interest in Indoor Positioning Systems (IPS) because it needs minimal line-of-sight measurements and achieves better efficiency even in complex indoor environments. The fingerprinting method has two phases, offline and online. Machine learning is used for model training in the offline phase, while locations are estimated in the online phase. Many researchers have highlighted problems in the offline phase: it deals with huge datasets, so validating fingerprints without pre-processing the data becomes a concern. Machine learning algorithms are a natural solution for winnowing through large datasets and determining the significant fragments of information for localization, creating precise models to predict an indoor location. Large training sets are key to obtaining better results in machine learning problems. Therefore, an existing WLAN fingerprinting-based multi-story building location database has been used with 21049 samples, including 19938 training and 1111 testing samples. The proposed model consists of mean and median filtering as pre-processing techniques applied to the database to enhance accuracy by mitigating the impact of environmental dispersion, and investigated machine learning algorithms (^{2} area.

Finding the location of a person can be defined as localization [

During the last decade, there has been massive development in the domain of localization. The expansion of modern communication technologies has resulted in widespread positioning services. In outdoor environments, GPS provides adequate positioning and localization services. GPS works efficiently with line-of-sight (LoS) and is not suitable for indoor locations because its signals do not penetrate hard surfaces; they are attenuated and dispersed mostly by rooftops, walls, and numerous other objects. Therefore, researchers have proposed and developed different localization systems for indoor environments, each with its pros and cons. The fingerprinting method emerges as the most prominent among them: although its practical implementation is relatively arduous, it provides better precision, and its working principle is simpler than that of other localization techniques. Moreover, fingerprinting needs no additional equipment and can be introduced using existing infrastructure [

The Received Signal Strength Indicator (RSSI) is a measurement of the signal power received from an Access Point (AP), and it can be sampled in a WLAN environment without any additional requirement. The RSSI-based fingerprint positioning method estimates the position from such location-dependent features. Fingerprint-based indoor positioning involves two phases, offline and online. During the offline phase, a database of fingerprints is built by collecting RSSI values from APs at predetermined Reference Points (RPs) over a fixed time. Each fingerprint stored in the database consists of the RP position and the RSSI value collected from each AP in decibel-milliwatts (dBm). Because of the noise present in the environment, it is important to apply pre-processing techniques to the RSSI readings before fingerprints are matched. In the online phase, mobile users take RSSI readings from APs at random RPs and submit them as queries. Machine learning algorithms then determine the most suitable location through fingerprint matching. The mean position error is obtained from the difference between the actual and predicted locations of a user in motion [
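The two phases described above can be illustrated with a minimal sketch. The database layout, the sample RSSI values, and the helper names (`fingerprint_db`, `locate`, `position_error`) are assumptions made for illustration only, and plain nearest-fingerprint matching stands in for the machine learning algorithms investigated in this paper.

```python
import math

# Offline phase: a hypothetical fingerprint database.
# Each entry stores an RP position (x, y) in metres and one RSSI value (dBm) per AP.
fingerprint_db = [
    {"position": (0.0, 0.0), "rssi": [-45, -70, -88]},
    {"position": (5.0, 0.0), "rssi": [-60, -52, -80]},
    {"position": (5.0, 5.0), "rssi": [-75, -58, -49]},
]


def rssi_distance(a, b):
    """Euclidean distance between two RSSI vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def locate(query_rssi, database):
    """Online phase: return the position of the closest stored fingerprint."""
    best = min(database, key=lambda fp: rssi_distance(query_rssi, fp["rssi"]))
    return best["position"]


def position_error(actual, predicted):
    """Distance in metres between the actual and the predicted position."""
    return math.dist(actual, predicted)


query = [-62, -50, -79]                 # readings submitted by a mobile user
estimate = locate(query, fingerprint_db)
error = position_error((4.0, 1.0), estimate)
```

Averaging `position_error` over many queries gives the mean position error used as the accuracy metric later in the paper.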

The main contributions of this paper are as follows.

We have proposed a solution for indoor localization where mean and median filtering techniques are used as pre-processing techniques with the machine learning algorithms (

Large training sets are key for obtaining better results in machine learning problems. Therefore, we have used the largest database available online created by authors in [

Outliers were removed from the database after the proposed pre-processing techniques and then machine learning algorithms were used to estimate the position of a mobile user.

Moreover, to validate the superior performance of the proposed solution, a comparative analysis was carried out in which the machine learning algorithms were compared with one another with mean filtering, with median filtering, and without filtering.

The results have shown that the proposed SVM with median filtering algorithm outperformed other investigated machine learning algorithms with mean and median filtering.

The remainder of this paper is organized as follows: Section 2 briefly discusses the related work. Section 3 explains the pre-processing techniques and machine learning algorithms. The proposed model is elaborated in Section 4, followed by the simulation results and their discussion in Sections 5 and 6. Section 7 concludes the paper.

The collection of wireless signal samples as fingerprints from nearby Wi-Fi APs is the most popular method for an IPS [

Author | Methods | Mean position error |
---|---|---|
Li et al. [ | IFCM & WkNN | 2.53 m |
Gu et al. [ | Landmark graph-based fingerprint collection method | 1.5 m |
Abbas et al. [ | Deep learning model | 1.21 m |
Hoang et al. [ | RNN | 0.75 m |
Wang et al. [ | Fisher-SSAE | 2.09 m |
Sun et al. [ | MLP | 1.73 m |
Xue et al. [ | RP selection method | 2.6 m |
Bai et al. [ | SISAE & RNN | 1.60 m |
Ayesha et al. [ | GA & EDOP | 1.149 m |
Li et al. [ | FSkNN | 1.7 m |
Li et al. [ | WkNN | 0.9 m |

Mean filtering is one of the pre-processing techniques used to minimize the noise in the RSSI database. It plays a vital role in an indoor positioning system because it averages the recorded RSSI samples, thus minimizing the effect of environmental factors. Mean filtering is applied to the database before the online phase, where machine learning algorithms are used [

Consider a case when an extreme value ‘y’ is added to the data set due to noise. Then

Median filtering is used to remove outliers from the recorded RSSI values. In median filtering, the RSSI values recorded at the current reference point from a particular access point are arranged in ascending order, and the median is calculated. If the number of values is odd, the central value is taken as the median; if it is even, the median is the average of the two central values.

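If ‘n’ is odd, then the median of the sorted samples $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ is given by:

$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$$

If ‘n’ is even, the median is the average of the two central values:

$$\tilde{x} = \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right)$$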
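The effect of the two filters on a noisy set of readings can be sketched as follows. The sample values and the helper `filter_fingerprint` are hypothetical, chosen only to show how a single extreme value shifts the mean while leaving the median almost unchanged.

```python
from statistics import mean, median


def filter_fingerprint(readings, method="median"):
    """Collapse repeated RSSI readings per AP into one value.

    readings: dict mapping an AP identifier to a list of RSSI samples (dBm)
    collected at a single reference point.
    """
    reducer = median if method == "median" else mean
    return {ap: reducer(samples) for ap, samples in readings.items()}


# Five clean samples from one AP at one RP, then the same set with an outlier.
clean = [-71, -69, -70, -72, -70]
noisy = clean + [-30]                   # extreme value caused by noise

clean_mean, clean_median = mean(clean), median(clean)
noisy_mean, noisy_median = mean(noisy), median(noisy)

filtered = filter_fingerprint({"AP1": noisy, "AP2": [-80, -82, -81]})
```

In this example the outlier moves the mean by several dB, whereas the median stays at -70 dBm, which is the behaviour the median filter relies on to suppress outliers.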

For supervised machine learning, the

Let (_{1}, RSS_{2}, …, RSS_{N}_{x}^{th}^{th}^{th}_{m,n}^{th}^{th}

Weighted

Assuming that there are ^{th}_{i =} RSS_{i1}, RSS_{i2},…, RSS_{ij,}…, RSS_{iN,}_{un =} RSS_{1}, RSS_{2},…, RSS_{j},…, RSS_{N} _{un} _{i} _{i}_{un}_{t}_{i}^{th}

^{′}_{m} ^{th}_{l}_{i}^{th}_{i}_{i}_{i}_{i}_{i} _{i}_{i}_{i}
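As a rough illustration of nearest-neighbor matching and its weighted variant, the sketch below estimates a position from the k closest fingerprints. The inverse-distance weighting, the small constant `eps`, and the sample database are assumptions for illustration and may differ from the exact weighting used in the paper.

```python
import math


def knn_estimate(query, database, k=3, weighted=False, eps=1e-6):
    """Estimate (x, y) from the k fingerprints closest to `query`.

    database: list of ((x, y), rssi_vector) pairs.
    weighted=False averages the k positions (plain nearest neighbors);
    weighted=True weights each position by 1 / (distance + eps).
    """
    ranked = sorted(
        ((math.dist(query, rssi), pos) for pos, rssi in database),
        key=lambda item: item[0],
    )[:k]
    if weighted:
        weights = [1.0 / (dist + eps) for dist, _ in ranked]
    else:
        weights = [1.0] * len(ranked)
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, ranked)) / total
    return (x, y)


database = [
    ((0.0, 0.0), [-40, -70]),
    ((4.0, 0.0), [-60, -60]),
    ((0.0, 4.0), [-70, -40]),
    ((10.0, 10.0), [-90, -90]),
]
query = [-40, -70]                      # identical to the first fingerprint

plain = knn_estimate(query, database, k=3)
weighted = knn_estimate(query, database, k=3, weighted=True)
```

With plain averaging the estimate is the centroid of the three nearest RPs, while the weighted estimate is pulled almost entirely onto the RP whose fingerprint matches the query.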

SVM, a relatively new multivariate statistical approach, has become popular for both classification and regression. The classic SVM is a support-vector network used for supervised learning. It is a binary, non-probabilistic classifier that finds the hyperplane separating the classes of the training set. The predicted label of a previously unobserved data point is determined by the side of the hyperplane on which it falls [

SVM is an exceptional supervised learning model that can handle high-dimensional data sets effectively [

A linear support vector regression problem can be formulated as a constrained optimization problem defined as [

The proposed model suggests using mean and median filtering as pre-processing techniques with the investigated machine learning algorithms (

The pseudo-code of the

For the weighted

The pseudo-code of the FS

Representation, evaluation, and optimization are the three stages of the FSkNN model. In representation, a filtered training set provides the fingerprints with known coordinates for localizing a mobile user at an unknown position. The testing set was used to iteratively evaluate the localization performance of the adjusted coefficients obtained during different iterations. The sum of distance errors on the testing set, denoted by cost, was calculated from the estimated coordinates and the actual coordinates of each element in the testing set. The RPs present in the testing set were treated as unknown locations in the evaluation process. A smaller sum means greater positioning accuracy; thus, the optimal solution is obtained with the coefficients that make cost = 0. In optimization, new coefficients were searched by simulated annealing (SA) to obtain better accuracy. Coefficients were changed randomly during each iteration. This whole process continued until the iteration number reached a pre-set maximum number.
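The evaluation and optimization stages can be sketched as a small simulated annealing loop. This is an illustrative sketch only: the per-AP scaling coefficients, the perturbation range, the cooling schedule, and the sample data are assumptions, not the coefficients or settings of the FSkNN model itself.

```python
import math
import random


def scaled_distance(a, b, coeffs):
    """Euclidean distance with one (assumed) scaling coefficient per AP."""
    return math.sqrt(sum(c * (u - v) ** 2 for c, u, v in zip(coeffs, a, b)))


def predict(query, train, coeffs, k=1):
    """Average the positions of the k nearest training fingerprints."""
    nearest = sorted(train, key=lambda fp: scaled_distance(query, fp[1], coeffs))[:k]
    xs = [pos[0] for pos, _ in nearest]
    ys = [pos[1] for pos, _ in nearest]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def cost(coeffs, train, test, k=1):
    """Sum of distance errors over the testing set (smaller is better)."""
    return sum(math.dist(pos, predict(rssi, train, coeffs, k)) for pos, rssi in test)


def anneal(train, test, n_aps, iterations=200, temp=1.0, cooling=0.98, seed=0):
    """Search for coefficients by simulated annealing for a fixed iteration count."""
    rng = random.Random(seed)
    current = [1.0] * n_aps
    current_cost = cost(current, train, test)
    best, best_cost = current[:], current_cost
    for _ in range(iterations):
        # Randomly perturb every coefficient, keeping it non-negative.
        candidate = [max(0.0, c + rng.uniform(-0.2, 0.2)) for c in current]
        candidate_cost = cost(candidate, train, test)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling
    return best, best_cost


train = [((0.0, 0.0), [-40, -80, -60]), ((10.0, 0.0), [-80, -40, -60]), ((0.0, 10.0), [-60, -60, -40])]
test = [((1.0, 0.0), [-45, -78, -70]), ((9.0, 1.0), [-77, -45, -50])]

initial_cost = cost([1.0, 1.0, 1.0], train, test)
best_coeffs, best_cost = anneal(train, test, n_aps=3)
```

Because the best coefficients seen so far are tracked separately from the current state, the returned cost can never exceed the cost of the initial coefficients.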

Support Vector Regression (SVR) is quite distinct from other models of regression. To predict a continuous variable, it uses the SVM algorithm. SVR tries to fit the best line within a predefined or threshold error value while other linear regression models attempt to minimize the error between the predicted value and the actual value. The pseudo-code of the SVM algorithm for the proposed model is shown below:
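As a hedged illustration of the difference described above (not the pseudo-code of the proposed model), the sketch below compares the squared loss minimized by ordinary regression with an epsilon-insensitive loss of the kind SVR is built on. The function names and the default threshold `epsilon=0.1` are assumptions for illustration.

```python
def squared_loss(y_true, y_pred):
    """Loss used by ordinary least-squares regression: every error is penalized."""
    return (y_true - y_pred) ** 2


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon threshold cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A prediction that is off by 0.05 lies inside the threshold.
small_sq = squared_loss(1.0, 1.05)
small_eps = epsilon_insensitive_loss(1.0, 1.05)

# A prediction that is off by 0.5 lies outside the threshold.
large_sq = squared_loss(1.0, 1.5)
large_eps = epsilon_insensitive_loss(1.0, 1.5)
```

Predictions that fall inside the threshold contribute no loss at all, which is why SVR fits the best line within a tolerated error rather than minimizing every residual.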

The system parameters for the indoor simulation environment are listed in the table below. The database covers an area of 108703 m^{2} along with 933 RPs and 520 APs. The RSSI values are negative integers measured in dBm (−100 dBm is considered a very weak signal, while 0 dBm is an extremely good signal). It is very important to deploy access points at suitable positions so that the signal from every AP is received at the current RP. When no signal is received from an AP at a given RP, its value is simply replaced by −100 dBm in the database. Each reading represents its real-world coordinates using three values: the longitude, the latitude, and the building floor.
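The replacement of missing readings by −100 dBm can be sketched as follows. The scan format (a dictionary of detected APs) and the names `to_rssi_vector` and `NO_SIGNAL_DBM` are hypothetical; only the −100 dBm fill value comes from the description above.

```python
NO_SIGNAL_DBM = -100  # value stored when an AP is not detected at an RP


def to_rssi_vector(scan, ap_ids, fill=NO_SIGNAL_DBM):
    """Convert one scan into a fixed-length RSSI vector.

    scan: dict mapping detected AP identifiers to RSSI values (dBm).
    ap_ids: ordered list of all AP identifiers in the database.
    """
    return [scan.get(ap, fill) for ap in ap_ids]


ap_ids = ["AP1", "AP2", "AP3", "AP4"]
scan = {"AP1": -57, "AP3": -83}         # AP2 and AP4 were not heard
vector = to_rssi_vector(scan, ap_ids)
```

Every fingerprint then has the same length (one entry per AP), which is what distance-based matching requires.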

Parameters | Values |
---|---|
No. of RPs | 933 |
No. of APs | 520 |
Area | 108703 m^{2} |
Training samples | 19938 |
Testing samples | 1111 |
3 | |
3 | |
5 | |
0.1 | |

Data were collected by more than 20 users using 25 different mobile devices [

To create the training set, all the closed spaces of the three buildings (offices, laboratories, classrooms) were deemed valuable locations for capturing. For each closed space considered, one RP was chosen inside the space and at least one RP outside it (i.e., in corridors). The inside point is located at the center of the closed space, while the outside point is located in front of the door. If a space has several entrances (doors), one RP was chosen for each entrance. A graphical example of the positioning and location of the RPs has been shown in

In this section, simulation results of the pre-processing techniques used with the investigated machine learning algorithms (

The standard metric for performance evaluation of IPS algorithms is localization accuracy and precision. Localization accuracy is defined as the mean position error diverged from actual location whereas the distribution of positioning errors is considered as positioning precision [
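Both metrics can be computed from a list of per-sample position errors, as in the following sketch. The coordinates are made-up values, and the helper names are assumptions for illustration.

```python
import math


def position_errors(actual, predicted):
    """Per-sample distance (m) between actual and predicted positions."""
    return [math.dist(a, p) for a, p in zip(actual, predicted)]


def mean_position_error(errors):
    """Localization accuracy: the average of the position errors."""
    return sum(errors) / len(errors)


def empirical_cdf(errors, threshold):
    """Positioning precision: fraction of errors not exceeding `threshold` metres."""
    return sum(e <= threshold for e in errors) / len(errors)


actual = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (2.0, 2.0)]
predicted = [(0.0, 1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 5.0)]

errors = position_errors(actual, predicted)
accuracy = mean_position_error(errors)
within_2m = empirical_cdf(errors, 2.0)
```

Evaluating `empirical_cdf` over a range of thresholds yields the CDF curve used to compare the precision of different algorithms.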

The cumulative distribution function (CDF) of

In

It can be seen in

The mean positioning error of various machine learning algorithms with and without filtering is shown in

Applying median filtering on

Overall, the proposed SVM with median filtering gives the best results in terms of mean position error, outperforming the other machine learning algorithms under both mean and median filtering, as depicted in

Algorithms | No filtering | Mean filtering | Median filtering |
---|---|---|---|
SVM | 1.4791 m | 1.1139 m | 0.7959 m |
FSkNN | 2.6743 m | 2.2361 m | 1.5461 m |
WkNN | 3.7602 m | 3.1511 m | 2.7604 m |
kNN | 5.9638 m | 5.0896 m | 4.2581 m |

Determining the value of the ‘

It is obvious from

Algorithms | No filtering | Mean filtering | Median filtering |
---|---|---|---|
SVM | 86.69% | 89.57% | 92.84% |
FSkNN | 75.93% | 79.87% | 85.68% |
WkNN | 66.15% | 71.64% | 74.85% |
kNN | 54.19% | 57.25% | 61.67% |

From the above results and discussion, the SVM algorithm outperformed all the other variants of the proposed model as it provided better regularization and generalization capabilities by handling non-linear data efficiently [

In this paper, we have proposed an efficient solution for enhancing the positioning accuracy and efficiency of an IPS. Large training sets are key for obtaining better results in machine learning problems. Therefore, we have used the largest database available online created by authors in [

In the future, we are interested in investigating the location of a mobile user through reinforcement learning algorithms. Moreover, the proposed method can also be applied to neural network algorithms like Dynamic Nearest Neighbor and decision tree algorithms like Random Forest to mitigate the impact of environmental factors. Spearman, Minkowski, Chebyshev, and Manhattan distances can also be used instead of the Euclidean distance with the current matching algorithms to minimize the positioning errors. Even larger datasets can be used to improve the overall performance of the proposed model.
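Three of the alternative distances mentioned above are simple to state for two RSSI vectors, as the following sketch shows. The function names and sample vectors are assumptions for illustration; the Spearman distance, which is based on rank correlation, is omitted here.

```python
def manhattan(a, b):
    """Sum of absolute differences between two RSSI vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference over all APs."""
    return max(abs(u - v) for u, v in zip(a, b))


def minkowski(a, b, p=2):
    """Generalized distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


stored = [-40, -70]
query = [-43, -66]

d_manhattan = manhattan(stored, query)
d_chebyshev = chebyshev(stored, query)
d_euclidean = minkowski(stored, query, p=2)
```

Any of these functions could replace the Euclidean distance in a nearest-neighbor matcher without changing the rest of the matching procedure.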

We are thankful to J. Torres-Sospedra for making the database publicly available to us.