The training and validation sets are composed of several vectors in a multidimensional feature space. These features represent the received signal strength indicator (RSSI) for each wireless access point (WAP) detected. Every one of these vectors corresponds to a position in the space, given by its longitude and latitude.

The probabilistic approach to Wi-Fi fingerprinting is based on collecting several RSSI measurements for every WAP detected at a particular location. For each WAP, this group of measurements is then fitted to a normal (Gaussian) distribution, so that the signal intensity distribution is characterized by the mean and the standard deviation of the Gaussian fit.
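As a minimal sketch of this fitting step, the following Python snippet groups the measurements by location and stores a mean and standard deviation per WAP. The function name `fit_gaussians`, the `(location, rssi_vector)` input layout, and the toy values are assumptions made for illustration; the use of the sample standard deviation is also an assumption, since the text does not say which estimator is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_gaussians(fingerprints):
    """Fit one Gaussian per (location, WAP) pair.

    fingerprints: list of (location, rssi_vector) pairs (hypothetical layout).
    Returns {location: [(mu, sigma), ...]} with one tuple per WAP.
    Each location needs at least two measurements for stdev to be defined.
    """
    by_location = defaultdict(list)
    for location, rssi in fingerprints:
        by_location[location].append(rssi)

    model = {}
    for location, rows in by_location.items():
        # zip(*rows) turns the list of vectors into one column per WAP
        model[location] = [(mean(col), stdev(col)) for col in zip(*rows)]
    return model


# Toy data: two locations, two WAPs (values invented for the example)
samples = [
    ((0.0, 0.0), [40, 10]),
    ((0.0, 0.0), [44, 14]),
    ((0.0, 0.0), [42, 12]),
    ((5.0, 0.0), [20, 30]),
    ((5.0, 0.0), [24, 30]),
]
model = fit_gaussians(samples)
```

Note that a WAP whose readings never change at a location yields a standard deviation of 0, which has to be handled (for instance with a small floor value) before the distribution can be evaluated.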

Then, given a feature vector \(i\) from the validation set, the probabilistic fingerprinting algorithm selects the k most probable locations using the previously fitted training distributions. For the j-th WAP measurement (\(v_{i,j}\)) of the validation vector, the probability (\(p_{i,j}\)) of belonging to the n-th training distribution, determined by its mean (\(\mu_{n,j}\)) and standard deviation (\(\sigma_{n,j}\)), is:

\[\begin{aligned} p_{i,j} = \int_{v_{i,j}-\delta}^{v_{i,j}+\delta} {e^{-(x-\mu_{n,j})^2/2\sigma_{n,j}^2} \over {\sigma_{n,j}\sqrt{2\pi}}}dx \end{aligned}\]

where \(\delta\) is a parameter that accounts for the size of the steps in which the RSSI is measured. The set of all probabilities \(p_{i,j}\) obtained for a given observation \(i\) expresses the similarity between the observation and the training survey for a particular location. This application uses the sum of the probabilities as the measure of this similarity, but other functions may be used instead.
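The integral of a Gaussian density over an interval equals the difference of its cumulative distribution function at the two endpoints, so \(p_{i,j}\) can be evaluated with the error function. The sketch below illustrates this in Python; the names `normal_cdf`, `interval_probability` and `similarity`, and the default `delta=0.5`, are assumptions chosen for the example, and \(\sigma\) is assumed to be strictly positive.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2); requires sigma > 0."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(v, mu, sigma, delta=0.5):
    """Probability mass of N(mu, sigma^2) on [v - delta, v + delta]."""
    return normal_cdf(v + delta, mu, sigma) - normal_cdf(v - delta, mu, sigma)


def similarity(observation, gaussians, delta=0.5):
    """Sum of the per-WAP probabilities for one training location.

    observation: RSSI values, one per WAP.
    gaussians: (mu, sigma) tuples for the same WAPs at that location.
    """
    return sum(
        interval_probability(v, mu, sigma, delta)
        for v, (mu, sigma) in zip(observation, gaussians)
    )
```

An observation that sits on the fitted means scores higher than one several standard deviations away, which is the property the location ranking relies on.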

This particular implementation of the algorithm works as follows:

First, the training set is fitted to Gaussian distributions: for each WAP at a given position, the mean and standard deviation of the distribution are stored. To estimate the position of a validation measurement, the k most probable locations are calculated, and the predicted position for the observation is estimated as the weighted average of the selected positions. The weights are assigned so that the most probable positions contribute more to the average than the less probable ones.
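The estimation step can be sketched as follows. This is an illustration under stated assumptions, not the implementation described above: the name `estimate_position`, the `{position: [(mu, sigma), ...]}` model layout, the toy values, and in particular the choice of using the similarity scores themselves as weights are all hypothetical, since the text only requires that more probable positions weigh more.

```python
import math


def interval_probability(v, mu, sigma, delta):
    """Probability mass of N(mu, sigma^2) on [v - delta, v + delta]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(v + delta) - cdf(v - delta)


def estimate_position(observation, model, k=3, delta=0.5):
    """Weighted average of the k most probable training positions.

    model: {(x, y): [(mu, sigma), ...]} with one (mu, sigma) per WAP.
    Weights are the similarity scores (an assumed weighting scheme).
    """
    scored = []
    for position, gaussians in model.items():
        score = sum(
            interval_probability(v, mu, sigma, delta)
            for v, (mu, sigma) in zip(observation, gaussians)
        )
        scored.append((score, position))

    # Keep the k positions with the highest similarity score
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    total = sum(score for score, _ in top)
    if total == 0.0:
        # All scores underflowed: fall back to an unweighted average
        top = [(1.0, position) for _, position in top]
        total = float(len(top))

    x = sum(score * position[0] for score, position in top) / total
    y = sum(score * position[1] for score, position in top) / total
    return (x, y)


# Toy model: three positions, two WAPs (values invented for the example)
toy_model = {
    (0.0, 0.0): [(60.0, 5.0), (50.0, 5.0)],
    (10.0, 0.0): [(30.0, 5.0), (20.0, 5.0)],
    (0.0, 10.0): [(55.0, 5.0), (45.0, 5.0)],
}
nearest = estimate_position([60, 50], toy_model, k=1)
blended = estimate_position([60, 50], toy_model, k=2)
```

With `k=1` the estimate collapses to the single best-matching position; with `k=2` it is pulled towards the second-best position in proportion to that position's score.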


Let’s see how to implement the algorithm with an example. Generally, RSSI values are reported as negative numbers, where the closer the value is to 0, the stronger the received signal. For convenience, in this example RSSI is stored as a positive integer value ranging from 0 (no signal received) to 100.

The training set contains 50 measurements for each of 10 different locations. This is a summary of the signal values for the 500 observations of the set:

##       WAP1            WAP2            WAP3             WAP4       
##  Min.   : 0.00   Min.   : 0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:10.00   1st Qu.: 0.00   1st Qu.: 15.00   1st Qu.: 15.00  
##  Median :20.00   Median : 5.00   Median : 30.00   Median : 30.00  
##  Mean   :20.92   Mean   : 7.28   Mean   : 35.53   Mean   : 33.44  
##  3rd Qu.:30.00   3rd Qu.:10.00   3rd Qu.: 60.00   3rd Qu.: 55.00  
##  Max.   :80.00   Max.   :45.00   Max.   :100.00   Max.   :100.00  
##       WAP5            WAP6            WAP7            WAP8      
##  Min.   : 0.00   Min.   : 0.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.00   1st Qu.: 0.00   1st Qu.: 0.00   1st Qu.:10.00  
##  Median : 5.00   Median :10.00   Median :10.00   Median :20.00  
##  Mean   : 6.82   Mean   :14.82   Mean   :12.56   Mean   :21.48  
##  3rd Qu.:10.00   3rd Qu.:25.00   3rd Qu.:20.00   3rd Qu.:35.00  
##  Max.   :40.00   Max.   :65.00   Max.   :75.00   Max.   :85.00  
##       WAP9           WAP10      
##  Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.00   1st Qu.: 0.00  
##  Median : 0.00   Median : 0.00  
##  Mean   : 6.01   Mean   : 6.61  
##  3rd Qu.:10.00   3rd Qu.:10.00  
##  Max.   :55.00   Max.   :45.00

and this is the validation example:

## 1    0    0   60   50    0   40   15   20    0     0      1054     3180

This plot shows the position of the measurements for both training and validation sets.