POLITECNICO DI MILANO
School of Industrial and Information Engineering
MSc in Computer Science and Engineering
Pairing Wi-Fi and Bluetooth
MAC addresses through passive packets
capture
ANTLab
Advanced Network Technologies LABoratory
Supervisor: Prof. Alessandro Enrico Cesare Redondi
Master thesis by:
Edoardo Longo, ID 841677
Academic year 2016-2017
Abstract
Nowadays the majority of smart devices (e.g. smartphones, tablets, personal computers) use wireless communication, especially Bluetooth and Wi-Fi. These network interfaces are uniquely identified by a 48-bit identifier, called the Media Access Control (MAC) address. Since every device is identified by a different Bluetooth and Wi-Fi MAC address, analyzing MAC addresses provides useful statistical data such as crowd density, travel time estimates and indoor positions. The two addresses are found in different broadcast packets: the Wi-Fi MAC address is contained in probe requests, while the Bluetooth one is visible during an inquiry scan or when establishing a connection.
The goal of this thesis is to pair a Wi-Fi MAC address with a Bluetooth MAC address and, in particular, to understand how Wi-Fi and Bluetooth signals are related. We propose and evaluate a system composed of a sensor network of capturing devices and of algorithms capable of pairing Wi-Fi and Bluetooth MAC addresses. The conditions that influence measurement accuracy are studied first; then two experiments, one in a controlled scenario and one in a real scenario, are performed. We show that the algorithms are accurate enough to allow the pairing. We also analyze a possible Bluetooth attack scenario using our system.
Contents
1 Introduction
2 State of the Art
2.1 Localization
2.2 Privacy
2.3 Attacks
3 Technical Overview and System Architecture
3.1 Wi-Fi
3.1.1 Passive Scanning
3.1.2 Active Scanning
3.1.3 Probe Request Structure
3.2 Bluetooth
3.2.1 Bluetooth Connections
3.2.2 Discover a Bluetooth device
3.2.3 Bluez
3.2.4 Inquiry with RSSI and hcitool RSSI
3.2.5 l2ping
3.3 MAC Address
3.4 System Architecture
4 Experiments and Algorithms
4.1 Preliminary experiments
4.1.1 Results
4.1.2 Home experiment parameters
4.2 Home experiment
4.3 Algorithms
4.3.1 Normalization
4.3.2 RSSI conversion from Bluetooth to Wi-Fi
4.3.3 RSSI conversion from Bluetooth and Wi-Fi to distance
4.3.4 Trilateration
4.3.5 Fingerprint
4.4 Results
4.4.1 Top-k value
4.4.2 Adding anchors
4.4.3 Receiver Operating Characteristic
5 Real Scenario Experiment
5.1 The environment
5.2 The devices
5.3 Execution
5.4 Results
5.4.1 Top-k values
5.4.2 Receiver Operating Characteristic
6 Blended attack scenario
6.1 Attack scenario
6.1.1 Discover the Wi-Fi and infer the Bluetooth MAC address
6.2 Attacks
6.2.1 Denial of Service
6.2.2 Battery Exhaustion Attack
7 Conclusions
Bibliography
Chapter 1
Introduction
The use of smartphones, tablets, laptops and other smart devices is spreading more and more in everyday life. People are always connected, and almost everything can be done remotely through smartphones. Connectivity is what makes these operations possible: it is used to access the Internet, to share files, to use mobile applications, to make phone calls, to play music, to provide Internet tethering and for other useful features.
In order to carry out these operations, nowadays the majority of smartphones, laptops and portable electronic devices use wireless communication, especially Bluetooth and Wi-Fi. Bluetooth technology is useful when transferring information between two or more devices that are near each other and speed is not a concern. It is best suited to low-bandwidth applications such as transferring sound data with telephones (e.g. with a Bluetooth headset), byte data with hand-held computers (file transfers), or input from a keyboard and mouse. Wi-Fi is suited to full-scale networks: it enables faster connections, a longer range from the base station and good wireless security. For these reasons Wi-Fi technology powers most home networks, many business local area networks and public hotspot networks.
Every network adapter (Wi-Fi, Bluetooth, but also Ethernet) is uniquely identified by a 48-bit identifier, called the Media Access Control (MAC) address (other technologies, such as ZigBee, use longer 64-bit addresses). It is embedded into the network hardware during the manufacturing process, or stored in firmware, and is designed not to be modified. Hence, every smart device has a pair of MAC addresses, one for the Wi-Fi interface and one for the Bluetooth interface, each of which uniquely identifies the device.
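To make the 48-bit structure concrete, the following Python sketch parses a MAC address string into an integer and extracts two pieces of information that matter later in this thesis: the 24-bit Organizationally Unique Identifier (OUI) in the upper half, which identifies the vendor, and the locally administered (U/L) bit of the first octet, which randomized Wi-Fi addresses typically set. The helper names (`parse_mac`, `oui`, `is_multicast`, `is_locally_administered`) are illustrative and not part of any library; the bit conventions are those of IEEE 802 addressing.

```python
def parse_mac(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' (':' or '-' separated) into a 48-bit integer."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    value = 0
    for octet in octets:
        byte = int(octet, 16)
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"invalid octet {octet!r} in {mac!r}")
        value = (value << 8) | byte
    return value


def oui(mac: str) -> str:
    """Return the Organizationally Unique Identifier (upper 24 bits) as 'AA:BB:CC'."""
    top24 = parse_mac(mac) >> 24
    return ":".join(f"{(top24 >> shift) & 0xFF:02X}" for shift in (16, 8, 0))


def first_octet(mac: str) -> int:
    return parse_mac(mac) >> 40


def is_multicast(mac: str) -> bool:
    """I/G bit (least significant bit of the first octet): 1 = group address."""
    return bool(first_octet(mac) & 0x01)


def is_locally_administered(mac: str) -> bool:
    """U/L bit (second least significant bit of the first octet): 1 = locally
    administered, as typically used by randomized Wi-Fi MAC addresses."""
    return bool(first_octet(mac) & 0x02)


print(oui("14:10:9F:d5:04:01"))                      # 14:10:9F
print(is_locally_administered("14:10:9F:d5:04:01"))  # False
print(is_locally_administered("da:a1:19:00:00:01"))  # True
```

The OUI is what allows the vendor of a device to be looked up from a captured address, while a set U/L bit is a strong hint that the observed Wi-Fi address is not the factory one.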
The goal of this thesis is to pair a Wi-Fi MAC address with a Bluetooth MAC address and, in particular, to understand how Wi-Fi and Bluetooth signals are related. Indeed, a Bluetooth and a Wi-Fi MAC address coming from the same device cannot be immediately related to each other, because the two addresses are different.
A sensor network of capturing devices was implemented for this purpose. It was composed of several Raspberry Pis (single-board computers) that capture Wi-Fi and Bluetooth signals, which are later analyzed by different off-line algorithms. The results of the algorithms showed that it is possible to link the Wi-Fi and the Bluetooth MAC addresses.
In order to link the MAC addresses we use Bluetooth connection parameters and Wi-Fi probe requests. Bluetooth allows two or more devices to communicate with each other. To establish a connection between the devices, the target MAC address must be known. The MAC address is found using an inquiry scan, which shows various device details including the MAC address, the device name and the services it supports. In addition to this information, the Bluetooth stack makes it possible to read some connection parameters that are useful for the scope of this thesis and to localize a device (e.g. RSSI, RX power level, Transmit Power Level (TPL), Link Quality).
Wi-Fi interfaces need to be connected to a network in order to provide connectivity. Every minute, smartphones search for the presence of Wi-Fi networks to connect to [10]. This operation generates probe request traffic. A probe request is a special network packet containing useful information, including the device MAC address, the Access Point (AP) MAC address and the SSIDs of previously used networks, and whoever captures it can also measure its Received Signal Strength. Probe requests are sent in broadcast and can easily be captured by another device, in our case the network of Raspberry Pis.
Privacy is a crucial issue, because the data described above reveal a lot of information about the device owner: from the device name it is possible to discover the device model or the owner's name; from the RSSI, the location can be inferred; the list of past SSIDs shows the names of the Wi-Fi networks to which the owner was previously connected, and from this information social analysis can be performed [2].
Collecting data by capturing the MAC addresses of wireless technologies has recently been applied [1]. The problem is that Bluetooth and Wi-Fi MAC addresses are completely unrelated; therefore it is difficult to carry out a cross-study of the two technologies and, in particular, to treat the data as if they came from the same source.
To cover this gap, this thesis aims to link the Bluetooth and the Wi-Fi MAC addresses using Wi-Fi probes and Bluetooth connection parameters. The possibility of pairing two different MAC addresses has several implications. It can lead to a more accurate indoor localization system, because using two technologies can increase the precision of the position estimate through different approaches. It can also be a weapon for malicious attackers, who can carry out blended attacks on both interfaces, such as denial of service (DoS) and battery drain attacks, or exploit other vulnerabilities. The pairing process can also perform a sort of de-randomization of the Wi-Fi MAC address (randomization replaces the real address with a fake one): if we know that a random Wi-Fi MAC address is related to a real Bluetooth MAC address, we can infer the real Wi-Fi address and break the MAC address randomization performed by some vendors.
During this thesis, in order to pair the two MAC addresses, a wireless sensor network and different algorithms are implemented. The sensor network is composed of up to six Raspberry Pis that are in charge of capturing the Bluetooth and Wi-Fi signals, in particular the Received Signal Strength Indicator (RSSI). We create five different algorithms to link the MAC addresses. The algorithms aim to link a Bluetooth signal coming from a device to a Wi-Fi signal coming from the same device. To do so, the system uses two datasets of device RSSI captured by our sensor network, one for Bluetooth and one for Wi-Fi; notice that these two sets are completely disjoint.
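As a rough illustration of the pairing idea (not the five algorithms of Chapter 4), the sketch below assumes that each device is summarized by one mean RSSI value per Raspberry Pi, so that every Bluetooth MAC and every Wi-Fi MAC becomes a vector with one entry per sensor. Each vector is z-score normalized, an assumption meant to compensate for the different scales of Bluetooth and Wi-Fi RSSI, and Wi-Fi candidates are ranked by Euclidean distance, returning the top-k for each Bluetooth address. The labels `BT-A`, `WF-1`, etc. and all RSSI values are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def zscore(vec: Sequence[float]) -> List[float]:
    """Normalize a per-sensor RSSI vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    std = math.sqrt(sum((x - mean) ** 2 for x in vec) / len(vec)) or 1.0
    return [(x - mean) / std for x in vec]


def rank_candidates(bt: Dict[str, Sequence[float]],
                    wifi: Dict[str, Sequence[float]],
                    k: int = 3) -> Dict[str, List[str]]:
    """For each Bluetooth MAC, return the k Wi-Fi MACs whose normalized
    RSSI fingerprint is closest in Euclidean distance."""
    wifi_norm = {mac: zscore(vec) for mac, vec in wifi.items()}
    ranking = {}
    for bt_mac, bt_vec in bt.items():
        b = zscore(bt_vec)
        ordered = sorted(wifi_norm, key=lambda mac: math.dist(b, wifi_norm[mac]))
        ranking[bt_mac] = ordered[:k]
    return ranking


# Hypothetical mean RSSI (dBm) measured by four Raspberry Pis
bt = {"BT-A": [-40, -60, -70, -80], "BT-B": [-80, -70, -60, -40]}
wifi = {"WF-1": [-75, -68, -62, -50], "WF-2": [-45, -62, -71, -79]}
print(rank_candidates(bt, wifi, k=1))  # {'BT-A': ['WF-2'], 'BT-B': ['WF-1']}
```

Returning the top-k candidates instead of a single best match leaves room for a later decision step when several Wi-Fi devices produce similar fingerprints.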
The results show that the linking algorithms introduced in this thesis achieve high accuracy in both of the scenarios we tested.
The structure of this thesis is the following. In Chapter 2 we discuss a number of works that are related to ours and that inspired this study. In Chapter 3 we explain the technical details of Wi-Fi and Bluetooth, together with the model of the implemented system. The experiments are presented in Chapter 4 and Chapter 5. In Chapter 4 we first show the preliminary experiment and the study of the Wi-Fi and Bluetooth parameters; then we explain the home experiment and the details of the implemented algorithms, along with the obtained results. In Chapter 5 we explain the experiment performed in ANTLab and the obtained results. Chapter 6 presents a possible, realistic attack scenario using the acquired knowledge. In Chapter 7 we conclude by summarizing the purposes and the final evaluation of this thesis, and we propose some suggestions for future work.
Chapter 2
State of the Art
This chapter describes related work on Wi-Fi and Bluetooth. To date, no cross analysis of Wi-Fi and Bluetooth MAC addresses is present in the literature, but many studies have addressed the two technologies separately.
There are three main thematic areas:
• localization;
• privacy;
• attacks.
2.1 Localization
Tracking people through Bluetooth or Wi-Fi signals has been discussed previously in the literature. These signals are typically used for indoor localization, because inside buildings the Global Positioning System (GPS) is not suitable due to the presence of roofs and walls.
Density estimation in crowded mass events has been studied using Bluetooth or Wi-Fi scans from collaborating smartphones inside the crowd. Zhu et al. [13] developed a crowd-sourcing localization system that uses both Wi-Fi scene analysis and Bluetooth beacons. The system uses Wi-Fi fingerprints (RSSI), while Bluetooth beacons are only used to share the location of a device and to populate a signal map.
An interesting study was performed in a German airport. Using the ground truth provided by the security check process, Schauer et al. [22] discussed the quality and feasibility of pedestrian flow estimation with both Wi-Fi and Bluetooth. They used inquiry scans and probe collection to capture Bluetooth and Wi-Fi MAC addresses respectively. Their results showed that Wi-Fi is a good estimator of pedestrian flow, whereas Bluetooth is not adequate for a reliable flow estimation system. The inaccuracy of Bluetooth is probably due to the use of inquiry scans, which can only locate visible devices.
Another confirmation that Wi-Fi allows good indoor localization comes from Ruiz et al. [21]. They localize devices in a hospital, using the Access Points to capture the traffic. Using a trilateration algorithm, they obtain a mean error of 15 meters.
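Trilateration can be sketched as a small least-squares problem. The Python function below is a minimal 2-D version, assuming that the distances from the device to at least three anchors with known positions have already been estimated (e.g. from RSSI); subtracting the first circle equation from the others yields a linear system that is solved through its normal equations. It is an illustrative sketch, not the implementation used by Ruiz et al.

```python
import math
from typing import List, Sequence, Tuple


def trilaterate(anchors: Sequence[Tuple[float, float]],
                distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares 2-D position from >= 3 anchors and estimated distances."""
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least 3 anchors and one distance per anchor")
    (x0, y0), d0 = anchors[0], distances[0]
    # (x - xi)^2 + (y - yi)^2 = di^2 minus the same equation for anchor 0
    # gives the linear row a*x + b*y = c
    rows: List[Tuple[float, float, float]] = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2 * (xi - x0)
        b = 2 * (yi - y0)
        c = d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2
        rows.append((a, b, c))
    # Normal equations (A^T A) p = A^T c, solved with Cramer's rule
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not determined")
    return (sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, math.sqrt(65), math.sqrt(45)]  # exact distances from (3, 4)
print(trilaterate(anchors, distances))  # approximately (3.0, 4.0)
```

With noisy RSSI-based distances the circles no longer meet in one point, which is why the least-squares formulation, rather than an exact intersection, is used here.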
As we can see, localization using Wi-Fi is possible and has already been studied. Bluetooth needs a separate discussion.
Naini et al. [19] conducted an experiment in which ten attendees of an open-air music festival acted as Bluetooth scanners. The selected attendees were equipped with a mobile phone programmed to scan for and capture Bluetooth devices with their visibility turned on. By comparing their estimate with ground truth information collected at the festival entrances, Naini et al. show that the total population can be estimated with a surprisingly low error (1.26% in this experiment).
Similar experiments performed by Weppner [26] and by Bullock [4] confirm the possibility of using Bluetooth as a crowd indicator.
More interesting for our research is the discussion of Bluetooth signal parameters for localization by Hossain et al. [12]. According to their analysis and experimental results, RSSI and Transmit Power Level turn out to be poor candidates for localization. On the other hand, RX Power Level correlates nicely with distance, which makes it the most desirable Bluetooth signal parameter for location systems. In our opinion, they discard RSSI because of a methodological error. In fact, they use a Class 1 dongle to get the RSSI of devices within 18 meters. As we will see below, Class 1 devices have a range of up to 100 meters, so the measured devices always stay inside the Golden Receive Power Range (GRPR) and the reported RSSI is always 0.
Subhan et al. [23] confirmed that it is possible to find the relationship between RX power level and distance. They demonstrated that the conversion between RX power level and RSSI is possible if the upper and lower bounds of the GRPR are known. Using trilateration and fingerprinting, combined with a gradient filter in the measurement stage, they reduced the average error to 2.67 meters. A similar result is obtained by Chai [6], who uses pre-processed BLE RSSI, Kalman filtering and a triangulation algorithm to calculate the location of a mobile device. Experimental results show that this algorithm achieves a positioning accuracy of 0.2 to 0.5 m.
These studies show that distance estimation is not possible with raw RSSI values, but becomes possible with averaged RSSI data [14].
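To illustrate why filtered RSSI behaves better than raw samples, the sketch below implements a one-dimensional Kalman filter of the kind mentioned above, assuming the device is (almost) static so that the true RSSI is modelled as constant. The noise variances `q` and `r` are arbitrary illustrative values that would have to be tuned on real measurements; this is not Chai's implementation.

```python
from typing import Iterable, List, Optional


def kalman_smooth(samples: Iterable[float], q: float = 0.01, r: float = 4.0) -> List[float]:
    """Scalar Kalman filter for a (nearly) constant RSSI.
    q: process noise variance, r: measurement noise variance (illustrative values)."""
    estimates: List[float] = []
    x: Optional[float] = None  # current RSSI estimate
    p = r                      # variance of the estimate
    for z in samples:
        if x is None:
            x = float(z)       # initialise with the first measurement
        else:
            p += q             # predict: RSSI unchanged, uncertainty grows by q
            k = p / (p + r)    # Kalman gain
            x += k * (z - x)   # correct with the new measurement
            p *= 1 - k
        estimates.append(x)
    return estimates


raw = [-55, -65] * 10          # noisy readings around -60 dBm
print(kalman_smooth(raw)[-1])  # a value close to -60
```

As the gain shrinks, each new sample moves the estimate less, so the output converges towards the average level instead of jumping by 10 dB at every reading.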
As the previous research shows, Wi-Fi is a strong technology for localization. Bluetooth research has produced inconsistent results, but the majority of studies confirm that Bluetooth can be used for indoor localization purposes.
2.2 Privacy
Bluetooth and Wi-Fi bring not only benefits, such as localization, but also critical challenges, such as privacy. Collecting data by capturing wireless technologies involves the exchange of MAC addresses, which are unique identifiers of the technology and can be associated with a specific person. The MAC address is easily visible in Wi-Fi probes and in Bluetooth signals, because it is sent in broadcast and without encryption [25]. Some mobile devices send probe requests as often as 55 times per hour, thus revealing their unique MAC address at high frequency [10].
These properties allow MAC address scanning to provide significant information about the spatiotemporal dynamics of people's movements [1]. A mobile phone also broadcasts the list of Wi-Fi networks (SSIDs) saved on the device. This list can be used to classify people, to extract social connections among smartphone owners and to uncover the underlying social network of the participants in a venue. It is also possible to understand the international nature of an event and the density of foreign participants, or to analyze how frequently a person travels.
Another interesting topic is the distribution of smartphone vendors across events and the analysis of the expected socioeconomic background of the participants. Along these lines, Barbera et al. [2] developed an automated methodology to derive the underlying relationship graphs between the users in each scenario. They also performed language detection on the broadcast SSIDs and exploited the vendor ID to show how probes directly reflect the sociological aspects of the people involved in each scenario, including nationality, age and socioeconomic status.
This information can be combined with WiGLE (https://wigle.net/), which makes it possible to discover where a Wi-Fi network is located starting from its name. Using the MAC address and the probe requests it is also possible to discover the name of a person or the vendor of a device.
Bluetooth is also affected by privacy issues. During an inquiry scan it is possible to discover personal information such as the device name (which sometimes corresponds to the owner's name) and the device model.
Mei et al. developed a travel time estimation method based on Bluetooth MAC addresses [17], which allows a potential attacker to follow the movements of a target. Tracking people's movements is also possible using Wi-Fi: Cunche [8] presents methods that, given an individual of interest, make it possible to identify the MAC address of their Wi-Fi device.
These privacy issues are mitigated by Wi-Fi MAC address randomization. In order to impede tracking and reduce privacy risks, some vendors implement MAC address randomization in their devices: under some conditions (e.g. when the screen is turned off), the broadcast MAC address is replaced with a fake address. This technique is adopted only by a few vendors (e.g. Apple, Motorola and a few other Android vendors). Nevertheless, Martin et al. [15] showed a method that can be used to track 100% of devices using randomization, regardless of manufacturer, by taking advantage of a previously unknown flaw in the way existing wireless chipsets handle low-level control frames.
As regards Bluetooth, [9] suggests that Bluetooth address randomization is unlikely to be implemented, as it would adversely affect existing implementations. The Bluetooth defense mechanism is the non-visible mode: a device can have its Bluetooth interface turned on without being visible, which keeps it hidden from inquiry scans. Recent studies [7] demonstrated that, using Ubertooth One, a low-cost open-source Bluetooth development platform, it is possible to discover up to ten times as many hidden devices compared with a normal inquiry scan.
2.3 Attacks
The issues discussed above allow a malicious attacker to exploit the presented vulnerabilities in different ways. The most trivial attack is the stalker attack, which consists of following a person at a reasonable distance with a monitoring device to find out their unique MAC address [8]. In addition, Wi-Fi routers can easily be turned into Wi-Fi tracking devices through software modification [20], which can be used to follow a person's path.
A common attack is Denial of Service on battery-powered mobile devices. The attack can be performed on Wi-Fi, on Bluetooth or with a blended approach. Moyers et al. [18] demonstrate that these attacks can accelerate battery depletion by as much as 18.5%. On Wi-Fi, ping floods, ACK floods and SYN floods are used; on Bluetooth, l2ping, bluesmack, bluespam and blueper floods are used. The two types of attack can be blended with each other.
Since late 2003, several security issues have been found in various implementations of the Bluetooth stack. The most common are [5]:
• BlueSnarf, which allows an attacker to access the vulnerable device's phone book and calendar without authentication. A recently upgraded version of this attack gives the attacker full read-write access.
• Bluejacking, which allows an attacker to access the phone book and also the files on the device, using the principle of hijacking.
• BlueBug, which gives access to the cell phone's command set, letting an aggressor use the phone's services, including placing outgoing calls, sending, receiving or deleting SMS messages, diverting calls, and so on.
• BlueBump, which takes advantage of a weakness in the handling of Bluetooth link keys, giving devices that are no longer authorized the ability to access services as if they were still paired with the target device. It can lead to data theft or to the abuse of mobile Internet connectivity services.
Chapter 3
Technical Overview and System Architecture
3.1 Wi-Fi
Wi-Fi is a technology for wireless local area networking with devices based on the IEEE 802.11 standards. In the 2.4 GHz band (802.11b/g), Wi-Fi operates over 11 channels in the USA and over 13 channels in Europe, three of which do not overlap (1, 6, 11). Figure 3.1 shows how the channels are arranged. Adjacent channel centres are only 5 MHz apart, but each transmission occupies a band about 22 MHz wide around its centre frequency, so channels must be 25 MHz (five channel numbers) apart not to overlap. Using different non-overlapping channels reduces collisions between Wi-Fi packets.
Figure 3.1: Graphical representation of Wireless LAN (Wi-Fi) channels in 2.4 GHz band
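The channel arrangement can be checked numerically. In the 2.4 GHz band, channel n (1 to 13) is centred at 2407 + 5n MHz, and channel 14 (Japan only) at 2484 MHz. The sketch below treats two channels as overlapping when their centres are closer than the occupied bandwidth, which is passed as a parameter; 22 MHz, the nominal width of an 802.11b channel, is used as the default. It is a simplified geometric check that ignores spectral masks.

```python
def channel_center_mhz(channel: int) -> int:
    """Centre frequency (MHz) of a 2.4 GHz Wi-Fi channel."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:  # Japan only
        return 2484
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def overlap(ch_a: int, ch_b: int, width_mhz: float = 22.0) -> bool:
    """True if the bands of width width_mhz centred on the two channels intersect."""
    return abs(channel_center_mhz(ch_a) - channel_center_mhz(ch_b)) < width_mhz


print([channel_center_mhz(c) for c in (1, 6, 11)])  # [2412, 2437, 2462]
print(overlap(1, 6), overlap(1, 5))                 # False True
```

Channels 1, 6 and 11 are 25 MHz apart, so none of their 22 MHz bands intersect, whereas channels 1 and 5 (20 MHz apart) do.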
Wi-Fi also supports the 5 GHz band (802.11n), with 21 channels and higher capacity, but a shorter range compared to 2.4 GHz. Modern devices can switch between 2.4 GHz and 5 GHz depending on traffic demand, using a technique called band steering.
When a smartphone or a laptop wants to access the Internet through Wi-Fi, it needs to connect to an Access Point (AP).
For this reason, every device with its Wi-Fi interface turned on regularly broadcasts Wi-Fi probe requests in order to advertise its presence and actively discover Wi-Fi access points in proximity. This mechanism is called active scanning and allows devices to build a list of nearby access points.
IEEE 802.11 defines another mechanism to discover Wi-Fi APs: a passive mechanism, in which APs periodically advertise their presence to mobile devices using beacons.
3.1.1 Passive Scanning
When a device performs passive scanning, it listens on the 11 Wi-Fi channels, hopping periodically from one to another, and passively detects nearby APs. After capturing a beacon, the mobile device can join the network by sending an association request to the AP.
Beacons contain network configuration parameters, such as the Service Set Identifier (SSID), the type of encryption and the supported data rates. The beacon interval is not fixed: most APs use an interval of about 100 ms (100 time units of 1024 µs, i.e. 102.4 ms), but it depends on the hardware and its configuration.
The main disadvantage of passive scanning is that the device must listen on all eleven channels. This operation is time-consuming and does not guarantee that all beacons are captured.
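The cost of passive scanning can be estimated with simple arithmetic. Beacon intervals are expressed in time units (TU) of 1024 µs, and a common default is 100 TU (about 102.4 ms). Assuming that the device dwells on each channel for at least one beacon interval, so that every AP has a chance to be heard (a simplification: real drivers choose their own dwell times), a full scan of the 11 channels takes more than one second:

```python
TU_SECONDS = 1024e-6  # one IEEE 802.11 time unit (TU) = 1024 microseconds


def passive_scan_time(channels: int = 11, beacon_interval_tu: int = 100,
                      intervals_per_channel: int = 1) -> float:
    """Rough lower bound (seconds) for a full passive scan, assuming the radio
    dwells on each channel for a whole number of beacon intervals."""
    dwell = intervals_per_channel * beacon_interval_tu * TU_SECONDS
    return channels * dwell


print(round(passive_scan_time(), 4))                         # 1.1264
print(round(passive_scan_time(intervals_per_channel=2), 4))  # 2.2528
```

Doubling the dwell time to reduce missed beacons doubles the scan time, which is the trade-off that makes active scanning attractive.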
3.1.2 Active Scanning
During active scanning, the mobile device solicits responses from nearby access points by sending probe requests. The probe packet includes the device's unique identifier, the standards supported by the device, the probe sequence number (SN) and other fields. The probe can be directed to all APs (broadcast) or to a specific access point by indicating its SSID.
Active scanning is particularly helpful in scenarios where a mobile device roams across APs. It is also faster and less energy-consuming than passive scanning, because fewer packets are lost.
Moreover, active scanning with a probe that indicates the access point SSID is the only way to connect to a hidden network.
3.1.3 Probe Request Structure
Figure 3.2 shows the packet structure of a probe request. The relevant fields are:
• Frame Ctrl: the frame type and subtype (type 0, management; subtype 4, probe request) and some flags;
• Address 1: the receiver MAC address, usually broadcast (FF:FF:FF:FF:FF:FF);
• Address 2: the sender MAC address, i.e. the device MAC address;
• Address 3: the Access Point MAC address (BSSID);
• Sequence Control: the 12-bit sequence number (SN) that identifies a single probe request, plus a 4-bit fragment number;
• Frame Body: the tagged parameters, including the SSID being probed for (empty in a broadcast probe) and the supported rates;
• FCS: the Frame Check Sequence, a CRC-32 used to detect transmission errors.
Figure 3.2: Probe request packet structure
Directed probe requests carry, in the frame body, the SSID of a Wi-Fi network to which the device was previously connected, so across several probes a device reveals its list of known networks. This allows a faster connection between device and access point; on the other hand, it helps reveal the origin of the device and the places its owner visited.
Table 3.1 shows a realistic example of probe requests. Probe requests follow the IEEE 802.11 standard and are not encrypted. In this example, a device with MAC address 14:10:9F:d5:04:01 is broadcasting a probe request with SSID polimi-protected and sequence number equal to 12.
Table 3.1: Example of Wi-Fi probe requests
Frame Ctrl | Duration | Destination       | Source            | BSSID             | SN  | SSID
...        | ...      | ff:ff:ff:ff:ff:ff | 14:10:9F:d5:04:01 | ff:ff:ff:ff:ff:ff | 12  | polimi-protected
...        | ...      | ff:ff:ff:ff:ff:ff | 88:30:8a:49:db:0d | ff:ff:ff:ff:ff:ff | 245 | null
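The field layout described above can be decoded with a few lines of Python. The sketch below parses a bare 802.11 frame (the 24-byte management header followed by tagged parameters), assuming that any capture header, such as the radiotap header that monitor-mode captures prepend and from which the RSSI is usually read, and the trailing FCS have already been removed. It uses only the standard library; the function names are illustrative, and the frame built at the end is synthetic, mirroring the first row of Table 3.1.

```python
import struct
from typing import Dict, Optional


def fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_probe_request(frame: bytes) -> Optional[Dict[str, object]]:
    """Parse a bare IEEE 802.11 frame (no radiotap header, no FCS).
    Returns None when the frame is not a probe request."""
    if len(frame) < 24:
        raise ValueError("frame shorter than the 24-byte management header")
    # Frame Control and Duration are little-endian 16-bit fields
    fc, duration = struct.unpack_from("<HH", frame, 0)
    frame_type = (fc >> 2) & 0x3
    subtype = (fc >> 4) & 0xF
    if frame_type != 0 or subtype != 4:  # management frame, probe request subtype
        return None
    (seq_ctrl,) = struct.unpack_from("<H", frame, 22)
    ssid = None
    pos = 24  # tagged parameters start right after the header
    while pos + 2 <= len(frame):
        elem_id, length = frame[pos], frame[pos + 1]
        if elem_id == 0:  # SSID element; length 0 means a wildcard (broadcast) probe
            ssid = frame[pos + 2:pos + 2 + length].decode("utf-8", errors="replace")
        pos += 2 + length
    return {
        "duration": duration,
        "destination": fmt_mac(frame[4:10]),
        "source": fmt_mac(frame[10:16]),
        "bssid": fmt_mac(frame[16:22]),
        "sequence_number": seq_ctrl >> 4,   # upper 12 bits
        "fragment_number": seq_ctrl & 0xF,  # lower 4 bits
        "ssid": ssid,
    }


# Synthetic frame mirroring the first row of Table 3.1
broadcast = b"\xff" * 6
frame = (
    b"\x40\x00"                              # Frame Control: type 0, subtype 4
    + b"\x00\x00"                            # Duration
    + broadcast                              # Address 1 (destination)
    + bytes.fromhex("14109fd50401")          # Address 2 (source)
    + broadcast                              # Address 3 (BSSID)
    + struct.pack("<H", 12 << 4)             # Sequence Control: SN = 12
    + bytes([0, 16]) + b"polimi-protected"   # SSID element
    + bytes([1, 4, 0x02, 0x04, 0x0B, 0x16])  # Supported Rates element
)
info = parse_probe_request(frame)
print(info["source"], info["sequence_number"], info["ssid"])
# 14:10:9f:d5:04:01 12 polimi-protected
```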
Number of probe requests
The number of probe requests sent by a mobile phone varies greatly among devices. On average, mobile devices send probe requests as often as 55 times per hour, but some may broadcast about 2000 probes per hour [10].
The frequency of probe requests depends on:
• Wi-Fi chipset: the vendor can set different parameters depending on company policies;
• Device operating system: the OS version and the device settings can affect the number of probes. For example, a fast-connection setting can increase the number of probes, while an energy-saving mode can reduce it;
• Frequency of screen unlocking: unlocking the screen triggers probe activity, which allows a faster device connection;
• Number of applications running on the device: the more applications and programs use Wi-Fi, the more the device is forced to send probe requests to keep its services connected.
3.2 Bluetooth
Bluetooth (IEEE 802.15.1) is a wireless technology standard for exchanging data over short distances between fixed and mobile devices, and for building personal area networks (PANs). Bluetooth originated in 1994, when Jaap Haartsen, an engineer employed at Ericsson, developed it in cooperation with Sven Mattisson. The name comes from Harald Blåtand ("Bluetooth"), the tenth-century king of Denmark and Norway.
The purpose of Bluetooth is to replace cables with a short-range, low-cost radio connection that facilitates communication between mobile devices and peripherals.
Bluetooth is open and royalty-free and, thanks to this, it is widely used for short-range wireless communication in WPAN (Wireless Personal Area Network) scenarios. It operates in the globally unlicensed (but not unregulated) Industrial, Scientific and Medical (ISM) band at 2.4 GHz. Within this band, 79 channels of 1 MHz each are used to transmit data, hopping from one channel to another 1600 times per second in a pseudo-random way.
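The channel plan and hop rate translate into simple numbers: channel k (0 to 78) is centred at 2402 + k MHz, and 1600 hops per second correspond to a 625 µs slot. The sketch below encodes these values; the `toy_hop_sequence` helper is purely illustrative and is NOT the hop-selection kernel of the Bluetooth specification, which derives the sequence from the master's address and clock.

```python
import random
from typing import List

HOPS_PER_SECOND = 1600
SLOT_SECONDS = 1 / HOPS_PER_SECOND  # one hop per 625 microsecond slot


def bt_channel_mhz(k: int) -> int:
    """Centre frequency (MHz) of Bluetooth BR/EDR channel k, 0 <= k <= 78."""
    if not 0 <= k <= 78:
        raise ValueError(f"channel index out of range: {k}")
    return 2402 + k


def toy_hop_sequence(n: int, seed: int = 0) -> List[int]:
    """Illustrative pseudo-random sequence of n channel frequencies.
    NOT the hop-selection kernel defined by the Bluetooth specification."""
    rng = random.Random(seed)
    return [bt_channel_mhz(rng.randrange(79)) for _ in range(n)]


print(bt_channel_mhz(0), bt_channel_mhz(78))  # 2402 2480
print(round(SLOT_SECONDS * 1e6))              # 625
```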
The communication range of Bluetooth and the maximum transmission power are determined by the device Class. As shown in Table 3.2, Class 1 radios have the longest transmission range (100 meters), whereas Class 3 radios have a range of up to 1 meter. In this research, the devices used mostly belong to Class 2 (e.g. smartphones, tablets, laptops), whose internal chipset range is about 10 meters.
Table 3.2: Bluetooth power classes
Class   | Max Transmission Power | Range
Class 1 | 100 mW (20 dBm)        | 100 m
Class 2 | 2.5 mW (4 dBm)         | 10 m
Class 3 | 1 mW (0 dBm)           | 1 m
Bluetooth architecture is based on a master/slave model. A single master device can be connected with up to seven different slave devices to form a network, called a piconet. The master shares its clock with the slaves; it also coordinates and manages the connections in the piconet and sends data to, or requests data from, the slaves.
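The powers in Table 3.2 are given both in milliwatts and in dBm, related by P[dBm] = 10 log10(P / 1 mW). The short conversion below shows, for instance, that the 2.5 mW of Class 2 corresponds to about 3.98 dBm, which the table rounds to 4 dBm.

```python
import math


def mw_to_dbm(p_mw: float) -> float:
    """P[dBm] = 10 * log10(P / 1 mW)."""
    if p_mw <= 0:
        raise ValueError("power must be positive")
    return 10 * math.log10(p_mw)


def dbm_to_mw(p_dbm: float) -> float:
    return 10 ** (p_dbm / 10)


# Maximum transmission power per class, taken from Table 3.2
for cls, p_mw in {1: 100.0, 2: 2.5, 3: 1.0}.items():
    print(f"Class {cls}: {p_mw} mW = {mw_to_dbm(p_mw):.1f} dBm")
# Class 1: 100.0 mW = 20.0 dBm
# Class 2: 2.5 mW = 4.0 dBm
# Class 3: 1.0 mW = 0.0 dBm
```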
3.2.1 Bluetooth Connections
Bluetooth connections can be of two types: Synchronous Connection-Oriented (SCO) or Asynchronous Connection-Less (ACL). SCO is a real-time link, used mainly for voice communication (or combined data and voice). ACL is used to transport data (e.g. file transfers or audio/video streams), and it is the most used type of connection, both in everyday use and in this research. ACL is the basic connection established between a master and a slave; indeed, each device receives a default ACL logical transport when it joins the piconet. The connection must be explicitly set up and accepted between two devices before packets can be transferred [11].
Directly above ACL is the Logical Link Control and Adaptation Protocol (L2CAP) layer. This is a packet-based layer whose primary tasks are: transporting data for higher-layer protocols; providing packet sequencing, segmentation and reassembly; providing one-way transmission management of multicast data to a group of other Bluetooth devices; and supporting Quality of Service (QoS) for higher layers. Once established, an L2CAP connection remains open until it is explicitly closed or the Link Supervision Time Out (LSTO) expires.
L2CAP serves as the transport protocol for RFCOMM, so every RFCOMM connection is encapsulated within an L2CAP connection.
The RFCOMM (Radio Frequency Communications) layer is the reliable stream-based protocol (similar to TCP) used by most Bluetooth applications, and it represents the type of connection most people mean by "Bluetooth connection". It is used directly by many telephony-related profiles as a carrier for AT commands. RFCOMM emulates RS-232 serial ports, and it is necessary for the OBEX transport layer, because OBEX needs a serial transport.
OBEX (OBject EXchange) runs on top of RFCOMM. OBEX is the session-level communication protocol that facilitates data exchange (e.g. Object Push Profile (OPP), File Transfer Profile (FTP), vCard, basic imaging, basic printing, phonebook access, etc.).
Figure 3.3 presents the Bluetooth stack architecture. From bottom to top we find ACL and SCO, the Host Controller Interface (HCI), L2CAP, RFCOMM and, on top, OBEX.
Figure 3.3: Bluetooth protocol layer
3.2.2 Discover a Bluetooth device
In order to start a Bluetooth connection, the target device must be turned on and visible. A device can also have Bluetooth turned on without being visible; in this case pairing is possible only if the target address is known.
To discover visible devices, an inquiry mode has been defined. Basically, a device that wants to set up a Bluetooth connection with another one sends out inquiry packets, and the visible devices listening for them can answer.
A single Bluetooth inquiry scan can last up to 10.24 seconds [1] and, at the end of the scan, zero or more devices may have been discovered.
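The 10.24 s figure follows from how the HCI Inquiry command expresses its duration: the Inquiry_Length parameter counts units of 1.28 s and ranges from 0x01 to 0x30 (1.28 s to 61.44 s), so 10.24 s corresponds to a length of 8. A minimal helper, with a hypothetical name:

```python
INQUIRY_UNIT_S = 1.28  # Inquiry_Length is counted in units of 1.28 s


def inquiry_duration(length_units: int) -> float:
    """Maximum duration in seconds of an HCI Inquiry with the given Inquiry_Length."""
    if not 0x01 <= length_units <= 0x30:
        raise ValueError("Inquiry_Length must be between 0x01 and 0x30")
    return length_units * INQUIRY_UNIT_S


print(inquiry_duration(8))  # 10.24
```

Because an inquiry can take this long, a scanner that also keeps connections open (as needed for RSSI readings) has to budget its time between the two activities.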
The inquiry scan, called Inquiry with RSSI, provides information about:
• Device name: the name that the owner assigns to the device;
• Device profile: the type of the device (e.g. phone, laptop, Bluetooth headset, etc.);
• Supported services: the Bluetooth services provided by the device (e.g. Advanced Audio Distribution Profile (A2DP), Audio Video Remote Control Profile (AVRCP), Basic Imaging Profile (BIP));
• Unique MAC address: a physical address assigned uniquely to each
device;
• Timestamp: the date and the time of the discovery;
• Received Signal Strength Indicator (RSSI): the measurement of
the power present in a received radio signal.
3.2.3 BlueZ
On Linux-based operating systems, BlueZ manages the Bluetooth stack. Its most useful command is hcitool (Host Controller Interface Tool), which configures Bluetooth connections and sends special commands to Bluetooth devices. Its main functions are:
• discovering (inquiring) remote devices;
• adding and managing devices on the piconet;
• configuring controller properties;
• setting up, managing and releasing logical transports and links.
In particular, hcitool gives access to the RSSI, the Link Quality (LQ) and the Transmit Power Level (TPL) of a connected device. These are three fundamental connection status parameters.
Reading these values requires an active connection between the master device and the slave.
Received Signal Strength Indicator (RSSI): According to the Bluetooth Core Specification, the RSSI is an 8-bit signed integer that indicates the difference between the received power level and the Golden Receive Power Range (GRPR).
The command hcitool rssi <bdaddr> returns a value between +15 dB and -35 dB.
A positive RSSI value indicates how many dB the received power is above the upper limit of the GRPR. A negative value indicates how many dB it is below the lower limit. A value of zero means that the received power is inside the Golden Receive Power Range [3].
The Golden Receive Power Range is the range of received power in which the raw bit error rate is better than 0.1% (BER < 10^-3).
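The piecewise definition of the relative RSSI can be sketched as follows. The GRPR thresholds are chip-specific and are not given here, so `lower_dbm` and `upper_dbm` are assumed inputs. The function names are hypothetical. The sketch also shows why converting back to an absolute level requires both thresholds, and why that conversion is impossible inside the GRPR, where every level is reported as 0.

```python
def grpr_rssi(rx_dbm, lower_dbm, upper_dbm):
    """Relative RSSI of a received power level with respect to the GRPR.

    rx_dbm    -- absolute received power level (dBm)
    lower_dbm -- lower GRPR threshold (dBm), chip-specific and assumed known
    upper_dbm -- upper GRPR threshold (dBm), chip-specific and assumed known
    """
    if lower_dbm > upper_dbm:
        raise ValueError("lower threshold must not exceed the upper one")
    if rx_dbm > upper_dbm:
        return rx_dbm - upper_dbm   # positive: dB above the upper limit
    if rx_dbm < lower_dbm:
        return rx_dbm - lower_dbm   # negative: dB below the lower limit
    return 0                        # inside the Golden Receive Power Range


def rx_from_grpr_rssi(rssi_db, lower_dbm, upper_dbm):
    """Inverse mapping: recover the absolute RX level (dBm) from the RSSI.

    Returns None when rssi_db == 0, because every RX level inside the
    GRPR is reported as 0 and the exact level cannot be recovered.
    """
    if rssi_db > 0:
        return upper_dbm + rssi_db
    if rssi_db < 0:
        return lower_dbm + rssi_db
    return None
```

Take hypothetical thresholds of -60 dBm and -40 dBm. With these, `grpr_rssi(-70, -60, -40)` returns -10 and `rx_from_grpr_rssi(-10, -60, -40)` returns -70.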
Transmit Power Level (TPL): TPL is an 8-bit signed integer that specifies the Bluetooth module's maximum transmit power level (in dBm) [12]. Each Bluetooth class has a fixed value, which does not change during a Bluetooth connection:
• Class 1 devices: +20 dBm maximum power;
• Class 2 devices: +4 dBm;
• Class 3 devices: 0 dBm.
Link Quality (LQ): Link Quality is a value from 0 to 255, which repre-
sents the quality of the link between two devices. The higher the value, the
better the link quality is. For most Bluetooth modules, it is derived from
the average bit error rate (BER) seen at the receiver and it is constantly
updated as packets are received.
3.2.4 Inquiry with RSSI and hcitool RSSI
As explained in Section 3.2.3, BlueZ's hcitool gives us two different types of RSSI values:
• the RSSI obtained from the inquiry scan (Inquiry with RSSI), which identifies the power level at which the receiver sees the Bluetooth target device;
• the RSSI obtained directly from a connected device.
For clarity, from now on the value obtained from the inquiry scan will be called RX. The value obtained from a connected device will simply be called RSSI.
These two values are strictly related by a linear dependence, because they represent the same quantity. The RX is the absolute received power level. The RSSI is the received power level relative to the GRPR. The RSSI can therefore be converted to an RX power level if the upper and lower GRPR thresholds are known. The relation is further analyzed in Section 4.1.1.
3.2.5 l2ping
The Linux Bluetooth stack can also ping a Bluetooth device.
Ping is a utility that tests whether a host, in our case a Bluetooth device, is reachable. It measures the round-trip time of messages sent from the originating host to a destination and echoed back to the source.
For Bluetooth, the l2ping command is used. L2ping sends an L2CAP echo request to the Bluetooth MAC address [16] and waits for an echo response from the target device. L2CAP echo requests are directly analogous to the familiar ICMP ping packets in IP. Pinging shows whether a Bluetooth device is within range. If it is, l2ping keeps sending echo requests to the target. If it is not, an error message is shown.
When the echo request succeeds, l2ping (fig. 3.4) keeps pinging the Bluetooth target device. In the default mode it shows the following fields:
• The size of the single packet of the echo request (default 44 bytes);
• The MAC address of the target;
• The progressive id of the packets;
• The echo Round-Trip Time (RTT) in milliseconds.
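The fields listed above can be extracted from each reply line with a small parser. This sketch assumes a reply line of the form `44 bytes from F4:E3:FB:A5:66:D8 id 0 time 12.34ms`. The exact format may differ between BlueZ versions, so check it against Fig. 3.4 or your own l2ping output. The names `EchoReply` and `parse_l2ping_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed reply-line format (verify against the installed l2ping):
#   "44 bytes from F4:E3:FB:A5:66:D8 id 0 time 12.34ms"
_REPLY_RE = re.compile(
    r"^(?P<size>\d+) bytes from (?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
    r" id (?P<id>\d+) time (?P<rtt>\d+(?:\.\d+)?)ms$"
)


class EchoReply(NamedTuple):
    size: int      # size of the echo packet in bytes (default 44)
    mac: str       # Bluetooth MAC address of the target
    seq_id: int    # progressive id of the packet
    rtt_ms: float  # echo round-trip time in milliseconds


def parse_l2ping_line(line: str) -> Optional[EchoReply]:
    """Return the parsed reply, or None for header or error lines."""
    m = _REPLY_RE.match(line.strip())
    if m is None:
        return None
    return EchoReply(int(m["size"]), m["mac"].upper(), int(m["id"]), float(m["rtt"]))
```

In practice, these lines would come from the standard output of `l2ping <mac address>`, started with Python's `subprocess` module.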
Figure 3.4: l2ping utility in Kali Linux
Running l2ping creates a basic L2CAP connection that is almost universally authorisation-free (as explained in Section 3.2.1). The resulting connections are of limited use for communication, since they support little more than low-level testing. They are nevertheless sufficient to run the hcitool RSSI, LQ and TPL commands successfully.
3.3 MAC Address
MAC address stands for Media Access Control address. It is a unique identifier of an IEEE 802 network interface. Examples of IEEE 802 standards include Ethernet, Wi-Fi, ZigBee, FDDI (Fiber Distributed Data Interface) and Bluetooth.
In our case, the MAC address is fundamental information because it uniquely identifies a particular network interface of the device. A smartphone is equipped with both a Wi-Fi and a Bluetooth chipset, so it has two MAC addresses: one for the Wi-Fi interface and one for the Bluetooth interface.
Both addresses have the same structure: 12 hexadecimal digits (48 bits or 6 bytes), usually written in one of the following three formats:
• MM:MM:MM:SS:SS:SS
• MM-MM-MM-SS-SS-SS
• MMM.MMM.SSS.SSS
The leftmost 6 digits (24 bits), called the prefix or OUI (Organizationally Unique Identifier), identify the adapter manufacturer. Each vendor registers and obtains MAC prefixes assigned by the IEEE. Vendors often own many prefixes, associated with their different products. Finding the vendor from a prefix is easy: Wireshark provides an online tool to look up OUIs and other MAC address prefixes¹.
The rightmost 6 digits of a MAC address identify the specific interface and form the Network Interface Controller (NIC) specific part. Among all devices manufactured with the same vendor prefix, each is given its own unique 24-bit number.
¹ https://www.wireshark.org/tools/oui-lookup.html
A real example of the MAC addresses of the same device is:
• Wi-Fi address: F4:E3:FB:85:53:1D
• Bluetooth address: F4:E3:FB:A5:66:D8
In the example above the vendor digits are the same, but the same device often has completely different Wi-Fi and Bluetooth prefixes.
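Splitting an address into its OUI and NIC-specific parts is mechanical once the separators are removed. This sketch accepts the formats listed above. The helper names are hypothetical. The usage lines compare the two real addresses from the example.

```python
import re

_HEX12 = re.compile(r"[0-9A-Fa-f]{12}")


def normalize_mac(mac):
    """Return a 48-bit MAC address in MM:MM:MM:SS:SS:SS form (upper case).

    Accepts ':', '-' or '.' as separators, or none at all.
    """
    digits = re.sub(r"[:.\-]", "", mac.strip())
    if not _HEX12.fullmatch(digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def split_mac(mac):
    """Split an address into (OUI prefix, NIC-specific part)."""
    norm = normalize_mac(mac)
    return norm[:8], norm[9:]


wifi_mac = "F4:E3:FB:85:53:1D"
bt_mac = "F4:E3:FB:A5:66:D8"
print(split_mac(wifi_mac)[0] == split_mac(bt_mac)[0])  # True: same vendor prefix
```

As the text notes, a matching prefix is not the norm. Many devices use different OUIs for the two interfaces, so the prefix alone cannot pair the addresses.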
Privacy implications
Since the MAC address uniquely identifies a device, it can be used to identify a person. As explained in Section 2, this raises significant privacy issues. Both Wi-Fi and Bluetooth addresses are easy to obtain, as explained above. The Wi-Fi address is sent in clear in probe requests, and the Bluetooth address is visible during the inquiry scan. However, the two addresses are different.
As explained in Section 2.2, some vendors protect mobile devices from this issue with a technique known as MAC address randomization. This technique replaces the number that uniquely identifies a device's Wi-Fi hardware with randomly generated values.
3.4 System Architecture
For this thesis, we implemented a tool that captures Wi-Fi probes and collects Bluetooth parameters. We use the terms Bluetooth signals or parameters to denote all the status parameters of a Bluetooth connection, together with any other signal strength values made available by the Bluetooth Core Specification.
Probe requests and signals were captured with up to six Raspberry Pi 3 boards, depending on the test. Each board was equipped with a NETGEAR N150 Wireless USB Adapter. The Raspberry Pis run Raspbian Jessie version 4.9.24 and are all synchronized with an NTP server. They are controlled through SSH (Secure Shell) over the Wi-Fi network, which gave the experimenter complete remote control over the whole system.
The Raspberry Pis run a Python script. Python was chosen for two reasons: it makes data and variables easy to manipulate, and it launches Linux bash scripts with little effort.
When starting the program (fig. 3.5), the user can set two options: the capture time (-t option) and the name of the capture (-n option).
The program consists of a main function that creates three threads:
• the first gathers Wi-Fi probes;
• the second inquires Bluetooth devices;
• the third collects RSSI, TPL and LQ.
As soon as a new client is found, the script prints a real-time message containing the device's MAC address. Meanwhile, the main process stores all the data about the clients in a dictionary.
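The thread structure described above can be sketched as follows. This is not the thesis script. The worker callables, the `record` callback and the shape of the shared dictionary are hypothetical. Real workers would wrap airodump-ng, hcitool/hcidump and l2ping. A lock protects the dictionary because the threads write to it concurrently.

```python
import argparse
import threading
import time
from collections import defaultdict


def parse_args(argv=None):
    """Command-line options described in the text: -t (capture time) and -n (name)."""
    parser = argparse.ArgumentParser(description="Wi-Fi/Bluetooth capture (sketch)")
    parser.add_argument("-t", type=float, required=True, help="capture time in seconds")
    parser.add_argument("-n", required=True, help="name of the capture")
    return parser.parse_args(argv)


def run_capture(duration, workers):
    """Run one thread per worker and return every recorded sample.

    Each worker is a hypothetical callable worker(stop_event, record) that loops
    until stop_event is set, calling record(category, mac, sample) for each
    captured sample (Wi-Fi probe, inquiry result, RSSI/TPL/LQ reading, ...).
    """
    store = defaultdict(list)      # (category, mac) -> list of samples
    lock = threading.Lock()        # the dictionary is shared by all threads
    stop_event = threading.Event()

    def record(category, mac, sample):
        with lock:
            if (category, mac) not in store:
                print(f"new {category} client: {mac}")  # real-time notification
            store[(category, mac)].append(sample)

    threads = [threading.Thread(target=w, args=(stop_event, record), daemon=True)
               for w in workers]
    for t in threads:
        t.start()
    try:
        time.sleep(duration)       # timer set by the user with -t
    except KeyboardInterrupt:      # or the user stops the script voluntarily
        pass
    finally:
        stop_event.set()
        for t in threads:
            t.join()
    return dict(store)
```

A real entry point would call `parse_args()` and pass `args.t` to `run_capture` together with the three capture workers.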
Figure 3.5: Developed script running on the Raspberry Pi through SSH in Kali Linux
Wi-Fi probes collection Wi-Fi probes were captured with Aircrack-ng, an open-source suite of tools written in C for assessing Wi-Fi network security. In particular, the command airodump-ng <wlan interface> captures raw 802.11 frames. We modified the Airodump source code so that it also shows the sequence number and the timestamp of the captured packets.
Airodump-ng requires the Wi-Fi interface to be in monitor mode, and the NETGEAR dongles are used for this purpose. Monitor mode lets the Raspberry Pi capture all the traffic received on the wireless channel, and therefore listen to the probes.
Inquiry with RSSI The Bluetooth RX power level is obtained through hcitool spinq, which starts a periodic inquiry that keeps discovering other Bluetooth devices automatically. In parallel, hcidump retrieves the raw data and the Python script parses the useful information.
Other Bluetooth parameters Received signal strength indicator (RSSI), link quality (LQ) and transmit power level (TPL) are three fundamental parameters of a Bluetooth connection. Obtaining them requires a connection.
As explained in Section 3.2.5, the ping process establishes an L2CAP connection between the Raspberry Pi and the target device. This connection makes it possible to obtain RSSI, LQ and TPL.
The commands used were:
• l2ping <mac address> to ping the Bluetooth MAC address
• hcitool rssi <mac address> to gather the RSSI
• hcitool tpl <mac address> to gather the Transmit Power Level
• hcitool lq <mac address> to gather the Link Quality
When the thread in charge of capturing Bluetooth parameters starts, it immediately runs a bash script that uses l2ping to keep a continuous Bluetooth connection with the target device. Once the connection is established, the thread issues the three hcitool commands every second, parses the results and stores them in a dictionary.
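The polling loop described above can be sketched as follows. The regular expressions assume that hcitool prints lines such as `RSSI return value: -5`, `Link quality: 255` and `Current transmit power level: 4`. Verify this against the BlueZ version in use. Unlike the thesis script, this sketch runs the three commands one after another rather than at the same time. The `runner` parameter is a hypothetical hook that lets the loop run without Bluetooth hardware.

```python
import re
import subprocess
import time

# Assumed hcitool output lines (verify against the installed BlueZ version):
#   "RSSI return value: -5"
#   "Link quality: 255"
#   "Current transmit power level: 4"
PATTERNS = {
    "rssi": re.compile(r"RSSI return value:\s*(-?\d+)"),
    "lq": re.compile(r"Link quality:\s*(\d+)"),
    "tpl": re.compile(r"transmit power level:\s*(-?\d+)"),
}


def parse_hcitool_output(kind, text):
    """Return the integer reported for kind ('rssi', 'lq' or 'tpl'), or None."""
    match = PATTERNS[kind].search(text)
    return int(match.group(1)) if match else None


def run_hcitool(kind, mac):
    """Run 'hcitool <kind> <mac>' and return its standard output (needs BlueZ)."""
    result = subprocess.run(["hcitool", kind, mac], capture_output=True, text=True)
    return result.stdout


def poll_parameters(mac, samples, period=1.0, runner=run_hcitool):
    """Collect `samples` rows of RSSI, LQ and TPL, one row every `period` seconds.

    Assumes an l2ping session is already keeping the L2CAP connection open.
    """
    rows = []
    for _ in range(samples):
        row = {"timestamp": time.time(), "mac": mac}
        for kind in ("rssi", "lq", "tpl"):
            row[kind] = parse_hcitool_output(kind, runner(kind, mac))
        rows.append(row)
        time.sleep(period)
    return rows
```

If the connection drops, the commands print an error instead of a value, and the parser records `None` for that sample.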
The capturing process ends when the timer set by the user expires or when the user voluntarily stops the script. The program then creates three .csv files, one for each category described above. Each file contains the MAC address of the device, the timestamp and all the useful Wi-Fi or Bluetooth data. The CSV files are then loaded automatically, with the mysqlimport command, into a MySQL database running on an external server.
The database consists of three tables.
• The Wi-Fi table. Each row stores a probe request and contains:
– the probe sequence number (SN);
– the date and time of capture (timestamp);
– the device Wi-Fi MAC address (mac address);
– the list of past SSIDs (SSID);
– the RSSI of the probe request (RSSI);
– the ID of the Raspberry Pi that captured the probe (Raspberry Pi number).
• The Bluetooth inquiry table. Each row stores an inquiry of a device and contains:
– the date and time of capture (timestamp);
– the device Bluetooth MAC address (mac address);
– the RX power level of the inquiry (RX);
– the ID of the Raspberry Pi that captured the inquiry (Raspberry Pi number).
• The Bluetooth parameters table. Each row stores one capture of the three fundamental parameters and contains:
– the date and time of capture (timestamp);
– the device Bluetooth MAC address (mac address);
– the RSSI of the device (RSSI);
– the Link Quality of the device (LQ);
– the Transmit Power Level of the device (TPL);
– the echo round-trip time of the device (echo time);
– the ID of the Raspberry Pi that captured the parameters (Raspberry Pi number).
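The three tables can be sketched as follows. The thesis uses MySQL and mysqlimport. SQLite from the Python standard library is swapped in here only so that the example is self-contained. The table and column names are hypothetical abbreviations of the fields listed above. `load_csv` plays a role similar to mysqlimport, assuming headerless CSV files whose columns follow the table order.

```python
import csv
import sqlite3

# Illustrative schema; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wifi (
    sn INTEGER, ts TEXT, mac TEXT, ssid TEXT, rssi INTEGER, rpi INTEGER);
CREATE TABLE IF NOT EXISTS bt_inquiry (
    ts TEXT, mac TEXT, rx INTEGER, rpi INTEGER);
CREATE TABLE IF NOT EXISTS bt_params (
    ts TEXT, mac TEXT, rssi INTEGER, lq INTEGER, tpl INTEGER,
    echo_ms REAL, rpi INTEGER);
"""
TABLES = {"wifi", "bt_inquiry", "bt_params"}


def load_csv(conn, table, lines):
    """Insert headerless CSV lines whose columns follow the table's column order."""
    if table not in TABLES:
        raise ValueError(f"unknown table {table!r}")
    rows = list(csv.reader(lines))
    if not rows:
        return 0
    placeholders = ",".join("?" * len(rows[0]))
    with conn:  # commits on success, rolls back on error
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Numeric strings read from the CSV are stored as integers because of SQLite's column type affinity. A MySQL schema would declare the types explicitly instead.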
Chapter 4
Experiments and Algorithms
A smartphone with Wi-Fi turned on emits a number of probe requests. If Bluetooth is also turned on, we can stimulate the smartphone to emit Bluetooth signals. The probes and the Bluetooth signals are identified by two different MAC addresses, one for each wireless technology.
Pairing the Wi-Fi MAC address with the Bluetooth MAC address makes it possible to uniquely identify a mobile device. The two signals come from the same device, but they are not immediately related. As we will see below, the measured values are completely different, yet they carry the same information: the distance between two devices.
The distance between two devices can be estimated in different ways:
• Time of Arrival (ToA): the distance is estimated by measuring the signal propagation time. The time of flight is $T_f = d/c$, where d is the distance between the nodes and c is the propagation speed (c = 299,792.458 km/s);
• Time Difference of Arrival (TDoA): the receivers deduce the distance from differences in arrival times and from the propagation speed;
• Angle of Arrival (AoA): directional antennas estimate the angle at which the signal arrives, from which the distance is deduced;
• Received Signal Strength Indicator (RSSI): the distance is inferred from the attenuation the signal undergoes during propagation.
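Two of the range-based quantities above can be made concrete with a short sketch. The time-of-flight function is the formula $T_f = d/c$. For the RSSI case we assume the standard log-distance path-loss model, RSSI(d) = RSSI(d0) - 10·n·log10(d/d0). This is a common textbook model, not necessarily the one used in this thesis. The reference RSSI and the exponent n in the example are hypothetical values that would have to be calibrated for each device and environment.

```python
C = 299_792_458.0  # propagation speed (speed of light) in m/s


def time_of_flight(distance_m):
    """ToA: T_f = d / c, in seconds."""
    return distance_m / C


def distance_from_rssi(rssi, rssi_d0, n, d0=1.0):
    """Invert the log-distance path-loss model
    RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0).

    rssi_d0 -- RSSI measured at the reference distance d0 (assumed calibrated)
    n       -- path-loss exponent (assumed calibrated; about 2 in free space)
    """
    return d0 * 10 ** ((rssi_d0 - rssi) / (10 * n))


print(f"{time_of_flight(3.0) * 1e9:.2f} ns")         # ~10 ns over 3 m
print(f"{distance_from_rssi(-60, -40, 2.0):.1f} m")  # hypothetical calibration
```

The first line shows that 3 m of propagation corresponds to only about 10 ns. Measuring such short times indoors, in the presence of multipath, is what makes the timing-based techniques hard.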
Line-Of-Sight (LOS) propagation refers to waves that travel along a direct path from the source to the receiver. In closed environments, a straight line between sender and receiver is hard to obtain. The signal is instead affected by multipath, i.e. it propagates along several different paths. Multipath is caused by atmospheric ducting and by reflection and refraction on walls, bodies, windows, etc. These issues make techniques like ToA, TDoA and AoA inaccurate, so in our experiments we chose the RSSI-based approach.
Note that we are not only interested in the absolute distance between a sender and a receiver. We want to determine whether the Wi-Fi and Bluetooth signals undergo the same path loss, which would establish that they come from the same device.
This chapter first describes the experimental test-bed and the devices used during the experiments. There are two main experiments:
• the analysis of the devices' Wi-Fi and Bluetooth parameters;
• the matching experiment.
The first analysis identifies the best parameters, which are then used in the matching experiment. Next, we describe the linking algorithms and the methodology. Finally, we interpret the results.
4.1 Preliminary experiments
In this experiment we captured the Wi-Fi probes (containing the Wi-Fi RSSI) and the Bluetooth signals (RSSI, TPL, LQ, echo round-trip time). The experiment has two goals. The first is to understand how the signals originating from the target devices correlate with distance. The second is to understand the relation between Wi-Fi and Bluetooth. Our main aim is not to find the absolute position of a device, but to understand whether the Bluetooth and Wi-Fi signals originate from the same device.
The environment
The preliminary experiments were held in a home environment measuring 9.50 m x 4.50 m, for an area of 42.75 m². A home environment was chosen for this first phase because it was important to have an isolated space with no other devices that could cause noise. It was also crucial to have a direct path between the studied devices.
The devices
The target devices used during this experiment were an LG-E450 with Android 4.1.2 (Ultra Slim custom ROM) and an iPad with iOS 10.
A Raspberry Pi 3 was used to capture Wi-Fi probes and Bluetooth signals, with a NETGEAR N150 as Wi-Fi module and the internal Bluetooth module. The Raspberry Pi's case does not influence the strength of the signals.
Execution
The Raspberry Pi was placed at a fixed point, while the target devices were moved to a different distance every 10 minutes. The path between the Raspberry Pi and the devices was a straight line with no obstacle in between.
Finally, our script averaged all the values to obtain a single value for each position.
4.1.1 Results
As explained before, we want to understand whether the collected parameters are related to distance and to each other. We also need to understand how the distance can be inferred from an RSSI value, and whether the other variables are useful in our case.
Bluetooth
The Bluetooth signals analyzed during this experiment are:
• the connection-based RSSI;
• the TPL (Transmit Power Level);
• the LQ (Link Quality);
• the echo Round-Trip Time (obtained from ping);
• the RX power level (obtained from inquiry with RSSI).
From figure 4.1, the following observations can be made:
[Figure 4.1 panels, each plotted against distance (0-10 m) for the iPad and the LG: a) Distance vs Bluetooth RSSI; b) Distance vs Link Quality; c) Distance vs TPL; d) Distance vs Echo RTT]
Figure 4.1: Bluetooth signals behavior from 0 to 10 meters
Connection-based RSSI: The Received Signal Strength Indicator depends strongly on the distance. It starts from 0, meaning that the target device is inside the GRPR, and then decreases. The graph (4.1.a) shows that the iPad chipset is more powerful than the LG one. Beyond ten meters the LG would presumably lose the connection, since -35 dB is the lowest RSSI value. The iPad, by contrast, could move further away and stay connected. The RSSI value therefore depends strongly on the device model.
Finally, the curves follow a logarithmic trend, like all signal powers, although the trend is less evident than we expected. Nevertheless, the distance can clearly be inferred from the RSSI.
LQ: As the specification states, the link quality starts from 255 when the connection is strong and goes down to 0 when the connection is poor. In our experiment the LQ values correlate poorly with distance. The value is high when the devices are near the Raspberry Pi and low when they are far, but the intermediate values are not meaningful. For these reasons, LQ is discarded from our measurements.
TPL: Fig. 4.1.c shows a horizontal straight line for the Transmit Power Level, since this value does not change during a Bluetooth connection. The iPad and LG lines overlap at +4 dBm, which makes TPL useless for our calculations.
Echo Round Trip Time: The echo RTT is obtained by pinging the target device. It measures the Round-Trip Time (RTT) of messages sent from the originating host to a destination device and echoed back to the source.
We expected the round-trip time to grow with distance, but this is not entirely true. The iPad has an RTT of approximately 120 ms throughout the experiment. The LG RTT decreases up to 4 meters and then rises rapidly. Figure 4.1.d shows the round-trip time trends of the echo requests. The echo RTT is also discarded, because of its poor correlation with distance.
RX Power Level The Raspberry Pi Bluetooth chipset provides the absolute RX power level through inquiry. By contrast, the RSSI values defined by the Bluetooth specification are relative and depend on the GRPR. Fig. 4.2 clearly shows that the RX power level is strongly correlated with distance. Here too, the LG and iPad RX power levels differ noticeably.
[Plot: Bluetooth RX power level vs distance (0-10 m) for the iPad and the LG]
Figure 4.2: Distance vs Bluetooth RX power level
Bluetooth RSSI vs Bluetooth RX Power Level
As we have seen, the two main Bluetooth signal parameters are the RSSI and the RX power level. They represent the same quantity, but the RSSI is expressed relative to the GRPR.
Figure 4.3 shows the relation between the two signals. Their dependence is linear, so the RX power level can easily be converted into RSSI and vice versa.
[Figure 4.3 panels: a) LG, b) iPad; x-axis: Bluetooth RSSI, y-axis: Bluetooth RX]
Figure 4.3: Bluetooth RSSI vs Bluetooth RX of two different devices
In the following experiments we use only the RSSI. The RX seems more precise, but many more RSSI values than RX values can be collected. More samples make the measurements more accurate and shorten the experiments, which also matters in a real scenario. As Figure 4.4 shows, a ten-minute measurement yields almost ten times as many RSSI values as RX values obtained from the inquiry. The reason is timing. The RSSI can be requested every second (or more often), while the RX is limited by the inquiry time of around 10.24 seconds. Over ten minutes, one RSSI sample per second gives about 600 values, whereas inquiries lasting 10.24 seconds each give only about 59 (600 / 10.24).
[Figure 4.4 panels: a) LG, b) iPad; number of RX and RSSI samples (frequency) at each distance from 0 to 10 meters]
Figure 4.4: Number of Bluetooth RSSI and Bluetooth RX of two different devices
during a ten minutes measurement
In addition, the RSSI can be obtained for non-visible devices, while the RX is available only for visible ones. As explained before (Section 3.2.5), a connection with a device can be established using ping. Pinging works even when the device's Bluetooth is set to non-visible. The resulting L2CAP connection lets us use the hcitool rssi, hcitool tpl and hcitool lq commands.
In a real-world scenario, being able to measure non-visible devices is a big advantage, because most devices have Bluetooth set to non-visible.
Wi-Fi
The last preliminary experiment concerns the relation between Wi-Fi and distance. As said previously, the Wi-Fi probes have a field containing the RSSI. We captured this value and averaged the data for each distance to create the graph in Figure 4.5.
[Plot: Wi-Fi RSSI vs distance (0-10 m) for the iPad and the LG]
Figure 4.5: Distance vs Wi-Fi RSSI
The Wi-Fi RSSI follows a logarithmic trend with respect to distance. This is not surprising, since the RSSI expresses the power of a signal on a logarithmic scale. Therefore, as expected, the Wi-Fi RSSI is a good indicator of the distance of a device.
The Wi-Fi RSSI curve is rather similar to the Bluetooth RX power curve, but the signal strength is higher for Wi-Fi. This is consistent with Wi-Fi having a greater range than Bluetooth, whose range is only around 10 meters for a Class 2 device.
4.1.2 Home experiment parameters
In the previous sections we analyzed which parameters best fit the distance. The candidates were Wi-Fi RSSI, Bluetooth RSSI and Bluetooth RX power. For Bluetooth only the RSSI was chosen, for two reasons: many more of its values can be collected, and it can also be captured from non-visible devices.
Hence, the following experiment considers only Bluetooth RSSI and Wi-Fi RSSI.
The experiment above showed that different devices have different logarithmic RSSI-distance curves, because of their different internal chipsets. Figures 4.6 and 4.7 show the logarithmic regressions of five different smartphones and tablets.
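A per-device logarithmic regression like the ones in Figures 4.6 and 4.7 can be sketched with NumPy. We assume the regression has the form RSSI = a + b·ln(d), since the exact form used for the figures is not stated. The helper names are hypothetical. Fitting one (a, b) pair per device model reflects the chipset differences noted above.

```python
import numpy as np


def fit_log_regression(distances_m, rssi):
    """Least-squares fit of RSSI = a + b * ln(d); returns (a, b)."""
    d = np.asarray(distances_m, dtype=float)
    y = np.asarray(rssi, dtype=float)
    if np.any(d <= 0):
        raise ValueError("distances must be positive to take ln(d)")
    b, a = np.polyfit(np.log(d), y, 1)   # polyfit returns the slope first
    return float(a), float(b)


def predict_rssi(a, b, distance_m):
    """RSSI predicted by the fitted curve at the given distance."""
    return a + b * np.log(distance_m)


def invert_distance(a, b, rssi):
    """Distance at which the fitted curve reaches the given RSSI."""
    return np.exp((rssi - a) / b)
```

Given the fitted pair for a device model, `invert_distance` returns the distance at which that model's curve reaches a measured RSSI.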
For Wi-Fi, the logarithmic regressions are very close to each other: the Wi-Fi power level of the probes does not vary much between devices.
[Plot: Wi-Fi RSSI vs distance, one logarithmic regression per device: S3, S Adv, LG, S TAB, iPad]
Figure 4.6: Wi-Fi RSSI logarithmic regression of the target devices
Bluetooth RSSI, by contrast, varies widely between devices (Figure 4.7), so the following algorithms use a different curve for each device. For example, the LG (cyan line) is the least powerful device in terms of both Bluetooth RSSI and Wi-Fi RSSI.
[Plot: Bluetooth RSSI vs distance, one logarithmic regression per device: S3, S Adv, LG, S TAB, iPad]
Figure 4.7: Bluetooth RSSI logarithmic regression of the target devices
We also need to understand the relation between Wi-Fi and Bluetooth RSSI, which is plotted in Figure 4.8. The dependence is linear, so Bluetooth RSSI can be converted into Wi-Fi RSSI and vice versa. Some curves are similar, but here too every device model has its own characteristic curve, so a separate model is created for each device.
This relation is fundamental for matching Wi-Fi and Bluetooth MAC addresses.
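The per-device linear relation can be sketched as a least-squares line fitted separately for each device model. The sketch assumes that the device model is known when the conversion is applied. It is illustrative, with hypothetical function names, and is not the thesis code. It corresponds to the "RSSI conversion from Bluetooth to Wi-Fi" algorithm listed in Section 4.3.

```python
import numpy as np


def fit_bt_to_wifi(bt_rssi, wifi_rssi):
    """Least-squares line wifi = slope * bt + intercept for one device model."""
    slope, intercept = np.polyfit(np.asarray(bt_rssi, dtype=float),
                                  np.asarray(wifi_rssi, dtype=float), 1)
    return float(slope), float(intercept)


def bt_vector_to_wifi(bt_vector, model):
    """Map Bluetooth RSSI values (one per Raspberry Pi) onto the Wi-Fi scale."""
    slope, intercept = model
    return [slope * x + intercept for x in bt_vector]


def wifi_vector_to_bt(wifi_vector, model):
    """Inverse mapping, from the Wi-Fi scale back to Bluetooth RSSI."""
    slope, intercept = model
    if slope == 0:
        raise ValueError("degenerate model: slope is zero")
    return [(y - intercept) / slope for y in wifi_vector]
```

After conversion, both vectors of a candidate pair are on the same scale and can be compared directly.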
[Plot: Wi-Fi RSSI vs Bluetooth RSSI per device: S3, S Adv, LG, S TAB, iPad]
Figure 4.8: Bluetooth RSSI vs Wi-Fi RSSI of the target devices
4.2 Home experiment
Building on the previous data and considerations, we can now describe the actual MAC address coupling experiment.
During this test we collected the Bluetooth RSSI and the Wi-Fi probes of 15 randomly placed devices. The positions of the devices are known, and each device stays in the same position for the whole experiment. This gives us two different RSSI signals (Bluetooth and Wi-Fi) for each device, at the same time and in the same place. These signals are not directly related, because they come from two different chipsets. The goal is to link two MAC addresses, one from Wi-Fi and the other from Bluetooth, so that a device can be uniquely identified. Linking the MAC addresses means determining whether the Wi-Fi and Bluetooth RSSI originate from the same device.
To link the two RSSI signals, we designed several algorithms and tested them to find which one matches best.
The environment
This phase was also held in a home environment measuring 9.50 m x 4.50 m, for an area of 42.75 m². The home environment was chosen because it was important to have an isolated space with no other devices that could cause noise. It was also crucial to have a direct path between the devices.
Figure 4.9 shows the floor plan of the room, divided into 50 squares with a side of 0.9 meters and an area of 0.81 m² each.
[Figure 4.9 panels: a) 4 Raspberry Pis (numbered 1-4); b) 6 Raspberry Pis (numbered 1-6)]
Figure 4.9: Room planimetry with different Raspberry Pis configuration
The choice of scenario is fundamental. There are two possibilities:
• Anchor-based: only the anchor nodes (in our case the Raspberry Pis) know their position, and the positions of the other nodes (in our case the devices) are derived from the anchors. The resulting coordinate system is absolute.
• Anchor-free: no node knows its position, and a relative coordinate system is obtained.
We chose the anchor-based scenario, because only the Raspberry Pis can capture the probes and process the data; the target devices are passive.
The devices
We placed five different target devices at random in the environment. Each device was moved to three different random positions to simulate the presence of 15 different devices (figure 4.11).
The devices used are:
• an LG-E450 with Android 4.1.2 (Ultra Slim ROM): device numbers 1, 6, 11;
• a Samsung S Advance with Android 4.4.4 (CyanogenMod 11): device numbers 2, 7, 12;
• a Samsung S3 mini with Android 5.1.1 (CyanogenMod 12): device numbers 3, 8, 13;
• a Samsung Galaxy Tab S2 with Android 7.0: device numbers 4, 9, 14;
• an iPad with iOS 10: device numbers 5, 10, 15.
Figure 4.10: Photos of the capturing phase.
As anchors we used 4 Raspberry Pis with the NETGEAR dongle, placed in the four corners of the room (4.9.a). In the second phase, two more Raspberry Pis were added (4.9.b).
The six-anchor configuration covers the whole room and provides different capturing angles.
Figure 4.11: Room planimetry. In green the six Raspberry Pis, in red the fifteen devices.
Execution
During the experiment the Raspberry Pis stayed at fixed points, while each of the five devices was moved to a new position every 10 minutes, for three positions in total. The script was run to capture the signals.
At the end of the capturing phase, the script deletes the corrupted data and generates a Wi-Fi dataset and a Bluetooth dataset.
The datasets are composed of:
• a column for each Raspberry Pi (4 or 6 columns, depending on the configuration) containing the RSSI value captured by the respective Raspberry Pi;
• a MAC address column (Wi-Fi or Bluetooth, depending on the dataset) containing the MAC address of the device;
• a timestamp column indicating the time of capture.
Each row represents a vector of values captured at the same instant (same timestamp). This produced two datasets with n rows and 6 columns (in the 4 Raspberry Pi configuration), one for Wi-Fi and one for Bluetooth.
After this process, we compute the average RSSI of each device for each Raspberry Pi in both datasets. The result is two datasets (Bluetooth and Wi-Fi) with 15 rows each, one row per device. Each MAC address is thus identified by a vector of four (or six) averaged RSSI values, one for each Raspberry Pi. Table 4.1 shows an example of the Bluetooth dataset, with four RSSI columns and one MAC address column. The first row is device number 1, the LG. Its average RSSI is -15.8 from Raspberry Pi number 1, -22.2 from Raspberry Pi number 2, and so on. The Wi-Fi dataset (Table 4.2) has the same structure as the Bluetooth dataset. Each row of one dataset corresponds to the same row, i.e. the same device, of the other.
Table 4.1: Bluetooth Dataset
device     rasp1     rasp2     rasp3     rasp4  mac address
1       -15.8034  -22.2419  -33.4667  -34.9384  88:C9:D0:1F:3E:48
2        -0.8027   -3.2450  -15.1118  -21.0058  D8:90:E8:32:D3:3E
3        -6.6547    0.0000  -19.0269  -25.2993  C8:14:79:A3:93:2E
...          ...       ...       ...       ...  ...
15      -24.4265  -14.3408  -12.2055    0.1200  DC:A9:04:4F:D9:36
Table 4.2: Wi-Fi Dataset
device     rasp1     rasp2     rasp3     rasp4  mac address
1       -67.5986  -71.1032  -83.5181  -91.7776  C4:43:8F:B3:0A:F7
2       -44.5576  -58.2103  -75.6285  -84.0279  D8:90:E8:29:AD:3F
3       -65.5698  -57.9744  -73.1944  -84.8966  C8:14:79:31:3C:2A
...          ...       ...       ...       ...  ...
15      -72.8848  -70.7097  -63.5971  -51.9083  DC:A9:04:4F:D9:35
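The averaging step that turns the timestamped rows into tables like 4.1 and 4.2 can be sketched as follows. We assume each row is a MAC address plus one RSSI value per Raspberry Pi, with `None` where a Raspberry Pi missed the device at that instant. How the thesis encodes missing values is not stated, and the function name is hypothetical.

```python
from statistics import mean


def average_vectors(rows):
    """Collapse timestamped rows into one averaged RSSI vector per MAC address.

    rows -- iterable of (mac, [rssi_rpi1, ..., rssi_rpiN]); None marks a
            Raspberry Pi that did not capture the device at that instant.
    Returns {mac: [avg_rpi1, ..., avg_rpiN]}, with None where no value exists.
    """
    per_mac = {}
    for mac, vector in rows:
        columns = per_mac.setdefault(mac, [[] for _ in vector])
        if len(columns) != len(vector):
            raise ValueError(f"inconsistent vector length for {mac}")
        for values, rssi in zip(columns, vector):
            if rssi is not None:
                values.append(rssi)
    return {mac: [mean(v) if v else None for v in columns]
            for mac, columns in per_mac.items()}
```

Running this once on the Wi-Fi rows and once on the Bluetooth rows yields the two 15-row datasets used by the algorithms below.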
4.3 Algorithms
After the capturing phase and the manipulation of the datasets, we focused on the matching algorithms. Various approaches were tested; the best ones are:
1. normalization;
2. RSSI conversion from Bluetooth to Wi-Fi;
3. RSSI conversion from Bluetooth/Wi-Fi to distance;
4. trilateration;
5. fingerprint.
The goal of these algorithms is to pair a row of the Wi-Fi dataset with a row of the Bluetooth dataset, or vice versa. For a given Bluetooth vector, the algorithms find the most similar Wi-Fi vector. The MAC address of that vector presumably belongs to the same device as the Bluetooth MAC address.
Euclidean Distance To find the most similar vector we use the Euclidean distance, i.e. the straight-line distance between two points in Euclidean space. In our case each point has 4 coordinates, one for each Raspberry Pi (6 in the six-anchor configuration). The Euclidean distance is computed as follows:
$$ d(w, b) = \sqrt{(w_1 - b_1)^2 + (w_2 - b_2)^2 + \dots + (w_n - b_n)^2} = \sqrt{\sum_{i=1}^{n} (w_i - b_i)^2} \qquad (4.1) $$
where w_i is the ith Wi-Fi RSSI and b_i is the ith Bluetooth RSSI, with i = 1, 2, ..., n and n = 4 or n = 6 depending on the configuration. d(w, b) is close to 0 when the two lines are very similar and grows as they differ.
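A worked instance of equation (4.1) with n = 4, using made-up normalized vectors chosen only for illustration:

```latex
w = (0.0,\ 0.3,\ 0.8,\ 1.0), \qquad b = (0.1,\ 0.2,\ 0.9,\ 1.0)

d(w, b) = \sqrt{(-0.1)^2 + 0.1^2 + (-0.1)^2 + 0^2} = \sqrt{0.03} \approx 0.173
```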
Whatever the algorithm, at the end of the process we compare each Wi-Fi vector with each Bluetooth vector using the euclidean distance. This yields, for each Wi-Fi address, a list of Bluetooth addresses sorted by increasing euclidean distance: the value closest to zero comes first and the largest comes last. The top of the list therefore holds the Bluetooth MAC addresses most similar to the Wi-Fi MAC address, and the first entry is presumably the corresponding Bluetooth address, so we link the two.
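The ranking step could be sketched as follows, assuming each dataset is held as a dictionary from MAC address to RSSI vector (a hypothetical representation); `math.dist` computes exactly the euclidean distance of equation (4.1).

```python
import math


def rank_candidates(wifi, bluetooth):
    """For each Wi-Fi MAC, list the Bluetooth MACs sorted by euclidean distance.

    wifi, bluetooth: {mac: vector}, all vectors of the same length n (4 or 6).
    Returns {wifi_mac: [(bt_mac, distance), ...]} in increasing distance order.
    """
    return {
        w_mac: sorted(
            ((b_mac, math.dist(w_vec, b_vec)) for b_mac, b_vec in bluetooth.items()),
            key=lambda pair: pair[1],
        )
        for w_mac, w_vec in wifi.items()
    }


# Hypothetical normalized vectors of two devices.
wifi = {"W1": [0.0, 0.3, 0.8, 1.0], "W2": [1.0, 0.7, 0.2, 0.0]}
bt = {"B1": [0.1, 0.2, 0.9, 1.0], "B2": [0.9, 0.8, 0.1, 0.0]}
ranks = rank_candidates(wifi, bt)
print(ranks["W1"][0])  # best Bluetooth candidate for W1 and its distance
```

The first element of each list is the top-1 candidate that gets linked.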
4.3.1 Normalization
The simplest algorithm we have implemented is the normalization of each
line.
Normalization is a process that adjusts values measured on different scales to a common scale, e.g. between 0 and 1.
The Wi-Fi RSSI and the Bluetooth RSSI both represent the strength of the respective signal, but they lie on different scales (as we saw in section 4.1, the Wi-Fi RSSI is more powerful than the Bluetooth one). Normalization brings both onto the same 0 to 1 scale. We normalized each line of the two datasets separately, so that the Wi-Fi and Bluetooth data of the same device become comparable.
The normalization formula is:
$$ z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (4.2) $$
where x = (x_1, ..., x_n) is a line of the dataset and z_i is the ith normalized value.
After normalization we obtain two datasets of values between 0 and 1 representing the Wi-Fi RSSI and the Bluetooth RSSI on a common scale. Since the Wi-Fi and Bluetooth vectors of the same device reflect the same distances from the anchors, their normalized versions should be very similar. It is therefore possible to compare the data and link the MAC addresses.
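A minimal sketch of equation (4.2), applied to line 1 of Table 4.1 and line 1 of Table 4.2 (device 1 in both) and followed by the euclidean comparison. The guard for a line whose values are all equal is our addition, since equation (4.2) is undefined in that case.

```python
import math


def min_max_normalize(row):
    """Rescale one dataset line to [0, 1] following equation (4.2)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        # All anchors report the same RSSI: (4.2) is undefined, so return zeros.
        return [0.0 for _ in row]
    return [(x - lo) / (hi - lo) for x in row]


# Line 1 of Table 4.1 (Bluetooth) and of Table 4.2 (Wi-Fi): the same device.
bt_row = [-15.8034, -22.2419, -33.4667, -34.9384]
wifi_row = [-67.5986, -71.1032, -83.5181, -91.7776]
bt_norm = min_max_normalize(bt_row)
wifi_norm = min_max_normalize(wifi_row)
print([round(z, 3) for z in bt_norm])
print([round(z, 3) for z in wifi_norm])
print(round(math.dist(bt_norm, wifi_norm), 3))
```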
4.3.2 RSSI conversion from Bluetooth to Wi-Fi
In section 4.1.2 we discussed the linear relation between the Wi-Fi RSSI and the Bluetooth RSSI. We used this relation to convert the values of the Bluetooth dataset into Wi-Fi values. As mentioned above, every device has a different regression line, so five different functions were used during the conversion.
This gives two Wi-Fi datasets: the measured one and a synthetic one derived from Bluetooth. The last part of the algorithm compares each line of the two datasets using the euclidean distance and links the addresses. The conversion can also be done in the opposite direction, from Wi-Fi to Bluetooth.
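A minimal sketch of the conversion, assuming each device's regression line has the form RSSI_WiFi = a * RSSI_BT + b and is fitted by ordinary least squares on paired calibration measurements. The function names and calibration values are hypothetical and only illustrate the mechanics; in the thesis one such line exists per device model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ a * xs + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def bt_to_wifi(bt_row, model):
    """Convert a Bluetooth RSSI line into synthetic Wi-Fi RSSI with one device's line."""
    a, b = model
    return [a * x + b for x in bt_row]


# Hypothetical calibration pairs (Bluetooth RSSI, Wi-Fi RSSI) for one device model.
bt_cal = [-5.0, -10.0, -20.0, -30.0]
wifi_cal = [-50.0, -55.0, -65.0, -75.0]
models = {"LG": fit_line(bt_cal, wifi_cal)}  # one entry per device model
synthetic = bt_to_wifi([-15.8, -22.2, -33.5, -34.9], models["LG"])
print(models["LG"], synthetic)
```

The synthetic line is then compared with the measured Wi-Fi lines through equation (4.1).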
4.3.3 RSSI conversion from Bluetooth and Wi-Fi to distance
This algorithm builds on the dependence between the RSSI (Bluetooth or Wi-Fi) and the distance. The idea is to convert the RSSI values of the two datasets into distances, obtaining two distance datasets (Wi-Fi and Bluetooth), and then to match the most similar lines using the euclidean distance.
To convert the RSSI into a distance, the following formula can be used:
$$ RSSI = p_0 - 10\,\alpha \log_{10}\frac{d}{d_0} \qquad (4.3) $$
• RSSI: the received signal strength at distance d;
• p0: the received power from the node at the reference distance d0 (the RSSI at d0);
• d0: the reference distance;
• d: the sender-receiver distance;
• α: the path loss exponent, which takes values between 1 and 3 depending on the environment.
The accuracy of the computed distance strongly depends on the values used in formula (4.3), so a correct estimate of α and p0 is essential. α depends on the environment in which the devices are located and can be obtained by inverting formula (4.3) (it is usually between 1 and 3). p0, the power level measured at 1 meter, was determined empirically during the previous tests.
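Inverting equation (4.3) gives d = d0 * 10^((p0 - RSSI) / (10 α)), and solving it for α at a known distance gives the inverse formula used to estimate the path loss exponent. The sketch below assumes a base-10 logarithm, as in the usual log-distance path loss model; the function names and calibration numbers are hypothetical.

```python
import math


def rssi_to_distance(rssi, p0, alpha, d0=1.0):
    """Invert equation (4.3): d = d0 * 10 ** ((p0 - rssi) / (10 * alpha))."""
    return d0 * 10 ** ((p0 - rssi) / (10 * alpha))


def estimate_alpha(rssi, d, p0, d0=1.0):
    """Solve equation (4.3) for alpha from one RSSI measured at a known distance d != d0."""
    return (p0 - rssi) / (10 * math.log10(d / d0))


# Hypothetical calibration: p0 = -40 at d0 = 1 m, and -60 measured at 10 m.
alpha = estimate_alpha(-60.0, 10.0, p0=-40.0)
print(alpha, rssi_to_distance(-50.0, p0=-40.0, alpha=alpha))
```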
Using formula (4.3) is therefore complicated by the need to estimate these parameters, and in our case the resulting distances were less accurate than expected.
For this reason we converted the RSSI into distance using the curves obtained in section 4.1.2. We fitted a separate regression for each device and each technology (Wi-Fi or Bluetooth), to account for the differences in power among the devices. We chose the logarithmic regression, following the behaviour analyzed in the previous sections.
At the end of the process we obtain two datasets containing the distances between the devices and the anchors: one derived from Wi-Fi and one from Bluetooth. The last step compares the distance vectors using the euclidean distance.
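The exact form of the logarithmic regression is not given here. Assuming it is RSSI = a + b * ln(d), fitted by least squares on calibration points for one device and one technology, the fit and its inversion could look like this; all names and data are hypothetical.

```python
import math


def fit_log_model(distances, rssis):
    """Least-squares fit of rssi ~ a + b * ln(d); returns (a, b)."""
    xs = [math.log(d) for d in distances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(rssis) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, rssis))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def log_rssi_to_distance(rssi, model):
    """Invert the fitted curve: d = exp((rssi - a) / b)."""
    a, b = model
    return math.exp((rssi - a) / b)


# One model per (device, technology) pair; calibration data are synthetic.
cal_d = [0.5, 1.0, 2.0, 4.0, 8.0]
cal_rssi = [-40 - 9 * math.log(d) for d in cal_d]
models = {("LG", "wifi"): fit_log_model(cal_d, cal_rssi)}
row = [-45.0, -52.0, -58.0, -61.0]  # one hypothetical Wi-Fi line
distance_row = [log_rssi_to_distance(r, models[("LG", "wifi")]) for r in row]
print(distance_row)
```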
4.3.4 Trilateration
Trilateration is a geometric approach for locating mobile objects based on circles. When the distances between a device and at least three Raspberry Pis at known positions are available, trilateration determines the device's coordinates: each distance defines a circle centred on an anchor, and the intersection of the circles is the position of the target device.
In our case there are four or more anchors, and the circles do not always intersect in a single point. The trilateration problem can then be treated as an optimisation problem: we look for the point P̂ = (x, y) that best approximates the actual position P. For this purpose we use the Ordinary Least Squares (OLS) method:
$$ \min_{\hat{P}} \; \frac{1}{N} \sum_{i=1}^{N} \left[ d_i - \mathrm{dist}(\hat{P}, L_i) \right]^2 \qquad (4.4) $$
where:
• d_i is the distance between the ith anchor and the target device;
• P̂ is the estimated coordinate of the device;
• L_i is the coordinate of the ith anchor;
• N is the number of anchors.
The device coordinates are obtained by minimizing this error.
We apply the ordinary least squares method to both the Wi-Fi dataset and the Bluetooth dataset, obtaining for each device one position from Wi-Fi and one from Bluetooth.
The device coordinates are expressed in the reference frame of the anchors (in meters): the top left anchor is at (0, 0), the top right at (4.5, 0), the bottom left at (0, 9.5) and the bottom right at (4.5, 9.5).
This yields a pair of coordinates (one from Bluetooth and one from Wi-Fi) for each device, so the device could also be located. Here, however, we are not interested in the position of a device, only in how its Wi-Fi and Bluetooth estimates relate to each other.
The last step compares the two sets of coordinates to find the most similar pairs: the Bluetooth and Wi-Fi coordinates that are nearest to each other are attributed to a single device, and the MAC addresses are linked.
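The sketch below minimizes equation (4.4) with a plain Gauss-Newton iteration started at the centroid of the anchors. Only the use of OLS is stated in the text, so the choice of solver is our assumption (any nonlinear least-squares routine would do); the anchor coordinates are those of the home experiment, and the distance vectors in the usage lines are hypothetical.

```python
import math

# Anchor coordinates (meters) of the home experiment, four-anchor layout.
ANCHORS = [(0.0, 0.0), (4.5, 0.0), (0.0, 9.5), (4.5, 9.5)]


def trilaterate(distances, anchors=ANCHORS, iterations=100, tol=1e-9):
    """Estimate (x, y) minimizing sum_i (d_i - dist(P, L_i))^2 with Gauss-Newton."""
    # Start from the centroid of the anchors.
    x = sum(ax for ax, _ in anchors) / len(anchors)
    y = sum(ay for _, ay in anchors) / len(anchors)
    for _ in range(iterations):
        # Accumulate J^T J (symmetric 2x2) and J^T r at the current estimate.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (ax, ay), d in zip(anchors, distances):
            dist = math.hypot(x - ax, y - ay) or 1e-12  # avoid division by zero
            r = dist - d                                 # residual of anchor i
            jx, jy = (x - ax) / dist, (y - ay) / dist    # gradient of dist(P, L_i)
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-15:
            break  # singular normal equations: keep the current estimate
        # Solve (J^T J) step = -J^T r by Cramer's rule.
        dx = -(a22 * g1 - a12 * g2) / det
        dy = -(a11 * g2 - a12 * g1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y


# Hypothetical distance vectors of one device, from Wi-Fi and from Bluetooth.
wifi_pos = trilaterate([2.3, 4.0, 7.6, 8.3])
bt_pos = trilaterate([2.1, 4.2, 7.4, 8.4])
print(wifi_pos, bt_pos, math.dist(wifi_pos, bt_pos))
```

The final distance between the Wi-Fi and Bluetooth positions is what the pairing step compares.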
4.3.5 Fingerprint
Fingerprinting is one of the most popular methods for indoor object tracking. The Wi-Fi probe requests and Bluetooth signals received at a given location form a unique fingerprint that is used for localization.
Fingerprinting-based positioning systems work in two phases: off-line and on-line.
The first is the off-line phase, during which the system is calibrated. The first step is to divide the area into a grid of square cells. The choice of the cell size is fundamental for good fingerprints: a very dense grid is pointless, because Wi-Fi and Bluetooth cannot locate a device with centimeter accuracy, while a very sparse grid would not yield meaningful results.
In our test we divided the room into fifty square cells with a side of 0.9 meters (an area of 0.81 m²).
Figure 4.12: Fingerprint grid. In blue the center of the cells
The next step is the collection of the fingerprints and the calibration of each cell. The Raspberry Pis were arranged as in the previous configuration: four anchors in the corners and two in the middle (figure 4.9.b). As fingerprint target devices we used the LG, the Samsung S Advance and the Samsung S3 mini. The devices were placed in the middle of each cell to capture the Wi-Fi and Bluetooth fingerprints; the calibration position of each cell is marked in figure 4.12 by a small blue dot. At the end of the process, the RSSI values of each device were aggregated into ten measurements per cell.
Eight datasets are created:
• Wi-Fi LG fingerprint dataset and Bluetooth LG fingerprint dataset
• Wi-Fi Samsung S Advance fingerprint dataset and Bluetooth Samsung
S Advance fingerprint dataset
• Wi-Fi Samsung S3 fingerprint dataset and Bluetooth Samsung S3 fin-
gerprint dataset
• Wi-Fi average fingerprint dataset and Bluetooth average fingerprint
dataset
The datasets are device specific because, as we saw previously, the devices behave differently. It would have been natural to create fingerprints for the other two devices as well, the Samsung Tab and the iPad. We deliberately left them out to test whether a device can be linked regardless of its model; this is why the last two datasets are the averages of the previous fingerprint datasets.
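Building the average datasets amounts to an element-wise mean of the device-specific datasets. The sketch assumes, hypothetically, that each dataset is a list of (cell, RSSI vector) rows aligned row by row; the function name is illustrative.

```python
def average_fingerprints(*datasets):
    """Element-wise mean of device-specific fingerprint datasets.

    Each dataset is a list of (cell, [rssi per anchor]) rows, and all datasets
    list the same cells in the same order (an assumption of this sketch).
    """
    averaged = []
    for rows in zip(*datasets):
        if len({cell for cell, _ in rows}) != 1:
            raise ValueError("datasets are not aligned row by row")
        vectors = [vec for _, vec in rows]
        mean_vec = [sum(col) / len(col) for col in zip(*vectors)]
        averaged.append((rows[0][0], mean_vec))
    return averaged


# Hypothetical single-row datasets for the three fingerprint devices.
lg = [("1.1", [-50.0, -60.0])]
s_adv = [("1.1", [-52.0, -64.0])]
s3 = [("1.1", [-54.0, -68.0])]
print(average_fingerprints(lg, s_adv, s3))
```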
Table 4.3 shows the Wi-Fi average fingerprint dataset. The vector of RSSI values measured at a cell is called the location fingerprint of that cell, and all these vectors together form a Wi-Fi fingerprint dataset and a Bluetooth fingerprint dataset. Each dataset has 7 columns and 500 rows, ten rows for each cell. Collecting them is very time consuming, which is a major drawback of the fingerprint method.
The second phase is the on-line phase, during which the previously created datasets are used to determine the cell in which the device is located. For this purpose a machine learning algorithm is used, namely K-Nearest Neighbors (k-NN). Since the devices are not always in the middle of a cell, we modified the algorithm: instead of the cell, we estimate the coordinates of the device.

Table 4.3: Wi-Fi Average Fingerprint Dataset
cell  rasp1     rasp2     rasp3     rasp4     rasp5     rasp6
1.1   -53.0611  -64.9880  -81.5163  -87.0331  -69.6519  -71.1136
1.1   -52.2466  -64.8562  -82.3245  -86.3631  -69.9400  -71.4375
1.1   -52.5101  -65.3720  -82.6396  -87.1128  -69.3515  -72.0071
...   ...       ...       ...       ...       ...       ...
1.1   -52.6337  -65.0909  -82.0393  -86.9924  -70.1663  -72.5507
1.2   -59.6698  -60.6080  -77.7671  -87.2319  -68.1927  -67.8045
1.2   -59.5273  -59.8640  -77.4314  -87.0188  -67.3188  -67.2139
...   ...       ...       ...       ...       ...       ...
1.3   -71.6366  -45.9241  -86.5242  -80.6321  -70.1093  -79.6348
...   ...       ...       ...       ...       ...       ...
9.5   -74.0613  -83.958   -70.4495  -75.1549  -66.5281  -63.7448
To find the coordinates, the following operations are performed:
• Step 1: for each target device, find the n most similar fingerprint vectors, called candidates. The candidates are selected using the euclidean distance, so they are the n RSSI vectors closest to the target device's vector. This is a form of k-NN in which the majority vote among the k selected items is not performed.
Each candidate thus has a coordinate C(x_i, y_i), the center of its cell, and a distance d_i from the target device's vector, with i = 1, 2, ..., n.
• Step 2: compute a weight for each candidate:
$$ w_i = \frac{1}{d_i^2} \qquad (4.5) $$
• Step 3: normalize the weights so that they sum to 1:
$$ \tilde{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \qquad (4.6) $$
• Step 4: compute the position (x, y) of the target device as:
$$ (x, y) = \sum_{i=1}^{n} \tilde{w}_i \cdot (x_i, y_i) \qquad (4.7) $$
These four steps are applied separately to the Bluetooth vector and to the Wi-Fi vector of each device, giving a Bluetooth position and a Wi-Fi position. The last step links the Wi-Fi coordinates with the Bluetooth coordinates by finding the two most similar positions using the euclidean distance.
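Steps 1 to 4 could be sketched as below, assuming the fingerprint dataset is a list of (cell center, RSSI vector) rows. The early return for a zero distance is our addition, since equation (4.5) is undefined there; the function name `locate` is hypothetical.

```python
import math


def locate(target, fingerprints, n=5):
    """Estimate the (x, y) position of an RSSI vector with steps 1-4 (eqs. 4.5-4.7).

    fingerprints: list of ((x_center, y_center), rssi_vector) rows.
    """
    # Step 1: the n fingerprint rows closest to the target in RSSI space.
    candidates = sorted(
        ((math.dist(target, vec), center) for center, vec in fingerprints),
        key=lambda item: item[0],
    )[:n]
    # Equation (4.5) is undefined for d = 0: return that cell center directly.
    if candidates[0][0] == 0.0:
        return candidates[0][1]
    # Step 2: inverse-square weights, eq. (4.5).
    weights = [1.0 / d ** 2 for d, _ in candidates]
    # Step 3: normalize the weights so that they sum to 1, eq. (4.6).
    total = sum(weights)
    weights = [w / total for w in weights]
    # Step 4: weighted average of the candidate cell centers, eq. (4.7).
    x = sum(w * c[0] for w, (_, c) in zip(weights, candidates))
    y = sum(w * c[1] for w, (_, c) in zip(weights, candidates))
    return x, y


# Hypothetical rows: centers of 0.9 m cells and two-anchor RSSI vectors.
fingerprints = [
    ((0.45, 0.45), [-50.0, -80.0]),
    ((1.35, 0.45), [-60.0, -70.0]),
    ((2.25, 0.45), [-70.0, -60.0]),
]
print(locate([-55.0, -75.0], fingerprints, n=2))
```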
4.4 Results
The problem is to link a Wi-Fi MAC address with a Bluetooth one, i.e. to find which Wi-Fi vector is most similar to a given Bluetooth vector and vice versa.
In the following sections, accuracy denotes the degree of correctness of an algorithm: the number of MAC addresses correctly linked divided by the total number of devices.
To link two devices we use the euclidean distance: for each Wi-Fi MAC address we create a list of Bluetooth MAC addresses ordered from the most similar to the most different. This allows a top-k approach.
4.4.1 Top-k value
For each target MAC address, the ordered list of possible MAC addresses is
15 lines long (15 is the number of devices). The list is ordered based on the
proximity between the vectors.
In the top-k approach we take the first k MAC addresses of the ordered list and consider the pairing correct if the right MAC address is among them. We therefore do not identify the correct MAC address exactly, but we obtain k candidates for the target MAC address. This avoids discarding MAC addresses that, for whatever reason, do not appear at the very top of the list.
We identify three breakpoints (the k values):
• Top 1
• Top 3
• Top 5
A particular case is k = 1: we pick the most similar address and declare it the correct MAC address. With top 3 and top 5 we take the first 3 or 5 MAC addresses as candidates.
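The top-k accuracy can be computed as in this sketch; the dictionary layout of the ranked lists and of the ground truth is a hypothetical choice.

```python
def top_k_accuracy(ranked, truth, k):
    """Fraction of Wi-Fi MACs whose true Bluetooth MAC is among the first k candidates.

    ranked: {wifi_mac: [bt_mac, ...]} sorted from most to least similar.
    truth:  {wifi_mac: bt_mac} ground-truth pairs.
    """
    hits = sum(1 for w_mac, bt_list in ranked.items() if truth[w_mac] in bt_list[:k])
    return hits / len(ranked)


# Hypothetical rankings for three devices.
ranked = {"W1": ["B1", "B2", "B3"], "W2": ["B1", "B3", "B2"], "W3": ["B1", "B2", "B3"]}
truth = {"W1": "B1", "W2": "B2", "W3": "B3"}
print([top_k_accuracy(ranked, truth, k) for k in (1, 2, 3)])
```

For example, 13 hits out of 15 devices gives an accuracy of about 0.87.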
Figure 4.13 shows the percentage of correct MAC addresses found within the first k values, for the scenario with 4 Raspberry Pis.
Figure 4.13: Algorithms accuracy percentages of the top-k value approaches with 4 Raspberry Pis (bar chart of accuracy, 0% to 100%, for top 1, top 3 and top 5; algorithms: norm, conv WiFi to BT, conv dist, trilateration, fing 5, fing avg 5)
The best algorithm in terms of top-5 accuracy is the conversion from Wi-Fi and Bluetooth to distance. Its accuracy is 87%: the correct MAC address is among the five nearest devices in 13 cases out of 15. A plausible explanation is that both the Wi-Fi RSSI and the Bluetooth RSSI are strongly related to distance, and that the conversion models are accurate because a different curve is used for each device. However, while the conversion to distance performs very well with the top-5 approach, its top-1 accuracy is only 40%.
For the top-1 approach a good algorithm is the conversion from Bluetooth to Wi-Fi, which correctly pairs 53% of the devices. This result reflects the strong relation between the Wi-Fi RSSI and the Bluetooth RSSI seen in figure 4.8.
Normalization offers a good trade-off between accuracy and cost. It requires neither the pre-computed regressions of the conversion algorithms nor the error minimization of trilateration, so it is very fast and cheap. Its results are satisfactory: 33% in top 1, 67% in top 3 and 80% in top 5, only 7 percentage points below the best algorithm. Normalization can be used in unknown scenarios, where the device models are unknown and no preliminary phase to study the RSSI regressions can be performed.
For the fingerprint we tested two approaches:
• using the average fingerprint dataset for all the devices, called average fingerprint;
• using the device-specific fingerprint datasets for the LG, Samsung S Advance and Samsung S3, and the average fingerprint dataset for the other two devices (those without a specific fingerprint dataset), simply called fingerprint.
For both approaches we varied the number n of candidates introduced in section 4.3.5. With n = 1 the center of the closest cell is used and no position adjustment is made; increasing n refines the position of the device, which matters because not all the devices are placed in the middle of a cell. We tested n = 1, 2, 3, 4, 5, 7 and obtained the best results with n = 5. Figure 4.13 shows the accuracy of fingerprint and average fingerprint with n = 5. The two are quite similar, which suggests that an average fingerprint dataset allows the fingerprint algorithm to be used with different, unknown types of devices.
Analyzing the device positions, we found that the devices placed in the middle of the room were the hardest to match. Table 4.4 shows, for each device, the fraction of times it is correctly linked across the different algorithms. Devices 1, 2, 3, 7, 12, 14 and 15 have a high fraction, i.e. they are often linked correctly, whereas devices 5, 8, 10 and 13 are the worst in this respect. The values in table 4.4 may depend on two factors: the device position and the device model.

Table 4.4: Percentage of exact pairing (expressed as a fraction between 0 and 1)
Id  Device  Top1  Top3  Top5  Position
1   LG      0.70  0.83  0.89  Top Left
2   S Adv   0.41  0.83  0.89  Top Left
3   S3      0.31  0.68  0.87  Top Right
4   S TAB   0.37  0.52  0.79  Center Left
5   iPad    0.08  0.27  0.50  Center
6   LG      0.27  0.47  0.62  Top-Center Left
7   S Adv   0.60  0.85  0.93  Top-Center Right
8   S3      0.02  0.12  0.27  Bottom-Center Left
9   S TAB   0.06  0.37  0.83  Center
10  iPad    0.06  0.35  0.50  Center Right
11  LG      0.20  0.64  0.77  Center Right
12  S Adv   0.43  0.77  0.89  Bottom Right
13  S3      0.12  0.37  0.52  Bottom
14  S TAB   0.47  0.83  0.95  Center
15  iPad    0.52  0.77  0.87  Bottom Left

The table suggests that the Samsung S Advance (ids 2, 7, 12) pairs reliably, while the S3 does not. Overall, however, the device model does not seem to affect correct pairing much, partly because we use a different model for each device.
The position, instead, strongly affects the accuracy. Devices placed in the corners of the room are matched correctly much more often than devices in the center. This is likely because a device in the center is roughly equidistant from all the anchors, so all the RSSI values in its vector are similar and it is easily confused with a nearby device. To mitigate this problem we added two more Raspberry Pis.
4.4.2 Adding anchors
The previous results (section 4.4.1) refer to a scenario with four Raspberry Pis. To increase the accuracy of the algorithms, two more anchors were added. With 4 Raspberry Pis the anchor density was one anchor every 10.7 m²; with two more anchors it becomes one anchor every 7 m².
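Assuming the room corresponds to the rectangle spanned by the four corner anchors of section 4.3.4 (an assumption, since the room size is not restated here), the two densities follow directly:

```latex
A = 4.5\,\mathrm{m} \times 9.5\,\mathrm{m} = 42.75\,\mathrm{m}^2, \qquad
\frac{A}{4} \approx 10.7\,\mathrm{m}^2, \qquad
\frac{A}{6} \approx 7.1\,\mathrm{m}^2
```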
The two additional Raspberry Pis were placed in the middle of the room, as in figure 4.9.b, to capture the distance variations of the devices placed in the center of the room.
The results confirmed our hypothesis: the accuracy of the algorithms increased. Using the conversion to distance we obtained 100% accuracy with the top-5 approach. The only exception was the conversion from Bluetooth to Wi-Fi, whose results did not change.
Overall the gains were good: the average increase was 9% for top 1, 10% for top 3 and 7% for top 5. The conversion from Bluetooth/Wi-Fi to distance was again the best algorithm, with excellent results: 67% accuracy with top 1 and 100% with top 5.
As figure 4.14 shows, the differences between the algorithms are the same in the 4 and 6 Raspberry Pi scenarios.
Figure 4.14: Algorithms accuracy percentages of the top-k value approaches with 6 Raspberry Pis (bar chart of accuracy, 0% to 100%, for top 1, top 3 and top 5; algorithms: norm, conv WiFi to BT, conv dist, trilateration, fing 5, fing avg 5)
Another interesting observation is that accuracy seems to grow linearly with the number of anchors. We tested this behaviour with the normalization algorithm, also using only 3 Raspberry Pis. The results are shown in figure 4.15:
Figure 4.15: Increase of accuracy of the normalization algorithm (accuracy, 0 to 100, versus the number of Raspberry Pis, for top 1, top 3 and top 5)
The figure shows that accuracy increases as anchors are added. Extrapolating this trend, we expect that 8 anchors could reach 100% with the top-5 approach. Increasing the number of anchors therefore increases the accuracy of the system.
4.4.3 Receiver Operating Characteristic
The top-k approach ignores the actual distance between the two vectors (Bluetooth and Wi-Fi). With top 1, a Bluetooth vector may be first in the list of a Wi-Fi vector even though it is far away from it, and the top-k method would still link them. This is probably a wrong result, because the euclidean distance between the Wi-Fi and Bluetooth vectors of the same device should be close to zero.
For this reason we introduce a threshold: a limit beyond which a pair of MAC addresses is considered false and is not matched. Pairs within the threshold are automatically considered to belong to the same device and are linked.
With a threshold, four cases are possible:
• True Positive: Wi-Fi and Bluetooth MAC addresses coming from
the same device correctly identified as the same device.
• False Positive: Wi-Fi and Bluetooth MAC addresses coming from
different devices incorrectly identified as the same device.
• True Negative: Wi-Fi and Bluetooth MAC addresses coming from different devices correctly identified as different devices.
• False Negative: Wi-Fi and Bluetooth MAC addresses coming from
the same device incorrectly identified as different devices.
These values are represented with the Receiver Operating Characteristic (ROC) curve, a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The ROC helps identify the threshold that gives a high True Positive rate together with a low False Positive rate.
The ROC curve is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.
The TPR, also called sensitivity, measures the proportion of positives that are correctly identified as such (e.g. the fraction of Wi-Fi and Bluetooth MAC address pairs from the same device correctly identified as a single device).
The FPR, also called fall-out, measures the proportion of negative MAC address pairs that are incorrectly identified as positive. It equals 1 - specificity, where specificity is the True Negative Rate (TNR), the proportion of negatives correctly identified as such.
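In formulas, TPR = TP / (TP + FN) and FPR = FP / (FP + TN) = 1 - TNR. The sketch below computes one ROC point per threshold; it assumes that every scored pair carries a normalized euclidean distance and a ground-truth label, and that a pair is accepted when its distance does not exceed the threshold. Which pairs are scored is not specified here, so the inputs are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, FPR, TPR) for each threshold.

    scores: normalized euclidean distances of candidate (Wi-Fi, Bluetooth) pairs.
    labels: True if the pair really comes from the same device.
    A pair is predicted 'same device' when its distance is <= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s <= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s <= t and not y)
        tpr = tp / positives if positives else 0.0  # sensitivity: TP / (TP + FN)
        fpr = fp / negatives if negatives else 0.0  # fall-out: FP / (FP + TN)
        points.append((t, fpr, tpr))
    return points


# Hypothetical scored pairs.
print(roc_points([0.1, 0.2, 0.4, 0.6], [True, False, True, False], [0.0, 0.3, 1.0]))
```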
Since each algorithm produces distances on its own scale, the thresholds differ between algorithms; to plot all of them in one graph we normalize the thresholds and then compute the true and false positive rates. The resulting ROCs are shown in figure 4.16 and figure 4.17.
Figure 4.16: ROC of the home experiment (TPR versus FPR for the algorithms conv, conv dist, finger, finger avg, norm and trilateration)
The figure plots the FPR and TPR obtained as the threshold varies. The top left corner is the best case in terms of the trade-off between sensitivity and fall-out (or specificity): there, all positives are true (TPR = 100%) and there are no false positives (FPR = 0%). The point of each curve closest to the top left corner gives the best threshold value for that algorithm.
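Selecting the point closest to the top left corner can be sketched as a minimum over the euclidean distance to (FPR, TPR) = (0, 1); the ROC points in the usage example are hypothetical.

```python
import math


def best_threshold(points):
    """Return the (threshold, FPR, TPR) point closest to the corner FPR = 0, TPR = 1."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))


# Hypothetical ROC points as (normalized threshold, FPR, TPR).
points = [(0.05, 0.00, 0.20), (0.13, 0.22, 0.50), (0.50, 0.80, 1.00)]
print(best_threshold(points))
```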
For the conversion from Bluetooth/Wi-Fi to distance (figure 4.17.c), the optimal point is reached with a normalized threshold of 0.13, where the FPR is only 22% and the TPR is 50%. With the threshold set to 0.13 we obtain 6 positives (5 true positives and 1 false positive) and 9 negatives (5 false negatives and 4 true negatives).
Figure 4.17: ROC of the different algorithms of the home experiment (TPR versus FPR): a) Normalization, b) Conversion Bluetooth to Wi-Fi, c) Conversion to distance, d) Trilateration, e) Fingerprint average dataset, f) Fingerprint
If the threshold is chosen properly, a match is likely to be correct: a MAC address under the threshold is the right one about three times out of four.
With the right threshold the results become more precise, because the Bluetooth address linked to a Wi-Fi address is more likely to be the correct one; on the other hand, fewer correct pairs are found, because some true pairs lie above the threshold and are excluded.
Defining an upper bound on the threshold can therefore be useful to assert precisely that two MAC addresses belong to the same device. The drawback is that some correct pairs are excluded even with the best TPR/FPR threshold. This is evident from the area under the ROC curve (AUC): an area of 1 represents a perfect test, while in our case the AUC barely reaches 0.60. This may be due to the small number of samples (only 15) used to build the ROC, or to a weak correlation between the threshold and the correct pairing of MAC addresses. We nevertheless think that a threshold is useful in situations where precise pairings matter more than finding every correct pair.
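The AUC can be approximated with the trapezoidal rule over the ROC points, as in this sketch; the (FPR, TPR) point format is a hypothetical choice, and the end points (0, 0) and (1, 1) are added explicitly.

```python
def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule.

    points: (FPR, TPR) pairs; (0, 0) and (1, 1) are added as end points.
    Sorting by FPR, then TPR, orders the points along the curve.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


print(roc_auc([(0.0, 1.0)]))   # perfect classifier
print(roc_auc([]))             # random classifier (diagonal)
```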
Chapter 5
Real Scenario Experiment
The previous chapter presented a test performed in an isolated environment with known devices. That experiment was important to understand the behavior of the devices and to test our algorithms.
To verify whether the home results hold in a real scenario, we replicated the previous experiments in a university laboratory where neither the number of devices present nor their Wi-Fi and Bluetooth MAC addresses were known a priori.
We decided not to perform preliminary tests. The relations between distance and RSSI and between Wi-Fi RSSI and Bluetooth RSSI were computed using a spy device placed at a known point. We chose this approach to simulate a real scenario in which preliminary tests are not possible.
Another difference from the home experiment is that we did not use the fingerprint algorithm: it is costly and time consuming, and in an unknown scenario the fingerprint would be hard to replicate for the same reasons.
The datasets also differ in size. During the home experiment the Bluetooth and Wi-Fi datasets had the same dimensions. In reality people use Wi-Fi much more than Bluetooth: Bluetooth is often switched off or not discoverable, while Wi-Fi is almost always on. Hence the number of unique Wi-Fi MAC addresses will be greater than the number of unique Bluetooth MAC addresses.
5.1 The environment
The environment of this experiment is the ANTLab, a university laboratory measuring 10 meters by 8 meters, for an area of about 80 square meters. Six Raspberry Pis are placed to cover the whole laboratory (figure 5.1). The laboratory contains desks, computers and chairs, and about 10 people were present during the experiment. This setting causes a different path loss than in the previous experiment.
Figure 5.1: ANTLab planimetry. The six Raspberry Pis are placed on the perimeter
5.2 The devices
Before the experiment we knew neither how many devices would be in the environment nor their positions.
All the devices were unknown except two: we placed the LG and Samsung S smartphones used previously at known positions, to obtain a sort of real-time mapping of the environment. We chose these two devices because they proved the most reliable in the home experiment.
5.3 Execution
As mentioned above, the number of devices in the laboratory was unknown.
The hcitool scan command allowed us to discover the visible Bluetooth devices: we found eleven different Bluetooth MAC addresses present during the whole experiment.
Our script ran for ten minutes, and we assume that the devices were static during this period.
A large number of Wi-Fi probe requests were captured. The tool discarded all corrupted probes, and we also removed all addresses with fewer than 10 probe requests, assuming that they came from people outside the laboratory or from passers-by.
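The frequency filter could be sketched as below. It assumes, hypothetically, that probe requests are counted per MAC address across all Raspberry Pis together; whether the count is taken per anchor is an assumption, and the function name is illustrative.

```python
from collections import Counter


def frequent_macs(probe_macs, min_probes=10):
    """Keep Wi-Fi MAC addresses that appear in at least `min_probes` probe requests."""
    counts = Counter(probe_macs)
    return {mac for mac, count in counts.items() if count >= min_probes}


# Hypothetical capture: one address seen 10 times, another only 9 times.
print(frequent_macs(["A"] * 10 + ["B"] * 9))
```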
We obtained 35 different Wi-Fi MAC addresses and averaged the RSSI values of each address, creating a dataset of 35 rows and 7 columns (six RSSI columns and one MAC address column).
For Bluetooth we generated a dataset of 11 rows and 7 columns.
The next phase is matching. As before, the algorithms used were:
1. normalization;
2. RSSI conversion from Bluetooth to Wi-Fi;
3. RSSI conversion from Bluetooth/Wi-Fi to distance;
4. trilateration.
The algorithms were applied in the same way as in the home experiment, explained in section 4.3. As mentioned above, only the fingerprint algorithm was not used, because it is time consuming.
To verify the pairings produced by the algorithms, at the end of the experiment we asked the people in the laboratory for their Wi-Fi and Bluetooth MAC addresses. This gave us the correct MAC address pairs and allowed us to check the accuracy of the algorithms.
5.4 Results
The goal of the experiment is the same as in the home experiment: to link two MAC addresses, one coming from Wi-Fi and the other from Bluetooth. Linking the MAC addresses allows a device to be uniquely identified.
There are several differences with respect to the first experiment. The most evident is that we did not know a priori which MAC address pairs were correct, since almost all the devices were not under our control. This lets us understand whether our algorithms are valid in an uncontrolled environment.
The path loss differs because of the layout of the laboratory, and the device models are different as well. Because of these two differences, the previous regressions could not be used to convert the RSSI into distance or to convert one type of RSSI into the other: the curves presented in section 4.1 are device and environment specific.
The regression models have been computed on the fly, using our two known
devices. We expect that these models are less accurate than the ones we
used during the home experiment.
5.4.1 Top-k values
Figure 5.2 shows bar plots of the accuracy of each algorithm with the top-k approach.
The best algorithm in terms of accuracy is the conversion from Bluetooth and Wi-Fi to distance, which reaches 93% correct pairings in top 5 and 45% in top 1.
The algorithms that use regressions achieve good accuracy, about 40%, 70% and 80% with top 1, top 3 and top 5 respectively. This means that the on-the-fly regressions were quite accurate and roughly approximate the RSSI variation in the laboratory. Interestingly, the change in exact pairings between the home experiment and the laboratory experiment is almost the same for all the algorithms.
Because of the size and configuration of the environment, we expected a lower accuracy. In the home experiment with six Raspberry Pis the anchor density was one anchor every 7 square meters; in the laboratory it is one anchor every 13 square meters, almost half, and slightly lower than the home density with four Raspberry Pis.
Compared to the home experiment with six anchors, the accuracy decreases by 10% on average. This may look large, but considering the worse conditions of the laboratory experiment the result is more than satisfactory.

Figure 5.2: Algorithms accuracy percentages of the top-k value approaches of the laboratory experiment (bar chart of accuracy, 0% to 100%, for top 1, top 3 and top 5; algorithms: norm, conv WiFi to BT, conv dist, trilateration)
Compared to the case with four Raspberry Pis, the decrease in accuracy is only 3%. This result shows that similar anchor densities yield similar accuracy.
From these results we can infer that it is possible to link the MAC addresses using the previous algorithms in an unknown scenario: the results are consistent with those of the home experiment and provide good accuracy.
5.4.2 Receiver Operating Characteristic
As explained in section 4.4.3, the ROC curve shows how the false positive rate (FPR) and the true positive rate (TPR) vary with the threshold value. Thanks to the ROC we can assess both the precision and the sensitivity of an algorithm.
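The sketch below shows how ROC points can be obtained from a set of proposed pairings. It assumes, in line with the threshold discussion later in this section, that each proposed pairing has a score given by its Top 1 Euclidean distance (lower means more confident) and is accepted when the score does not exceed the threshold; the exact labelling of section 4.4.3 may differ, and the function name is hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, FPR, TPR) for each candidate threshold.

    scores: Top 1 distance of each proposed pairing (lower = more confident).
    labels: True if the proposed pairing is correct, False otherwise.
    A pairing is accepted when its score is <= threshold.
    Both correct and wrong pairings must be present (no division by zero).
    """
    positives = sum(1 for ok in labels if ok)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, ok in zip(scores, labels) if s <= t and ok)
        fp = sum(1 for s, ok in zip(scores, labels) if s <= t and not ok)
        points.append((t, fp / negatives, tp / positives))
    return points


for point in roc_points([0.1, 0.2, 0.3, 0.4], [True, False, True, False],
                        [0.0, 0.15, 0.25, 0.35, 0.5]):
    print(point)
```

Sweeping the threshold from 0 upwards traces the curve from (0, 0) towards (1, 1).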
Figures 5.3 and 5.4 plot the ROC curves of the algorithms in the laboratory experiment.
Figure 5.3: ROC of the laboratory experiment [FPR on the x axis and TPR on the y axis, both from 0 to 1; one curve per algorithm: conv, conv dist, norm, tri]
Looking at the graph, the algorithm nearest to the top left corner (the best point in the ROC space) is the normalization. However, looking more closely at the normalization (figure 5.4.a), we find that its accuracy percentages in the laboratory experiment are very low, so we discard the normalization from the ROC analysis.
Figure 5.4: ROC of the different algorithms of the laboratory experiment [four panels, FPR on the x axis and TPR on the y axis: a) Normalization (norm); b) Conversion Bluetooth to Wi-Fi (conv); c) Conversion to distance (conv dist); d) Trilateration (tri)]
It is interesting to analyze the logarithmic conversion from Wi-Fi/Bluetooth RSSI to distance (figure 5.4.c). In the home experiment, the threshold that maximizes the precision of the algorithm is 0.13. We want to understand whether this threshold value is the same in the laboratory experiment.
Using 0.13 as the threshold, we obtain a TPR of 40% and an FPR of 0%, which is not the optimal operating point in the laboratory. The best laboratory threshold is 0.29, with a TPR of 60% and an FPR of 33%. This means that the threshold varies considerably with the environment. A higher threshold means that the Top 1 Euclidean distance is greater, hence farther from zero, so the algorithm was slightly less precise than in the home experiment. As said before, this was expected.
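If the best threshold is taken to be the ROC point closest to the ideal top left corner (FPR = 0, TPR = 1), which is an assumption here but is consistent with the two laboratory operating points quoted above, it can be selected as in this sketch (hypothetical function name):

```python
import math


def best_threshold(points):
    """Pick the threshold whose ROC point is closest to (FPR = 0, TPR = 1).

    points: iterable of (threshold, fpr, tpr) tuples.
    """
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))[0]


# Operating points of the conversion-to-distance algorithm in the laboratory.
lab_points = [(0.13, 0.00, 0.40), (0.29, 0.33, 0.60)]
print(best_threshold(lab_points))  # 0.29
```

The choice of criterion matters: Youden's index J = TPR - FPR would instead prefer 0.13 here (0.40 against 0.27).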
Table 5.1 compares the best threshold of each algorithm in the home experiment and in the laboratory experiment. The table confirms that each algorithm has a different threshold, and that the environment also changes the threshold.
Table 5.1: Threshold Table
Algorithm                  Home experiment   Lab experiment
Normalization              0.58              0.49
Conversion                 0.25              0.55
Conversion distance log    0.13              0.29
Trilateration              0.12              0.08
Average                    0.27              0.35
All the algorithms have an area under the ROC curve (AUC) of almost 0.60. Given the similarity between the home experiment and the laboratory experiment, the same conclusion applies: the threshold is not a good parameter if we are interested in the total accuracy, but it can be useful if we aim for high precision.
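The area under the ROC curve can be estimated from the ROC points with the trapezoidal rule, as in this illustrative sketch (not necessarily how the figures above were produced; the function name is hypothetical). For reference, an area of 0.5 corresponds to a random ranking and 1.0 to a perfect one.

```python
def roc_auc(points):
    """Area under the ROC curve via the trapezoidal rule.

    points: (fpr, tpr) pairs; (0, 0) and (1, 1) are added as end points.
    Sorting by FPR (then TPR) orders the points along a monotone ROC curve.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


print(roc_auc([(0.5, 0.5)]))  # 0.5, the diagonal of a random classifier
```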
Chapter 6
Blended attack scenario
Blended attacks combine two attack mediums, Wi-Fi and Bluetooth, into a single, more powerful attack. In most cases, these attacks are designed to damage a target device far more quickly than is possible with a single attack medium.
To carry out a blended attack, a malicious attacker needs to know the two MAC addresses of the target device. Usually it is not easy to know for sure whether two MAC addresses (Bluetooth and Wi-Fi) come from the same device. Our algorithms can find, with good accuracy, the Wi-Fi MAC address corresponding to a Bluetooth address, or vice versa.
Another possibility is to attack only one interface (for example Bluetooth) whose address is not known. Using our tool, it is possible to link the known address (in this case Wi-Fi) to the unknown one and then perform the attack.
We will now see in more detail how an attack can be carried out and what results can be achieved.
6.1 Attack scenario
To perform an attack, we assume that the two attacked interfaces (Wi-Fi and Bluetooth) of the device are turned on. We also assume that we know the device owner and are close to him or her during the attack. Without these conditions an attack is not possible.
We start by analyzing the worst case, in which both the Wi-Fi and the Bluetooth MAC addresses are unknown. At this point there are two options: discover the Wi-Fi address first and then the Bluetooth one, or discover the Bluetooth address first and then the Wi-Fi one.
6.1.1 Discover the Wi-Fi and infer the Bluetooth MAC address
To obtain the MAC address of a person, we can use the method proposed by Cunche [8], already explained in chapter 2. It consists in following the target for a short time at a reasonable distance with a monitoring tool (e.g. tshark or airodump-ng). The only Wi-Fi MAC address that is present in every capture is the target's MAC address.
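The core of this idea, keeping only the addresses that appear in every capture, is a set intersection. A minimal sketch, assuming the captured MAC addresses have already been extracted (for example from the tshark output) into one set per capture session, and assuming the target does not randomize its probe-request MAC address between captures; the function name and the addresses are hypothetical:

```python
def persistent_macs(captures):
    """Return the MAC addresses present in every capture session.

    captures: list of sets of MAC addresses, one set per capture taken
    while following the target.
    """
    if not captures:
        return set()
    return set.intersection(*(set(c) for c in captures))


# Hypothetical captures taken in three different places.
captures = [
    {"aa:aa:aa:00:00:01", "bb:bb:bb:00:00:02", "cc:cc:cc:00:00:03"},
    {"aa:aa:aa:00:00:01", "dd:dd:dd:00:00:04"},
    {"aa:aa:aa:00:00:01", "bb:bb:bb:00:00:02"},
]
print(persistent_macs(captures))  # {'aa:aa:aa:00:00:01'}
```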
The same procedure can be applied to the Bluetooth interface using an inquiry (hcitool scan), but only if the Bluetooth interface is visible. In our scenario we suppose that the Bluetooth interface is not visible, so Cunche's method cannot be used for Bluetooth.
To discover the MAC address of an invisible Bluetooth device we use RedFang, an application that finds non-discoverable Bluetooth devices by brute force. It is available in Kali Linux and in the most common Linux distributions. Like all brute-force methods, RedFang's only drawback is that it is time-consuming, but it is one of the few ways to discover a non-discoverable device with common hardware.
Since we have already found the Wi-Fi MAC address, an OUI table makes it easy to discover the vendor of the device. Knowing the vendor, it is possible to reduce the range of Bluetooth MAC addresses that RedFang needs to scan, which makes the operation faster. If for some reason we want to find all the invisible Bluetooth devices, RedFang can scan all the possible MAC addresses (from 00:00:00:00:00:00 to FF:FF:FF:FF:FF:FF).
When RedFang finishes, we obtain a list of available MAC addresses.
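The size of this reduction can be made concrete: a MAC address has 48 bits, and its first 24 bits are the OUI assigned to the vendor. The sketch below (hypothetical helper names, example address only) extracts the OUI and counts the addresses left to scan when only the vendor's OUIs are tried. It assumes the vendor's Bluetooth OUIs are known; they are not necessarily the same as the OUI of the Wi-Fi address, since vendors typically own several OUIs.

```python
FULL_SPACE = 2 ** 48  # 00:00:00:00:00:00 .. FF:FF:FF:FF:FF:FF
PER_OUI = 2 ** 24     # addresses left once the first three octets are fixed


def oui(mac):
    """Return the OUI (first three octets) of a colon-separated MAC address."""
    return ":".join(mac.upper().split(":")[:3])


def search_space(num_ouis):
    """Number of addresses to scan when only num_ouis vendor prefixes are tried."""
    return num_ouis * PER_OUI


print(oui("ac:de:48:00:11:22"))     # AC:DE:48
print(search_space(1), FULL_SPACE)  # 16777216 281474976710656
```

Even a single OUI still leaves more than 16 million addresses, which is why the scan remains time-consuming.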
Using our tool, we compare the list of Bluetooth addresses with the previously found Wi-Fi address. To obtain better results, we can place a couple of known devices in the area. As we saw in chapter 5, this allows more precise results with algorithms such as the conversion from Bluetooth and Wi-Fi RSSI to distance or the conversion between the two technologies.
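As an illustration of this comparison step, the sketch below ranks candidate Bluetooth addresses for a given Wi-Fi address in the spirit of the conversion-to-distance algorithm: the per-anchor RSSI readings are converted to distances with a log-distance model for each technology, and candidates are sorted by the Euclidean distance between the two distance vectors, so that the first entry is the Top 1 candidate. The model parameters, the data layout and the rule of skipping candidates not heard by every anchor are assumptions for the example, not the thesis implementation.

```python
import math


def to_distances(rssi_by_anchor, model):
    """Convert {anchor: RSSI} into {anchor: distance} with a (p0, n) model."""
    p0, n = model
    return {a: 10 ** ((p0 - r) / (10.0 * n)) for a, r in rssi_by_anchor.items()}


def rank_candidates(wifi_rssi, bt_rssi_by_mac, wifi_model, bt_model):
    """Return Bluetooth MACs sorted by Euclidean distance to the Wi-Fi address.

    wifi_rssi: {anchor: RSSI} for the target Wi-Fi MAC address.
    bt_rssi_by_mac: {bt_mac: {anchor: RSSI}} for every candidate Bluetooth MAC.
    """
    wifi_d = to_distances(wifi_rssi, wifi_model)
    scored = []
    for mac, readings in bt_rssi_by_mac.items():
        bt_d = to_distances(readings, bt_model)
        if not all(a in bt_d for a in wifi_d):
            continue  # skip candidates not heard by every anchor
        dist = math.sqrt(sum((wifi_d[a] - bt_d[a]) ** 2 for a in wifi_d))
        scored.append((dist, mac))
    scored.sort()
    return [mac for _, mac in scored]


wifi = {"rpi1": -40.0, "rpi2": -60.0}
bt = {"bt-a": {"rpi1": -70.0, "rpi2": -50.0},
      "bt-b": {"rpi1": -50.0, "rpi2": -70.0}}
print(rank_candidates(wifi, bt, wifi_model=(-40.0, 2.0), bt_model=(-50.0, 2.0)))
# ['bt-b', 'bt-a']
```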
To obtain faster results it is possible to use the normalization; to be more accurate, we can use the algorithms that perform a conversion. Also in this case it is hard to use the fingerprint, unless we are in a familiar environment.
At the end of the process we have, with good probability, found the correct pair of MAC addresses.
6.2 Attacks
There are many attacks that target smartphones. The most common are the Battery Exhaustion Attack and the Denial of Service.
As explained below, these two attacks are extremely simple and effective: they only need common hardware, and their resource consumption on the attacker's machine is very low.
6.2.1 Denial of Service
In a denial-of-service (DoS) attack, an attacker attempts to prevent legitimate users from accessing information or services. Even if Bluetooth is theoretically quite robust, a DoS attack can prevent its use: it becomes impossible to send files, to scan for devices or to use Bluetooth services.
There are several methods to implement a Denial of Service against the Bluetooth stack. After finding out the Bluetooth MAC address, an attacker can use:
• Ping of Death Flood: as explained in the previous chapters, l2ping allows a user to ping a Bluetooth MAC address to determine whether the host is reachable. Running l2ping at a high rate consumes both outgoing and incoming bandwidth. If the target's Bluetooth stack is slow enough, it is possible to consume enough of its resources to cause a significant slowdown or an interruption of availability.
• BlueSmack Flood: this Bluetooth flooding attack is essentially a Ping of Death attack deployed with a much larger data payload (600 bytes). The 600-byte payload sometimes causes the Bluetooth stack of some devices to malfunction.
• BlueSpam Flood: BlueSpam is an attack that identifies Bluetooth-enabled devices in discoverable mode and spams selected targets with repeated vCard messages. This attack is most often used as an annoyance, but it can be classified as a DoS flood if the vCard messages are sent at an extremely high rate.
• Blueper Flood: this attack resembles BlueSpam in nature, but repeatedly floods a device with file transfers instead of vCard messages.
The Ping of Death Flood is very easy to perform: only a script that flood-pings the target device is needed. To perform this attack we created the following script, which takes the Bluetooth MAC address as input and flood-pings it. The -s option sets the size of the echo packet; it is set to 300 bytes to speed up the attack. A single pinging thread is not enough, so in our test we used twenty ping threads.
#!/bin/bash
# Flood-ping the Bluetooth MAC address given as first argument with l2ping.

mac_address=$1

echo "ping $mac_address"

while :
do
    nohup sudo l2ping -f -s 300 "$mac_address"
done
The attack was carried out against a Huawei Honor 4c smartphone running Android 6.0 with the security patch level dated 1 April 2016. During the ping of death attack the Huawei device shows no sign of it and continues to behave as usual. The problem arises when another device tries to send a file to the Huawei smartphone: the file never appears on the attacked device and the sender reports "file not sent". It is therefore impossible to transfer files between the two devices. The attack succeeds because the smartphone is busy responding to all the echo requests and fails to receive the file.
Figure 6.1.a shows a screenshot of the sender device (a Samsung smartphone) after the sending timeout. The test was also repeated with the Huawei as the sender and the Samsung as the attacked device, with the same result (figure 6.1.b).
6.2.2 Battery Exhaustion Attack
During a battery exhaustion attack the goal is to drain the battery of the target device. To cause more damage, the attack can be blended over Wi-Fi and Bluetooth, which can accelerate the battery depletion by almost 20% [18].
BlueSYN Flood is an attack that consists in simultaneously launching a BlueSmack l2ping flood and an hping3 SYN flood.
Figure 6.1: DoS attacks on the target devices [screenshots: a) Huawei DoS; b) Samsung S3 DoS]
The commands used to implement the attack against the target device are:
• hping3 --syn --faster <IP Address>: it sends TCP SYN requests over the Wi-Fi link;
• l2ping -s 600 -f <Bluetooth MAC Address>: it pings the Bluetooth stack with 600-byte packets.
PingBlender Flood is very similar to BlueSYN but uses a combination of
ping floods from both Wi-Fi and Bluetooth mediums. The commands are:
• hping3 --faster <IP Address>: it floods the Wi-Fi interface of the target (by default hping3 sends TCP packets; the -1 option sends ICMP echo requests instead);
• l2ping -f <Bluetooth MAC Address>: it flood-pings the Bluetooth stack.
Chapter 7
Conclusions
This thesis has focused on the analysis of Bluetooth signals and Wi-Fi probe requests, in particular on finding a relation between the two kinds of RSSI in order to link Bluetooth and Wi-Fi MAC addresses. We propose five algorithms that enable the pairing of MAC addresses.
In the first phase we developed a sensor network of Raspberry Pis capable of capturing all the Wi-Fi and Bluetooth signals. During this phase we also studied the behavior of the probe requests and of the Bluetooth connection parameters (RSSI, TPL, LQ, echo RTT, RX power level). This study was fundamental to decide which information could be useful in our case and how to use it.
In the second phase the sensor system was used to capture the MAC addresses of different devices. These experiments were carried out in two environments with different topological characteristics, different numbers of devices and different assumptions. We explored how the performance (the accuracy of correctly paired devices) is influenced by the number of anchors (Raspberry Pis), the anchor density and the characteristics of the environment.
The obtained results were consistent with our expectations. Our algorithms show that it is possible to link Wi-Fi and Bluetooth MAC addresses with a good degree of accuracy. The results hold both in a controlled scenario (the home experiment) and in a real scenario (the laboratory experiment), and the accuracy percentages are consistent in both cases. Moreover, we noticed that better results are achieved when the number of Raspberry Pis increases and when they cover the whole area of the environment. The results are presented using the top-k value approach: for each MAC address we select the first k candidates that can form the pair.
As regards the algorithms, we found that the best one is the conversion from RSSI to distance, which correctly pairs up to 100% of the MAC addresses with the top-5 approach. This algorithm, the conversion between Wi-Fi RSSI and Bluetooth RSSI, and the trilateration all need the relation between RSSI and distance to be computed in advance. We noticed that this limitation can be overcome by using a spy device and computing the relation between distance and RSSI on the fly. This method cannot be used for trilateration, because it requires a more complex calibration phase. In any case, its results are comparable with those of the other algorithms.
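For reference, a common way to trilaterate from estimated distances is to linearize the circle equations against a reference anchor and solve the resulting system by least squares. Subtracting the equation of anchor i from that of anchor 0 gives 2(x_i - x_0)x + 2(y_i - y_0)y = d_0^2 - d_i^2 + x_i^2 - x_0^2 + y_i^2 - y_0^2. The sketch below is a generic 2-D version of this approach, not necessarily the variant used in the thesis, and it leaves out the calibration phase mentioned above; the function name is hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Least-squares 2-D position from >= 3 anchors and estimated distances.

    anchors: list of (x, y) coordinates in meters; distances: matching list.
    The first anchor is the reference used to linearize the circle equations.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    saa = sab = sbb = sac = sbc = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        # Accumulate the normal equations (A^T A) p = A^T c.
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear")
    x = (sac * sbb - sab * sbc) / det  # Cramer's rule on the 2x2 system
    y = (saa * sbc - sab * sac) / det
    return x, y


anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_pos = (1.0, 1.0)
dists = [math.hypot(x - true_pos[0], y - true_pos[1]) for x, y in anchors]
print(trilaterate(anchors, dists))  # close to (1.0, 1.0)
```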
In the last part of the thesis we applied the results obtained to analyze and simulate an attack on a smartphone. We found that a Denial of Service (DoS) attack on the Bluetooth interface is easy to perform, and that a blended attack on both the Wi-Fi and the Bluetooth interfaces can drain the battery of the device.
Even if the top-5 accuracy of the algorithms is already satisfactory, the next step is to increase their accuracy in the top 3 and especially in the top 1. This can be done in several ways. Increasing the number of anchors is the simplest solution. Another option is to increase the precision of the captured RSSI by filtering the data, for example with the Kalman filtering proposed by [6]. To increase the Bluetooth accuracy for visible devices, a combination of the RSSI and the RX power level can be used. Accuracy could also be increased by combining the algorithms depending on the characteristics of the environment, or by analyzing the behavior of different vendors: as we saw during our research, implementation and design choices may differ between manufacturers, so analyzing different vendors can lead to a better pairing system. Future studies may also increase the number of devices used, simulating a scenario with a high density of devices. A larger number of devices would also make it possible to apply machine learning algorithms, for example an artificial neural network that finds patterns in the data to obtain a more precise pairing.
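As an example of the filtering mentioned above, the sketch below applies a generic scalar Kalman filter to a sequence of RSSI readings, modelling the RSSI of a static device as a constant plus Gaussian noise. It is not necessarily the filter of [6], and the noise variances are illustrative values that would need to be tuned for each device and environment; the function name is hypothetical.

```python
def kalman_smooth(readings, process_var=0.01, meas_var=4.0):
    """Smooth a sequence of RSSI readings (dBm) with a 1-D Kalman filter.

    process_var: how much the true RSSI is allowed to drift per step.
    meas_var: variance of the measurement noise of a single reading.
    """
    estimate = readings[0]
    error = meas_var
    smoothed = [estimate]
    for z in readings[1:]:
        error += process_var               # predict: uncertainty grows
        gain = error / (error + meas_var)  # Kalman gain, between 0 and 1
        estimate += gain * (z - estimate)  # update with the new reading
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


noisy = [-60.0, -70.0, -60.0, -70.0, -60.0, -70.0]
print([round(v, 1) for v in kalman_smooth(noisy)])
```

Because each update is a weighted average of the previous estimate and the new reading, the smoothed values stay within the range of the raw readings while the reading-to-reading jumps shrink.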
There are other topics which did not fit in the scope of this research but require further investigation. A mixed indoor localization system or a mixed crowd-density system using both Bluetooth and Wi-Fi could be developed, exploiting the MAC address pairing to increase the precision of the system. Another future work is the de-randomization of the Wi-Fi MAC address using the Bluetooth MAC address. Using our system it is possible to cross-reference the real Wi-Fi data, the randomized Wi-Fi data and the Bluetooth data (whose address never changes) to discover which Wi-Fi MAC addresses are fake. In a second step it is possible to pair the real Wi-Fi address with the Bluetooth address, pair the fake Wi-Fi addresses with the Bluetooth address, and finally understand which fake addresses correspond to the real Wi-Fi MAC address. The de-randomization can be combined with device tracking, improving the stalker attack proposed by Cunche [8]. In fact, switching between the Wi-Fi and the Bluetooth MAC addresses makes it possible to track a device regardless of which of its network interfaces is available.
This thesis also points out how easy it is to perform a DoS attack on the Bluetooth interface. This can be an incentive to study more thoroughly the behavior of the Bluetooth stack when it receives layer 2 (L2CAP) echo request packets.
Acronyms List
AP Access Point
LOS Line Of Sight
RSSI Received Signal Strength Indicator
RTT Round-Trip Time
SN Sequence number
Wi-Fi Wireless Fidelity
OUI Organizationally Unique Identifier
NIC Network Interface Controller
SSID Service Set IDentifier
BSSID Basic Service Set IDentifier
GPS Global Positioning System
MAC Media Access Control
ROC Receiver Operating Characteristic
RX Receiver
TPL Transmit Power Level
LQ Link Quality
DoS Denial of Service
GRPR Golden Receiver Power Range
BLE Bluetooth Low Energy
IEEE Institute of Electrical and Electronics Engineers
FCS Frame Check Sequence
WPAN Wireless Personal Area Network
SCO Synchronous Connection Oriented
ACL Asynchronous ConnectionLess
L2CAP Logical Link Control and Adaptation Protocol
RFCOMM Radio Frequency Communications
HCI Host Controller Interface
BER Bit Error Ratio
NTP Network Time Protocol
Bibliography
[1] Naeim Abedi, Ashish Bhaskar, and Edward Chung. Bluetooth and Wi-Fi MAC address based crowd data collection and monitoring: Benefits, challenges and enhancement. 2013.
[2] Marco V. Barbera, Alessandro Epasto, Alessandro Mei, Vasile C. Perta, and Julinda Stefa. Signals from the crowd: Uncovering social relationships through smartphone probes. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC ’13, pages 265–276, New York, NY, USA, 2013. ACM.
[3] Bluetooth SIG Proprietary. Bluetooth Core Specification, December 2016.
[4] DM Bullock, R Haseman, JS Wasson, and R Spitler. Anonymous Bluetooth probes for airport security line service time measurement: the Indianapolis pilot deployment. In 89th Annual Meeting of the Transportation Research Board, 2010.
[5] Luca Carettoni, Claudio Merloni, and Stefano Zanero. Studying Bluetooth malware propagation: The BlueBag project. IEEE Security & Privacy, 5(2), 2007.
[6] Song Chai, Renbo An, and Zhengzhong Du. An indoor positioning algorithm using Bluetooth Low Energy RSSI. 2016.
[7] Maxim Chernyshev, Craig Valli, and Michael Johnstone. Revisiting urban war nibbling: Mobile passive discovery of classic Bluetooth devices using Ubertooth One. IEEE Transactions on Information Forensics and Security, 12(7):1625–1636, July 2017.
[8] Mathieu Cunche. I know your MAC Address: Targeted tracking of individual using Wi-Fi. In International Symposium on Research in Grey-Hat Hacking - GreHack, Grenoble, France, November 2013.
[9] Christos Douligeris and Dimitrios N. Serpanos, editors. Network Security. John Wiley & Sons, Inc., June 2007.
[10] Julien Freudiger. How talkative is your mobile device?: An experimental study of Wi-Fi probe requests. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks, WiSec ’15, pages 8:1–8:6, New York, NY, USA, 2015. ACM.
[11] Simon Hay and Robert Harle. Bluetooth tracking without discoverability. In Lecture Notes in Computer Science, pages 120–137. Springer Berlin Heidelberg, 2009.
[12] AKM Mahtab Hossain and Wee-Seng Soh. A comprehensive study of Bluetooth signal parameters for localization. In Personal, Indoor and Mobile Radio Communications, 2007. PIMRC 2007. IEEE 18th International Symposium on, pages 1–5. IEEE, 2007.
[13] Zhu Jindan, Zeng Kai, Kyu-Han Kim, and Prasant Mohapatra. Improving crowd-sourced Wi-Fi localization systems using Bluetooth beacons. 9th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2012.
[14] Joonyoung Jung, Dongoh Kang, and Changseok Bae. Distance estimation of smart device using Bluetooth. In ICSNC 2013: The Eighth International Conference on Systems and Networks Communications. The Government of South Korea, 2013. Used by permission to IARIA, 2013.
[15] Jeremy Martin, Travis Mayberry, Collin Donahue, Lucas Foppe, Lamont Brown, Chadwick Riggins, Erik C. Rye, and Dane Brown. A study of MAC address randomization in mobile devices and when it fails. CoRR, abs/1703.02874, 2017.
[16] Maxim Krasnyansky and Marcel Holtmann. l2ping Linux Man Page.
[17] Zhenyu Mei, Dianhai Wang, Jun Chen, and Wei Wang. Investigation of bicycle travel time estimation using Bluetooth sensors for low sampling rates. PROMET - Traffic&Transportation, 26(5), October 2014.
[18] Benjamin R. Moyers, John P. Dunning, Randolph C. Marchany, and Joseph G. Tront. Effects of Wi-Fi and Bluetooth battery exhaustion attacks on mobile devices. In 2010 43rd Hawaii International Conference on System Sciences. IEEE, 2010.
[19] Farid Movahedi Naini, Olivier Dousse, Patrick Thiran, and Martin Vetterli. Population size estimation using a few individuals as agents. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 2499–2503. IEEE, 2011.
[20] Pierre Rouveyrol, Patrice Raveneau, and Mathieu Cunche. Large Scale Wi-Fi tracking using a Botnet of Wireless Routers. In SAT 2015 - Workshop on Surveillance & Technology, Philadelphia, United States, June 2015.
[21] Antonio J Ruiz-Ruiz, Henrik Blunck, Thor S Prentow, Allan Stisen, and Mikkel B Kjaergaard. Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning. In Pervasive Computing and Communications (PerCom), 2014 IEEE International Conference on, pages 130–138. IEEE, 2014.
[22] Lorenz Schauer, Martin Werner, and Philipp Marcus. Estimating crowd densities and pedestrian flows using Wi-Fi and Bluetooth. In Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pages 171–177. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2014.
[23] Fazli Subhan, Halabi Hasbullah, Azat Rozyyev, and Sheikh Tahir Bakhsh. Indoor positioning in Bluetooth networks using fingerprinting and lateration approach. In Information Science and Applications (ICISA), 2011 International Conference on, pages 1–9. IEEE, 2011.
[24] Mathias Versichele, Tijs Neutens, Stephanie Goudeseune, Frederik van Bossche, and Nico Van de Weghe. Mobile mapping of sporting event spectators using Bluetooth sensors: Tour of Flanders 2011. Sensors, 12(12):14196–14213, October 2012.
[25] Donald Welch and Scott Lathrop. Wireless security threat taxonomy. In Information Assurance Workshop, 2003. IEEE Systems, Man and Cybernetics Society, pages 76–83. IEEE, 2003.
[26] Jens Weppner, Paul Lukowicz, Ulf Blanke, and Gerhard Troster. Participatory Bluetooth scans serving as urban crowd probes. IEEE Sensors Journal, 14(12):4196–4206, 2014.