
POLITECNICO DI MILANO

School of Industrial and Information Engineering

MSc in Computer Science and Engineering

Pairing Wi-Fi and Bluetooth

MAC addresses through passive packets

capture

ANTLab

Advanced Network Technologies LABoratory

Supervisor: Prof. Alessandro Enrico Cesare Redondi

Master thesis by:

Edoardo Longo, ID 841677

Academic year 2016-2017

Abstract

Nowadays the majority of smart devices (e.g. smartphones, tablets, personal computers) use wireless communication, especially Bluetooth and Wi-Fi. Each of these network interfaces is uniquely identified by a 48-bit identifier, called the Media Access Control (MAC) address. Since every device is identified by its own Bluetooth and Wi-Fi MAC addresses, the analysis of MAC addresses provides useful statistical data such as crowd density, travel time estimation and indoor positioning. The two addresses are found in different broadcast packets: the Wi-Fi MAC address is contained in probe requests, while the Bluetooth one is visible during an inquiry scan or when establishing a connection.

The goal of this thesis is to pair a Wi-Fi MAC address with a Bluetooth MAC address and, in particular, to understand how Wi-Fi and Bluetooth signals are related. We propose and evaluate a system composed of a sensor network of capturing devices and of algorithms capable of pairing the Wi-Fi and the Bluetooth MAC addresses. We first study the conditions that influence the measurement accuracy, then perform two experiments, one in a controlled scenario and one in a real scenario. The results show that the algorithms are accurate enough to allow the pairing. We also analyze a possible Bluetooth attack scenario based on our system.

Contents

1 Introduction
2 State of the Art
  2.1 Localization
  2.2 Privacy
  2.3 Attacks
3 Technical Overview and System Architecture
  3.1 Wi-Fi
    3.1.1 Passive Scanning
    3.1.2 Active Scanning
    3.1.3 Probe Request Structure
  3.2 Bluetooth
    3.2.1 Bluetooth Connections
    3.2.2 Discover a Bluetooth device
    3.2.3 BlueZ
    3.2.4 Inquiry with RSSI and hcitool RSSI
    3.2.5 l2ping
  3.3 MAC Address
  3.4 System Architecture
4 Experiments and Algorithms
  4.1 Preliminary experiments
    4.1.1 Results
    4.1.2 Home experiment parameters
  4.2 Home experiment
  4.3 Algorithms
    4.3.1 Normalization
    4.3.2 RSSI conversion from Bluetooth to Wi-Fi
    4.3.3 RSSI conversion from Bluetooth and Wi-Fi to distance
    4.3.4 Trilateration
    4.3.5 Fingerprint
  4.4 Results
    4.4.1 Top-k value
    4.4.2 Adding anchors
    4.4.3 Receiver Operating Characteristic
5 Real Scenario Experiment
  5.1 The environment
  5.2 The devices
  5.3 Execution
  5.4 Results
    5.4.1 Top-k values
    5.4.2 Receiver Operating Characteristic
6 Blended attack scenario
  6.1 Attack scenario
    6.1.1 Discover the Wi-Fi and infer the Bluetooth MAC address
  6.2 Attacks
    6.2.1 Denial of Service
    6.2.2 Battery Exhaustion Attack
7 Conclusions
Bibliography

Chapter 1

Introduction

The use of smartphones, tablets, laptops and other smart devices is spreading more and more in everyday life. People are always connected and almost everything can be done remotely through smartphones. Connectivity is what makes these operations possible: it is used to access the internet, to share files, to use mobile applications, to make phone calls, to play music, to use internet tethering and for other useful features.

In order to carry out these operations, nowadays the majority of smartphones, laptops and portable electronic devices use wireless communication, especially Bluetooth and Wi-Fi. Bluetooth technology is useful for transferring information between two or more devices that are near each other when speed is not a concern. It is best suited to low-bandwidth applications such as transferring sound data with telephones (e.g. with a Bluetooth headset), transferring files with hand-held computers, or connecting keyboards and mice. Wi-Fi is suited for operating on full-scale networks. It enables a faster connection, a longer range from the base station and good wireless security. For these reasons Wi-Fi technology powers most home networks, many business local area networks and public hotspot networks.

Every network adapter (Wi-Fi, Bluetooth, but also Ethernet or ZigBee) is uniquely identified by a 48-bit identifier, called the Media Access Control (MAC) address. It is embedded into the network hardware during the manufacturing process, or stored in firmware, and is designed not to be modified. Hence, every smart device has a pair of MAC addresses, one for the Wi-Fi interface and one for Bluetooth, each of which uniquely identifies the device.

The goal of this thesis is to pair a Wi-Fi MAC address with a Bluetooth MAC address and, in particular, to understand how Wi-Fi and Bluetooth signals are related. Indeed, a Bluetooth and a Wi-Fi MAC address coming from the same device cannot be immediately related to each other because the two addresses are different.

A sensor network of capturing devices was implemented for this purpose. It is composed of several Raspberry Pis (single-board computers) that capture Wi-Fi and Bluetooth signals, which are later analyzed by different off-line algorithms. The results of the algorithms show that it is possible to link the Wi-Fi and the Bluetooth MAC addresses.

In order to link the MAC addresses we use Bluetooth connection parameters and Wi-Fi probe requests. Bluetooth allows two or more devices to communicate with each other. To establish a connection between the devices, the target MAC address must be known. The MAC address is found using an inquiry scan, which reveals various device details including its MAC address, its name and the services it supports. In addition to this information, the Bluetooth stack exposes some connection parameters that are useful for the scope of this thesis and for localizing a device (e.g. RSSI, RX power level, Transmit Power Level (TPL), Link Quality).

Wi-Fi interfaces need to be connected to a network in order to provide connectivity. Every minute, smartphones search for Wi-Fi networks to connect to [10]. This operation generates a stream of probe requests, special network packets that carry useful information such as the device MAC address, the Access Point (AP) MAC address and the list of past SSIDs; the receiver can also measure their Received Signal Strength. These packets are sent in broadcast and can be easily captured by another device, in our case the network of Raspberry Pis.

The privacy issue is crucial because the data described above reveal a lot of information about the device owner: from the device name it is possible to discover the device model or the owner’s name; from the RSSI, the location can be inferred; the list of past SSIDs shows the names of the Wi-Fi networks to which the device owner was previously connected, and from this information a social analysis can be carried out [2].

Collecting data by capturing traffic from wireless technologies that communicate using MAC addresses has recently been applied [1]. The problem is that the Bluetooth and the Wi-Fi MAC addresses are completely unrelated; it is therefore difficult to carry out a cross-study between the two technologies and, in particular, to treat the data as if they came from the same source.

To cover this gap, the thesis aims to link the Bluetooth and the Wi-Fi MAC addresses using Wi-Fi probes and Bluetooth connection parameters. The possibility of pairing two different MAC addresses has several implications. It can lead to a more accurate indoor localization system, because the combined use of two technologies can increase the precision of the position estimate. It can also be a weapon for malicious attackers, who can mount blended attacks on both interfaces to cause denial of service (DoS), drain the battery or exploit other vulnerabilities. The pairing process can also enable a sort of de-randomization of the Wi-Fi MAC address, where randomization means replacing the real address with a fake one. If we know that a random Wi-Fi MAC address is related to a true Bluetooth MAC address, we can infer the real Wi-Fi address and break the MAC address randomization performed by some vendors.

In this thesis, in order to pair the two MAC addresses, a wireless sensor network and different algorithms are implemented. The sensor network is composed of up to 6 Raspberry Pis that are in charge of capturing the Bluetooth and the Wi-Fi signals, in particular the Received Signal Strength Indicator (RSSI). We create five different algorithms to link the MAC addresses. The purpose of the algorithms is to link a Bluetooth signal coming from a device to a Wi-Fi signal coming from the same device. For this purpose the system uses two datasets (one for Bluetooth and one for Wi-Fi) of device RSSI values captured by our sensor network; notice that these two sets are completely disjoint.

The results obtained show that the linking algorithms introduced in this thesis have a high degree of accuracy in both the scenarios we tested.

The structure of this thesis is the following. In Chapter 2 we discuss a number of works that are related to ours and that inspired this study. In Chapter 3 we explain the technical details of Wi-Fi and Bluetooth, together with the model of the implemented system. In Chapter 4 and Chapter 5 the experiments are presented. In Chapter 4 we first show the preliminary experiment and the study of the Wi-Fi and Bluetooth parameters. Then we explain the home experiment and the details of the implemented algorithms along with the obtained results. In Chapter 5 we explain the experiment performed in ANTLab and the obtained results. Chapter 6 presents a possible and realistic attack scenario based on the acquired knowledge. In Chapter 7 we conclude by summarizing the purposes and the final evaluations of this thesis. Some suggestions for future work are also proposed.

Chapter 2

State of the Art

This chapter describes the related work on Wi-Fi and Bluetooth. To date, no cross-analysis between Wi-Fi and Bluetooth MAC addresses is present in the literature, but many studies about the two technologies have been carried out.

There are three main thematic areas:

• localization;

• privacy;

• attacks.

2.1 Localization

Tracking people through Bluetooth or Wi-Fi signals has been discussed previously in the literature. These signals are usually used for indoor localization, because inside buildings the Global Positioning System (GPS) is not suitable due to the presence of roofs and walls.

Density estimation in crowded mass events has been studied using Bluetooth

scans or Wi-Fi from collaborating smartphones inside the crowd. Zhu et al.

[13] developed a crowd-sourcing localization system that uses both Wi-Fi

scene analysis and Bluetooth beacons. The system uses Wi-Fi fingerprint

(the RSSI). Bluetooth beacons are only used to share the location of a device

and populate a signal map.

An interesting study was performed in a German airport. Using the ground truth provided by the security check process, Schauer et al. [22] discuss the quality and the feasibility of pedestrian flow estimation for both Wi-Fi and Bluetooth. They used inquiry scans and probe collection to capture Bluetooth and Wi-Fi MAC addresses, respectively. Their results show that Wi-Fi is a good estimator of the pedestrian flow, while Bluetooth is not adequate for a reliable flow estimation system. The inaccuracy of Bluetooth is probably due to the use of the inquiry scan, a method that can locate visible devices only.

Another confirmation that Wi-Fi allows good indoor localization comes from Ruiz et al. [21]. They localize devices in a hospital using the Access Points to capture the traffic. Using the trilateration algorithm, their mean error is 15 meters.
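To make the idea of trilateration concrete, the following is a minimal two-dimensional sketch, not the method of Ruiz et al. It assumes exactly three non-collinear anchors with known coordinates and noise-free distance estimates; the function name `trilaterate` and the example coordinates are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from three anchor positions and three ranges.

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear, position is not unique")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical layout: three sensors and a device located at (1, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
target = (1.0, 2.0)
distances = [math.dist(a, target) for a in anchors]
estimate = trilaterate(anchors, distances)
```

With real RSSI-derived distances the three circles rarely intersect in one point, so practical systems typically use more anchors and a least-squares fit.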

As we can see, the localization using Wi-Fi is possible and already stud-

ied. Bluetooth needs a separate discussion.

Naini et al. [19] conducted an experiment where ten attendees of an open-air music festival acted as Bluetooth scanners. The selected attendees were equipped with a mobile phone programmed to scan for Bluetooth devices with visibility turned on. By comparing their estimate with ground truth information collected at the entrances of the festival, Naini et al. show that the total population can be estimated with a surprisingly low error (1.26% in this experiment).

Similar experiments were performed by Weppner [26] and by Bullock [4], confirming the possibility of using Bluetooth as a crowd indicator.

More interesting for our research is the discussion of Bluetooth signal parameters with respect to localization by Hossain et al. [12]. According to their analysis and experimental results, RSSI and Transmit Power Level turn out to be poor candidates for localization. On the other hand, the RX power level correlates nicely with distance, which makes it the most desirable Bluetooth signal parameter for location systems. In our opinion, they discard RSSI because of a methodological error: they use a Class 1 dongle to read the RSSI of a device within 18 meters. As we will see below, Class 1 devices can reach up to 100 meters, so the received power always stays inside the Golden Receiver Power Range (GRPR), which yields an RSSI of 0.

Subhan et al. [23] confirmed that the RX power level can be related to distance. They demonstrated that the conversion between RX power level and RSSI is possible if the upper and lower bounds of the GRPR are known. Using trilateration and fingerprinting combined with a gradient filter in the measurement stage, they reduced the average error to 2.67 meters. A similar result was obtained by Chai [6], who uses pre-processed BLE RSSI, Kalman filtering and a triangulation algorithm to calculate the location of a mobile device. Experimental results show that his algorithm achieves a positioning accuracy of 0.2∼0.5 m.

From this research, it is evident that distance estimation is not feasible with raw RSSI values, but is possible with averaged RSSI data [14].
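As an illustration of why averaging helps, the sketch below smooths a series of RSSI samples with a moving average and converts the result to a distance with the log-distance path loss model. This is a generic textbook model, not the conversion used by the cited works; the reference power at 1 m (`rssi_at_1m`) and the path loss exponent are placeholder values that would have to be calibrated for a real environment.

```python
from statistics import mean


def smooth_rssi(samples, window=5):
    """Moving average over the last `window` samples (shorter at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        smoothed.append(mean(samples[start:i + 1]))
    return smoothed


def rssi_to_distance(rssi_dbm, rssi_at_1m=-50.0, path_loss_exponent=2.5):
    """Log-distance model: rssi = rssi_at_1m - 10 * n * log10(d), solved for d.

    Both default parameters are assumed values for illustration only.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))


# Hypothetical noisy readings around -62 dBm.
raw = [-60.0, -70.0, -55.0, -66.0, -59.0]
averaged = smooth_rssi(raw, window=5)
distance_m = rssi_to_distance(averaged[-1])
```

The raw samples above span 15 dB, which the model would translate into very different distances, while the averaged value changes much more slowly.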

As we can see from the previous research, Wi-Fi is a robust technology for localization. Bluetooth research has produced inconsistent results, but the majority confirms that it can be used for indoor localization purposes.

2.2 Privacy

Bluetooth and Wi-Fi present not only benefits like localization, but also critical challenges like privacy. Collecting data by capturing wireless traffic relies on MAC addresses, which are unique identifiers for each technology and can be associated with a specific person. The MAC address is easily visible in Wi-Fi probes and in Bluetooth signals because it is sent without encryption [25] and in broadcast. Some mobile devices send probe requests as often as 55 times per hour, thus revealing their unique MAC address at high frequency [10].

These problems make it possible to use MAC address scanning to extract significant information about the spatiotemporal dynamics of people’s movements [1]. A mobile phone also broadcasts the list of Wi-Fi networks saved on the device (their SSIDs). This list can be used to classify people, to extract social connections among the smartphone owners and to uncover the underlying social network of the participants in a venue. It is also possible to understand the international nature of an event and the density of foreign participants, or to analyze the travel frequency of a person.

Another interesting topic is the distribution of the smartphone vendors

across events and the analysis of the expected socioeconomic background

of the participants. Starting from this assumption, Barbera et al. [2] de-

veloped an automated methodology to derive the underlying relationship

graphs between the users in each scenario. They also performed language

detection on the broadcast SSIDs and exploited the vendor ID to show how

the probes can directly reflect the sociological aspects of the people involved

in each scenario, including nationality, age, and socioeconomic status.

This information can be exploited using WiGLE (https://wigle.net/), which makes it possible to discover where a Wi-Fi network is located starting from its name. Using the MAC address and the probe requests it is also possible to discover the name of a person or the vendor of a device.

Bluetooth is also affected by privacy issues. During an inquiry scan it is

possible to discover personal information like device name (that sometimes

corresponds to the owner’s name) and device model.

Mei et al. developed a travel time estimation method based on Bluetooth MAC addresses [17]. This allows a possible attacker to understand the movements of a target. Tracking people’s movements is also possible using Wi-Fi: Cunche [8] presents methods that, given an individual of interest, make it possible to identify the MAC address of his or her Wi-Fi device.

These privacy issues are mitigated by Wi-Fi MAC address randomization. In order to impede tracking and alleviate privacy issues, some vendors implement MAC address randomization in their devices. Under some conditions (e.g. when the screen is turned off) the broadcast MAC address is replaced with a fake address. This technique is adopted only by a few vendors (e.g. Apple, Motorola and a few other Android manufacturers). Nevertheless, Martin et al. [15] showed a method that can be used to track 100% of devices using randomization, regardless of manufacturer, by taking advantage of a previously unknown flaw in the way existing wireless chipsets handle low-level control frames.
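A simple first check when looking at captured addresses is sketched below. It relies on the general IEEE 802 address format rather than on anything specific to this thesis: the first three bytes form the vendor prefix (OUI), and the second least significant bit of the first byte marks an address as locally administered, which is how randomized addresses are usually flagged. The helper names are hypothetical, and a set bit is only a hint that an address may be randomized.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or with dashes) to 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def oui(mac: str) -> str:
    """Return the vendor prefix (first three bytes) in lowercase."""
    return ":".join(f"{b:02x}" for b in parse_mac(mac)[:3])


def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first byte) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


# The first address is taken from the probe request example in this chapter,
# the second one is a made-up address with the locally administered bit set.
for address in ("14:10:9F:d5:04:01", "da:a1:19:00:00:01"):
    print(address, oui(address), is_locally_administered(address))
```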

As regards Bluetooth, [9] suggests that Bluetooth address randomization is unlikely to be implemented, as it would adversely affect existing implementations. The Bluetooth defense mechanism is the non-visible mode: a device can have the Bluetooth interface turned on without being visible, which allows it to remain hidden from an inquiry scan. Recent studies [7] demonstrated that with Ubertooth One, a low-cost open source Bluetooth development platform, it is possible to discover hidden devices, finding up to ten times as many devices as a normal inquiry scan.

2.3 Attacks

The issues discussed above allow a malicious attacker to exploit the presented vulnerabilities in different ways. The most trivial attack is the stalker attack. It consists in following a person at a reasonable distance with a monitoring device to learn his or her unique MAC address [8]. In addition, Wi-Fi routers can be easily turned into Wi-Fi tracking devices through software modification [20], and this can be used to follow a person’s path.

A common attack is the Denial of Service on battery-powered mobile devices. The attack can be performed on Wi-Fi, on Bluetooth or with a blended approach. Moyers et al. [18] demonstrate that these attacks can accelerate battery depletion by as much as 18.5%. For Wi-Fi, ping flood, ACK flood and SYN flood are used. For Bluetooth, l2ping flood, bluesmack flood, bluespam flood and blueper flood are used. The two types of attacks can be blended with each other.

Several security issues have affected the various implementations of the Bluetooth standard stack since late 2003. The most common are [5]:

• BlueSnarf which allows an attacker to access the vulnerable device’s

phone book and calendar without authentication. A recently upgraded

version of this attack gives the attacker full read-write access.

• Bluejacking which allows an attacker to access the phone book and also the files on the device using the principle of hijacking.

• BlueBug favours the access to the cell phone’s set of commands,

which lets an aggressor use the phone’s services, including placing

outgoing calls, sending, receiving, or deleting SMSs, diverting calls,

and so on.

• BlueBump takes advantage of a weakness in the handling of Bluetooth link keys, giving devices that are no longer authorized the ability to access services as if they were still paired to the target device. It can lead to data theft or to the abuse of mobile Internet connectivity services.

Chapter 3

Technical Overview and

System Architecture

3.1 Wi-Fi

Wi-Fi is a technology for wireless local area networking with devices based on the IEEE 802.11 standards. Wi-Fi operates at 2.4 GHz (802.11b/g) over 11 channels in the USA and over 13 channels in Europe, three of which do not overlap (1, 6, 11). Figure 3.1 shows how the channels are arranged. Adjacent channels are only 5 MHz apart, but the spread-spectrum signal of each channel occupies about 22 MHz centred on the channel frequency, so neighbouring channels overlap. Using non-overlapping channels reduces collisions between Wi-Fi packets.

Figure 3.1: Graphical representation of Wireless LAN (Wi-Fi) channels in 2.4 GHz band
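The channel arrangement can be reproduced with a short calculation. The sketch below uses the standard 2.4 GHz channel plan, in which channel n (1 to 13) is centred at 2407 + 5n MHz, and treats two channels as non-overlapping when their centres are at least 25 MHz (five channel numbers) apart, as for channels 1, 6 and 11. The function names and the 25 MHz threshold parameter are choices made for this example.

```python
def channel_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel (channels 1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 are handled in this sketch")
    return 2407 + 5 * channel


def overlaps(a: int, b: int, min_separation_mhz: int = 25) -> bool:
    """True if the two channels are closer than the assumed separation."""
    gap = abs(channel_center_mhz(a) - channel_center_mhz(b))
    return gap < min_separation_mhz


for ch in (1, 6, 11):
    print(f"channel {ch}: {channel_center_mhz(ch)} MHz")

print(overlaps(1, 6))   # channels 1 and 6 are 25 MHz apart
print(overlaps(1, 3))   # channels 1 and 3 are only 10 MHz apart
```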


More recently, Wi-Fi also supports the 5 GHz band (802.11n), with 21 channels that offer higher capacity but a shorter range compared to 2.4 GHz. Modern devices can switch between 2.4 GHz and 5 GHz, using a technique called band steering, depending on traffic demand.

When a smartphone or a laptop wants to access the internet through Wi-Fi, it needs to connect to an Access Point (AP). For this reason, every device with its Wi-Fi interface turned on regularly broadcasts Wi-Fi probe requests in order to advertise its presence and actively discover Wi-Fi access points in proximity. This mechanism is called active scanning and allows devices to build a list of nearby access points.

IEEE 802.11 defines another mechanism to discover Wi-Fi APs: a passive mechanism, in which APs periodically advertise their presence to mobile devices using beacons.

3.1.1 Passive Scanning

When a device performs passive scanning, it listens on the 11 Wi-Fi channels, hopping periodically from one to another, and passively detects nearby APs. When a beacon is captured, the mobile device responds with a Wi-Fi association frame.

The beacons contain network configuration parameters, such as the Service Set Identifier (SSID), the type of encryption and the supported data rates. The beacon interval is not fixed: most APs send a beacon every 100 ms, but the value depends on the hardware specification.

The main disadvantage of passive scanning is that the device has to listen on all eleven channels. This operation is time consuming and does not ensure that all the beacons are captured.
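A rough worked example shows why the sweep is slow. It assumes, for illustration only, that the device dwells on each channel for one full beacon interval so that it has a chance to hear every AP on that channel, and that channel switching time is negligible; real drivers may use different dwell times.

```python
def min_passive_scan_time_ms(num_channels: int, beacon_interval_ms: float) -> float:
    """Lower bound for one full sweep, dwelling one beacon interval per channel."""
    return num_channels * beacon_interval_ms


# 11 channels and the typical 100 ms beacon interval mentioned above.
sweep_ms = min_passive_scan_time_ms(11, 100)
print(f"one passive sweep takes at least {sweep_ms / 1000:.1f} s")
```

Even under these optimistic assumptions a single sweep takes more than one second, during which the radio must stay awake on every channel.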

3.1.2 Active Scanning

During active scanning, the mobile device stimulates its nearby access points by sending probe requests. The probe packet includes the device unique identifier, the standards supported by the device, the probe sequence number (SN) and other fields. The probe can be directed to all APs (broadcast) or to a specific access point by indicating its SSID.

Active scanning is particularly helpful in scenarios where a mobile device roams across APs. It is also faster and less energy consuming than passive scanning because fewer packets are lost.

In addition, active scanning is the only way to connect to a hidden network, by indicating the SSID of the access point.

3.1.3 Probe Request Structure

Figure 3.2 represents the packet structure of a probe request. The interesting fields are:

• Frame Ctrl: the type of the frame, usually 0x00;

• Address 1: the receiver MAC address, usually broadcast (FF:FF:FF:FF:FF:FF);

• Address 2: the sender MAC address, the device MAC address;

• Address 3: the Access Point MAC address (BSSID);

• Sequence Control: the sequence number (SN) that identifies a single probe request;

• Frame Body: the list of SSIDs known to the mobile device;

• FCS: a redundant check code.

Figure 3.2: Probe request packet structure

The frame body contains a list of the Wi-Fi APs to which the device was previously connected. This allows a faster connection between device and access point; on the other hand, it helps in understanding the origin of the device and the places its owner has visited.

Table 3.1 shows a realistic example of probe requests. They follow the IEEE 802.11 standard, so they are not encrypted. In the first row, a device with MAC address 14:10:9F:d5:04:01 is broadcasting a probe request with SSID polimi-protected and sequence number equal to 12.

Table 3.1: Example of Wi-Fi probe requests

Frame Ctrl  Duration  Destination        Source             BSSID              SN   SSID
...         ...       ff:ff:ff:ff:ff:ff  14:10:9F:d5:04:01  ff:ff:ff:ff:ff:ff  12   polimi-protected
...         ...       ff:ff:ff:ff:ff:ff  88:30:8a:49:db:0d  ff:ff:ff:ff:ff:ff  245  null
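The fields listed above can be extracted from the raw bytes of a frame with a few lines of code. The following is a minimal sketch, not the capture code used in this work: it assumes the buffer starts at the 802.11 MAC header (no radiotap or other capture header in front), that the FCS has already been stripped, and that the SSID is carried in the tagged parameter with element ID 0, as defined by the standard. The function names and the hand-built example frame are hypothetical; the example reproduces the first row of Table 3.1.

```python
import struct

PROBE_REQUEST_SUBTYPE = 4  # management frame subtype of a probe request
SSID_ELEMENT_ID = 0        # tagged parameter that carries the SSID


def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_probe_request(frame: bytes) -> dict:
    """Parse the 24-byte management header and the SSID element."""
    if len(frame) < 24:
        raise ValueError("frame shorter than the 24-byte management header")
    frame_ctrl, _duration = struct.unpack_from("<HH", frame, 0)
    frame_type = (frame_ctrl >> 2) & 0x3   # 0 means management frame
    subtype = (frame_ctrl >> 4) & 0xF
    if frame_type != 0 or subtype != PROBE_REQUEST_SUBTYPE:
        raise ValueError("not a probe request")
    (seq_ctrl,) = struct.unpack_from("<H", frame, 22)

    # Walk the tagged parameters (id, length, value) looking for the SSID.
    ssid = None
    offset = 24
    while offset + 2 <= len(frame):
        element_id, length = frame[offset], frame[offset + 1]
        if element_id == SSID_ELEMENT_ID:
            value = frame[offset + 2:offset + 2 + length]
            ssid = value.decode("utf-8", errors="replace")
            break
        offset += 2 + length

    return {
        "destination": format_mac(frame[4:10]),
        "source": format_mac(frame[10:16]),
        "bssid": format_mac(frame[16:22]),
        "sequence_number": seq_ctrl >> 4,  # the low 4 bits are the fragment number
        "ssid": ssid,
    }


# Hand-built frame matching the first row of Table 3.1.
ssid_bytes = b"polimi-protected"
example_frame = (
    struct.pack("<HH", 0x0040, 0)            # frame control, duration
    + bytes.fromhex("ffffffffffff")          # Address 1: broadcast
    + bytes.fromhex("14109fd50401")          # Address 2: sender
    + bytes.fromhex("ffffffffffff")          # Address 3: BSSID
    + struct.pack("<H", 12 << 4)             # sequence number 12, fragment 0
    + bytes([SSID_ELEMENT_ID, len(ssid_bytes)]) + ssid_bytes
)
parsed = parse_probe_request(example_frame)
```

A broadcast probe with an empty SSID element (the "null" row of the table) is returned with `ssid` equal to the empty string.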


Probe request number

The number of probe requests sent by a mobile phone varies considerably among devices. Some mobile devices send probe requests on average 55 times per hour, but they may broadcast up to about 2000 probes per hour [10].

The frequency of the probe request depends on:

• Wi-Fi chipset: the vendor can set up different parameters depending

on the company policies;

• Device operating system: the OS version and the device settings can affect the number of probes. For example, a fast connection setting can cause a high number of probes, while an energy saving mode can reduce them;

• Frequency of screen unlocking: unlocking the screen stimulates probe activity, which allows a faster device connection;

• Number of applications running on the device: the more applications and programs use Wi-Fi, the more the device is forced to send probe requests to keep the services connected.

3.2 Bluetooth

Bluetooth (IEEE 802.15.1) is a wireless technology standard for exchanging data over short distances between fixed and mobile devices, and for building personal area networks (PANs). Bluetooth originated in 1994, when Jaap Haartsen, an electrical engineer employed at Ericsson, developed it in cooperation with Sven Mattisson. The name is derived from Blåtand, the Danish name of Harald Bluetooth, the tenth-century king of Denmark and Norway.

The purpose of Bluetooth is to replace cables with short-range and cheap

radio connection that favours communication between mobile devices and

peripherals.

Bluetooth is open and royalty-free and, thanks to this, it is widely used

for short-range wireless communication in WPAN (Wireless Personal Area

Network) situations. It operates in the universally unlicensed (but not un-

regulated) Industrial, Scientific and Medical (ISM) band at 2.4 GHz. In the

available frequency band, 79 sub-frequencies are used to transmit data, hop-

ping from a frequency to another 1600 times per second in a pseudo random

way.
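The numbers above translate into simple arithmetic. The sketch below assumes the classic Bluetooth channel plan, in which channel k (0 to 78) is centred at 2402 + k MHz; the constant and function names are chosen for this example, and the real pseudo-random hop selection, which depends on the master's address and clock, is not modelled.

```python
HOPS_PER_SECOND = 1600
NUM_CHANNELS = 79


def bt_channel_mhz(k: int) -> int:
    """Centre frequency of classic Bluetooth channel k (0 to 78)."""
    if not 0 <= k < NUM_CHANNELS:
        raise ValueError("channel index must be between 0 and 78")
    return 2402 + k


# Time spent on one frequency before hopping to the next one.
slot_duration_us = 1_000_000 / HOPS_PER_SECOND

print(f"first channel: {bt_channel_mhz(0)} MHz")
print(f"last channel: {bt_channel_mhz(NUM_CHANNELS - 1)} MHz")
print(f"slot duration: {slot_duration_us} microseconds")
```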


The communication range of a Bluetooth device and its maximum transmission power are determined by its Class. As shown in Table 3.2, Class 1 radios have the longest transmission range (100 meters), whereas Class 3 radios have a range of up to 1 meter. In this research, the devices used mostly belong to Class 2 (e.g. smartphones, tablets, laptops), whose internal chipsets have a range of about 10 meters.

Table 3.2: Bluetooth power classes

Class    Max Transmission Power  Range
Class 1  100 mW (20 dBm)         100 m
Class 2  2.5 mW (4 dBm)          10 m
Class 3  1 mW (0 dBm)            1 m

The Bluetooth architecture is based on a master/slave model. A single master device can be connected with up to seven different slave devices to form a network, called a piconet. The master shares its clock with the slaves; it also coordinates and manages the connections in the piconet and sends data to, or requests data from, the slaves.
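The two power units used in Table 3.2 are related by the standard definition of dBm, the power in decibels relative to 1 mW. The short sketch below, with function names chosen for this example, reproduces the values in the table.

```python
import math


def mw_to_dbm(power_mw: float) -> float:
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(power_mw)


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power in dBm back to milliwatts."""
    return 10 ** (power_dbm / 10)


for name, power_mw in (("Class 1", 100), ("Class 2", 2.5), ("Class 3", 1)):
    print(f"{name}: {power_mw} mW = {mw_to_dbm(power_mw):.2f} dBm")
```

The Class 2 value is about 3.98 dBm, which the table rounds to 4 dBm.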

3.2.1 Bluetooth Connections

Bluetooth connections can be of two types: Synchronous Connection Oriented (SCO) or Asynchronous ConnectionLess (ACL). SCO is a real-time link, used mainly for voice communication (or data and voice combined). ACL is used exclusively to transport data (e.g. audio/video) and it is the most used type of connection both in daily use and in this research. ACL is the base connection established between a master and a slave: each device receives a default ACL logical transport when it joins the piconet. The connection must be explicitly set up and accepted between two devices before packets can be transferred [11].

Directly above the ACL is the Logical Link Control and Adaptation Protocol (L2CAP) layer. This is a packet-based layer whose primary tasks are: transporting data for higher layer protocols; providing packet sequencing, segmentation and reassembly; providing one-way transmission management of multicast data to a group of other Bluetooth devices; and providing Quality of Service (QoS) for higher layers. Once established, an L2CAP connection remains open until it is explicitly closed or the Link Supervision Time Out (LSTO) expires.

L2CAP serves as the transport protocol for RFCOMM, so every RFCOMM connection is encapsulated within an L2CAP connection.

The RFCOMM (Radio Frequency Communications) layer is the reliable stream-based protocol (similar to TCP) used by most Bluetooth applications. It is used directly by many telephony-related profiles as a carrier for AT commands, and it is the type of connection most people mean by "Bluetooth connection". RFCOMM emulates RS-232 serial ports and is necessary for the OBEX transport layer, because OBEX needs serial transport.

RFCOMM is bound to OBEX (OBject EXchange). OBEX is the session-level communication protocol that facilitates data exchange (e.g. object push profile (OPP), file transfer profile (FTP), vCard, basic imaging, basic printing, phonebook access, etc.).

Figure 3.3 presents the Bluetooth stack architecture. From bottom to top we find ACL and SCO, the Host Controller Interface, L2CAP, RFCOMM and, at the top, OBEX.

Figure 3.3: Bluetooth protocol layer


3.2.2 Discover a Bluetooth device

In order to start a Bluetooth connection between devices, the target device
must be turned on and visible. A device can also be turned on but not
visible; in this case the pairing process is possible only if the target
address is known.

To discover visible devices, an inquiry mode has been defined. Basically,
a device that wants to set up a Bluetooth connection with another one
sends out inquiry packets, and the visible devices listening for them
can answer.

A single Bluetooth inquiry scan can last up to 10.24 seconds [1] and,
at the end of the scan, zero or more devices may have been discovered.

The result of the inquiry scan, called Inquiry with RSSI, contains the following information:

• Device name: the name that the owner assigns to the device;

• Device profile: the type of the device (e.g.: phone, laptop, Bluetooth

headset, etc.);

• Supported services: the Bluetooth services provided by the device
(e.g. Advanced Audio Distribution Profile (A2DP), Audio Video Remote
Control Profile (AVRCP), Basic Imaging Profile (BIP));

• Unique MAC address: a physical address assigned uniquely to each

device;

• Timestamp: the date and the time of the discovery;

• Received Signal Strength Indicator (RSSI): the measurement of

the power present in a received radio signal.

3.2.3 BlueZ

In operating systems based on the Linux kernel, the Bluetooth stack is
managed by BlueZ. The most useful BlueZ command is hcitool. Hcitool
(Host Controller Interface Tool) is used to configure Bluetooth connections
and to send special commands to Bluetooth devices. Its main functionalities
are to discover (inquire) remote devices and to add and manage devices
on the piconet; to configure controller properties; and to set up, manage
and release logical transports and links. In particular, hcitool provides
access to the RSSI, the LQ and the TPL of a connected device, three
fundamental connection status parameters.

To obtain these values, an active connection between the master device
and the slave is needed.


Received Signal Strength Indicator (RSSI): According to the Bluetooth
Core Specification, the RSSI is an 8-bit signed integer that indicates
the difference between the received power level and the Golden Receive
Power Range (GRPR).

The command hcitool rssi <bdaddr> returns a value between +15 dBm
and -35 dBm.

A positive RSSI value indicates how many dB the received power is above
the upper limit of the GRPR; a negative value indicates how many dB it is
below the lower limit. The value zero indicates that the received power is
inside the Golden Receive Power Range [3].

The Golden Receive Power Range is the zone in which the raw bit error
rate is better than 0.1 % (BER < 10^-3).

Transmit Power Level (TPL): TPL is an 8-bit signed integer which

specifies the Bluetooth module’s maximum transmit power level (in dBm)

[12]. Every Bluetooth class has a fixed value, which does not change during
a Bluetooth connection. For example, Class 1 devices have a maximum power
of +20 dBm, Class 2 devices +4 dBm and Class 3 devices 0 dBm.

Link Quality (LQ): Link Quality is a value from 0 to 255, which repre-

sents the quality of the link between two devices. The higher the value, the

better the link quality is. For most Bluetooth modules, it is derived from

the average bit error rate (BER) seen at the receiver and it is constantly

updated as packets are received.

3.2.4 Inquiry with RSSI and hcitool RSSI

As explained in section 3.2.3, using the BlueZ hcitool we can obtain two
different types of RSSI values. The first is the RSSI obtained from
the inquiry scan (inquiry with RSSI), which identifies the power level of
the target Bluetooth device as seen by the receiver; the second is the RSSI
obtained directly from a connected device.

To be clearer, from now on, the value obtained from the inquiry scan will be

called RX. On the other hand, the value obtained from a connected device

will be simply called RSSI.

These two values are linearly related; indeed, they represent the same
quantity. The RX is the actual received power level, whereas the RSSI is
the power level relative to the GRPR. The RSSI can be converted to the RX
power level if the upper and lower threshold values of the GRPR are known.
The relation is further analyzed in section 4.1.1.
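As a minimal sketch of this conversion, the Python functions below map an absolute RX power level to the GRPR-relative RSSI and back. The threshold values of -60 dBm and -40 dBm used in the example are purely illustrative assumptions, not measured values; the real thresholds depend on the receiver chipset.

```python
def rx_to_rssi(rx_dbm, grpr_lower_dbm, grpr_upper_dbm):
    """Map an absolute RX power level (dBm) to the GRPR-relative RSSI (dB)."""
    if rx_dbm > grpr_upper_dbm:
        return rx_dbm - grpr_upper_dbm   # positive: dB above the upper limit
    if rx_dbm < grpr_lower_dbm:
        return rx_dbm - grpr_lower_dbm   # negative: dB below the lower limit
    return 0.0                           # inside the Golden Receive Power Range


def rssi_to_rx(rssi_db, grpr_lower_dbm, grpr_upper_dbm):
    """Inverse mapping; returns None for RSSI == 0, which is ambiguous."""
    if rssi_db > 0:
        return grpr_upper_dbm + rssi_db
    if rssi_db < 0:
        return grpr_lower_dbm + rssi_db
    return None


# Hypothetical thresholds, chosen only for illustration.
LOWER, UPPER = -60.0, -40.0
print(rx_to_rssi(-70.0, LOWER, UPPER))   # 10 dB below the lower limit
print(rssi_to_rx(-10.0, LOWER, UPPER))   # back to the absolute level
```

Note that an RSSI of exactly zero cannot be inverted, because every RX level inside the GRPR maps to it.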


3.2.5 l2ping

The Linux Bluetooth stack also makes it possible to ping a Bluetooth device.

Ping is a utility used to test the reachability of a host, in our case a
Bluetooth machine. It measures the round-trip time for messages sent from
the originating host to a destination and echoed back to the source.

For Bluetooth the command l2ping is used. L2ping sends an L2CAP echo
request to the Bluetooth MAC address [16] and waits for an echo response
from the target device. L2CAP echo requests are directly analogous to the
familiar ICMP ping packet in IP. The ping feature is useful to understand
whether a Bluetooth device is within range. If it is, l2ping keeps sending
echo requests to the target; if not, an error message is shown.

In particular, if the first echo request is successful, l2ping (fig. 3.4)
starts to ping the Bluetooth target device. In the default mode these fields
are shown:

• The size of the single packet of the echo request (default 44 bytes);

• The MAC address of the target;

• The progressive id of the packets;

• The echo Round-Trip Time (RTT) in milliseconds.
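As an illustration, a reply line carrying these fields could be parsed as sketched below. The exact line format ("<size> bytes from <mac> id <n> time <t>ms") is an assumption based on typical l2ping output and may differ between BlueZ versions; the sample line and the function name are hypothetical.

```python
import re

# Assumed shape of one l2ping reply line; adjust if the local output differs.
REPLY_RE = re.compile(
    r"(?P<size>\d+) bytes from (?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}) "
    r"id (?P<id>\d+) time (?P<rtt>\d+(?:\.\d+)?)ms"
)


def parse_l2ping_line(line):
    """Return the four fields of a reply line, or None if it does not match."""
    match = REPLY_RE.search(line)
    if match is None:
        return None
    return {
        "size": int(match.group("size")),       # packet size in bytes
        "mac": match.group("mac").upper(),      # target MAC address
        "id": int(match.group("id")),           # progressive packet id
        "rtt_ms": float(match.group("rtt")),    # echo round-trip time
    }


# Hypothetical sample line.
print(parse_l2ping_line("44 bytes from 88:C9:D0:1F:3E:48 id 0 time 118.52ms"))
```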

Figure 3.4: l2ping utility in Kali Linux


The use of l2ping makes it possible to create a basic L2CAP connection that
is almost universally authorisation-free (as explained in section 3.2.1).
Although the resulting connections are of limited use for communication
(they support little more than low-level testing), they are sufficient to
run the RSSI, LQ and TPL Linux commands successfully.

3.3 MAC Address

MAC address stands for Media Access Control address. It is a unique
identifier of an IEEE 802 network interface. Some examples of IEEE
802 standards are: Ethernet, Wi-Fi, ZigBee, FDDI (Fiber Distributed Data
Interface) and Bluetooth.

In our case the MAC address is a fundamental piece of information, because
it uniquely identifies a particular network interface of the device. Since a
smartphone is equipped with both a Wi-Fi and a Bluetooth chipset, a device
is characterized by two MAC addresses: one for the Wi-Fi interface and one
for the Bluetooth interface.

In both cases the structure is the same: a 12-digit hexadecimal address
(48 bits or 6 bytes), usually written in one of the following three formats:

• MM:MM:MM:SS:SS:SS

• MM-MM-MM-SS-SS-SS

• MMM.MMM.SSS.SSS

The leftmost 6 digits (24 bits), called the prefix, identify the adapter
manufacturer and are known as the OUI (Organizationally Unique Identifier).
Each vendor registers and obtains MAC prefixes assigned by the IEEE. Vendors
often possess many prefixes, associated with their different products.
Finding the vendor from the prefix on the web is quite easy: Wireshark
provides a tool to look up OUIs and other MAC address prefixes (footnote 1).

The rightmost 6 digits of a MAC address represent an identification number
for the specific device, known as the Network Interface Controller (NIC)
specific part. Among all devices manufactured with the same vendor prefix,
each is given its own unique 24-bit number.

1 https://www.wireshark.org/tools/oui-lookup.html


A real example of the two MAC addresses of the same device is:

• Wi-Fi address: F4:E3:FB:85:53:1D

• Bluetooth address: F4:E3:FB:A5:66:D8

In the example above the vendor digits are the same, but often the
same device has two completely different Wi-Fi and Bluetooth prefixes.
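The structure described above can be illustrated with a short Python sketch that splits an address into its vendor prefix and its device-specific part and checks whether two addresses share the same prefix. The function names are hypothetical; the addresses are the ones of the example above.

```python
import re


def split_mac(mac):
    """Return (prefix, device part) as two strings of 6 hex digits each."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()  # drop ':', '-' or '.'
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6], digits[6:]


def same_vendor_prefix(mac_a, mac_b):
    """True when the two addresses share the same 24-bit prefix."""
    return split_mac(mac_a)[0] == split_mac(mac_b)[0]


wifi = "F4:E3:FB:85:53:1D"
bluetooth = "F4-E3-FB-A5-66-D8"   # same address space, different notation
print(split_mac(wifi))
print(same_vendor_prefix(wifi, bluetooth))
```

Since many devices use unrelated prefixes for the two interfaces, a shared prefix is at most a weak hint and cannot replace the signal-based pairing studied in this thesis.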

Privacy implications

Since the MAC address uniquely identifies a device, it can be used to
identify a person. As explained in Section 2, this can raise a great deal
of privacy issues. Indeed, as explained above, both the Wi-Fi and the
Bluetooth addresses are easy to obtain: the first is sent in clear in the
probe requests, and the second is visible during the inquiry scan. The two
addresses, however, are different.

As explained in section 2.2, to protect mobile devices from this issue some
vendors apply a technique known as MAC address randomization, which
replaces the number that uniquely identifies a device’s Wi-Fi hardware with
randomly generated values.

3.4 System Architecture

During this thesis a tool capable of capturing Wi-Fi probes and collecting
Bluetooth parameters was implemented. We use the terms Bluetooth signals
or parameters to denote all the status parameters of a Bluetooth connection,
together with any other signal strength values made available by the
Bluetooth Core Specification.

To capture probe requests and signals, up to 6 Raspberry Pi 3 boards
(depending on the test), each equipped with a NETGEAR N150 Wireless USB
Adapter, were used. The Raspberry Pis run Raspbian Jessie version 4.9.24
and are all synchronized with an NTP server. They are remotely controlled
through SSH (Secure Shell) over the Wi-Fi network, which gave the
experimenter complete remote control over the whole system.

The Raspberry Pis run a Python script. Besides the ease with which Python
manipulates data and variables, this programming language was also chosen
for the immediacy with which it launches Linux bash scripts.


When the user starts the program (fig. 3.5), two options can be set: the
duration of the capture (-t option) and the name of the capture (-n option).

The program consists of a main function that creates three different threads.
The first gathers Wi-Fi probes; the second inquires the Bluetooth devices;
the third collects RSSI, TPL and LQ. As soon as a new client is found, the
script outputs in real time a message containing the MAC address of the
device; in the meantime the main process stores all the data regarding the
clients in a dictionary.
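The overall structure can be sketched as follows. This is only an outline under stated assumptions, not the script developed for the thesis: the function names, the default values of -t and -n, and the callback-based way in which workers hand samples to the shared dictionary are all hypothetical.

```python
import argparse
import threading


def parse_args(argv=None):
    """Command line with the two options described above."""
    parser = argparse.ArgumentParser(description="Wi-Fi/Bluetooth capture (sketch)")
    parser.add_argument("-t", dest="duration", type=float, default=600.0,
                        help="capture time in seconds (default is an assumption)")
    parser.add_argument("-n", dest="name", default="capture",
                        help="name of the capture")
    return parser.parse_args(argv)


def run_capture(duration, workers):
    """Start one thread per worker and collect their samples in a dictionary."""
    clients = {}                 # MAC address -> {kind: [samples]}
    lock = threading.Lock()
    stop = threading.Event()

    def record(kind, mac, sample):
        with lock:
            if mac not in clients:
                print(f"New client: {mac}")   # real-time notification
            clients.setdefault(mac, {}).setdefault(kind, []).append(sample)

    threads = [threading.Thread(target=w, args=(record, stop)) for w in workers]
    for thread in threads:
        thread.start()
    stop.wait(timeout=duration)  # capture window
    stop.set()                   # ask the workers to finish
    for thread in threads:
        thread.join()
    return clients


def idle_worker(record, stop):
    """Placeholder for a real worker (Wi-Fi probes, inquiry, RSSI/TPL/LQ)."""
    stop.wait()
```

In a complete program the three workers would wrap airodump-ng, the inquiry and the hcitool polling respectively, and `run_capture(args.duration, [...])` would be called with the parsed options.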

Figure 3.5: Developed script running on the Raspberry Pi through SSH in Kali Linux

Wi-Fi probes collection: To capture Wi-Fi probes, Aircrack-ng was used.
Aircrack-ng is an open-source suite of tools, written in C, to assess
Wi-Fi network security. In particular, the command airodump-ng <wlan
interface> is used to capture raw 802.11 frames. For this purpose, the
source code of Airodump-ng was modified to show the sequence number and
the timestamp of the captured packets.

In order to run Airodump-ng the Wi-Fi interface must be in monitor mode;
the NETGEAR dongles are used for this purpose. Monitor mode allows the
Raspberry Pi to monitor all the traffic received from the wireless network
and to listen for the probes.

Inquiry with RSSI: The Bluetooth RX power level is obtained through hcitool
spinq, which periodically inquires other Bluetooth devices without stopping.
In parallel, hcidump retrieves the raw data and the Python script parses the
useful information.


Other Bluetooth parameters: Received signal strength indicator (RSSI),
link quality (LQ) and transmit power level (TPL) are three fundamental
parameters of a Bluetooth connection. In order to obtain these data, a
connection is required.

As explained in section 3.2.5, during the ping process an L2CAP connection
between the Raspberry Pi and the target device is established. Thanks to
it, it is possible to obtain RSSI, LQ and TPL.

The commands used were:

• l2ping <mac address> to ping the Bluetooth MAC address

• hcitool rssi <mac address> to gather the RSSI

• hcitool tpl <mac address> to gather the Transmit Power Level

• hcitool lq <mac address> to gather the Link Quality

When the thread in charge of capturing the Bluetooth parameters starts, it
immediately runs a bash script written to keep a continuous Bluetooth
connection with the target device using l2ping. Once the connection is set
up, the thread issues the three hcitool commands together every second,
parses the results and stores them in a dictionary.
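A minimal sketch of this polling step is given below. It is not the script used in the thesis: the function names are hypothetical, and the parser assumes that each hcitool command prints a single line ending in ": <integer>" (for instance "RSSI return value: -12"), which should be checked against the installed BlueZ version.

```python
import re
import subprocess


def parse_hcitool_value(output):
    """Extract the trailing integer of an hcitool reply, or None."""
    match = re.search(r":\s*(-?\d+)\s*$", output.strip())
    return int(match.group(1)) if match else None


def read_parameter(command, mac):
    """Run `hcitool <command> <mac>`, with command in {'rssi', 'lq', 'tpl'}."""
    result = subprocess.run(
        ["hcitool", command, mac],
        capture_output=True, text=True, timeout=5,
    )
    if result.returncode != 0:      # e.g. the device is not connected
        return None
    return parse_hcitool_value(result.stdout)


def poll_once(mac):
    """Collect the three connection parameters for one device."""
    return {name: read_parameter(name, mac) for name in ("rssi", "lq", "tpl")}
```

In the capture loop, `poll_once` would be called once per second for as long as the l2ping connection stays up, and its result stored together with a timestamp.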

The capturing process ends when a timer set by the user expires or when
the user voluntarily stops the script. The program creates three .csv files,
one for each category explained before. The csv files contain the MAC
address of the device, the timestamp and all the useful data regarding Wi-Fi
or Bluetooth. The csv files are then automatically loaded, using the
mysqlimport command, into a MySQL database running on an external server.

The database is composed of three tables.

• The Wi-Fi table. Each row stores a probe request. It contains:

– the probe sequence number (SN);
– the time and date of capture (timestamp);
– the device Wi-Fi MAC address (mac address);
– the list of past SSIDs (SSID);
– the RSSI of the probe request (RSSI);
– the ID of the Raspberry Pi that captured the probe (Raspberry Pi number).

• The Bluetooth inquiry table. Each row stores an inquiry of a device. It contains:

– the device Bluetooth MAC address (mac address);
– the RX power level of the inquiry (RX);
– the ID of the Raspberry Pi that captured the inquiry (Raspberry Pi number).

• The Bluetooth parameters table. Each row stores a capture of the three
fundamental parameters. It contains:

– the device Bluetooth MAC address (mac address);
– the RSSI of the device (RSSI);
– the Link Quality of the device (LQ);
– the Transmit Power Level of the device (TPL);
– the echo round-trip time of the device (echo time);
– the ID of the Raspberry Pi that captured the parameters (Raspberry Pi number).


Chapter 4

Experiments and Algorithms

If the Wi-Fi of a smartphone is turned on, it emits a number of probe
requests. If Bluetooth is also turned on, we can stimulate the smartphone
to emit some Bluetooth signals. The probes and the Bluetooth signals are
identified by two different MAC addresses, one for each wireless technology.

Pairing the Wi-Fi MAC address with the Bluetooth MAC address makes it
possible to uniquely identify a mobile device. Indeed, these two signals
come from the same device, but they are not immediately related. As we will
see below, the values found are completely different, but they carry the
same information: the distance between two devices.

The distance between the two mobile devices can be expressed in different

ways:

• Time of arrival (ToA): the distance is estimated by measuring the signal
propagation time. The time of flight is T_f = d/c, where d is the distance
between the nodes and c is the propagation speed (c = 299 792.458 km/s);

• Time Difference of Arrival (TDoA): in TDoA the receivers deduce

the distance from instant differences and propagation speeds;

• Angle of Arrival (AoA): In AoA there are directional antennas to

estimate the signal arrival angle and deduce the distance;

• Received Signal Strength Indicator (RSSI): RSSI uses the signal

attenuation to infer the distance, indeed a signal attenuates during

propagation.
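To give a sense of the time scales involved in the ToA relation T_f = d/c, the following sketch computes the time of flight over an indoor distance and the distance error produced by a small timing error. The chosen distance and timing error are illustrative values, not measurements from our experiments.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0   # propagation speed c


def time_of_flight(distance_m):
    """T_f = d / c, in seconds."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


def distance_from_tof(tof_s):
    """d = c * T_f, in meters."""
    return tof_s * SPEED_OF_LIGHT_M_PER_S


# Over 10 meters the signal travels for only a few tens of nanoseconds.
print(f"{time_of_flight(10.0) * 1e9:.2f} ns")

# A timing error of one microsecond already corresponds to hundreds of meters.
print(f"{distance_from_tof(1e-6):.1f} m")
```

Time-based techniques therefore require nanosecond-level timing to be useful at room scale.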


Line-Of-Sight (LOS) propagation is a characteristic of signal propagation
in which waves travel in a direct path from the source to the receiver. In
closed environments it is difficult to have a straight line between a
sender and a receiver. The signal is affected by multipath, that is, the
propagation of the signal along different paths, caused by atmospheric
ducting and by reflection and refraction from walls, bodies, windows, etc.
These issues make techniques like ToA, TDoA or AoA inaccurate, so in
our experiments we chose the RSSI-based approach.

It is important to recall that we are not only interested in the absolute
distance between a sender and a receiver. We want to determine whether the
Wi-Fi and the Bluetooth signals have the same path loss, in order to
establish whether they come from the same device.

In this chapter, we first describe the experimental test-bed and the devices
used during the experiments. There are two main experiments: the analysis
of the devices’ Wi-Fi and Bluetooth parameters, and the matching
experiment. The first analysis allows us to understand which parameters are
the best choice; these are then used during the matching experiment.
Next, the linking algorithms and the methodology are described. Finally,
the results are interpreted.

4.1 Preliminary experiments

In this experiment we captured the Wi-Fi probes (containing the Wi-Fi
RSSI) and the Bluetooth signals (RSSI, TPL, LQ, echo round-trip time).
The goals are to understand the correlation between distance and the signals
originating from the target devices, and the relation between Wi-Fi and
Bluetooth. Indeed, our main aim is not to find the absolute position of
a device, but to understand whether the Bluetooth and the Wi-Fi signals
originate from the same device.

The environment

The preliminary experiments were held in a home environment measuring
9.50 meters x 4.50 meters, with an area of 42.75 m². For the first phase of
the experiment, the home environment was chosen because it was important
to have an isolated environment, with no other devices that could cause
noise. In addition, it was also crucial to have a direct path between the
studied devices.


The devices

The target devices used during this experiment were an LG-E450 with
Android 4.1.2 (Ultra Slim custom ROM) and an iPad with iOS 10.

A Raspberry Pi 3 was used to capture Wi-Fi probes and Bluetooth signals.
The Wi-Fi module was a NETGEAR N150 and the Bluetooth module was
the internal one. The presence of the Raspberry Pi’s case does not influence
the strength of the signals.

Execution

The Raspberry Pi was placed at a fixed point, while the target devices
were moved to a different distance every 10 minutes. The path between the
Raspberry Pi and the devices was a straight line without any obstacle in
between.

At the end, our script averaged all the values to obtain a single
value for each position.

4.1.1 Results

As explained before, we want to understand whether the collected parameters
are related to the distance and to each other. It is also important to
understand how we can infer the distance from an RSSI value, and to study
the other variables to see whether they are useful in our case.

Bluetooth

The Bluetooth signals analyzed during this experiment are the connection

based RSSI, the TPL (Transmit Power Level), the LQ (Link Quality), the

echo Round Trip Time (obtained from ping) and the RX power level (ob-

tained from inquiry with RSSI).


From figure 4.1, the following observations can be made:

[Figure 4.1 plots, each showing the iPad and the LG over a distance of 0 to 10 meters: a) Distance vs Bluetooth RSSI; b) Distance vs Link Quality; c) Distance vs TPL; d) Distance vs Echo RTT]

Figure 4.1: Bluetooth signals behavior from 0 to 10 meters

Connection based RSSI: The Received Signal Strength Indicator strongly
depends on the distance. It starts from 0 dBm, which means that the target
device is inside the GRPR, and then decreases. As we can see from the
graph (4.1.a), the iPad chipset is more powerful than the LG one. Indeed, it
is easy to imagine that beyond ten meters the LG loses the connection
(-35 dBm is the lowest RSSI value reported), whereas the iPad can move
further away and still be connected. So, the RSSI value strongly depends on
the device model.

Finally, the curves follow a logarithmic trend, as all signal powers do,
although less evidently than expected. It is nevertheless clear that the
distance can be inferred from the RSSI.

LQ: The link quality, according to the specification, starts from 255 when
the connection is strong and goes down to 0 when the connection is poor. In
our experiment the LQ values correlate poorly with the distance. When the
devices are near to or far from the Raspberry Pi the value is respectively
high or low, but the intermediate values are not meaningful. For these
reasons, LQ is discarded from our measurements.

TPL: Fig. 4.1.c shows a horizontal straight line for the Transmit Power
Level values; indeed, this value does not change during a Bluetooth
connection. The iPad and LG lines overlap at +4 dBm. This makes it
impossible to use TPL in our calculations.

Echo Round Trip Time: The echo RTT is obtained by pinging the target
device. It measures the Round-Trip Time (RTT) for messages sent from the
originating host to a destination computer and echoed back to the source.

We expected the round-trip time to grow with the distance, but this
assumption is not completely true. Indeed, the iPad has an RTT of
approximately 120 ms during all the phases of the experiment, while the
LG RTT decreases up to 4 meters and then rapidly increases. Figure 4.1.d
shows the trends of the round-trip time of the echo requests. The echo RTT
is therefore also discarded, due to its poor correlation with the distance.


RX Power Level: The Raspberry Pi Bluetooth chipset provides the absolute
RX power level through inquiry, as opposed to the relative RSSI values
defined by the Bluetooth specification, which depend on the GRPR. Fig. 4.2
shows that the RX power level correlates strongly with distance. Also in
this case, there are evident differences between the RX power level of the
LG and that of the iPad.

[Figure 4.2 plot: Bluetooth RX against distance (0 to 10), for the iPad and the LG]

Figure 4.2: Distance vs Bluetooth RX power level

Bluetooth RSSI vs Bluetooth RX Power Level

As we have seen before, the two principal Bluetooth signal parameters are
the RSSI and the RX power level. They represent the same quantity, but the
first is expressed relative to the GRPR.
Figure 4.3 shows the relation between the two signals. Their dependence
is linear, so it is possible to easily convert the RX power level into RSSI
and vice versa.

[Figure 4.3 plots: RX against RSSI; a) LG; b) iPad]

Figure 4.3: Bluetooth RSSI vs Bluetooth RX of two different devices

In the following experiments we decided to use only the RSSI. Whilst the
RX seems more precise, many more RSSI values than RX values can be
collected. This makes the measurement more accurate and reduces the
duration of the experiments, also with a real scenario in mind. Indeed, as
we can see in figure 4.4, during a ten-minute measurement the number of
RSSI values is almost ten times the number of RX values obtained from the
inquiry. The RSSI can be requested every second (or more often), while the
RX is bound to the inquiry time, which is around 10.24 seconds.

[Figure 4.4 plots: frequency of RX and RSSI values per distance in meters (0 to 10); a) LG; b) iPad]

Figure 4.4: Number of Bluetooth RSSI and Bluetooth RX of two different devices

during a ten minutes measurement

In addition, the RSSI can also be obtained for non-visible devices, while
the RX is available only for the visible ones. As explained before (section
3.2.5), it is possible to establish a connection with a device using ping,
and the ping process also works if the device has Bluetooth set to
invisible. This allows us to use the hcitool rssi, hcitool tpl and hcitool
lq commands, because an L2CAP connection is established.

In a real-world scenario, obtaining the values of non-visible devices is a
big advantage, because the majority of devices have Bluetooth set to
non-visible.


Wi-Fi

The last preliminary experiment concerns the relation between Wi-Fi and
distance. As said previously, the Wi-Fi probes have a field containing the
RSSI. After capturing it and averaging the data by distance, the graph
in figure 4.5 was created.

[Figure 4.5 plot: Wi-Fi RSSI against distance (0 to 10), for the iPad and the LG]

Figure 4.5: Distance vs Wi-Fi RSSI

The Wi-Fi RSSI follows a logarithmic trend as a function of the distance.
This is expected, since the RSSI expresses the power of a signal on a
logarithmic scale. Therefore, as we imagined, the Wi-Fi RSSI is a good
indicator of the distance of a device.

The trend of the Wi-Fi RSSI is rather similar to that of the Bluetooth RX
power, but the signal strength is higher for Wi-Fi. This is consistent with
the fact that the Wi-Fi range is greater than that of Bluetooth, which is
only around 10 meters for a Class 2 device.


4.1.2 Home experiment parameters

In the previous sections, we analyzed which parameters correlate best with
the distance. The candidates were Wi-Fi RSSI, Bluetooth RSSI and Bluetooth
RX power. As regards Bluetooth, only the RSSI was chosen, owing to the high
number of values that can be collected and the possibility of capturing
data also in non-visible mode.

Hence, in the following experiment we will only consider Bluetooth RSSI
and Wi-Fi RSSI.

In the experiment above, we saw that different devices have different
logarithmic RSSI-distance curves. This is due to the different internal
chipsets of the devices. Figures 4.6 and 4.7 show the logarithmic
regressions of five different smartphones and tablets.
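A logarithmic regression of this kind can be sketched as below. We assume a model of the form RSSI = a + b·log10(d), fitted by ordinary least squares; the model form, the function names and the sample values are assumptions made for illustration and are not taken from the measurements. Distances must be strictly positive, since log10(0) is undefined.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_distance(distances_m, rssi_values):
    """Fit RSSI = a + b * log10(d); returns (a, b)."""
    return fit_line([math.log10(d) for d in distances_m], rssi_values)


def distance_from_rssi(rssi, a, b):
    """Invert the fitted curve to estimate the distance in meters."""
    return 10 ** ((rssi - a) / b)


# Hypothetical averaged samples for one device (distance in m, RSSI).
distances = [1.0, 2.0, 4.0, 8.0]
rssi = [-41.0, -46.5, -52.0, -58.5]
a, b = fit_log_distance(distances, rssi)
print(a, b, distance_from_rssi(-50.0, a, b))
```

One pair (a, b) would be fitted per device model and per technology, giving the per-device curves discussed in the text.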

As regards Wi-Fi, the logarithmic regressions are very close to each other.
The Wi-Fi power levels of the probes do not differ greatly between devices.

[Figure 4.6 plot: Wi-Fi RSSI against distance; devices: S3, S Adv, LG, S TAB, iPad]

Figure 4.6: Wi-Fi RSSI logarithmic regression of the target devices


In contrast, there is a high dissimilarity between devices in terms of
Bluetooth RSSI (Figure 4.7). In the following algorithms we use a different
curve for each device. For example, the LG (cyan line) is the least powerful
in terms of both Bluetooth RSSI and Wi-Fi RSSI.



Figure 4.7: Bluetooth RSSI logarithmic regression of the target devices

It is also important to understand the relation between Wi-Fi and Bluetooth
RSSI, which is plotted in figure 4.8. The dependence between Wi-Fi and
Bluetooth is linear, and it is possible to convert the Bluetooth RSSI into
the Wi-Fi RSSI and vice versa. Although some curves are similar, also in
this case every device model has a different characteristic trend, so
a model for each device is created.

This relation is fundamental in the matching of Wi-Fi and Bluetooth MAC
addresses.
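As a sketch of how such per-device models could be built and used, the code below fits one line Wi-Fi = slope · Bluetooth + intercept for each device model and converts values in both directions. The data layout, the function names and the sample pairs are hypothetical.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_models(samples):
    """samples: {device model: [(bluetooth_rssi, wifi_rssi), ...]}."""
    return {device: fit_linear(*zip(*pairs)) for device, pairs in samples.items()}


def bt_to_wifi(bt_rssi, model):
    slope, intercept = model
    return slope * bt_rssi + intercept


def wifi_to_bt(wifi_rssi, model):
    slope, intercept = model
    return (wifi_rssi - intercept) / slope


# Hypothetical averaged pairs measured at the same positions.
models = build_models({"LG": [(-2.0, -55.0), (-12.0, -66.0), (-25.0, -78.0)]})
print(bt_to_wifi(-10.0, models["LG"]))
```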



Figure 4.8: Bluetooth RSSI vs Wi-Fi RSSI of the target devices

4.2 Home experiment

Starting from the previous data and considerations, we can now describe the
actual MAC address coupling experiment.

During this test we collected the Bluetooth RSSI and the Wi-Fi probes
of 15 randomly placed devices. The positions of the devices are known, and
the devices are kept in the same position for the whole duration of the
experiment. In this way we obtain two different RSSI signals (Bluetooth and
Wi-Fi) for each device, at the same time and in the same place. These
signals are not directly related, because they come from two different
chipsets. The goal is to link two MAC addresses, one coming from Wi-Fi and
the other from Bluetooth, which allows us to uniquely identify a device.
Linking the MAC addresses means understanding whether the Wi-Fi and the
Bluetooth RSSI originate from the same device.

To link the two RSSIs we created various algorithms and tested them to
understand which one performs best at matching.

The environment

This phase too was held in a home environment measuring 9.50 meters x
4.50 meters, with an area of 42.75 m². The home environment was chosen
because it was important to have an isolated environment, with no other
devices that could cause noise. It was also crucial to have a direct
path between the devices.

Figure 4.9 shows the planimetry of the room. It has been divided into
50 squares with a side of 0.9 meters and an area of 0.81 m² each.

[Figure 4.9 panels: a) 4 Raspberry Pis; b) 6 Raspberry Pis]

Figure 4.9: Room planimetry with different Raspberry Pis configuration

The choice of scenario is fundamental. There are two possibilities:
anchor-based or anchor-free. In the anchor-based scenario only the anchor
nodes (in our case the Raspberry Pis) know their position. The positions of
the other nodes (in our case the devices) are derived through the anchors.
This coordinate system is absolute. In the anchor-free scenario no node
knows its position, and a relative coordinate system is obtained.

We chose the anchor-based scenario, because only the Raspberry Pis
are able to capture the probes and manipulate the data. Indeed, the target
devices are passive.

The devices

We placed five different target devices at random in the environment.
Every device was moved to three different random positions, in order to
simulate the presence of 15 different devices (figure 4.11).

The devices used are:

• a LG-E450 with Android 4.1.2 (Ultra Slim ROM). Device number

1,6,11

• a Samsung S advance with Android 4.4.4 (CyanogenMOD 11). Device

number 2,7,12

• a Samsung S3 mini with Android 5.1.1 (CyanogenMOD 12). Device

number 3,8,13

• a Samsung Galaxy Tab S2 with Android 7.0. Device number 4,9,14

• an iPad with iOS 10. Device number 5,10,15

Figure 4.10: Photos of the capturing phase.

As anchors we used 4 Raspberry Pis, each with the NETGEAR dongle, placed in
the four corners of the room (figure 4.9.a). In the second phase two more
Raspberry Pis were added (figure 4.9.b).

The six-anchor configuration makes it possible to cover every zone of the
room and to capture from different angles.


Figure 4.11: Room planimetry. In green the six Raspberry Pis, in red the fifteen devices.

Execution

During the experiment the Raspberry Pis stayed at fixed points, and the
five devices were moved to a new position every 10 minutes, for three
positions in total. The script was run in order to capture the signals.

At the end of the capturing phase the script deletes the corrupted data and
generates a Wi-Fi dataset and a Bluetooth one.

The datasets are composed of:

• a column for each Raspberry Pi (4 or 6 columns, depending on the
configuration) containing the RSSI value captured by the respective
Raspberry Pi;

• a MAC address column (Wi-Fi or Bluetooth, depending on the dataset)
indicating the MAC address of the device;

• a timestamp column indicating the time of capture.

Each row represents a vector of values captured at the same instant (same
timestamp). In this way two datasets with n rows and 6 columns (in the case
of the 4 Raspberry Pis configuration) were created: one for Wi-Fi and one
for Bluetooth.

After this process, we calculate the average RSSI of each device for
each Raspberry Pi in the two datasets. As a result, we have two different
datasets (Bluetooth and Wi-Fi) with 15 lines, one for each device. A
MAC address is thus identified by a vector of four (or six) averaged RSSI
values, one for each Raspberry Pi. Table 4.1 shows an example of the
Bluetooth dataset. There are 4 columns with the RSSI and one column with
the MAC address. The first line holds device number 1, the LG device. Its
average RSSI from Raspberry Pi number 1 is -15.8, from Raspberry Pi
number 2 it is -22.2, and so on. The Wi-Fi dataset (table 4.2) has the same
structure as the Bluetooth dataset. Each line of one dataset corresponds to
the same line of the other dataset.

Table 4.1: Bluetooth Dataset

device  rasp1     rasp2     rasp3     rasp4     mac address
1       -15.8034  -22.2419  -33.4667  -34.9384  88:C9:D0:1F:3E:48
2        -0.8027   -3.2450  -15.1118  -21.0058  D8:90:E8:32:D3:3E
3        -6.6547    0.0000  -19.0269  -25.2993  C8:14:79:A3:93:2E
...     ...       ...       ...       ...       ...
15      -24.4265  -14.3408  -12.2055    0.1200  DC:A9:04:4F:D9:36

Table 4.2: Wi-Fi Dataset

device  rasp1     rasp2     rasp3     rasp4     mac address
1       -67.5986  -71.1032  -83.5181  -91.7776  C4:43:8F:B3:0A:F7
2       -44.5576  -58.2103  -75.6285  -84.0279  D8:90:E8:29:AD:3F
3       -65.5698  -57.9744  -73.1944  -84.8966  C8:14:79:31:3C:2A
...     ...       ...       ...       ...       ...
15      -72.8848  -70.7097  -63.5971  -51.9083  DC:A9:04:4F:D9:35


4.3 Algorithms

After the capturing phase and the manipulation of the datasets, we focused
on the matching algorithms. Various approaches were tested; the best ones
are:

1. normalization;

2. RSSI conversion from Bluetooth to Wi-Fi;

3. RSSI conversion from Bluetooth/Wi-Fi to distance;

4. trilateration;

5. fingerprint.

The goal of these algorithms is to pair a row of the Wi-Fi dataset with a
row of the Bluetooth dataset, or vice versa. Given a Bluetooth vector, the
algorithms find the most similar Wi-Fi vector; the MAC address of the vector
found presumably belongs to the same device as the Bluetooth MAC address.

Euclidean Distance. In order to find the most similar vector we use the
Euclidean distance, that is, the straight-line distance between two points
in Euclidean space. In our case each point has 4 (or 6) coordinates, one for
each Raspberry Pi. The Euclidean distance is calculated as follows:

\[ d(w, b) = \sqrt{(w_1 - b_1)^2 + (w_2 - b_2)^2 + \dots + (w_i - b_i)^2 + \dots + (w_n - b_n)^2} \qquad (4.1) \]

where w_i is the ith Wi-Fi RSSI and b_i is the ith Bluetooth RSSI, with
i = 1, 2, ..., n and n = 4 or n = 6 depending on the configuration.
d(w, b) is close to 0 if the two rows are very similar and becomes greater
as the rows differ.

Every time we use an algorithm, at the end of the process we compare each
Wi-Fi vector with each Bluetooth vector using the Euclidean distance. This
allows us to create, for each Wi-Fi address, a list of Bluetooth addresses
sorted by increasing Euclidean distance: the value closest to zero is the
first of the list and the greatest value is the last one. At the top of the
list there are therefore the Bluetooth MAC addresses that are most similar
to the Wi-Fi MAC address. Presumably the first entry in the list of a Wi-Fi
address is its corresponding Bluetooth address, and we can then link them.
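The ranking step described above can be sketched as follows. This is only an illustrative sketch, not the script used in the thesis: the function names (`euclidean`, `rank_candidates`), the dictionary layout and the sample vectors are assumptions introduced for the example.

```python
import math

def euclidean(w, b):
    """Euclidean distance between two RSSI vectors of equal length (eq. 4.1)."""
    return math.sqrt(sum((wi - bi) ** 2 for wi, bi in zip(w, b)))

def rank_candidates(wifi, bluetooth):
    """Map each Wi-Fi MAC to the Bluetooth MACs sorted by increasing distance."""
    return {
        w_mac: sorted(bluetooth, key=lambda b_mac: euclidean(w_vec, bluetooth[b_mac]))
        for w_mac, w_vec in wifi.items()
    }

# Hypothetical vectors that are already on a comparable scale
# (for example after one of the algorithms of this section).
wifi = {"wifi-A": [0.0, 0.3, 0.9, 1.0], "wifi-B": [1.0, 0.8, 0.2, 0.0]}
bluetooth = {"bt-A": [0.0, 0.2, 1.0, 0.9], "bt-B": [0.9, 1.0, 0.1, 0.0]}

ranking = rank_candidates(wifi, bluetooth)
for w_mac, candidates in ranking.items():
    print(w_mac, "->", candidates)
```

The first element of each list is the top 1 candidate; keeping the whole list makes it possible to evaluate larger values of k later.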


4.3.1 Normalization

The simplest algorithm we have implemented is the normalization of each
row.

Normalization is a process that adjusts values measured on different
scales to a common scale, e.g. between 0 and 1.

Both the Wi-Fi RSSI and the Bluetooth one represent the strength of the
respective signal, but they are on different scales (as we saw in section
4.1, the Wi-Fi RSSI is more powerful than the Bluetooth one). Thanks to
normalization we can bring these two values onto the same scale between 0
and 1. We have normalized each row of the two datasets separately, to
standardize the Wi-Fi and Bluetooth data of the same device.

The normalization formula is:

\[ z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (4.2) \]

where x = (x_1, ..., x_n) and z_i is the ith normalized value.

After normalizing the data, we obtain two datasets of values between 0 and
1 representing the Wi-Fi RSSI and the Bluetooth RSSI on a common scale.
Since the two vectors (Wi-Fi and Bluetooth) of a device reflect the same
distances from the anchors, their normalized values should be very similar.
It is therefore possible to compare the data and link the MAC addresses.
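As an illustration, the row-wise min-max normalization of formula (4.2) could be written as below. The function name `normalize` and the handling of a flat vector (all values equal) are assumptions of this sketch; the two sample rows are those of device 1 in tables 4.1 and 4.2.

```python
def normalize(row):
    """Min-max normalization of one RSSI vector (formula 4.2)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        # Assumption: a flat vector carries no information, map it to zeros.
        return [0.0 for _ in row]
    return [(x - lo) / (hi - lo) for x in row]

# Device 1 (LG) as reported in tables 4.1 and 4.2.
bt_row = [-15.8034, -22.2419, -33.4667, -34.9384]
wifi_row = [-67.5986, -71.1032, -83.5181, -91.7776]

bt_norm = normalize(bt_row)
wifi_norm = normalize(wifi_row)
print([round(z, 2) for z in bt_norm])
print([round(z, 2) for z in wifi_norm])
```

After this step the two vectors share the same ordering of the anchors and can be compared with the Euclidean distance.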

4.3.2 RSSI conversion from Bluetooth to Wi-Fi

In section 4.1.2 we discussed the linear relation between the Wi-Fi RSSI
and the Bluetooth RSSI. This relation was used to convert the values of the
Bluetooth dataset into Wi-Fi values. As mentioned above, every device has a
different regression line, so five different functions were used during the
conversion.

In this way we obtained two Wi-Fi datasets: the real one and the one
converted from Bluetooth. The last part of the algorithm compares each row
of the two datasets using the Euclidean distance and links the addresses.

This operation can also be done by converting Wi-Fi into Bluetooth.
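A minimal sketch of this conversion is shown below, assuming that the regression line of one device model is fitted by ordinary least squares on calibration pairs. The calibration values, the resulting coefficients and the names `fit_line` and `bt_to_wifi` are hypothetical and are not the ones measured in section 4.1.2.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical calibration pairs (Bluetooth RSSI, Wi-Fi RSSI) for ONE device
# model; they lie exactly on wifi = 1.5 * bt - 42.5.
bt_calib = [-5.0, -10.0, -20.0, -30.0]
wifi_calib = [-50.0, -57.5, -72.5, -87.5]
slope, intercept = fit_line(bt_calib, wifi_calib)

def bt_to_wifi(bt_vector):
    """Convert a vector of Bluetooth RSSI values into Wi-Fi-scale values."""
    return [slope * v + intercept for v in bt_vector]

converted = bt_to_wifi([-15.8, -22.2, -33.5, -34.9])
print([round(v, 1) for v in converted])
```

In the experiment a separate line would be needed for each device model; the converted vector is then compared with the measured Wi-Fi vectors.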

4.3.3 RSSI conversion from Bluetooth and Wi-Fi to distance

We elaborated this algorithm starting from the dependence between the RSSI
(Bluetooth or Wi-Fi) and the distance. The idea is to convert the RSSI of
the two datasets into distance, obtaining two distance datasets (Wi-Fi and
Bluetooth), and then to match the rows that are most similar using the
Euclidean distance.

In order to convert the RSSI into distance it is possible to use the
following formula:

\[ RSSI = p_0 - 10\,\alpha \log_{10}\frac{d}{d_0} \qquad (4.3) \]

• RSSI: the RSSI value (path loss);

• p_0: the received power from the node when the distance is d_0 (RSSI
at d_0);

• d: the distance between sender and receiver;

• d_0: the reference distance (1 meter in our tests);

• α: a path loss constant. It assumes values between 1 and 3, depending
on the environment.

The precision of the distance strongly depends on the values used in the
previous formula. A correct estimate of α and p_0 is fundamental to obtain
an accurate distance value.

α is determined by the environment in which the devices are located and can
be found by inverting the RSSI formula (usually it is a value between 1 and
3). p_0, the power level measured at 1 meter, was determined empirically
during the previous tests.

As we can see, using formula (4.3) is quite complicated because these
parameters must be estimated. Furthermore, in our case the distance
calculation was not as accurate as we expected.
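For reference, inverting formula (4.3) gives d = d_0 · 10^((p_0 − RSSI) / (10 α)). The sketch below shows this inversion and the estimate of α from a single measurement at a known distance; the function names and the numeric values are hypothetical and only serve to illustrate the arithmetic.

```python
import math

def rssi_to_distance(rssi, p0, alpha, d0=1.0):
    """Invert RSSI = p0 - 10 * alpha * log10(d / d0) to obtain d."""
    return d0 * 10 ** ((p0 - rssi) / (10.0 * alpha))

def estimate_alpha(rssi, p0, d, d0=1.0):
    """Path loss constant from one RSSI measured at a known distance d != d0."""
    return (p0 - rssi) / (10.0 * math.log10(d / d0))

# Hypothetical values: p0 = -40 dBm at 1 m and alpha = 2.
print(rssi_to_distance(-60.0, p0=-40.0, alpha=2.0))
print(estimate_alpha(-60.0, p0=-40.0, d=10.0))
```

Because the distance depends exponentially on (p_0 − RSSI), small errors in p_0 or α produce large errors in d, which is consistent with the difficulty reported above.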

So, to convert the RSSI into distance, the curves obtained in section 4.1.2
were used. We created a different regression for each device and for each
technology (Wi-Fi or Bluetooth), which is useful because of the differences
in power among the devices. The chosen regression was the logarithmic one
(its behaviour was analyzed in the previous sections).

At the end of the process we obtain two datasets containing the distances
between the devices and the anchors: one obtained from Wi-Fi and one
obtained from Bluetooth. The last step is to compare the distance vectors
using the Euclidean distance.
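A possible way to build and use such a logarithmic regression is sketched below, assuming a curve of the form RSSI = a + b · ln(d) fitted by least squares. The calibration points are synthetic and the names `fit_log_curve` and `curve_to_distance` are introduced only for this example; the curves of section 4.1.2 may have been obtained differently.

```python
import math

def fit_log_curve(distances, rssis):
    """Least-squares fit of rssi = a + b * ln(distance); returns (a, b)."""
    xs = [math.log(d) for d in distances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(rssis) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, rssis))
    b = sxy / sxx
    return my - b * mx, b

def curve_to_distance(rssi, a, b):
    """Invert the fitted curve to turn an RSSI value into a distance."""
    return math.exp((rssi - a) / b)

# Synthetic calibration data for one device and one technology,
# generated from a = -45 and b = -9 (hypothetical values).
calib_d = [0.5, 1.0, 2.0, 4.0, 8.0]
calib_rssi = [-45.0 - 9.0 * math.log(d) for d in calib_d]

a, b = fit_log_curve(calib_d, calib_rssi)
print(round(a, 3), round(b, 3))
print(round(curve_to_distance(-55.0, a, b), 2))
```

One pair (a, b) would be needed for each device and for each technology, matching the per-device regressions described in the text.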


4.3.4 Trilateration

Trilateration is a geometric approach for locating mobile objects based on
the intersection of circles. Once the distances between the device and at
least three Raspberry Pis in known positions are available, trilateration
is performed to determine its coordinates. The position is obtained by
intersecting the circles centered on the anchors, whose radii are the
distances between the device and the anchors; the point of intersection
gives the coordinates of the target device.

In our case we have 4 or more anchors, and the circles do not always
intersect in a single point. The trilateration problem can then be
approached from an optimisation point of view: we want to find the point
P = (x, y) that provides the best approximation of the actual position of
the device. For this purpose we use the Ordinary Least Squares (OLS)
method:

\[ \underset{P}{\text{minimize}} \quad \frac{\sum_{i=1}^{n} \left[ d_i - \mathrm{dist}(P, L_i) \right]^2}{N} \qquad (4.4) \]

Where:

• d_i is the estimated distance between the ith anchor and the target device;

• P is the coordinate of the device;

• L_i is the coordinate of the ith anchor;

• N is the number of anchors (N = n).

The device coordinates are obtained by minimizing this error.

We apply the ordinary least squares method to both the Wi-Fi dataset and
the Bluetooth dataset, in order to find the coordinates of each device
through Wi-Fi and through Bluetooth.

The coordinates of the devices are expressed relative to the coordinates
of the anchors. The top left anchor is (0, 0), the top right anchor is
(4.5, 0), the bottom left anchor is (0, 9.5) and the bottom right anchor is
(4.5, 9.5).

A pair of coordinates (one for Bluetooth and one for Wi-Fi) is obtained for
each device, so it may also be possible to locate the device. In this case
we are not interested in the position of a device, but only in the relative
values between Wi-Fi and Bluetooth.

The last step is to compare the two types of coordinates to find the most
similar pairs. The Bluetooth coordinates and the Wi-Fi coordinates that
are nearest to each other are considered a single device and the MAC
addresses are linked.
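The minimization of (4.4) can be sketched with a few Gauss-Newton iterations, as below. The text does not say which solver was used, so the solver, the starting point (the centroid of the anchors) and the name `trilaterate` are assumptions of this example; only the anchor coordinates come from the text.

```python
import math

# Anchor coordinates in meters, as given in the text (4-anchor configuration).
ANCHORS = [(0.0, 0.0), (4.5, 0.0), (0.0, 9.5), (4.5, 9.5)]

def trilaterate(distances, anchors=ANCHORS, iterations=50):
    """Estimate (x, y) by minimizing the squared range errors (Gauss-Newton)."""
    # Start from the centroid of the anchors.
    x = sum(a[0] for a in anchors) / len(anchors)
    y = sum(a[1] for a in anchors) / len(anchors)
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (ax, ay), d in zip(anchors, distances):
            dist = math.hypot(x - ax, y - ay) or 1e-9
            r = d - dist            # residual of this anchor
            jx = (x - ax) / dist    # derivative of dist with respect to x
            jy = (y - ay) / dist    # derivative of dist with respect to y
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break
        # Solve the 2x2 normal equations for the update step.
        dx = (a22 * g1 - a12 * g2) / det
        dy = (a11 * g2 - a12 * g1) / det
        x += dx
        y += dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y

# Hypothetical device at (1, 2): exact distances to the four anchors.
true_d = [math.hypot(1.0 - ax, 2.0 - ay) for ax, ay in ANCHORS]
est_x, est_y = trilaterate(true_d)
print(round(est_x, 3), round(est_y, 3))
```

With noisy distances the circles do not meet in one point and the function returns the position with the smallest squared error, which is the situation described above.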


4.3.5 Fingerprint

Fingerprinting is one of the most popular methods for indoor object
tracking. Wi-Fi probe requests and Bluetooth signals captured in a certain
area create a unique fingerprint that is used for the localization.

Fingerprint-based positioning systems operate in two phases: off-line and
on-line.

The first one is the off-line phase, during which the system is calibrated.
The first step is to divide the location into a grid of square cells. The
choice of the cell size is fundamental to obtain a good measurement of the
fingerprint. A dense grid is useless because it is hard to locate Wi-Fi and
Bluetooth devices with centimeter accuracy, but a sparse grid is also
useless because no significant results would be obtained.

In our test we chose to divide the room into fifty squares with a side of
0.9 meters and an area of 0.81 m².

Figure 4.12: Fingerprint grid. The centers of the cells are marked in blue.


The next step is the collection of the fingerprints and the calibration of
each cell. The Raspberry Pis were used in the previous configuration: four
anchors in the corners and two anchors in the middle (as in figure 4.9.b).
As fingerprint target devices we used the LG, the Samsung S Advance and the
Samsung S3 mini.

The devices are placed in the middle of each cell in order to capture the
Wi-Fi fingerprint and the Bluetooth fingerprint. The calibration position
of each cell is identified in figure 4.12 by a small blue point. At the end
of the process, for each device, the RSSI values are reduced to ten
measurements for each cell.

Eight datasets are created:

• Wi-Fi LG fingerprint dataset and Bluetooth LG fingerprint dataset;

• Wi-Fi Samsung S Advance fingerprint dataset and Bluetooth Samsung
S Advance fingerprint dataset;

• Wi-Fi Samsung S3 fingerprint dataset and Bluetooth Samsung S3
fingerprint dataset;

• Wi-Fi average fingerprint dataset and Bluetooth average fingerprint
dataset.

The datasets are device specific because, as we saw previously, the devices
behave differently.

It would have been logical to create the fingerprints also for the other
two devices, the Samsung Tab and the iPad. We chose to leave them out to
test whether it is possible to link a device regardless of the device
model. For this reason the last two datasets are composed of the average of
the previous fingerprint datasets.

Table 4.3 shows the Wi-Fi average fingerprint dataset. The vector of the
RSSI values obtained at a cell is called the location fingerprint of that
cell. All the vectors together form a Wi-Fi fingerprint dataset and a
Bluetooth fingerprint dataset. Each dataset has 7 columns and 500 rows, ten
rows for each cell. As we can see, this operation is very time consuming;
this is a great drawback of the fingerprint method.

The second part is called the on-line phase. During this phase the
previously created datasets are used to determine the cell in which the
device is located. For this purpose machine learning algorithms are used,
in particular K-Nearest Neighbors (k-NN). Since the devices are not always
in the middle of a cell, we use a variation of the algorithm: instead of
finding the cell, we find the coordinates of the device.

Table 4.3: Wi-Fi Average Fingerprint Dataset

cell  rasp1     rasp2     rasp3     rasp4     rasp5     rasp6
1.1   -53.0611  -64.9880  -81.5163  -87.0331  -69.6519  -71.1136
1.1   -52.2466  -64.8562  -82.3245  -86.3631  -69.9400  -71.4375
1.1   -52.5101  -65.3720  -82.6396  -87.1128  -69.3515  -72.0071
...   ...       ...       ...       ...       ...       ...
1.1   -52.6337  -65.0909  -82.0393  -86.9924  -70.1663  -72.5507
1.2   -59.6698  -60.6080  -77.7671  -87.2319  -68.1927  -67.8045
1.2   -59.5273  -59.8640  -77.4314  -87.0188  -67.3188  -67.2139
...   ...       ...       ...       ...       ...       ...
1.3   -71.6366  -45.9241  -86.5242  -80.6321  -70.1093  -79.6348
...   ...       ...       ...       ...       ...       ...
9.5   -74.0613  -83.958   -70.4495  -75.1549  -66.5281  -63.7448


To find the coordinates the following operations are performed:

• Step 1: for each target device, find the n most similar cells, called
candidates. The candidates are selected using the Euclidean distance, so
the n candidates are the n RSSI vectors closest to the vector of the target
device. This is a sort of k-NN, but the majority vote among the k selected
items is not performed. Each candidate has a coordinate representing the
center of its cell, C(x_i, y_i), and a distance d_i to the target device,
with i = 1, 2, ..., n.

• Step 2: a weight is computed for each candidate:

\[ w_i = \frac{1}{(d_i)^2} \qquad (4.5) \]

• Step 3: the weights are normalized so that their sum is 1, giving the new
weights \hat{w}_i:

\[ \hat{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \qquad (4.6) \]

• Step 4: the position (x, y) of the target device is calculated as:

\[ (x, y) = \sum_{i=1}^{n} \hat{w}_i \cdot (x_i, y_i) \qquad (4.7) \]

These four steps are performed to find the coordinates of a single device
through Bluetooth and through Wi-Fi.

The last step is to link the Wi-Fi coordinates with the Bluetooth
coordinates, checking which coordinates are the most similar using the
Euclidean distance.
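The four steps can be sketched as a single function, shown below. The data layout (a list of cell centers with their RSSI vectors), the name `locate` and the handling of a zero distance are assumptions of this sketch; the text does not say how an exact match is treated.

```python
import math

def rssi_distance(u, v):
    """Euclidean distance between two RSSI vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def locate(target, fingerprints, n=5):
    """Weighted position estimate from the n closest fingerprints.

    fingerprints: list of ((x, y), rssi_vector), one entry per calibration row.
    """
    # Step 1: the n candidates closest to the target vector.
    scored = sorted(
        ((rssi_distance(target, vec), center) for center, vec in fingerprints),
        key=lambda item: item[0],
    )[:n]
    # Assumption: an exact match returns the cell center (avoids dividing by 0).
    for d, center in scored:
        if d == 0.0:
            return center
    # Step 2: weights 1 / d^2 (formula 4.5).
    weights = [1.0 / d ** 2 for d, _ in scored]
    # Step 3: normalize the weights so that they sum to 1 (formula 4.6).
    total = sum(weights)
    # Step 4: weighted sum of the candidate coordinates (formula 4.7).
    x = sum(w / total * center[0] for w, (_, center) in zip(weights, scored))
    y = sum(w / total * center[1] for w, (_, center) in zip(weights, scored))
    return (x, y)

# Hypothetical two-cell fingerprint dataset with two anchors.
cells = [((0.0, 0.0), [-50.0, -70.0]), ((2.0, 0.0), [-70.0, -50.0])]
print(locate([-60.0, -60.0], cells, n=2))
```

Running the function once on the Wi-Fi vector and once on the Bluetooth vector of a device gives the two coordinates that are compared in the last step.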


4.4 Results

The problem is to link a Wi-Fi MAC address with a Bluetooth one, in
particular to find which Wi-Fi vector is most similar to a Bluetooth vector
and vice versa.

In the following sections the term accuracy indicates the degree of
correctness of an algorithm, i.e. the number of MAC addresses correctly
linked over the total number of devices.

To link two devices we use the Euclidean distance. For each Wi-Fi MAC
address we created an ordered list of Bluetooth MAC addresses, from the
most similar to the most different.

This method allowed us to use a top-k approach.

4.4.1 Top-k value

For each target MAC address, the ordered list of possible MAC addresses is
15 entries long (15 is the number of devices). The list is ordered by the
proximity between the vectors.

The top-k approach means that we select the first k MAC addresses of the
ordered list and assume that the correct MAC address is among those k
values. In this way we do not know exactly which is the correct MAC
address, but we obtain k possible candidates for the target MAC address.

This approach avoids excluding MAC addresses that, for any reason, are not
at the top of the list.

We identify three breakpoints (the k values):

• Top 1

• Top 3

• Top 5

A particular case of top-k is k = 1: we pick the most similar value and
decide that it is the correct MAC address. In top 3 and top 5 we choose the
first 3 or 5 MAC addresses as possible MAC addresses.

Figure 4.13 shows the percentage of correct MAC addresses found within the
k values. These percentages refer to the 4 Raspberry Pis scenario.
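The accuracy of the top-k approach can be computed as in the following sketch, which assumes that the ordered lists and the ground truth are available as dictionaries. The function name `top_k_accuracy` and the three-device example are hypothetical.

```python
def top_k_accuracy(ranking, truth, k):
    """Fraction of Wi-Fi MACs whose true Bluetooth MAC is in the first k entries."""
    hits = sum(1 for w_mac, candidates in ranking.items()
               if truth[w_mac] in candidates[:k])
    return hits / len(ranking)

# Hypothetical ordered lists (most similar first) and ground truth.
ranking = {
    "w1": ["b1", "b2", "b3"],
    "w2": ["b3", "b2", "b1"],
    "w3": ["b1", "b3", "b2"],
}
truth = {"w1": "b1", "w2": "b2", "w3": "b3"}

for k in (1, 2, 3):
    print(k, round(top_k_accuracy(ranking, truth, k), 2))
```

By construction the accuracy cannot decrease as k grows, which is why top 5 is always at least as high as top 1 in the figures.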


Figure 4.13: Accuracy percentages of the algorithms with the top-k approach
(top 1, top 3, top 5) with 4 Raspberry Pis. Bar plot of Accuracy (0% to
100%) for each Algorithm: norm, conv WiFi to BT, conv dist, trilateration,
fing 5, fing avg 5.

The algorithm that performs best in terms of top 5 is the conversion from
Wi-Fi and Bluetooth to distance. Its accuracy is 87%, which means that the
correct MAC address is among the nearest five devices 13 times out of 15.

The conversion of the RSSI to distance plausibly performs well because both
the Wi-Fi RSSI and the Bluetooth RSSI are strongly related to the distance.
Also, the conversion models are very accurate because we use a different
trend for each device.

The conversion from RSSI to distance is very precise with the top 5
approach, but its accuracy is only 40% in top 1.

A good algorithm for the top 1 method is the conversion from Bluetooth to
Wi-Fi, which correctly pairs 53% of the devices. This result shows the
strong relation between the Wi-Fi RSSI and the Bluetooth RSSI, as we saw in
figure 4.8.

A good trade-off between accuracy and cost is the normalization. It needs
neither a pre-computation of the regressions, like the conversion
algorithms, nor a minimization of the errors, like the trilateration. This
algorithm is very fast and cheap, and we obtain satisfactory results: 33%
in top 1, 67% in top 3 and 80% in top 5, only 7 percentage points less than
the best algorithm. Normalization can be used in an unknown scenario, when
the models of the devices are unknown and we cannot perform a preliminary
phase to study the RSSI regressions.

As regards the fingerprint, we tested two approaches:

• using the average fingerprint dataset for all the devices, called average
fingerprint;

• using the device-specific fingerprint dataset for the LG, the Samsung S
Advance and the Samsung S3, and the average fingerprint dataset for the
other two devices (the ones without a specific fingerprint dataset),
simply called fingerprint.

For both approaches we had to set the value of n, as explained in section
4.3.5. n = 1 means that the center of the cell is used and no adjustment is
made. Increasing n refines the position of the device, especially since not
all the devices are placed in the middle of a cell. We tested the algorithm
with n = 1, 2, 3, 4, 5, 7; the best results are obtained with n = 5. Figure
4.13 shows the accuracy of fingerprint and average fingerprint with n = 5.
They are quite similar, which means that a dataset of average fingerprints
allows us to use the fingerprint algorithm with different types of unknown
devices.

Analyzing the positions of the devices, we understood that it was difficult
to match the devices placed in the middle of the room. Table 4.4 shows how
often each device is correctly linked across the different algorithms.
Devices number 1, 2, 3, 7, 12, 14 and 15 are the ones with a high
percentage, which means that they are often linked properly. Devices number
4, 8, 10 and 13 are instead the worst in this respect. The values in table
4.4 may depend on two factors: the device position or the device model.
From the table it is evident that the Samsung S Advance (id: 2, 7, 12) is a
trustworthy device and the S3 is an untrustworthy one. The device model
does not affect the correct pairing too much, also because we use a
different model for each device.

Table 4.4: Percentage of exact pairing. In bold the top values are highlighted.

Id  Device  Top1  Top3  Top5  Position
1   LG      0.70  0.83  0.89  Top Left
2   S Adv   0.41  0.83  0.89  Top Left
3   S3      0.31  0.68  0.87  Top Right
4   S TAB   0.37  0.52  0.79  Center Left
5   iPad    0.08  0.27  0.50  Center
6   LG      0.27  0.47  0.62  Top-Center Left
7   S Adv   0.60  0.85  0.93  Top-Center Right
8   S3      0.02  0.12  0.27  Bottom-Center Left
9   S TAB   0.06  0.37  0.83  Center
10  iPad    0.06  0.35  0.50  Center Right
11  LG      0.20  0.64  0.77  Center Right
12  S Adv   0.43  0.77  0.89  Bottom Right
13  S3      0.12  0.37  0.52  Bottom
14  S TAB   0.47  0.83  0.95  Center
15  iPad    0.52  0.77  0.87  Bottom Left

The position, instead, highly affects the accuracy. The devices placed in
the corners of the room have a high degree of correct matching and the
devices in the center have worse results. This happens because the devices
in the center of the room are roughly equidistant from all the anchors, so
all the RSSI values in the vector are similar and the devices are confused
with a nearby device. To fix this problem we decided to add two more
Raspberry Pis.

4.4.2 Adding anchors

The previous results (section 4.4.1) refer to a four Raspberry Pis
scenario. In order to increase the accuracy of the algorithms, two more
anchors were added. In the 4 Raspberry Pis scenario the density of anchors
was one anchor every 10.7 m². Adding two more anchors we achieve a density
of one anchor every 7 m².
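As a check of these densities, assuming that the room measures 4.5 m × 9.5 m (the extent spanned by the anchor coordinates given in section 4.3.4):

\[ \frac{4.5 \times 9.5}{4} = \frac{42.75}{4} \approx 10.7\ \mathrm{m^2}, \qquad \frac{4.5 \times 9.5}{6} = \frac{42.75}{6} \approx 7.1\ \mathrm{m^2} \]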

The two supplementary Raspberry Pis were placed in the middle of the

room, as in figure 4.9.b. We have chosen this configuration to capture the

variations of the distance of the devices placed in the center of the room.


The results confirmed our assumption. Almost all the algorithms showed an
increase in accuracy; using the conversion to distance we obtained 100%
accuracy with the top 5 approach. The only exception was the conversion
from Bluetooth to Wi-Fi, for which the same results were obtained.

In any case good results were achieved: averaged over the algorithms, the
accuracy increased by 9% in top 1, by 10% in top 3 and by 7% in top 5.

The conversion from Bluetooth/Wi-Fi to distance was confirmed as the best
algorithm. The results are excellent: 67% accuracy using the top 1 approach
and 100% accuracy using top 5.

As we can see from figure 4.14, the differences between the algorithms are
the same in the 4 Raspberry Pis scenario and in the 6 Raspberry Pis
scenario.

Figure 4.14: Accuracy percentages of the algorithms with the top-k approach
(top 1, top 3, top 5) with 6 Raspberry Pis. Bar plot of Accuracy (0% to
100%) for each Algorithm: norm, conv WiFi to BT, conv dist, trilateration,
fing 5, fing avg 5.

Another interesting consideration is that the increase of accuracy obtained
by adding anchors seems to be a linear function of the number of anchors.
We tested this behavior using also only 3 Raspberry Pis, with the
normalization algorithm. The results are shown in figure 4.15.

Figure 4.15: Increase of accuracy of the normalization algorithm. Plot of
Accuracy (0 to 100) against Number of Raspberry Pis (2 to 7) for top 1,
top 3 and top 5.

From the figure we can easily see that adding more anchors increases the
accuracy. Extrapolating this trend, we suppose that with 8 anchors we could
reach 100% using the top 5 approach. Hence increasing the number of anchors
increases the accuracy of the system.

4.4.3 Receiver Operating Characteristic

The top-k approach does not consider the distance between the two vectors
(Bluetooth and Wi-Fi). Using top 1, it may happen that the first Bluetooth
vector in the list of a Wi-Fi vector is nevertheless far away in terms of
distance. With the top-k method we would have linked them. This is probably
a wrong result, because the Euclidean distance between a Wi-Fi vector and a
Bluetooth vector of the same device should tend towards zero.

For this reason we introduce the concept of threshold. The threshold is a
limit beyond which we consider a pair of MAC addresses false and therefore
do not match them. Within the threshold the two MAC addresses are
automatically considered to belong to the same device, and so we link them.


Using a threshold, four different cases are possible:

• True Positive: Wi-Fi and Bluetooth MAC addresses coming from
the same device, correctly identified as the same device.

• False Positive: Wi-Fi and Bluetooth MAC addresses coming from
different devices, incorrectly identified as the same device.

• True Negative: Wi-Fi and Bluetooth MAC addresses coming from
different devices, correctly identified as different devices.

• False Negative: Wi-Fi and Bluetooth MAC addresses coming from
the same device, incorrectly identified as different devices.

To represent these values the Receiver Operating Characteristic (ROC) is
used. The ROC curve is a graphical plot that illustrates the diagnostic
ability of a binary classifier as its discrimination threshold is varied.
Thanks to the ROC we can identify which threshold value gives a high rate
of True Positives and at the same time a low rate of False Positives.

The ROC curve is created by plotting the True Positive Rate (TPR) against
the False Positive Rate (FPR) at various threshold settings.

The TPR is also called sensitivity and measures the proportion of positives
that are correctly identified as such (e.g. the number of Wi-Fi and
Bluetooth MAC addresses from the same device correctly identified as a
single device).

The FPR is also called fall-out. It measures the proportion of negative
pairs of MAC addresses that are incorrectly identified as positive. It is
closely related to specificity and is equal to (1 − specificity).
Specificity is the True Negative Rate (TNR) and measures the proportion of
negatives that are correctly identified as such.

The algorithms have different threshold ranges; in order to plot them in
one graph we normalize the thresholds and then calculate the true positive
and false positive rates. We obtain the ROCs in figure 4.16 and in figure
4.17.
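The computation of the ROC points can be sketched as follows. The input format (a list of candidate pairs with their normalized distance and a flag telling whether they belong to the same device), the function names and the sample data are assumptions made for illustration; they are not the values of the experiment.

```python
import math

def roc_points(pairs, thresholds):
    """Return (threshold, FPR, TPR) for each threshold.

    pairs: list of (normalized_distance, same_device) tuples.
    A pair is classified as positive when its distance is within the threshold.
    """
    positives = sum(1 for _, same in pairs if same)
    negatives = len(pairs) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for d, same in pairs if d <= t and same)
        fp = sum(1 for d, same in pairs if d <= t and not same)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((t, fpr, tpr))
    return points

def best_threshold(points):
    """Point closest to the top left corner (FPR = 0, TPR = 1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))

# Hypothetical candidate pairs.
pairs = [(0.05, True), (0.10, True), (0.12, False), (0.30, True),
         (0.40, False), (0.60, True), (0.80, False)]
points = roc_points(pairs, [0.1, 0.2, 0.5, 1.0])
for t, fpr, tpr in points:
    print(t, round(fpr, 2), round(tpr, 2))
print("best:", best_threshold(points)[0])
```

Choosing the point closest to the top left corner is one common criterion; other criteria could give a different threshold.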


Figure 4.16: ROC of the home experiment. Plot of TPR against FPR (both from
0 to 1) for the algorithms conv, conv dist, finger, finger avg, norm and
trilateration.

The figures represent the FPR and the TPR as the threshold varies. The top
left corner is the best case in terms of the trade-off between sensitivity
and fall-out (or specificity): in this corner all the positives are
identified (TPR = 100%) and there are no false positives (FPR = 0%). The
point on the curve closest to the top left corner gives the best threshold
value for that specific algorithm.

Considering the conversion from Bluetooth/Wi-Fi to distance (figure
4.17.c), we obtain the optimal point when the normalized threshold is 0.13.
At this point the FPR is only 22% and the TPR is 50%. So, if the threshold
is set to 0.13 we obtain 6 positives (5 true positives and 1 false
positive) and 9 negatives (5 false negatives and 4 true negatives).


Figure 4.17: ROC of the different algorithms of the home experiment (TPR
against FPR, both from 0 to 1): a) Normalization; b) Conversion Bluetooth
to Wi-Fi; c) Conversion to distance; d) Trilateration; e) Fingerprint
average dataset; f) Fingerprint.


If the threshold is chosen properly we can match a device with good
confidence: if the MAC address is under the threshold, it is the right MAC
address three times out of four.

Using the right threshold the results become more precise, because we know
with more confidence which Bluetooth address matches the Wi-Fi address; on
the other hand the algorithm becomes less accurate, because we exclude some
true values that are over the threshold.

Defining an upper bound through a threshold can be useful to assert more
precisely whether two MAC addresses belong to the same device. This method
has a drawback: some correct values are excluded even when the best TPR/FPR
threshold is set. This problem is evident when analyzing the area under the
ROC curve (AUC). An area of 1 represents a perfect test; in our case we
hardly reach a value of 0.60. This value may depend on the low number of
values (only 15) used to create the ROC, or on the poor correlation between
the threshold and the correct pairing of MAC addresses. However, we think
that the threshold can be used in situations where we are interested in
precise pairings even if some correct values are excluded.


Chapter 5

Real Scenario Experiment

The previous chapter presented a test performed in an isolated environment
with known devices. That type of experiment was important to understand the
behavior of the devices and to test our algorithms.

To verify whether the home results are valid in a real scenario we decided
to replicate the previous experiments. We chose a university laboratory in
which we do not know how many devices are present, nor do we know a priori
the Wi-Fi MAC addresses and the Bluetooth MAC addresses of the devices.

We decided not to make preliminary tests. The relations between distance
and RSSI and between Wi-Fi RSSI and Bluetooth RSSI were calculated using a
spy device placed in a known point. We chose this approach because we want
to simulate a real scenario in which it is not possible to perform
preliminary tests.

Another difference from the home experiment is that we chose not to use the
fingerprint algorithm: in an unknown scenario the fingerprint is difficult
to replicate because it is costly and time consuming.

There is also a difference in terms of dataset size. During the home
experiment the Bluetooth and the Wi-Fi datasets have the same size. In
reality people use Wi-Fi much more than Bluetooth: Bluetooth is often kept
off or invisible, whereas Wi-Fi is almost always turned on. Hence, the
number of unique Wi-Fi MAC addresses will be greater than the number of
unique Bluetooth MAC addresses.


5.1 The environment

The environment of this experiment is the ANTLab, a university laboratory
of 10 meters x 8 meters, with an area of about 80 square meters. Six
Raspberry Pis are placed to cover the whole area of the laboratory (figure
5.1). There are desks, computers and chairs in the laboratory, and during
the experiment there were about 10 people. This configuration causes a
different path loss than in the previous experiment.

Figure 5.1: ANTLab floor plan. The six Raspberry Pis are placed on the perimeter

5.2 The devices

Before doing the experiment, we did not know how many devices would be in
the environment, nor their positions.

All the devices are unknown except two. We used the previous LG and
Samsung S smartphones and placed them in known positions. This was done to
perform a sort of real-time mapping of the environment. We chose these two
devices because they proved to be the most trustworthy in the home
experiment.


5.3 Execution

As mentioned above, we do not know the number of devices in the laboratory.

hcitool scan allowed us to discover the visible Bluetooth devices. We
found eleven different Bluetooth MAC addresses that were present during the
whole experiment.

Our script was run for ten minutes. We suppose that during this period the
devices are in a static position.

A high number of Wi-Fi probe requests were captured. The tool deleted all
the corrupted probes. We also decided to delete all the addresses with
fewer than 10 probe requests, since we suppose that these probes come from
people outside the laboratory or from passers-by.

We obtained 35 different Wi-Fi MAC addresses and averaged the values of
each address, creating a dataset of 35 rows and 7 columns (six RSSI columns
and a MAC address column).

As regards Bluetooth, we generated a dataset of eleven rows and 7 columns.
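The filtering and averaging described above can be sketched as follows. The input format (one row per capture, with the MAC address and the RSSI vector seen by the Raspberry Pis) and the name `build_dataset` are assumptions of this example; the threshold of 10 probe requests comes from the text.

```python
from collections import defaultdict

def build_dataset(captures, min_probes=10):
    """Average the RSSI vectors of each MAC address.

    captures: iterable of (mac, rssi_vector) rows, one per capture instant.
    MAC addresses with fewer than min_probes rows are discarded.
    """
    grouped = defaultdict(list)
    for mac, vector in captures:
        grouped[mac].append(vector)
    dataset = {}
    for mac, rows in grouped.items():
        if len(rows) < min_probes:
            continue  # probably a device outside the laboratory
        # Column-wise mean: one averaged RSSI per Raspberry Pi.
        dataset[mac] = [sum(column) / len(column) for column in zip(*rows)]
    return dataset

# Hypothetical captures: one address seen 10 times, one seen only 3 times.
captures = [("aa:aa", [-50.0, -60.0])] * 10 + [("bb:bb", [-70.0, -80.0])] * 3
dataset = build_dataset(captures)
print(dataset)
```

The same function can be applied to the Bluetooth captures, possibly with a different minimum number of rows.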

The next phase is the matching one. As before, the algorithms used were:

1. normalization;

2. RSSI conversion from Bluetooth to Wi-Fi;

3. RSSI conversion from Bluetooth/Wi-Fi to distance;

4. trilateration.

The algorithms were used in the same way as in the home experiment, as
explained in section 4.3. As mentioned above, the only unused algorithm was
the fingerprint, because it is too time consuming.

In order to verify the pairings produced by the algorithms, at the end of
the experiment the people in the laboratory were asked for their Wi-Fi and
Bluetooth MAC addresses. In this way we obtained the correct MAC address
pairs and it was possible to check the accuracy of the algorithms.


5.4 Results

The goal of the experiment is the same of the home experiment: to link

two MAC addresses, one coming from Wi-Fi and the other one coming from

Bluetooth. Linking the MAC addresses allows to identify uniquely a device.

The are several differences respect to the first experiment. The most ev-

ident difference is that we do not know a priori which is the correct MAC

address couple, indeed almost all devices are not directly in our control. It

allows us understand if our algorithm are valid in a not controlled environ-

ment.

There is a difference of path loss due to the layout of the laboratory. Also

the devices models are dissimilar. These two differences made the previous

regressions impossible to use to convert the RSSI in distance and the convert

the two type of RSSI each other. Indeed the curves presented in section 4.1

are device and environment specific.

The regression models were therefore computed on the fly, using our two known devices. We expect these models to be less accurate than the ones used during the home experiment.
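A regression computed on the fly from a known device can be sketched as a least-squares fit. The sketch below assumes a log-distance model of the form rssi = a + b * log10(d), which is a common choice but is our assumption here, and not necessarily the exact model of section 4.1. The function names and the calibration points are hypothetical.

```python
import math


def fit_log_distance(samples):
    """Least-squares fit of rssi = a + b * log10(distance).

    `samples` is a list of (distance_m, rssi_dbm) pairs measured on a device
    whose position is known. Returns the pair (a, b).
    """
    xs = [math.log10(d) for d, _ in samples]
    ys = [rssi for _, rssi in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rssi_to_distance(rssi, a, b):
    """Invert the fitted model to estimate a distance in metres."""
    return 10 ** ((rssi - a) / b)


# Hypothetical calibration points generated from rssi = -40 - 20 * log10(d).
calibration = [(1.0, -40.0), (2.0, -46.0206), (4.0, -52.0412), (8.0, -58.0618)]
a, b = fit_log_distance(calibration)
estimated = rssi_to_distance(-50.0, a, b)
```

With only two known devices the fit rests on few points, which is one reason to expect lower accuracy than with the curves measured beforehand.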

5.4.1 Top-k values

Figure 5.2 shows the bar plots of the accuracy percentage of each algorithm using the top-k approach.

As we can see, the best algorithm in terms of accuracy is the conversion from Bluetooth and Wi-Fi to distance. It reaches 93% of correct couplings in top 5 and 45% in top 1.
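The top-k accuracy can be computed as in the following sketch. It assumes that every address is described by a vector with one value per anchor, that candidates are ranked by Euclidean distance, and that the ranking starts from the Bluetooth side; the function names and the sample vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k_accuracy(bt_vectors, wifi_vectors, truth, k):
    """Fraction of Bluetooth addresses whose true Wi-Fi partner is among
    the k nearest Wi-Fi candidates.

    bt_vectors / wifi_vectors map a MAC address to its vector (one value per
    anchor); truth maps each Bluetooth MAC to its Wi-Fi MAC.
    """
    hits = 0
    for bt_mac, bt_vec in bt_vectors.items():
        ranked = sorted(wifi_vectors, key=lambda w: euclidean(bt_vec, wifi_vectors[w]))
        if truth[bt_mac] in ranked[:k]:
            hits += 1
    return hits / len(bt_vectors)


# Hypothetical two-anchor example with two Bluetooth and three Wi-Fi addresses.
bt = {"bt1": [-60, -70], "bt2": [-80, -50]}
wifi = {"w1": [-61, -69], "w2": [-79, -52], "w3": [-80, -51]}
truth_pairs = {"bt1": "w1", "bt2": "w2"}

top1 = top_k_accuracy(bt, wifi, truth_pairs, 1)
top2 = top_k_accuracy(bt, wifi, truth_pairs, 2)
```

In this toy example the true partner of `bt2` is only the second nearest candidate, so the accuracy rises when k goes from 1 to 2.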

The algorithms that use the regression have a high degree of accuracy: about 40%, 70% and 80% using top 1, top 3 and top 5 respectively. This means that the on-the-fly regressions are quite accurate and are able to roughly approximate the RSSI variation in the laboratory. It is interesting to note that the variation in exact pairings between the home experiment and the laboratory experiment is almost the same across the algorithms.

Because of the size of the environment and its configuration, we expect the accuracy to be lower. During the home experiment with six Raspberry Pis the density of anchors was one every 7 square meters. In the laboratory the density is one anchor every 13 square meters, almost half. This is slightly lower than the density of the home experiment with four Raspberry Pis.
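The density comparison can be checked with a few lines of arithmetic. The areas below are back-computed from the densities quoted in the text and are therefore approximate; the number of laboratory anchors (six) is inferred from the six RSSI columns of the dataset.

```python
def area_per_anchor(area_m2, anchors):
    """Square metres covered by each anchor."""
    return area_m2 / anchors


# Areas back-computed from the stated densities (approximate values).
home_area = 7 * 6    # one anchor every 7 m^2 with six anchors: about 42 m^2
lab_area = 13 * 6    # one anchor every 13 m^2 with six anchors: about 78 m^2

home_six = area_per_anchor(home_area, 6)    # 7.0 m^2 per anchor
home_four = area_per_anchor(home_area, 4)   # 10.5 m^2 per anchor
lab_six = area_per_anchor(lab_area, 6)      # 13.0 m^2 per anchor
```

With four anchors the home setup covers about 10.5 square meters per anchor, so the laboratory (13 square meters per anchor) is indeed slightly sparser than that case.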

Compared to the home experiment with six anchors, we obtain a total average decrease in accuracy of 10%. It may look like a high value, but if we take into account the worse conditions of the laboratory experiment, the result is more than satisfactory.

[Figure 5.2: Algorithms accuracy percentages of the top-k value approaches of the laboratory experiment. Bar plot; horizontal axis: accuracy (0% to 100%); vertical axis: algorithm (trilateration, conv dist, conv WiFi to BT, norm); one bar each for top 5, top 3 and top 1.]

Compared to the case with four Raspberry Pis, the decrease in accuracy is only 3%. This result points out that similar anchor densities generate similar accuracy results.

From these results, we can infer that it is possible to link the MAC addresses using the previous algorithms in an unknown scenario. Indeed, the results are coherent with those of the home experiment and provide good accuracy.

5.4.2 Receiver Operating Characteristic

As explained in section 4.4.3, the ROC curve represents the relation between the threshold values and the false positive rate (FPR) and the true positive rate (TPR). Thanks to the ROC we can identify the precision of an algorithm and its sensitivity.

The following figures (5.3 and 5.4) plot the ROC of the algorithms in the laboratory.

[Figure 5.3: ROC of the laboratory experiment. Horizontal axis: FPR (0.00 to 1.00); vertical axis: TPR (0.00 to 1.00); one curve per algorithm (conv, conv dist, norm, tri).]

Looking at the graph, the algorithm nearest to the top left corner (the best point in the ROC) is the normalization. However, looking more closely at the accuracy percentages of the normalization (5.4.a), we find that in the laboratory experiment they are very low, so we discard the normalization from the ROC analysis.

[Figure 5.4: ROC of the different algorithms of the laboratory experiment. Four panels with FPR on the horizontal axis and TPR on the vertical axis: a) Normalization; b) Conversion Bluetooth to Wi-Fi; c) Conversion to distance; d) Trilateration.]

It is interesting to analyze the logarithmic conversion from Wi-Fi/Bluetooth to distance (5.4.c). In the home experiment, the threshold that maximizes the precision of the algorithm is 0.13. We want to understand whether this threshold value is the same in the laboratory experiment.

If we use 0.13 as the threshold value, we obtain a TPR of 40% and an FPR of 0%. It is clearly not the optimal value. The best threshold for the laboratory is 0.29, with a TPR of 60% and an FPR of 33%. This means that the threshold varies considerably with the environment. A higher threshold indicates that the top 1 Euclidean distance is greater, hence more distant from zero. This means that the algorithm was a bit less precise than in the home experiment, which is what we expected, as we said before.
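The relation between a threshold and the two rates can be sketched as follows. We assume here that a top 1 pairing is accepted when its Euclidean distance does not exceed the threshold; this acceptance rule and the function name are our assumptions. The distances in the example are invented so that the two thresholds discussed above give the same rates as in the text; they are not measured data.

```python
def rates_at_threshold(top1, threshold):
    """Return (TPR, FPR) when a pairing is accepted if distance <= threshold.

    `top1` is a list of (distance, is_correct) pairs, one per top 1 candidate.
    """
    positives = [d for d, ok in top1 if ok]
    negatives = [d for d, ok in top1 if not ok]
    tpr = sum(d <= threshold for d in positives) / len(positives)
    fpr = sum(d <= threshold for d in negatives) / len(negatives)
    return tpr, fpr


# Invented distances: five correct and three wrong top 1 pairings.
top1_pairs = [
    (0.05, True), (0.12, True), (0.25, True), (0.40, True), (0.55, True),
    (0.20, False), (0.45, False), (0.60, False),
]

home_rates = rates_at_threshold(top1_pairs, 0.13)  # threshold of the home experiment
lab_rates = rates_at_threshold(top1_pairs, 0.29)   # best laboratory threshold
```

Raising the threshold accepts more correct pairings but also lets wrong ones through, which is the trade-off drawn by the ROC curve.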

The following table (table 5.1) shows the best threshold of each algorithm in the home experiment and in the laboratory experiment. The table confirms that every algorithm has a different threshold, and that the environment also changes the threshold.

Table 5.1: Threshold Table

Algorithm                  Home Experiment   Lab Experiment
Normalization              0.58              0.49
Conversion                 0.25              0.55
Conversion distance log    0.13              0.29
Trilateration              0.12              0.08
Average                    0.27              0.35

All the algorithms have an area under the ROC curve of almost 0.60. Given the resemblance between the home experiment and the laboratory experiment, the same considerations apply. The threshold is not a good parameter if we are interested in the total accuracy, but it can be useful if we are looking for high precision.

Chapter 6

Blended attack scenario

Blended attacks are those that combine two attack mediums, Wi-Fi and

Bluetooth, into a single more powerful attack. In most cases, these attacks

are designed with the intention of inflicting far quicker damage to a target

device than is possible using only a single attack medium.

To use blended attacks, a malicious attacker needs to know the two MAC addresses of the target device. Usually it is not easy to know for sure whether two MAC addresses (Bluetooth and Wi-Fi) come from the same device. Our algorithms can find, with good accuracy, the Wi-Fi MAC address corresponding to a Bluetooth address, or vice versa.

Another possibility is that the attacker wants to attack only one interface (for example Bluetooth) but does not know its address. Using our tool it is possible to link the known address (in this case Wi-Fi) to the unknown one and then perform the attack.

We will now see more specifically how an attack is possible and what results can be achieved.

6.1 Attack scenario

To perform an attack, we assume that the two attacked interfaces (Wi-Fi and Bluetooth) of the device are turned on. We also assume that we know the device owner and that we are close to him or her during the attack. Without these conditions an attack is not possible.

We start by analyzing the worst case: both the Wi-Fi and the Bluetooth MAC addresses are unknown. There are two options at this point: start by discovering the Wi-Fi address and then the Bluetooth one, or start by discovering the Bluetooth address and then the Wi-Fi one.

6.1.1 Discover the Wi-Fi and infer the Bluetooth MAC address

To obtain the Wi-Fi MAC address of a person, we have already explained in chapter 2 the method proposed by Cunche [8]. It consists in following the target for a short time at a reasonable distance with a monitoring tool (e.g. tshark or airodump-ng). The only Wi-Fi MAC address that is always present is the target MAC address.

The same procedure can be applied to the Bluetooth interface using the inquiry (hcitool scan), but only if the Bluetooth interface is visible. In our scenario we suppose that the Bluetooth interface is not visible, so it is not possible to use Cunche's method for Bluetooth.

To discover an invisible Bluetooth MAC address, RedFang is the tool needed. It is an application that finds non-discoverable Bluetooth devices using brute force. It is available in Kali Linux and in the most common Linux distros. The only drawback of RedFang is that it is time consuming, like all brute-force methods, but at present it is the only way to discover a non-discoverable device.

Previously we found the Wi-Fi MAC address. Using an OUI table it is easy to discover the vendor of the device. Starting from a known vendor, it is possible to reduce the range of Bluetooth MAC addresses that RedFang needs to scan, which makes the operation faster. If for some reason we want to know all the invisible Bluetooth devices, RedFang can scan all the possible MAC addresses (from 00:00:00:00:00:00 to FF:FF:FF:FF:FF:FF). When RedFang finishes, we obtain a list of available MAC addresses.
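The size of the reduction obtained from a known vendor can be worked out with simple arithmetic. The sketch below assumes that the Bluetooth address shares the OUI (the first three bytes) of the Wi-Fi address, which is not guaranteed since a vendor may own several OUIs. The function name and the sample address are hypothetical.

```python
def oui_search_space(mac):
    """Return (first, last, count) of the addresses sharing the OUI of `mac`.

    The OUI is the first three bytes of the address; the remaining three
    bytes (24 bits) are assigned by the vendor.
    """
    oui = mac.upper().split(":")[:3]
    first = ":".join(oui + ["00", "00", "00"])
    last = ":".join(oui + ["FF", "FF", "FF"])
    return first, last, 2 ** 24


first, last, count = oui_search_space("00:11:22:33:44:55")  # made-up address
full_space = 2 ** 48             # every possible 48-bit address
reduction = full_space // count  # how many times smaller the range becomes
```

A single OUI leaves about 16.8 million candidate addresses, against roughly 2.8 * 10^14 for the full 48-bit range.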

Using our tool we compare the list of Bluetooth addresses with the previously found Wi-Fi address. To obtain better results we might place a couple of known devices in the environment. As we saw in chapter 5, this makes algorithms such as the conversion from Bluetooth and Wi-Fi to distance, or the conversion between the two technologies, more precise.

To obtain faster results it is possible to use the normalization; if we want to be more accurate, it is possible to use the algorithms that perform a conversion. Also in this case it is hard to use the fingerprint, unless we are in a familiar environment.

At the end of the process we presumably know the correct couple of MAC addresses.

6.2 Attacks

There are many attacks that involve smartphones. The most common are the Battery Exhaustion Attack and the Denial of Service.

As explained below, these two attacks are extremely simple and effective. They only need common hardware, and the consumption of resources on the attacker machine is very low.

6.2.1 Denial of Service

In a denial-of-service (DoS) attack, an attacker attempts to prevent legitimate users from accessing information or services. Even if Bluetooth is theoretically quite robust, a DoS can prevent its use: it prevents the device from sending files, scanning for devices or using Bluetooth services.

There are several methods to implement a Denial of Service against the Bluetooth stack. After finding out the Bluetooth MAC address, an attacker can use:

• Ping of Death Flood: as explained in the previous chapters, l2ping allows a user to ping a Bluetooth MAC address to determine whether the host is reachable. Using l2ping at a high rate consumes both outgoing and incoming bandwidth. If the target Bluetooth is slow enough, it is possible to consume enough of its resources to cause a significant slowdown or an interruption of availability.

• BlueSmack Flood: this Bluetooth flooding attack is essentially a Ping of Death attack, but it is deployed with a much larger data payload, 600 bytes. Using the 600 byte payload size sometimes causes the Bluetooth stack of some devices to malfunction.

• BlueSpam Flood: BlueSpam is an attack that identifies Bluetooth-enabled devices in discoverable mode and spams selected targets with repeated vCard messages. This attack is most often used as an annoyance, but it can be classified as a DoS flood if the rate at which the vCard messages are sent is extremely high.

• Blueper Flood: this attack resembles BlueSpam in nature, but repeatedly floods a device with file transfers instead of vCard messages.

The Ping of Death Flood is very easy to perform: it only needs a script that floods the target device with pings. To perform this attack we created the following script, which takes the Bluetooth MAC address as input and pings it in flood mode. The -s option is the size of the echo packet; it is set to 300 bytes in order to speed up the attack. One pinging thread alone is not enough, so we used twenty ping threads in our test.

    #!/bin/bash

    mac_address=$1

    echo "ping $mac_address"

    while :
    do
        nohup sudo l2ping -f $mac_address -s 300
    done

The attack was carried out on a Huawei Honor 4c smartphone with Android 6.0 and security patch level dated 1st April 2016. During the ping of death attack the Huawei device does not notice anything and continues to behave as usual. The problem arises when another device tries to send a file to the Huawei smartphone. The file is not seen on the attacked device and the sender receives the message "file not sent", so it is impossible to transfer files between the two devices. The attack is successful because the smartphone is busy responding to all the echo requests and fails to receive the file.

Figure 6.1.a shows the screenshot of the sender device (a Samsung smartphone) after the sending timeout. The test was also done using the Huawei as the sender and the Samsung as the attacked device. The result was the same (figure 6.1.b).

6.2.2 Battery Exhaustion Attack

During a battery exhaustion attack the goal is to drain the battery of the target device. To cause more damage, the attack can be blended on Wi-Fi and Bluetooth. The battery depletion can be accelerated by almost 20% [18].

BlueSYN Flood is an attack that consists in simultaneously launching a BlueSmack l2ping flood and an hping3 SYN flood.

[Figure 6.1: DoS attacks on the target devices. Two screenshots: a) Huawei DoS; b) Samsung S3 DoS.]

The commands used to implement the attack against the target device are:

• hping3 --syn --faster <IP Address>: it sends SYN requests on the Wi-Fi channel;

• l2ping -s 600 -f <Bluetooth MAC Address>: it pings the Bluetooth stack with packets of 600 bytes.

PingBlender Flood is very similar to BlueSYN but uses a combination of

ping floods from both Wi-Fi and Bluetooth mediums. The commands are:

• hping3 --faster <IP Address>: it pings the Wi-Fi stack using

flood;

• l2ping -f <Bluetooth MAC Address>: it pings the Bluetooth stack

using flood.

Chapter 7

Conclusions

This thesis has focused on the analysis of Bluetooth signals and Wi-Fi probe requests, and in particular on finding a relation between the two different RSSIs in order to link the Bluetooth and the Wi-Fi MAC addresses. We proposed five algorithms that permit the pairing of MAC addresses.

In the first phase we developed a sensor network composed of Raspberry Pis capable of capturing all the Wi-Fi and Bluetooth signals. During this phase we also studied the behavior of the probe requests and of the Bluetooth connection parameters (RSSI, TPL, LQ, echo RTT, RX power level). This study was fundamental to decide which information could be useful in our case and how to use it.

In the second phase the sensor system was used to capture the MAC addresses of different devices. These experiments were executed in two environments with different topological characteristics, different numbers of devices and different assumptions. We explored how the performance (the share of devices coupled properly) is influenced by the number of anchors (Raspberry Pis), the anchor density and the environment characteristics.

The obtained results were consistent with what we expected. Our algorithms show that it is possible to link the Wi-Fi and the Bluetooth MAC addresses with a good degree of accuracy. The results are valid both in a controlled scenario (the home experiment) and in a real scenario (the laboratory experiment), and the accuracy percentages are coherent in both cases. Moreover, we noticed that better results are achieved when we increase the number of Raspberry Pis and when they cover the whole area of the environment. The results are presented using the top-k value approach: for each MAC address we select the first k candidates that can compose the couple.

As regards the algorithms, we have discovered that the best one is the conversion from RSSI to distance, which allows us to correctly pair up to 100% of the MAC addresses using the top 5 approach. This algorithm, the conversion between Wi-Fi RSSI and Bluetooth RSSI, and the trilateration need a pre-computation of the relation between the RSSI and the distance. We have noticed that this problem can be overcome using a spy device and computing the relation between distance and RSSI on the fly. For trilateration we cannot use this method, because it requires a more complex calibration phase. In any case, its results are compatible with the results of the other algorithms.

In the last part of the thesis we put the obtained results into practice to analyze and simulate an attack on a smartphone. We have discovered that it is easy to perform a Denial of Service (DoS) attack on the Bluetooth interface and that a blended attack on both the Wi-Fi and Bluetooth interfaces can drain the battery of the device.

Even if the top 5 accuracy of the algorithms is already satisfactory, the next step is to increase their accuracy also in top 3 and especially in top 1. This can be done in several ways. Increasing the number of anchors is the simplest solution. Another option is to increase the precision of the captured RSSI by filtering the data, for example with the Kalman filtering proposed by [6]. To increase the Bluetooth accuracy for visible devices, a mix of the RSSI and the RX power level can be used. Another option is to mix the algorithms depending on the environment characteristics, or to analyze the behavior of different vendors. As we saw during our research, the implementation and design choices may differ for each manufacturer, hence analyzing different vendors can help to build a better pairing system. Future studies may also extend the number of devices used, simulating a scenario with a high density of devices. Extending the number of devices can also make it possible to use machine learning algorithms, for example an artificial neural network that finds patterns in the data to create a more precise pairing.
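As an illustration of the RSSI filtering mentioned above, the following sketch applies a scalar Kalman filter to a sequence of readings. It is a generic textbook filter with a constant-state model, not the algorithm proposed in [6]; the function name, the two variances and the sample readings are assumptions chosen for the example.

```python
def kalman_smooth(rssi_samples, process_var=0.05, measurement_var=4.0):
    """Smooth a sequence of RSSI readings with a scalar Kalman filter.

    The state is the (slowly varying) true RSSI. The two variances are
    illustrative values, not taken from the thesis or from [6].
    """
    estimate = rssi_samples[0]   # start from the first reading
    error = 1.0                  # initial estimate uncertainty
    smoothed = [estimate]
    for z in rssi_samples[1:]:
        error += process_var                      # predict: uncertainty grows
        gain = error / (error + measurement_var)  # Kalman gain
        estimate += gain * (z - estimate)         # update with the new reading
        error *= 1 - gain                         # uncertainty after the update
        smoothed.append(estimate)
    return smoothed


raw = [-60, -72, -58, -61, -75, -59, -60]  # hypothetical noisy readings
smooth = kalman_smooth(raw)
```

The filtered sequence varies much less than the raw one, so the average RSSI per anchor is less affected by isolated outliers.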

There are other topics that did not fit in the scope of this research but require further investigation. A mixed indoor location system or a mixed crowd density system using both Bluetooth and Wi-Fi can be developed. They can exploit the MAC address pairing to increase the precision of the system. Another future work is the de-randomization of the Wi-Fi MAC address using the Bluetooth MAC address. Using our system it is possible to cross the real Wi-Fi data, the random Wi-Fi data and the Bluetooth data (that never changes) to discover which are the fake Wi-Fi MAC addresses. In a second step it is possible to pair the real Wi-Fi address with the Bluetooth address, to pair the fake Wi-Fi address with the Bluetooth address, and in the end to understand which fake address corresponds to the real Wi-Fi MAC address. The de-randomization can be blended with device tracking to improve the stalker attack proposed by Cunche [8]. In fact, switching between the Wi-Fi and the Bluetooth MAC addresses makes it possible to track a device regardless of the availability of its network interfaces.

This thesis also points out how easy a DoS attack on the Bluetooth interface is. This can be an incentive to study more thoroughly the behavior of the Bluetooth stack when it receives layer 2 echo request packets.

Acronyms List

AP Access Point

LOS Line Of Sight

RSSI Received Signal Strength Indicator

RTT Round-Trip Time

SN Sequence number

Wi-Fi Wireless Fidelity

OUI Organizationally Unique Identifier

NIC Network Interface Controller

SSID Service Set IDentifier

BSSID Basic Service Set IDentifier

GPS Global Positioning System

MAC Media Access Control

ROC Receiver Operating Characteristic

RX Receiver

TPL Transmit Power Level

LQ Link Quality

DoS Denial of Service

GRPR Golden Receiver Power Range

BLE Bluetooth Low Energy

IEEE Institute of Electrical and Electronics Engineers

FCS Frame Check Sequence

WPAN Wireless Personal Area Network

SCO Synchronous Connection Oriented

ACL Asynchronous ConnectionLess

L2CAP Logical Link Control and Adaptation Protocol

RFCOMM Radio Frequency Communications

HCI Host Controller Interface

BER Bit Error Ratio

NTP Network Time Protocol

Bibliography

[1] Naeim Abedi, Ashish Bhaskar, and Edward Chung. Bluetooth and wi-fi mac address based crowd data collection and monitoring: Benefits, challenges and enhancement. 2013.

[2] Marco V. Barbera, Alessandro Epasto, Alessandro Mei, Vasile C. Perta, and Julinda Stefa. Signals from the crowd: Uncovering social relationships through smartphone probes. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC ’13, pages 265–276, New York, NY, USA, 2013. ACM.

[3] Bluetooth SIG Proprietary. Bluetooth Core Specification, 12 2016.

[4] DM Bullock, R Haseman, JS Wasson, and R Spitler. Anonymous bluetooth probes for airport security line service time measurement: the indianapolis pilot deployment. In 89th Annual Meeting in Transportation Research Board, 2010.

[5] Luca Carettoni, Claudio Merloni, and Stefano Zanero. Studying bluetooth malware propagation: The bluebag project. IEEE Security & Privacy, 5(2), 2007.

[6] Song Chai, Renbo An, and Zhengzhong Du. An indoor positioning algorithm using bluetooth low energy rssi. 2016.

[7] Maxim Chernyshev, Craig Valli, and Michael Johnstone. Revisiting urban war nibbling: Mobile passive discovery of classic bluetooth devices using ubertooth one. IEEE Transactions on Information Forensics and Security, 12(7):1625–1636, jul 2017.

[8] Mathieu Cunche. I know your MAC Address: Targeted tracking of individual using Wi-Fi. In International Symposium on Research in Grey-Hat Hacking - GreHack, Grenoble, France, November 2013.

[9] Christos Douligeris and Dimitrios N. Serpanos, editors. Network Security. John Wiley & Sons, Inc., jun 2007.

[10] Julien Freudiger. How talkative is your mobile device?: An experimental study of wi-fi probe requests. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks, WiSec ’15, pages 8:1–8:6, New York, NY, USA, 2015. ACM.

[11] Simon Hay and Robert Harle. Bluetooth tracking without discoverability. In Lecture Notes in Computer Science, pages 120–137. Springer Berlin Heidelberg, 2009.

[12] AKM Mahtab Hossain and Wee-Seng Soh. A comprehensive study of bluetooth signal parameters for localization. In Personal, Indoor and Mobile Radio Communications, 2007. PIMRC 2007. IEEE 18th International Symposium on, pages 1–5. IEEE, 2007.

[13] Zhu Jindan, Zeng Kai, Kyu-Han Kim, and Prasant Mohapatra. Improving crowd-sourced wi-fi localization systems using bluetooth beacons. 9th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2012.

[14] Joonyoung Jung, Dongoh Kang, and Changseok Bae. Distance estimation of smart device using bluetooth. In ICSNC 2013 : The Eighth International Conference on Systems and Networks Communications. The Government of South Korea, 2013. Used by permission to IARIA, 2013.

[15] Jeremy Martin, Travis Mayberry, Collin Donahue, Lucas Foppe, Lamont Brown, Chadwick Riggins, Erik C. Rye, and Dane Brown. A study of MAC address randomization in mobile devices and when it fails. CoRR, abs/1703.02874, 2017.

[16] Krasnyansky Maxim and Holtmann Marcel. l2ping Linux Man Page.

[17] Zhenyu Mei, Dianhai Wang, Jun Chen, and Wei Wang. Investigation of bicycle travel time estimation using bluetooth sensors for low sampling rates. PROMET - Traffic&Transportation, 26(5), oct 2014.

[18] Benjamin R. Moyers, John P. Dunning, Randolph C. Marchany, and Joseph G. Tront. Effects of wi-fi and bluetooth battery exhaustion attacks on mobile devices. In 2010 43rd Hawaii International Conference on System Sciences. IEEE, 2010.

[19] Farid Movahedi Naini, Olivier Dousse, Patrick Thiran, and Martin Vetterli. Population size estimation using a few individuals as agents. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 2499–2503. IEEE, 2011.

[20] Pierre Rouveyrol, Patrice Raveneau, and Mathieu Cunche. Large Scale Wi-Fi tracking using a Botnet of Wireless Routers. In SAT 2015 - Workshop on Surveillance & Technology, Philadelphia, United States, June 2015.

[21] Antonio J Ruiz-Ruiz, Henrik Blunck, Thor S Prentow, Allan Stisen, and Mikkel B Kjaergaard. Analysis methods for extracting knowledge from large-scale wifi monitoring to inform building facility planning. In Pervasive Computing and Communications (PerCom), 2014 IEEE International Conference on, pages 130–138. IEEE, 2014.

[22] Lorenz Schauer, Martin Werner, and Philipp Marcus. Estimating crowd densities and pedestrian flows using wi-fi and bluetooth. In Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pages 171–177. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2014.

[23] Fazli Subhan, Halabi Hasbullah, Azat Rozyyev, and Sheikh Tahir Bakhsh. Indoor positioning in bluetooth networks using fingerprinting and lateration approach. In Information Science and Applications (ICISA), 2011 International Conference on, pages 1–9. IEEE, 2011.

[24] Mathias Versichele, Tijs Neutens, Stephanie Goudeseune, Frederik van Bossche, and Nico Van de Weghe. Mobile mapping of sporting event spectators using bluetooth sensors: Tour of flanders 2011. Sensors, 12(12):14196–14213, Oct 2012.

[25] Donald Welch and Scott Lathrop. Wireless security threat taxonomy. In Information Assurance Workshop, 2003. IEEE Systems, Man and Cybernetics Society, pages 76–83. IEEE, 2003.

[26] Jens Weppner, Paul Lukowicz, Ulf Blanke, and Gerhard Troster. Participatory bluetooth scans serving as urban crowd probes. IEEE Sensors Journal, 14(12):4196–4206, 2014.
