A Cryptographic Scheme for Cell Granularity Authenticated Query Processing in Cloud Databases

by

Bruno Ramos Cruz

MASTER OF SCIENCE IN COMPUTER SCIENCE

Instituto Nacional de Astrofísica, Óptica y Electrónica

Tonantzintla, Puebla, Mexico
February 27, 2020

Advisors:

Lil María Xibai Rodríguez Henríquez, PhD. (CONACyT-INAOE)

Saúl Eduardo Pomares Hernández, PhD. (INAOE)

© INAOE 2020. All rights reserved.

The author hereby grants to INAOE permission to reproduce and to distribute copies of this research proposal in whole or in part.


Abstract

Cloud computing is a model that offers outsourced services. Database as a Service (DaaS) is a service in which the data owner delegates his data to a server to obtain savings in system administration, human resources, software licenses, and so on. However, this new paradigm also entails security risks that should be addressed.

There are mainly three concerns that the data owner should consider: confidentiality, integrity, and availability. These concerns have been studied independently by the cryptographic community due to the complexity involved.

Specifically, integrity in structured data considers the scenario where the server to which the database is delegated is untrusted. Thus, after executing a query, the data owner can get as a response the set of records that satisfy the query (a correct answer) or some other result (an incorrect answer). An incorrect response may contain tuples modified in an unauthorized manner or include tuples that were not delegated by the data owner (compromising integrity), omit tuples that meet the conditions of the query (violating completeness), or include tuples that existed in some version of the database but do not correspond to the current one (compromising freshness). Therefore, a process that can protect the data owner from malicious server behavior must be implemented; this process is known in the literature as authenticated query processing.

Authenticated query processing also involves other aspects, including granularity and query type: that is, at what level of detail (database, table, row, or cell) integrity is guaranteed and what types of queries are supported (selection, range, aggregation, joins). These aspects impose restrictions that make it difficult to propose a scheme robust enough to support all possible transactions.

In this work, we propose a cryptographic scheme for authenticated query processing with granularity at the cell level that allows us to verify integrity and completeness in selection and range queries in cloud databases. The proposed scheme employs a probabilistic data structure known as the Cuckoo filter to provide integrity. To ensure completeness, it uses another data structure called bitmaps and a cryptographic primitive known as message authentication codes (MACs).


Resumen

Cloud computing is a model that offers outsourced services; one of these is Database as a Service (DaaS), where the client delegates his data to a server to obtain savings in system administration, human resources, software licenses, and so on. However, this new paradigm also entails security risks that must be addressed.

There are mainly three concerns that the data owner should consider: confidentiality, integrity, and availability. These concerns have been studied independently by the cryptographic community due to the complexity they involve.

Specifically, integrity in structured data considers the scenario where the server to which the database is delegated is not trustworthy. Thus, after executing a query, the client can get as a response the set of records that satisfy the query (a correct answer) or some other result (an incorrect answer). An incorrect response may contain tuples modified in an unauthorized manner or include tuples that were not delegated by the client (compromising integrity), omit tuples that meet the conditions of the query (violating completeness), or include tuples that existed in some version of the database but do not correspond to the current one (compromising freshness). Therefore, there must be a process that can protect the client from malicious server behavior; this process is known in the literature as authenticated query processing.

Authenticated query processing considers other aspects, notably the granularity, that is, at what level of detail (database, table, row, or cell) integrity is guaranteed, and the types of queries that will be supported (selection, range, aggregation, joins, etc.). These aspects impose restrictions that make it difficult to propose a scheme robust enough to support all possible transactions.

In this work, we propose a cryptographic scheme for authenticated query processing with cell-level granularity that allows verifying integrity and completeness in selection and range queries in cloud databases. The proposed scheme employs a probabilistic data structure known as the Cuckoo filter to provide integrity. To guarantee completeness, it uses another data structure called bitmaps and a cryptographic primitive known as message authentication codes (MACs).


To my family and friends


Acknowledgments

Thanks to Dra. Lil and Dr. Saúl for giving me the opportunity to work on this wonderful project, for all the time they have dedicated to me, and also for each of the lessons they have shared with me.

I thank my thesis reviewers, Dra. Claudia Feregrino Uribe, Dr. Alfonso Martínez Cruz, and Dr. Ignacio Algredo Badillo, for the time they devoted to reviewing this document and for their contributions to it.

I am very grateful to my family and friends, who are always supporting and motivating me to achieve each of my goals. In particular, I thank Jessi, who guides and helps me day by day.

Thanks to the Instituto Nacional de Astrofísica, Óptica y Electrónica for allowing this project to come to life, and for these two years of adventures in which I acquired new knowledge.

Finally, I thank CONACyT for the financial support granted to me through scholarship No. 489054, which was of great importance to finish this project.


Contents

1 Introduction  1
  1.1 Problem Statement  5
  1.2 General Objective  6
    1.2.1 Particular Objectives  6
  1.3 Hypothesis  7
  1.4 Contributions  7
  1.5 About Document Content  7

2 Preliminaries  8
  2.1 Data Structure  8
    2.1.1 Relational Databases  8
    2.1.2 Bitmaps  11
    2.1.3 Authenticated Skip List  13
  2.2 Filters  13
    2.2.1 Bloom Filter  13
    2.2.2 Cuckoo Filter  15
  2.3 Cryptographic Primitives  16
    2.3.1 Unkeyed Primitives  17
    2.3.2 Symmetric-key Primitives  18
    2.3.3 Public-key Primitives  19
  2.4 Discussion  19

3 State of the Art  20
  3.1 Generalities of the State of the Art  20
    3.1.1 Integrity  20
    3.1.2 Completeness  21
    3.1.3 Freshness  22
    3.1.4 Overview: State of the art  23
  3.2 Main Approaches  24
    3.2.1 Approaches Based on Signature chain  24
    3.2.2 Approaches Based on Authenticated Data Structures  26
  3.3 Discussion  34


4 A Cryptographic Scheme for Cell Granularity Authenticated Query Processing in Cloud Databases  36
  4.1 Use of the Cuckoo Filter to Provide Integrity in the Database  36
  4.2 Use of the Bitmap Index to Provide Completeness in the Database  39
  4.3 Scheme Design  41
    4.3.1 Verification Process  49
  4.4 Security Analysis  53
    4.4.1 Some Basic Attacks  55
    4.4.2 Security Model  56
    4.4.3 CLIC-ODB Security  58
  4.5 Cost Analysis  59
    4.5.1 Storage  59
    4.5.2 Bandwidth  61
    4.5.3 Processing  62
  4.6 Discussion  63

5 Implementation  64
  5.1 System Information  64
  5.2 Experimental Results  65
    5.2.1 Experiment 1: Data Settings  65
    5.2.2 Experiment 2: Queries Process  67
    5.2.3 Experiment 3: Database attacks  69
    5.2.4 Experiment 4: Cloud  70
    5.2.5 Discussion  71

6 Conclusion and Future work  72
  6.1 Conclusions  72
  6.2 Future work  73

A PostgreSQL  74
  A.1 Installing PostgreSQL  74
  A.2 Upload databases in PostgreSQL  74

B Amazon Web Services  76
  B.1 Create Databases Instance  76
    B.1.1 Create database  76
    B.1.2 Select engine  77
    B.1.3 Choose use case  77
    B.1.4 Specify DB details  78
    B.1.5 Settings  79
    B.1.6 Configure advanced settings  79
    B.1.7 Click  82

Bibliography  86


List of Figures

1.1 Authenticated Queries Processing  6
2.1 Authenticated Skip List  13
2.2 Bloom Filter  14
2.3 Cuckoo Filter with m buckets and b = 4 entries  15
3.1 State of the Art Taxonomy  23
3.2 Signature chain  25
3.3 Merkle hash tree  27
3.4 A general HADS  30
3.5 Two levels' HADS  32
4.1 Outsourced data  42
4.2 Query Process  42
4.3 Security Model  57


List of Tables

1.1 Employees  2
1.2 Employees  3
2.1 Rmn  9
3.1 Response  22
3.2 Related work  24
3.3 Employees with an extra attribute named Nonce  29
4.1 Employees 1  36
4.2 Employees 2  39
4.3 Rmn  43
4.4 Relation Rαmn  43
4.5 Relation Rβmn  44
4.6 Relation Rγmn  47
4.7 Employees  55
5.1 NoMAC vs HMAC Times  65
5.2 False Positives  67
5.3 Queries Description  68
5.4 Authenticated Queries Process Times  69
5.5 Modify Cells Rα  70
5.6 Modify Cells Rβ  70
5.7 Integrity Cloud Time  71


Chapter 1

Introduction

Nowadays, many companies use databases to store information in order to assist in decision making and economic growth (Mather et al., 2009). According to (Silberschatz et al., 1997), a database is a collection of interrelated data that is organized through a management system to provide a way to store and retrieve database information that is both convenient and efficient. At first, companies stored their databases locally, which implied costs such as hardware and software resources. In addition, companies needed specialized personnel to manage the data. Later, a new computing model named cloud computing emerged, offering different services, including storage and database management. This service is known in the literature as Database as a Service (DaaS). In DaaS, the data owner (client) delegates his data to an outsourced service provider (server) to store and manage his information. From the business point of view, DaaS allows transferring some of the costs to the service provider, so companies can obtain economic savings. However, from the security point of view, DaaS is a new paradigm that involves security risks that must be addressed, such as availability interruptions and confidentiality or integrity violations.

These violations of security services can happen due to a non-malicious event or an attack, as happened in the following cases: in August 2015, Google lost user data due to a thunderstorm (News, 2015); in February 2019, Microsoft Azure lost database records as part of a mass DNS outage (Bradbury, 2019); and in 2011, a crash of Amazon's huge EC2 cloud services permanently destroyed some data (Blodget, 2011). The cryptographic community is studying how to mitigate these risks, since the cloud computing model imposes a more challenging scenario because the server is not trusted. Therefore, the server is considered the main adversary.

In DaaS, availability is a security service that guarantees access to an outsourced database at any time. As a consequence, it is important to provide authentication of the stored database. The community has proposed a set of data possession schemes. Data possession is the problem in which the client verifies that a server has retained file data without retrieving the data from the server and without having the server access the entire file (Ateniese et al., 2007; Wang et al., 2019).

In the case of confidentiality, DaaS is particularly challenging, since if the database is encrypted by means of stream or block ciphers, then the server will be unable to answer queries. This is because the goal of an encryption process is to reveal nothing about the plaintext or the key that generates the ciphertext. Thus, one of the main challenges for the cryptographic community is to design an encryption scheme where the ciphertext retains some properties of the plaintext and allows the server to answer queries, such as order-preserving encryption (Boldyreva et al., 2009) and homomorphic encryption schemes (Fontaine and Galand, 2007).

Integrity in DaaS is more than detecting whether the data has been modified in an unauthorized manner. When a query is posed by the client, the server must be capable of proving that the response is correct (without unauthorized modifications), complete (no added or missing tuples), and finally fresh, i.e., obtained from the most up-to-date database. In the literature, the problem of providing the server with the ability to prove integrity, completeness, and freshness in the process of querying outsourced databases is known as authenticated query processing (AQP). In this research we focus on this problem, so it is described in detail below.

When the client outsources his data, he also wants to consult the data later. After posing a query, the client expects as a response the set of tuples that satisfy the query. However, the server can answer with the correct response or with some other result (an incorrect answer). For example, consider the relation shown in Table 1.1.

ID    NAME    GENDER    LEVEL    AGE
GLO   Omar    M         L1       30
LRL   Laura   F         L3       25
FJO   Oscar   M         L2       27
VBR   Rosy    F         L1       26
MRB   Betty   F         L1       25

Table 1.1: Employees

Consider that the client poses the following query:

Q1: SELECT * FROM Employees WHERE Level=’L1’;

The correct answer for the query Q1 is the set:

{(GLO, Omar, M, L1, 30), (VBR, Rosy, F, L1, 26), (MRB, Betty, F, L1, 25)}

But if the server gives an incorrect response, it may have performed one or more of the following malicious actions:

1. Alter the data,

2. Add tuples, or

3. Omit tuples

An incorrect response could contain tuples modified in an unauthorized manner. Following the example, an incorrect answer is shown below:

{(GLO, Omar, M, L1, 30), (VBR, Rosy, M, L1, 26), (MRB, Betty, F, L1, 27)}


In the above answer, the gender of Rosy and the age of Betty were altered.

Moreover, it may be that the server includes tuples that were not delegated by the client. For example:

{(GLO, Omar, M, L1, 30), (VBR, Rosy, F, L1, 26), (MRB, Betty, F, L1, 25)} ∪ {(ASP, Juan, M, L1, 28)}

Notice that (ASP, Juan, M, L1, 28) is an extra tuple which is not in the Employees relation of Table 1.1.

Also, the server can omit tuples that meet the conditions of the query. Considering the previous query Q1, an incomplete answer can be expressed as:

{(GLO, Omar, M, L1, 30), (MRB, Betty, F, L1, 25)}

Notice that the complete answer has three tuples; however, the server responded with only two. The completeness requirement is fundamental to verify the correctness of the data, because the client only receives the information that the server sent and does not know whether the data satisfying the query is complete or not.

In addition, the server can include tuples that are in some version of the database but do not correspond to the current one (compromising freshness). Suppose the client performs the following update on the Employees relation in Table 1.1:

U1: UPDATE Employees SET Level='L3' WHERE ID='FJO';

The client obtains the relation shown in Table 1.2, where the level of Oscar was updated.

ID    NAME    GENDER    LEVEL    AGE
GLO   Omar    M         L1       30
LRL   Laura   F         L3       25
FJO   Oscar   M         L3       27
VBR   Rosy    F         L1       26
MRB   Betty   F         L1       25

Table 1.2: Employees

So the client poses the next query:

Q2: SELECT * FROM Employees WHERE Level='L3';

The server should execute the query on the updated database (Table 1.2) to generate the updated answer, as shown below:

{(LRL, Laura, F, L3, 25), (FJO, Oscar, M, L3, 27)}

If the server performs the query on the relation shown in Table 1.1, then an outdated answer is generated:

{(LRL, Laura, F, L3, 25)}


Thus, freshness implies that the client can verify whether the response sent by the server was obtained using the most recent data and does not belong to old versions. Freshness is required in dynamic databases because the information constantly changes. In static databases, the data are stored for a relatively long period of time; therefore, updates are scheduled based on previous knowledge.

So far, the requirements of the AQP problem have been explained: integrity, completeness, and freshness. In the following lines we discuss the different characteristics that make this problem difficult: granularity and types of queries.

The integrity service is achieved through cryptographic primitives such as hash functions, MACs, or digital signatures. These primitives are used to design cryptographic schemes that provide integrity with granularity at the tuple level (Narashima et al., 2004; Rodríguez-Henríquez and Chakraborty, 2013; Etemad and Küpçü, 2018; Ausekar and Pasupuleti, 2018). Granularity allows defining integrity at different levels: database, table, tuple, or cell. For some time, granularity was defined at the tuple level because the cryptographic community considered that tuple granularity was the best option regarding query availability and computational cost (Hacigümüş et al., 2004; Mykletun et al., 2006). Using a MAC or a signature chain to provide integrity for each cell of the database is not convenient, since it would cause an overhead in processing and storage cost. For this reason, integrity in the AQP problem remains an area of study for the community. Choosing a suitable granularity is very important for the precision of the selection response. Let us see an example: consider the relation in Table 1.2 and assume that integrity is defined with granularity at the tuple level. If the client executes the following query:

Q3: SELECT NAME, AGE FROM Employees WHERE LEVEL='L3';

Then, the server responds with the following information:

{(Laura, 25), (Oscar, 27)}

To verify the integrity of the above response, the client needs the server to send the full tuples:

{(LRL, Laura, F, L3, 25), (FJO, Oscar, M, L3, 27)}

This happens because the granularity is defined at the tuple level, which is not very convenient, because the server sends additional information, such as ID, GENDER, and LEVEL, for each tuple that satisfies the query Q3, affecting bandwidth. As we can see, some types of queries, such as Q3, are not very efficient for databases where granularity is defined at the tuple level or coarser; however, for databases with granularity at the cell level, this type of query is very efficient.

In addition to granularity, another important characteristic to consider for the AQP problem is the type of queries. It is important to mention that the AQP problem is different for each type of query. In order to explain these differences between the types of queries, it is necessary to mention that, to provide the completeness requirement, control information is added to the database. This control information can be a data structure or a signature chain, as explained in more detail in Chapter 3.


In selection queries, the WHERE clause uses different logical operators (OR, AND, NOT) to retrieve information from the database. To guarantee the correctness of the data, the client must combine the control information respecting the logical operators in the WHERE clause, which is sometimes complicated due to the design of the control information.

Unlike selection queries, range queries employ inequality operators in the WHERE clause (<, ≤, ≥, >) to retrieve information from the database that satisfies a certain range. The problem in this type of query is to ensure that between any two returned elements there is no other element that satisfies the query condition.

In aggregation queries, the SELECT clause uses aggregation functions such as sum, avg (average), max (maximum), and min (minimum) to obtain information from the database. For example, if the client performs an aggregation query on Table 1.1:

Q4: SELECT avg(AGE) FROM Employees;

The server returns an answer with the average age: 26.6. With this information, the client should be able to guarantee the correctness of the response, which is a difficult problem because the information obtained from the query Q4 is very limited. That is, the client does not know whether the data that was used to obtain the information (in this example, the average) is all the data that satisfies the query.

According to the previous paragraphs, trying to unify different types of queries in a general scheme is a big challenge for the cryptographic community. As a consequence, in this research project we explore selection and range queries in outsourced static databases.

Outsourcing databases implies responsibility for both the service provider and the client. The provider must guarantee the privacy and integrity of the data; in turn, the client needs a process to verify the correctness and completeness of the data.

1.1 Problem Statement

We consider a scenario where the client delegates a relational database to a server that stores it, as shown in Figure 1.1. Two very important entities participate in this scenario: the client and the server. The client is a trustworthy entity while the server is not; that is, the former protects its data and will never send confidential information to third parties. However, the untrusted server can alter the client's data. The changes in the data may be due to multiple factors, such as software and hardware failures or attacks caused by both internal and external adversaries. From the client's point of view, any unauthorized modification represents a data integrity violation.

When the client queries his outsourced data, he expects a response consisting of the set of records satisfying the query's conditions. As the service provider is not trusted, the client should be capable of verifying the correctness of its responses.


[Figure 1.1 depicts the client (trusted) delegating its data to the untrusted server and later sending queries; the server stores the data and returns answers whose integrity, completeness, and freshness the client must verify.]

Figure 1.1: Authenticated Queries Processing

In other words, a malicious server could insert false records in the database, modify existing records (integrity), or simply omit or add some of them from the query response (completeness); in addition, the query should be executed on the most up-to-date data (freshness). Therefore, there should be a process that protects the client from such malicious behavior of the server; this process is authenticated query processing. As discussed in the previous section, in addition to the integrity, completeness, and freshness requirements of the AQP problem, there are characteristics involved in the design of a cryptographic scheme, such as granularity and types of queries. For these reasons, designing a cryptographic scheme for authenticated query processing with fine granularity that provides all requirements and allows all types of queries is complicated.

Due to the complexity of the AQP problem, in this work we design a cryptographic scheme for authenticated query processing with cell-level granularity to provide integrity and completeness in selection and range queries in cloud databases. Notice that the freshness property only makes sense in dynamic databases; in this research, we work exclusively with static databases.

1.2 General Objective

To design a cryptographic scheme for authenticated query processing with cell-level granularity to provide integrity and completeness in selection and range queries in cloud databases.

1.2.1 Particular Objectives

• To design a cryptographic scheme for authenticated query processing with cell-level granularity to provide integrity in cloud databases.

• To design a cryptographic scheme for authenticated query processing with cell-level granularity to provide completeness in cloud databases.

• To integrate the previous schemes to obtain a cryptographic scheme that provides integrity and completeness in cloud databases.


1.3 Hypothesis

It is possible to design a cryptographic scheme for authenticated query processing with cell-level granularity to provide integrity and completeness in selection and range queries in cloud databases.

1.4 Contributions

The main expected contributions of this research are:

1. A novel cryptographic scheme for authenticated query processing with cell-level granularity to provide integrity and completeness in selection and range queries in cloud databases.

2. A mathematical model that allows us to analyze and evaluate the proposed cryptographic scheme.

1.5 About Document Content

The remaining part of the document is organized as follows. In Chapter 2, some basic definitions from the areas of our interest, such as cryptography, databases, and filters, are given. In Chapter 3, the generalities of the state of the art related to this problem are described, together with a review of the research proposals and their products. The proposed cryptographic scheme is presented in Chapter 4, together with the security analysis and the cost analysis. In Chapter 5, the system information, the dataset characteristics, the experiment conditions, and the results of our experiments are shown. Finally, the conclusions, the main contributions, and future work are presented in Chapter 6. For further details of the implementation, you may refer to the appendices.


Chapter 2

Preliminaries

In this chapter, some definitions and concepts of databases and cryptography are presented. In Section 2.1 we give a definition of data structure and show different kinds of structures such as relational databases, bitmaps, and skip lists. In Section 2.2 a probabilistic data structure named a filter is described. Finally, in Section 2.3 we discuss some cryptographic primitives: unkeyed primitives, symmetric-key primitives, and public-key primitives.

2.1 Data Structure

Data structures are collections of variables, possibly of several different data types, connected in such a way (Hopcroft and Ullman, 1983) that it is possible to perform operations for processing, retrieving, and storing data in an effective form (Abiteboul et al., 1995). Data structures are used to manage large amounts of data. There are different kinds of structures; their classification is made based on their characteristics and on the type of operations that can be performed on them.

In this section, we present a definition of relational databases and introduce our general notation. Subsequently, two other data structures, the bitmap and the skip list, are described.

2.1.1 Relational Databases

According to (Abiteboul et al., 1995), a data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. This model provides a way to describe the design of a database at the physical, logical, and view levels. There are a number of different data models; the relational model is a data model that is based on tables to represent data (Abiteboul et al., 1995).

Definition 2.1.1.1 Relational databases. A relational database is based on the relational model. It is a collection of tables used to represent both data and the relationships among those data, and it is designed to manage large bodies of information (Silberschatz et al., 1997).


      a1    a2    · · ·   aj    · · ·   an
t1    v11   v12   · · ·   v1j   · · ·   v1n
t2    v21   v22   · · ·   v2j   · · ·   v2n
...   ...   ...           ...           ...
ti    vi1   vi2   · · ·   vij   · · ·   vin
...   ...   ...           ...           ...
tm    vm1   vm2   · · ·   vmj   · · ·   vmn

Table 2.1: Rmn

Each table of a relational database has multiple columns, and each column has a unique name. The tables contain records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type. In a relational database, the data are represented as a two-dimensional table named a relation (Garcia-Molina, 2008). Table 2.1 illustrates a sample relation; the sets of elements that belong to the relation Rmn are defined as follows:

1. A = {a1, a2, . . . , an} is the set of attributes, where n is the number of attributes.

2. T = {t1, t2, . . . , tm} is the set of rows or tuples, where m is the number of tuples.

3. vij is the value of tuple ti and attribute aj in Rmn.

4. The domain of an attribute is represented by Dom(aj) = {v1j, v2j, . . . , vlj}, where l is the number of all possible values that aj can take.

5. Rmn : T × A → Dom(aj) is a relation that associates to each ordered pair (ti, aj) a value in its specific domain. For example,

Rmn(ti, aj) = vij

In a relational database model, we can apply different operations or transactions, such as create, modify, retrieve, and delete, to a database using a relational query language. Relational query languages define a set of operations or transactions that operate on tables and output tables as their results (Silberschatz et al., 1997).

SQL (Structured Query Language) is a relational database language that provides the ability to query information from the database and to insert, modify, and delete tuples in the database. The basic structure of an SQL query consists of three clauses:

SELECT · · · FROM · · · WHERE

The SELECT clause is used to list the attributes desired in the result of a query. The FROM clause is a list of the relations to be accessed in the evaluation of the query. The WHERE clause is a predicate involving attributes of the relations in the FROM clause. If we consider the relation Rmn in Table 2.1, then an SQL query has the form:


SELECT a1, a2, · · · , am FROM Rmn WHERE P .

where a1, a2, · · · , am are the attributes, Rmn is the relation, and P is a predicate.

A query is a statement requesting the retrieval of information. Users access the data through different types of queries:

A SELECT statement is used to retrieve data from a table. The statement is split into a select list (the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions) (Group, 1996).

Selection query. A query can be qualified by adding a WHERE clause that specifies which tuples are wanted. The WHERE clause contains a Boolean expression, and only tuples for which the Boolean expression is true are returned (Group, 1996). The usual Boolean operators (AND, OR, and NOT) are allowed in the qualification. The basic syntax is:

SELECT <attributes> FROM <relation> WHERE <conditions with AND, OR, NOT>

Range query. For this query, the WHERE clause contains an inequality expression, and only tuples for which the inequality is true are returned. The usual comparison operators are <, >, ≤, ≥. The syntax is:

SELECT <attributes> FROM <relation> WHERE <conditions with <, >, ≤, ≥>

Aggregation query. Relational databases support aggregate functions. An aggregate function computes a single result from multiple input tuples. For example, there are aggregates to compute the count, sum, avg (average), max (maximum), and min (minimum) over a set of tuples. The syntax is:

SELECT <aggregate function> FROM <relation>

Join query. Queries can access multiple tables at once, or access the same table in such a way that multiple tuples of the table are processed at the same time. A query that accesses multiple tuples of the same or different tables at one time is called a join query (Group, 1996). The syntax is:

SELECT <attributes> FROM <relations> WHERE <conditions>

In addition to retrieving information, the user can access and modify the database through the following operations:

The INSERT operation is used to insert data into a relation. The syntax for this operation is:

INSERT INTO <relation> VALUES (v1, v2, · · · , vn)

In certain situations, we want to change a value in a tuple without changing all values in the tuple. For this purpose, the UPDATE statement can be used. The syntax is:


UPDATE <relation> SET <attributes> WHERE <conditions>

The DELETE operation can remove only whole tuples; we cannot delete values on only particular attributes. The respective syntax is:

DELETE FROM <relation> WHERE <conditions>
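To tie these statements together, the following self-contained sketch loads the Employees relation of Table 1.1 into an in-memory SQLite database and runs a selection query and a range query. SQLite is chosen here only to keep the example runnable on its own (the implementation described later in this document uses PostgreSQL); the snippet is illustrative and not part of the proposed scheme.

```python
import sqlite3

# Build the Employees relation of Table 1.1 in an in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (ID TEXT, NAME TEXT, GENDER TEXT, LEVEL TEXT, AGE INTEGER)")
cur.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [("GLO", "Omar", "M", "L1", 30), ("LRL", "Laura", "F", "L3", 25),
     ("FJO", "Oscar", "M", "L2", 27), ("VBR", "Rosy", "F", "L1", 26),
     ("MRB", "Betty", "F", "L1", 25)],
)

# Q1: selection query on the LEVEL attribute.
cur.execute("SELECT * FROM Employees WHERE LEVEL = 'L1'")
print(cur.fetchall())        # the three L1 tuples of Table 1.1

# Range query: employees younger than 27.
cur.execute("SELECT NAME, AGE FROM Employees WHERE AGE < 27")
print(cur.fetchall())
```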

Up to this point, concepts related to databases have been defined; next, we define other types of structures.

2.1.2 Bitmaps

Bitmap indexes are very efficient for database applications (they accelerate query processing) where the data records are read-only and the queries usually produce a large number of data items (Wu et al., 2003).

Definition 2.1.2.1 A bitmap index is a bit string that describes whether the value of an attribute is equal to a specific value or not. The position of the bit denotes the tuple number in the relation (Rodríguez-Henríquez, 2015).

A key design parameter for bitmap indexes is the encoding scheme, which determines the bits that are set to 1 in each bitmap of an index (Chan and Ioannidis, 1999). The equality encoding for selection queries and the range encoding for range queries are defined as follows:

Equality encoding. Consider the relation Rmn in Table 2.1. We define the bitmap of an attribute aj corresponding to its value vlj in the relation Rmn as Bitmap(aj, vlj) = Xl, where Xl is a binary string such that |Xl| = m and, for 1 ≤ k ≤ m (Rodríguez-Henríquez and Chakraborty, 2013),

bitk(Xl) = 1 if Rmn(tk, aj) = vlj, and 0 otherwise.

Range encoding. In the literature, there are other kinds of bitmap encoding that allow different types of queries, in particular range queries. In this case, we review a specific encoding called the less-than encoding. For the relation Rmn shown in Table 2.1 we have Bitmap^<_Rmn(aj, vlj) = Yl, where:

bitk(Yl) = 1 if Rmn(tk, aj) < vlj, and 0 otherwise.

Other encodings can be generated from the equality and less-than encodings using logical operations, as shown below (Rodríguez-Henríquez, 2015):

Bitmap^<_Rmn(aj, v(l+1)j) = Bitmap^<_Rmn(aj, vlj) ⊕ Bitmap_Rmn(aj, vlj)
Bitmap^≤_Rmn(aj, vlj) = Bitmap^<_Rmn(aj, vlj) ∨ Bitmap_Rmn(aj, vlj)
Bitmap^≥_Rmn(aj, vlj) = ¬Bitmap^<_Rmn(aj, vlj)

Ordered Matrices. Let M be an m × n matrix, and let mij denote the entry in the ith row and jth column (Rodríguez-Henríquez, 2015).


Definition 2.1.2.2 Let M be an m × n bit matrix. A column j of M is said to be ordered if (Rodríguez-Henríquez, 2015)

m1j ≤ m2j ≤ · · · ≤ mmj

Definition 2.1.2.3 A matrix M is said to be column ordered if all its columns are ordered (Rodríguez-Henríquez, 2015).

For example, we can build an ordered bit matrix with the bitmaps in less-than encoding. First, the bitmaps in less-than encoding for the AGE attribute of the relation Employees shown in Table 1.1 are computed. The bitmaps are the following:

Bitmap<(AGE, 25) = 00000
Bitmap<(AGE, 26) = 01001
Bitmap<(AGE, 27) = 01011
Bitmap<(AGE, 30) = 01111

From these bitmaps we may generate the matrix M1 as follows:

M1 =

0 0 0 0 0
0 1 0 0 1
0 1 0 1 1
0 1 1 1 1

Notice that the matrix M1 is a column ordered bit matrix. From M1 we can compute an array Arr:

Arr = [0, 2, 4, 3, 2]

In position one, the array Arr has the value 0 because no 1 appears in the first column of M1. Position two of the array Arr has the value 2 because in the second column of M1 the first 1 appears in row two. The array Arr has the value 4 in position three because in the third column the first 1 appears in row four, and so on for the remaining positions of Arr. A formal definition of this concept is given below.

Definition 2.1.2.4 Let M be an m × n column ordered bit matrix. Arr[n] is an array of size n obtained from M, where position i of the array corresponds to column i of the matrix and Arr[i] = k, where k is the row number in which the first 1 of column i appears (or 0 if column i contains no 1).

Definition 2.1.2.5 Given a relation Rmn and an attribute aj ∈ A, let L be a sorted list (in increasing order) of the values in Dom(aj). Then L is called the ordered list of aj for Rmn. The length of L is denoted by len(L). Hence, len(L) = |Dom(aj)| (Rodríguez-Henríquez, 2015).

Consider the relation Employees in Table 1.1. A sorted list L1 for the AGE attribute is given below:

L1 = [25, 26, 27, 30]

In general, given the ordered list L and the array Arr, different bitmaps can be obtained for an attribute aj and value vij. The procedures and algorithms to compute these bitmaps are illustrated in Chapter 4.
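As a concrete illustration of the encodings above, the following sketch computes the equality and less-than bitmaps for the AGE attribute of Table 1.1 and derives the array Arr from the resulting column ordered matrix. It is a simplified illustration whose function names are chosen here only for readability; it is not the construction presented in Chapter 4.

```python
# Illustrative computation of equality / less-than bitmaps and the array Arr
# for the AGE attribute of the Employees relation in Table 1.1.
ages = [30, 25, 27, 26, 25]            # one value per tuple t1..t5

def equality_bitmap(column, value):
    """bit_k = 1 iff R(t_k, a_j) = value (equality encoding)."""
    return ''.join('1' if v == value else '0' for v in column)

def less_than_bitmap(column, value):
    """bit_k = 1 iff R(t_k, a_j) < value (less-than encoding)."""
    return ''.join('1' if v < value else '0' for v in column)

L = sorted(set(ages))                  # ordered list of Dom(AGE): [25, 26, 27, 30]
lt_bitmaps = [less_than_bitmap(ages, v) for v in L]

def build_arr(bitmaps):
    """Arr[i] = 1-based row of the first 1 in column i of the column ordered
    matrix whose rows are the given bitmaps, or 0 if the column has no 1."""
    n_cols = len(bitmaps[0])
    return [next((r + 1 for r, bm in enumerate(bitmaps) if bm[col] == '1'), 0)
            for col in range(n_cols)]

print(equality_bitmap(ages, 25))       # 01001
print(lt_bitmaps)                      # ['00000', '01001', '01011', '01111']
print(build_arr(lt_bitmaps))           # [0, 2, 4, 3, 2]
```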


2.1.3 Authenticated Skip List

According to (Pugh, 1990), the skip list is a data structure proposed as an alternative to balanced trees. An authenticated skip list is an extension of the skip list (Etemad and Küpçü, 2018). The leaves store hashes of data items; for example, in Figure 2.1 we show the authenticated skip list for the set of items e1, e2, · · · , em. The internal nodes store a hash of a function of the values of their children. Following the example, h1 is computed from h(−∞) and h(e1), and h2 is computed from h(e2) and h(· · · ). This process continues until the root is calculated using h5 and h(+∞). Finally, h6 is signed using a signature scheme. It is important to mention that the values on the path from a leaf node up to the root constitute a membership proof; that is, to rebuild the hash in the root node it is necessary to compute the hashes related to the node in question.


Figure 2.1: Authenticated Skip List
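The membership-proof idea can be illustrated with a much simplified, binary-tree-style sketch: the verifier recomputes the root hash from an item and the sibling hashes along its path. The real authenticated skip list links nodes differently and includes the ±∞ sentinels; the code below, with SHA-256 as an illustrative hash, is only a sketch of the idea.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Node hash; SHA-256 is an illustrative choice."""
    return hashlib.sha256(data).digest()

# Leaves: hashes of the items e1..e4 (the -inf / +inf sentinels of a real
# authenticated skip list are omitted in this simplified illustration).
items = [b"e1", b"e2", b"e3", b"e4"]
leaves = [h(e) for e in items]

def root_hash(nodes):
    """Combine adjacent hashes pairwise until a single root value remains."""
    while len(nodes) > 1:
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

# A membership proof for e1: the sibling hashes on the path from its leaf to the root.
proof_for_e1 = [leaves[1], h(leaves[2] + leaves[3])]

def verify(item, proof, expected_root):
    acc = h(item)
    for sibling in proof:
        acc = h(acc + sibling)          # simplified: sibling assumed on the right
    return acc == expected_root

root = root_hash(list(leaves))
print(verify(b"e1", proof_for_e1, root))   # True: the recomputed root matches
```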

2.2 Filters

Filters are data structures used to represent large data sets, allowing membership tests with a false positive or false negative rate (Dillinger and Manolios, 2004). The membership test consists of determining whether an item is in the set; in most cases, this test is implemented differently in each filter. Filters admit operations such as inserting, searching, or deleting items. The number of operations that a filter can perform and its false positive or false negative rate depend on the filter design.

These filters have recently been used in a novel manner in databases to verify the integrity of the data (Ferretti et al., 2018). In the following paragraphs, we present the Bloom filter and the Cuckoo filter.

2.2.1 Bloom Filter

The Bloom filter was proposed by Burton Bloom in 1970 as a probabilistic data structure that is used to represent the elements of a set and allows membership tests with a false positive rate (Bloom, 1970). The membership test consists of determining whether an item is in the filter. If the result of the membership test is true, then the item is in the filter with a small probability of a false positive. Otherwise the item is not found; that is, the membership test does not have false negatives.


A Bloom filter can be seen as an array BF[m] of m bits used to represent a set S = {x1, x2, x3, · · · , xn} of n elements, initially with all bits set to zero, as illustrated in Figure 2.2. The main idea behind the construction of the filter is to use k hash functions, where each function hi(x) is applied to an element x ∈ S and maps it to an integer in the range 0, 1, 2, · · · , m − 1.

Figure 2.2: Bloom Filter

The Bloom filter allows operations such as inserting and searching for elements. An element x ∈ S can be inserted in the filter BF[m] by calculating all the hash functions for x, that is, h1(x), h2(x), h3(x), · · · , hk(x); each function yields a position i of the array BF[m], with 0 ≤ i ≤ m − 1, where the bit is set to 1.

To search for an element x in the filter BF[m], a membership test is performed. All the hash functions are calculated on the element x and it is verified whether all the corresponding bits in BF[m] are set to 1; if so, the element x is found in the filter with a small probability of a false positive. Otherwise the element is not in the filter. Notice that, to perform the membership test, the same set of hash functions that was used for construction is required.
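A minimal sketch of these two operations is shown below. It assumes the k positions are derived from the two halves of a single SHA-256 digest, a double-hashing shortcut chosen here only for illustration; any k independent hash functions would serve the same purpose.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits and k positions per element.

    The k positions are derived from the two halves of one SHA-256 digest
    (a double-hashing shortcut used here only for illustration)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)                 # one byte per bit, for clarity

    def _positions(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def contains(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1024, k=7)
bf.insert(b"GLO")
print(bf.contains(b"GLO"), bf.contains(b"XYZ"))   # True, (almost certainly) False
```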

The false positive rate of the Bloom filter can be obtained from three parameters: the size of the Bloom filter (m), the number of hash functions (k), and the number of elements (n) to be inserted in the filter:

false positive rate = (1 − (1 − 1/m)^(kn))^k ≈ (1 − e^(−kn/m))^k

This false positive rate can be decreased by a careful choice of the Bloom filter parameters (Broder and Mitzenmacher, 2003). Given m and n, we can find the optimal number of hash functions, that is, the value of k that minimizes f. Note that

f = e^(k ln(1 − e^(−kn/m)))

Let g = k ln(1 − e^(−kn/m)); then

dg/dk = ln(1 − e^(−kn/m)) + (kn/m) · e^(−kn/m) / (1 − e^(−kn/m))

Setting the derivative equal to zero, we obtain the critical values:

ln(1 − e^(−kn/m)) + (kn/m) · e^(−kn/m) / (1 − e^(−kn/m)) = 0    (2.1)

A solution of equation 2.1 is k = (m/n) ln(2). Performing the corresponding operations, it is concluded that the function g has a global minimum at k = (m/n) ln(2). Then, the optimal number of hash functions is the value of k that minimizes the false positive rate, and it can be computed in terms of m and n. Therefore, the optimal false positive rate is:

Optimal false positive rate = e^(−(m/n) ln(2)²)
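These closed-form expressions are easy to evaluate; the short snippet below, with an illustrative choice of m and n, computes the optimal k and the corresponding false positive rate.

```python
import math

def optimal_k(m: int, n: int) -> float:
    """Number of hash functions minimizing the false positive rate: k = (m/n) ln 2."""
    return (m / n) * math.log(2)

def optimal_false_positive_rate(m: int, n: int) -> float:
    """f = e^(-(m/n) ln(2)^2), i.e. roughly 0.6185^(m/n)."""
    return math.exp(-(m / n) * math.log(2) ** 2)

m, n = 1024, 100                             # illustrative filter size and element count
print(round(optimal_k(m, n)))                # about 7 hash functions
print(optimal_false_positive_rate(m, n))     # roughly 0.0073 (under 1%)
```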

2.2.2 Cuckoo Filter

The Cuckoo filter is a probabilistic data structure that is used to represent the elements of a set and allows membership tests with a false positive rate. The membership test is performed in the same manner as in the Bloom filter and does not have false negatives.

The Cuckoo filter uses cuckoo hash tables whose basic unit is named an entry. Each entry stores a fingerprint. Cuckoo hash tables consist of an array of buckets, where a bucket can have multiple entries. The Cuckoo filter can be represented as an array CF[m] of m buckets (see Figure 2.3), where each bucket can have b entries, to represent a set S = {x1, x2, x3, · · · , xn} of n elements, initially with all entries empty. To build the Cuckoo filter, the partial-key cuckoo hashing technique is used. This technique allows obtaining the indices of the buckets where the elements of the set S are inserted.


Figure 2.3: Cuckoo Filter with m buckets and b = 4 entries.

The Cuckoo filter allows insert, search, and delete operations. The insertion process can be analyzed in two cases: when inserting an element x there is an empty position in one of its candidate buckets, or it is necessary to move elements between buckets to find an alternative position in the filter.

To insert an element x ∈ S in the filter CF[m], first the fingerprint of the element is calculated:

fx = fingerprint(x)    (2.2)

Then, using the partial-key cuckoo hashing technique, the indices of the candidate buckets are obtained:

h1(x) = hash(x)    (2.3)
h2(x) = h1(x) ⊕ hash(fx)    (2.4)

If bucket h1 or h2 has an empty entry, then the fingerprint fx is added and the process ends. If this does not happen, a bucket i (i = h1 or i = h2) is randomly chosen and then an entry e from bucket i is selected. Once the selection is made, the fingerprint fe stored in the entry e is retrieved and the fingerprint fx is inserted in its place. Then an alternative position j is calculated, indicating the bucket where the fingerprint fe can be inserted; for this, the current bucket i, the recovered fingerprint fe, and the following equation are used:

j = i ⊕ hash(fe)

If bucket j has an empty entry, then the fingerprint fe is inserted and the insertion process ends. Otherwise, the process is repeated until a bucket with an empty entry is found, or up to a maximum number of relocations. If no such bucket is found, then the Cuckoo filter CF[m] is considered full and no more items can be inserted.

To search for an element x in the filter CF[m], a membership test is performed. The fingerprint of the element is calculated with equation 2.2. Subsequently, the positions of the candidate buckets are calculated using equations 2.3 and 2.4. Then it is verified whether any of the obtained buckets contains an entry with the fingerprint fx; if so, the Cuckoo filter returns true (with some probability of a false positive), otherwise it returns false.

Suppose that we want to remove the element x from the Cuckoo filter CF[m]. First, the fingerprint of x is calculated with equation 2.2. Afterwards, the positions of the candidate buckets are computed using equations 2.3 and 2.4. If any bucket has an entry that contains the fingerprint fx, a single copy of it is deleted; otherwise the algorithm returns false. Deleting only one copy prevents erasing another item that happens to share the same fingerprint.
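The sketch below puts the three operations together. It is a simplified illustration with parameters chosen only for the example (128 buckets, 4 entries per bucket, 8-bit fingerprints, SHA-256 as the underlying hash); the number of buckets is a power of two so that the XOR-based partial-key relation stays symmetric.

```python
import hashlib
import random

BUCKETS, ENTRIES, FP_BITS, MAX_KICKS = 128, 4, 8, 500   # illustrative parameters

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def fingerprint(item: bytes) -> int:
    return _h(b"fp" + item) % (2 ** FP_BITS) or 1        # nonzero FP_BITS-bit fingerprint

def indices(item: bytes):
    f = fingerprint(item)
    i1 = _h(item) % BUCKETS
    i2 = (i1 ^ _h(f.to_bytes(1, "big"))) % BUCKETS       # partial-key cuckoo hashing
    return f, i1, i2

table = [[] for _ in range(BUCKETS)]                     # each bucket holds up to ENTRIES fingerprints

def insert(item: bytes) -> bool:
    f, i1, i2 = indices(item)
    for i in (i1, i2):
        if len(table[i]) < ENTRIES:
            table[i].append(f)
            return True
    i = random.choice((i1, i2))                          # no room: start relocating fingerprints
    for _ in range(MAX_KICKS):
        j = random.randrange(len(table[i]))
        f, table[i][j] = table[i][j], f                  # swap f with a resident fingerprint
        i = (i ^ _h(f.to_bytes(1, "big"))) % BUCKETS     # alternative bucket of the evicted one
        if len(table[i]) < ENTRIES:
            table[i].append(f)
            return True
    return False                                         # filter considered full

def contains(item: bytes) -> bool:
    f, i1, i2 = indices(item)
    return f in table[i1] or f in table[i2]

def delete(item: bytes) -> bool:
    f, i1, i2 = indices(item)
    for i in (i1, i2):
        if f in table[i]:
            table[i].remove(f)                           # remove a single copy only
            return True
    return False

insert(b"GLO")
print(contains(b"GLO"), delete(b"GLO"), contains(b"GLO"))   # True True False
```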

False positives in the Cuckoo filter occur when an element x and an element y have the same fingerprint and the same candidate buckets. The probability of getting a repeated fingerprint depends on the number of entries b per bucket and the length f of the fingerprint:

1 − (1 − 1/2^f)^(2b) ≈ 2b / 2^f    (2.5)

Given a false positive rate ε, if 2b/2^f ≤ ε, then the minimum size required for the fingerprint is approximately:

f ≥ ⌈log2(2b/ε)⌉ = ⌈log2(1/ε) + log2(2b)⌉

Currently there is no theory that allows determining the optimal number of entries per bucket. According to experiments performed by (Bose et al., 2008), it was concluded that the best number of entries per bucket is b = 4.
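The bound above can be evaluated directly; the snippet below computes the minimum fingerprint length for an illustrative target rate of 1% with b = 4.

```python
import math

def min_fingerprint_bits(epsilon: float, b: int = 4) -> int:
    """f >= ceil(log2(2b / epsilon)) bits per fingerprint."""
    return math.ceil(math.log2(2 * b / epsilon))

print(min_fingerprint_bits(0.01))   # 10 bits for a 1% target with b = 4 entries per bucket
```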

2.3 Cryptographic Primitives

In this section, we present the definition of cryptography and the main security services provided by cryptography, and finally we discuss some cryptographic primitives.

According to (Menezes et al., 1996), achieving information security in an electronic society requires technical and legal skills. The technical means are provided through cryptography.

Definition 2.3.0.1 Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication, and data origin authentication (Menezes et al., 1996).


The main security services provided by cryptography are the following:

1. Confidentiality is a service used to keep the content of information from all but those authorized to have it.

2. Data integrity is a service which addresses the unauthorized alteration of data.

3. Authentication is a service related to identification. There are two types: entity authentication and data origin authentication.

4. Non-repudiation is a service which prevents an entity from denying previous commitments or actions.

In this work, we focus on the data integrity service. To assure data integrity, one should have the ability to detect data manipulation by unauthorized parties. Data manipulation includes such things as insertion, deletion, and substitution.

The security primitives that cryptography provides can be classified into three areas (Menezes et al., 1996): unkeyed primitives, symmetric-key primitives, and public-key primitives.

2.3.1 Unkeyed Primitives

Cryptographic hash functions play a fundamental role in modern cryptography. There are also many hash functions commonly used in non-cryptographic applications, usually known simply as hash functions.

Definition 2.3.1.1 Hash function. A hash function H is a function that associates an arbitrary-length string with a fixed-length string, i.e.,

H : {0, 1}∗ → {0, 1}τ

where:

{0, 1}∗ is the set of all binary strings.

{0, 1}τ is the set of all τ-bit strings.

τ is the length of the output bit string.

Let l ∈ {0, 1}∗; then H(l) is called the digest. It will be necessary for H to satisfy certain properties in order to prevent various forgeries.

We require that H satisfy the following collision-free property: let l, l′ ∈ {0, 1}∗. A hash function H is weakly collision-free for l if it is computationally infeasible to find l′ ≠ l such that H(l′) = H(l).

Another property that H must satisfy is the following: a hash function H is strongly collision-free if it is computationally infeasible to find l and l′ such that l′ ≠ l and H(l′) = H(l).

The final property is one-wayness: a hash function H is one-way if, given a digest t, it is computationally infeasible to find an l such that H(l) = t.
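As a small illustration, the snippet below hashes a tuple-like string with SHA-256 (an illustrative choice of H); even a one-character change in the input produces an unrelated digest, which is what makes unauthorized modifications detectable.

```python
import hashlib

message = b"GLO,Omar,M,L1,30"                      # an illustrative encoding of a tuple
digest = hashlib.sha256(message).hexdigest()       # here tau = 256 bits
print(digest)

# Changing a single character of the input yields an unrelated digest,
# which is what makes unauthorized modifications detectable.
print(hashlib.sha256(b"GLO,Omar,M,L1,31").hexdigest())
```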


2.3.2 Symmetric-key Primitives

These primitives have the ability to provide integrity based on a secret key. The mechanisms that provide such an integrity check based on a secret key are usually named message authentication codes (MACs).

Definition 2.3.2.1 Message Authentication Code. A message authentication code (MAC) provides authentication in the symmetric-key setting. The MAC is a bit string that is obtained from the following function (Menezes et al., 1996)

MAC : K × M → {0, 1}τ

where:

K is the key space.

M is the message space.

τ is the length of the output bit string.

The MAC (often named tag or digest) is used to provide integrity and authentication of a message. The receiver can check this tag and be sure that the message has not been modified by a third party.

Similar to other message authentication codes, a hashed message authentication code can simultaneously verify the authenticity of a message and the data integrity associated with it.

Definition 2.3.2.2 Hash Message Authentication Code. A hash message authentication code (HMAC) provides authentication in the symmetric-key setting. The HMAC is a bit string that is obtained from the following function (Krawczyk et al., 1997)

HMAC(k, m) = H((k′ ⊕ opad) ‖ H((k′ ⊕ ipad) ‖ m))

where:

H is a cryptographic hash function.

m is the message to be authenticated.

k is the secret key, and k′ is the block-sized key derived from k.

‖ denotes concatenation.

⊕ denotes bitwise exclusive or (XOR).

opad is the block-sized outer padding.

ipad is the block-sized inner padding.
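Python's standard library exposes this construction directly; the example below, with an illustrative key and message, computes an HMAC-SHA256 tag and verifies it with a constant-time comparison.

```python
import hashlib
import hmac

key = b"a-secret-key-shared-by-client-and-verifier"   # illustrative key
message = b"GLO,Omar,M,L1,30"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()
print(tag)

# The verifier recomputes the tag with the same key and compares in constant time.
expected = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))             # True
```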


2.3.3 Public-key PrimitivesThe digital signature is a cryptographic primitive of public key. This primitive pro-vides three security services: integrity, authentication, and non-repudiation. Accordingto (Menezes et al., 1996) the purpose of a digital signature is to provide a means for anentity to bind its identity to a piece of information.

The process of signing entails transforming the message and some secret information held by the entity into a tag named a signature (Menezes et al., 1996). Suppose that Gen is the key generation algorithm. Let S be the algorithm that the sender A applies to a message m, and let the output of this algorithm be called the signature SA. The algorithm that the receiver applies to a message m and a signature SA in order to verify legitimacy is denoted by V. A signature scheme is a tuple of three polynomial-time algorithms (Gen, S, V) satisfying the following (Menezes et al., 1996), as illustrated by the sketch after the list:

• The key-generation algorithm Gen takes as input a security parameter 1^n and outputs a pair of keys (pk, sk). These are called the public key and the private key, respectively.

• The signing algorithm S takes as input a private key sk and a message m ∈ {0, 1}^∗. It outputs a signature SA, denoted as Ssk(m) = SA.

• The deterministic verification algorithm V takes as input a public key pk, a message m, and a signature SA. It outputs Vpk(m, SA) = u, where u = true means valid and u = false means invalid.
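For illustration, the following minimal sketch (assuming the third-party Python package cryptography; Ed25519 is used only as an example scheme, not necessarily one referenced in this work) shows the roles of Gen, S, and V:

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Gen: generate the key pair (pk, sk).
sk = Ed25519PrivateKey.generate()
pk = sk.public_key()

# S: sign a message m with the private key sk.
m = b"entity binds its identity to this information"
signature = sk.sign(m)

# V: verify the signature with the public key pk.
try:
    pk.verify(signature, m)
    print("valid")
except InvalidSignature:
    print("invalid")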

2.4 Discussion

In this section, some definitions such as data structures were given; these are very important tools in computing because they allow better organization and management of data. For example, bitmaps are an efficient structure in database applications since they speed up query processing. Other structures, such as filters, were also described; recently, filters have been used in a novel way in databases to verify the integrity of the data. Some structures by themselves cannot guarantee the integrity of the data, so they rely on cryptographic primitives from the literature, such as hash functions, MACs, and signatures. All of these tools work together to obtain a better result. The concepts and definitions reviewed in this part support a better understanding of the following sections.


Chapter 3

State of the Art

In this chapter, we summarize the most relevant works related to authenticated query processing on static outsourced databases, i.e., we focus on the integrity and completeness requirements. In Section 3.1, each of the AQP requirements is discussed, along with the different axes that make this problem difficult. Also, we propose a taxonomy that allows us to discuss in Section 3.2 the main techniques proposed in the literature to provide integrity and completeness in outsourced databases. Finally, Section 3.3 presents a discussion contrasting the techniques analyzed in Section 3.2.

3.1 Generalities of the State of the Art

Authenticated query processing aims to provide integrity, completeness, and freshness in cloud databases. Designing a cryptographic scheme for authenticated query processing is a big challenge because there are multiple axes that should be considered, such as the kind of database (static or dynamic), the granularity, and the type of queries (for more details see Section 1), among others.

3.1.1 Integrity

Even though there are multiple cryptographic primitives that provide integrity (e.g., hashes, MACs, and signatures), their adoption in the context of databases is not easy, since this requirement is related to the granularity. For example, if a primitive is calculated over each value of the database then the storage cost increases. Otherwise, if a cryptographic primitive is computed over each row then the storage cost is manageable; however, when the client poses a query for a few attributes of the table, the entire row must be retrieved in order to verify it, so excessive network overhead is imposed.

Asymmetric primitives such as digital signatures perform well in terms of storage and network usage, but their processing costs are very high, which limits their adoption in database scenarios (Narashima et al., 2004; Pang et al., 2005). Symmetric primitives such as MACs and hash functions have a lower computational cost compared to asymmetric primitives (Rodríguez-Henríquez and Chakraborty, 2013; Rodríguez-Henríquez, 2015), but they can only be privately verified.


In recent years, a data structure named the Bloom filter was proposed to provide integrity at cell-level granularity (Ferretti et al., 2018) with a storage cost comparable to that of tuple-granularity schemes.

Moreover, this is a significant advance in the literature that opens up new possibilities to generate cryptographic schemes that take advantage of this granularity level.

3.1.2 Completeness

Providing the completeness service is a challenge. When the client executes a query, a well-behaved server returns a response with the data that satisfies the query restrictions. However, it is possible that the server cheats, thus ideally the client should be able to verify the completeness of the response. In a database that does not include an authenticated query processing scheme, it is not possible to verify completeness from the query response alone, since there is not enough information. To solve this problem, it is necessary to add control information (such as an authenticated data structure or a signature chain) to the externalized database, which allows the server to provide a proof of correctness of the response. The problem of completeness lies in knowing how to design this control information; for example, it should be considerably smaller than the original database. Furthermore, the structure of this information varies according to the type of query that will be authenticated, as we discuss in the following.

Selection queries

A selection query selects a set of attributes and filters the tuples according to the WHERE clause, which includes a set of restrictions related by logical operators such as OR, AND, and NOT, among others. For example, consider the relation shown in Table 1.1 and the following query:

SELECT NAME, AGE FROM Employees WHERE GENDER=’F’ AND LEVEL=’L1’;

The correct and complete response for the above query is (Rosy, 26), (Betty, 25). Notice that each tuple only includes the two attributes posed in the query and that the two tuples correspond to GENDER='F' and LEVEL='L1'.

Providing a fine granularity (i.e., attribute level) is important for the precision of the selection response, since it does not require including any extra information for verification, as would happen with any other level of granularity. In other words, if the authenticated query scheme allows cell granularity, the server should not include other attributes except for those explicitly required in the query, i.e., Name and Age in the example.

Moreover, the control information is usually designed per attribute, but selection queries require that this information can be easily combined, since WHERE conditions impose relations using logical operators. Following the above example, consider that CI1 and CI2 are the control information to provide completeness for the attributes GENDER and LEVEL, respectively. Thus, in order to verify the completeness of the response, it is necessary that CI1 and CI2 can be combined according to the AND operator.

Thus, the problem of completeness in selection queries requires designing control information in such a way that it can be easily combined with respect to logical operators.


Range queries

Unlike selection queries, in range queries the WHERE clause uses inequality operators such as <, ≤, >, ≥ to establish a range where the value R(ti, aj) of an attribute in a tuple must fall in order for ti to be part of the correct response, denoted by aj ≥ li and aj ≤ ls, where li and ls are the left and right bounds, respectively. These limits are optional.

For example, suppose the client issues the following query in Table 1.1:

SELECT NAME, LEVEL FROM Employees WHERE AGE>25;

The correct and complete response for the above query is shown in Table 3.1. The first column lists the tuples that meet the WHERE condition, i.e., AGE>25. The second column of Table 3.1 shows the corresponding age for the tuples in the response.

Tuples        Condition (AGE>25)
(Omar, L1)    30
(Oscar, L2)   27
(Rosy, L1)    26

Table 3.1: Response

Following the example, the client should verify that between each pair of tuples ((Omar, L1) and (Oscar, L2), or (Oscar, L2) and (Rosy, L1)), there is no other tuple (x, y) whose age R(ti, AGE) satisfies the condition >25. Moreover, there should not exist a tuple t′ between the tuple tm and the limit li = 25 that meets the condition.

In general, the client must be able to check that between any two tuples in the response, there is no other tuple that meets the condition in the WHERE clause. Also, the client must verify the limits of the involved range.

3.1.3 Freshness

The freshness requirement for authenticated query processing implies that the client can verify whether the response sent by the server was obtained from the most recent database version and not from an old version. To tackle this problem, control information such as timestamps (Xie et al., 2008) and authenticated data structures (Rodríguez-Henríquez, 2015; Wang et al., 2015; Etemad and Küpçü, 2018; Ausekar and Pasupuleti, 2018) have been proposed.

There are static and dynamic databases. Static databases store information for a relatively long period of time (months, years) and allow query operations, e.g., business intelligence databases. Notice that in this kind of database the updates are scheduled based on previous knowledge.

Unlike static databases, in dynamic databases information constantly changes (e.g., transactional databases), i.e., they allow query and update operations. In this type of database, freshness is required. To fulfill the freshness requirement, the control information added by the authenticated query processing scheme should manage updates efficiently.


3.1.4 Overview: State of the Art

There are different axes that affect the AQP problem, increasing its complexity. Thus, it is difficult to have a single classification of the literature proposals. In this research, we propose the taxonomy depicted in Figure 3.1, which classifies the proposals based on the completeness technique used.

Figure 3.1: State of the Art Taxonomy (AQP proposals are divided into Signature Chain approaches and ADS-based approaches; the ADS branch is subdivided into Trees, Bitmaps, Filters, and Hybrids/HADS, covering Devanbu et al., 2003; Narashima et al., 2004; Xie et al., 2008; RDAS, 2013; Rodríguez et al., 2015; Wang et al., 2015; Ferretti et al., 2018; Etemad et al., 2018; Ausekar et al., 2018; Pang et al., 2005, 2008, 2009)

The proposals that are analyzed study a subset of the requirements of authenticated query processing. All of them provide at least integrity and completeness, except (Ferretti et al., 2018), which does not provide completeness but is the first work capable of providing integrity at cell-level granularity. Due to this big advance, it is one of the inspirations for our scheme.

There are two main approaches that allow providing completeness: the signature chain and the authenticated data structure (ADS). The signature chain approach seeks to provide completeness without a data structure as control information. The schemes proposed by (Narashima et al., 2004; Narasimha and Tsudik, 2006) and (Pang et al., 2005) fall in this category.

The second approach provides completeness using authenticated data structures (ADS); in the next section this approach is discussed in detail. We classify the works according to the kind of ADS used: trees, bitmaps, filters, and hybrids.

Moreover, Figure 3.1 illustrates in green the works that provide the three services: integrity, completeness, and freshness. The works that provide integrity and completeness are shown in orange. In purple appear the works that provide integrity and freshness. All of them consider tuple-level granularity. Finally, the work that provides integrity at the cell level is shown in red.

Initially, most of the works aim to develop a scheme that supports one type of query. Among the works that allow selection queries are (Wang et al., 2015; Ausekar and Pasupuleti, 2018), and among the works that allow range queries are (Devanbu


Year  Name                     Technique                       Integrity  Completeness  Freshness  Granularity  Type of queries
2003  Devanbu et al.           ADS (Merkle tree)               X          X                        Tuple        Range
2004  Narashima et al.         Signature chain                 X          X                        Tuple        Range
2008  Xie et al.               ADS + timestamp                 X                        X          Tuple        Aggregation
2013  RDAS                     ADS (bitmaps) / MAC             X          X                        Tuple        Selection
2015  Rodríguez-Henríquez      ADS (bitmaps) / MAC             X          X             X          Tuple        Selection / Range
2015  Wang et al.              ADS (Bloom filter)              X          X             X          Tuple        Selection
2018  Ferretti et al.          Bloom filter                    X                                   Cell         Selection
2018  Etemad and Küpçü         HADS (trees)                    X          X             X          Tuple        Join
2018  Ausekar and Pasupuleti   MB-tree / IBF / CBF             X          X             X          Tuple        Selection
2019  CLIC-ODB                 ADS (bitmaps) / Cuckoo filter   X          X                        Cell         Selection / Range

Table 3.2: Related work

et al., 2003; Narashima et al., 2004). Once this objective is achieved, the works are extended to support other types of queries; for instance, (Rodríguez-Henríquez, 2015) allows selection and range queries and (Etemad and Küpçü, 2018) allows selection, range, and join queries.

Table 3.2 summarizes the different characteristics (granularity and type of queries) and services (integrity, completeness, and freshness) provided by the relevant works related to the AQP problem. We also present the year of publication, the name of the author or the name of the scheme, and the techniques used.

This section presented an overview of the most relevant works related to the AQP problem. In the following section, the most important techniques used in these works to provide integrity and completeness are explained in detail.

3.2 Main Approaches

There are different approaches that help solve the problem of authenticated query processing. In this section, we discuss the approaches that are used to protect integrity and completeness in cloud databases. First, the signature-based approach is presented. Subsequently, the techniques related to the data structure approach are discussed.

3.2.1 Approaches Based on Signature Chain

The main characteristic of a signature-chain-based approach is that it achieves completeness without using any data structure as control information. How the signature chain works is illustrated in the following section.


Signature chain

The idea of a signature chain was introduced by (Narasimha and Tsudik, 2006); sometime later, several works arose in the same line of research with different improvements in proof construction and the verification process (Pang et al., 2005; Pang and Tan, 2008; Pang et al., 2009).

In general, to chain a set of elements {e1, e2, . . . , er}, a signature is computed according to the following expression:

Sr(h(h(e1) ‖ h(e2) ‖ · · · ‖ h(er)))

where Sr is the signature and h is a hash function.

In order to use this technique to provide correctness and completeness in a database, a signature chain is computed for each tuple. This process starts by ordering the tuples by each attribute of interest (i.e., attributes that can be involved in a range query). For example, consider the Employees relation shown in Table 1.1. In this case, the attribute of interest is AGE (because it is the only attribute where range queries make sense), and the tuples follow this ordered list: (t2, 25), (t5, 25), (t4, 26), (t3, 27), (t1, 30). The process of computing the signature chains for Table 1.1 is illustrated in Figure 3.2. Each row corresponds to the tuple highlighted in blue; the next step is to determine its predecessor and successor. For example, consider the tuple t1, where the attribute AGE has the value 30. Its predecessor is 27, which corresponds to tuple t3, and its successor is +∞ since there is no tuple with a higher value than 30. Now, consider the tuple t2, where the attribute AGE has the value 25. Its predecessor is −∞ since there is no tuple with a smaller value than 25, and its successor is 26, which corresponds to tuple t4. For the tuple t3, where the attribute AGE has the value 27, its predecessor is 26 and its successor is 30, which correspond to tuples t4 and t1, respectively.

Figure 3.2: Signature chain (each row highlights one tuple together with its immediate predecessor and successor in AGE order, bounded by −∞ and +∞)

The tuple signature is computed including the hashes of all immediate predecessor tuples. In Figure 3.2 we observe that the predecessor of tuple t1 is tuple t3. Then, the signature chain for tuple t1 is computed as follows:

t1 → St1(h(h(t1) ‖ h(t3)))


Similarly, the signature chains for tuples t2, t3, t4, and t5 are computed:

t2 → St2(h(h(t2) ‖ h(−∞)))    t3 → St3(h(h(t3) ‖ h(t4)))
t4 → St4(h(h(t4) ‖ h(t5)))    t5 → St5(h(h(t5) ‖ h(t2)))

After a signature chain is generated for each tuple of the database, the database together with the additional information is sent to the server.

When the client poses a query, the server responds with the tuples that satisfy the query and additional information. This additional information includes the lower and upper boundary tuples and the signature chain Sti related to each tuple.

The verification process involves re-calculating a signature chain S′ti for each tuple ti that satisfies the query. The client uses the additional information sent by the server to re-calculate the signature chain. If S′ti = Sti then the data is correct and complete; otherwise, a violation was detected.

Suppose that the client poses the following query:

SELECT * FROM Employees WHERE AGE>25

The correct and complete response for the above query is: t1, t3, t4.

To verify the correctness and completeness of the response, additional information is necessary; in this example it is St1, St3, St4, t5. Then, the response returned by the server would look as follows:

(t1, t3, t4, St1, St3, St4, t5)

The next step is to re-calculate the signature chain for each tuple in the response, in this case S′t1, S′t3, and S′t4. If S′t1 = St1, S′t3 = St3, and S′t4 = St4, then the response t1, t3, t4 satisfies integrity and completeness.
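A minimal sketch (hypothetical helper names, with the actual signing step abstracted away) of the chained digest h(h(ti) ‖ h(pred(ti))) that would then be signed for each tuple could look as follows:

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chain_digest(tuple_bytes: bytes, predecessor_bytes: bytes) -> bytes:
    # Digest over the tuple and its immediate predecessor in AGE order;
    # in the scheme this value is then signed by the data owner.
    return h(h(tuple_bytes) + h(predecessor_bytes))

t1 = b"GLO|Omar|M|L1|30"
t3 = b"FJO|Oscar|M|L2|27"   # immediate predecessor of t1 on AGE
print(chain_digest(t1, t3).hex())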

This technique has the advantage that it does not require an ADS to provide completeness. However, a disadvantage is that many signatures have to be calculated, one signature for each tuple in the database, and signatures are very expensive in terms of processing.

3.2.2 Approaches Based on Authenticated Data Structures

Data structures are widely used in different research areas. In outsourced databases, data structures are used as a tool that allows providing the integrity and completeness requirements of the AQP problem. The idea of this approach is to delegate the database along with a data structure. This structure allows the client to verify the integrity and completeness of the query result. The client raises a query, and the server searches for the results and extracts from the data structure the additional data necessary to prove the correctness and completeness of the response.

Many kinds of data structures have been proposed in the literature; the most utilized for solving the AQP problem are trees, bitmaps, HADS, and filters. In the following sections, the use of each structure in outsourced databases is described in detail.


Trees

A tree data structure is a collection of nodes, where each node stores a value along with a list of references to other nodes (called children). It also has the following constraints: no reference is duplicated, and no node points to the root.

These tree-based structures have been used to provide completeness in databases. There are different kinds of trees, for example, the Merkle hash tree, the MB-tree, and the red-black tree, among others. In general, the tree stores references to the values of each tuple for a specific attribute.

This technique has been applied in multiple works (Devanbu et al., 2003; Etemad and Küpçü, 2018). The main idea is to build a tree for each attribute of interest (attributes that are involved in queries). The leaf nodes store the digests obtained from hashing the attributes' values. Subsequently, each parent node stores a digest that is obtained from hashing the concatenation of the digests stored by its child nodes. This process is repeated until reaching the root node. Once the root node digest has been calculated, this value is signed by the client.

The root signature is stored locally by the client. In response to a query posed by the client, the server sends the results of the query and the set of nodes necessary to rebuild the root node. The client verifies the signature of the root, and if it verifies, then the client is satisfied that the query result is both correct and complete.

To illustrate this technique, consider a Merkle hash tree. We describe an example based on the Employees relation of Table 1.1. Figure 3.3 illustrates a Merkle hash tree for the ID attribute.

Figure 3.3: Merkle hash tree for the ID attribute (leaves h(LRL), h(MRB), h(VBR), h(FJO), h(GLO); internal nodes h1–h4, h12, h34; root hroot)

To build the Merkle hash tree of Figure 3.3, the hash of each value in the leaf nodes is computed:

h(LRL), h(MRB), h(VBR), h(FJO), h(GLO), h()


Then h1, h2, h3 and h4 are computed as follows:

h1 = h(h(LRL) ‖ h(MRB))

h2 = h(h(VBR) ‖ h())

h3 = h(h(FJO) ‖ h())

h4 = h(h(GLO) ‖ h())

Next, h12 and h34 are computed by hashing h1, h2 and h3, h4 respectively.

h12 = h(h1 ‖ h2) h34 = h(h3 ‖ h4)

The root is computed by hashing h12 and h34 as follows: hroot = h(h12 ‖ h34). Finally, the digest stored in the root is signed: Sroot,k(hroot).
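The root computation of Figure 3.3 can be reproduced with a few lines (a minimal sketch assuming SHA-256 as h and the empty string for the padding leaf h()):

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

pad = h(b"")                                    # the empty leaf h()
leaf = {x: h(x.encode()) for x in ("LRL", "MRB", "VBR", "FJO", "GLO")}

h1 = h(leaf["LRL"] + leaf["MRB"])
h2 = h(leaf["VBR"] + pad)
h3 = h(leaf["FJO"] + pad)
h4 = h(leaf["GLO"] + pad)
h12 = h(h1 + h2)
h34 = h(h3 + h4)
h_root = h(h12 + h34)                           # the digest the client signs
print(h_root.hex())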

The query process starts when the data-owner poses a query, e.g.:

SELECT * FROM Employees WHERE ID='MRB';

The related correct and complete response is (MRB, Betty, F, L1, 25).

The first step of the verification process is to construct the root by using the extra information returned by the server. Afterward, the signature is verified. Following the example, the extra information that the server must include is: h(LRL), h2, h34.

To construct the root, the client hashes the items that satisfy the query. Then, h(LRL) and h(MRB) are used to compute h1, which is used along with h2 to compute h′12. Next, h′12 and h34 are hashed to compute h′root. Finally, the client uses h′root to verify the signature S′root. If S′root = Sroot then the response of the query is correct and complete.

As mentioned at the beginning of this section, each attribute has an associated tree. When the client poses queries and the WHERE clause has more than one condition, it is necessary to combine and compare these trees in order to guarantee the integrity and completeness of the response. The operations used to compare and combine trees require heavy processing, which represents a disadvantage of this kind of tree-based structure.

Bitmaps

Bitmap indexes have gained a lot of popularity in recent times for their use in accelerating query processing. Bitmap indexes are very efficient for database applications where the data records are read-only and the queries usually return a large amount of data (Wu et al., 2003).

Currently, bitmaps are used as a structure that allows providing the completeness requirement of the AQP problem (Rodríguez-Henríquez and Chakraborty, 2013). Unlike tree-based structures, bitmaps are an easy structure to combine with respect to the logical operators of selection queries.

The idea of this work is to add an extra column to the database, where that extra column coincides with the number of the tuple. Then, an extra table is generated where


the obtained bitmaps are stored. Finally, the client protects these bitmaps using a MAC cryptographic primitive.

When the client makes a query to the server, the server responds with the tuples that satisfy the query and additional information; this information contains the bitmaps stored in the extra table and the corresponding tuple numbers.

The verification process is different for each type of query. In selection queries, the client retrieves the bitmaps in equality encoding and operates on them using the logical operators of the WHERE clause.

We will see an example where the operation of bitmaps for selection queries is explained.

Consider the relationship shown in Table 3.3.

ID   NAME   GENDER  LEVEL  AGE  Nonce
GLO  Omar   M       L1     30   1
LRL  Laura  F       L3     25   2
FJO  Oscar  M       L2     27   3
VBR  Rosy   F       L1     26   4
MRB  Betty  F       L1     25   5

Table 3.3: Employees with an extra attribute named Nonce

First, we calculate the bitmaps of the allowed attributes; in this case, we compute the bitmaps with equality encoding for GENDER and LEVEL.

Bitmap=(LEVEL, L1) = 10011
Bitmap=(LEVEL, L2) = 00100
Bitmap=(LEVEL, L3) = 01000
Bitmap=(GENDER, F) = 01011
Bitmap=(GENDER, M) = 10100

Let’s see how bitmaps work using the following query:

SELECT * FROM Employees Nonce WHERE GENDER=’F’ AND LEVEL=’L1’;

A complete answer for this query is:

(VBR, Rosy, F, L1, 26, 4), (MRB, Betty, F, L1, 25, 5)

Now we will verify that the answer is complete using the bitmaps. The bitmaps that are used in this process are chosen by analyzing the WHERE conditions; in this example the selected bitmaps are:

Bitmap(GENDER, F) = 01011
Bitmap(LEVEL, L1) = 10011

Note that the number of WHERE conditions should be equal to the number of selected bitmaps. In this example, the number of WHERE conditions is 2 and the number of selected bitmaps is 2, therefore the process continues; otherwise, the process ends, meaning that completeness has not been achieved.


After selecting the bitmaps, the logical operation indicated in the WHERE clause is computed; in this query the logical operator is AND:

  0 1 0 1 1
∧ 1 0 0 1 1
  0 0 0 1 1

Finally, the result is analyzed looking for the positions where the bits are set to one, in our example positions 4 and 5. This means that tuples 4 and 5 are the ones that meet the conditions of the above query. In this way, the bitmap index provides completeness to selection queries in the database.
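The bitwise combination above is straightforward to express in code; the following minimal sketch (Python, five tuples) combines the two equality-encoded bitmaps and reads off the matching tuple numbers:

N = 5                                  # number of tuples in the relation
bitmap_gender_f = 0b01011              # GENDER = 'F'
bitmap_level_l1 = 0b10011              # LEVEL  = 'L1'

result = bitmap_gender_f & bitmap_level_l1           # AND of the WHERE clause

matching = [i + 1 for i in range(N) if (result >> (N - 1 - i)) & 1]
print(format(result, "05b"), matching)                # 00011 -> tuples [4, 5]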

Hierarchical Data Structures (HADS)

A disadvantage of an ADS is that it needs a total order on the attributes' values. Providing a total order is difficult, since there are attributes that have repeated values, so it is difficult to establish an order. There are even attributes whose values cannot be sorted. To solve this problem, a technique named hierarchical ADS (HADS) was introduced (Etemad and Küpçü, 2018).

The idea behind this technique is to build a HADS for each attribute of each table to allow the server to provide proof of the correctness of its response. The HADS construction allows combining multiple ADSs. In each HADS level, there is an ADS that can be of a different type than the ADSs of the other levels. Figure 3.4, taken from (Etemad and Küpçü, 2018), illustrates a general HADS for a database.

Figure 3.4: A general HADS (levels: a database ADS over the table names, table ADSs over the column names, column ADSs over the column values, and primary-key ADSs over the primary keys)

Notice that in Figure 3.4 the highest ADS, the database ADS, stores the names of the tables. For each table, there is a table ADS, which stores the names of the columns in that table. For each column, there is a column ADS that stores the unique values in that


column. Finally, the last ADSs are primary-key ADSs, associated with each unique value vi in a column Cj, storing the primary-key values of the records that have vi in column Cj.

The verification process is done from the bottom up. In the first step, the client verifies the set of primary keys. If all of them verify, they are then used for verifying the proofs of the column ADSs. If this step is also successful, its results are used to verify the proofs of the table ADSs. The proof of the database ADS is verified in a similar manner. If all proofs are verified using all and only the tuples included in the response, the client accepts the answer as correct and complete.

This technique constructs a HADS using different ADSs at multiple levels in a hierarchical structure. First, the lowest level is constructed using the attributes' values. Then, these ADSs are grouped according to some relation, and their digests, along with their location information and the upper-level data, are used to build the upper-level ADSs. This process continues until a single ADS is built, whose root is stored as metadata by the client.

For example, we are going to build a HADS for the AGE attribute considering the relation of Table 1.1. Figure 3.5 shows a HADS with two levels. In the first level there is a Merkle hash tree as ADS, where the leaf nodes hold the primary keys; in our example the primary key is the ID: GLO, LRL, FJO, VBR, MRB. To build the Merkle hash tree of Figure 3.5, the hash of each value is computed and included as a leaf node, as mentioned in the Trees section. In the second level, there is an authenticated skip list as ADS; this skip list holds the information of the attribute of interest. If the attribute has repeated values, they are included only once; in our example the values of the AGE attribute are: 25, 26, 27, 30. To build the authenticated skip list, the hash of each value is computed:

h(25), h(26), h(27), h(30)

Then h1, h2, h3, h4 and h5 are computed as follows:

h1 = h(h(−∞) ‖ h2)    h2 = h(h(25) ‖ h11)    h3 = h(h(26) ‖ h12)
h4 = h(h(27) ‖ h13)    h5 = h(h(30) ‖ h14)

Subsequently, h6, h7 and h8 are computed:

h6 = h(h4 ‖ h5) h7 = h(h3 ‖ h6) h8 = h(h1 ‖ h7)

Finally, h9 is computed: h9 = h(h8 ‖ h(+∞)).

Notice that the root of this structure is given by h9; h9 is stored by the client and the HADS is sent to the server. During the query stage, the client can pose queries. Suppose that the client performs the following query:

SELECT * FROM Employees WHERE AGE=’27’ AND ID=’FJO’

The correct and complete response for the query is (FJO, Oscar, M, L2, 27).

To carry out the verification process, the server sends the response of the query plus additional information; this information is used by the client to re-build the root of the HADS and verify the correctness and completeness of the response.


Figure 3.5: Two-level HADS (Level 1: a Merkle hash tree over the primary keys LRL, MRB, VBR, FJO, GLO with internal nodes h11–h14; Level 2: an authenticated skip list over the AGE values 25, 26, 27, 30 with boundaries h(−∞) and h(+∞) and internal nodes h1–h9)

The second-level ADS needs to prove membership of 27. The proof includes the boundary records and all the internal node values required for verification at the client. In this example, this can be done by returning essentially the searched value along with the hashes of the nodes required to obtain the corresponding digest: 26, 27, 30, h1, h12, h14, h(+∞). At the first level, the Merkle tree needs to prove membership of FJO; this is done by returning FJO together with the empty padding value. Thus, the additional information sent by the server will look like:

26, 27 (FJO, ), 30, h1, h12, h14, h(+∞)

Going back to our example, the verification of the two-level HADS is required to guarantee the correctness of the response. According to the verification process, the first step is to verify in a bottom-up fashion, in this case, the verification of the Merkle hash tree. Given the proof (FJO, ), the hashes of each item are computed, h′(FJO) and h′(), and subsequently the hash h′13 = h(h′(FJO) ‖ h′()). If h′13 = h13 then the Merkle hash tree passes the verification.

extracts the result 27 and boundary records 26, 30, checks if 26 < 27 < 30, and computesthe hashes of records in the result set: h′(26), h′(27), h′(30). Then, it uses h′(26) andh12 to compute h′3. It uses h′27 and h

′13 to compute h′4. It uses h′(30) and h14 to

compute h′5, which is used together with h′4 to compute h′6. h′6 is used together with h′3

to compute h′7, which is used together with h1 to compute h′8. Finally, it uses h′8 andh(+∞) to computes h′9, the digest of the computed ADS. Now, it compares h′9 againstthe digest stored locally h9. If h

′9 = h9 then the response of above query is correct and

completeness.The authors in Etemad and Küpçü (2018), claim that the HADS are more efficient

in terms of server processing costs than any other ADS, because the HADS stores unique


values in the upper level, independently of the number of repetitions that can exist in the database. Thus, the ADS generated by this technique is smaller than a traditional ADS, because the latter does not avoid repetitions. On the other hand, to answer queries, the server only looks for the required values, and it has access to the set of tuples that contain each value without further computation. However, this technique uses multiple structures for each attribute, increasing the storage overhead on the server.

Bloom filter

For some time, integrity was provided at tuple-level granularity, looking to balance functionality and storage costs. Nowadays, it is possible to provide integrity at cell-level granularity by means of filter data structures, introduced by (Ferretti et al., 2018). A Bloom filter is a probabilistic data structure that is used to represent a set of items. This structure allows insertion and update operations, in addition to membership tests with a false positive rate. Bloom filters allow false positives but not false negatives. Here, we will discuss the insertion operation and membership tests.

This technique adds to all database tables a new column storing a Bloom filter, which allows the client to verify the integrity of all the data stored in the corresponding tuple. Usually, the Bloom filter is described by an array BF[m] of m bits, initially all set to 0. To represent a set S = {x1, x2, . . . , xn} of n elements, a Bloom filter uses k independent hash functions h1, h2, . . . , hk with range {1, . . . , m}. For each element xi ∈ S, the bits hl(xi) are set to 1 for 1 ≤ l ≤ k.

The process to build the Bloom filter is illustrated by an example. Consider the Employees relation shown in Table 1.1; we compute the Bloom filter for the first tuple, and the process is similar to generate the filters of the other tuples.

Assume that the Bloom filter is described by the following array BF, with all values initialized to 0:

BF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

The first step is to concatenate the attribute with its respective value; in the example, we concatenate ID with GLO (ID ‖ GLO). The next step is to compute the k hash functions for (ID ‖ GLO); through the hash functions we get the positions of the array where the bits must be set to 1. For this example, we choose k = 3: h1(ID ‖ GLO) = 3, h2(ID ‖ GLO) = 11, and h3(ID ‖ GLO) = 7.

BF 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

The next element to insert is (NAME ‖ Omar); the positions are calculated using the hash functions: h1(NAME ‖ Omar) = 5, h2(NAME ‖ Omar) = 12, and h3(NAME ‖ Omar) = 0.

BF 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14


The next element to insert is (GENDER ‖M), compute the positions: h1(GENDER ‖M) = 10, h2(GENDER ‖M) = 12 and h3(GENDER ‖M) = 2.

BF 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

The next element to insert is (Level ‖ L1), compute the positions: h1(LEV EL ‖ L1) =8, h2(LEV EL ‖ L1) = 14 and h3(LEV EL ‖ L1) = 3.

BF 1 0 1 1 0 1 0 1 1 0 1 1 1 0 1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

The next element to insert is (AGE ‖ 30), compute the positions: h1(AGE ‖ 30) = 10,h2(AGE ‖ 30) = 1 and h3(AGE ‖ 30) = 5.

BF 1 1 1 1 0 1 0 1 1 0 1 1 1 0 1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

This last Bloom filter is added to the database. After preparing the data, the client can consult the database; suppose that it poses the following query:

SELECT NAME, LEVEL FROM Employees WHERE AGE=30;

The correct and complete response for the query is: (Omar, L1).

To provide integrity for the response, membership tests are performed. Membership tests are used to see whether an item is in the Bloom filter. If the membership test is positive, two things may happen: the element is in the filter, or we are in the presence of a false positive. Otherwise, the element is not in the filter. Following the example, we compute the membership test for (Omar, L1). First, we concatenate the name of each attribute with its respective value, that is, (NAME ‖ Omar) and (LEVEL ‖ L1). Then, we compute the positions in the Bloom filter using the hash functions: h1(NAME ‖ Omar) = 5, h2(NAME ‖ Omar) = 12, h3(NAME ‖ Omar) = 0. If all the obtained positions contain a 1, then (NAME ‖ Omar) is in the filter. Otherwise, the element (NAME ‖ Omar) is not in the filter and an integrity violation was detected. Similarly, the test is performed for (LEVEL ‖ L1).
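A minimal sketch of these insertion and membership-test operations (Python, with k = 3 positions derived from salted SHA-256 hashes; the positions will differ from the hand-worked values above):

import hashlib

M, K = 15, 3                      # filter size and number of hash functions
bf = [0] * M

def positions(item: bytes):
    # k positions in [0, M) obtained from independent salted hashes.
    return [int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % M
            for i in range(K)]

def insert(item: bytes):
    for p in positions(item):
        bf[p] = 1

def member(item: bytes) -> bool:
    # True may be a false positive; False is always correct.
    return all(bf[p] == 1 for p in positions(item))

insert(b"NAME||Omar")
insert(b"LEVEL||L1")
print(member(b"NAME||Omar"), member(b"NAME||Laura"))   # True, (almost surely) False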

The Bloom filter is a structure that allows a finer granularity for the AQP problem than the structures presented in the previous sections. This is important because it directly impacts the precision of the response provided by the server in selection queries. To achieve this granularity, the Bloom filter construction computes k hash functions to insert a single element, which increases the processing costs. However, due to the fine granularity provided by this technique, the storage and network overhead are reduced.

3.3 Discussion

There are multiple axes that affect the AQP problem, such as granularity and query types. Designing a scheme that meets all the requirements of the AQP problem is a big


challenge for the community. Thus, the proposed techniques focus on a subset of the AQP requirements. In this chapter, we have discussed the most relevant techniques. This section presents a discussion about the advantages and disadvantages of these techniques.

In general, the efficiency of the proposed schemes depends on the efficiency of the authenticated data structure they use and on the costs imposed by the cryptographic primitives.

Due to the above, many proposals have explored the use of different structure types (Devanbu et al., 2003; Rodríguez-Henríquez and Chakraborty, 2014; Etemad and Küpçü, 2018; Ausekar and Pasupuleti, 2018). The tree-based approach requires computing a tree for each attribute, which imposes a considerable server storage cost. Moreover, in the query phase, trees are well known for their efficiency in searching; indeed, trees are used to index plaintext databases. This characteristic allows answering single-condition selection queries efficiently. In contrast, one big disadvantage of this structure is that trees are difficult to combine, increasing the cost of multiple conditions in the WHERE clause. Furthermore, the efficiency of the approaches based on HADS is determined by the ADSs included in their levels.

Like trees, bitmaps are another structure usually used to index plaintext databases. This structure allows queries to be answered efficiently (Wu et al., 2004). There are multiple studies where the use of trees and bitmaps in databases is discussed, revealing that bitmaps are more efficient in storage because they use only one bit to represent a tuple. Also, they are more efficient in answering multiple-condition queries due to the ease of combining them. These advantages also hold in the cryptographic schemes based on this structure. On the other hand, bitmaps are difficult to update. Thus, it is difficult to adapt cryptographic schemes based on bitmaps to dynamic databases, even though there are a few proposals devoted to this.

The signature chain scheme has the advantage that it does not require an ADS to provide completeness. However, this technique needs to calculate many signatures, which affects the processing cost, so the signature chain is considered an expensive cryptographic approach.

Finally, filters are a structure that allows providing integrity with a finer granularity (cell level) than any other cryptographic primitive (due to the storage costs imposed). However, there is no proposal based on filters that considers both integrity and completeness.


Chapter 4

A Cryptographic Scheme for Cell Granularity Authenticated Query Processing in Cloud Databases

In this chapter, we present the cryptographic scheme for authenticated query processing on static outsourced databases. In Sections 4.1 and 4.2, we explain the use of the Cuckoo filter and bitmaps to provide integrity and completeness, respectively. An overview of the scheme is shown in Section 4.3; in the same section, we discuss the scheme design in detail. The security analysis and security notion are described in Section 4.4. Finally, in Section 4.5 an evaluation of the scheme and a cost analysis are carried out.

4.1 Use of the Cuckoo Filter to Provide Integrity in the Database

In this section, we explain how the Cuckoo filter works to provide integrity in the outsourced database. To explain it in a better way, we use the relation shown in Table 4.1.

To guarantee the integrity of the relation in Table 4.1, an extra column named CF is added; in this column the filter Fi is stored for each tuple ti, in this case 1 ≤ i ≤ 5.

ID   NAME   GENDER  LEVEL  AGE  CF
GLO  Omar   M       L1     30   F1
LRL  Laura  F       L3     25   F2
FJO  Oscar  M       L2     27   F3
VBR  Rosy   F       L1     26   F4
MRB  Betty  F       L1     25   F5

Table 4.1: Employees 1

We explain the construction of the filter F1; for the filters F2, F3, F4, and F5 the process is the same. Suppose that F1 has the following structure:


F1 = [0: ] [1: ] [2: ] [3: ]

To insert an element in the filter, the name of the attribute is concatenated with its respective value and row number; then, the fingerprint is computed:

fingerprint(ID ‖ GLO) = fID,GLO

Next, we compute the positions in the filter where the fingerprint fID,GLO of the element (ID ‖ GLO) can be stored:

hash(ID ‖ GLO) = 1
hash(ID ‖ GLO) ⊕ hash(fID,GLO) = 3

If the filter has space in either of these positions then fID,GLO is inserted into the filter.

F1 = [0: ] [1: fID,GLO] [2: ] [3: ]

The next element to insert is (NAME ‖ Omar); we compute the fingerprint and the positions:

fingerprint(NAME ‖ Omar) = fNAME,Omar
hash(NAME ‖ Omar) = 2
hash(NAME ‖ Omar) ⊕ hash(fNAME,Omar) = 0

If the filter has space in either of these positions then fNAME,Omar is inserted into the filter.

F1 = [0: ] [1: fID,GLO] [2: fNAME,Omar] [3: ]

The next element to insert is (GENDER ‖ M); we compute:

fingerprint(GENDER ‖ M) = fGENDER,M
hash(GENDER ‖ M) = 1
hash(GENDER ‖ M) ⊕ hash(fGENDER,M) = 0

If the filter has space in either of these positions then fGENDER,M is inserted into the filter.

F1 = [0: ] [1: fID,GLO, fGENDER,M] [2: fNAME,Omar] [3: ]

The next element to insert is (LEVEL ‖ L1); we compute:

fingerprint(LEVEL ‖ L1) = fLEVEL,L1


hash(LEVEL ‖ L1) = 2
hash(LEVEL ‖ L1) ⊕ hash(fLEVEL,L1) = 3

If the filter has space in either of these positions then fLEVEL,L1 is inserted into the filter.

F1 = [0: ] [1: fID,GLO, fGENDER,M] [2: fNAME,Omar, fLEVEL,L1] [3: ]

The last element to insert is (AGE ‖ 30); we compute:

fingerprint(AGE ‖ 30) = fAGE,30
hash(AGE ‖ 30) = 1
hash(AGE ‖ 30) ⊕ hash(fAGE,30) = 2

If the filter had space in either of these positions then fAGE,30 would be inserted into the filter. In this case, however, positions 1 and 2 of the filter F1 are full.

F1 = [0: ] [1: fID,GLO, fGENDER,M] [2: fNAME,Omar, fLEVEL,L1] [3: ]

We randomly select position 1 or 2; we choose position 1. After choosing the position, a fingerprint is randomly selected; in our example, we choose fGENDER,M. To continue, we swap fAGE,30 with the fingerprint fGENDER,M.

F1 = [0: ] [1: fID,GLO, fAGE,30] [2: fNAME,Omar, fLEVEL,L1] [3: ]

With the recovered fingerprint fGENDER,M, we calculate the alternate position:

hash(GENDER ‖M)⊕ hash(fGENDER,M) = 0

If the filter has space in this position then fGENDER,M is inserted into the filter.

F1 = [0: fGENDER,M] [1: fID,GLO, fAGE,30] [2: fNAME,Omar, fLEVEL,L1] [3: ]

In this example, all the elements of the first tuple of Table 4.1 are already inserted in the filter F1. Now, suppose that the relation in Table 4.1 is outsourced to an untrusted server, and the client executes a query like the following:

SELECT NAME, GENDER, CF FROM Employees 1 WHERE AGE=’30’;


The server responds with: (Omar, M, F1). Note that the client also requests the filter F1 from the server; the filter F1 is necessary to perform the membership tests and verify the integrity of (Omar, M, F1).

Given the response (Omar, M, F1), first we verify the integrity of Omar; the process consists in obtaining the fingerprint of (NAME ‖ Omar) as follows:

fingerprint(Name ‖ Omar) = fNAME,OMAR

Then, the client computes the positions in the filter:

hash(NAME ‖ Omar) = 2
hash(NAME ‖ Omar) ⊕ hash(fNAME,Omar) = 0

If position 2 or position 0 contains fNAME,Omar then Omar passes the membership test (up to the false positive rate); this means that integrity was not violated.

The process is similar to verify the integrity of M:

fingerprint(GENDER ‖ M) = fGENDER,M
hash(GENDER ‖ M) = 1
hash(GENDER ‖ M) ⊕ hash(fGENDER,M) = 0

If position 1 or position 0 contains fGENDER,M then M passes the membership test (up to the false positive rate).
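The insertion and membership-test logic illustrated above follows partial-key cuckoo hashing. The following minimal sketch (Python, illustrative parameters, unkeyed hashes instead of the keyed construction defined in Section 4.3) captures the idea:

import hashlib

NUM_BUCKETS, BUCKET_SIZE, MAX_KICKS = 8, 2, 16        # power-of-two bucket count
buckets = [[] for _ in range(NUM_BUCKETS)]

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def fingerprint(item: bytes) -> int:
    return _h(b"fp" + item) % 256 or 1                 # short fingerprint, never 0

def _alt(pos: int, f: int) -> int:
    # Alternate bucket via XOR with a hash of the fingerprint (an involution
    # because NUM_BUCKETS is a power of two).
    return (pos ^ _h(bytes([f]))) % NUM_BUCKETS

def insert(item: bytes) -> bool:
    f = fingerprint(item)
    p1 = _h(item) % NUM_BUCKETS
    p2 = _alt(p1, f)
    for p in (p1, p2):
        if len(buckets[p]) < BUCKET_SIZE:
            buckets[p].append(f)
            return True
    p = p1                                             # both full: start evicting
    for _ in range(MAX_KICKS):
        f, buckets[p][0] = buckets[p][0], f            # swap with a stored fingerprint
        p = _alt(p, f)
        if len(buckets[p]) < BUCKET_SIZE:
            buckets[p].append(f)
            return True
    return False                                       # filter considered full

def member(item: bytes) -> bool:
    f = fingerprint(item)
    p1 = _h(item) % NUM_BUCKETS
    return f in buckets[p1] or f in buckets[_alt(p1, f)]

insert(b"NAME||Omar||1")
print(member(b"NAME||Omar||1"), member(b"NAME||Laura||1"))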

In this work, we use the Cuckoo filter to provide integrity at the cell level in the outsourced database. In Section 4.3 we describe the design of the proposed scheme and generalize the use of the Cuckoo filter.

4.2 Use of the Bitmap Index to Provide Completeness in the Database

In this section, we explain how the bitmap index works to provide completeness in the database. To work with bitmaps it is necessary to add an extra column named Nonce; in this column we write the corresponding number of the tuple, as illustrated in Table 4.2.

ID   NAME   GENDER  LEVEL  AGE  Nonce
GLO  Omar   M       L1     30   1
LRL  Laura  F       L3     25   2
FJO  Oscar  M       L2     27   3
VBR  Rosy   F       L1     26   4
MRB  Betty  F       L1     25   5

Table 4.2: Employees 2

First, we compute the bitmaps of the allowed attributes; in this case, we calculate the bitmaps with equality encoding for LEVEL and GENDER.

Bitmap=(LEVEL, L1) = 10011    Bitmap=(GENDER, F) = 01011
Bitmap=(LEVEL, L2) = 00100    Bitmap=(GENDER, M) = 10100
Bitmap=(LEVEL, L3) = 01000


Let’s see how bitmaps work using the following query:

SELECT NAME, Nonce FROM Employees 2 WHERE GENDER=’F’ AND LEVEL=’L1’

A complete response for above query is:

(Rosy, 4), (Betty, 5)

Now, we verify that the response is complete using the bitmaps. The bitmaps that are used in this process are chosen by analyzing the WHERE conditions; in this example the selected bitmaps are:

Bitmap(GENDER, F) = 01011
Bitmap(LEVEL, L1) = 10011

Note that the number of WHERE conditions should be equal to the number of selected bitmaps. For this example, the number of WHERE conditions is two and the number of selected bitmaps is two; therefore the process continues, otherwise the process ends, meaning that completeness has not been achieved.

After selecting the bitmaps, the logical operation indicated in the WHERE clause is computed; in this query the logical operator is AND:

  0 1 0 1 1
∧ 1 0 0 1 1
  0 0 0 1 1

We analyze the result and count the positions of the bits from left to right; we see that the only bits set are numbers 4 and 5. This means that the only tuples that satisfy the above query are tuples 4 and 5. In this way, the bitmap index provides completeness to selection queries in the database.

For range queries, the client computes the bitmaps with less-than encoding. From these bitmaps, the client applies algorithms (Rodríguez-Henríquez and Chakraborty, 2014) to retrieve another bitmap in any encoding (≤, >, ≥).

Let's see a particular example. Consider the relation shown in Table 4.2 and compute the bitmaps with less-than encoding for the AGE attribute. The bitmaps obtained are:

Bitmap<(AGE, 25) = 00000
Bitmap<(AGE, 26) = 01001
Bitmap<(AGE, 27) = 01011
Bitmap<(AGE, 30) = 01111

After generating the information necessary to verify completeness, the client may pose queries. Suppose that the client performs the following query:

SELECT NAME, Nonce FROM Employees 2 WHERE AGE>25;


A complete answer for this query is:

(Omar, 1), (Oscar, 3), (Rosy, 4)

To verify completeness, the client has to retrieve the bitmap corresponding to the tuples that meet the condition of the query. In this example, we need to compute Bitmap>(AGE, 25). The Bitmap>(AGE, 25) is obtained using the complement of the less-than bitmap, the bitmap in equality encoding, and the following logic operation:

Bitmap>(AGE, 25) = Bitmap<(AGE, 25)^C ⊕ Bitmap=(AGE, 25)

Then, compute Bitmap<(AGE, 25)^C: we know that Bitmap<(AGE, 25) = 00000; thus, the complement is:

Bitmap<(AGE, 25)C = 11111

Now, compute Bitmap=(AGE, 25):

Bitmap=(AGE, 25) = Bitmap<(AGE, 25) ⊕ Bitmap<(AGE, 26) = 00000 ⊕ 01001 = 01001

Next, compute Bitmap>(AGE, 25):

Bitmap>(AGE, 25) = Bitmap<(AGE, 25)^C ⊕ Bitmap=(AGE, 25) = 11111 ⊕ 01001 = 10110
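The same derivation can be carried out with bitwise operations; a minimal sketch (Python, five tuples) follows:

N = 5
MASK = (1 << N) - 1

bitmap_lt = {25: 0b00000, 26: 0b01001, 27: 0b01011, 30: 0b01111}   # less-than encoding

# Bitmap_=(AGE, 25) = Bitmap_<(AGE, 26) XOR Bitmap_<(AGE, 25)
bitmap_eq_25 = bitmap_lt[26] ^ bitmap_lt[25]

# Bitmap_>(AGE, 25) = complement(Bitmap_<(AGE, 25)) XOR Bitmap_=(AGE, 25)
bitmap_gt_25 = (bitmap_lt[25] ^ MASK) ^ bitmap_eq_25

matching = [i + 1 for i in range(N) if (bitmap_gt_25 >> (N - 1 - i)) & 1]
print(format(bitmap_gt_25, "05b"), matching)    # 10110 -> tuples [1, 3, 4]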

Finally, we analyze the result and count the positions of the bits from left to right; notice that the only bits set are 1, 3, and 4. This means that the tuples that satisfy the query are exactly 1, 3, and 4. In this way, the bitmap index provides completeness to range queries in a database. In the next section, the use of these bitmaps is presented in general terms.

4.3 Scheme Design

The scheme is designed to be applied in two phases: in the first phase, the data are prepared to be sent to the server; in the second phase, the authenticated query process is performed.

The diagram in Figure 4.1 illustrates the first phase. As can be seen in the diagram, there is a function that takes the client's database as input and produces a new database as output, which consists of the original data plus control information. It is important to mention that this phase is carried out on the client side, and this new database is sent to the server.

In the second phase of the scheme, a query translator is used to generate, from the query made by the client, specific queries with the appropriate structure, so that the server can correctly execute the query on the new database. As shown in the diagram of


Figure 4.1: Outsourced data (on the client side, a function transforms the data into data plus control information, which is then sent to the untrusted server)

Figure 4.2: Query process (the client's query goes through a query translator to the untrusted server, which stores the data plus control information; the answer is checked by a verification process on the client)

Figure 4.2, the response generated by the server is analyzed by a verification process to ensure the integrity and completeness of the response.

We propose a cryptographic scheme with granularity at the Cell Level to provide Integrity and Completeness in Outsourced DataBases (CLIC-ODB). This scheme is based on the Cuckoo filter (Fan et al., 2014) and RDAS (Rodríguez-Henríquez and Chakraborty, 2013). CLIC-ODB has a set of algorithms, which are described in detail in the following paragraphs.

Key generation algorithm
We denote the key space by K and let Γ be the key generation algorithm of CLIC-ODB. Γ is an algorithm run by the client to select a set of keys K = {k1, k2, k3, k4} randomly from K. The key k1 is used to build the message authentication codes, and k2, k3, k4 are used in the construction of the Cuckoo filter. In this scheme k1, k2, k3, k4 have the same size.
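A minimal sketch of Γ (assuming Python's secrets module and an arbitrary 128-bit key size) could be:

import secrets

KEY_BYTES = 16                                   # all four keys have the same size

def Gamma():
    # Select the set of keys K = {k1, k2, k3, k4} uniformly at random.
    return [secrets.token_bytes(KEY_BYTES) for _ in range(4)]

k1, k2, k3, k4 = Gamma()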

Function F
Let F : R × K → R′ be a function that transforms a set of relations R into another set of relations R′ using the set of keys K. The function F is used in the first phase of CLIC-ODB to prepare the client's database and send it to the server. We describe the procedure assuming that the set of relations R consists of a single relation Rmn.


      a1    a2    ...   aj    ...   an
t1    v11   v12   ...   v1j   ...   v1n
t2    v21   v22   ...   v2j   ...   v2n
...   ...   ...   ...   ...   ...   ...
ti    vi1   vi2   ...   vij   ...   vin
...   ...   ...   ...   ...   ...   ...
tm    vm1   vm2   ...   vmj   ...   vmn

Table 4.3: Rmn

A client who wants to store the relation Rmn on an untrusted server transforms Rmn into R′mn using the function F and the set of keys K.

The set of relations R′mn = {Rαmn, Rβmn, Rγmn}, i.e., the function F converts Rmn into three relations: Rαmn, Rβmn, and Rγmn.

The relation Rαmn : T × A+ → Dom(aj), shown in Table 4.4, is defined on the set of attributes A+ = {a1, a2, . . . , an, NonceA, CF}; thus Rαmn has two more attributes than Rmn, and both Rmn and Rαmn have the same number of tuples.

The NonceA attribute takes values from the set M = {1, 2, 3, . . . , m} using the following rule: Rαmn(ti, NonceA) → i.

The CF attribute is a tag to provide integrity to the relation Rαmn based on the Cuckoo filter. This attribute stores a short control structure, namely a cryptographic digest (Fi), that allows the client to verify the integrity of all the data stored in the corresponding tuple. Fi identifies the Cuckoo filter associated with the tuple ti; the value Fi is a bit string calculated according to the following equation:

Fi = CF (ti) (4.1)

      a1    a2    ...   aj    ...   an    NonceA  CF
t1    v11   v12   ...   v1j   ...   v1n   1       F1
t2    v21   v22   ...   v2j   ...   v2n   2       F2
...   ...   ...   ...   ...   ...   ...   ...     ...
ti    vi1   vi2   ...   vij   ...   vin   i       Fi
...   ...   ...   ...   ...   ...   ...   ...     ...
tm    vm1   vm2   ...   vmj   ...   vmn   m       Fm

Table 4.4: Relation Rαmn

The Cuckoo filter is built by Algorithm 1. This algorithm computes a filter for each tuple ti. First, it calculates the fingerprint f of the item to be inserted; in this case, the item is determined by the concatenation of the attribute, its value, and the row number. Each item has two candidate buckets determined by:

P1 ← hash(k3, aj ‖ vij ‖ i)    (4.2)
P2 ← hash(k3, aj ‖ vij ‖ i) ⊕ hash(k4, fij)    (4.3)


If either of the two buckets has an empty entry, the algorithm inserts f into that free bucket and the insertion completes.

Otherwise, the algorithm randomly selects an entry e from bucket P1 or P2 and swaps f with the fingerprint fe stored in entry e. The new position for fe is calculated by the equation:

Paux ← Paux ⊕ hash(k4, fe)

If bucket Paux has an empty entry, the algorithm inserts fe into that free bucket. Finally, if the algorithm fails to insert the item, then the filter is considered full.

The tuples in Rαmn are populated according to the procedure shown in Algorithm 2.

Table 4.5 shows the relation Rβmn that is obtained from the function F applied to the relation Rmn. The relation Rβmn : N × B+ → M+ contains the attributes B+ = {Name, Value, Bitmaps, NonceB, MAC}, independent of the attributes in the relation Rmn.

      Name  Value  Bitmaps    NonceB  MAC
t1    x1    y1     {0,1}^m    m+1     Mβ1
t2    x2    y2     {0,1}^m    m+2     Mβ2
...   ...   ...    ...        ...     ...
tr    xr    yr     {0,1}^m    m+r     Mβr
...   ...   ...    ...        ...     ...
tN    xN    yN     {0,1}^m    m+N     MβN

Table 4.5: Relation Rβmn

The Name attribute has Dom(Name) = {x1, x2, . . . , xr}, where xr ∈ A are allowed attributes.

The Value attribute has Dom(Value) = Dom(a1) ∪ Dom(a2) ∪ . . . ∪ Dom(ap). Let Ω = ∪_{i=1}^{p} (ai × Dom(ai)); note that the elements of Ω are ordered pairs of the form (x, y), where x ∈ Dom(Name) and y ∈ Dom(Value), and |Ω| = Σ_{i=1}^{p} Card(ai) = N.

The Bitmaps attribute stores a binary string; these strings are used later in the verification process to provide completeness.

The NonceB attribute takes values from the set M+ = {m+1, m+2, m+3, . . . , m+N} using the following rule: Rβmn(tr, NonceB) → m + r.

The MAC attribute stores a tag, namely a message authentication code (Mβr), that allows the client to verify the integrity of all the data stored in the corresponding tuple. The value Mβr is a bit string calculated according to the following equation:

Mβr = MAC(k1, Name ‖ Value ‖ Bitmaps ‖ NonceB)    (4.4)


Algorithm 1 Cuckoo filter.
Input: tuple ti
Output: Cuckoo filter Fi

function CF(ti)
    bucket[e][l] ← 0 for e = 1 to 4 and l = 1 to BucketNum
    for j = 1 to n do
        fij ← fingerprint(aj ‖ vij ‖ i)
        P1 ← hash(k3, aj ‖ vij ‖ i)
        P2 ← hash(k3, aj ‖ vij ‖ i) ⊕ hash(k4, fij)
        if bucket P1 or bucket P2 has an empty entry then
            insert fij into that free entry and continue with the next j
        else
            Paux ← P1 or P2 (chosen at random)
            for count = 1 to MaxNumAttempts do
                randomly select an entry e from bucket Paux
                swap fij with the fingerprint fe stored in entry e
                Paux ← Paux ⊕ hash(k4, fe)
                if bucket Paux has an empty entry then
                    insert fe into that free entry and continue with the next j
                end if
            end for
            return Failure (the filter is full)
        end if
    end for
    Fi ← bucket
    return Fi
end function


Algorithm 2 Rαmn.
Input: Relation Rmn
Output: Relation Rαmn

1: function RALPHA(Rmn)
2:   for i = 1 to m do
3:     for j = 1 to n do
4:       Rαmn(ti, aj) ← Rmn(ti, aj)
5:     end for
6:     Rαmn(ti, NonceA) ← i
7:     Rαmn(ti, CF) ← CF(tαi)
8:   end for
9: end function

Algorithm 3 Rβmn.
Input: Relation Rmn
Output: Relation Rβmn

1: function RBETA(Rmn)
2:   for r = 1 to N do
3:     Rβmn(tr, Name) ← xr
4:     Rβmn(tr, Value) ← yr
5:     Rβmn(tr, Bitmaps) ← BitmapsRmn(xr, yr)
6:     Rβmn(tr, NonceB) ← m + r
7:     S ← Rβmn(tr, Name) ‖ Rβmn(tr, Value) ‖ Rβmn(tr, Bitmaps) ‖ Rβmn(tr, NonceB)
8:     Rβmn(tr, MAC) ← MAC(k1, S)
9:     Mβr ← Rβmn(tr, MAC)
10:  end for
11: end function

Where:

k1 is the key.

‖ is an operator for the secure concatenation of two inputs.

The tuples in Rβmn are populated according to the procedure shown in Algorithm 3.

The relation Rγmn : p × C+ → M++ is shown in Table 4.6; this relation contains the attributes C+ = {Name, List, Array, NonceC, MAC}.


Algorithm 4 Rγmn.
Input: Relation Rmn
Output: Relation Rγmn

1: function RGAMMA(Rmn)
2:   for q = 1 to p do
3:     Rγmn(tq, Name) ← bq
4:     Rγmn(tq, List) ← Lq
5:     Rγmn(tq, Array) ← Arrq
6:     Rγmn(tq, NonceC) ← m + N + q
7:     S ← Rγmn(tq, Name) ‖ Rγmn(tq, List) ‖ Rγmn(tq, Array) ‖ Rγmn(tq, NonceC)
8:     Rγmn(tq, MAC) ← MAC(k1, S)
9:     Mγq ← Rγmn(tq, MAC)
10:  end for
11: end function

      Name  List  Array  NonceC   MAC
t1    b1    L1    Arr1   m+N+1    Mγ1
t2    b2    L2    Arr2   m+N+2    Mγ2
...   ...   ...   ...    ...      ...
tq    bq    Lq    Arrq   m+N+q    Mγq
...   ...   ...   ...    ...      ...
tp    bp    Lp    Arrp   m+N+p    Mγp

Table 4.6: Relation Rγmn

The Name attribute has Dom(Name) = {b1, b2, . . . , bq}, where {b1, b2, . . . , bq} ⊆ {a1, a2, . . . , ap} are the attributes that allow range queries.

The List attribute has Dom(List) = {L1, L2, . . . , Lq}, where Lq ∈ Dom(bq).

The Array attribute has Dom(Array) = {Arr1, Arr2, . . . , Arrq}, where Arrq is an array obtained from the bitmap matrix for each item in Dom(Name).

The List and Array attributes stored in the relation Rγ are used later in the verification process to provide completeness.

The NonceC attribute takes values from the set M++ = {m+N+1, m+N+2, m+N+3, . . . , m+N+p} using the following rule: Rγmn(tq, NonceC) → m + N + q, 1 ≤ q ≤ p.

The MAC attribute stores a tag (Mγq) that allows the client to verify the integrity of all the data stored in the corresponding tuple. The value Mγq is a bit string calculated according to the following equation:

Mγq = MAC(k1, Name ‖ List ‖ Array ‖ NonceC)    (4.5)

The tuples in Rγmn are populated according to the procedure shown in Algorithm 4.


Notice that the function F is executed on the client side, and the relations Rαmn, Rβmn and Rγmn are sent to and stored on the server.

Function φ
The function φ converts a query for the original relation Rmn into a set of queries which are executed on the relations Rαmn, Rβmn and Rγmn. Given as input a valid query Q, φ(Q) outputs three queries: a query Qα for the relation Rαmn, another query Qβ for Rβmn, and a query Qγ for Rγmn.

Let Q be a query defined by:

Q : SELECT aj FROM Rmn WHERE a1 = vi1 ∆ . . . ∆ an = vin ∆ b1 ⊙ vi1 ∆ . . . ∆ bp ⊙ vin

Where:

∆ ∈ {∨, ∧, ⊕, . . .} is the set of logical operators.

⊙ ∈ {<, ≤, =, >, ≥} is the set of range operators.

φ (Q) will output the following queries:

To build Qα, the function φ adds the NonceA and CF attributes to the SELECT clause. In the FROM clause, it swaps the name of the relation Rmn for the relation Rαmn. Then, in the WHERE clause, it keeps all the conditions. The query Qα is shown below:

Qα : SELECT aj, NonceA, CF FROM Rαmn WHERE a1 = vi1 ∆ . . . ∆ an = vin ∆ b1 ⊙ vi1 ∆ . . . ∆ bp ⊙ vin

For the query Qβ, the function φ replaces the aj attributes in the SELECT clause with the operator ∗. Then, in the FROM clause, it changes the relation Rmn to the relation Rβmn. Finally, the function φ builds a disjunction of conjunctions, as shown below:

Qβ : SELECT ∗ FROM Rβmn WHERE (Name = a1 AND Value = vi1) OR . . . OR (Name = an AND Value = vin);

For the query Qγ, the function φ replaces the aj attributes in the SELECT clause with the operator ∗. Then, in the FROM clause, it swaps the relation Rmn for the relation Rγmn. Finally, the function φ builds a disjunction of conditions on the attributes that allow range queries, as shown below:

Qγ : SELECT ∗ FROM Rγmn WHERE Name = b1 OR . . . OR Name = bp;
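The following sketch illustrates the rewriting performed by φ for a simple query with equality conditions and one range attribute. The relation names, the helper signature and the use of AND as the only connective are assumptions made for illustration; range conditions on the b attributes would be appended to Qα's WHERE clause in the same way, and are omitted here for brevity.

    def phi(select_attrs, eq_conds, range_attrs):
        """eq_conds: list of (attribute, value) pairs from the equality conditions of Q;
        range_attrs: the attributes b1..bp appearing in range conditions."""
        q_alpha = ("SELECT " + ", ".join(select_attrs) + ", NonceA, CF FROM R_alpha WHERE "
                   + " AND ".join(f"{a} = '{v}'" for a, v in eq_conds) + ";")
        q_beta = ("SELECT * FROM R_beta WHERE "
                  + " OR ".join(f"(Name = '{a}' AND Value = '{v}')" for a, v in eq_conds) + ";")
        q_gamma = ("SELECT * FROM R_gamma WHERE "
                   + " OR ".join(f"Name = '{b}'" for b in range_attrs) + ";")
        return q_alpha, q_beta, q_gamma

    # phi(["NAME"], [("LEVEL", "L1")], ["AGE"]) produces, for instance:
    #   SELECT NAME, NonceA, CF FROM R_alpha WHERE LEVEL = 'L1';
    #   SELECT * FROM R_beta WHERE (Name = 'LEVEL' AND Value = 'L1');
    #   SELECT * FROM R_gamma WHERE Name = 'AGE';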

Function ψ
The function ψ is executed on the server to generate the response for the set of queries produced by φ. The response of the server is constructed just by running the queries specified by φ on Rαmn, Rβmn and Rγmn. We denote the response by Ansψ = (Ansα, Ansβ, Ansγ), where Ansα, Ansβ and Ansγ correspond to the responses of Qα, Qβ and Qγ, respectively.

Note that the relation Ansα has as domain the attributes that satisfy the query conditions plus NonceA and CF; these last two are necessary for the verification process. The relations Ansβ and Ansγ contain tuples from the relations Rβ and Rγ respectively; thus Ansβ and Ansγ have the same number of attributes as Rβmn and Rγmn, respectively. The set VO = {NonceA, CF, Ansβ, Ansγ} is named the verification object.

4.3.1 Verification Process

The verification procedure checks both the correctness and the completeness of the server response. The server response consists of three distinct parts, Ansα, Ansβ and Ansγ; the Ansα part corresponds to the result of the original query Q, while the Ansβ and Ansγ parts assist the verification process in checking the correctness of the result in Ansα.

For the development of this verification process, the integrity of Ansα is verified first. If the integrity of Ansα is not correct, the process ends. Otherwise, the process continues with the verification of Ansβ; if Ansβ is correct, a comparison is made between the number of tuples in Ansβ and the number of WHERE conditions, and if the comparison succeeds, the bitmaps related to the selection conditions are computed. If Ansβ is not correct, the process ends. Next, the integrity of Ansγ is verified; if Ansγ is not correct, the process ends. If Ansγ is correct, the bitmaps related to the range conditions are calculated. Subsequently, the operations between the bitmaps obtained from Ansβ and the bitmaps obtained from Ansγ are performed, respecting the logical operators in the WHERE clause of the original query. Then, the positions of the bits that are set to one are calculated. Finally, these positions are compared with the numbers in the NonceA attribute of Ansα; if the comparison succeeds, the data are correct and complete. The verification process involves the following steps:

Verify integrity Ansα
In the transformed relation Rα a Cuckoo filter Fi is associated with each tuple ti of the original relation Rmn; this Fi is used to provide integrity of the relation Ansα through membership queries in the Cuckoo filter (Algorithm 5). If the membership test fails, an integrity violation has been detected. If the membership test succeeds, then one of the following is true:

· The value vij of the tuple ti has not been tampered with and integrity holds.
· The integrity of vij of the tuple ti has been compromised, but the membership test returned a false positive.

Algorithm 6 shows the integrity verification process for the response Ansα.

Verify integrity Ansβ
In the first place, one should check that Ansβ contains tuples corresponding to each WHERE condition in Qβ.


Algorithm 5 Membership test.
Input: Tuple ti, Cuckoo filter Fi
Output: True or Failure

1: function TEST(ti)
2:   for j = 1 to n do
3:     fij ← fingerprint(aj ‖ vij ‖ i)
4:     P1 ← hash(k3, aj ‖ vij ‖ i)
5:     P2 ← hash(k3, aj ‖ vij ‖ i) ⊕ hash(k4, fij)
6:     if neither bucket[P1] nor bucket[P2] of Fi contains fij then
7:       return Failure
8:     end if
9:   end for
10:  return True
11: end function

Algorithm 6 Verify integrity Ansα.
Input: All tuples t in response Ansα
Output: ⊥, Integrity

1: function VERIFYANSALPHA(t)
2:   for all tuples ti ∈ Ansα do
3:     if TEST(ti) ≠ True then
4:       return ⊥
5:     end if
6:   end for
7:   return Integrity
8: end function


In the relation Rβmn a message authentication code Mβr is associated with each tuple. The main objective is to check whether the contents of the tuples in Ansβ have been modified. If any of the tuples in Ansβ is modified, then the message authentication code recomputed over the tuple will not match the MAC attribute. If the computed value Mβr does not match the MAC attribute for some tuple in Ansβ, an integrity violation has been detected. Algorithm 7 shows the integrity verification process for the response Ansβ.

Algorithm 7 Verify integrity Ansβ.
Input: All tuples t in response Ansβ
Output: ⊥, Integrity

1: function VERIFYANSBETA(t)
2:   for all tuples tr ∈ Ansβ do
3:     S′ ← Rβmn(tr, Name) ‖ Rβmn(tr, Value) ‖ Rβmn(tr, Bitmaps) ‖ Rβmn(tr, NonceB)
4:     M′r ← MAC(k1, S′)
5:     if M′r ≠ Mβr then
6:       return ⊥
7:     end if
8:   end for
9:   return Integrity
10: end function

Bitmaps operation Ansβ.
The next step is to compute the operation ∆ between the bitmaps of Ansβ. A better description is given in Algorithm 8. It is important to mention that the operations between bitmaps are only those specified in the clauses of the query Q. Let ⊗ be the set of logical operations specified in the clauses of the query Q, such that ⊗ ⊆ ∆.

Algorithm 8 Bitmaps operation Ansβ.
Input: all tuples t ∈ Ansβ
Output: U

1: function BITOPERATION(t)
2:   U ← Rβmn(tr, Bitmaps)
3:   for all tuples t ∈ Ansβ do
4:     U ← U ⊗ Rβmn(tr, Bitmaps)
5:   end for
6:   return U
7: end function

Verify integrity Ansγ
In the transformed relation Rγmn a message authentication code Mγq is associated with each tuple of the relation Rγmn. The main objective is to check whether the contents of the tuples in Ansγ have been modified. If any of the tuples in Ansγ is modified, then the message authentication code recomputed over the tuple will not match the MAC attribute. If the computed value Mγq does not match the MAC attribute for some tuple in Ansγ, an integrity violation has been detected. Algorithm 9 shows the integrity verification process for the response Ansγ.

Algorithm 9 Verify integrity Ansγ.
Input: All tuples t in response Ansγ
Output: ⊥, Integrity

1: function VERIFYANSGAMMA(t)
2:   for all tuples tq ∈ Ansγ do
3:     S′ ← Rγmn(tq, Name) ‖ Rγmn(tq, List) ‖ Rγmn(tq, Array) ‖ Rγmn(tq, NonceC)
4:     M′q ← MAC(k1, S′)
5:     if M′q ≠ Mγq then
6:       return ⊥
7:     end if
8:   end for
9:   return Integrity
10: end function

Bitmaps operation Ansγ.
The next step is to calculate the operation ∆ between the bitmaps of Ansγ. It is important to mention that the operations between bitmaps are only those specified in the clauses of the query Q. Let ⊗ be the set of logical operations specified in the clauses of the query Q, such that ⊗ ⊆ ∆. To compute the ⊗ operation, we should first recover the bitmaps that satisfy the condition using Algorithms 10, 11 and 12 (Rodríguez-Henríquez, 2015). Then, the ⊗ operation is calculated using the output of the previous algorithms; a better description is given in Algorithm 13.

Algorithm 10 Lesser.
Input: L, Arr, a, v
Output: B

1: function LESSER(L, Arr, a, v)
2:   Find i such that L[i] = v
3:   for j ← 1 to n do
4:     if Arr[j] > i or Arr[j] == 0 then
5:       B[j] ← 0
6:     else
7:       B[j] ← 1
8:     end if
9:   end for
10:  return B
11: end function

Bitmaps operation.
After recovering the bitmaps for Ansβ and Ansγ, we calculate the ⊗ operation between the bitmaps of Ansβ and Ansγ. The result of this operation is another bitmap that contains the indexes of the tuples that satisfy the original query.

Algorithm 11 Equal.
Input: L, Arr, a, v
Output: C

1: function EQUAL(L, Arr, a, v)
2:   Find i such that L[i] = v
3:   B ← LESSER(L, Arr, a, v)
4:   if i = |Dom(a)| then
5:     C ← B
6:   else
7:     B′ ← LESSER(L, Arr, a, L[i + 1])
8:     C ← B ⊕ B′
9:   end if
10:  return C
11: end function

Bits position
In this part, we calculate the positions of the bits set to 1 to obtain the indexes of the corresponding tuples in the relation. Algorithm 14 describes the process in more detail.

Algorithm 12 Greater.
Input: L, Arr, a, v
Output: C

1: function GREATER(L, Arr, a, v)
2:   B ← LESSER(L, Arr, a, v)
3:   B′ ← EQUAL(L, Arr, a, v)
4:   C ← B ⊕ B′
5:   return C
6: end function

Comparison
A comparison is made between the NonceA attribute and the bit positions obtained previously; if the comparison succeeds, then the response Ansα is complete. Algorithm 15 shows this step.
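Putting the steps together, the final decision taken by the client can be summarized by the following sketch. The integrity flags and the combined bitmap are assumed to have already been obtained with the procedures above, the convention that bit i corresponds to the tuple with NonceA = i + 1 is an assumption, and the names are illustrative.

    def verdict(alpha_ok: bool, beta_ok: bool, gamma_ok: bool,
                beta_tuples: int, where_conditions: int,
                combined_bitmap: int, nonce_a_values: set) -> str:
        """combined_bitmap: integer whose bit i is 1 iff tuple t_{i+1} satisfies Q;
        nonce_a_values: the NonceA values returned in Ans_alpha."""
        if not (alpha_ok and beta_ok and gamma_ok):
            return "⊥"                 # some integrity check failed
        if beta_tuples != where_conditions:
            return "⊥"                 # a selection condition has no matching tuple in Ans_beta
        expected = {i + 1 for i in range(combined_bitmap.bit_length())
                    if (combined_bitmap >> i) & 1}      # positions of the bits set to 1
        return "correct and complete" if expected == nonce_a_values else "⊥"

    # verdict(True, True, True, 2, 2, 0b01101, {1, 3, 4}) -> "correct and complete"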

4.4 Security Analysis

According to Katz and Lindell (2014), modern cryptography intends to provide rigorous proofs that a given cryptographic scheme is secure. Different approaches have been proposed in the literature to this end, among them provable security (Katz and Lindell, 2014), the symbolic approach (Cortier, 2009), and universal composability (Canetti, 2000).


Algorithm 13 Bitmaps operation Ansγ.
Input: all tuples t ∈ Ansγ
Output: U

1: function BITOPERATION(t)
2:   U ← B or C
3:   for all tuples t ∈ Ansγ do
4:     U ← U ⊗ (B or C)
5:   end for
6:   return U
7: end function

Algorithm 14 Bits position.
Input: U
Output: V

1: function BITPOSITION(U)
2:   cont ← 0
3:   for i ← 0 to size(U) do
4:     if U[i] == 1 then
5:       V[cont] ← i
6:       cont ← cont + 1
7:     end if
8:   end for
9:   return V
10: end function

Algorithm 15 Comparison.
Input: V, all tuples t in response Ansα
Output: completeness

1: function COMPARISON(V)
2:   cont ← 0
3:   for i ← 0 to size(V) do
4:     for all tuples t ∈ Ansα do
5:       if V[i] == Rαmn(t, NonceA) then
6:         W[cont] ← V[i]
7:         cont ← cont + 1
8:       end if
9:     end for
10:  end for
11:  longW ← size(W)
12:  longV ← size(V)
13:  if longW == longV then
14:    return completeness
15:  end if
16: end function


ID   NAME   GENDER  LEVEL  AGE  WORKING YEARS
GLO  Omar   M       L1     30   18
LRL  Laura  F       L3     25   15
FJO  Oscar  M       L2     27   21
VBR  Rosy   F       L1     26   30
MRB  Betty  F       L1     25   25

Table 4.7: Employees

In this thesis, CLIC-ODB security is analyzed using provable security under the random oracle model. The first step under this approach is to have a security notion that captures the security services that the scheme in question aims to provide. In this work, we evaluate CLIC-ODB as an RDAS scheme (Rodríguez-Henríquez, 2015). This is possible since CLIC-ODB operates under the threat model considered by an RDAS scheme, and its main goal is also to solve the authenticated query processing problem. In other words, two entities are considered: the client and an untrusted server. The former outsources its databases to the latter; the server stores the databases and performs queries on behalf of the client. Furthermore, the server is considered the main opponent.

In Section 4.4.1, basic attacks on authenticated query processing are discussed to motivate the security notion under which the CLIC-ODB scheme is evaluated. In Section 4.4.2, the security notion is introduced.

4.4.1 Some Basic Attacks

In this section, we characterize the basic attacks on authenticated query processing; all of them can be classified as correctness or completeness violations.

Correctness Violation

A correctness violation implies that the server is capable of modifying one or more cells of its choice and still passing the verification process. To this end, the server can try to reuse the outsourced data that was given by the client. Two main techniques arise: columns scrambling and rows scrambling.

Columns scrambling. In general, column scrambling consists of swapping two columns' values in a response's tuple. As an example, consider the Employees relation with an extra column named WORKING YEARS, shown in Table 4.7.

Consider that the client executes the following query:

SELECT NAME, AGE FROM Employees WHERE LEVEL=’L3’;

If the server follows the protocol, it must answer with only one tuple: (Laura, 25). However, if the server deviates from the protocol, it can attack the integrity based on the data stored in the database, swapping two values of any two attributes, for example the value 25 of the AGE attribute with the value F of the GENDER attribute in Laura's tuple (highlighted in Table 4.7). Then, the response issued by the server looks like (Laura, F). This answer contains values that indeed are in the database and that are correct; however, they do not correspond to the correct and complete response. In this particular case, the client can judge the response as wrong by simple observation, since the value 'F' does not have any relation with the AGE domain. Nevertheless, when the server swaps the values of two attributes that share the same domain or part of it, the incorrect answer cannot be detected with the naked eye. For example, if the server swaps the AGE value with the WORKING YEARS value, i.e. the response issued by the server looks like (Laura, 15), there is no easy way to detect the integrity violation. Thus, the proposed scheme must prevent this attack.

Rows scrambling. Row scrambling consists of swapping two rows' values for the same attribute in a response. For example, consider the Employees relation shown in Table 4.7. If the client poses the query below:

SELECT NAME, AGE FROM Employees WHERE LEVEL=’L1’;

Then, the correct answer is (Omar, 30), (Rosy, 26), (Betty, 25). However, the server can swap the values L2 and L1 (highlighted in Table 4.7) corresponding to Oscar's and Rosy's tuples, respectively. Thus, the server response will look like (Omar, 30), (Oscar, 27), (Betty, 25), which is incorrect. This selection of tuples happens because the level of Oscar has changed to L1 instead of L2 and the level of Rosy has changed to L2 instead of L1.

Completeness Violation

A completeness violation occurs when the server is capable of adding or omitting tuples of the response that satisfy the query and still passing the verification process. To mount this attack, the server can use two techniques: Add rows or Skip rows.

Add rows. Adding rows consists in including tuples in the response that do not exist in the original outsourced database. For example, consider the Employees relation shown in Table 4.7. The client poses a query such as:

SELECT NAME, LEVEL FROM Employees WHERE AGE>’26’;

The correct answer is (Oscar, 27), (Omar, 30). However, the server can add a tuple (Juan, 28) that does not exist in the original database. Thus, the response issued by the server looks like (Oscar, 27), (Omar, 30), (Juan, 28).

Skip rows. Skipping rows consists of omitting tuples of the response that satisfy the query. Consider the previous query, but now the server answers with (Omar, 30) instead of (Oscar, 27), (Omar, 30). Then the server is skipping the (Oscar, 27) tuple, which also meets the conditions of the query. In other words, the response is incomplete.

In this section, the most relevant techniques to compromise the responses of queries were described. However, listing all the techniques that an attacker could use to violate the scheme is not possible. Due to this fact, the security notion must capture the essence of the attack and not the explicit technique used to mount it.

4.4.2 Security Model

CLIC-ODB is a scheme that works on static databases: for a specific relation Rmn and a query Q, there exists a unique answer that is correct and complete. In other words, it is not possible for a query to have two answers Ans and Ans′ that can both be considered accurate. Based on the RDAS security model (Rodríguez-Henríquez and Chakraborty, 2013), the following interactions between the client and the server are possible in CLIC-ODB:

CLIC-ODB transforms Rmn into R′mn in such a way that, if the query φ(Q) is sent to the server, then the answer Ans should be recoverable from the server response Ansψ through the verification process, with the condition that the server follows the protocol correctly. Otherwise, if the server is malicious and sends a response Ans′ψ distinct from the correct response Ansψ, then the verification process should reject the response by outputting ⊥. According to the previous conditions, after running the protocol the verification process will either produce Ans or ⊥; it will not produce an answer Ans′ distinct from Ans.

In a provable security approach, the idea is to simulate the system behavior considering an adversary and a challenger. On the one hand, the adversary aims to break the CLIC-ODB scheme by using one of the techniques discussed in the previous section or any other. On the other hand, the challenger aims to simulate the server and decide whether the adversary is successful in the attack. The interaction between these two entities follows this set of actions: first, the adversary chooses the relation Rmn on which he wants to be evaluated; then the challenger must provide him with R′mn. To accomplish this task the challenger has two random oracles to query, one to compute the hash functions of the filter and another to compute the MACs. This interaction is illustrated in Figure 4.3.

[Figure 4.3: Security Model. The adversary chooses Rmn ∈ R and a query Q ∈ Q; the challenger answers with F(Rmn) = (Rαmn, Rβmn, Rγmn) and φ(Q) = (Qα, Qβ, Qγ); finally, the adversary outputs a response Ans.]

In this security model the adversary is allowed to choose Rmn from a predefined set of relations R. Given this choice, the adversary sends Rmn to the challenger. Then, F(Rmn, K) → (Rαmn, Rβmn, Rγmn) is computed by the challenger for a randomly selected set of keys K, which is unknown to the adversary. The challenger gives (Rαmn, Rβmn, Rγmn) to the adversary. The adversary chooses a query Q from the set of queries Q and sends it to the challenger; the challenger then provides the adversary with Qα, Qβ, Qγ. Finally, the adversary outputs a response Ans, and we say that the adversary is successful if the verification process applied to its output produces a response different from both ⊥ and the correct answer Ans. The above is captured in the following security notion.

Definition 4.4.1 Let SCLIC-ODB be the event that a specific adversary A is successful under the RDAS security model. We say that CLIC-ODB is (ε, t)-secure if, for any adversary A which runs for time at most t, Pr[SCLIC-ODB] ≤ ε.

This notion was built on the premise that for each query Q there is a single correctanswer Ans. This means that:

1. The response Ans must not be altered in any way (integrity).

2. Ans must have exactly the tuples that satisfy the conditions in the WHERE clauseof the query Q (Completeness).

Thus, this security notion covers both integrity and completeness.

It should also be noted that the security notion described in Definition 4.4.1 follows concrete security; namely, the security of the scheme relies on the security of the cryptographic primitives used in its construction. Due to this dependency, there is a very small probability ε that the adversary can circumvent the scheme in a reasonable time t.

4.4.3 CLIC-ODB Security

CLIC-ODB was designed based on two building blocks: the Cuckoo filter and the message authentication code HMAC. These building blocks allow CLIC-ODB to provide correctness and completeness, respectively. The filter brings integrity at cell granularity, and the MAC provides integrity to the bitmap structures that bring completeness. Therefore, CLIC-ODB security reduces to the security of the filter and of the MAC.

The adversary may attempt to violate correctness or completeness to break CLIC-ODB. Next, we discuss how CLIC-ODB prevents the attacks described in Section 4.4.1. Finally, a sketch of the CLIC-ODB security theorem is presented.

Correctness Violation

To break correctness, the adversary must make changes in one or more cells of Rα and still pass the verification process. As mentioned, Rα includes a Cuckoo filter per tuple. Now consider the following cases:

1. Include a new value: the adversary must change an attribute value Rmn(ti, aj) for another value that does not exist in the original database, R′mn(ti, aj) ∉ Rmn. To succeed in the verification process, the adversary also needs to include the new value in the filter, R′mn(ti, aj) ∈ Fi, without knowing the set of keys that the client uses.

2. Include an old value: the adversary must change an attribute value Rmn(ti, aj) for another value that does exist in the original database, R′mn(ti, aj) ∈ Rmn. This means applying either row scrambling or column scrambling. However, to succeed in the verification process, the adversary must include the value in the filter, R′mn(ti, aj) ∈ Fi, or create a new filter F′i from those already existing in the database.


Completeness Violation

On the other hand, to violate completeness, the adversary must change the respective bitmaps in Rβmn and Rγmn, which also implies forging the respective MACs.

1. Add rows: the adversary wants to add a tuple to the response of a given query Q. To do so, the adversary must include the new tuple in the database and be capable of calculating its respective filter (see the correctness violation). Moreover, it needs to change the involved bitmaps and generate their respective tags in order to succeed in the verification process, namely forge the MAC.

2. Skip rows: the adversary omits a tuple that meets the WHERE conditions associated with the query Q. However, to succeed in the verification process, the adversary must modify the bitmaps stored in Rβmn or Rγmn and substitute the corresponding tags of these bitmaps. This is equivalent to being capable of forging the MAC.

After this analysis it is possible to conclude that CLIC-ODB's security can be reduced to the security of the filter and the MAC. In other words, if there exists an adversary capable of breaking CLIC-ODB, then either the filter or the MAC is not secure. Now, we formalize the above.

Consider an adversary A attacking CLIC-ODB in the sense of Definition 4.4.1. Let A choose a relation Rmn with m tuples, and let the relation be such that the transformed relations Rβmn and Rγmn contain m′ and m′′ tuples, respectively. Then there exists an adversary B attacking the filter and an adversary C attacking the message authentication code MAC such that

Pr[SuccA] ≤ Pr[B forges the filter] + Pr[C forges the MAC]

Also, B asks at most 3 · m · n queries to its oracle, and C asks at most m′ + m′′ queries to its oracle.

4.5 Cost Analysis

Recall that an essential characteristic of cloud computing is on-demand self-service; thus, having control over the costs of this cryptographic scheme is very important for the client, since it allows making future decisions that do not harm the client's budget.

4.5.1 Storage

Cost of Client

Next we establish the notation used to define the storage cost of the client.

Kf : Key to generate the fingerprint.


Ki : Key to generate the hash of the item.

Khf : Key to generate the hash of the fingerprint.

Kβ : Key to generate the MAC in Rβmn.

Kγ : Key to generate the MAC in Rγmn.

Thus

ClientCost(Storage) = size(Kf ) + size(Ki) + size(Khf ) + size(Kβ) + size(Kγ)

Let τ = size(Kf ) = size(Ki) = size(Khf ) = size(Kβ) = size(Kγ) then

ClientCost(storage) = 5τ

Cost of Server

Let m and n be the numbers of tuples and attributes, respectively, of the relation Rmn; then the size of Rmn is

size(Rmn) = Σ_{i=1}^{m} Σ_{j=1}^{n} size(vij)

When the transform F is applied to the relation Rmn, three relations are produced: Rαmn, Rβmn and Rγmn.

Rαmn has two more attributes than Rmn (NonceA and CF), and both Rαmn and Rmn have the same number of tuples. Let

size(NonceA) = Σ_{i=1}^{m} size(viNonceA)

size(CF) = Σ_{i=1}^{m} size(viCF)

thus the size of Rαmn is

size(Rαmn) = size(Rmn) + size(NonceA) + size(CF).

The relation Rβmn contains five attributes and N tuples, so the size of Rβmn is

size(Rβmn) = Σ_{i=1}^{N} Σ_{j=1}^{5} size(vij).

Rγmn has five attributes and q tuples, so the size of Rγmn is

size(Rγmn) = Σ_{i=1}^{q} Σ_{j=1}^{5} size(vij).


Then the storage cost of the server is determined by

ServerCost(storage) = size(Rαmn) + size(Rβmn) + size(Rγmn).

Finally, the total storage cost of the cryptographic scheme is computed by

TotalCost(storage) = ClientCost(storage) + ServerCost(storage) = 5τ + ServerCost(storage).

4.5.2 Bandwidth

The network usage depends on the type and number of operations executed on the database. The select and insert operations are considered in this cryptographic scheme.

Outgoing Bandwidth

Select operations affect only the outgoing network usage because they only fetch data from the server.

The response of the server is denoted by Ansψ = (Ansα, Ansβ, Ansγ), where Ansα, Ansβ and Ansγ correspond to the responses of Qα, Qβ and Qγ, respectively.

The relation Ansα has as domain the attributes that satisfy the query conditions plus two attributes, NonceA and CF, which are necessary for the verification process.

Let m0 and n0 be the numbers of tuples and attributes, respectively, that satisfy the query conditions. If

size(NonceAα) = Σ_{i=1}^{m0} size(viNonceA)

size(CFα) = Σ_{i=1}^{m0} size(viCF)

then

size(Ansα) = Σ_{i=1}^{m0} Σ_{j=1}^{n0} size(vij) + size(NonceAα) + size(CFα).

Ansβ has the same number of attributes as Rβmn; let N0 be the number of tuples that satisfy the WHERE clauses in Qβ. Then the size of Ansβ is

size(Ansβ) = Σ_{i=1}^{N0} Σ_{j=1}^{5} size(vij).


Let q0 be the number of tuples in Ansγ that satisfy the WHERE clauses in Qγ. Similarly, Ansγ has the same attributes as Rγmn; in this way,

size(Ansγ) = Σ_{i=1}^{q0} Σ_{j=1}^{5} size(vij).

In consequence, the outgoing bandwidth is determined by

OutCost(bandwidth) = size(Ansα) + size(Ansβ) + size(Ansγ).

Ingoing Bandwidth

The insert operation affects only the ingoing network usage because it pushes data to the server. The ingoing bandwidth is determined by

InCost(bandwidth) = size(Rαmn) + size(Rβmn) + size(Rγmn).

The total bandwidth cost of the cryptographic scheme is computed by

TotalCost(bandwidth) = InCost(bandwidth) + OutCost(bandwidth).

4.5.3 Processing

Cost of Server

Note that the server does not perform any process external to its original function; the proposed scheme does not affect the processing cost T0, which is a great advantage for the client in terms of cost savings. Then

ServerCost(processing) = T0.

Cost of Client

The verification process is performed by the client; in this process it is necessary to verify the integrity of Ansα, Ansβ and Ansγ and to compute a number of logical operations.

The lookup process of a Cuckoo filter is as follows: given an item x, the algorithm first calculates x's fingerprint and its two candidate buckets according to Eqs. 4.2 and 4.3. Then, these two buckets are read: if any existing fingerprint in either bucket matches, the Cuckoo filter returns true; otherwise the filter returns false.

If Tα is the number of operations needed to look up an item x in the Cuckoo filter, then

numOper(Ansα) = m0 (n0 + 1) Tα.

If the number of operations needed to compute a MAC in Ansβ is Tβ, then

numOper(Ansβ) = N0 · Tβ.


The number of logical operations performed to recover the bitmap from Ansβ is

numOperLog(Ansβ) = N0 − 1.

For Ansγ, let Tγ be the number of operations needed to compute a MAC; then

numOper(Ansγ) = q0 · Tγ.

The number of logical operations performed to recover the bitmap from Ansγ is obtained as follows. Let

T1 be the size of the array,

T2 the number of logical comparisons,

T3 the number of logical operations in the WHERE clause,

T4 the number of logical operations needed to recover the final bitmap.

Then

numOperLog(Ansγ) = 2(T1 · T2)(T3) + T3 + T4.

Therefore, the processing cost of the client is

ClientCost(processing) = numOper(Ansα) + numOper(Ansβ) + numOper(Ansγ) + numOperLog(Ansβ) + numOperLog(Ansγ).

4.6 Discussion

This chapter explained how Cuckoo filters and bitmaps are used to provide integrity and completeness for the database. Subsequently, the design of the proposed scheme and each of the algorithms that compose it were presented in detail. Then, a security analysis was carried out, where some basic attacks violating correctness and completeness were discussed. In particular, the security model was given using the provable-security approach, where the idea is to simulate the behavior of the system considering an adversary and a challenger. The chapter ends with an analysis of costs such as storage, bandwidth, and processing, which are of great importance in cloud computing since one of the characteristics of this paradigm is pay-per-use, on-demand service.


Chapter 5

Implementation

In this chapter, we present in detail the experiments and the results obtained in this research work. In Section 5.1, we specify the computing tools and the database used to perform the experiments. In Section 5.2 we describe the development of the experiments and report the results obtained.

5.1 System Information

We present the system configuration that was used to develop the experiments and tests.

• CPU: Intel® Core™ i7-7500U processor, 2.70 GHz × 4

• OS: Ubuntu 18.04.2 LTS

• Database Manager: PostgreSQL 10.9

• Compiler: gcc 7.4.0

• OpenSSL 1.1.1

PostgreSQL was installed on the machine to store and manage the data set locally. pgAdminIII was also installed as a graphical interface. The installation process is shown in Appendix A.

Data Set
The data set used to test performance is Census-Income (Frank and Asuncion, 2010). This data set contains weighted census data extracted from the 1994 and 1995 Current Population Surveys conducted by the U.S. Census Bureau. The data set contains 41 demographic and employment-related attributes; 11 are numerical and 30 are alphanumerical. The number of tuples in this data set is 199522. After selecting the database, the next step was to upload the data into PostgreSQL. For this task, the process described in Appendix A.2 was performed.


5.2 Experimental Results

In order to observe the performance of the proposed scheme, a set of tests was carried out on the database described above. Experiment 1 consists in analyzing the process that the scheme needs to prepare the data. Experiment 2 executes queries against the server; in this experiment the performance of the proposed scheme is shown and the results obtained are discussed. The idea of Experiment 3 is to show the correct functioning of CLIC-ODB; for this purpose we intentionally carry out attacks on the database. The objective of the fourth experiment is to test the cryptographic scheme on a database stored in Amazon Web Services (AWS) and observe the performance of the scheme.

5.2.1 Experiment 1: Data Settings

CLIC-ODB uses the Cuckoo filter as the structure that provides the integrity service. For the construction of the filter we use the HMAC cryptographic primitive and a non-cryptographic function named NoMAC. The objectives of this experiment are to show the stage where the client prepares the data and to analyze the performance of CLIC-ODB with HMAC and with the NoMAC function.

The first part of this experiment consists in analyzing the performance of CLIC-ODB when the Cuckoo filter uses the NoMAC function. To perform this experiment, CLIC-ODB generates the corresponding structures for Rα and Rβ, which are stored in the cloud database. Subsequently, a certain number of tuples of the local database are randomly selected to be inserted into the cloud database.

In Table 5.1 the first column contains the number of tuples, and the next two columns show the time (in seconds) to generate the structures Rα and Rβ. Subsequently, we show the time to insert the tuples in Rβ. The next column shows the time to insert Rα using the NoMAC function, followed by the total time that CLIC-ODB needs to prepare the data using the NoMAC function. Then, the next column shows the time to insert Rα using HMAC. Finally, we show the total time that CLIC-ODB requires to prepare the data using HMAC.

         Time (s)
Tuples   Structure Rα  Structure Rβ  Insert Rβ  NoMAC Insert Rα  NoMAC Total  HMAC Insert Rα  HMAC Total
10000    0.05          0.01          2.70       96.42            99.20        107.47          110.24
25000    0.05          0.01          6.77       240.82           247.66       266.48          273.32
50000    0.05          0.01          13.55      478.99           492.61       533.00          546.61
100000   0.05          0.01          28.19      956.92           985.18       1066.30         1094.56
125000   0.05          0.01          33.86      1206.50          1240.43      1352.50         1386.43
150000   0.05          0.01          40.64      1448.70          1489.41      1560.40         1601.10
199522   0.05          0.01          54.06      1955.30          2009.42      2100.34         2154.46

Table 5.1: NoMAC vs HMAC Times


The second part of this experiment consists in analyzing the performance of CLIC-ODB when the Cuckoo filter uses the cryptographic primitive HMAC. This second part was done in a similar way to the first one, with the difference that the scheme uses HMAC to insert the tuples in Rα. Table 5.1 shows the results obtained in this second part.

If we compare the NoMAC Total column with the HMAC Total column in Table 5.1, we can see that CLIC-ODB with the NoMAC function performs better in time than CLIC-ODB with HMAC. The graph below illustrates in blue the time for CLIC-ODB with the NoMAC function and in orange the time for CLIC-ODB with HMAC.

[Figure: data-preparation time (s) versus number of tuples for CLIC-ODB with the NoMAC function and with HMAC.]

In the graph we can see that, to insert 199522 rows (the whole database), CLIC-ODB required an average time of 2009.42 seconds with the NoMAC function and 2154.46 seconds with HMAC; in minutes, this is approximately 33.49 minutes for the NoMAC function and 35.90 minutes for HMAC. However, it is important to mention that this process only takes place once, so we consider it a reasonable time.

On the other hand, it is necessary to analyze the false positive rate generated byCLIC-ODB with NoMAC function and CLIC-ODB with HMAC. To perform this testwe modified a number of cells in Rα and subsequently performed the membership teststo detect the number of false positives generated by CLIC-ODB with NoMAC function.

If MC is the number of modified cells and FP the number of false positives, then the false positive rate (FPR) is given by the following equation:

FPR = FP / MC

In Table 5.2 we present the results obtained for CLIC-ODB with the NoMAC function and for CLIC-ODB with HMAC. The first column shows the number of cells that we modified in Rα; the next column gives the number of cells that did not pass the membership test. Then we present the number of false positives, obtained as the difference between the first and second columns for the NoMAC function, followed by the false positive rate with the NoMAC function. In the same manner, we present the false positives and the false positive rate for HMAC. Finally, the percentage of the database that was modified is given.

Modified   Non-verified   NoMAC false   NoMAC false      HMAC false   HMAC false       Percent
cells      cells          positives     positive rate    positives    positive rate    (%)
798088     798084         4             5.01 × 10^-6     1            1.25 × 10^-6     10
1795698    1795684        14            7.79 × 10^-6     1            5.57 × 10^-7     20
2593786    2593756        30            1.16 × 10^-5     2            7.71 × 10^-7     30
3391874    3391839        35            1.03 × 10^-5     2            5.90 × 10^-7     40
4389484    4389435        49            1.12 × 10^-5     3            6.83 × 10^-7     50
5187572    5187515        57            1.10 × 10^-5     4            7.71 × 10^-7     60
5985660    5985597        63            1.05 × 10^-5     4            6.68 × 10^-7     70
6783748    6783682        66            9.73 × 10^-6     5            7.37 × 10^-7     80
7781358    7781287        71            9.12 × 10^-6     6            7.71 × 10^-7     90
8579446    8579370        76            8.86 × 10^-6     6            6.99 × 10^-7     100

Table 5.2: False Positives

Similarly, we modified a number of cells in Rα and performed the membership tests to detect the number of false positives generated by CLIC-ODB with HMAC. Table 5.2 also shows the results obtained for CLIC-ODB with HMAC.

As a result of this test, we can see in the NoMAC columns of Table 5.2 that the false positive rate is higher for CLIC-ODB with the NoMAC function and lower for CLIC-ODB with HMAC. The security of CLIC-ODB depends on the false positive rate: the higher the false positive rate, the higher the probability that an adversary will succeed. For this reason, we decided to carry out the following experiments with CLIC-ODB with HMAC.

5.2.2 Experiment 2: Queries Process

This experiment is performed using the set of queries presented in Table 5.3. Table 5.3 describes five queries; for each query we define the number of conditions in the WHERE clause, the logical operators used, the number of requested attributes and, finally, the number of tuples that satisfy the query.


Query  Conditions  Type of query  Attributes  Response Size (Tuples)
Q1     9           OR             10          20115
Q1     9           OR             20          20115
Q1     9           OR             30          20115
Q2     19          OR             10          35452
Q2     19          OR             20          35452
Q2     19          OR             30          35452
Q3     25          OR             10          92781
Q3     25          OR             20          92781
Q3     25          OR             30          92781
Q4     3           OR, AND        10          4016
Q4     3           OR, AND        20          4016
Q4     3           OR, AND        30          4016
Q5     3           OR, AND NOT    10          12364
Q5     3           OR, AND NOT    20          12364
Q5     3           OR, AND NOT    30          12364

Table 5.3: Queries Description

The objective of this experiment is to perform the authenticated query process andmeasure the times used in each stage of the scheme.

To carry out this experiment we generated the five queries Q1–Q5 with the characteristics given in Table 5.3. Also, we randomly selected 10, 20 and 30 attributes for each query; the attributes of each query are different. For each query we measure the following times:

1. The time that the φ transform uses to prepare the queries Qα and Qβ for Rα and Rβ, respectively.

2. The time required by the server to respond to the queries using the ψ transform.

3. The time that the scheme needs to complete the verification process on Ansα and Ansβ.

4. The time required to perform the logical operations.

5. Finally, the total time for CLIC-ODB.

Table 5.4 shows the results obtained for each query. The times shown in the Transform ψ columns of Table 5.4 correspond to the response time of the server. As we can see, the time cost for the server is very small. This happens because the queries generated by the φ transform are transparent to the server, i.e. the server does not need to perform any extra function to answer the client's queries. The times in the Transform φ column and in the Verification Process columns are spent on the client side. In the verification process, as we can see in Table 5.4, the time to verify the integrity of Ansα is longer than the time to verify the integrity of Ansβ.


                   Transform φ     Transform ψ          Verification Process    Logical
Query  Attributes  (Rα + Rβ)       Rα        Rβ         Ansα       Ansβ         operator  Total
Q1     10          2.5913 × 10^-5  0.1262    0.0134     0.7078     0.0052       0.011     0.86
Q1     20          4.9825 × 10^-5  0.1558    0.0079     1.3297     0.0057       0.0130    1.51
Q1     30          4.6353 × 10^-5  0.1793    0.0088     1.9496     0.0051       0.0126    2.16
Q2     10          9.8549 × 10^-5  0.1797    0.0139     1.2780     0.0118       0.0240    1.51
Q2     20          9.5512 × 10^-5  0.2188    0.0196     2.0748     0.0098       0.0199    2.34
Q2     30          9.9575 × 10^-5  0.2975    0.0138     3.0217     0.0098       0.0231    3.37
Q3     10          1.2494 × 10^-4  0.3616    0.0165     2.9435     0.0131       0.0286    3.36
Q3     20          1.3389 × 10^-4  0.5569    0.0224     5.4317     0.0132       0.0316    6.06
Q3     30          1.3165 × 10^-4  0.6703    0.0168     7.9250     0.0133       0.0315    8.66
Q4     10          2.9478 × 10^-5  0.0579    0.0084     0.1276     0.0017       0.0029    0.20
Q4     20          3.0909 × 10^-5  0.0703    0.0107     0.2244     0.0017       0.0031    0.31
Q4     30          3.6566 × 10^-5  0.0704    0.0084     0.3420     0.0017       0.0032    0.43
Q5     10          3.3685 × 10^-5  0.0859    0.0087     0.3921     0.0010       0.0035    0.49
Q5     20          2.5942 × 10^-5  0.1116    0.0077     0.7220     0.0010       0.0032    0.85
Q5     30          3.1210 × 10^-5  0.1278    0.0079     1.0610     0.0010       0.0031    1.20

Table 5.4: Authenticated Queries Process Times

Table 5.4 shows that the time for Q3 is the highest of all queries; this is because the number of conditions in this query is greater than in any other query shown in Table 5.3. From this experiment we can conclude that CLIC-ODB supports very varied queries and that its performance depends on the query posed.

5.2.3 Experiment 3: Database Attacks

The objective of this experiment is to show the correct behavior of CLIC-ODB. To prove that the scheme works correctly, we intentionally attack the database in the cloud.

We intentionally modified a certain number of cells in Rα and Rβ. First, we modify cells in Rα but not in Rβ, and then we carry out the verification process to detect the modified cells. The verification process ends when it finds the first modified cell in Rα; it does not perform the verification process for Rβ and does not recover the bitmap. Subsequently, we modify cells in Rβ but not in Rα, and then we carry out the verification process to see whether the scheme detects that a cell has been modified.

Table 5.5 shows the results obtained when performing attacks on Rα. The first column presents the number of tuples that were affected, the next column shows the number of attributes that were modified, the third column presents the number of affected cells, then the number of detected cells is shown, and finally the false positives are given.

For Rβ, slightly different data were analyzed: in this test, the number of tuples that were modified is counted, and then the number of tuples for which the integrity verification failed is reported. Table 5.6 shows the results obtained when performing attacks on Rβ.


Modified Tuples  Modified Attributes  Modified Cells  Detected Cells  False Positives
10000            10                   100000          100000          0
10000            20                   200000          200000          0
10000            30                   300000          300000          0

Table 5.5: Modified Cells Rα

Modified Tuples  Modified Attributes  Modified Cells  Detected Tuples
100              2                    200             100
100              3                    300             200
100              4                    400             300

Table 5.6: Modified Cells Rβ

5.2.4 Experiment 4: Cloud

The objective of this experiment is to measure the performance of the scheme when providing only integrity for the database stored in the cloud.

For this experiment, the client was played by a computer with the specifications mentioned in Section 5.1. The server was played by an instance created in Amazon Web Services (AWS). AWS provides the Relational Database Service (RDS), which allows storing a data set in the cloud. The database instance was configured to use PostgreSQL as the database manager (see Appendix B).

Next, we established the connection between the client and the server. Since this connection can be complicated, the required configurations are shown in Appendix C.

The experiment consists in selecting a certain number of tuples from a database stored locally and inserting these tuples into a database stored in the cloud.

In Table 5.7 we can see the number of tuples that were inserted during the experiment. The following column shows the times obtained during the insertion without the scheme, and the next column the times obtained during the insertion with the scheme. The next column has the difference between the previous times. In the last column, the percentage of the difference with respect to the insertion time without the scheme is given.

If we analyze the results in Table 5.7, we can see that the times obtained in the column "Without scheme" are smaller than the times obtained in the column "With scheme" when inserting 1 to 25,000 tuples, which makes sense because the insertion with the scheme includes two extra attributes. However, when we insert 50,000 tuples, the time obtained with the scheme is smaller than the time obtained without the scheme, and this result does not make sense.

As can be seen in Table 5.7, the times vary widely and the data obtained do not provide conclusive information. This is because in cloud computing there are several factors that can intervene in the client-server communication process. Factors such as the network connection, bandwidth, network speed, server response time and the use of a free-tier platform, among others, can cause delays in the communication process (Dillon et al., 2010).


         Time (s)
Tuples   Without scheme  With scheme  Difference    Percent
1        0.1986          0.2041       0.0055        2.7694
10       1.9578          2.7300       0.7722        39.4422
100      19.8221         20.6997      0.8776        4.4274
500      97.8689         103.8008     5.9319        6.0611
1000     195.9678        212.3183     16.3505       8.3435
5000     979.9955        1032.4000    52.4045       5.3474
10000    1959.3000       2498.1000    538.8000      27.4996
25000    7610.6000       8571.2000    960.6000      12.6219
50000    18230.0000      11028.0000   -7202.0000    -39.5063
100000   36527.5400      25593.6400   -10933.9000   -29.9333
199522   39092.3454      45124.8121   6032.4667     15.4313

Table 5.7: Integrity Cloud Time

To perform tests on cloud computing it is necessary to have a service provider thatprovides the tools and suitable conditions.

5.2.5 Discussion

In this chapter the implementation of the proposed scheme CLIC-ODB was carried out. The system information was provided, as well as the tools used for the experiments. Also, the characteristics of the database used, such as the number of rows, columns and attributes, were mentioned. Subsequently, four experiments were developed. The first consisted of preparing the data to be sent to the server. In the second experiment, the client executed a series of queries on the server, and the server generated a response that was sent back to the client; the client then applied the verification process to check the correctness and completeness of the response. The idea of Experiment 3 was to show the correct functioning of CLIC-ODB, for which attacks were carried out intentionally by modifying a percentage of the data in the database. The last experiment tested CLIC-ODB on a database stored in Amazon Web Services (AWS) to observe its performance. Most of the experiments were carried out locally since, as indicated at the end of Experiment 4, in cloud computing there are several factors that can intervene in the client-server communication process.


Chapter 6

Conclusion and Future work

We have discussed several issues involving authenticated query processing. In Chapter 4 a new construction, which aims to be a better solution in terms of functionality and costs for this important problem, was presented. In Chapter 5, we presented some results regarding the implementation of CLIC-ODB in relational databases. In this chapter, we summarize the main contributions of our work and, finally, we note down some immediate thoughts to extend this research.

6.1 Conclusions

In this thesis we have studied the problem of query authentication in outsourced databases. Our study includes two directions:

1. We studied the problem from a formal cryptographic viewpoint, and we proposed a scheme which provides a solution to the problem considering security and efficiency as design guidelines.

2. Finally, we implemented this scheme and generated performance data in a realisticsetting.

Regarding the CLIC-ODB construction we want to point out the following:

1. CLIC-ODB is a new scheme that provides integrity and completeness for static outsourced databases based on Cuckoo filters, bitmaps, and MACs.

2. CLIC-ODB enhances the state of the art since it provides integrity at cell-level granularity with a better cost than other proposals. Specifically, it uses Cuckoo filters, which require only three hash functions, whereas the closest scheme uses Bloom filters, which require k hash functions, to be used in outsourced databases.

3. Moreover, Cuckoo filters allow three operations: insert, lookup, and delete. In contrast, Bloom filters only allow two operations: insert and lookup. This characteristic can be exploited to extend the scheme to dynamic scenarios.


4. CLIC-ODB provides cell-granularity integrity, which allows decreasing the verification object size. However, the scheme is exposed to a small probability of being cheated if the modifications introduced by the adversary fall within the false positive rate of the filter. Thus, the use of the Cuckoo filter is better than Bloom filters since it provides a smaller rate.

5. As mentioned, CLIC-ODB security depends on the false positive rate of the filter; furthermore, this rate increases with the possible collisions among the elements that are introduced. Thus, it is very important to decrease the possibility of collisions. In order to achieve this, we tested the fingerprint function proposed in the literature (NoMAC) and cryptographic MACs; the latter (HMAC) obtains a substantial decrease of the collisions at the cost of a small performance difference.

6. One important characteristic of our construction is that, to provide completeness, another non-cryptographic structure named bitmaps is used; the integrity of these structures is provided through MACs instead of filters. This is due to the performance trade-off between these two methods: while filters are more suitable when just part of the tuple is included in the response, MACs are better when the whole tuple needs to be retrieved. Thus, our design is based on the best compromise to get the best performance.

7. In general, the performance of CLIC-ODB verification depends on the kind of query that is posed, the number of restrictions in the WHERE clause, and the number of attributes retrieved.

6.2 Future work

We note down some issues of immediate interest which were not treated in this thesis:

1. We have some ideas to extend the proposed scheme to support new kinds of queries, such as join queries. Also, we would like to extend the scheme to dynamic scenarios.

2. There is a set of new experiments that we would like to perform in the cloud; to do so, we will require a dedicated service.

3. Finally, we want to provide a formal proof of CLIC-ODB security under the random oracle model.


Appendix A

PostgreSQL

A.1 Installing PostgreSQL

Ubuntu's default repositories contain Postgres packages, so you can install them using the apt packaging system. If this is the first time using apt in this session, refresh your local package index; then, install the Postgres package:

Installing pgAdminIII is optional and you can ignore this step, if you don’t need pgAd-min. To install pgAdminIII, use the command:

Now PostgreSQL and pgAdminIII are installed successfully.

A.2 Upload databases in PostgreSQL

Sometimes the files are provided with a different extension; in that case:

• Download the database as file.xlsx.

• Convert it to file.csv with UTF-8 encoding and ',' or ';' as delimiters.

• Open the file.csv in a text editor and save it as file.txt.

First, open the terminal and enter the connection line and the password. Then write psql:


Next, write \l to show the existing databases and \q to exit.

Write \c databasename to change the database:


Appendix B

Amazon Web Services

B.1 Create Databases Instance

B.1.1 Create database


B.1.2 Select engine

B.1.3 Choose use case


B.1.4 Specify DB details


B.1.5 Settings

B.1.6 Configure advanced settings

Network and security


Databases options

Encryption


Backup

Monitoring

Performance insights


Log exports

Maintenance

Deletion protection

B.1.7 Click


Bibliography

Abiteboul, S., Hull, R., and Vianu, V. (1995). Foundations of databases, volume 8. Addison-Wesley Reading.

Ateniese, G., Burns, R., Curtmola, R., Herring, J., Kissner, L., Peterson, Z., and Song, D. (2007). Provable data possession at untrusted stores. In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS '07, pages 598–609, New York, NY, USA. Association for Computing Machinery.

Ausekar, S. R. and Pasupuleti, S. K. (2018). Dynamic verifiable outsourced databasewith freshness in cloud computing. Procedia computer science, 143:367–377.

Blodget, H. (2011). Amazon’s cloud crash disaster permanently de-stroyed many customers’ data. [Online]. Last accessed 2019-07-05,URL:https://www.businessinsider.com/amazon-lost-data-2011-4.

Bloom, B. (1970). Space/time trade-offs in hash coding with allowable errors. Commun.ACM, 13:422–426.

Boldyreva, A., Chenette, N., Lee, Y., and O’neill, A. (2009). Order-preserving symmet-ric encryption. In Annual International Conference on the Theory and Applicationsof Cryptographic Techniques, pages 224–241. Springer.

Bose, P., Guo, H., Kranakis, E., Maheshwari, A., Morin, P., Morrison, J., Smid, M., andTang, Y. (2008). On the false-positive rate of bloom filters. Information ProcessingLetters, 108(4):210–213.

Bradbury, D. (2019). Microsoft azure data deleted because of dns outage. [Online].Last accessed 2019-10-17, URL:https://nakedsecurity.sophos.com/2019/02/01/dns-outage-turns-tables-on-azure-database-users/.

Broder, A. Z. and Mitzenmacher, M. (2003). Survey: Network Applications Of BloomFilters: A Survey. Im, 1(4):485–509.

Canetti, R. (2000). Universally composable security: a new paradigm for cryptographicprotocols. cryptology eprint archive. Report, 67:2000.

Chan, C.-Y. and Ioannidis, Y. E. (1999). An efficient bitmap encoding scheme forselection queries. In ACM SIGMOD Record, volume 28, pages 215–226. ACM.

83

Page 94: A Cryptographic Scheme for Cell Granularity Authenticated ...

Cortier, V. (2009). Verification of security protocols. In International Workshop onVerification, Model Checking, and Abstract Interpretation, pages 5–13. Springer.

Devanbu, P., Gertz, M., Martel, C., and Stubblebine, S. G. (2003). Authentic datapublication over the internet 1. Journal of Computer Security, 11(3):291–314.

Dillinger, P. C. and Manolios, P. (2004). Bloom filters in probabilistic verification. InHu, A. J. and Martin, A. K., editors, Formal Methods in Computer-Aided Design,pages 367–381, Berlin, Heidelberg. Springer Berlin Heidelberg.

Dillon, T., Wu, C., and Chang, E. (2010). Cloud computing: Issues and challenges. In2010 24th IEEE International Conference on Advanced Information Networking andApplications, pages 27–33.

Etemad, M. and Küpçü, A. (2018). Verifiable database outsourcing supporting join.Journal of Network and Computer Applications, 115:1 – 19.

Fan, B., Andersen, D. G., Kaminsky, M., and Mitzenmacher, M. D. (2014). Cuckoofilter: Practically better than bloom. In Proceedings of the 10th ACM Internationalon Conference on emerging Networking Experiments and Technologies, pages 75–88.ACM.

Ferretti, L., Marchetti, M., Andreolini, M., and Colajanni, M. (2018). A symmetriccryptographic scheme for data integrity verification in cloud databases. InformationSciences, 422:497 – 515.

Fontaine, C. and Galand, F. (2007). A survey of homomorphic encryption for nonspe-cialists. EURASIP Journal on Information Security, 2007(1):013801.

Frank, A. and Asuncion, A. (2010). UCI machine learning repository. PhD thesis,CENSUS BOREAU.

Garcia-Molina, H. (2008). Database systems: the complete book. Pearson EducationIndia.

Group, T. P. G. D. (1996). PostgreSQL 10.10 Documentation. The PostgreSQL GlobalDevelopment Group.

Hacigümüş, H., Iyer, B., and Mehrotra, S. (2004). Ensuring the Integrity of EncryptedDatabases in the Database-as-a-Service Model, pages 61–74. Springer US, Boston,MA.

Hopcroft, J. E. and Ullman, J. D. (1983). Data structures and algorithms.

Katz, J. and Lindell, Y. (2014). Introduction to modern cryptography. Chapman andHall/CRC.

Krawczyk, H., Canetti, R., and Bellare, M. (1997). Hmac: Keyed-hashing for messageauthentication.

84

Page 95: A Cryptographic Scheme for Cell Granularity Authenticated ...

Mather, T., Kumaraswamy, S., and Latif, S. (2009). Cloud security and privacy: anenterprise perspective on risks and compliance. " O’Reilly Media, Inc.".

Menezes, J., Katz, A. J., Van Oorschot, P. C., and Vanstone, S. A. (1996). Handbookof applied cryptography. CRC press.

Mykletun, E., Narasimha, M., and Tsudik, G. (2006). Authentication and integrity inoutsourced databases. ACM Trans. Storage, 2(2):107–138.

Narashima, E., mykletun, M., and Tsudik, G. (2004). Signature bouquets: Immutabil-ity for aggregated/condensed signatures. In European Symposium on Research inComputer Security, pages 160–176. Springer.

Narasimha, M. and Tsudik, G. (2006). Authentication of outsourced databases usingsignature aggregation and chaining. In Li Lee, M., Tan, K.-L., and Wuwongse, V.,editors, Database Systems for Advanced Applications, pages 420–436, Berlin, Heidel-berg. Springer Berlin Heidelberg.

News, B. (2015). Google loses data as lightning strikes. [Online]. Last accessed 2019-10-17, URL:https://www.bbc.com/news/technology-33989384.

Pang, H., Jain, A., Ramamritham, K., and Tan, K.-L. (2005). Verifying completeness ofrelational query results in data publishing. In Proceedings of the 2005 ACM SIGMODinternational conference on Management of data, pages 407–418. ACM.

Pang, H. and Tan, K.-L. (2008). Verifying completeness of relational query answers fromonline servers. ACM Transactions on Information and System Security (TISSEC),11(2):5.

Pang, H., Zhang, J., and Mouratidis, K. (2009). Scalable verification for outsourceddynamic databases. Proceedings of the VLDB Endowment, 2(1):802–813.

Pugh, W. (1990). Skip lists: A probabilistic alternative to balanced trees. Commun.ACM, 33(6):668–676.

Rodríguez-Henríquez, L. M. and Chakraborty, D. (2013). Rdas: A symmetric keyscheme for authenticated query processing in outsourced databases. In Accorsi, R.and Ranise, S., editors, Security and Trust Management, pages 115–130, Berlin, Hei-delberg. Springer Berlin Heidelberg.

Rodríguez-Henríquez, L. M. and Chakraborty, D. (2014). Using bitmaps for executingrange queries in encrypted databases. In International Conference on Security andCryptography, volume 2, pages 432–438. SCITEPRESS.

Rodríguez-Henríquez, L. M. X. (2015). Security Services on Outsourced Databases. PhDthesis, INSTITUTO POLITÉCNICO NACIONAL.

Silberschatz, A., Korth, H. F., Sudarshan, S., et al. (1997). Database system concepts,volume 4. McGraw-Hill New York.

85

Page 96: A Cryptographic Scheme for Cell Granularity Authenticated ...

Wang, H., He, D., Fu, A., Li, Q., and Wang, Q. (2019). Provable data possession withoutsourced data transfer. IEEE Transactions on Services Computing, pages 1–1.

Wang, J., Chen, X., Huang, X., You, I., and Xiang, Y. (2015). Verifiable audit-ing for outsourced database in cloud computing. IEEE transactions on computers,64(11):3293–3303.

Wu, K., Koegler, W., Chen, J., and Shoshani, A. (2003). Using bitmap index for inter-active exploration of large datasets. In 15th International Conference on Scientificand Statistical Database Management, 2003., pages 65–74. IEEE.

Wu, K., Otoo, E., and Shoshani, A. (2004). On the performance of bitmap indices forhigh cardinality attributes.

Xie, M., Wang, H., Yin, J., and Meng, X. (2008). Providing freshness guaranteesfor outsourced databases. In Proceedings of the 11th International Conference onExtending Database Technology: Advances in Database Technology, EDBT ’08, pages323–332, New York, NY, USA. ACM.

86