

The VLDB Journal (2007) 16:371–387
DOI 10.1007/s00778-006-0037-7

REGULAR PAPER

The dynamic predicate: integrating access control with query processing in XML databases

Jae-Gil Lee · Kyu-Young Whang · Wook-Shin Han · Il-Yeol Song

Received: 30 September 2005 / Revised: 10 March 2006 / Accepted: 13 July 2006 / Published online: 19 December 2006
© Springer-Verlag 2006

Abstract Recently, access control on XML data has become an important research topic. Previous research on access control mechanisms for XML data has focused on increasing the efficiency of access control itself, but has not addressed the issue of integrating access control with query processing. In this paper, we propose an efficient access control mechanism tightly integrated with query processing for XML databases. We present the novel concept of the dynamic predicate (DP), which represents a dynamically constructed condition during query execution. A DP is derived from instance-level authorizations and constrains accessibility of the elements. The DP allows us to effectively integrate authorization checking into the query plan so that unauthorized elements are excluded in the process of query execution. Experimental results show that the proposed access control mechanism improves query processing time significantly over the state-of-the-art access control mechanisms. We conclude that the DP is highly effective in efficiently checking instance-level authorizations in databases with hierarchical structures.

J.-G. Lee (B) · K.-Y. Whang
Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea
e-mail: [email protected]

K.-Y. Whang
e-mail: [email protected]

W.-S. Han
Department of Computer Engineering, Kyungpook National University, 1370 Sankyuk-dong, Book-gu, Daegu 702-701, South Korea
e-mail: [email protected]

I.-Y. Song
College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA
e-mail: [email protected]

Keywords Access control · Query processing · XML databases · Privacy/security

1 Introduction

The amount of private data stored in databases is rapidly growing [24], and the awareness of privacy protection on data is rapidly increasing [1,2,25]. Among many research issues, access control has played a crucial role in privacy protection [1]. Access control prevents an adversary from illegitimately accessing private data stored in databases. As XML is emerging as a new standard for data representation and exchange on the internet, access control on XML data has become an important research topic [5,7,8,10,12–14,20,23,26,34].

Several access control models for XML data have been reported in the literature [5,12,14]. These models provide ways to grant authorizations and to control access to XML data through authorizations. That is, these models allow users to access only authorized XML data elements. These models have some common characteristics: an authorization can be specified at the schema level or at the instance level; an authorization on an element implies those on its descendant elements in the XML data hierarchy (called an implicit authorization); and an explicit authorization can override an implicit authorization. These characteristics make access control in XML databases complicated. Thus, access control tends to be time-consuming and degrades query-processing performance.


There have been a number of efforts to develop efficient access control mechanisms for XML data. Previous research has mainly dealt with the methods of efficiently searching for authorizations [5,12,13,26,34] or those of effectively reducing the number of authorization checks that need to be performed at run time [10,20,23]. These research activities have successfully improved the performance of access control itself. In general, access control should be accompanied by query processing to return only authorized query results. Nevertheless, despite the close relationship between access control and query processing, there has not been any work towards integration of these two kinds of operations.

We contend that the tight integration of access control with query processing can significantly improve query-processing performance. Let us present a simple example. Suppose that a user issues a query that retrieves all the drug elements in an XML document and that a very large proportion of drug elements are unauthorized. If a query processor is aware of the authorization information, it can exclude beforehand most of the drug elements from query processing, thus drastically saving query processing cost. Hence, the authorization information can be exploited to help the query engine better optimize query evaluation.

In this paper, we develop an efficient access control mechanism applicable to existing access control models [5,12,14] for XML data. The novelty of our access control mechanism is tight integration of access control with query processing. The key idea is to regard an authorization as a query condition to be satisfied. The development consists of two steps: (1) devising a mechanism that efficiently searches for authorizations and (2) integrating this access control mechanism with the query plan.

We first propose an access control mechanism that utilizes an authorization index and nearest neighbor search technique [18] for efficiently searching for authorizations. We implement an authorization index as a multi-dimensional index [15]. The proposed mechanism first maps the elements in XML data with authorizations to two-dimensional points and stores them in the authorization index. Since accessibility of an element is determined by the explicit authorization specified on the nearest ancestor element with an explicit one [5,10,12,26], we adopt a nearest neighbor search technique to efficiently find that explicit authorization.

We then propose the notion of the dynamic predicate (DP), which represents a dynamically constructed condition during query execution. We note that accessibility of an element is dynamically determined during query execution because authorizations can be specified at the instance level. The DP is applied for checking the accessibility of such elements during query execution. Here, the DP indicates which elements are authorized or unauthorized. By inserting the DP into the query plan, we effectively integrate access control with query processing. Due to this integration, unauthorized elements are excluded in an early phase of query processing, thus significantly improving the performance.

There are two major reasons why we need a mechanism different from those for relational databases: instance-level authorizations and data hierarchies. Typically, relational databases support only schema-level authorizations without supporting instance-level authorizations. Supporting instance-level authorizations is expensive since we need to access the elements themselves to check their accessibility. We reduce the cost by using the notion of the DP, i.e., by representing the authorization information of a set of elements having an identical authorization with that of the element being currently accessed. The DP is highly effective in checking instance-level authorizations in databases with hierarchical structures. A large number of data elements can be filtered out along the data hierarchies, thus making the DP more effective. We call this feature hierarchical filtering. In contrast, relational databases have flat structures and cannot take advantage of this hierarchical filtering.

Recently, the importance of information inference has become widely recognized [1]. Information inference is the ability to infer information that a user is not permitted to know from the information that the user knows [1]. The extent of information inference depends on the strategy for determining authorized query results. There are two different strategies for determining authorized query results. The first one is to check authorizations only for the instances returned as the query result [5,12,20,23,26]. The second one is to check authorizations for all the instances of other elements that qualify result instances as well as for the result instances themselves [10]. The second strategy significantly reduces information inference, and we call it the inference-blocking strategy. To show progressive development, we first develop our access control mechanism for the first strategy in Sects. 3 and 4, and then proceed to the second strategy in Sect. 5.

In summary, the contributions of this paper are as follows:

• We propose a new notion of the authorization index combined with the nearest neighbor search technique that allows effective determination of accessibility.

• We propose a new notion of the DP that allows effective integration of access control with query processing.


• We propose an efficient access control mechanism tightly integrated with query processing using the notion of the DP.

• We make our access control mechanism minimize information inference by adopting the inference-blocking strategy.

• We demonstrate, by extensive experiments, that our access control mechanism significantly outperforms existing mechanisms.

The rest of this paper is organized as follows. Section 2 describes existing access control models and mechanisms for XML data. Section 3 presents a simple access control mechanism. Section 4 proposes an integrated access control mechanism that uses DPs. Section 5 discusses our approach to reducing information inference. Section 6 presents the results of performance evaluation. Finally, Sect. 7 concludes the paper.

2 Related work

We explain existing access control models [5,12,14] for XML data and access control mechanisms in these models. These models provide ways to grant/revoke authorizations on XML data stored in the database and to control access to authorized XML data.

2.1 Access control models

An authorization in existing access control models is generally defined as a 5-tuple <s, o, a, sign, imply_option> [5,12,14]. Here, s is a user, a user group, a role, or a credential [7] to whom an authorization is granted; o is an XML document or an XML element/attribute protected by the authorization; a is an action being allowed or prohibited; sign is either + or −, indicating whether the action is allowed or prohibited; and imply_option represents whether the authorization implies¹ authorizations on descendant elements. s is called the authorization subject, o the authorization object, and a with sign the authorization type.
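As a minimal sketch, the 5-tuple can be written down as a small record type. The class name `Authorization`, its field names, and the `auth_type` helper are illustrative assumptions rather than part of the cited models, and the ASCII `-` stands in for the minus sign. The two sample values are the authorizations of the hospital example used in Sect. 3.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    """Hypothetical record for the 5-tuple <s, o, a, sign, imply_option>."""
    subject: str   # s: user, user group, role, or credential
    obj: str       # o: path expression naming the protected document or element
    action: str    # a: the action, e.g. "read"
    sign: str      # "+" if the action is allowed, "-" if it is prohibited
    imply: bool    # imply_option: does it imply authorizations on descendants?

    def __post_init__(self):
        if self.sign not in ("+", "-"):
            raise ValueError("sign must be '+' or '-'")

    @property
    def auth_type(self) -> str:
        """Authorization type: the action together with its sign."""
        return f"{self.sign}{self.action}"


grant = Authorization("userA", 'document("hospital.xml")/hospital',
                      "read", "+", True)
deny = Authorization("userA", 'document("hospital.xml")/hospital/patient[2]',
                     "read", "-", True)
print(grant.auth_type, deny.auth_type)
```

Keeping the sign separate from the action mirrors the tuple in the text, while `auth_type` combines them in the way the later sections compare an authorization against the authorization type of a query.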

Authorizations can be specified at the schema level or at the instance level. Schema-level authorizations are specified on a DTD, and these are applicable to all XML documents that are instances of the DTD. Instance-level authorizations are specified on an XML document, and these are applicable only to the XML document to which the authorizations have been granted.

Authorizations are classified into explicit or implicit, strong or weak, and positive or negative ones [12,27]. An explicit authorization is explicitly specified on an element of a DTD or an element of an XML document; an implicit authorization is implied by an explicit authorization specified on the nearest ancestor element with an explicit one. A strong authorization does not allow an implicit authorization to be overridden; a weak authorization allows an implicit authorization to be overridden by an explicit authorization. A positive authorization allows accesses for a specific action; a negative authorization prohibits accesses for a specific action.

¹ Most access control models for XML data use the term “propagate,” but we use the term “imply” for consistency.

Implication of an authorization introduces potential conflicts among authorizations. That is, an element may have both implicit and explicit authorizations for different authorization types. To resolve these conflicts, many access control models for XML data use the most specific overrides policy [5,10,12]. According to this policy, an explicit authorization granted on a lower-level element always overrides a weak authorization granted on a higher-level element. On the other hand, by definition, no explicit authorization can be granted on a lower-level element of an element having a strong authorization.

2.2 Access control mechanisms

Due to implicit authorizations, authorizations on the ancestors should be examined to determine accessibility of an element. Bertino et al. [5] have proposed top-down and bottom-up strategies that determine the accessibility by traversing paths between the root and the element to be examined. Top-down strategies traverse paths beginning from the root; bottom-up strategies traverse paths beginning from the element to be examined. Thus, both strategies access the ancestors of the element to be examined. If elements to be examined are scattered in the database, in the worst case, these strategies may access the whole database to check authorizations [5].
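The cost of a bottom-up traversal can be illustrated with a small sketch. This is not the algorithm of Bertino et al. [5]; it is a simplified stand-in that assumes one authorization type per element, a hypothetical child-to-parent map, and the closed policy of Sect. 3.1 (no authorization means inaccessible). The function name and the sample document are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def bottom_up_check(element: str,
                    parent: Dict[str, Optional[str]],
                    explicit: Dict[str, str],
                    query_type: str) -> Tuple[bool, int]:
    """Walk from `element` toward the root until an explicit authorization
    is found. Returns (accessible, number_of_elements_visited)."""
    visited = 0
    node: Optional[str] = element
    while node is not None:
        visited += 1
        if node in explicit:
            # The first explicit authorization met on the way up decides.
            return explicit[node] == query_type, visited
        node = parent.get(node)
    # Assumption: no explicit or implicit authorization means inaccessible.
    return False, visited


# Hypothetical document fragment, stored as child -> parent.
parent = {
    "hospital": None,
    "patient1": "hospital", "therapy1": "patient1", "drug1": "therapy1",
    "patient2": "hospital", "therapy2": "patient2", "drug2": "therapy2",
}
explicit = {"hospital": "+read", "patient2": "-read"}

print(bottom_up_check("drug1", parent, explicit, "+read"))  # visits 4 elements
print(bottom_up_check("drug2", parent, explicit, "+read"))  # visits 3 elements
```

Every check touches the elements on the path itself, which is the per-element ancestor access that the strategies above incur.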

Damiani et al. [12] have proposed a view-based access control mechanism. This mechanism creates and maintains a separate view for each user, which contains exactly the set of data elements that the user is authorized to access. After views are constructed, users can simply run their queries against the views. Although views are prepared off-line, in general, this mechanism suffers from high maintenance and storage costs especially for a large number of users [20]. Fan et al. [13] have proposed another view-based access control mechanism. This mechanism generates not only a view (called a security view), but also a view DTD to which the security view conforms. The view DTD is used for improving efficiency of query rewriting and optimization. This mechanism also shares the common drawback of view-based access control mechanisms: high maintenance cost for a large number of users [8].

Murata et al. [23] have proposed a static analysis method for reducing the number of authorization checks during run time. This method classifies an XML query at compile time into three categories: entirely authorized, entirely prohibited, or partially authorized ones. Entirely authorized or entirely prohibited queries can be executed without access control. However, the static analysis cannot obtain any benefits when a query is classified as a partially authorized one. To remedy this problem, Luo et al. [20] have proposed a query-rewriting method, QFilter, that converts a partially authorized query into an entirely authorized one. QFilter uses a non-deterministic finite automaton (NFA) for query rewriting. However, a query rewritten by QFilter tends to be complicated (i.e., to contain unions of many path expressions) as the number of authorizations increases. Moreover, both methods are able to support only schema-level authorizations since they do not examine the actual database.

Recently, some access control mechanisms whose models are slightly different from traditional ones have been proposed. Yu et al. [34] use an access control model where an authorization on an element does not imply authorizations on its descendant elements. In this type of model, the number of authorizations tends to proliferate since every authorization has to be explicitly granted. Thus, Yu et al. have proposed the compressed accessibility map (CAM), a mechanism of compressing neighboring authorizations to avoid such proliferation. Cho et al. [10], using an access control model where instance-level authorizations are granted only according to predetermined patterns (called security annotations), have proposed an optimization method (which we call SCA for convenience) minimizing the number of security checks that need to be performed at run time. Instead of unconditionally performing a recursive check on the ancestors of an element, the SCA method optimally determines when a recursive check can be eliminated or simplified to a local check on the element.

2.3 Comparison with our access control mechanism

Our access control mechanism has desirable properties compared with earlier access control mechanisms. First, it can speed up query processing by excluding unauthorized elements early on during query processing. Second, it obviates the need for accessing the whole database by searching only the authorization index as opposed to the top-down and bottom-up strategies [5]. Third, it does not suffer from the overhead of maintaining views as opposed to the view-based access control mechanisms [12,13]. Fourth, it supports instance-level authorizations as opposed to the static analysis methods [20,23] which handle only schema-level authorizations.

3 A simple access control mechanism

In this section, we present a simple access control mechanism. We first present the problem definition in Sect. 3.1. We discuss the structure of the authorization index that stores instance-level authorizations in Sect. 3.2 and propose a simple access control algorithm that utilizes the authorization index and nearest neighbor search technique in Sect. 3.3.

3.1 The problem definition

We develop an efficient access control mechanism applicable to existing access control models [5,12,14] for XML data. That is, given an access control policy (i.e., a set of authorizations granted), an XML document, and a query, our access control mechanism retrieves the query results authorized according to the access control policy. The XPath [11] language, which is a core component of the XQuery language, is used for specifying XML queries.

We first summarize the features of the access control model adopted in this paper. As stated in Sect. 2, an authorization is defined as a 5-tuple <s, o, a, sign, imply_option>. A path expression conforming to the XPath [11] standard is used for specifying an authorization object. Hence, we are able to support protection granularity levels ranging from an XML document to an XML element/attribute. If the path expression points to the root element, the granularity level is the whole document. If a set of elements is generated as the result of the path expression, we assume that the authorization is granted on every element. Besides, we support the content (i.e., the text node) of an element as the granularity level. Here, the content of an element is also protected by the authorization on the element as in most access control models. Authorizations are classified into explicit or implicit, strong or weak, and positive or negative ones. The authorization type of a positive authorization is represented by a or +a, and that of a negative authorization by ¬a or −a.

The most specific overrides [5,10,12,27] policy is employed to resolve conflicts among authorizations. That is, an explicit authorization on an element overrides any authorizations specified on the ancestors of the element. On the other hand, if there exists no authorization, either explicit or implicit, on an element, the element is considered inaccessible [12,14,20,23,26].


It is possible to specify multiple explicit authorizations for different authorization types on the same element, thus allowing accesses to the element for multiple actions (e.g., for both read and update). If both a positive explicit authorization and a negative explicit authorization for the same authorization type are specified on the same element, however, the negative one overrides the positive one (called the denials take precedence policy) [5,12,14,20,23,26].
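The combination of several explicit authorizations on one element can be sketched as follows. The function name `effective_signs` and the representation of an authorization as an (action, sign) pair are assumptions made for illustration; the rule it applies is the denials take precedence policy described above.

```python
from typing import Dict, Iterable, Tuple


def effective_signs(explicit: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Combine the explicit authorizations granted on ONE element.

    `explicit` holds (action, sign) pairs with sign "+" or "-".
    For each action, a negative authorization wins over a positive one,
    regardless of the order in which they were granted.
    """
    result: Dict[str, str] = {}
    for action, sign in explicit:
        if sign == "-" or result.get(action) != "-":
            result[action] = sign
    return result


# read is both granted and denied, update is only granted.
print(effective_signs([("read", "+"), ("update", "+"), ("read", "-")]))
# → {'read': '-', 'update': '+'}
```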

We now explain the ways of performing access control. Executing a query requires authorizations for a specific authorization type (e.g., read), and we call it the authorization type of the query. The elements on which an authorization has been granted for the authorization type of the query are accessible to the user, but other elements are inaccessible. We elaborate on the instance-level checking since this checking is very expensive compared with the schema-level checking. Hereafter, we refer to instance-level authorizations simply as authorizations.

Example 1 Figure 1 shows an example of an XML document and authorizations. Let us call this XML document hospital.xml. The authorization <userA, document(“hospital.xml”)/hospital, read, +, imply>² allows userA to read the hospital element and its descendant elements of hospital.xml. However, the authorization <userA, document(“hospital.xml”)/hospital/patient[2], read, −, imply> overrides the authorization above and prohibits userA from reading the second patient element and its descendant elements. Suppose userA issues a query //patient//drug which requires read authorizations. Here, only the three drug elements under the first patient element are retrieved as the query result.

3.2 Authorization indexes

We use a two-dimensional index [15] to implement the authorization index. This is because elements with an explicit authorization are represented by two-dimensional points (start, end) according to the numbering scheme [3,6,9,19] used in many XML query-processing methods. The (start, end) points are stored in the authorization index. In the numbering scheme, (start, end) represents an ancestor–descendant relationship between XML elements and satisfies the following two conditions [19]: (1) for any element u and its parent element v, the interval $(start_u, end_u)$ is contained in the interval $(start_v, end_v)$; (2) for two sibling elements u and v, if u is a predecessor of v in preorder traversal, then $end_u < start_v$. Therefore, a is an ancestor of d if and only if $start_a < start_d \wedge end_a > end_d$.

² For simplicity, we assume that an authorization on an element always implies authorizations on its descendant elements.

[Figure 1: tree diagram of hospital.xml (hospital, patient, name, illness, therapy, and drug elements with their text nodes), marking the query results returned to userA, together with the two authorizations <userA, document(“hospital.xml”)/hospital, read, +, imply> and <userA, document(“hospital.xml”)/hospital/patient[2], read, −, imply>.]

Fig. 1 An example of an XML document and authorizations
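The two numbering conditions of Sect. 3.2 and the resulting ancestor test can be illustrated with a short sketch. The labeling below, a counter incremented on entering and on leaving each element, is one simple way to satisfy both conditions and is an assumption; the cited numbering schemes [3,6,9,19] may assign different values. The tree and the names `number_tree` and `is_ancestor` are hypothetical.

```python
from typing import Dict, List, Tuple


def number_tree(children: Dict[str, List[str]],
                root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (start, end) to every element by depth-first traversal.
    A counter is incremented when an element is entered and when it is left."""
    numbers: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        numbers[node] = (start, counter)

    visit(root)
    return numbers


def is_ancestor(a: Tuple[int, int], d: Tuple[int, int]) -> bool:
    """a is an ancestor of d iff start_a < start_d and end_a > end_d."""
    return a[0] < d[0] and a[1] > d[1]


# Hypothetical fragment of a hospital document.
children = {
    "hospital": ["patient1", "patient2"],
    "patient1": ["name1", "therapy1"],
    "therapy1": ["drug1"],
}
n = number_tree(children, "hospital")
print(n["hospital"], n["patient1"], n["drug1"], n["patient2"])
print(is_ancestor(n["hospital"], n["drug1"]))   # hospital contains drug1
print(is_ancestor(n["patient2"], n["drug1"]))   # patient2 does not
```

Because the intervals of any two elements are either nested or disjoint, the ancestor test needs only two integer comparisons and no tree traversal.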

For the authorization index, we can use any kind of index that can handle multi-dimensional points. The MLGF [32], R-tree [17], buddy tree [31], and quadtree [29] are examples.

Example 2 Figure 2 illustrates an example of the authorization index implemented using the MLGF. Suppose that authorizations are granted on the elements whose (start, end) numbers are (7, 9), (10, 20), (26, 28), (29, 36), and (37, 44), respectively. These two-dimensional points are depicted in the two-dimensional space as in Fig. 2a and are indexed in the MLGF as in Fig. 2b. The MLGF is a height-balanced index tree that stores multi-dimensional points [32]. A non-leaf page contains entries of <region, ref>, where ref is the pointer to the child page, and region contains all the regions represented by the entries of the child page pointed to by ref. A leaf page contains entries of <point, oid>, where point is the coordinate of the point, and oid is the identifier of the object stored in the database.

3.3 Simple access control algorithm using the authorization index and nearest neighbor search

Due to the most specific overrides policy, we can determine accessibility of an element by examining only the authorization granted on the nearest ancestor element, regardless of its authorization type. That is, an element is accessible if the authorization type of this authorization is the same as that of the query; it is inaccessible otherwise. We refer to this authorization as the nearest ancestor authorization. We formally define it in Definition 1 and find it using Lemma 1.


[Figure 2: (a) Authorizations depicted in the two-dimensional space, with start on the horizontal axis and end on the vertical axis: auth1 (7, 9), auth2 (10, 20), auth3 (26, 28), auth4 (29, 36), auth5 (37, 44). (b) The structure of the authorization index: the root node holds the entries <[7,21]×[9,44], refA> and <[22,37]×[9,44], refB>; node A holds <(7,9), oid1> and <(10,20), oid2>; node B holds <(26,28), oid3>, <(29,36), oid4>, and <(37,44), oid5>.]

Fig. 2 An example of the authorization index implemented using the MLGF

Definition 1 An authorization is called the nearest ancestor authorization $auth_{naa}$ of the element e if it satisfies the following two conditions: (1) $auth_{naa}$ is an explicit authorization granted on the element e or one of its ancestor elements regardless of its authorization type; (2) no explicit authorization exists on elements in the path between the element e and the element on which $auth_{naa}$ is granted. If a strong authorization satisfies the first condition, it automatically satisfies the second condition by the definition of the strong authorization.

Lemma 1 The nearest ancestor authorization $auth_{naa}$ of the element e is the one that minimizes $|start(e) - start(auth)|$, where auth is an authorization that satisfies $start(auth) \le start(e) \wedge end(auth) \ge end(e)$. That is, $auth_{naa}$ has the minimum difference between start(e) and start(auth), where auth is an authorization located in the upper-left region of the element e in the two-dimensional space. Here, start(e) and end(e) represent the start and end values of the element e; start(auth) and end(auth) represent those of the element on which auth is granted.

Proof See Appendix A.
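Lemma 1 can be checked on a small example. The sketch below replaces the multi-dimensional index and nearest neighbor search of the paper with a plain linear scan, purely for illustration; the function name and the three nested authorizations are hypothetical, and each authorization is a (start, end, authorization type) triple.

```python
from typing import List, Optional, Tuple

Auth = Tuple[int, int, str]  # (start, end, authorization type)


def nearest_ancestor_authorization(e: Tuple[int, int],
                                   auths: List[Auth]) -> Optional[Auth]:
    """Among authorizations in the upper-left region of e, i.e. with
    start(auth) <= start(e) and end(auth) >= end(e), return the one that
    minimizes start(e) - start(auth). Linear scan, not an index search."""
    e_start, e_end = e
    candidates = [a for a in auths if a[0] <= e_start and a[1] >= e_end]
    if not candidates:
        return None
    return min(candidates, key=lambda a: e_start - a[0])


# Hypothetical nested authorizations.
auths = [(1, 50, "+read"), (10, 20, "-read"), (12, 15, "+read")]

print(nearest_ancestor_authorization((13, 14), auths))  # innermost: (12, 15)
print(nearest_ancestor_authorization((16, 17), auths))  # (10, 20) applies
print(nearest_ancestor_authorization((30, 31), auths))  # only (1, 50) encloses
print(nearest_ancestor_authorization((60, 61), auths))  # no authorization
```

Since enclosing intervals are nested, the candidate with the largest start value is also the one with the smallest end value, which is why a single nearest neighbor query in the upper-left region suffices.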

We note that multiple nearest ancestor authorizations can exist due to multiple authorizations on the same element. Suppose there exist multiple nearest ancestor authorizations $auth_{naa}^{(1)}, \ldots, auth_{naa}^{(k)}$ whose authorization types are $a_{naa}^{(1)}, \ldots, a_{naa}^{(k)}$, respectively. In this case, we regard them as one authorization whose authorization type is obtained by ORing the individual $a_{naa}^{(i)}$'s (i.e., $\bigvee_{i=1}^{k} a_{naa}^{(i)}$).

We now propose a simple access control algorithm that uses the authorization index and nearest neighbor search technique. We call it Nearest Ancestor Filtering. The algorithm executes a query and, for each query result, searches for the nearest ancestor authorization according to Lemma 1 to check authorizations. Here, the nearest ancestor authorization can be retrieved by using a nearest neighbor search technique [18]. Finally, the algorithm examines whether the authorization type of the nearest ancestor authorization is the same as that of the query, thus checking accessibility of the query result. We note that, to find the nearest ancestor authorization, Nearest Ancestor Filtering accesses the authorization index only once, while the top-down and bottom-up strategies access many ancestors and find the authorization for each ancestor accessed [5]. Figure 3 shows the algorithm Nearest Ancestor Filtering.

Fig. 3 The simple access control algorithm Nearest Ancestor Filtering
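The steps described for Nearest Ancestor Filtering can be sketched as follows. This is an illustration under stated assumptions, not the listing of Fig. 3: query execution is replaced by a precomputed list of (start, end) results, the authorization index lookup is replaced by a linear scan, each element carries at most one authorization type, and the (start, end) numbers are a hypothetical labeling of a two-patient hospital document.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]      # (start, end) of an element
Auth = Tuple[int, int, str]  # (start, end, authorization type)


def nearest_ancestor_filtering(query_results: List[Point],
                               auths: List[Auth],
                               query_type: str) -> List[Point]:
    """Keep the query results whose nearest ancestor authorization has the
    same authorization type as the query."""
    authorized: List[Point] = []
    for start, end in query_results:
        naa: Optional[Auth] = None
        for auth in auths:
            encloses = auth[0] <= start and auth[1] >= end
            # Among enclosing authorizations, the largest start is nearest.
            if encloses and (naa is None or auth[0] > naa[0]):
                naa = auth
        # Results without any authorization are treated as inaccessible.
        if naa is not None and naa[2] == query_type:
            authorized.append((start, end))
    return authorized


# Hypothetical numbering: hospital (1, 40), patient[2] (20, 39).
auths = [(1, 40, "+read"), (20, 39, "-read")]
# Results of //patient//drug: three drugs under patient[1], two under patient[2].
drugs = [(6, 7), (8, 9), (10, 11), (26, 27), (28, 29)]

print(nearest_ancestor_filtering(drugs, auths, "+read"))
# → [(6, 7), (8, 9), (10, 11)]
```

With this labeling, only the drug elements under the first patient survive, in line with the outcome stated in Example 1. Note that each result still triggers its own authorization lookup.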

4 An integrated access control mechanism

In this section, we propose an access control mechanism tightly integrated with query processing, which is the primary contribution of this paper. We first present an overview in Sect. 4.1 and then propose a new notion of the dynamic predicate and a technique that integrates authorization checking into the query plan using this notion in Sect. 4.2.

4.1 Overview

The key operation in the integrated access control mechanism is identifying a set of elements that have the same accessibility. The start values of those elements can be formed as an interval, and we call it the start interval. A start interval represents either a set of authorized elements or a set of unauthorized elements. During query processing, the start interval can be used to determine accessibility of an element by examining whether the start value of the element is contained in this start interval. With this method, we do not have to search the authorization indexes for each query result as in the algorithm Nearest Ancestor Filtering.

[Figure 4: panels (a) and (b) show the candidate results $e_1$–$e_9$, the elements with an explicit authorization ($e_{10}$, $e_{11}$), the nearest ancestor authorization ($auth_{naa}$), and the nearest overriding authorization ($auth_{noa}$).]

Fig. 4 Elements with the same nearest ancestor authorization

A start interval of elements with the same accessibility can be constructed as the interval of elements with the same nearest ancestor authorization because accessibility of an element $e$ is determined by the nearest ancestor authorization of $e$ by Definition 1. Figure 4 shows the elements having the same nearest ancestor authorization. In Fig. 4a, the nearest ancestor authorization of element $e_1$ is $auth_{naa}$, granted on $e_{10}$, and no authorization overrides $auth_{naa}$. Therefore, elements $e_1$–$e_9$, which are the descendant elements of $e_{10}$, have $auth_{naa}$ as the nearest ancestor authorization. However, in Fig. 4b, only elements $e_1$–$e_4$ have $auth_{naa}$ as the nearest ancestor authorization because $auth_{noa}$, granted on $e_{11}$, overrides $auth_{naa}$. To consider this case, we define the nearest overriding authorization $auth_{noa}$ of the nearest ancestor authorization $auth_{naa}$ as the one that first overrides $auth_{naa}$, when traversing the XML data tree from a current element in preorder. We formally define it in Definition 2 and find it using Lemma 2.

Definition 2 Suppose $auth_{naa}$ is the nearest ancestor authorization of the element $e$ and is a weak authorization. An authorization is called the nearest overriding authorization $auth_{noa}$ of $auth_{naa}$ if it satisfies the following two conditions: (1) $auth_{noa}$ is an explicit authorization granted on a descendant element of the element having $auth_{naa}$ regardless of its authorization type; (2) no other explicit authorization exists, when traversed in preorder, between the element $e$ and the element having $auth_{noa}$. If $auth_{naa}$ is a strong authorization, however, $auth_{noa}$ does not exist by the definition of the strong authorization.

Lemma 2 Suppose $auth_{naa}$ is the nearest ancestor authorization of the element $e$ and is a weak authorization. Then, the nearest overriding authorization $auth_{noa}$ of $auth_{naa}$ is the one that minimizes $|start(e) - start(auth)|$, where $auth$ is an authorization that satisfies $start(auth) > start(e) \ge start(auth_{naa}) \wedge end(auth) < end(auth_{naa})$.

Proof See Appendix B.

We now present Lemma 3 for constructing the start interval having the same nearest ancestor authorization as that of the element $e$. All the elements whose start value is contained in this interval are accessible or inaccessible depending on the authorization type of the nearest ancestor authorization of the element $e$.

Lemma 3 Suppose that the nearest ancestor authorization of the element $e$ is $auth_{naa}$, and the nearest overriding authorization of $auth_{naa}$ is $auth_{noa}$. If $auth_{naa}$ does not exist, we assume that $auth_{naa}$ is a negative authorization specified on the root element. This assumption is reasonable since an element with no authorization is regarded as inaccessible in our access control policy. Then, any element whose start value is contained in the following interval has the same nearest ancestor authorization as that of the element $e$.

• if $auth_{noa}$ exists: $[start(e), start(auth_{noa}))$

• if $auth_{noa}$ does not exist: $[start(e), end(auth_{naa}))$

Proof See Appendix C.
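To make Lemmas 2 and 3 concrete, the sketch below computes the start interval for an element from a plain list of authorizations. It is only an illustration under stated assumptions: linear scans replace the authorization index, the `weak` flag and the names `Auth`, `nearest_ancestor`, `nearest_overriding`, and `start_interval` are hypothetical, and the sample authorizations are read off Fig. 5.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Auth:
    start: int         # start value of the element carrying the authorization
    end: int           # end value of that element
    positive: bool     # True for +, False for -
    weak: bool = True  # weak authorizations can be overridden (Definition 2)


def nearest_ancestor(auths: List[Auth], s: int, e: int) -> Optional[Auth]:
    """Lemma 1: the enclosing authorization with the largest start value."""
    cands = [a for a in auths if a.start <= s and a.end >= e]
    return max(cands, key=lambda a: a.start, default=None)


def nearest_overriding(auths: List[Auth], naa: Auth, s: int) -> Optional[Auth]:
    """Lemma 2: the authorization inside naa's subtree that follows start(e)
    most closely; a strong naa has no overriding authorization."""
    if not naa.weak:
        return None
    cands = [a for a in auths if a.start > s >= naa.start and a.end < naa.end]
    return min(cands, key=lambda a: a.start, default=None)


def start_interval(auths: List[Auth], elem: Tuple[int, int],
                   root: Tuple[int, int]) -> Tuple[Tuple[int, int], bool]:
    """Lemma 3: return ((begin, end), positive) for a half-open interval."""
    s, e = elem
    naa = nearest_ancestor(auths, s, e)
    if naa is None:
        # Assumed negative authorization on the root element.
        naa = Auth(root[0], root[1], positive=False)
    noa = nearest_overriding(auths, naa, s)
    upper = noa.start if noa is not None else naa.end
    return (s, upper), naa.positive


# Assumed sample data from Fig. 5: "-" on surgery (2, 46), "+" on patient (22, 45).
AUTHS = [Auth(2, 46, False), Auth(22, 45, True)]
ROOT = (1, 81)
```

With these data, `start_interval(AUTHS, (11, 13), ROOT)` yields the interval [11, 22) with a negative sign, and `start_interval(AUTHS, (30, 32), ROOT)` yields [30, 45) with a positive sign.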

4.2 Dynamic predicates (DPs)

We propose the notion of the dynamic predicate³, which represents a dynamically constructed condition during query execution. We note that, due to instance-level authorizations, the start interval constructed by Lemma 3 is data-dependent and should be evaluated at run time. The DP inserted in the query plan allows this data-dependent start interval to be evaluated at run time. By evaluating the start interval, we can filter out unauthorized elements from query processing early on, thus significantly improving query performance. Therefore, we apply the notion of the DP for determining accessibility of elements so as to tightly integrate access control with the query plan. Definition 3 formally defines the DP when it is applied to access control.

³ Oracle has used the term “dynamic predicate” for a meaning different from that of our DP. Oracle Enterprise Edition provides a facility that automatically appends a predicate to a user’s query at query compilation time. Oracle refers to this predicate as a DP. It is defined by the database administrator based on his access control policy. The DP is used for hiding some tuples from a user just like the way views are used for access control. We note that the DP in Oracle is static rather than dynamic. It is determined during query compilation and is not changed during query execution. In contrast, the DP in this paper is indeed dynamic because it is changed continuously due to its property of being data-dependent.

Definition 3 A dynamic predicate for the element $e$ is a condition dynamically constructed during query execution, representing a start interval of elements with the same accessibility. A DP is represented as $([start_{begin}, start_{end}), sign)$. Here, $[start_{begin}, start_{end})$ represents the start interval, and $sign$ represents accessibility ($+$) or inaccessibility ($-$) of the elements that belong to this start interval. A DP is constructed as follows:

(1) $[start_{begin}, start_{end})$: the start interval generated according to Lemma 3;

(2) $sign = \begin{cases} + & \text{if the nearest ancestor authorization allows accesses to the element } e; \\ - & \text{otherwise.} \end{cases}$

Example 3 Suppose userA issues a query //patient//drug, which requires read authorizations, against the XML document in Fig. 5. During query processing, a drug element (11, 13) is retrieved, and then, the DP for (11, 13) is constructed according to Definition 3 as follows. (1) The nearest ancestor authorization of (11, 13) is the one granted on (2, 46) by Definition 1, and its nearest overriding authorization is the one granted on (22, 45) by Definition 2. This case falls into the first category in Lemma 3, and thus, $[start_{begin}, start_{end})$ becomes [11, 22). (2) The nearest ancestor authorization, which is granted on (2, 46), prohibits userA from reading the element (11, 13) because its authorization type is ¬read. Thus, the sign of the DP becomes −. Therefore, if we insert ([11, 22), −) into the query plan, drug elements (11, 13)–(17, 19) are excluded from the query results, thus improving the query performance.
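The effect of a DP as a selection condition can be sketched in a few lines of Python. The class `DP` and the function `sigma_dp` are hypothetical names; the conditions mirror the selections σ¬(11≤start<22) and σ30≤start<45 shown in Fig. 8, and the drug start values are read off Fig. 5 as an assumption. The sketch does not model how the algorithm treats an element that falls outside the interval of a + DP, which in the paper triggers construction of the next DP.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DP:
    begin: int  # start_begin (inclusive)
    end: int    # start_end (exclusive)
    sign: str   # "+" (accessible) or "-" (inaccessible)

    def passes(self, start: int) -> bool:
        """A "-" DP rejects start values inside the interval;
        a "+" DP accepts start values inside the interval."""
        inside = self.begin <= start < self.end
        return inside if self.sign == "+" else not inside


def sigma_dp(dp: DP, starts: List[int]) -> List[int]:
    """Apply the DP as a selection predicate over scanned start values."""
    return [s for s in starts if dp.passes(s)]


# Assumed start values of the drug elements in Fig. 5.
DRUG_STARTS = [11, 14, 17, 30, 33, 38, 41]
AFTER_NEGATIVE_DP = sigma_dp(DP(11, 22, "-"), DRUG_STARTS)
```

Applying ([11, 22), −) removes the start values 11, 14, and 17, which correspond to the drug elements (11, 13)–(17, 19) of Example 3.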

Only DPs with the − sign can enhance the performance when they are inserted into the query plan, because these predicates exclude elements. DPs with the + sign make the elements whose start values are contained in $[start_{begin}, start_{end})$ be included into query processing. However, DPs with the + sign should also be inserted into the query plan because a DP with the + sign allows us to find the element for constructing the next DP. This element is the first one whose start value is not contained in $[start_{begin}, start_{end})$.

The earlier the DPs are evaluated in the query plan, the earlier we can exclude unauthorized elements from query processing, thus improving query performance. Thus, we push down DPs in the query plan. It is similar to query optimization in relational databases, pushing down selection predicates to reduce the number of retrieved tuples as early as possible. However, while query optimization in relational databases pushes down selection predicates only once during query compilation, the proposed optimization technique dynamically generates the predicates and consecutively pushes them down during query processing.

[Figure 5: an XML document tree rooted at hospital (1, 81). The surgery element (2, 46) contains the patient elements (3, 21) and (22, 45), each with name, illness, and therapy children; the therapy elements contain the drug elements, among them (11, 13), (14, 16), (17, 19), and (30, 32). Nearest ancestor authorization: <userA, /hospital/surgery, read, -, imply>. Nearest overriding authorization: <userA, /hospital/surgery/patient[2], read, +, imply>. The figure marks those elements that can be excluded from the query results by DPs.]

Fig. 5 An example that shows performance improvement when using DPs

To push down DPs, we first insert selection (σ) operators into the query plan right above the operators that scan query elements during query compilation, and then, assign DPs as the predicates to these selection operators during query processing.

Figure 6 shows query plans for executing the query //eA//eB//eC before and after using DPs. The query plan is constructed using the scheme proposed by Wu et al. [33] extended with authorization operators. Here, the query plan is similar to that of relational algebra: the query plan is a tree where query elements are leaf nodes, and operators are internal nodes. Figure 6a shows a basic query plan that does not use DPs. The authorization checking operator Ω executes the algorithm Nearest Ancestor Filtering presented in Sect. 3.3 and returns authorized results. The operator // returns pairs of ancestor–descendant elements. A well-known processing method of the operator // is the structural join [3,9,19]. The scan(eA), scan(eB), and scan(eC) operators scan each element in the order of start values to execute the structural join. The basic query plan is constructed in such a way that it first retrieves each result of the query //eA//eB//eC, and then, performs authorization checking with the operator Ω. The output of each operator is pipelined to the input of the parent operator [16].

Figure 6b shows the query plan augmented with DPs. This plan is constructed such that the DP push-down operator Ψ is inserted at the top of the query plan, and σDP operators right above the scan(eA), scan(eB), and scan(eC) operators. The role of the operator Ψ is to initialize and update DPs as query processing proceeds, pushing them down to σDP operators. Here, the arrow lines indicate that DPs generated by the operator Ψ are pushed down to the σDP operators. By using these DPs, only authorized instances of eA, eB, and eC are provided to the input of // operators.

[Figure 6: (a) the basic query plan, with the authorization checking operator Ω above the // operators and the scan(eA), scan(eB), and scan(eC) operators; (b) the query plan augmented with DPs, with the DP push-down operator Ψ at the top and a σDP operator above each scan operator.]

Fig. 6 Query plans before and after using DPs

The DP push-down operator Ψ examines whether the nearest ancestor authorization of each query result retrieved is different from that of the previous query result. Only when it is different does the operator push down a newly constructed DP into the σDP operators. The algorithm of the operator is constructed by using the iterator [16] model commonly used in query processors of commercial DBMSs. In the iterator model, the operators composing the query plan receive an input from the child operators, and then, provide the processed result to the parent operator. To obtain a result from each operator, the operator provides a GetNext() function as the interface.

Figure 7 shows the algorithm for GetNext() of the operator Ψ. We call it Dynamic Predicate Filtering. The algorithm consists of two steps. In the first step, the algorithm pushes down the constructed DP into the child operator P and retrieves each authorized query result by calling P.GetNext() (lines 2–6). At this time, it transmits the DP to the child operator through the argument of P.GetNext(). This DP is recursively transmitted to all the σDP operators. When Ψ.GetNext() is first called, the algorithm retrieves the first candidate query result $e_{candidate}$ and constructs an initial DP for $e_{candidate}$ using Definition 3 (lines 3–5). In the second step, the algorithm constructs the DP using Definition 3 only when the DP for the query result $e$ needs to be reconstructed (lines 7–8). $[start_{begin}, start_{end})$ of the DP is the start interval representing the set of elements with the same nearest ancestor authorization as that of the previous query result. Therefore, we can easily determine the element that triggers reconstruction of the DP by examining whether the start value of the element $e$ is contained in this interval (line 7).

Fig. 7 The integrated access control algorithm Dynamic Predicate Filtering (Basic)
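The control flow described above can be approximated by a single-scan simulation. The following Python sketch is not the algorithm of Fig. 7: it handles one element type scanned in start order rather than a full query plan with structural joins and GetNext() calls, it assumes all authorizations are weak, and the names `build_dp` and `dynamic_predicate_filtering` as well as the sample data (read off Fig. 5) are hypothetical.

```python
from bisect import bisect_left


def build_dp(auths, elem, root):
    """Construct a DP (begin, end, sign) following Lemma 3.
    auths are (start, end, positive) triples; linear scans replace the index."""
    s, e = elem
    ancestors = [a for a in auths if a[0] <= s and a[1] >= e]
    # No enclosing authorization: assume a negative one on the root element.
    naa = max(ancestors, key=lambda a: a[0]) if ancestors else (root[0], root[1], False)
    overriding = [a[0] for a in auths if a[0] > s >= naa[0] and a[1] < naa[1]]
    upper = min(overriding) if overriding else naa[1]
    return (s, upper, "+" if naa[2] else "-")


def dynamic_predicate_filtering(elements, auths, root):
    """elements: (start, end) pairs sorted by start value.
    Returns the authorized elements and the sequence of DPs constructed."""
    starts = [s for s, _ in elements]
    results, dps, i = [], [], 0
    while i < len(elements):
        dp = build_dp(auths, elements[i], root)  # rebuilt only when the interval is left
        dps.append(dp)
        # First element whose start value is not contained in [begin, end).
        j = bisect_left(starts, dp[1], lo=i + 1)
        if dp[2] == "+":
            results.extend(elements[i:j])  # covered elements are authorized
        i = j                              # covered elements need no further lookup
    return results, dps


# Assumed sample data from Fig. 5.
AUTHS = [(2, 46, False), (22, 45, True)]
DRUGS = [(11, 13), (14, 16), (17, 19), (30, 32), (33, 35), (38, 40), (41, 43)]
RESULTS, DPS = dynamic_predicate_filtering(DRUGS, AUTHS, (1, 81))
```

On these data only two DPs are constructed for seven drug elements, whereas a per-result check would perform seven authorization lookups.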

Example 4 Figure 8 shows an example of executing the algorithm Dynamic Predicate Filtering (Basic), when issuing a query //patient//drug against the XML document in Fig. 5. In line 4 of Fig. 7, the algorithm retrieves the first candidate query result, drug element (11, 13). In line 5, DP = ([11, 22), −) is constructed as shown in Example 3 using Definition 3. In line 6, the algorithm pushes down DP = ([11, 22), −) all the way to the σDP operators by calling P.GetNext(). Here, the operator P is the // operator. Then, drug elements (11, 13), (14, 16), and (17, 19), whose start values are contained in [11, 22), are excluded from query processing. (Filtering patient elements is discussed in Example 6.) Thus, the algorithm retrieves drug element (30, 32), which is the first one after [11, 22). In line 8, DP = ([30, 45), +) is newly constructed, which is used when Ψ.GetNext() is called next.

Dynamic Predicate Filtering is optimally applied to the query plan that satisfies the following condition: elements are scanned and query results generated in the order of start values. Most structural join algorithms satisfy this condition [3,6,9].⁴ Thus, we fully exploit the properties of structural join algorithms, which are the most popular methods for processing XML queries.

⁴ When this condition is not satisfied, however, we can simply extend Dynamic Predicate Filtering in such a way that it accumulates DPs instead of replacing the old one with a new one (although the performance will not be as good).

[Figure 8: (a) push-down of the DP constructed by the query result (11, 13): DP ([11, 22), −) is transmitted through P to the selections σ¬(11≤start<22) above scan(patient) and scan(drug); (b) push-down of the DP constructed by the query result (30, 32): DP ([30, 45), +) is transmitted through P to the selections σ30≤start<45 above scan(patient) and scan(drug).]

Fig. 8 An example of executing the algorithm Dynamic Predicate Filtering (Basic)

Dynamic Predicate Filtering can be used also for twig queries [6]: for example, //eA[.//eB//eC]//eD. Twig queries are processed in two consecutive phases [6]. In the first phase, individual linear paths are extracted from the twig query (//eA//eB//eC and //eA//eD in the example query), and the result for each is retrieved. In the second phase, results for individual linear paths are merge-joined. We can use Dynamic Predicate Filtering during the first phase that handles only linear path queries.
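The first phase, extracting linear paths from a twig query, amounts to enumerating root-to-leaf paths of the query tree. The short Python sketch below illustrates this for the example query; the nested-tuple representation and the function `linear_paths` are hypothetical, and every edge is assumed to be an ancestor–descendant (//) edge.

```python
def linear_paths(node):
    """Return all root-to-leaf paths of a twig pattern.
    A node is a pair (element_name, list_of_child_nodes)."""
    name, children = node
    if not children:
        return [[name]]
    return [[name] + path for child in children for path in linear_paths(child)]


# The twig query //eA[.//eB//eC]//eD as a tree: eA has the branches eB//eC and eD.
TWIG = ("eA", [("eB", [("eC", [])]), ("eD", [])])
PATHS = ["//" + "//".join(p) for p in linear_paths(TWIG)]
```

For the example query this yields the two linear paths named in the text, each of which can then be processed with Dynamic Predicate Filtering before the merge-join of the second phase.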

5 Further enhancement of the access control mechanism to minimize information inference

Information inference is the ability of inferring the information that a user is not permitted to know using the information that the user knows [1]. We discuss a further enhancement of our access control mechanism to minimize information inference. We determine authorized query results using the following strategy: we check authorizations for all the instances of other elements that qualify result instances as well as for the result instances themselves.⁵ We call it the inference-blocking strategy. This strategy can significantly reduce information inference compared with that presented in previous sections.

⁵ The meaning of the qualifying instances becomes ambiguous when XPath queries involve negation. Since most XPath queries do not allow negation [28], we do not consider it here, leaving it as the topic of a future paper.

Example 5 Consider the XML document in Fig. 5. Suppose that a user is allowed to access the drug elements, but not the patient elements. If only query results are checked for authorizations, a query //patient[./name=‘Lee’]//drug could allow the user to infer the existence of a patient named ‘Lee.’ That is, non-empty query results indicate the existence of such a patient. This inference clearly violates the desired intent. In contrast, the inference-blocking strategy prevents this inference. Since the user is not allowed to access the patient elements, the query result is always empty regardless of the existence of such a patient.
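The difference between checking only the query result and checking every qualifying instance can be shown with a small Python sketch. The functions `output_only` and `inference_blocking` are hypothetical names, accessibility is given as a precomputed set rather than derived from authorizations, and the result tree (including the assumption that the first patient in Fig. 5 is the one named ‘Lee’) is purely illustrative.

```python
def output_only(result_trees, accessible, output):
    """Check authorizations for the output instance only."""
    return [rt for rt in result_trees if rt[output] in accessible]


def inference_blocking(result_trees, accessible):
    """Check authorizations for every instance in the result tree."""
    return [rt for rt in result_trees if all(n in accessible for n in rt.values())]


# Illustrative result tree for //patient[./name='Lee']//drug,
# mapping each query element type to a (start, end) instance.
RESULT_TREES = [{"patient": (3, 21), "name": (4, 6), "drug": (11, 13)}]

# The user may access drug elements but not patient (or name) elements.
ACCESSIBLE = {(11, 13), (14, 16), (17, 19)}

LEAKY = output_only(RESULT_TREES, ACCESSIBLE, "drug")
BLOCKED = inference_blocking(RESULT_TREES, ACCESSIBLE)
```

Here `LEAKY` is non-empty and thereby reveals that a matching patient exists, while `BLOCKED` is empty regardless of whether such a patient exists.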

We first define the instances for which we need to check authorizations. We call the set of these instances the result tree and define it using the scheme proposed by Miklau and Suciu [22]. Miklau and Suciu model an XML document and an XPath query as trees. Here, a node of an XML document is an element instance, and that of an XPath query is an element type. A tree representing an XML document is called an XML tree, and that representing an XPath query a tree pattern. The node representing the query result in a tree pattern is called the distinguished output node. The set of edges of a tree pattern is partitioned into the set of the parent–child edges and the set of the ancestor–descendant edges. Miklau and Suciu define an embedding as a mapping between the nodes of an XML document and the nodes of an XPath query. Given a tree pattern $p$ and an XML tree $t$, an embedding $e$ of $p$ into $t$ is a total function $e$ : nodes of $p$ → nodes of $t$ which satisfies the following conditions: (1) ∀(x, y) ∈ parent–child edges of $p$, $e(y)$ is a child of $e(x)$ in $t$; (2) ∀(x, y) ∈ ancestor–descendant edges of $p$, $e(y)$ is a descendant of $e(x)$ in $t$; and (3) ∀x ∈ nodes of $p$, the element name of $x$ is the same as that of $e(x)$.
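The three embedding conditions can be checked mechanically. The Python sketch below does so over a tiny hand-built fragment; the function names, the dictionary layout, and the assumed parent links (taken from the shape of Fig. 5) are hypothetical, and descendant tests use the (start, end) containment that the paper's numbering scheme provides.

```python
def is_descendant(anc, desc):
    """desc lies strictly inside anc in the (start, end) numbering."""
    return anc[0] < desc[0] and desc[1] < anc[1]


def is_embedding(names, pc_edges, ad_edges, e, xml):
    """names: pattern node -> element name; e: pattern node -> XML element,
    identified by its (start, end) pair; xml: element -> (name, parent element)."""
    if set(e) != set(names):                              # e must be total
        return False
    if any(xml[e[x]][0] != names[x] for x in names):      # condition (3)
        return False
    if any(xml[e[y]][1] != e[x] for x, y in pc_edges):    # condition (1)
        return False
    return all(is_descendant(e[x], e[y]) for x, y in ad_edges)  # condition (2)


# Assumed fragment of the XML tree in Fig. 5: element -> (name, parent).
XML = {
    (3, 21): ("patient", (2, 46)),
    (10, 20): ("therapy", (3, 21)),
    (11, 13): ("drug", (10, 20)),
    (22, 45): ("patient", (2, 46)),
    (30, 32): ("drug", (29, 36)),
}

# Tree pattern for //patient//drug: one ancestor-descendant edge.
NAMES = {"p": "patient", "d": "drug"}
AD_EDGES = [("p", "d")]
```

For instance, mapping the pattern to the patient (3, 21) and the drug (11, 13) is an embedding, whereas pairing the patient (3, 21) with the drug (30, 32) is not, because (30, 32) is not a descendant of (3, 21).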

We now formally define the result tree in Definition 4.

Definition 4 A result tree $rt$ of a tree pattern $p$ is a tree constructed by embedding the nodes of $p$, that is, by replacing $x$ ∈ nodes of $p$ with $e(x)$. NODES($rt$) denotes the set of nodes in a result tree $rt$, and OUTPUT($rt$) the node in $rt$ that corresponds to the distinguished output node in $p$.

We now enhance our access control algorithm so as to check authorizations for all the nodes in the result tree of a query. The enhancement of Nearest Ancestor Filtering is straightforward: we search for the nearest ancestor authorization and check the authorization for each node in the result tree retrieved. Figure 9 shows the enhanced algorithm Dynamic Predicate Filtering adopting the inference-blocking strategy. It maintains DPs dedicated to each element type specified in the query to check accessibility of the instances of that element type. The algorithm constructs a DP for each node in the result tree and pushes down all the DPs constructed (lines 3–6). The DP is transmitted to the σDP operator checking the instances of the element type to which this DP is dedicated. Then, the algorithm examines whether the DP for each node in the result tree newly retrieved needs to be reconstructed (lines 7–9). P.GetNext(), which is called in Ψ.GetNext(), returns a result (i.e., a result tree) of P, transmitting recursively each DP to the corresponding σDP operator (lines 12–23). In lines 20–21, we find the elements that satisfy the ancestor–descendant relationship using the scheme proposed by Chien et al. [9]. Chien et al. construct, for each element type in an XML document, two-dimensional indexes storing the (start, end) points of elements to speed up query processing. These indexes are also used to find the instance having the smallest start value for each element type in line 4. We discuss the effect of the inference-blocking strategy on performance in Sect. 6.2.

Fig. 9 The Dynamic Predicate Filtering algorithm adopting the inference-blocking strategy

Example 6 Reconsider Example 4 with Dynamic Predicate Filtering in Fig. 9. The result trees retrieved are pairs of the patient element and the drug element satisfying the ancestor–descendant relationship. For example, the first result tree is the pair of the patient element (3, 21) and the drug element (11, 13). Dynamic Predicate Filtering maintains two separate DPs: one dedicated to the patient element and one dedicated to the drug element. For the first result tree, the former becomes DP(patient) = ([3, 22), −), and the latter DP(drug) = ([11, 22), −). Then, the former is transmitted to the σDP operator above the scan(patient) operator to filter out unauthorized patient elements early on. Similarly, the latter is transmitted to that above the scan(drug) operator.

6 Performance evaluation

In this section, we evaluate the performance of our access control mechanism compared with existing ones for XML data. We describe the experimental data and environment in Sect. 6.1 and present the results of the experiments in Sect. 6.2.

6.1 Experimental data and environment

We compare the performance of query processing with six different access control mechanisms: the top-down strategy, the bottom-up strategy, the CAM method [34], the SCA method [10], Nearest Ancestor Filtering, and Dynamic Predicate Filtering. The first and second ones are considered as naive ones; the third and fourth ones as state-of-the-art ones. We adopt the inference-blocking strategy for determining authorized query results. As the performance measure, we use the wall clock time.

To measure the performance variation depending on the types of data sets, we use the XMark [30] benchmark data, which is a synthetic data set, and the TreeBank [21] data, which is a real data set. The XMark benchmark data consists of a large XML document. We use 10 MB, 100 MB, and 1 GB XML documents to measure the performance variation depending on the database size. In the XMark benchmark data, subtrees with a similar structure occur repeatedly at the same level. The TreeBank data consists of an XML document of 86 MB, in which elements are recursively nested in many levels.

To measure the performance variation depending on the number of explicit authorizations, we vary the ratio of the number of elements on which explicit authorizations are granted to the total number of elements in a data set from 0.01 to 100%. To determine an authorization object, we randomly pick 0.01–100% of elements in a data set. Here, we first pick the root element to make sure all the elements have an authorization, either explicit or implicit. Multiple explicit authorizations can be granted on the same element unless we try to grant both a positive authorization and a negative authorization for the same authorization type. To determine an authorization type, we randomly select one from five authorization types (e.g., read, write, update, create, and delete). We grant positive authorizations and negative authorizations with the ratio of 9:1. We also perform experiments with the ratio of 1:9.
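A workload of this kind could be generated along the following lines. The Python sketch is only a guess at one possible procedure, not the generator used in the experiments: the function `generate_authorizations`, its parameters, and the fixed seed are hypothetical, and for simplicity it grants exactly one explicit authorization per selected element.

```python
import random

ACTIONS = ["read", "write", "update", "create", "delete"]


def generate_authorizations(elements, ratio, positive_share=0.9, seed=0):
    """elements: list of element identifiers with the root element first.
    ratio: fraction of elements that receive an explicit authorization.
    Returns (element, action, sign) triples; the root is always included."""
    rng = random.Random(seed)
    root, rest = elements[0], elements[1:]
    k = max(1, round(len(elements) * ratio))
    chosen = [root] + rng.sample(rest, min(k - 1, len(rest)))
    return [
        (el, rng.choice(ACTIONS), "+" if rng.random() < positive_share else "-")
        for el in chosen
    ]


# 1% of 1,000 hypothetical elements, positive:negative roughly 9:1.
SAMPLE = generate_authorizations(list(range(1000)), 0.01)
```

Setting `positive_share=0.1` would correspond to the 1:9 setting mentioned above.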

For the CAM method, we treat implicit authorizations in our access control model as explicit ones that are identical to their parents’ in CAM because, in CAM, every authorization has to be explicitly granted. Then, after compression, only those authorizations corresponding to explicit authorizations in our access control model remain and are indexed by CAM. Thus, authorizations indexed by CAM become the same as those indexed by our authorization index. To index authorizations, CAM constructs a trie consisting of the path strings of the elements having authorizations.

To measure the performance variation depending on the types of queries, we run simple queries of type //eA//eB and complex queries of type //eA//eB//eC//eD//eE (a linear path expression) or //eA[.//eB]//eC (a branch path expression). Since queries of the same type have similar tendencies, we present the experimental results for the queries in Fig. 10, which show the representative tendency for each data set. Read authorizations are required to execute these queries.

dataset    simple queries        complex queries
XMark      //person//interest    (1) //site//open_auctions//open_auction//bidder//increase
                                 (2) //open_auctions[.//bidder]//seller
TreeBank   //NP//NN              //SBAR//S//VP//PP//NP

Fig. 10 XML queries used in the experiments

To compare the performance of our access control method with that of the SCA method, we reproduce the environmental setting used by Cho et al. [10] because, in SCA, instance-level authorizations are granted only according to security annotations. Cho et al. [10] have used two different settings for security annotations: (1) the instances of only three element types are permitted to have authorizations (called a sparse DTD) or (2) the instances of half of element types are permitted to have authorizations (called a dense DTD). Using these settings, they have issued the query //site[.//open_auction/initial]/people//name against the XMark benchmark data.

We conduct all the experiments on a SUN Ultra 60 workstation with 450 MHz CPU and 512 MB of main memory. We use the MLGF [32] as the multi-dimensional index for storing authorizations, and the page size for data and indexes is set to be 4,096 bytes. We use the SP-GiST [4] to implement the trie for the CAM method. Since this trie is constructed for each user [34], our authorization index is constructed for each user to make a fair comparison. For the SCA method, we store an authorization as an attribute of an element in the same way as done by Cho et al. [10]. For the top-down and bottom-up strategies, we store an authorization as a tuple in an authorization table in the same way as done by Rabitti et al. [27], and this table is consulted through the B+-tree index to find the authorization for a given element.

6.2 Results of the experiments

6.2.1 Effects of the database size and the number of explicit authorizations

Figure 11 shows the wall clock time for the query //person//interest as the database size and the number of explicit authorizations are varied. These results indicate that Dynamic Predicate Filtering outperforms the existing methods by a large margin. Specifically, compared with the existing methods, Dynamic Predicate Filtering improves the performance by 1.8–12.1 times in Fig. 11a; by 2.0–70.7 times in Fig. 11b; and by 2.1–107.1 times in Fig. 11c. This margin increases as the database size gets larger. The reason is that the number of elements that are excluded by a DP increases in a larger database since the number of elements in a subtree pruned at the same level increases. This effect is more marked when the number of explicit authorizations is small. The reason is that the number of elements that are excluded by a DP gets larger since the number of implicit authorizations that are overridden is small.

[Figure 11: three panels, (a) 10 MB database, (b) 100 MB database, and (c) 1 GB database, each plotting the wall clock time (sec) against the ratio of the number of elements having explicit authorizations to the total number of elements (%) for TopDown, BottomUp, CAM, Nearest Ancestor Filtering, and Dynamic Predicate Filtering.]

Fig. 11 The wall clock time for the query //person//interest against the XMark benchmark data

When explicit authorizations are granted on 100% of the elements, Dynamic Predicate Filtering shows a minor degradation (less than 1.6 times) in comparison with Nearest Ancestor Filtering since constructing DPs in Dynamic Predicate Filtering incurs the overhead of searching for the nearest overriding authorization without any benefit. In XML databases, a large number of elements are granted implicit authorizations exploiting the data hierarchy. Granting explicit authorizations on every element is rare. The typically high compression ratio of the CAM method reported by Yu et al. [34] supports this argument. Besides, the experiments show that Dynamic Predicate Filtering outperforms Nearest Ancestor Filtering when explicit authorizations are granted on less than 70% of the elements. Figure 11 shows that the wall clock time of Nearest Ancestor Filtering slightly increases at 100%. It turns out that this increase occurred because the depth of the authorization index was increased by one.

Nearest Ancestor Filtering shows a performance similar to CAM with a minor improvement. The reason is that, as discussed in Sect. 6.1, CAM is a trie structure indexing authorizations after compression and is analogous to our authorization index. The minor improvement is due to the efficiency of the index structure and efficiency in finding the nearest ancestor. On the other hand, Dynamic Predicate Filtering improves the performance compared with CAM by 1.2–16.9 times. This is because Dynamic Predicate Filtering reduces query processing time by filtering unauthorized elements early on, while CAM does not.

6.2.2 Effects of query types

Figure 12 shows the wall clock time for a more complex query //site//open_auctions//open_auction//bidder//increase, which has a long path expression, as the database size and the number of explicit authorizations are varied. These results indicate that Dynamic Predicate Filtering outperforms the existing methods for a complex query with larger margins than for a simple query. The reason is that, if a subtree is pruned by a DP at a higher level, the cost for checking query conditions over that subtree can be much more reduced in a complex query. Compared with the existing methods, Dynamic Predicate Filtering improves the performance by 1.8–25.9 times in Fig. 12a; by 2.0–106.0 times in Fig. 12b; and by 2.0–165.8 times in Fig. 12c.

The wall clock time for a twig query //open_auctions[.//bidder]//seller shows a tendency similar to that of Fig. 11. For a twig query, since Dynamic Predicate Filtering is applied to each linear path, the overall improvement of Dynamic Predicate Filtering compared with other methods is the sum of the improvement obtained from these linear paths. In the XMark data of 100 MB, Dynamic Predicate Filtering improves the wall clock time by 1.1–17.4 times over CAM and by 0.9–7.6 times over Nearest Ancestor Filtering.

6.2.3 Effects of the depth of XML data

Figure 13 shows the wall clock times for the queries //NP//NN and //SBAR//S//VP//PP//NP against the TreeBank data of 86 MB as the number of explicit authorizations is varied. These results indicate that Dynamic Predicate Filtering outperforms the existing methods even more for XML data having many levels of recursively nested elements like the TreeBank data. Specifically, in Fig. 13b, Dynamic Predicate Filtering improves the performance by 5.9–216.3 times over the top-down strategy; by 9.2–492.9 times over the bottom-up strategy; by 1.1–31.1 times over CAM; and by 0.9–30.3 times over Nearest Ancestor Filtering. We note that the differences between Dynamic Predicate Filtering and the existing methods are much larger in Fig. 13 than in Figs. 11 and 12. The reason is that the number of elements that are excluded by a DP increases in XML data having many levels, since the number of elements in a subtree pruned at a higher level increases. This result is natural since DPs computed in Lemma 3 inherently work well for hierarchical structures by virtue of hierarchical filtering.

Fig. 12 The wall clock time for the query //site//open_auctions//open_auction//bidder//increase against the XMark benchmark data: (a) 10 MB database; (b) 100 MB database; (c) 1 GB database. Horizontal axis: # of elements having explicit authorizations / total # of elements (%). Vertical axis: wall clock time (sec). Methods: TopDown, BottomUp, CAM, Nearest Ancestor Filtering, Dynamic Predicate Filtering.

6.2.4 Effects of the inference-blocking strategy

Figure 14 shows the composition of the results in Fig. 12: the top portion of the bar represents the wall clock time overhead caused by the inference-blocking strategy. The wall clock times of the top-down and bottom-up strategies are shown to be insensitive to the inference-blocking strategy since these strategies access the ancestors, including the qualifying instances, to find the nearest ancestor authorization.

The wall clock times of Nearest Ancestor Filtering and Dynamic Predicate Filtering increase only slightly (less than 26.5%). It turns out that these increases are blunted by virtue of the buffering effect, since most of the explicit authorizations on the qualifying instances are stored in the same page of the authorization index as the ones on result instances. This is possible because the start values of the qualifying instances tend to be close to those of result instances. These results indicate that adopting the inference-blocking strategy in our method does not incur much additional overhead.

Fig. 13 The wall clock times for simple and complex queries against the TreeBank data (86 MB), with one panel each for the //NP//NN query and the //SBAR//S//VP//PP//NP query. Horizontal axis: # of elements having explicit authorizations / total # of elements (%). Vertical axis: wall clock time (sec). Methods: TopDown, BottomUp, CAM, Nearest Ancestor Filtering, Dynamic Predicate Filtering.

6.2.5 Comparison between our method and the SCA method

Figure 15 shows the wall clock times of Dynamic Predicate Filtering and SCA for the experimental setting used by Cho et al. [10]. Dynamic Predicate Filtering outperforms SCA by 1.5–14.4 times in Fig. 15a and by 1.3–2.2 times in Fig. 15b, even though SCA exploits additional information (security annotations). Furthermore, the performance improvement increases as the database size gets larger. The reason for this advantage is that SCA reduces only the number of authorization checks performed during query processing, while Dynamic Predicate Filtering also reduces the number of elements accessed during query execution by filtering out unauthorized elements early on. This is exactly the advantage of integrating access control with query processing, which we propose in this paper. The difference between Dynamic Predicate Filtering and SCA is much larger in the sparse DTD than in the dense DTD because the number of elements excluded in Dynamic Predicate Filtering gets larger in the sparse DTD due to a smaller number of explicit authorizations. (In the sparse DTD, every people element is granted an explicit authorization. In the dense DTD, every people, person, and open_auction element is granted an explicit authorization.)

Fig. 14 The composition of the results in Fig. 12 (the top portion is the wall clock time overhead of the inference-blocking strategy): (a) 10 MB database; (b) 100 MB database; (c) 1 GB database. Horizontal axis: # of elements having explicit authorizations / total # of elements (%). Vertical axis: wall clock time (sec). Methods: TopDown, BottomUp, CAM, Nearest Ancestor Filtering, Dynamic Predicate Filtering.

6.2.6 Effects of the authorization types

In all the experiments stated above, we have repeated the test for the same queries with the ratio of positive and negative authorizations changed from 9:1 to 1:9. We have observed that the performance of Dynamic Predicate Filtering improves only slightly, by 10.8% on average. The reason for this minor improvement is that a large proportion of the elements excluded are those on which an authorization has been granted for a type other than the authorization type of the query, and thus, negative authorizations do not affect the performance much.

7 Conclusions

In this paper, we have proposed an efficient access control mechanism tightly integrated with query processing for XML databases. To achieve this integration, we have presented a novel concept of the dynamic predicate, which represents a dynamically constructed condition during query execution. The DP is derived from instance-level authorizations and constrains accessibility of the elements. By inserting the DP into the query plan, we effectively integrate access control with query processing. Due to this integration, unauthorized elements are excluded in an early phase of query processing, thus significantly improving the performance of query processing with access control. We have implemented this access control mechanism as the algorithm Dynamic Predicate Filtering.

Fig. 15 The comparison between our method and SCA using the experimental setting by Cho et al., with one panel each for the sparse DTD and the dense DTD. Horizontal axis: database size (MB). Vertical axis: wall clock time (sec). Methods: SCA, Dynamic Predicate Filtering.

The primary factors that make this integration using the DP highly effective are instance-level authorizations and data hierarchies, both inherent in XML databases. First, the cost for checking instance-level authorizations is reduced significantly by using the DP, i.e., by representing the authorization information of a set of elements having an identical authorization with that of the element being currently accessed. Second, filtering unauthorized elements by the DP can be propagated downward along the data hierarchies, which we call hierarchical filtering.
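The two factors above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the Dynamic Predicate Filtering algorithm itself: each element is assumed to carry a region `(start, end)` in which `start` follows preorder and an ancestor's region encloses the start values of its descendants, the function name `filter_authorized` is hypothetical, and elements covered by no authorization are assumed to be denied. The sketch decides accessibility once per start interval and then skips every candidate that falls inside the same interval.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (start, end) of an element; hypothetical encoding


def filter_authorized(candidates: List[int],
                      auths: Dict[Region, bool]) -> Tuple[List[int], int]:
    """Filter sorted candidate start values; return (authorized, lookups)."""
    regions = sorted(auths)              # explicit authorizations ordered by start
    starts = [r[0] for r in regions]
    authorized: List[int] = []
    lookups = 0
    i = 0
    while i < len(candidates):
        s = candidates[i]
        lookups += 1
        # Nearest ancestor authorization: containing region with the largest start.
        containing = [r for r in regions if r[0] <= s < r[1]]
        if not containing:
            i += 1                       # assumed default: deny
            continue
        naa = max(containing, key=lambda r: r[0])
        # The interval ends at the next explicit authorization inside
        # naa's subtree, or at end(naa) when there is none.
        k = bisect_right(starts, s)
        hi = starts[k] if k < len(starts) and starts[k] < naa[1] else naa[1]
        # Every candidate in [s, hi) shares naa, so decide once for all of them.
        j = bisect_left(candidates, hi, i)
        if auths[naa]:
            authorized.extend(candidates[i:j])
        i = j
    return authorized, lookups


# Positive authorization on the root, negative one on the subtree (2, 6).
auths = {(1, 10): True, (2, 6): False}
print(filter_authorized([3, 4, 5, 7, 8], auths))  # ([7, 8], 2)
```

In this toy run, five candidates are resolved with two authorization lookups: the three candidates under the denied subtree are skipped together, and the remaining two are accepted together.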

We have also minimized information inference by applying the inference-blocking strategy to the decision of the authorized query result. This strategy obviously reduces information inference compared with the basic strategy that checks authorizations only for the result instances. We have verified, through experiments, that enforcement of this strategy in our access control mechanism does not add much overhead by virtue of the buffering effect.

To show the efficiency of our access control mechanism, we have performed extensive experiments using synthetic and real data sets. Experimental results show that Dynamic Predicate Filtering improves the wall clock time by up to 31 times over CAM and by up to 14 times over SCA. These improvements indicate that the advantages of excluding unauthorized elements beforehand (and also authorization checks on them) in Dynamic Predicate Filtering are superior to those of indexing authorizations in CAM or those of reducing the number of authorization checks performed during run time in SCA. The results also show that the advantage of Dynamic Predicate Filtering becomes more prominent as (1) the database size, (2) the length of path expressions for queries, or (3) the depth of XML data increases. These results demonstrate the effectiveness of access control integrated with query processing.

Overall, we believe that we have provided an efficient mechanism for checking instance-level authorizations in databases with hierarchical structures. As future work, we plan to extend our mechanism so as to handle larger fragments of XPath or more complex query languages such as XQuery.

Acknowledgements This work was supported by the Ministry of Science and Technology (MOST)/Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

Appendix A: Proof of Lemma 1

We show that the authorization obtained by Lemma 1 satisfies the two conditions in Definition 1. First, by the definition of the numbering scheme, $auth_{naa}$ is granted on the element $e$ or one of its ancestor elements. Let us call that element $e_{naa}$. Thus, $auth_{naa}$ satisfies condition (1). Second, let us assume that the authorization minimizing $|start(e) - start(auth)|$ is $auth_{naa}$ and that an explicit authorization $auth_{exp}$ exists on an element in the path between the element $e$ and the element $e_{naa}$. Then, by the definition of the numbering scheme, $start(e_{naa}) < start(auth_{exp}) < start(e)$. (We note that $start(auth_{naa}) = start(e_{naa})$.) This contradicts the assumption, since $auth_{exp}$ would then be closer to $e$ than $auth_{naa}$. Therefore, no authorization exists on elements in the path between the element $e$ and the element $e_{naa}$. Thus, $auth_{naa}$ satisfies condition (2).
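The selection rule that this proof relies on can be sketched in Python. This is a minimal sketch, not the paper's implementation: it assumes that every element carries a region `(start, end)` such that an element $a$ is $e$ itself or an ancestor of $e$ exactly when $start(a) \le start(e) < end(a)$, and the names `Auth` and `nearest_ancestor_auth` are hypothetical. A linear scan is used for clarity where an authorization index would be used in practice.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Auth:
    """An explicit authorization on one element (hypothetical record layout)."""
    start: int      # start value of the element the authorization is granted on
    end: int        # end value of that element
    positive: bool  # True = grant, False = deny


def nearest_ancestor_auth(e_start: int, auths: Sequence[Auth]) -> Optional[Auth]:
    """Return the explicit authorization on e or on its closest ancestor.

    Candidates are the authorizations whose region contains start(e).
    Among them, the one minimizing |start(e) - start(auth)| is the one
    with the largest start value.
    """
    candidates = [a for a in auths if a.start <= e_start < a.end]
    return max(candidates, key=lambda a: a.start, default=None)


# Positive authorization on the root (1, 10), negative one on the subtree (3, 5).
auths = [Auth(1, 10, True), Auth(3, 5, False)]
print(nearest_ancestor_auth(4, auths).positive)  # False: inherited from (3, 5)
print(nearest_ancestor_auth(5, auths).positive)  # True: inherited from the root
```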

Appendix B: Proof of Lemma 2

We show that the authorization obtained by Lemma 2 satisfies the two conditions in Definition 2. First, by the definition of the numbering scheme, $auth_{noa}$ is granted on a descendant element $e_{noa}$ of the element $e_{naa}$ on which $auth_{naa}$ is granted. Thus, $auth_{noa}$ satisfies condition (1). Second, let us assume that the authorization minimizing $|start(e) - start(auth)|$ is $auth_{noa}$ and that an explicit authorization $auth_{exp}$ exists, when traversed in preorder, on an element in the path between the element $e$ and the element $e_{noa}$. Then, by the definition of the numbering scheme, $start(e) < start(auth_{exp}) < start(e_{noa})$. (We note that $start(auth_{noa}) = start(e_{noa})$.) This contradicts the assumption, since $auth_{exp}$ would then be closer to $e$ than $auth_{noa}$. Therefore, no authorization exists, when traversed in preorder, between the element $e$ and the element $e_{noa}$. Thus, $auth_{noa}$ satisfies condition (2).

Appendix C: Proof of Lemma 3

We show that any element whose start value belongs to the start interval defined in Lemma 3 has the same nearest ancestor authorization as that of the element $e$. In this proof, $e_{naa}$ (or $e_{noa}$) denotes the element on which $auth_{naa}$ (or $auth_{noa}$) is granted. Before starting the proof, we point out that the start value increases in preorder [19]. First, we consider the case where $auth_{noa}$ exists. By Definition 2, no explicit authorization exists, when traversed in preorder, on elements in the path from the element $e$ to the element right before $e_{noa}$. Then, the start values of those elements having no explicit authorization form the interval $[start(e), start(e_{noa}))$ by the definition of the numbering scheme. Thus, any element in this start interval has $auth_{naa}$ as the nearest ancestor authorization. Second, we consider the case where $auth_{noa}$ does not exist. By Definition 2, no explicit authorization exists, when traversed in preorder, on elements in the path from the element $e$ through the rightmost descendant $e_{rd\_naa}$ of the element $e_{naa}$. We note that $start(e_{rd\_naa})$ is immediately smaller than $end(e_{naa})$. Then, the start values of those elements having no explicit authorization form the interval $[start(e), end(e_{naa}))$ by the definition of the numbering scheme. Thus, any element in this start interval has $auth_{naa}$ as the nearest ancestor authorization.
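The two cases of this proof translate into a short computation of the start interval. The Python sketch below is illustrative only and rests on assumptions: explicit authorizations are given as regions `(start, end)` of the elements they are granted on, at most one authorization exists per element, and the function name `dp_interval` is hypothetical. It returns $[start(e), start(e_{noa}))$ when a nearest overriding authorization exists and $[start(e), end(e_{naa}))$ otherwise.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # (start, end) of the element an authorization is granted on


def dp_interval(e_start: int, auths: List[Region]) -> Optional[Tuple[int, int]]:
    """Return the half-open start interval [lo, hi) sharing e's nearest ancestor authorization."""
    # Nearest ancestor authorization: region containing start(e) with the largest start.
    containing = [a for a in auths if a[0] <= e_start < a[1]]
    if not containing:
        return None  # no authorization applies to e
    naa = max(containing, key=lambda a: a[0])
    # Nearest overriding authorization: the first authorization after e in
    # preorder that still lies inside the subtree of e_naa.
    starts = sorted(a[0] for a in auths)
    i = bisect_right(starts, e_start)
    if i < len(starts) and starts[i] < naa[1]:
        return (e_start, starts[i])  # case 1: auth_noa exists
    return (e_start, naa[1])         # case 2: auth_noa does not exist


auths = [(1, 10), (3, 5), (7, 9)]
print(dp_interval(2, auths))  # (2, 3): cut off by the authorization on (3, 5)
print(dp_interval(3, auths))  # (3, 5): the whole subtree of (3, 5)
print(dp_interval(5, auths))  # (5, 7): cut off by the authorization on (7, 9)
```

Sorting the start values on every call keeps the sketch short; an ordered index over start values would avoid the repeated sort.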


References

1. Aggarwal, G. et al.: Enabling privacy for the paranoids. In: Proceedings of 30th International Conference on Very Large Data Bases, Toronto, Canada, pp. 708–719, Aug./Sept. 2004

2. Agrawal, R. et al.: Hippocratic databases. In: Proceedings of 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 143–154, Aug. 2002

3. Al-Khalifa, S. et al.: Structural joins: a primitive for efficient XML query pattern matching. In: Proceedings of 18th International Conference on Data Engineering, San Jose, California, pp. 141–152, Feb. 2002

4. Aref, W.G., Ilyas, I.F.: SP-GiST: an extensible database index for supporting space partitioning trees. J. Intell. Inform. Syst. 17(2–3), 215–240 (2001)

5. Bertino, E. et al.: Specifying and enforcing access control policies for XML document sources. World Wide Web J. 3(3), 139–151 (2000)

6. Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: optimal XML pattern matching. In: Proceedings of 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, pp. 310–321, June 2002

7. Carminati, B., Ferrari, E.: Management of access control policies for XML document sources. Int. J. Inform. Sec. 1(4), 236–260 (2003)

8. Carminati, B., Ferrari, E.: AC-XML documents: improving the performance of a web access control module. In: Proceedings of 10th ACM Symposium on Access Control Models and Technologies, Stockholm, Sweden, pp. 67–76, June 2005

9. Chien, S.-Y. et al.: Efficient structural joins on indexed XML documents. In: Proceedings of 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 263–274, Aug. 2002

10. Cho, S. et al.: Optimizing the secure evaluation of twig queries. In: Proceedings of 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 490–501, Aug. 2002

11. Clark, J., DeRose, S.: XML path language (XPath) Version 1.0, W3C Recommendation, Nov. 1999

12. Damiani, E. et al.: A fine-grained access control system for XML documents. ACM Trans. Inform. Syst. Sec. 5(2), 169–202 (2002)

13. Fan, W., Chan, C.Y., Garofalakis, M.N.: Secure XML querying with security views. In: Proceedings of 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, pp. 587–598, June 2004

14. Fundulaki, I., Marx, M.: Specifying access control policies for XML documents with XPath. In: Proceedings of 9th ACM Symposium on Access Control Models and Technologies, Yorktown Heights, New York, pp. 61–69, June 2004

15. Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surveys 30(2), 170–231 (1998)

16. Graefe, G.: Query evaluation techniques for large databases. ACM Comput. Surveys 25(2), 73–170 (1993)

17. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of 1984 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, pp. 47–57, June 1984

18. Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Database Syst. 24(2), 265–318 (1999)

19. Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proceedings of 27th International Conference on Very Large Data Bases, Rome, Italy, pp. 361–370, Sept. 2001

20. Luo, B. et al.: QFilter: fine-grained run-time XML access control via NFA-based query rewriting. In: Proceedings of 2004 ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, pp. 543–552, Nov. 2004

21. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)

22. Miklau, G., Suciu, D.: Containment and equivalence for a fragment of XPath. J. ACM 51(1), 2–45 (2004)

23. Murata, M., Tozawa, A., Kudo, M.: XML access control using static analysis. In: Proceedings of 10th ACM Conference on Computer and Communications Security, Washington, DC, pp. 73–84, Oct. 2003

24. Information and Privacy Commissioner of Ontario, Intelligent Software Agents: Turning a Privacy Threat into a Privacy Protector, Apr. 1999

25. Information and Privacy Commissioner of Ontario, An Internet Privacy Primer: Assume Nothing, Aug. 2001

26. Qi, N., Kudo, M.: Access-condition-table-driven access control for XML database. In: Proceedings of 9th European Symposium on Research in Computer Security, French Riviera, France, pp. 17–32, Sept. 2004

27. Rabitti, F. et al.: A model of authorization for next-generation database systems. ACM Trans. Database Syst. 16(1), 88–131 (1991)

28. Ramanan, P.: Covering indexes for XML queries: bisimulation – Simulation = Negation. In: Proceedings of 29th International Conference on Very Large Data Bases, Berlin, Germany, pp. 165–176, Sept. 2003

29. Samet, H.: The quadtree and related hierarchical data structures. ACM Comput. Surveys 16(2), 187–260 (1984)

30. Schmidt, A.R. et al.: XMark: a benchmark for XML data management. In: Proceedings of 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 974–985, Aug. 2002

31. Seeger, B., Kriegel, H.-P.: The buddy-tree: an efficient and robust access method for spatial data base systems. In: Proceedings of 16th International Conference on Very Large Data Bases, Queensland, Australia, pp. 590–601, Aug. 1990

32. Whang, K.-Y., Krishnamurthy, R.: The multilevel grid file - a dynamic hierarchical multidimensional file structure. In: Proceedings of International Conference on Database Systems for Advanced Applications, Tokyo, Japan, pp. 449–459, Apr. 1991

33. Wu, Y., Patel, J.M., Jagadish, H.V.: Structural join order selection for XML query optimization. In: Proceedings of 19th International Conference on Data Engineering, Bangalore, India, pp. 443–454, Mar. 2003

34. Yu, T. et al.: A compressed accessibility map for XML. ACM Trans. Database Syst. 29(2), 363–402 (2004)