A Novel Methodology for Handling Document Level Security in Search Based Applications
Uploaded by: lucenerevolution
![Page 1: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/1.jpg)
Rajani Maski - Senior Software Engineer
DOCUMENT LEVEL SECURITY IN SEARCH BASED APPLICATIONS
![Page 2: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/2.jpg)
Agenda
• Introduction to Search Based Applications
• Requirement Analysis of Document Level Security
• Access Control Lists
• Multiple Solutions
• Summary
![Page 3: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/3.jpg)
Search Based Applications
Search based applications are software applications in which a search engine platform is the core infrastructure for accessing and reporting information.
E-commerce web applications and content management systems are typical examples of search based applications.
![Page 4: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/4.jpg)
Overview of Search Based System
Authentication
• The user is authenticated before being given access to the application.
Application
• Presents a full-fledged user interface.
• Performs user operations such as uploading documents, sending emails, and searching.
Unified Data Layer
• Search server
• Indexes content across the sources
• Retrieves data at very high speed
Data Storage
• Large volumes of data from different source repositories
[Diagram: User Authentication System, Search Based Application Server, Unified Data Layer; data sources: Archives, Documents, Emails, File Server]
![Page 5: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/5.jpg)
So Far, So Good!
What’s the problem?
![Page 6: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/6.jpg)
Common Access to Unified Data Layer
[Diagram: User Authentication System, Search Based Application, Unified Data Layer; data sources: Archives, Documents, Emails, File Servers]
How is this a threat?
![Page 7: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/7.jpg)
Consider a Sample Use Case
User A:
• Logs in to the application.
• Performs a search with keywords such as ‘Pay Slips’, ‘Personal’ or ‘appraisal’.
Sample results are demonstrated for “appraisal”.
![Page 8: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/8.jpg)
Search Results
Unauthorized Results
![Page 9: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/9.jpg)
Observations
Relevant search results: [Correct]
- User A received search results relevant to his query, such as exact matches, "more like this" matches, and synonym matches.
Unauthorized search results: [Wrong]
- Some of the retrieved results were documents that he was not authorized to view.
Threats:
• Exposure of other users’ confidential documents
• Access to unauthorized information
How are we doing with this?
![Page 10: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/10.jpg)
Problem Definition
• Develop a search platform in which every user can access only the documents he or she is authorized to view.
• Ensure that confidential data uploaded to the system is not globally searchable unless it is intended to be globally accessible.
How can we achieve this?
![Page 11: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/11.jpg)
Solution
Maintain an Access Control List mapped to each document object.
Access Control List?
![Page 12: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/12.jpg)
Access Control List
• Access controls are security features that control how users [subjects] and documents [objects] communicate and interact with one another.
• Subject: an active entity [user] that requests access to an object [document].
• Object: a passive entity [document] that contains information.
[Diagram: Subject and Object (Document) linked by an Interaction]
![Page 13: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/13.jpg)
Data Model
Let’s first understand the data model of a search engine.
How are documents stored in a search engine?
Document-oriented approach.
Document:
Alec_1167 {_id: "1167", Name: "Ale C", Agent: "Miller", Place: "NY, NJ, CA", Units: 570}
Main table rows:
3424  Kiwi  reds    340
5612  Reh   Mo’s    664
1167  Alec  Miller  570
Place rows:
1167  1  NY
1167  2  NJ
1167  3  CA
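To make the document-oriented model concrete, here is a minimal Python sketch of how normalized rows might be folded into one document before indexing. The column meanings, the row layouts, and the `to_document` helper are assumptions made for illustration; only the values come from the slide.

```python
# Hypothetical relational rows, modelled on the slide's example values.
# Main table: one row per record (column names are assumed).
records = [
    {"id": "3424", "name": "Kiwi", "agent": "reds", "units": 340},
    {"id": "5612", "name": "Reh", "agent": "Mo's", "units": 664},
    {"id": "1167", "name": "Alec", "agent": "Miller", "units": 570},
]
# Child table: (record id, sequence number, place), several rows per record.
places = [("1167", 2, "NJ"), ("1167", 3, "CA"), ("1167", 1, "NY")]


def to_document(record, place_rows):
    """Fold the child rows of one record into a single multi-valued field."""
    own_rows = sorted(row for row in place_rows if row[0] == record["id"])
    return {
        "_id": record["id"],
        "Name": record["name"],
        "Agent": record["agent"],
        "Place": [place for _, _, place in own_rows],
        "Units": record["units"],
    }


doc = to_document(records[2], places)
```

The point of the sketch is only that the search engine sees one self-contained object per record, with no join needed at query time.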
![Page 14: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/14.jpg)
Indexing and Storing Document Object
• User A uploads a document into the system.
• Metadata and text are extracted.
• The document is converted to a flat structure.
• It is passed to the search engine.
[Diagram: Document → Metadata Extract → Search Engine → Document Saved]
![Page 15: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/15.jpg)
• We failed to capture something!
• What did we miss?
– User information for each document!
• Who uploaded the document?
• With whom did the user share it?
• How do we maintain this information?
– An access control list attached to each document object.
[Diagram: Document → Metadata Extract → Search Engine → Document Saved]
![Page 16: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/16.jpg)
Conventional Solution
• Maintain an access control list for each user.
• At the time of search,
– Retrieve the search results,
– Check each document against the user’s authorization, and
– Finally return the results.
[Diagram: Search Engine → Security Filter Each Document → Return Results to User]
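The conventional flow can be sketched in a few lines of Python. This is an in-memory stand-in, not a real search engine: `USER_ACL`, `search`, and `secure_search` are hypothetical names, and the substring match only imitates retrieval.

```python
# Hypothetical per-user ACLs: user id -> set of document ids the user may view.
USER_ACL = {
    "userA": {"doc1", "doc3"},
    "userB": {"doc2"},
}


def search(query, index):
    """Stand-in for the search engine: naive substring match over an in-memory index."""
    return [doc_id for doc_id, text in index.items() if query.lower() in text.lower()]


def secure_search(user_id, query, index):
    """Retrieve all matches first, then check each one against the user's ACL."""
    allowed = USER_ACL.get(user_id, set())
    return [doc_id for doc_id in search(query, index) if doc_id in allowed]


index = {
    "doc1": "Appraisal form for user A",
    "doc2": "Appraisal form for user B",
    "doc3": "Holiday calendar",
}
results = secure_search("userA", "appraisal", index)
```

Because the check runs once per retrieved document, its cost grows with the number of matches, which is the overhead the later solutions try to reduce.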
![Page 17: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/17.jpg)
Multiple Solutions
![Page 18: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/18.jpg)
Access Control Models
The solutions depend on the access control model we choose.
Two important types of access control models:
1. Non-Discretionary Access Control (Role Based)
2. Discretionary Access Control (DAC)
![Page 19: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/19.jpg)
1. Non-Discretionary (Role Based)
Definition:
• Non-discretionary access control uses a centrally administered set of rules to determine how users and documents interact.
• It is called non-discretionary because access follows from the roles an administrator assigns to users; it is not left to the discretion of document owners.
[Diagram: roles (Sales, Manager, Super User) and document sets (Sales, Marketing, Engineering, Admin)]
![Page 20: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/20.jpg)
Solution For Role Based ACL - Type 1
For a system that has
• roles defined at design time and a static ACL set on each document,
• we choose “Early Binding with ACL bound to Document Objects”.
In such systems,
• Document objects include a multi-valued role-id field that lists the role-ids which have access to the document.
Index time: documents with ACLs
Document 1 role-Ids: [“1”, “2”, “3”]
Document 2 role-Ids: [“2”, “3”]
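As a rough illustration of binding the ACL to the document at index time, the Python sketch below attaches a multi-valued role field before the documents are serialized. The field name `role_ids`, the helper `with_role_acl`, and the document shape are assumptions, not taken from the talk.

```python
import json


def with_role_acl(doc, role_ids):
    """Return a copy of the document with a multi-valued role-id field attached."""
    enriched = dict(doc)
    enriched["role_ids"] = [str(r) for r in role_ids]
    return enriched


docs = [
    with_role_acl({"id": "doc1", "title": "Document 1"}, [1, 2, 3]),
    with_role_acl({"id": "doc2", "title": "Document 2"}, [2, 3]),
]
# JSON body of the kind a search engine's update endpoint typically accepts.
payload = json.dumps(docs, indent=2)
```

Since the role list travels with the document, any change to a document's roles means re-indexing that document, which is why this variant suits static ACLs.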
![Page 21: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/21.jpg)
Continued…
At the time of search,
• The user’s search query should be accompanied by a filter on the user’s role-id.
• Solr’s filter query feature and its caching give an efficient solution for this kind of ACL.
This approach is called the ‘Early Binding’ approach.
[Diagram: Query Request with User Role-Id → SolrJ Client → Query Response]
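One way to picture the search-time side is to keep the user's text in `q` and place the role restriction in Solr's `fq` parameter, which Solr caches separately from the main query. The sketch below only builds the request parameters; the field name `role_ids`, the helper name, and the assumption that role-ids are integers are all illustrative.

```python
from urllib.parse import urlencode


def build_query_params(user_query, role_ids, acl_field="role_ids"):
    """Keep the user's query in q and put the ACL restriction in fq."""
    if not role_ids:
        # A user without roles must never fall through to an unfiltered query.
        raise ValueError("user has no roles")
    # int() rejects anything that is not a plain number (assumes numeric role-ids).
    clause = " OR ".join(str(int(r)) for r in sorted(role_ids))
    return {"q": user_query, "fq": f"{acl_field}:({clause})"}


params = build_query_params("appraisal", {2, 1})
query_string = urlencode(params)  # ready to append to a select URL
```

Keeping the restriction out of `q` means two users with the same roles can share one cached filter, and the relevance score is not affected by the ACL clause.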
![Page 22: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/22.jpg)
Solution For Role Based ACL - Type 2
For systems that have
• roles which change often, the data is normalized by separating access control information into different tables.
• This approach is called ‘Early Binding with Externalized ACL’.
In such systems:
• Role-ids are not attached to the document object.
• Instead, they are stored in separate tables with a foreign key relation to the document.
• Pseudo joins are used at the time of search.
Document1 (D1)
Doc ID   Role-Ids
D1       1, 2, 3, N
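A hedged sketch of what a pseudo join could look like: Solr's join query parser accepts the form `{!join from=... to=... fromIndex=...}query`, so the ACL can live in its own index and be joined in as a filter. The core name `acl` and the fields `doc_id`, `role_id`, and `id` are hypothetical, and whether a cross-index join suits a given deployment is an assumption to verify. The second function imitates the join in memory.

```python
def build_acl_join_filter(role_ids, acl_core="acl"):
    """Build an fq value that joins ACL rows (doc_id, role_id) to documents by id."""
    clause = " OR ".join(str(int(r)) for r in sorted(role_ids))
    return "{!join from=doc_id to=id fromIndex=" + acl_core + "}role_id:(" + clause + ")"


def join_in_memory(matched_doc_ids, acl_rows, role_ids):
    """Imitate the join: keep matches that have at least one ACL row for the user's roles."""
    allowed = {doc_id for doc_id, role_id in acl_rows if role_id in role_ids}
    return [doc_id for doc_id in matched_doc_ids if doc_id in allowed]


# Externalized ACL table: (doc id, role id), one row per grant.
acl_rows = [("D1", 1), ("D1", 2), ("D1", 3), ("D2", 3)]
fq = build_acl_join_filter([2, 1])
visible = join_in_memory(["D1", "D2"], acl_rows, {1, 2})
```

With this layout a role change only touches the ACL rows, and the documents themselves do not need to be re-indexed.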
![Page 23: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/23.jpg)
2. Discretionary Access Control
Definition:
• Discretionary: the document owner has the authority to control access to the document.
• A system that enables the document owner to specify the set of users with access to a set of documents.
[Diagram: Owner specifies the users/groups who can access the Object]
![Page 24: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/24.jpg)
Solution for Discretionary ACL - Type 1
For a system that has
• frequent changes in the ACL, and
• an ACL defined for each user and document pair,
• we choose ‘Late Binding Approach with Externalized ACL’.
In such systems,
• The ACL is a 2D matrix with users along its rows and documents along its columns.
Users    Doc 1  Doc 2  Doc N
User A   1      1      1
User B   0      1      1
User M
Encoded values: 0 = no access, 1 = access. N = number of documents, M = number of users.
![Page 25: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/25.jpg)
Continued…
For implementation, the ACL matrix can be represented as an array of bits per user.
This compact representation improves search efficiency and reduces memory overhead.
Users   Doc 1  Doc 2  Doc N
UserA   1      1      1
UserB   0      1      1
Bit arrays: [1] 111, [2] 011
![Page 26: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/26.jpg)
Example
Consider:
• The search system holds at most 5 documents, with document ids {1, 2, 3, 4, 5}.
• There are at most 2 users, with ids {1, 2}.
• User 1 has access to documents {1, 2, 3}.
• User 2 has access to documents {1, 2, 3, 4, 5}.
• ACL matrix and array representation:
User  1  2  3  4  5
1     1  1  1  0  0
2     1  1  1  1  1
Bit arrays: [1] 11100, [2] 11111
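The worked example can be reproduced with a short Python sketch. It uses a string of `0`/`1` characters purely for readability; a real implementation would presumably pack the bits. The helper names `acl_bits` and `has_access` are made up for this illustration.

```python
def acl_bits(allowed_doc_ids, num_docs):
    """One character per document id 1..num_docs; the leftmost is document 1."""
    return "".join("1" if d in allowed_doc_ids else "0" for d in range(1, num_docs + 1))


NUM_DOCS = 5
# One bit array per user id, as in the example slide.
acl = {
    1: acl_bits({1, 2, 3}, NUM_DOCS),
    2: acl_bits({1, 2, 3, 4, 5}, NUM_DOCS),
}


def has_access(user_id, doc_id):
    """Constant-time lookup: index into the user's row by document id."""
    return acl[user_id][doc_id - 1] == "1"
```

Each user's row costs one bit per document, so the whole matrix for M users and N documents needs M × N bits.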
![Page 27: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/27.jpg)
Solr Implementation
Solution 1
• Solr has a PostFilter interface that can be implemented in a custom plugin.
• The filter supplies a collector with a method called ‘collect()’.
• collect() is called for each document matched by the user’s search query.
– For each document, get the document-id from the field cache and apply the ACL using the bit array.
• Code snippets: https://gist.github.com/rajanim/7197154
[Diagram: user ACL bit array 1 1 1 0 0]
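The per-document check described for collect() can be imitated in plain Python. This is a simulation of the logic only, not Solr's Java API: `AclCollector`, the `doc_id_lookup` dictionary standing in for the field cache, and the internal document numbers are all hypothetical.

```python
class AclCollector:
    """Mimics a collector that sees one matching document at a time."""

    def __init__(self, user_bits, doc_id_lookup):
        self.user_bits = user_bits          # e.g. [1, 1, 1, 0, 0] for documents 1..5
        self.doc_id_lookup = doc_id_lookup  # stand-in for the field cache
        self.collected = []                 # documents passed on to the result set

    def collect(self, internal_doc):
        """Forward the document only if the user's bit for its id is set."""
        doc_id = self.doc_id_lookup[internal_doc]
        if self.user_bits[doc_id - 1]:
            self.collected.append(internal_doc)


# Internal document number -> application document id (arbitrary mapping).
lookup = {0: 5, 1: 3, 2: 1, 3: 4, 4: 2}
collector = AclCollector([1, 1, 1, 0, 0], lookup)
for internal_doc in range(5):  # pretend every document matched the query
    collector.collect(internal_doc)
```

The ACL is consulted at query time rather than stored in the index, which is what makes this a late binding approach.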
![Page 28: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/28.jpg)
Other Implementation Solution
Solution 2
• Use BitSet utilities.
• Get the bitset of documents matched by the search query from the search engine.
• Get the user’s ACL bitset instance.
• Obtain the intersection of the two bitsets [intersect(bitset other)].
[Diagram: two bitsets, 1 1 1 0 0 and 1 1 1 0 0, intersected to give 1 1 1 0 0]
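The intersection step can be shown with Python integers used as bitsets, where bit `d - 1` stands for document id `d`. The helpers `to_bitset` and `from_bitset` are assumptions for this sketch and do not correspond to any Solr or Lucene class.

```python
def to_bitset(doc_ids):
    """Pack document ids into an integer: bit (d - 1) is set for document d."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << (d - 1)
    return bits


def from_bitset(bits):
    """Unpack an integer bitset back into a sorted list of document ids."""
    return [i + 1 for i in range(bits.bit_length()) if (bits >> i) & 1]


matched = to_bitset([2, 3, 4, 5])  # documents matched by the search query
user_acl = to_bitset([1, 2, 3])    # documents the user may view
visible = from_bitset(matched & user_acl)  # bitwise AND is the intersection
```

A single bitwise AND replaces the per-document loop, so the cost depends on the size of the bitset, not on how many documents matched.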
![Page 29: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/29.jpg)
Solution for Discretionary ACL - Type 2
• For discretionary ACL systems in which the ACL is static,
• we choose “Early Binding with ACL bound to Document Objects”.
In such systems,
• Document objects include a multi-valued user-id field that contains the list of user-ids with access to the document.
• The user-id field has to be indexed.
![Page 30: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/30.jpg)
Continued…
• This solution requires the ACL and document data to be de-normalized into a flat structure.
[Diagram, index time: Parse Document → Add List of Users Who Have Access]
[Diagram, search time: Query Request with User ID → SolrJ Client → Query Response]
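A small Python sketch of both halves, under stated assumptions: at index time the ACL rows are folded into each document as a multi-valued `user_ids` field, and at search time the user's id becomes a filter clause. The field name, the `(doc id, user id)` row layout, and both helper names are hypothetical.

```python
from collections import defaultdict


def denormalize(docs, acl_rows):
    """Index time: fold (doc id, user id) rows into a user_ids field on each document."""
    users_by_doc = defaultdict(list)
    for doc_id, user_id in acl_rows:
        users_by_doc[doc_id].append(user_id)
    return [dict(doc, user_ids=sorted(users_by_doc[doc["id"]])) for doc in docs]


def user_filter(user_id, field="user_ids"):
    """Search time: a quoted filter clause restricting results to this user."""
    escaped = user_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


acl_rows = [("doc1", "userA"), ("doc1", "userB"), ("doc2", "userB")]
flat = denormalize([{"id": "doc1"}, {"id": "doc2"}, {"id": "doc3"}], acl_rows)
fq = user_filter("userA")
```

A document with no ACL rows ends up with an empty `user_ids` list and therefore matches no user's filter, which is the safe default.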
![Page 31: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/31.jpg)
Summary
![Page 32: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/32.jpg)
Summary
• Discretionary ACL with the late binding solution is a complex model and requires extensive verification.
• Leverage Solr’s smart caching capability.
• Since an ACL always adds overhead, it has to be optimized to keep the added delay to a minimum.
![Page 34: A Novel methodology for handling Document Level Security in Search Based Applications](https://reader033.fdocuments.in/reader033/viewer/2022060108/55502f1cb4c9059f318b4cf7/html5/thumbnails/34.jpg)
Thank You