Post on 11-Nov-2014
HDAP:
A Breakthrough in Directory Technology Bringing Together LDAP, Context, and Big Data
• What Is HDAP?
• Why HDAP?
• Why even LDAP?
• Evaluating the models for structured data
• Hierarchical model and LDAP
• The requirements/drivers for more scalability
• Using Identity and Context Virtualization to build a Federated Identity Service (FID)
• Why FID is essential
• Powering a new use case: Contextual Search
• How HDAP works and how it performs
What We’ll Cover Today
What is HDAP?
• HDAP is a highly available version of LDAP that offers better performance and
increased scalability.
• Now, you may be thinking:
• LDAP is already very fast and scalable.
• And who needs LDAP anyway? Shouldn’t we do as Ian Glazer says, and
“kill IdM in order to save it”?
• But HDAP goes beyond LDAP, delivering much more and doing it all
much faster.
A Next-Gen LDAP Directory Driven by
Hadoop and Search Technology
7/15/2013
Why HDAP?
• Identity remains essential to IT because people are often the center
of activities.
• While there are multiple use cases, one of the key functions of
identity is to act as an integration point.
• As such, identity management is at the center of application
integration.
• We need a way to store identities and their attributes, but is LDAP
still relevant?
• Do we really need a hierarchical system, when the world is moving
toward these models?
• Path
• Graph
• Directed Graph
• Relational
To Bring New Life to the Heart of IT:
People and What They Do
Roadmap:
The Role of Identity and Context Virtualization
in the Technology Food Chain
Company Confidential
Are the Hierarchies of LDAP Still
Necessary?
• The Protocol
• The Schema
• The Storage: Hierarchy
• Searching and Navigation: Traversing the Tree
• Searching by Attributes
• Navigation: One level or sub-tree. There are not many ways to navigate
a tree:
• First, you enumerate the children.
• Then you repeat the process for each child node.
• So you either believe that a hierarchical system is sufficient, or you don’t.
• The storage
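The two navigation scopes described above can be sketched in a few lines. This is a minimal illustration, not any LDAP server's implementation: the `CHILDREN` map, the example DNs, and the function names `one_level` and `subtree` are all hypothetical.

```python
from collections import deque

# Hypothetical toy directory: each DN maps to the DNs of its immediate children.
CHILDREN = {
    "dc=example": ["ou=people,dc=example", "ou=groups,dc=example"],
    "ou=people,dc=example": [
        "uid=alice,ou=people,dc=example",
        "uid=bob,ou=people,dc=example",
    ],
    "ou=groups,dc=example": ["cn=admins,ou=groups,dc=example"],
}


def one_level(base):
    """One-level scope: enumerate only the immediate children of base."""
    return list(CHILDREN.get(base, []))


def subtree(base):
    """Sub-tree scope: base plus every descendant, found by repeating
    the one-level enumeration for each child node."""
    found, queue = [], deque([base])
    while queue:
        node = queue.popleft()
        found.append(node)
        queue.extend(one_level(node))
    return found


if __name__ == "__main__":
    print(one_level("dc=example"))
    print(subtree("ou=people,dc=example"))
```

The sub-tree search is just the one-level enumeration applied repeatedly, which is why a tree offers so few navigation choices.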
The World of Data
Structured (SQL)
Unstructured (Search)
Relational
Structured Data: The Three Models and
Their Respective Installed Bases
Network/Graph
Graph
Database
Hierarchical
Database
SQL
Database
• These three models are similar in terms of what you can represent
with them. But they are optimized for different functions.
• Relational (SQL) is the most ubiquitous for good reasons:
• The most complete model and extremely flexible
• ACID properties make it great for capturing and updating information,
and it’s optimized for non-redundant write
• But it’s also slow to navigate and perform ad-hoc query and search
• Graphs and hierarchies belong to the same family; after all, trees
are a special case of “DAGs,” or “directed acyclic graphs”:
• Slow for write and update (NO ACID properties in general)
• Fast in navigation and ad hoc query and search
The Three Models
Object/Entity, Attribute, Value/Keyword
Attribute 1 Attribute 2 Attribute 3
Keyword/Value Keyword/Value Keyword/Value
Attribute 4
Keyword/Value Keyword/Value Keyword/Value
Object, Relationship, Data Model
Object
Relationship
Network Data Model
Hierarchical Data Model
Relational Data Model (ERM, ORM, & UML)
Tables/Entities/Object & Relations
From Graph to Functions to E/R
From E/R to Semantic Model
Verb
Verb
Verb
Subject Object
How The Models Stack Up
                        Relational    Graph/Hierarchy
Write, Update           Faster        Slower
Query, Search,          Slower        Faster
Navigation/Traversal
SQL is the Workhorse for Modern
Data Management
Data Management
ETL
MDM/CDI
Data Warehouse
Analytics/BI
Search
Big Data
SQL
Integration
Unstructured Data
LDAP is Key to Identity Management
Identity Management
(ETL)
Sync engine
Provisioning
MDM
Metadirectory
Analytics/SIEM
Search
Big Data
(along with
Web Services
and SQL)
Integration
LDAP
Virtualization
Why Should Identity Management be
Separate from the Rest of the Chain?
Identity Management
ETL
MDM/CDI
Data Warehouse
Analytics/BI
Search
Big Data (SIEM)
Directory
Web Services
SQL
Integration
Identity and Context Virtualization Process
Foundation for an Identity Service:
Building a Global Virtual Identifier
and Global Virtual Registry
Solution:
Building a Global List with No Duplicates
Link Identity to Context, Regrouping Objects into
Sentences and Sentences into Contexts
Solution: Gather Attributes and Join Them
to Build a Virtualized Global Profile
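As a rough illustration of the two solutions named above (a global list with no duplicates, then a join of attributes into one profile), here is a minimal sketch. The source names, field names, and the choice of a lowercased e-mail address as the correlation key are all assumptions made for the example; real correlation rules are usually more involved.

```python
# Hypothetical identity sources; field names and the correlation key
# (a lowercased e-mail address) are assumptions for illustration only.
AD = [
    {"mail": "Alice@Example.com", "sAMAccountName": "alice", "department": "Sales"},
]
HR_DB = [
    {"email": "alice@example.com", "employee_id": 1001, "manager": "bob"},
    {"email": "carol@example.com", "employee_id": 1002, "manager": "bob"},
]


def build_global_profiles(ad_rows, hr_rows):
    """Union the sources into one list without duplicates, then join
    each source's attributes onto the matching global profile."""
    profiles = {}
    for row in ad_rows:
        key = row["mail"].lower()  # global identifier for this person
        profile = profiles.setdefault(key, {"id": key})
        profile.update(login=row["sAMAccountName"], department=row["department"])
    for row in hr_rows:
        key = row["email"].lower()
        profile = profiles.setdefault(key, {"id": key})
        profile.update(employee_id=row["employee_id"], manager=row["manager"])
    return profiles


if __name__ == "__main__":
    for profile in build_global_profiles(AD, HR_DB).values():
        print(profile)
```

Alice appears in both sources but ends up as a single entry carrying attributes from each; Carol exists only in the HR source and gets a profile with just those attributes.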
• A system made of two parts
• Integration layer based on virtualization
• Storage layer (Persistent Cache)
• LDAP (up to R1 V 6.1)
• HDAP (based on Hadoop/Lucene/Solr, V 7.0)
Integration and Cache/Storage Layer
Why We Need a Federated Identity
That’s Based on Virtualization and
Stored in HDAP Directories
The World of Access Keeps Expanding
App sourcing and hosting
• SaaS apps
• Apps in public clouds
• Partner apps
• Apps in private clouds
• On-premise enterprise apps
App access channels
• Enterprise computers
• Enterprise-issued devices
• Public computers
• Personal devices
User populations
• Employees
• Contractors
• Customers
• Partners
• Members
The Challenges of implementing an Enterprise IdP:
How to Handle Different Internal Security Domains?
Federation
Cloud Apps
IdP
Authentication and SSO
Enterprise Identity
Data Sources? ??
Implementation
A Federated Identity Hub Manages Authentication
and Attributes to Support the IdP
AD Forest/Domain A
AD Forest/Domain B
Databases
Internal
Enterprise
Apps
Directories
Federation
Cloud Apps
Identity
Sources
IdP
Federated Identity Service and Provisioning
Legacy Applications (and respective stores)
AD Sun LDAP
Cloud Apps
LDAP/
SQL/
SPML
FID as reference store
SPML
SCIM
Internal Systems
External Systems
Virtual View Based on Org Chart
Top Manager
Full
Management
Hierarchy
Virtual View Based on Location
Country
State
City
Virtual View Based on Role, Location,
and Territory
Role
Location
Territory
New Use Case: Contextual Search
Webster’s Definition of “Context”
Latin contextus: a joining together, originally the past participle of contexere, “to weave
together.”
1. The parts of a sentence, paragraph, discourse immediately next
to or surrounding a specified word or passage and determining
its exact meaning [to quote a remark out of context] (Language
Representation)
2. The whole situation, background, or environment relevant to a
particular event, personality, creation, etc. (Perception)
Trees as a Representation of Sentences
Trees as a Way to Represent Sentences
and Context
Searching for HDAP on Google
Diving into one sentence from the
contextual search result
Navigating the different sentences returned in the
context search:
Account the Great Outdoors purchased Order 21
Navigating sentences returned in the search:
SalesRep Nancy Davolio has account The Great
Outdoors
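One way to picture how sentences chain into a context is to treat each sentence as a (subject, verb, object) triple and follow objects that are the subjects of further sentences. The sketch below is only an illustration built from the two example sentences on these slides; the `SENTENCES` list and the `context_paths` function are hypothetical, and the data is assumed to contain no cycles.

```python
# (subject, verb, object) sentences taken from the two example slides.
SENTENCES = [
    ("SalesRep Nancy Davolio", "has account", "The Great Outdoors"),
    ("The Great Outdoors", "purchased", "Order 21"),
]


def context_paths(start, sentences):
    """Return every chain of sentences reachable from start, written as
    one readable path per chain. Assumes the sentences form no cycles."""
    paths = []

    def walk(node, path):
        following = [(verb, obj) for subj, verb, obj in sentences if subj == node]
        if not following:
            paths.append(" ".join(path))
            return
        for verb, obj in following:
            walk(obj, path + [verb, obj])

    walk(start, [start])
    return paths


if __name__ == "__main__":
    for path in context_paths("SalesRep Nancy Davolio", SENTENCES):
        print(path)
```

Each path reads like a branch of a tree rooted at the starting entity, which is the sense in which a tree can represent both sentences and their context.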
HDAP:
RadiantOne High-Availability LDAP Based on Lucene/ZooKeeper
(Sub-components of Hadoop)
• An LDAP directory is a hierarchical database with this architecture:
• A set of entries, indexed by a main index: the directory tree
• A set of indexes to support attribute search (one per attribute).
• For the last 10 years, the core technique has been to implement the tree as
a set of B-tree indexes. B-trees can scale to hundreds of millions of entries.
Current Implementation of LDAP Servers
is Based on B-Tree Indexation
Entries
B Tree
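The architecture described above (entries, a main tree index, and one index per attribute) can be sketched as follows. This is a toy model under stated assumptions: sorted Python lists searched with `bisect` stand in for B-trees, attribute values are assumed to be strings, and the class and method names are hypothetical rather than any server's API.

```python
import bisect


class TinyDirectory:
    """Toy directory: a sorted tree index plus one sorted index per attribute.
    Sorted lists with bisect stand in for B-trees (an assumption of this sketch)."""

    def __init__(self):
        self.entries = {}       # dn -> attribute dict
        self.tree_index = []    # sorted list of (reversed-RDN tuple, dn)
        self.attr_indexes = {}  # attribute name -> sorted list of (value, dn)

    @staticmethod
    def _tree_key(dn):
        # Reverse the RDN order so descendants sort right after their ancestor.
        return tuple(reversed([rdn.strip().lower() for rdn in dn.split(",")]))

    def add(self, dn, attrs):
        self.entries[dn] = attrs
        bisect.insort(self.tree_index, (self._tree_key(dn), dn))
        for name, value in attrs.items():
            bisect.insort(self.attr_indexes.setdefault(name, []), (value, dn))

    def subtree(self, base_dn):
        """Range scan of the tree index: base entry plus all descendants."""
        key = self._tree_key(base_dn)
        i = bisect.bisect_left(self.tree_index, (key,))
        result = []
        while i < len(self.tree_index) and self.tree_index[i][0][:len(key)] == key:
            result.append(self.tree_index[i][1])
            i += 1
        return result

    def search_eq(self, attr, value):
        """Equality search on one attribute index."""
        index = self.attr_indexes.get(attr, [])
        i = bisect.bisect_left(index, (value,))
        result = []
        while i < len(index) and index[i][0] == value:
            result.append(index[i][1])
            i += 1
        return result
```

Both lookups are ordered range scans, which is the access pattern a B-tree is designed for; an attribute that has no index of its own cannot be searched this way at all.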
From Lucene to Hadoop to ZooKeeper
• Hadoop is an offshoot of the Lucene/Nutch project, aimed at
creating an open source search engine.
• Lucene is the search and index part of the search engine.
• Hadoop is the distributed storage (HDFS) and compute
(batch-oriented MapReduce) engine, offering high
throughput on a large cluster of commodity servers.
• There are many components and sub-projects that came out of the
Hadoop project.
• ZooKeeper is a low-level component for managing configuration and
replication for a large number of nodes in a Hadoop cluster.
[Architecture diagram: distributed VDS + Lucene index on each node of a cluster of commodity servers, with ZooKeeper for VDS]
• Scale target: millions of entries, millions of users
• Node management: manage configuration and state per node
• Distributed configuration manager: add a node, define a new leader, swap in and swap out dynamically
• Scale out: add more VDS for faster queries and more documents
• Replication (leader/followers): add more replicas (followers) for better throughput (queries/sec) and fault tolerance
• Commits: soft commit (in memory, near real-time); hard commit (flushed to disk)
• Per node: one core per JVM, Java web app
• VDS core: LDAP processing (add/update/del), LDAP query processing and caching
• VDS config: schema and other XML files (<fields>, <types>)
• Front end: LDAP front-end components and functions (BER encoding, etc.) and other protocols (XML/JSON/HTTP)
• Traffic: indexing and queries, across leader and follower replicas (replica 1 to replica n)
• We are getting 60,000 LDAP q/sec before VDS, 30,000 q/sec after VDS
• HDAP (VDS + Lucene), 10M entries:
• 1 node: 30k/sec
• 2 nodes: 65k/sec
• 3 nodes: 95k/sec
• 4 nodes: 130k/sec
• 5 nodes: 149k/sec
• Google daily average load: 3 million q/minute or 50,000 q/sec
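To see how close to linear these figures are, the speedup and per-node efficiency can be computed directly from the numbers quoted above. The sketch below uses only the slide's throughput values; the names `THROUGHPUT` and `scaling_table` are illustrative.

```python
# Throughput figures (LDAP searches per second) as quoted on the slide.
THROUGHPUT = {1: 30_000, 2: 65_000, 3: 95_000, 4: 130_000, 5: 149_000}


def scaling_table(throughput):
    """Return (nodes, q/sec, speedup vs. 1 node, efficiency per node) rows."""
    base = throughput[1]
    rows = []
    for nodes in sorted(throughput):
        speedup = throughput[nodes] / base
        rows.append(
            (nodes, throughput[nodes], round(speedup, 2), round(speedup / nodes, 2))
        )
    return rows


if __name__ == "__main__":
    for nodes, qps, speedup, efficiency in scaling_table(THROUGHPUT):
        print(f"{nodes} node(s): {qps:>7,} q/sec  "
              f"speedup x{speedup:.2f}  efficiency {efficiency:.2f}")
    # The Google comparison on the slide: 3 million q/minute in q/sec.
    print(3_000_000 // 60)
```

On these numbers the per-node efficiency stays between roughly 0.99 and 1.08 through five nodes, which is consistent with near-linear scale-out; values slightly above 1.0 may simply reflect measurement variation in the initial tests.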
Initial Performance Tests (LDAP Search)
[Chart: LDAP search throughput in q/sec (axis 0 to 160,000) by number of nodes (1 to 5); two series, Series1 and Series2]
The Architecture of the
RadiantOne Federated Identity Service:
• Acting as an abstraction layer between applications and the underlying identity
silos, virtualization isolates applications from the complexity of backends.
[Architecture diagram labels]
• Applications: App A, App B, App C, App D, App E, App F
• Interfaces: LDAP, SQL, Web Services/SOA, REST, Services
• Virtualization by model: Contexts, Groups, Roles
• Layers: Aggregation, Correlation, Integration
• Sources: Population A, Population B, Population C
• Everything is automatically indexed in HDAP so you can search the
directory the same way you search Google…
• An inverted tree is not necessarily balanced; you could have some
paths that are very shallow, while some are very deep.
HDAP Uses a Key/Value System Based on
Search Technology: Inverted Tree
Inverted Tree
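The idea that everything is indexed automatically and can be searched by keyword can be illustrated with a plain inverted index, the structure Lucene-style search engines are built on. The sketch below is an assumption-laden toy, not HDAP's actual storage format: the class name, the whitespace tokenizer, and the `field:token` convention are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: every token of every attribute value maps to
    the set of entries (DNs) that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of DNs

    def add(self, dn, attrs):
        for name, value in attrs.items():
            for token in str(value).lower().split():
                self.postings[token].add(dn)                      # free-text term
                self.postings[f"{name.lower()}:{token}"].add(dn)  # fielded term

    def search(self, query):
        """Return the DNs that match every term in the query (logical AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            hits = self.postings.get(term, set())
            result = set(hits) if result is None else result & hits
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("uid=nancy,ou=people", {"cn": "Nancy Davolio", "title": "Sales Rep"})
    idx.add("uid=bob,ou=people", {"cn": "Bob Sales", "title": "Engineer"})
    print(idx.search("sales"))        # keyword search across all attributes
    print(idx.search("title:sales"))  # search restricted to one attribute
```

Because every attribute value is tokenized on the way in, no per-attribute index has to be declared in advance, which is the practical difference from the B-tree layout.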