Data: Application requirements, data flow, and person registry
Tom Barton
University of Chicago

CAMP Directory Workshop Feb 3-6, 2004

Copyright Tom Barton 2004. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

Outline

Three stages of managing identity information
1. Feeding the person registry - integrating identity from many authoritative sources
2. Processes & business logic at the person registry
3. Feeding consumers of identity information

Some examples sprinkled in
Selected policy & process issues (time permitting)

Core middleware for an integrated architecture

Potential sources of identity info

“Big” administrative systems: student systems, payroll/HR systems, academic records systems, financials, telecom mgmt system, alumni systems, library systems, …

“Small” sources: affiliated organizations with fairly simple administrative operations (excel?)

Collateral operational systems: application-specific directories/databases, NOS directories, campus card systems, other metadirectory/ID Mgmt operations

People’s heads: “ad hoc” affiliations, self, proxies

UofC sources: now

Student info & campus card system by live RDBMS views
Payroll & faculty by periodic batches
Dozen or so “small feeds” by aperiodic upload
Self
Trusted Agents to make temporary and “pre-feed” accounts
370 or so departmental directory reviewers
Network security group

UofC sources: planning or earnest discussion

Feed from UC Hospitals
Alumni system
Select distributed IT support staff (mail & password resets)
Potentially anyone to manage ad hoc groups

Feed mechanics

Source system selection criteria
  – Express the set of affiliation types or constituencies authoritatively represented in the source
  – Affiliation indicator attributes
Format & transmission technology
  – Complete selections vs. differentials vs. transactions
  – Automated vs. semi-manual (e.g., maildrop) vs. manual
  – scp flatfiles, live views, varieties of EAI (what are you using?)
  – Actual metadirectory products (what are you using?)
  – Ad hoc record structure, XML (what are you doing?)
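
The distinction between complete selections and differentials can be sketched in a few lines. This is only an illustration, assuming each complete selection is a mapping from a source record key to a record of attributes; the function name `diff_feeds` and the sample records are hypothetical and not taken from any system described in these slides.

```python
from typing import Dict, Tuple

Record = Dict[str, str]
Feed = Dict[str, Record]


def diff_feeds(previous: Feed, current: Feed) -> Tuple[Feed, Feed, Feed]:
    """Derive a differential (adds, changes, drops) from two complete selections."""
    adds = {k: v for k, v in current.items() if k not in previous}
    drops = {k: v for k, v in previous.items() if k not in current}
    changes = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return adds, changes, drops


# Hypothetical sample data: yesterday's and today's complete selections.
yesterday = {"1001": {"name": "Pat Lee", "affiliation": "staff"},
             "1002": {"name": "Sam Roy", "affiliation": "student"}}
today = {"1001": {"name": "Pat Lee", "affiliation": "faculty"},
         "1003": {"name": "Ana Diaz", "affiliation": "student"}}

adds, changes, drops = diff_feeds(yesterday, today)
```

A source that can only produce complete selections can still feed a registry that wants differentials, at the cost of keeping the previous selection around for comparison.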

Identity Matching

Matching strategies
  – Match personal IDs for each source record
  – Per-source shared identifier with prior matching
  – Broadly used institutional identifier with prior matching
The query “is this person new” is resolved somewhere, somehow.
  – Inaccurate answers spoil 1–1 relationship between registry objects and real world subjects
  – It’s worthwhile to think on how to improve it!
Insert “rational” ID Mgmt spiel here …
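
One way to picture how the “is this person new” query gets resolved is a lookup that tries each identifier presented with a source record, in the order given, against identifiers the registry already knows. The sketch below is a toy under stated assumptions: the `Registry` class, the identifier type names, and the in-memory index are all hypothetical, and real matching usually needs fuzzier comparisons and human review.

```python
from typing import Dict, List, Optional, Tuple

Identifier = Tuple[str, str]  # (identifier type, value), e.g. ("student_id", "S123")


class Registry:
    """Toy person registry indexed by (identifier type, value) pairs."""

    def __init__(self) -> None:
        self._index: Dict[Identifier, int] = {}
        self._next_id = 1

    def match(self, identifiers: List[Identifier]) -> Optional[int]:
        """Return the registry ID of the first identifier that is already known."""
        for key in identifiers:
            if key in self._index:
                return self._index[key]
        return None

    def match_or_create(self, identifiers: List[Identifier]) -> Tuple[int, bool]:
        """Answer "is this person new?" and remember every identifier presented."""
        person_id = self.match(identifiers)
        is_new = person_id is None
        if person_id is None:
            person_id = self._next_id
            self._next_id += 1
        for key in identifiers:
            # setdefault never re-points an identifier that already belongs to someone.
            self._index.setdefault(key, person_id)
        return person_id, is_new


registry = Registry()
first = registry.match_or_create([("institutional_id", "900000001"), ("student_id", "S123")])
second = registry.match_or_create([("student_id", "S123"), ("hr_id", "E77")])
third = registry.match_or_create([("hr_id", "E99")])
```

Here the second record is joined to the first person through the shared student ID, which also attaches the new HR identifier to that person for later matches.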

Identity matching at UofC: now

SSN
StudentID (after prior match by SSN)
“CorpID” (mangling of substrings of lastname, SSN)
Several options for identifying “self” as authoritative source

Identity matching at UofC: upcoming (dose of rationality)

UCID (SSN replacement) assigned as unique key in payroll & student systems at record creation time
Person registry is authoritative source of UCID
“Is this person new” is answered when a new record is to be created in payroll or student systems
Tightly-coupled and loosely-coupled designs are being considered
UC Hospitals feed might also use a similar design

Canonicalization

Provide simpler, consistent representation of certain data
  – Name
  – Phone number(s)
  – Address(es)
  – Department names
  – Names of “major” affiliations
Transformation rules and business logic
  – Which source trumps name
  – Phone & address mappings
  – Rules to determine expressed affiliations
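
As a rough illustration of what such transformation rules might look like, the sketch below canonicalizes North American phone numbers and picks a name by source precedence. Everything specific here is an assumption made for the example: the precedence order in `NAME_PRECEDENCE`, the default area code, and the output phone format are hypothetical, not UofC's actual rules.

```python
import re
from typing import Dict, List, Optional

# Hypothetical precedence: earlier sources trump later ones for the name.
NAME_PRECEDENCE: List[str] = ["hr", "student", "self"]


def canonical_phone(raw: str, default_area_code: str = "773") -> Optional[str]:
    """Map assorted phone spellings to one format, or None if unparseable."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop the leading country code
    if len(digits) == 7:
        digits = default_area_code + digits  # assume a local number
    if len(digits) != 10:
        return None                  # leave odd values for manual review
    return f"+1 {digits[0:3]} {digits[3:6]} {digits[6:]}"


def canonical_name(names_by_source: Dict[str, str]) -> Optional[str]:
    """Return the name from the highest-precedence source that supplies one."""
    for source in NAME_PRECEDENCE:
        name = names_by_source.get(source, "").strip()
        if name:
            return " ".join(name.split())  # collapse runs of whitespace
    return None
```

For example, `canonical_phone("(773) 702-1234")` and `canonical_phone("702-1234")` both yield `+1 773 702 1234`, so consumers downstream see a single representation.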

Fat or thin?

Fat = contains selected data from sources
Thin = contains only links to sources
Issues with thin:
  – Source system availability
  – Source system security (apps need creds)
  – App complexity (feed mechanics, identity matching, canonicalization rules)
  – Policy complexity (authorize N apps to access M sources)
Issues with fat:
  – Data freshness
  – Downstream from canonicalization (usually a pro, but can be a con)
Most campuses are fat!

Functional requirements for a registry entry

Private primary key
  – Never reassigned, never revoked
  – Not used for any other purpose
  – GUIDs are preferable to uniqueness within a database
Publicly visible key
  – Available for sources or consumers to use to refer to the person (better than, say, a username)
  – Probably numeric string <= 9 digits to ensure that it fits in most predefined fields
  – Reduces exposure in case of disaster with primary key
Crosswalk source and consumer specific identifiers
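
The three kinds of identifier described above might be modeled along these lines. This is a minimal sketch: the class names, the zero-padded sequential allocation scheme, and the crosswalk keys are hypothetical choices for illustration, and a random UUID stands in for the GUID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RegistryEntry:
    public_id: str  # publicly visible key, safe to hand to sources and consumers
    # Private primary key: a random GUID, generated once and never reassigned.
    private_key: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Crosswalk: system name -> that system's own identifier for the person.
    crosswalk: Dict[str, str] = field(default_factory=dict)


class PublicIdAllocator:
    """Hands out numeric strings of at most `width` digits, never reused."""

    def __init__(self, start: int = 1, width: int = 9) -> None:
        self._next = start
        self._width = width

    def allocate(self) -> str:
        if self._next >= 10 ** self._width:
            raise OverflowError("public ID space exhausted")
        value = str(self._next).zfill(self._width)
        self._next += 1
        return value


allocator = PublicIdAllocator()
entry = RegistryEntry(public_id=allocator.allocate())
entry.crosswalk["payroll"] = "E0042"   # hypothetical source-specific identifier
entry.crosswalk["ldap_uid"] = "plee"   # hypothetical consumer-specific identifier
```

Keeping the public ID separate from the private key means the public one can be replaced after an incident without disturbing the joins that depend on the primary key.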

Functional requirements for a registry entry

Personal information
  – Answer the “is this person new” query with sufficient accuracy
  – Support account claiming, initialization, or re-initialization
Storage for whatever’s authoritative in the person registry
  – Egs: support for provisioning, email, username(s)
Information obtained from source systems that is valuable to authorization or entitlement algorithms and policies
The entry and its principal identifiers and personal info (at least) are never deleted from the registry (except…)

Registry record structure at UofC

RDBMS (Sybase) with tables for:
  – Each major source system
  – One in which to collect all “small feeds”
  – Individuals, one row per person
  – Tracking usernames
  – Supporting service baskets and (de-)provisioning
  – Supporting the security model for registry operations
DB-local primary key (not a GUID), no PVID
Records for “temporary” affiliations are removed

Logging & reporting requirements

Audit
  – Who had which identifiers when
  – State changes (when using a stateful provisioning model)
  – Activity, to a degree
Diagnostic views/reports for selected helpdesk and operational staff
Refer requests for reports outside of the scope of IT operational needs to the data warehouse group!

Provisioning strategy

Provisioning = maintenance of electronic ephemera required to facilitate users’ access to services
Format & transmission technology
  – Incremental vs. differential vs. full consumer rebuilds
  – Periodic vs. asynchronous updates
  – Per-consumer or standard record formats
  – Transmission techniques (what do you do?)

Provisioning strategy

Service baskets
  – Business logic that determines which categories of people are entitled to participate in which services, with which service levels
  – One aspect of a more inclusive access control architecture
  – Egs: shell accounts & quotas, mailboxes, email forwarding, dialup profiles, vpn, wireless, computer registration, calendar, …
  – Issue of excessive granularization
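
A service basket can be thought of as a named bundle of services, with a second table saying which affiliations are entitled to which baskets. The sketch below assumes exactly that two-table shape; the basket names, their contents, and the affiliation-to-basket assignments are invented for illustration and service levels (quotas and the like) are left out.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical baskets: basket name -> services it contains.
BASKETS: Dict[str, FrozenSet[str]] = {
    "core": frozenset({"mailbox", "calendar", "wireless"}),
    "academic": frozenset({"shell_account", "vpn"}),
    "forward_only": frozenset({"email_forwarding"}),
}

# Hypothetical policy: major affiliation -> baskets it is entitled to.
AFFILIATION_BASKETS: Dict[str, FrozenSet[str]] = {
    "faculty": frozenset({"core", "academic"}),
    "staff": frozenset({"core"}),
    "enrolled student": frozenset({"core", "academic"}),
    "alum": frozenset({"forward_only"}),
}


def entitled_services(affiliations: Set[str]) -> Set[str]:
    """Union of the services in every basket granted by any held affiliation."""
    services: Set[str] = set()
    for affiliation in affiliations:
        for basket in AFFILIATION_BASKETS.get(affiliation, frozenset()):
            services |= BASKETS[basket]
    return services
```

Grouping services into a handful of baskets, rather than granting each service per affiliation, is one way to keep the policy table from becoming excessively granular.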

Stateful provisioning

Not shown: transitions to prospective state from grace, limbo, slide, IDonly.

Independent variables for state transitions

state
substate
date the present state was reached
date by which the present state might end (expiration date)
major affiliation (faculty, staff, enrolled student, accepted student, registered student, alum, …)
list of the identifiers of resources being managed for this account
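
The variables listed above map naturally onto a small record type. The sketch below only stores them and checks the expiration date; the field names are hypothetical, and the actual transition rules between states are not given in these slides, so none are implemented here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AccountState:
    state: str                    # e.g. "grace" (a state name mentioned in the slides)
    substate: str
    reached_on: date              # date the present state was reached
    expires_on: Optional[date]    # date by which the present state might end
    major_affiliation: str        # e.g. "staff", "enrolled student", "alum"
    resource_ids: List[str] = field(default_factory=list)


def is_due_for_transition(account: AccountState, today: date) -> bool:
    """True once the expiration date, if any, has been reached."""
    return account.expires_on is not None and today >= account.expires_on


# Hypothetical account sitting in a grace period.
account = AccountState(
    state="grace",
    substate="",
    reached_on=date(2004, 3, 31),
    expires_on=date(2004, 6, 30),
    major_affiliation="staff",
    resource_ids=["mailbox:plee", "shell:plee"],
)
```

A periodic job could call `is_due_for_transition` for each account and hand the ones that are due to whatever logic decides the next state.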

Fault avoidance & recovery

Bad source data arrives – what happens?
Flux high water marks
  – Hold update when # changes exceeds threshold
  – Possible in source side, more often seen in consumer provisioning techniques
“Semantical filters”
  – E.g. can absence from the HR feed mean anything other than they’re gone?
  – Construct source filters based on knowledge of business practices that relate to selection criteria on the source system.
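
A flux high water mark can be sketched as a guard that counts how many records an incoming update would add, change, or drop, and holds the update when that count exceeds a threshold. The function name, the 10% default threshold, and the flat key-to-value record shape are assumptions made for this example.

```python
from typing import Dict, Tuple


def apply_with_high_water_mark(
    current: Dict[str, str],
    incoming: Dict[str, str],
    max_change_fraction: float = 0.10,  # hypothetical threshold
) -> Tuple[Dict[str, str], bool]:
    """Return (resulting data, applied?). Holds the update if flux is too high."""
    added_or_changed = sum(1 for k in incoming if current.get(k) != incoming[k])
    dropped = sum(1 for k in current if k not in incoming)
    flux = added_or_changed + dropped
    limit = max_change_fraction * max(len(current), 1)
    if flux > limit:
        return current, False    # hold: keep serving the last good data
    return dict(incoming), True  # flux is plausible: accept the update


# Hypothetical data: 100 people, then a normal feed and a truncated one.
people = {str(i): "staff" for i in range(100)}
normal_feed = {k: v for k, v in people.items() if int(k) >= 5}  # 5 departures
truncated_feed: Dict[str, str] = {}                              # bad source data

after_normal, applied_normal = apply_with_high_water_mark(people, normal_feed)
after_bad, applied_bad = apply_with_high_water_mark(people, truncated_feed)
```

A held update would normally be queued for a person to review and release, since a large change is sometimes legitimate (for example, at the start of a term).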

Fault avoidance & recovery

Person registry change log
  – Enables rollback & replay of consumer updates
  – Good diagnostic info
  – Supports a “hit me with the new ones” incremental provisioning strategy
Stateful provisioning model can be constructed to ensure continuity of service & buy time to fix effects of bad source data
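
The three uses of a change log listed above (replay, rollback, and “hit me with the new ones”) can be illustrated with a sequence-numbered log. This is a sketch under simple assumptions: changes are per-attribute with old and new values, the consumer is a plain dictionary, and all names and sample addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    seq: int
    person_id: str
    attribute: str
    old: Optional[str]
    new: Optional[str]


class ChangeLog:
    def __init__(self) -> None:
        self._entries: List[Change] = []

    def record(self, person_id: str, attribute: str,
               old: Optional[str], new: Optional[str]) -> Change:
        change = Change(len(self._entries) + 1, person_id, attribute, old, new)
        self._entries.append(change)
        return change

    def since(self, last_seen_seq: int) -> List[Change]:
        """'Hit me with the new ones': everything after the consumer's last seq."""
        return [c for c in self._entries if c.seq > last_seen_seq]


Consumer = Dict[Tuple[str, str], str]


def replay(changes: List[Change], consumer: Consumer) -> None:
    """Apply changes in order, setting each attribute to its new value."""
    for c in changes:
        key = (c.person_id, c.attribute)
        if c.new is None:
            consumer.pop(key, None)
        else:
            consumer[key] = c.new


def rollback(changes: List[Change], consumer: Consumer) -> None:
    """Undo changes in reverse order, restoring each attribute's old value."""
    for c in reversed(changes):
        key = (c.person_id, c.attribute)
        if c.old is None:
            consumer.pop(key, None)
        else:
            consumer[key] = c.old


log = ChangeLog()
log.record("p1", "mail", None, "pat@example.edu")
log.record("p1", "mail", "pat@example.edu", "plee@example.edu")
log.record("p2", "mail", None, "sam@example.edu")

consumer: Consumer = {}
replay(log.since(0), consumer)
```

After the replay, calling `rollback(log.since(1), consumer)` would return the consumer to its state as of the first change, which is the kind of recovery a bad feed might call for.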

Expression of rules

Hard coded or abstracted rule syntax?
Rules for
  – Affiliation
  – State transitions
  – Inclusion in service baskets
  – Memberships in selected groups (“minor” affiliations, privilege classes)
Stanford, Memphis examples
  – Rules expressed in terms of registry object methods
  – External configuration file eval’d by the code that manages the registry
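
To make the idea of abstracted rules concrete, here is one possible shape: rules live in external configuration and name methods on a registry object. This sketch is not the Stanford or Memphis design. It deliberately substitutes JSON interpreted against an allow-list of methods for a file that is eval'd, and the `Person` class, its methods, and the rule names are all hypothetical.

```python
import json
from typing import Dict, Iterable, List


class Person:
    """Toy registry object exposing predicate methods that rules may call."""

    def __init__(self, affiliations: Iterable[str], departments: Iterable[str]) -> None:
        self.affiliations = set(affiliations)
        self.departments = set(departments)

    def has_affiliation(self, name: str) -> bool:
        return name in self.affiliations

    def in_department(self, name: str) -> bool:
        return name in self.departments


# Stand-in for an external configuration file. Each rule is a list of
# [method name, argument] alternatives; the rule holds if any alternative is true.
RULES_JSON = """
{
  "vpn_basket": [["has_affiliation", "faculty"], ["has_affiliation", "staff"]],
  "math_lab_group": [["in_department", "Mathematics"]]
}
"""

ALLOWED_METHODS = {"has_affiliation", "in_department"}


def evaluate(rules: Dict[str, List[List[str]]], person: Person) -> List[str]:
    """Return the sorted names of the rules that hold for this person."""
    granted = []
    for rule_name, alternatives in rules.items():
        for method_name, argument in alternatives:
            if method_name not in ALLOWED_METHODS:
                raise ValueError(f"rule {rule_name!r} uses unknown method {method_name!r}")
            if getattr(person, method_name)(argument):
                granted.append(rule_name)
                break
    return sorted(granted)


rules = json.loads(RULES_JSON)
memberships = evaluate(rules, Person(["staff"], ["Mathematics"]))
```

With rules kept outside the code, changing who qualifies for a basket or group becomes a configuration edit rather than a code release, at the price of having to validate the configuration.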

Common consumers

Minimum set of consumers & consumer technologies needed to meet application requirements!
  – Authentication, attributes, groups, coordinated identity management
Types
  – Generic LDAP (maybe >1 replication networks)
  – Active Directory (maybe >1 consuming domain)
  – Kerberos
  – eDirectory, NIS, Ph, RDBMS (show hands?, others?)
  – Applications as direct consumers
  – Affiliated identity management operations

UofC consumers

Consumers
  – OpenLDAP (1 replication network), Kerberos, Active Directory, NIS, Ph
      uid is RDN
      uid namespace issues: regular, temporary, hospital people
  – Above with periodic diffs, high water hold, async self & management updates
  – Peer ID Mgmt operations (periodic full)
Service baskets & statefulness being developed
  – Manual quarterly account closures suit UofC culture
  – Automated stateful approach to loss of services per-basket

Selected policy & process issues

How will the University operate its identity management infrastructure?
  – What balance between centralized and distributed operation?
      Registry – singular, centralized function
      Consumers – high degree of distribution possible
      Registration Authorities – small number??
  – Who may have which role with what authority & obligations?
  – Leverages & extends existing data administration policies & processes, or begs if those are insufficient
  – Highly cross-functional activity demanding organizational flexibility

Selected policy & process issues

What entitlements should attend each type of affiliation?
  – “Major” affiliations: student, faculty, alum, …
      Possibly former or recent student, faculty, …?
  – “Minor” affiliations: <role> in course 123, <role> in department X, <role> in degree program Y, occupant of building Z, …
  – What processes should determine entitlements for each affiliation?
How should affiliations be structured?

Selected policy & process issues

Who should be issued a credential? What assurance level should authentication for each constituency achieve? What constraints may pertain to each?
  – Applicants (student, faculty, staff)
  – Admitted students, accepted faculty or staff
  – Alums
  – Parents
  – Library patrons
  – Guests: visiting academics, conference attendees, hotel guests, arbitrary “friends”, …