KnowIT, semantic informatics knowledge base

knowIT: Mapping out Informatics systems

Laurent Alquier

Keith McCormick

Ed Jaeger

About

• Laurent Alquier
• Software engineer, Project lead
• Johnson & Johnson Pharmaceutical Research & Development, L.L.C.
• [email protected]

Could you answer these questions?

• Can you give us a list of all of your applications, related servers and stakeholders, and send us an update every six months?

• All Linux servers need to be patched this weekend. Can you send an outage announcement with a list of affected applications by tomorrow?

• Is this server still in use? Can we retire it?

• What is the meaning of DRU?

(Based on real questions)

Systems knowledge

knowIT in a nutshell

• A collaborative database
  – Semantic wiki
• Capture knowledge about informatics systems
  – Information Systems components
    • Applications, Servers, Data sources, plugins
  – Map relationships between components
  – Capture Business context around them
    • Organizations, Companies, Locations
  – Document known issues, procedures, processes

Goals

• Answer recurring questions
  – Subject Matter Experts lists for Application Support
  – Application / License rationalization
  – Outage communications
• Increase knowledge retention
  – Many ways to contribute
• Facilitate “Transfer In / Transfer Out”
  – Capture knowledge from experts before they leave
  – Facilitate learning for new resources
• Enable self service
  – Many ways to search and explore

Pragmatic approach

• Bottom-up knowledge management in a corporate R&D environment
• Search is not enough
  – Complementary to a document library with search index
  – Capture details about individual components of systems
  – Rely on queries as much as search
• Change will happen
  – Plan for future integration and migration from the start
  – Import content from several sources
  – Export content to several formats
• “Know your content, respect your users.” (E. Tufte)
  – Accept incomplete content
  – Evolve the data model as necessary
  – Let real data and use cases drive requirements
• Above all, remain flexible

Evolution

• Started as disconnected files
• Turned into a relational database
  – Rigid design
  – Lack of collaboration tools
• Solution: Collaborative Database using a Semantic Wiki
  – Collaborative features and flexibility of wiki
  – Structure from Semantic annotations

Collaborative database

• Flexible yet structured content management
  – Collaborative data model
  – Discussions, comments, community editing
• Knowledge management tools
  – Redirections, wanted pages
  – Automated maintenance tasks
    • Background jobs to enforce consistency and updates
  – Monitoring tools, change tracking
• Modular and extensible design
  – Templates
  – Open source components

Semantic MediaWiki

• Based on MediaWiki
  – Proven platform (Wikipedia)
  – Redirect, wanted pages, templates, API, bots
  – Active development, commercial support
  – No licensing fee (PHP, MySQL)
• Structure from Semantic annotations
  – Inline annotations
  – Supports forms and direct annotations
  – Map complex relationships between objects
  – Allow both Search and Queries
  – Multiple input / output formats
  – Compatible with Semantic Web integration

• Semantic Web in a bottle

http://semantic-mediawiki.org/wiki/Semantic_MediaWiki

Semantic Annotations

• Tags with meaning
• Syntax
  – Triple: Page -> Property -> Value
    • [[Has support contact::Help Desk]]
• Data types
  – Page, URL, Date, String, Text, Number, Geo-location
  – Custom units for Number
• Browse properties
  – Summary of all properties for a page
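
The triple syntax on this slide can be illustrated with a small parser. This is a minimal sketch, not Semantic MediaWiki's own parser: it assumes annotations of the simple form [[Property::Value]] and ignores the other variants SMW supports. The function name extract_triples and the page title "Example Application" are hypothetical.

```python
import re

# Matches simple inline annotations of the form [[Property::Value]].
# Assumption: an optional "|label" display text is dropped; other SMW
# annotation variants are not handled.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title, wikitext):
    """Return (page, property, value) triples found in the wikitext."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


text = "Contact the [[Has support contact::Help Desk]] for access."
triples = extract_triples("Example Application", text)
print(triples)
```

Plain links such as [[Category:Server]] are not matched, because the pattern requires the double colon that separates a property from its value.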

Relationships

• Defined as links to other pages
  – Enhanced with semantic properties
• Tracking lists of things is not enough
  – Knowledge comes from understanding relationships
• SMW assisted Ontology design
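
To make the point about relationships concrete, here is a hedged sketch of how stored triples could answer the outage question from the opening slide ("which applications are affected if all Linux servers are patched?"). It works on an in-memory list and is not how SMW evaluates queries. All page names and the property names "Has operating system" and "Runs on" are assumptions made for illustration.

```python
# Hypothetical triples, in the Page -> Property -> Value form used by the wiki.
# All page and property names here are illustrative, not taken from knowIT.
TRIPLES = [
    ("Server A", "Has operating system", "Linux"),
    ("Server B", "Has operating system", "Windows"),
    ("App One", "Runs on", "Server A"),
    ("App Two", "Runs on", "Server B"),
    ("App Three", "Runs on", "Server A"),
]


def subjects(triples, prop, value):
    """Pages whose property `prop` has the given value."""
    return {s for s, p, v in triples if p == prop and v == value}


def affected_applications(triples, operating_system):
    """Applications running on any server with the given operating system."""
    servers = subjects(triples, "Has operating system", operating_system)
    return sorted(s for s, p, v in triples if p == "Runs on" and v in servers)


print(affected_applications(TRIPLES, "Linux"))
```

A flat list of servers or of applications could not answer this; the answer comes from following the "Runs on" relationship between the two.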

Wiki? What wiki?

• Focus on content, not technology
• Occasional users less intimidated when wiki tools are not visible
  – But keep wiki tools available to advanced users
• Use forms to standardize data capture
  – Make semantic annotations invisible using forms and templates
  – Enforce (some) naming conventions
    • Auto-completion
    • Automated page names
• Be ready to provide help with difficult tasks
  – Provide guidance and training
  – Front loading wiki with data users care about

Content Migration

• From relational tables to Categories and Pages
  – Review data model, drop unnecessary attributes
  – Create forms, templates, properties in Semantic MediaWiki
  – One category per page
    • Separate ‘semantic categories’ from ‘supporting categories’
• Extract old content into tabular form
• Review, clean up, correct
  – Unique titles (Disambiguation)
  – Special characters in titles
• Load pages in bulk
  – Using PHP API (bulkinsert.php)
• Consider specialized import forms if content needs detailed review
  – Example: Support articles
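
The clean-up and page-generation steps above can be sketched as follows. This is an illustration only, written in Python rather than the PHP script named on the slide, and it stops before the actual upload. It assumes a CSV extract with a "Name" column, a wiki template named "Application", and a numeric suffix as a crude disambiguation rule; all three are hypothetical.

```python
import csv
import io
import re

# Characters MediaWiki does not allow in page titles.
FORBIDDEN = re.compile(r"[#<>\[\]|{}]")


def clean_title(raw):
    """Replace forbidden characters with spaces and collapse whitespace."""
    return " ".join(FORBIDDEN.sub(" ", raw).split())


def rows_to_pages(csv_text, template, category):
    """Map each row to (title, wikitext); duplicate titles get a numeric suffix."""
    pages, seen = [], {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = clean_title(row.pop("Name"))
        seen[title] = seen.get(title, 0) + 1
        if seen[title] > 1:
            title = f"{title} ({seen[title]})"  # crude disambiguation
        # One template call per page; the template holds the annotations.
        fields = "".join(f"|{key}={value}\n" for key, value in row.items())
        body = "{{" + template + "\n" + fields + "}}\n[[Category:" + category + "]]"
        pages.append((title, body))
    return pages


sample = "Name,Owner\nPortal [v2],Team A\nPortal [v2],Team B\n"
for title, body in rows_to_pages(sample, "Application", "Application"):
    print(title)
    print(body)
```

Keeping the annotations inside the template, and emitting exactly one category per page, mirrors the conventions listed on this slide.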

Queries

• Visualize structure of content
  – Ad-hoc reports
  – Interactive queries (Exhibit)
  – Automate system configuration pages
  – Architectural layers
    • Business, Functional, Process, Data, Applications, Physical
  – Network diagrams
• Concepts
  – Saved queries, dynamic categories
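
For readers who have not seen an inline query, the sketch below composes an #ask query string of the kind that can drive an ad-hoc report. The #ask syntax itself (conditions in double brackets, printouts prefixed with "?", a format parameter) is Semantic MediaWiki's; the helper build_ask_query and the category and property names are assumptions made for illustration.

```python
def build_ask_query(conditions, printouts, output_format="table"):
    """Compose an inline #ask query from conditions and printout properties."""
    selection = "".join(f"[[{condition}]]" for condition in conditions)
    parts = [selection]
    parts += [f"?{prop}" for prop in printouts]
    parts.append(f"format={output_format}")
    return "{{#ask: " + "\n |".join(parts) + "\n}}"


# Hypothetical report: Linux servers with their support contact.
query = build_ask_query(
    ["Category:Server", "Has operating system::Linux"],
    ["Has support contact"],
)
print(query)
```

Pasting the resulting text into a wiki page would render the report each time the page is viewed, which is what makes queries a complement to search.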

Enhanced Search

• Default search replaced by Sphinx Search extension

• Faceted search
  – Drill down by properties
  – Search results grouped by Category
• Semantic search
  – Semantic summary instead of excerpt
• Customized by Category
  – Annotations used to improve results
    • Aliases, keywords
    • Related terms
    • Selection of default category
• Feedback option
  – Ask a question

Input flexibility - Data capture

• Import
  – Manually using Forms
  – Remote CSV files, databases, LDAP
  – FOAF format to retrieve and provide vocabularies
  – OWL DL ontologies can be imported
    • Explicit statements only; no support for reasoning
• Query remote sources
  – Linked data import
    • SMW+ can enrich page annotations with queries across multiple sources
    • Supports OpenCalais, DBpedia, RSS feeds

Output flexibility - Data integration

• Export
  – HTML, PDF, CSV, XML, Email, Maps (Yahoo, Google, OpenLayers), Timeline (Simile), Google graphs, vCard, iCalendar
• Machine readable
  – Default RSS feed replaced by #ask query for recent content
  – RDF view for each page
  – RDFa, CSV index, FOAF files, Web Services (SMW+)
  – Ontology and content export
    • RDF dumps / SPARQL endpoint available
• Follows Linked Data principles
  – One page per entity
  – One HTTP URI for each entity
  – RDF information available from each page
  – RDF statements are browsable
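
The Linked Data principles above (one HTTP URI per entity, RDF statements for each page) can be illustrated by serializing triples as N-Triples. This is a sketch under stated assumptions and does not reproduce SMW's actual RDF export: the base URIs under wiki.example.org are hypothetical, and which properties point to other pages is passed in by hand.

```python
from urllib.parse import quote

BASE = "http://wiki.example.org/id/"        # hypothetical base URI for entities
PROP = "http://wiki.example.org/property/"  # hypothetical base URI for properties


def uri(base, name):
    """One HTTP URI per name, using underscores for spaces as wiki URLs do."""
    return "<" + base + quote(name.replace(" ", "_")) + ">"


def to_ntriples(triples, page_values=()):
    """Serialize (page, property, value) triples as N-Triples lines.

    Values of properties listed in page_values are emitted as URIs (links to
    other entities); all other values are emitted as plain string literals.
    """
    lines = []
    for subject, prop, value in triples:
        if prop in page_values:
            obj = uri(BASE, value)
        else:
            escaped = (
                value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            obj = '"' + escaped + '"'
        lines.append(f"{uri(BASE, subject)} {uri(PROP, prop)} {obj} .")
    return "\n".join(lines)


rdf = to_ntriples(
    [
        ("App One", "Runs on", "Server A"),
        ("App One", "Has support contact", "Help Desk"),
    ],
    page_values={"Runs on"},
)
print(rdf)
```

Because "Runs on" resolves to another entity's URI rather than a string, a consumer can follow it to the server's own statements, which is what makes the statements browsable.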

Familiar look and feel

• Consistent with other intranet sites, familiar interface
  – Integration with MS SharePoint look and feel using RILPoint theme
  – Login using global directory

Make basic tasks explicit

• Search, Explore, Contribute
  – On main page and on side bar

Consistent navigation for every page

• ‘Table of Contents’ links
  – Browse content
    • Using Semantic Drilldown
  – Categories
    • Using Nice Categories List for recursive tree view
  – Topic
    • #ask query for pages with Topic defined as a property
  – A-Z index / Glossary
    • Using a mix of Table of Contents template, #urlget and #ask queries
• Single link to add New content
  – With list of forms available

Reduce clutter

Advanced tasks moved to the bottom of pages
• Maintenance tasks
• Upload file
• Page tools
• RDF link
• Browse properties

UI Simplification – Special Pages

Custom made administrative tasks page

UI Simplification – Recent changes

Simplified Recent changes using Dynamic Page List extension

UI Customization – Category:Location

Customization of categories according to page type

• Maps for locations
• Timelines for events
• A-Z index for people

UI Customization – Category:Events

Status - Usage

• After a year
  – 2900 pages of content (4600 pages total)
  – 31 registered users (5 active contributors)
  – Between 15 and 75 updates a day
  – 130 unique visitors a month
  – 400 visits / 600 searches a month

• Entering phase of growing interest

Status - Content

• Data imported from old system except for Articles and Persons
• Built an ontology of IT systems components
• 550+ Applications, 90+ Databases and 280+ Servers portfolio
  – Mostly RED systems at this point
• 145 data sources
  – Semi-automated generation of Data landscape
• A Glossary of 950+ acronyms and definitions
  – Imported from multiple sources within J&J and outside
• About 170 support articles, how-to and FAQs
  – Another 400 old articles pending review
• 340+ Organizations
  – Including 44 J&J Operating Companies
• Google Maps of J&J PRD sites

Features

• KnowIT currently includes:
  – An IT systems portfolio management (inventory)
  – A Configuration management tool for these systems (components and relationships)
  – A Communication component (calendar / timeline of announcements, outages and training sessions)
  – A Question / feedback list (similar to WikiAnswers)
  – A Logging mechanism (to track events, outages)
  – A Service Account Password expiration management (with notification by RSS and email)
  – Semantic / faceted search results
  – Dynamic maps of known locations (with built-in form for driving directions)
  – A Self service help system (knowledge base of solutions)
  – An Advanced glossary (terms organized by domains, with synonyms, related terms, etc.)
• Future directions
  – Advanced bulk manipulations
  – Dynamic visualizations of relationships network
  – Automated annotations using internal and external sources
  – Improved Semantic search

Observations from day to day use

• SMW is structured yet flexible
  – Allows for exceptions, changes as well as standardization
• SMW doesn’t get in the way
  – New content can be added, edited very quickly
• Remember to monitor response time of page edits, search
  – Use PHP cache, optimization strategies to keep wiki as fast as possible
• Keep a single structure of ‘semantic categories’
  – Separate from other categories
  – Use semantic properties for complex categorizations of pages
• Keep realistic expectations
  – A long way to go before shared ownership and fully documented systems

Acknowledgements

• We would like to thank current and past contributors for their patience, ideas and support:
  – Jim Gainor
  – Brian Wegner
  – Deborah Yates
  – David Epstein
  – John Baum
  – Lisa Valetta
  – Dimitris Agrafiotis
  – Mario Dolbec
  – Brian Johnson
  – Emmanouil Skoufos

Resources

• Semantic MediaWiki
  – http://semantic-mediawiki.org
• Referata tips for SMW
  – http://smw.referata.com/wiki/Special:BrowseData/Tips
• Wiki Patterns
  – http://www.wikipatterns.com/display/wikipatterns/Wikipatterns
• Sphinx search extension
  – http://www.mediawiki.org/wiki/Extension:SphinxSearch
• RILPoint: SharePoint theme for MediaWiki
  – http://www.rilnet.com/en/rilpoint-sharepoint-look-alike-drupal-and-mediawiki-skin
• Gruff: Triple store browser for AllegroGraph (Relationships graph)
  – http://www.franz.com/agraph/gruff/
• Cytoscape: Network graph
  – http://www.cytoscape.org/
