
© 2005 EMC Corporation. All rights reserved.

Content Management: The Puzzle, The Challenge, and The Opportunity

Shu-Shang Sam Wei, Ph.D.

Software Architect

EMC Documentum Content Management Offerings

2 Enterprise Content Management

Google as an example


Yahoo! as another example

Splits: 02-Sep-97 [3:2], 03-Aug-98 [2:1], 08-Feb-99 [2:1], 14-Feb-00 [2:1], 12-May-04 [2:1]


Baidu as another example


What Does it Tell Us

• There is a strong desire/demand to search on the Web

• We are in an Information Explosion Age

Number of emails (SPAM excluded) sent every day in North America tripled to 11.9 billion since 1999 (Wall Street Journal, 8/26/2004)

Google is doing 2 billion searches a month

Yahoo! generates 10 terabytes of data a day (The Library of Congress)

eBay hosts 1.4 billion auctions, with 16 million active at any moment

• The Internet has made search significantly easier and more efficient

Scott McNealy (CEO, Sun Microsystems) joked: “Google has become one of the most important tools IT has ever deployed on the corporate system”


What Does it Tell Us (Cont.)

• Information exists in many different forms (and places): email/IM, video, audio, databases, blogs, Web pages, etc.

• Unstructured data (content based) is becoming more important than structured data (number based)

70 ~ 90% of corporate data is unstructured

• Unstructured data poses greater management challenges

• Enterprise content management (ECM) is not confined to organizing data; it also involves exploiting business know-how

  • to avoid critical failures,

  • to operate more efficiently, and

  • to become more productive and profitable


The Puzzle of ECM

• Search

• Knowledge Management

• Document Management

• Lifecycle Management

• Web Content Management

• Collaboration

• Portals

• Digital Asset Management

• Email Management

• ….

• The list is still growing


Search

• More than half of professionals spend more than 2 hours/day searching for information for their jobs

• Software created in the late 1970s and early 1980s could search millions of documents, primarily for education, medical research, and large legal cases

• In the late 1980s, search extended to the Web, and the Internet became a popular place for sharing information

• A search tool can be confusing if it returns too many pages to choose from

• Basic search features: full-text search, Boolean expressions, wildcards, proximity, parametric search, thesauri, synonyms, relevance ordering
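
A few of the basic features listed above (full-text search, Boolean expressions, wildcards, proximity) can be sketched with a small inverted index. This is a minimal illustration, not how any particular product works; the sample documents and all function names are hypothetical.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical sample documents, used only for illustration.
DOCS = {
    1: "Enterprise content management organizes unstructured data",
    2: "Search engines index web content",
    3: "Records management keeps an audit trail of content",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Full-text inverted index: term -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_term(index, term):
    """Ids of documents containing a single term."""
    return set(index.get(term.lower(), {}))

def search_and(index, *terms):
    """Boolean AND: documents containing every term."""
    sets = [search_term(index, t) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Boolean OR: documents containing at least one term."""
    return set().union(*(search_term(index, t) for t in terms))

def search_wildcard(index, pattern):
    """Wildcard search: match a shell-style pattern against indexed terms."""
    hits = set()
    for term in index:
        if fnmatch.fnmatchcase(term, pattern.lower()):
            hits |= set(index[term])
    return hits

def search_near(index, a, b, max_gap):
    """Proximity search: documents where a and b are within max_gap words."""
    hits = set()
    for doc_id in search_and(index, a, b):
        pos_a = index[a.lower()][doc_id]
        pos_b = index[b.lower()][doc_id]
        if any(abs(x - y) <= max_gap for x in pos_a for y in pos_b):
            hits.add(doc_id)
    return hits

INDEX = build_index(DOCS)
```

For example, `search_and(INDEX, "content", "management")` matches documents 1 and 3, while `search_near(INDEX, "content", "management", 1)` keeps only document 1, where the two words are adjacent.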


Search (cont)

• Advanced search features

Adjustable ranking

Hyperlink ranking (Google’s engine)

Hit highlighting

Auto summary

User behavior learning

Natural language queries

Dynamic clustering of results

Concept mining and extraction

Federated search

Auto classification based on taxonomies

Taxonomy navigation
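
Hyperlink ranking can be illustrated with a simplified PageRank-style power iteration, in which a page's score depends on the scores of the pages linking to it. This is a textbook sketch, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict: page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked from three pages, "d" from none.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
```

In this toy graph the page with the most inbound links ("c") ends up ranked highest and the page nobody links to ("d") lowest.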


Knowledge Management

• Poorly managed knowledge costs Fortune 500 about $12 billion/year (IDC, Business 2.0, February 2002)

• Knowledge is applying information to resolve a problem

Information must be organized and filtered

A layer of intelligence gathers information about information

Knowledge is context aware

Authoritative, hierarchical taxonomies and thesauri greatly improve info access for decision making and innovation

• Knowledge management is about the application of knowledge

• An effective KM system should reduce the impact on established routines and extend existing enterprise applications


Knowledge Management (cont)

• A knowledge management system provides a community of practice for people to share their knowledge

• The cycle of knowledge management

Find/Create

Share

Organize

Reuse


Document Management

• Emerged in the 1980s to help the airline, pharmaceutical and financial industries handle the paper-based processes that drive their business

• Helps comply with stringent government regulations (FDA for pharmaceuticals, FAA for airlines)

• Document capturing/imaging, dissemination, and annotation

• Version control

• Compound document

• Document renditions

• COLD (Computer Output to Laser Disk) and Archiving

• Security and permissions control

• Audit trails

• Library services


Lifecycle Management

• Content carries different meaning over time

• Typical cycle

Creation

Processing

Retention and archiving

Disposition

• Active processing

Redaction, review and markup

Electronic (password based) and digital (PKI + encryption) signing

Classification and taxonomies

Compound document assembly

Publication
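
The typical cycle above (creation, processing, retention and archiving, disposition) can be modeled as a small state machine. This is a minimal sketch; the strictly linear ordering and the class and method names are assumptions made for illustration.

```python
from enum import Enum

class Stage(Enum):
    CREATION = "creation"
    PROCESSING = "processing"
    RETENTION = "retention and archiving"
    DISPOSITION = "disposition"

# Allowed forward transitions; a strictly linear order is an assumption.
NEXT_STAGE = {
    Stage.CREATION: Stage.PROCESSING,
    Stage.PROCESSING: Stage.RETENTION,
    Stage.RETENTION: Stage.DISPOSITION,
}

class ContentItem:
    """A piece of content that moves through the lifecycle one stage at a time."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.CREATION
        self.history = [Stage.CREATION]

    def advance(self):
        """Move to the next stage, or fail if the item is already disposed of."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.name} has already reached disposition")
        self.stage = NEXT_STAGE[self.stage]
        self.history.append(self.stage)
        return self.stage
```

Keeping the transition table separate from the item class makes it easy to adjust the cycle, for example to add a review stage, without touching the rest of the code.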


Lifecycle Management (cont)

• Retention, archiving and disposition

Storage management

  • Migrating inactive content to low-cost systems

Archiving

  • In an indexed and accessible manner, or

  • Secured and easily restored upon request

Records management

  • Based on the U.S. DoD 5015.2 certification standard

  • E-mail included

  • Manage retention policies

  • Create “holds” on content

  • Keep an audit trail of all actions
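
The interplay of retention policies, holds and audit trails can be sketched as follows. This is an illustrative model only and does not implement DoD 5015.2; the `Record` class, its fields and the day-based retention period are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Record:
    """A managed record with a retention period, an optional hold, and an audit trail."""
    name: str
    created: date
    retention_days: int
    on_hold: bool = False
    audit_trail: list = field(default_factory=list)

    def log(self, action, when):
        # Every action is appended to the audit trail with its date.
        self.audit_trail.append((when.isoformat(), action))

    def place_hold(self, when):
        self.on_hold = True
        self.log("hold placed", when)

    def release_hold(self, when):
        self.on_hold = False
        self.log("hold released", when)

    def can_dispose(self, today):
        """Disposal is allowed only after retention expires and with no hold."""
        expired = today >= self.created + timedelta(days=self.retention_days)
        allowed = expired and not self.on_hold
        self.log(f"disposition check: {'allowed' if allowed else 'denied'}", today)
        return allowed
```

A hold overrides an expired retention period, so a record under hold stays in place until the hold is released, and each check leaves an entry in the trail.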


Web Content Management

• The Internet has become an important place for business

• Information posted on the Web needs to be current up to the minute

• Automation is essential due to the complexity

• Web content: static or dynamic, structured or unstructured

• Web content editing

• Use templates and style sheets to separate content from layout

• Support distributed team-based collaboration

• Internationalization support
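
Separating content from layout with templates can be shown with Python's standard `string.Template`. This is a minimal sketch: the layout strings, the `render` helper and the sample article are hypothetical, and a real Web content management system would use a much richer template and style sheet mechanism.

```python
import html
from string import Template

# Layout lives in the template; it knows nothing about specific articles.
PAGE_LAYOUT = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)

# A second layout for the same content, e.g. a plain-text digest.
DIGEST_LAYOUT = Template("$title: $body")

def render(content, layout=PAGE_LAYOUT, escape=True):
    """Fill a layout with content fields, HTML-escaping the values by default."""
    values = {k: html.escape(v) if escape else v for k, v in content.items()}
    return layout.substitute(values)

# Content is plain data; editors can change it without touching the layout.
ARTICLE = {"title": "Quarterly Update", "body": "Revenue grew this quarter."}
HTML_PAGE = render(ARTICLE)
DIGEST = render(ARTICLE, layout=DIGEST_LAYOUT, escape=False)
```

The same `ARTICLE` data is rendered twice with different layouts, which is the point of the separation: a redesign changes only the template.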


Collaboration

• Link processes and people to create a combined work environment where ideas and knowledge are shared to accomplish a project

• Tools used

E-mail/IM

Application sharing

Web conferencing (meeting, whiteboard, poll, chat)

Intranets/extranets

Groupware (eRoom)

Repositories

• Future tools will seamlessly connect content, people and processes between back and front office


Portals

• Provide a single point of access to corporate information through a Web browser

• Portlets (widgets, gadgets) are connector programs to present info from another application or information source

• Allow personalization

• Support customizable search, navigation and access to contents

• Hosting services

ASPs (application service providers) rent out the software and charge by use

Backup and maintenance done by ASP


Digital Asset Management

• Rich media is defined as images, audio, video and other visually oriented unstructured content (like animation and presentations)

• Managing rich media becomes crucial due to broadband support and technology enhancement

• It’s a challenge moving large digital media files

• Need to consider the rights and licensing permissions

• Meta-data is extensively used for managing the content

• Online education is a good example


Email Management

• Email has become a pervasive communication tool in corporations

• An employee receives around 70 emails a day on average

• Messaging technology includes fax, voice, IM and virtual meeting services

• The messaging system is the largest content repository

• It can store terabytes of data, which is a challenge to manage

Support audit trail

Integrated with Records Management

Provide legal compliance


Business Process Management

• A shorter business process cycle can reduce operational cost, increase profits and meet customer demands

• BPM describes how people interact with technology (added to automate processes), information, and each other to get jobs done

• BPM enables organizations to leverage and extend their existing technologies to support the processes driving the success of business

• Workflow is the combination of tasks that define a process

• Bringing Web-based open standards (XML, SOAP, WSDL) into process management allows a new standard of application integration and sharing of the real-time information that drives daily operations

• Organizations can use BPM to build processes that adapt to new market conditions

• BPM allows processes to be modeled, refined and modified as needed
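
The idea of a workflow as a combination of tasks, and of a process that can be modified as needed, can be sketched as an ordered list of task functions applied to a case. This is a toy model under stated assumptions: the `Workflow` class, the loan-style task names and the credit score threshold of 650 are all hypothetical.

```python
class Workflow:
    """A process modeled as an ordered list of tasks applied to a case."""

    def __init__(self, name, tasks=None):
        self.name = name
        self.tasks = list(tasks or [])

    def insert_task(self, index, task):
        """Refine the process by adding a task at a given position."""
        self.tasks.insert(index, task)

    def run(self, case):
        """Run every task in order; return the final case and the task names run."""
        completed = []
        for task in self.tasks:
            case = task(case)
            completed.append(task.__name__)
        return case, completed

# Hypothetical tasks; each takes a case dict and returns an updated copy.
def receive_application(case):
    return {**case, "received": True}

def check_credit(case):
    # The threshold of 650 is an arbitrary value for illustration.
    return {**case, "credit_ok": case["credit_score"] >= 650}

def verify_identity(case):
    return {**case, "identity_verified": True}

def approve_or_reject(case):
    return {**case, "approved": case.get("credit_ok", False)}

LOAN_PROCESS = Workflow(
    "loan approval", [receive_application, check_credit, approve_or_reject]
)
```

Because the process is data rather than hard-coded control flow, adapting it to a new requirement is a single call such as `LOAN_PROCESS.insert_task(1, verify_identity)`.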


How They Work Together

[Figure: quadrant diagram. Horizontal axis: People to People, People to Information. Vertical axis: Structured, Unstructured. Components shown: Workflow, BPM, Projects, Imaging, DAM, Document Management, Archive, Web Content Management, Records Management, Portals, Classifications, Knowledge Management, Web Conferencing, Groupware, IM, Email, Search]


Collaboration and Content

• Link processes and individuals across the enterprise

• Create a work environment where teams can share and circulate ideas, experience and knowledge

• All the information created as a by-product of collaborative work is securely captured, managed, and transformed into invaluable corporate knowledge

• These knowledge assets are preserved in a repository as content to be shared and reused throughout an organization

• Collaboration and content are interconnected by process


The Role of Collaboration

[Figure: quadrant diagram. Horizontal axis: People to People, People to Information. Vertical axis: Structured, Unstructured. Components shown: Collaboration, Imaging, DAM, Document Management, Archive, Web Content Management, Records Management, Portals, Classifications, Knowledge Management, Search]


Collaboration, Content and Process

[Figure: quadrant diagram. Horizontal axis: People to People, People to Information. Vertical axis: Structured, Unstructured. Components shown: Collaboration, Content, Process]


ECM Services Architecture

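[Figure: layered architecture diagram. Layer labels: Users, Solutions, Service-Oriented Architecture, Repositories. User labels: Exec, Sales, Research, Production, Admin Services. Repository labels: ECM, ERP, Email, Storage Device. Other labels: Client, Server, Collaboration, Content, Web Content, ERP, Email, Mobile, Desktop, Portal, Intranet, Embedded, Dedicated, Web]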


A Loan Management Example


The Challenge

• Additional Enterprise Requirements

Close to constant response time regardless of the amount of information

  • Ingestion rate: 25M files per day

  • Classification with content analysis: 0.25M files per day

  • Classification without content analysis: 2.5M files per day

System must be available 99.999% of the time

  • Less than 5.256 minutes of downtime in a regular year

  • Automatic crash/disaster recovery

Real-time information, even for decision support systems

Allow easy customization

Easy administration

Provide a unified client interface
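
The availability and throughput figures above follow from simple arithmetic. The sketch below assumes a 365-day year and a load spread evenly over 24 hours; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a non-leap year
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

def max_downtime_minutes(availability, minutes=MINUTES_PER_YEAR):
    """Yearly downtime budget implied by an availability fraction."""
    return (1.0 - availability) * minutes

def files_per_second(files_per_day):
    """Sustained rate needed to handle a daily volume, assuming an even load."""
    return files_per_day / SECONDS_PER_DAY

DOWNTIME = max_downtime_minutes(0.99999)            # "five nines"
INGEST_RATE = files_per_second(25_000_000)          # ingestion
CLASSIFY_RATE = files_per_second(250_000)           # with content analysis
CLASSIFY_FAST_RATE = files_per_second(2_500_000)    # without content analysis
```

Under these assumptions, 99.999% availability leaves about 5.256 minutes of downtime per year, and 25M files per day is roughly 289 files per second sustained.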


Response from Software Vendors

• Database and Content Management Companies

Data Partition

Real Application Clusters (Oracle)

Cache Fusion (Oracle)

Grid Computing (Oracle)

Pluggable Components

Self-tuning/healing

Data warehouse

  • Traditional offline database doesn’t work well

  • Materialized views, in-memory database, bitmap indexes, bitmap join indexes, clustering, multi-table inserts

Online Backup and Recovery

Distributed databases and (hot) replication


Response from Software Vendors (cont)

• Full-text Companies

Collections Partition

Better indexing mechanism for meta-data and content

Better taxonomy support

• Language Support

Object-Oriented Programming (C++, Java, C#)

Agile/Aspect Programming

Dynamic Class Loaders

Service Oriented Architecture


Response from Hardware Vendors

• AMD, Intel and Apple

Dual processor

64-bit PC

Dual-core (Athlon 64 X2, Pentium D, PowerPC G5)

Quad-core (Opteron 2006, Power Mac G5 Quad)

• Sun offers an 8-core chip, the UltraSPARC T1, at the end of 2005

Each core runs up to four instruction threads

Addresses the energy consumption issue by using only 70 watts

Cheaper and faster than IBM mainframe


The Opportunity

[Chart: values by year, 2002 to 2009; vertical axis from 0 to 2,000, labeled “In Thousands”]

By 2009, worldwide new ECM software license revenue will reach $2.0B, up from $1.2B in 2004, a 10.6% CAGR
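
The growth claim can be checked with the standard compound annual growth rate formula. The sketch below assumes five compounding years (2004 to 2009) and uses the rounded revenue figures quoted above; the function names are illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1.0 / years) - 1.0

def project(start, rate, years):
    """Value after compounding a starting amount at a fixed yearly rate."""
    return start * (1.0 + rate) ** years

YEARS = 2009 - 2004                          # five compounding periods (assumption)
IMPLIED_RATE = cagr(1.2, 2.0, YEARS)         # rate implied by $1.2B -> $2.0B
PROJECTED_2009 = project(1.2, 0.106, YEARS)  # $1.2B compounded at 10.6%
```

With these inputs the endpoint figures imply a growth rate of roughly 10.8% per year, while compounding $1.2B at 10.6% for five years gives about $1.99B; the small gap is consistent with rounding in the quoted numbers.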


Big Players Attracted to the Market

• Oracle

10g advertising completely aimed at EMC/Documentum

Large developer community established

  • Could turn into RSI strategy

Focusing on search


Yet Another Big Player

• Microsoft

Strategically, still thinking about mindshare for applications

• Office 12 aimed at EMC, but will lack infrastructure support services (ala CSS)

• Integrated interface and server offerings will mean increased ubiquity of deployment (land grab)

Still missing the ILM aspects, however

Microsoft Stakes Out the Middle

By Carolyn A. April, VARBusiness, Tue. Sep. 27, 2005

From the October 03, 2005 VARBusiness

…Microsoft has been methodically crafting its answer to midmarket IT challenges. Its approach? To create single products that combine the company's ERP and CRM applications, Microsoft Office – as a front-end interface and interface and server offerings into integrated, out-of-the-box solutions.


Competition from Open Source

• Somewhere, someone is developing an open source CMS

• Analysts are telling VCs and customers to:

“… demand that even proprietary vendors have strategies to compete with open source”

• Documentum should have a field response to open source

Options:

  • Prepare a standard response for sales reps

  • Acquire a standalone CMS and open it up; sell service/support

  • Migrate parts of Content Server to open source


Where Is EMC Positioned?

• Acquired Documentum in 2003

The leader in ECM

• Q3 revenue was $2.37 billion ($1 billion in software)

Up 17% from a year ago

9th consecutive quarter of double-digit growth

12th quarter in a row of meeting or exceeding its own targets

Net income was up

  • 93% year over year, including a tax-related benefit

  • 45% excluding that benefit

The best performance of any IT company in the world.


Gartner 2005 Report on ECM

American Cherokee Strip Land Run, September 16, 1893


The Trend of Computing

Users/Clients

Storage Devices

Networks

Servers

Databases


The Trend in Storage Devices

• Storage Area Network (SAN)

A high-speed, special-purpose network that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users

Supports

  • Disk mirroring, backup and restore, archival and retrieval of archived data

  • Data migration from one storage device to another

  • Sharing of data among different servers in a network


The Trend in Storage Devices (cont)

• Network Attached Storage (NAS)

Hard disk storage that is set up with its own network address

Not attached to the department computer that is serving applications to a network's workstation users.

By removing storage access and its management from the department server, both application programming and files can be served faster because they are not competing for the same processor resources.


Research on NAS and SAN

• Active Storage

Provide a mechanism for service migration

  • Focus on limited applications such as image processing, data mining and other database-related tasks

Exploit the processing power in storage devices

  • Acharya et al. proposed a stream-based programming model (1998)

  • Xiaonan et al. proposed a Multi-View Storage System (MVSS) with a flexible interface (2001 ~ 2003)

  • Evan et al. proposed a parallel file system (2005)

  • Sivathanu et al. introduced an RPC-based framework (2002)

  • Amiri et al. dynamically partition applications and change function placement within a cluster according to load characteristics (2000)

• Object-based Storage

Object-based Storage Device (OSD) T10 protocol

Makes use of an intelligent object interface


Conclusion

• Lots of opportunities remain for academia and industry

Better Algorithms

  • Performance

  • Scalability

  • Reliability

  • Automatic Failover

Better Programming Models

Better Problem Modeling Mechanism

Parallelism Needs Finer Granularity

• Changes are a must for survival and success

• Big players have a better chance to win