The Information Viewpointdonatas/PSArchitekturaProjektavimas/slides...The Information Viewpoint...

The Information Viewpoint

View Relationships

The Information Viewpoint

Definition: Describes the way that the system stores, manipulates, manages, and distributes information.

The architect should do data modeling only at an architecturally significant level of detail. You need to focus on those aspects of the data model where getting it wrong would affect the system as a whole rather than just a part of it.

Your task is to develop a summary view of static information structure and dynamic information flow, with the objective of answering the architecturally significant questions around consistency, ownership, latency, relationships and identifiers, and so forth.


Concerns

1. Information Structure and Content
2. Information Purpose and Usage
3. Information Consistency
4. Information Storage Models
5. Information Ownership
6. Enterprise-Owned Information
7. Identifiers and Mappings
8. Volatility of Information Semantics
9. Information Flow
10. Information Quality
11. Timeliness, Latency, and Age
12. Archiving and Information Retention


1. Information Structure and Content: Static Models

Static information structure models analyze the static structure of the information: the important data elements and the relationships among them.

Entity-relationship modeling is an established technique of data analysis, though Chen’s notation is no longer popular.

Class models perform a role similar to that of entity-relationship models but for the object-oriented world.
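As an illustration of the class-model idea, here is a minimal sketch using Python dataclasses. The entities (Customer, Order, OrderLine) and their attributes are hypothetical and do not come from the slides; they only show how data elements and one-to-many relationships might be written down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderLine:
    product_name: str
    quantity: int
    unit_price_cents: int

    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents


@dataclass
class Order:
    order_id: int
    # One order has many order lines.
    lines: List[OrderLine] = field(default_factory=list)

    def total_cents(self) -> int:
        return sum(line.total_cents() for line in self.lines)


@dataclass
class Customer:
    customer_id: int
    name: str
    # One customer has many orders.
    orders: List[Order] = field(default_factory=list)


customer = Customer(customer_id=1, name="Alice")
order = Order(order_id=100, lines=[OrderLine("widget", 2, 250)])
customer.orders.append(order)
```

In an entity-relationship model the same relationships would typically appear as foreign keys between tables rather than as object references.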


en.wikipedia.org/wiki/Entity-relationship_model

ER model example with Crow's Foot (IE) Notation


2. Information Purpose and Usage

The different information usage patterns often have significantly different information ownership rules and may require significantly different architectural solutions (or at least structural decompositions).

OLTP: The transaction store manages the information required to support day-to-day operational business processes. This information is highly volatile, and the system needs to be able to process a large number of concurrent read and write operations with short latency and high reliability.

Reporting: A long-running or complex reporting query can disrupt access to the main DB by operational users, leading to increased response times and lower throughput. For this reason, some systems implement a separate reporting database.

Data warehouse (OLAP): Manages historical information with fast querying abilities. The data warehouse holds a record of all activity going back many years and can be used to retrieve specific historical information or to analyze trends over time.

Data marts: The data warehouse may in turn feed into more specialized data marts, which manage information from a specific domain or time period.

Reference data (aka static, master, or lookup data, or classifiers): The information on people, places, and things that categorizes or classifies the system’s transactional information. Reference data may not be owned by your system, which can be a significant architectural challenge.


en.wikipedia.org/wiki/Online_analytical_processing


ETL – Extract, Transform, Load
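A minimal sketch of the three ETL steps, assuming a hypothetical operational `orders` table as the source and a hypothetical `customer_totals` reporting table as the target. Both are in-memory SQLite databases here; a real pipeline would normally use a dedicated ETL tool and separate database servers.

```python
import sqlite3

# Operational (OLTP) source; the schema and rows are illustrative only.
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER);
    INSERT INTO orders VALUES (1, ' alice ', 1200), (2, 'BOB', 800), (3, 'Alice', 300);
    """
)

# Reporting target; also a hypothetical schema.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total_cents INTEGER)"
)


def extract(conn):
    """Extract: read the raw rows from the operational store."""
    return conn.execute("SELECT customer, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: normalize customer names and aggregate amounts per customer."""
    totals = {}
    for customer, amount_cents in rows:
        name = customer.strip().lower()
        totals[name] = totals.get(name, 0) + amount_cents
    return sorted(totals.items())


def load(conn, totals):
    """Load: write the summarized rows into the reporting store."""
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals)


load(target, transform(extract(source)))
```

Keeping the three steps as separate functions mirrors the name of the technique and lets each step be tested on its own.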

3. Information Consistency

Information consistency means that information held in different parts of the system, or in different but related data items, should be compatible, congruent, and not in conflict.

This may be as simple as a referential integrity constraint or may be more subtle and complex; for example, a summary financial position should always match the underlying data used to calculate it.

Most businesses have sophisticated rules for information consistency.
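The two kinds of rule mentioned above can be sketched with SQLite from the Python standard library. The `account` and `entry` tables and the helper `summary_matches_detail` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE entry ("
    " id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL REFERENCES account(id),"
    " amount_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account (id, name) VALUES (1, 'cash')")
conn.executemany(
    "INSERT INTO entry (account_id, amount_cents) VALUES (?, ?)",
    [(1, 1000), (1, -250)],
)

# Simple rule (referential integrity): an entry for a missing account is rejected.
try:
    conn.execute("INSERT INTO entry (account_id, amount_cents) VALUES (99, 500)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True


def summary_matches_detail(conn, account_id, reported_balance_cents):
    """Subtler rule: a reported summary must equal the sum of its underlying entries."""
    (actual,) = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM entry WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return actual == reported_balance_cents
```

The first rule can be delegated to the database; the second usually has to be checked or maintained by application logic, which is why it tends to be architecturally more interesting.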


3. Information Consistency: Achieving consistency

Transactions and distributed transactions: XA transactions, two-phase commit protocol.

Compensating transactions: each data update is committed individually, and if a later update fails, each committed update is reversed by a transaction with an equal and opposite effect to the original one.

Eventual consistency: distributed applications favor high availability over consistency (remember the CAP theorem) and are designed to cope with data that is out of sync for a period of time. Such a system guarantees that after an update, all instances of the same data will eventually be updated to this value, without guaranteeing how long this will take.

Eventual consistency is used for infrastructure software such as DNS (the Internet’s Domain Name System) and for some Internet-scale applications such as global search engines, e-commerce sites, and social networking sites.

The model is sometimes referred to as following BASE principles: Basically Available, Soft state, Eventual consistency.
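The compensating-transaction approach can be sketched as follows. This is a simplified in-process illustration, not a distributed implementation: the function `run_with_compensation` and the account operations are hypothetical, and each "committed update" is just a change to a dictionary.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that reverses it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run each action; if one fails, undo the already committed ones in reverse order."""
    compensations: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True


balances = {"A": 100, "B": 0}


def debit_a() -> None:
    balances["A"] -= 30


def credit_a() -> None:
    # Equal and opposite to debit_a.
    balances["A"] += 30


def credit_b() -> None:
    # Simulated failure of the second update.
    raise RuntimeError("service B unavailable")


def debit_b() -> None:
    balances["B"] -= 30


ok = run_with_compensation([(debit_a, credit_a), (credit_b, debit_b)])
```

Unlike a two-phase commit, the intermediate state (A debited, B not yet credited) is visible to other readers until the compensation runs, so the rest of the system must tolerate it.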


http://en.wikipedia.org/wiki/X/Open_XA

http://en.wikipedia.org/wiki/Eventual_consistency

See Slides about CAP Theorem


4. Information Storage

Relational databases

Dimensional databases

NoSQL databases

File-based stores

Others: XML databases, object-oriented databases, hierarchical databases, network databases, graph databases, etc.


4. Information Storage: Relational databases

A typical relational database contains a largely third-normal-form schema and is usually used as some form of transactional or operational data store.

Features: SQL, ACID (Atomic, Consistent, Isolated, and Durable), OLTP

The limitations of a relational database tend to be:

the difficulty of scaling them to very large problems and

the complexity of the schema and queries that often results when implementing a large enterprise application.
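To make the "Atomic" part of ACID concrete, here is a minimal sketch using SQLite from the Python standard library. The `account` table, the `transfer` function, and the non-negative balance rule are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 1000), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount_cents):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
                (amount_cents, dst),
            )
            conn.execute(
                "UPDATE account SET balance_cents = balance_cents - ? WHERE id = ?",
                (amount_cents, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance_cents FROM account ORDER BY id"))
```

A transfer that would overdraw the source account fails on the second update, and the rollback also removes the credit already applied to the destination, so no partial result is ever visible after the transaction ends.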


4. Information Storage: Dimensional databases

Dimensional databases use specialized column-based or dimensional stores.

A dimensional store is based around a multidimensional (or “star”) schema model, with large “fact” tables containing the primary data in the database, linked to small “dimension” tables that contain classification data that can be used to group and summarize the fact data.

Dimensional databases are particularly well suited for complicated reporting problems, and so this storage model is often used for reporting databases rather than transactional databases.

Query language: MDX (created by Microsoft, now a de facto standard).

The major limitation of a dimensional model is the relative difficulty of updating information after it has been added to the database.
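The star schema idea can be sketched with ordinary SQL tables. This example uses SQLite and plain SQL rather than a dimensional engine or MDX, and the table names (`fact_sales`, `dim_product`, `dim_date`) and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Small dimension tables hold the classification data.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER NOT NULL);

    -- The large fact table holds the primary data and points at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
        amount_cents INTEGER NOT NULL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
    INSERT INTO fact_sales VALUES (1, 10, 500), (1, 11, 700), (2, 11, 300), (2, 11, 200);
    """
)

# Group and summarize the facts by attributes taken from the dimensions.
sales_by_category_and_year = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount_cents)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
```

Every reporting query follows the same shape: join the fact table to the dimensions of interest, then group by dimension attributes.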


http://en.wikipedia.org/wiki/MOLAP

http://en.wikipedia.org/wiki/Star_schema

http://en.wikipedia.org/wiki/Multidimensional_Expressions

4. Information Storage: NoSQL databases

NoSQL databases are relatively recent, but they have proved their usefulness in many very large-scale Internet services for e-commerce, Internet search, and social networking.

There are many data storage technologies that classify themselves as “NoSQL” products, and each one has its own unique characteristics, strengths, and weaknesses.

What is common among the NoSQL products is the fundamental tradeoff they have made, which is to abandon the traditional RDBMS characteristics of strict tabular data storage, SQL-query-based data access, and in some cases ACID transaction semantics, in order to achieve very high scalability and performance.

Most of these databases are accessed via a simple “map”-based interface that allows records to be stored and retrieved by key, sometimes also offering simple query facilities based on the attributes of the records being retrieved.
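The "map"-based interface described above might look like the following sketch. `KeyValueStore` is a hypothetical in-memory stand-in written for this example; it is not the API of any real NoSQL product, and real stores add distribution, persistence, and replication behind a similar surface.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """In-memory stand-in for a map-style store (illustrative only)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        """Store a record under a key, replacing any previous value."""
        self._records[key] = dict(record)  # keep a copy, not the caller's dict

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        """Retrieve a record by key, or None if the key is unknown."""
        record = self._records.get(key)
        return dict(record) if record is not None else None

    def find(self, **attributes: Any) -> List[str]:
        """Simple query: keys of records whose attributes all equal the given values."""
        return sorted(
            key
            for key, record in self._records.items()
            if all(record.get(name) == value for name, value in attributes.items())
        )


store = KeyValueStore()
store.put("user:1", {"name": "Alice", "country": "LT"})
store.put("user:2", {"name": "Bob", "country": "DE"})
store.put("user:3", {"name": "Carol", "country": "LT"})
```

Note what is missing compared with a relational database: there is no schema, no joins, and no transaction spanning several keys.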


4. Information Storage: NoSQL

http://blog.nahurst.com/visual-guide-to-nosql-systems

4. Information Storage: NoSQL


Data Management Patterns

CRUD

CQRS

Event Sourcing


The real picture is bigger


Data Management Body of Knowledge

http://www.dama.org/i4a/pages/Index.cfm?pageID=3548

Homework

Book "Software Systems Architecture", Chapter 18: The Information Viewpoint

