The Information Viewpoint
View Relationships
Definition: describes the way that the system stores, manipulates, manages, and distributes information.
The architect should do data modeling only at an architecturally significant level of detail.
Focus on those aspects of the data model where getting it wrong would affect the system as a whole rather than just a part of it.
Your task is to develop a summary view of the static information structure and the dynamic information flow, with the objective of answering the architecturally significant questions around consistency, ownership, latency, relationships, identifiers, and so forth.
Concerns
1. Information Structure and Content
2. Information Purpose and Usage
3. Information Consistency
4. Information Storage Models
5. Information Ownership
6. Enterprise-Owned Information
7. Identifiers and Mappings
8. Volatility of Information Semantics
9. Information Flow
10. Information Quality
11. Timeliness, Latency, and Age
12. Archiving and Information Retention
1. Information Structure and Content: Static Models
Static information structure models analyze the static structure of the information: the important data elements and the relationships among them.
Entity-relationship modeling is an established technique of data analysis, though Chen’s notation is no longer popular.
Class models perform a role similar to that of entity-relationship models, but for the object-oriented world.
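As a minimal sketch of a class model, the following Python dataclasses capture a few data elements and the relationships among them. The entities (Product, OrderLine, Order) and their attributes are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    sku: str            # identifier of the product (hypothetical attribute)
    name: str
    unit_price: float


@dataclass
class OrderLine:
    product: Product    # many order lines may refer to one product
    quantity: int

    def total(self) -> float:
        return self.product.unit_price * self.quantity


@dataclass
class Order:
    order_id: int
    customer_name: str
    # One order has many order lines (a one-to-many relationship).
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.total() for line in self.lines)


widget = Product(sku="W-1", name="Widget", unit_price=2.5)
order = Order(order_id=1, customer_name="Alice")
order.lines.append(OrderLine(product=widget, quantity=4))
print(order.total())
```

In an entity-relationship model the same structure would appear as three entities with a one-to-many relationship from Order to OrderLine and a many-to-one relationship from OrderLine to Product.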
ER model example with Crow's Foot (IE) Notation
2. Information Purpose and Usage
Different information usage patterns often have significantly different information ownership rules and may require significantly different architectural solutions (or at least structural decompositions).
OLTP: the transaction store manages the information required to support day-to-day operational business processes. This information is highly volatile, and the system needs to be able to process a large number of concurrent read and write operations with short latency and high reliability.
Reporting: a long-running or complex reporting query can disrupt access to the main DB by operational users, leading to increased response times and lower throughput. For this reason, some systems implement a separate reporting database.
Data warehouse (OLAP): manages historical information with fast querying abilities. The data warehouse holds a record of all activity going back many years and can be used to retrieve specific historical information or to analyze trends over time.
Data marts: the data warehouse may in turn feed into more specialized data marts, which manage information from a specific domain or time period.
Reference data (also known as static, master, or lookup data, or classifiers): the information on people, places, and things that categorizes or classifies the system’s transactional information. Reference data may not be owned by your system, which can be a significant architectural challenge.
ETL – Extract, Transform, Load
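The three ETL steps can be sketched roughly as follows, using two in-memory SQLite databases to stand in for an operational store and a reporting database. The table names (orders, daily_sales), the columns, and the sample rows are assumptions made for this illustration only.

```python
import sqlite3
from collections import defaultdict

# Extract: read raw rows from the (hypothetical) operational store.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount_cents INTEGER)"
)
oltp.executemany(
    "INSERT INTO orders (day, amount_cents) VALUES (?, ?)",
    [("2024-01-01", 1000), ("2024-01-01", 2500), ("2024-01-02", 400)],
)
rows = oltp.execute("SELECT day, amount_cents FROM orders").fetchall()

# Transform: aggregate individual orders into one total per day.
daily = defaultdict(int)
for day, amount_cents in rows:
    daily[day] += amount_cents

# Load: write the summarized rows into the (hypothetical) reporting database.
reporting = sqlite3.connect(":memory:")
reporting.execute(
    "CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total_cents INTEGER)"
)
reporting.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)", sorted(daily.items())
)
reporting.commit()

result = reporting.execute(
    "SELECT day, total_cents FROM daily_sales ORDER BY day"
).fetchall()
print(result)
```

Reporting queries then run against daily_sales and never touch the operational orders table, which is the separation the reporting-database approach aims for.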
3. Information Consistency
Information consistency means that information held in different parts of the system, or in different but related data items, should be compatible, congruent, and not in conflict.
This may be as simple as a referential integrity constraint, or it may be more subtle and complex; for example, a summary financial position should always match the underlying data used to calculate it.
Most businesses have sophisticated rules for information consistency.
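The summary-position rule mentioned above could be checked along these lines. This is only a sketch: the function name summary_is_consistent and the sample ledger entries are hypothetical.

```python
from decimal import Decimal
from typing import Iterable


def summary_is_consistent(entries: Iterable[Decimal], stored_balance: Decimal) -> bool:
    """Return True if the stored summary equals the sum of the underlying entries."""
    return sum(entries, Decimal("0")) == stored_balance


# Hypothetical ledger entries and two candidate stored summaries.
ledger_entries = [Decimal("100.00"), Decimal("-30.50"), Decimal("12.25")]
matches = summary_is_consistent(ledger_entries, Decimal("81.75"))
conflicts = summary_is_consistent(ledger_entries, Decimal("80.00"))
print(matches, conflicts)
```

Decimal is used instead of float so that the comparison is exact for monetary amounts.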
3. Information Consistency: Achieving Consistency
Transactions and distributed transactions: XA transactions, two-phase commit protocol.
Compensating transactions: each data update is committed individually, and if a later update fails, each committed update is reversed by a transaction with an equal and opposite effect to the original one.
Eventual consistency: distributed applications favor high availability over consistency (remember the CAP theorem) and are designed to cope with data that is out of sync for a period of time. Such a system guarantees that after an update, assuming no further updates are made, all instances of the same data will eventually be updated to this value, without guaranteeing how long this will take.
Eventual consistency is used for infrastructure software such as DNS (the Internet’s Domain Name System) and for some Internet-scale applications such as global search engines, e-commerce sites, and social networking sites.
The model is sometimes referred to as following BASE principles: Basically Available, Soft state, Eventual consistency.
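The compensating-transaction approach described above can be sketched as follows. Each step is paired with an action that has the equal and opposite effect; if a later step fails, the already committed steps are undone in reverse order. The Account class and the helper run_with_compensation are hypothetical names used only for this illustration, and a real system would also have to handle failures of the compensations themselves.

```python
class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance


def debit(account, amount):
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount


def credit(account, amount):
    account.balance += amount


def run_with_compensation(steps):
    """Run (action, compensation) pairs of zero-argument callables in order.

    Each action is treated as committed as soon as it returns. If an action
    raises, the compensations of all previously committed actions are run in
    reverse order and False is returned; otherwise True is returned.
    """
    committed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(committed):
                undo()
            return False
        committed.append(compensation)
    return True


a = Account("a", 100)
b = Account("b", 10)

ok = run_with_compensation([
    (lambda: debit(a, 50), lambda: credit(a, 50)),
    (lambda: debit(b, 50), lambda: credit(b, 50)),  # fails: b holds only 10
])
print(ok, a.balance, b.balance)
```

Between the first debit and its compensation, other readers can observe the intermediate balance of a, which is the main difference from an atomic distributed transaction.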
See Slides about CAP Theorem
4. Information Storage
Relational databases
Dimensional databases
NoSQL databases
File-based stores
Others: XML databases, object-oriented databases, hierarchical databases, network databases, graph databases, etc.
4. Information Storage: Relational Databases
A typical relational database contains a largely third-normal-form schema and is usually used as some form of transactional or operational data store.
Features: SQL, ACID transactions (Atomic, Consistent, Isolated, and Durable), OLTP.
The limitations of a relational database tend to be the difficulty of scaling it to very large problems and the complexity of the schema and queries that often results when implementing a large enterprise application.
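The atomicity part of ACID can be illustrated with Python's built-in sqlite3 module. In this sketch, the account table, the CHECK constraint, and the transfer function are assumptions made for the example: a transfer consists of two updates, and if the second one violates the constraint, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)       # would overdraw account 1
after_failure = balances(conn)
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
print(failed, after_failure, succeeded, after_success)
```

The credit is deliberately applied before the debit so that the failing statement is the second one, which makes the rollback of the first update visible.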
4. Information Storage: Dimensional Databases
Dimensional databases use specialized column-based or dimensional stores.
A dimensional store is based around a multidimensional (or “star”) schema model, with large “fact” tables containing the primary data in the database, linked to small “dimension” tables that contain classification data that can be used to group and summarize the fact data.
Dimensional databases are particularly well suited to complicated reporting problems, so this storage model is often used for reporting databases rather than transactional databases.
Query language: MDX (created by Microsoft, now a de facto standard).
The major limitation of a dimensional model is the relative difficulty of updating information after it has been added to the database.
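A star schema can be approximated in an ordinary relational engine, which is enough to show how dimension tables group and summarize fact data. The sketch below uses SQLite and plain SQL rather than a dimensional store and MDX; the table names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the primary data and points at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount INTEGER
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 11, 7), (2, 11, 20), (2, 11, 1);
""")

# Group and summarize the facts by attributes taken from the dimensions.
summary = db.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(summary)
```

Each fact row carries only keys and a measure, while the descriptive attributes live in the small dimension tables.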
4. Information Storage: NoSQL Databases
NoSQL databases are relatively recent, but they have proved their usefulness in many very large-scale Internet services for e-commerce, Internet search, and social networking.
There are many data storage technologies that classify themselves as “NoSQL” products, and each one has its own unique characteristics, strengths, and weaknesses.
What is common among the NoSQL products is the fundamental tradeoff they have made, which is to abandon the traditional RDBMS characteristics of strict tabular data storage, SQL-query-based data access, and in some cases ACID transaction semantics, in order to achieve very high scalability and performance.
Most of these databases are accessed via a simple “map”-based interface that allows records to be stored and retrieved by key, sometimes also offering simple query facilities based on the attributes of the records being retrieved.
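The “map”-based interface described above might look roughly like this. The sketch is a toy in-memory store, not the API of any particular NoSQL product; the class KeyValueStore and its methods put, get, and find are hypothetical names.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Toy in-memory store: records are stored and retrieved by key."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        # Store a copy so later changes by the caller do not leak into the store.
        self._records[key] = dict(record)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        record = self._records.get(key)
        return dict(record) if record is not None else None

    def find(self, attribute: str, value: Any) -> List[str]:
        """Simple attribute query: keys of all records where attribute == value."""
        return sorted(
            key for key, record in self._records.items()
            if record.get(attribute) == value
        )


store = KeyValueStore()
store.put("user:1", {"name": "Ada", "city": "Vilnius"})
store.put("user:2", {"name": "Bob", "city": "Kaunas"})
store.put("user:3", {"name": "Eve", "city": "Vilnius"})

print(store.get("user:2"))
print(store.find("city", "Vilnius"))
```

There is no schema and no join: records with different attributes can sit side by side, and anything beyond lookup by key requires scanning or an additional index.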
4. Information Storage: NoSQL (http://blog.nahurst.com/visual-guide-to-nosql-systems)
4. Information Storage: NoSQL
Data Management Patterns
CRUD
CQRS
Event Sourcing
The real picture is bigger
Data Management Body of Knowledge
Homework
Book "Software Systems Architecture", Chapter 18: The Information Viewpoint