Database 2 - WordPress.com...We created a database on a server (survey.cs.unicam.it) and uploaded...

Post on 25-Jun-2020


Database 2

Diego Cervellini, Riccardo Pancotti

General Index

● Introduction to Data Warehousing
● Initial goals
● Data Warehousing phases
● Obtained reports
● Required indexes
● Conclusions

First of all - What is a Database?

● A database is an organized collection of data
● Data are organized in models to be easily queried
● Most important aspects are accuracy, availability, usability and resilience
● It's not useful for detailed analysis aimed at planning and decision making
● Possible solution?

Data Warehousing

What is Data Warehousing?

● Data Warehousing consists of a set of methods, tools and technologies that assist the knowledge worker in carrying out data analysis.

● It can start from:
  ○ an existing corporate database
  ○ the company information systems
  ○ data coming from outside the company

Data Warehouse

A Data Warehouse works as a repository used for reporting and analysis.

It has the following characteristics:
● oriented to the subject of interest
● integrated and consistent
● representative of temporal evolution
● non-volatile

Benefits of Data Warehouse

● Maintain data history
● Integrate data from multiple source systems
● Provide a single data model for all data
● Improve data quality
● Restructure the data so that it delivers excellent query performance

OLAP vs OLTP

OLTP (On-line Transactional Processing)

● Transactions that read/write a small number of tuples from/to many tables connected by simple relations

● The workload core is "frozen", no interactivity.

OLAP (On-line Analytical Processing)

● Dynamic and multidimensional analysis.

● Works better with huge amounts of data, summing up the performance of an enterprise.

● Interactivity is essential
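The difference between the two workloads can be illustrated with a small sketch. It uses Python's built-in sqlite3 module on an in-memory database; the `exam` table, its columns and its rows are hypothetical and are not taken from ESSE3.

```python
import sqlite3

# Hypothetical toy schema: names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam (student_id INTEGER, course TEXT, year INTEGER, mark INTEGER)"
)
conn.executemany(
    "INSERT INTO exam VALUES (?, ?, ?, ?)",
    [
        (1, "Databases", 2009, 28),
        (2, "Databases", 2009, 24),
        (1, "Algorithms", 2010, 30),
        (3, "Algorithms", 2010, 26),
    ],
)

# OLTP-style: a short transaction that reads/writes a single tuple.
with conn:
    conn.execute(
        "UPDATE exam SET mark = 29 WHERE student_id = 1 AND course = 'Databases'"
    )

# OLAP-style: an aggregate over the whole table, grouped by one dimension (year).
olap_rows = conn.execute(
    "SELECT year, COUNT(*), AVG(mark) FROM exam GROUP BY year ORDER BY year"
).fetchall()
print(olap_rows)
```

The first statement touches one row and must be fast and safe; the second scans all rows to summarize them, which is the kind of query a data warehouse is organized for.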

Initial goals

Initial goals of our course were:

● Creation of a data warehouse from ESSE3 database

● Data extraction to obtain indexes

● Report creation

Phases in Data Warehousing

Major phases in Data Warehousing:

● Extraction

● Cleaning

● Transformation

● Loading

Tools used

● SquirrelSQL & DBeaver: SQL clients used to analyze ESSE3

● Pentaho Suite: open source BI suite with ETL and reporting capabilities

● MySQLWorkbench: database design and administration tool, used to manage our local repository

Extraction

In this phase, relevant data are extracted from the data sources.

The choice of the data to be extracted is mainly based on their quality.

Our Extraction

What have we done?
We downloaded some useful tables from the ESSE3 database, according to our goals and the suggestions of the ESSE3 developers.

Tools used:
● SquirrelSQL to obtain the SQL structure of the DB
● Pentaho suite to download the tables from ESSE3 to our local database
● MySQLWorkbench to create and manage our local database

Cleaning

Cleaning is used to improve the quality of the data sources.
It's about deleting and/or leaving out:
● duplicate data
● missing data
● inconsistencies between logically associated values
● ...
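A minimal sketch of these three cleaning rules in Python. The record layout (`student_id`, `course`, `mark`, `passed`) and the consistency rule (an exam counts as passed when the mark is at least 18) are assumptions made for illustration, not the rules actually applied to ESSE3.

```python
PASS_MARK = 18  # assumed pass threshold, used only for the consistency check


def clean(records):
    """Drop duplicate, incomplete and logically inconsistent records."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["student_id"], rec["course"])
        if key in seen:
            continue  # duplicate data
        if rec["mark"] is None:
            continue  # missing data
        if rec["passed"] != (rec["mark"] >= PASS_MARK):
            continue  # inconsistency between logically associated values
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical sample rows.
records = [
    {"student_id": 1, "course": "Databases", "mark": 28, "passed": True},
    {"student_id": 1, "course": "Databases", "mark": 28, "passed": True},
    {"student_id": 2, "course": "Databases", "mark": None, "passed": False},
    {"student_id": 3, "course": "Databases", "mark": 15, "passed": True},
    {"student_id": 4, "course": "Algorithms", "mark": 24, "passed": True},
]
cleaned = clean(records)
print([rec["student_id"] for rec in cleaned])
```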

Our Cleaning

What have we done?
We removed all data inserted before 2008, because they were not useful for our purposes.

Tools used:
● MySQLWorkbench to delete all unnecessary data.
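A date-based cut of this kind can be sketched as a single DELETE statement. The example below runs on an in-memory SQLite database through Python's sqlite3 module instead of MySQL, and the `exam` table with its `exam_date` column is a hypothetical stand-in for the real ESSE3 tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: dates are stored as ISO strings, so they compare correctly as text.
conn.execute("CREATE TABLE exam (student_id INTEGER, exam_date TEXT)")
conn.executemany(
    "INSERT INTO exam VALUES (?, ?)",
    [(1, "2006-07-01"), (2, "2007-12-31"), (3, "2008-01-01"), (4, "2010-02-15")],
)

# Remove every row inserted before 2008.
with conn:
    deleted = conn.execute(
        "DELETE FROM exam WHERE exam_date < '2008-01-01'"
    ).rowcount

remaining = [
    row[0] for row in conn.execute("SELECT student_id FROM exam ORDER BY student_id")
]
print(deleted, remaining)
```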

Transformation

Transformation converts data from the operational source format to that of the DW. The mapping to the source level is complicated by the presence of multiple heterogeneous sources, which requires a complex integration phase.

Our Transformation

What have we done?
We changed the storage engine of the tables (from the Oracle one to InnoDB).
We created indexes for each table.
We linked the tables by creating foreign keys.

Tools used:
● MySQLWorkbench to manage the table changes
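The index and foreign key steps can be sketched as follows. This is an illustration on SQLite via Python's sqlite3 module, not the MySQL setup used in the project; the `student` and `exam` tables are hypothetical. SQLite has no storage engines, so the engine change is not reproduced here (in MySQL it would typically be an `ALTER TABLE ... ENGINE = InnoDB` statement).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables linked by a foreign key.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE exam (
           exam_id INTEGER PRIMARY KEY,
           student_id INTEGER NOT NULL REFERENCES student(student_id),
           mark INTEGER
       )"""
)
# Index on the column used to join exam with student.
conn.execute("CREATE INDEX idx_exam_student ON exam(student_id)")

conn.execute("INSERT INTO student VALUES (1, 'A')")
conn.execute("INSERT INTO exam VALUES (1, 1, 28)")

# A row that points to a non-existing student is rejected by the foreign key.
try:
    conn.execute("INSERT INTO exam VALUES (2, 99, 30)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

index_names = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(orphan_rejected, index_names)
```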

Loading

The loading of data into the DW can be done in two ways:

● Refresh: DW data are rewritten in full, replacing the previous ones (technique used to originally populate the DW)

● Update: only the changes that occurred in the source data are added to the DW (technique used for the periodic update of the DW)
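The two loading strategies can be contrasted with a small sketch in which both the source and the warehouse are plain Python dictionaries keyed by a row identifier. This is a simplification for illustration: the function names are hypothetical, and rows deleted from the source are not handled by the update variant.

```python
def refresh(warehouse, source):
    """Refresh: rewrite the warehouse in full from the source."""
    warehouse.clear()
    warehouse.update(source)


def incremental_update(warehouse, source):
    """Update: copy only new or changed rows; return how many were written."""
    written = 0
    for key, row in source.items():
        if warehouse.get(key) != row:
            warehouse[key] = row
            written += 1
    return written


# Hypothetical data: key is an exam id, value is (student_id, mark).
source = {1: (10, 28), 2: (11, 24)}
warehouse = {}

refresh(warehouse, source)  # initial population
after_refresh = dict(warehouse)

source[2] = (11, 25)  # a changed row
source[3] = (12, 30)  # a new row
written = incremental_update(warehouse, source)
print(after_refresh, written, warehouse)
```

Refresh is simple but rewrites everything; the update variant writes only the two rows that differ, which is why it suits periodic loads.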

Our Loading

What have we done?
We created a database on a server (survey.cs.unicam.it) and uploaded our "clean" and modified tables there.

Tools used:
● MySQLWorkbench to re-create indexes and foreign keys
● Pentaho suite to upload tables to the server

Obtained Reports

We worked on and analyzed our cleaned tables to try to retrieve some useful data that can influence the decision making process.

In this way we could give some useful information about Unicam, making the decision planning easier and faster.

1. Situation of first-year exams in some faculties

2. Percentage of foreign students out of total students

3. Situation of exams for Italian and foreign students

4. Average marks of Italian and foreign students

Charts:
● First-year exams: Pharmacy
● First-year exams: Computer Science
● First-year exams: Law faculty
● Passed exams by Italian students
● Passed exams by foreign students
● Italian students' marks average
● Foreign students' marks average
● Percentage of foreign students on total from 2008
● Percentage of foreign students on total by year
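As an illustration of how a report such as the percentage of foreign students by year can be derived, here is a minimal Python sketch. The enrollment rows and the country codes are invented for the example and do not reproduce Unicam figures.

```python
from collections import defaultdict


def foreign_percentage_by_year(rows, home_country="IT"):
    """Return {year: percentage of students whose country is not home_country}."""
    totals = defaultdict(int)
    foreign = defaultdict(int)
    for year, country in rows:
        totals[year] += 1
        if country != home_country:
            foreign[year] += 1
    return {year: 100.0 * foreign[year] / totals[year] for year in sorted(totals)}


# Hypothetical (enrollment year, country of origin) pairs.
enrollments = [
    (2008, "IT"), (2008, "IT"), (2008, "CN"), (2008, "IT"),
    (2009, "IT"), (2009, "GR"),
]
report = foreign_percentage_by_year(enrollments)
print(report)
```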

Calculating Indexes

One of the goals of our course was to calculate two different indexes for the FFO (Fondo di finanziamento ordinario).

● A1: Atot = RAP * (KA + KT)
  ○ RAP: active students
  ○ KT: region wealth function (0,98)
  ○ KA: number of teachers / courses (0,85)

● A2: University's weighted CFU / National weighted CFU

A1-Index

RAP = 5.092
KT = 0,98
KA = 0,85
National Atot = ?

Atot = RAP * (KA + KT) = 5.092 * (0,85 + 0,98) = 9318,36

A1 = Local Atot / National Atot = ?
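The local Atot value can be re-checked with a few lines of Python. The sketch uses Decimal to avoid floating-point rounding and reads the slide figures in Italian notation (5.092 is five thousand ninety-two, 0,98 is 0.98). The national Atot is unknown in the slides, so the helper `a1_index` (a hypothetical name) is defined but not called.

```python
from decimal import Decimal

# Values from the slide, converted from Italian number notation.
RAP = Decimal("5092")  # active students
KT = Decimal("0.98")   # region wealth function
KA = Decimal("0.85")   # number of teachers / courses

local_atot = RAP * (KA + KT)
print(local_atot)


def a1_index(local, national):
    """A1 = local Atot / national Atot (the national value is not given in the slides)."""
    return local / national
```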

A2-Index

Acquired CFU = 171.058
Expected CFU = 294.178
MNG = 0,43
National Weighted CFU = ?

PCFU = Expected CFU / Acquired CFU = 1,719755872
Weight = PCFU / MNG = 3,999432261
Weighted CFU = Weight * Acquired CFU = 684134,88372093

A2 = Local Weighted CFU / National Weighted CFU = ?
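The same chain of steps can be reproduced in Python as a check. The figures are the ones on the slide, read in Italian notation (171.058 is 171058, 0,43 is 0.43); the variable names are chosen for this sketch, and the final A2 ratio is left out because the national weighted CFU is not given.

```python
# Values from the slide, converted from Italian number notation.
acquired_cfu = 171058
expected_cfu = 294178
mng = 0.43

pcfu = expected_cfu / acquired_cfu    # Expected CFU / Acquired CFU
weight = pcfu / mng                   # PCFU / MNG
weighted_cfu = weight * acquired_cfu  # Weight * Acquired CFU

print(round(pcfu, 9), round(weight, 9), round(weighted_cfu, 8))
```

Note that the acquired CFU cancel out algebraically, so the weighted CFU equal Expected CFU / MNG.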

Conclusions

● We did not manage to build a proper data warehouse, only a collection of data marts and some reports on them.

● We faced many problems due to the inconsistency of the ESSE3 database and its documentation, which was sometimes unclear and unhelpful.

● On the other hand, we obtained useful reports and learned how to work as a team on such a "problematic" task.

THANKS FOR YOUR ATTENTION!