Data mining – introduction
-
Upload
fiddy-prasetiya -
Category
Education
-
view
132 -
download
0
Transcript of Data mining – introduction
Data mining – databases
Tim Deprez
Outline
• Today: – introduction to databases– Introduction on how to work with MSAccess
• Next coming days: practical excercises with MSAccess
Data mining
• Exploration of data• Prerequisite: data should be available in a
minable format - database• Database = electronic document storing data
– Non-relational: 1 bulk system with non-related items (eg. Msexcel files, text-documents, non-related-tables)
– Relational: all items (tables) are linked to each other (see further)
Why using a database
• Relational database:– All your data is stored in 1 file
• Easy to retrieve data• Easy to backup
– Data and metadata stored together• Data ...• Metadata: data about the data (documentation)
– Many data-files contain undocumented values:– Species A has an abundance of 17 ( meaning of value 17?)
Why using a database
• All data in a good relational designed database is only stored once:– Example: species list typing errors
• Nudora thorakista• Nudora thorrakista• Nudora thorakhista • Nudora thorakisa
– 1 species species richness calculation: 4– Solution: 1 table with each species 1 record and
use it as a reference
Why using a database
• Data is much more rigid ...– More difficult to make errors– E.g. Sorting in excell
Relational database - biologySpecies
person
Places
Sample
Country
Density
Equipment
Speciesperson
Places
Sample
Country
Density
Equipment
Which person was present on samplings in sweden?
Speciesperson
Places
Sample
Country
Density
Equipment
Which species sampled with a core occur in densities higher than 40
Variable
Var_value
Taxonomy
Photo
Literature...
...
...
...
Table designs ...
• A table consists of a series of Columns ...• Each record as such:
– Different fields– Design of table must be done before data is entered– Each field: name, data type– Each field can also by formatted layout
Record
ColumnField
Table designs ...
• Field types:– Numeric – integer/double– Text – Date/Time– Memo– Autonumber ID– Yes/No
Task on field types:
• 12• 15 jan 1988• hallo• 12,456• 12:56• Azdazdazd azdda zda azdd dad zd dadazdzd azdazddazdd azdazd azdazd dzdzdzzd ada zzd azdaz
dda azd da az d z azdzadazd a zd a azd azd z dd da a z a z zd d ddaa zd
• 09:89
Special field in a table: key
• A key = a unique identifier for a record– Example: pasport number:
• Number in a database which is unique and relates to all data about you
– Each record in a table gets also a key– This key is used to link tables to each other– Example:
• Nudora sp1 – id: 123776• Nudora sp2 – id: 34688
– Advantage: species name changes: linked taxa remain linked
Linking tables through id’s
• Storing numbers is most effecient way to store data:
• Nudora sp1 is found in the north sea with a density of 32
• Species 123776 is found in station 2 (North sea) with a density of 32
• Record in table density becomes:123776 | 2 | 32
Setting up relations between tables
• Relations: links between tables• Connecting tables through certain fields in a
rigid way to each other• Advantage: database becomes a strong unity• Types of relations:
– 1 to many– Many to many ( = 2 times 1 to many)
Examples of relations
• Table places: field country (numeric)• Table countries – list of countries, each country has unique id• Relation is made between:
– Field country in places– Field id in country
• One to many relation: 1 record in table country linked to multiple records in places
• No deleting of countries possible
Places
Country
Examples of relations
• Many to many• Id of sample• Id of species
• Table density: unique combination of sample, species ...
Species
Sample
Density
Queries
• All data in database:– Next step: get it out again– Selections on 1 table: by using filters– Selections on multiple tables: using queries– Queries can be saved and reused– Queries can be the basis for new queries
Sorting on tables
• Sorting
Filtering on tables
Making a simple selection Query
• Create ... Query in design view• Switching between views:
Making a simple selection Query
• Select the tables and/or queries needed
Making a simple selection Query
• Select the fields needed for output/selection/sorting
Making a simple selection Query
• Select the fields needed for output/selection/sorting
Making a simple selection Query
• Select the fields needed for output/selection/sorting
Making a simple selection Query
• Select the fields needed for output/selection/sorting
Making a simple selection Query
• Set the criteria
Making a simple selection Query
• Select the values to out put and add sorting options
Output the results
• Go to datasheet view
Making a simple selection Query
• Special options ...
Exporting data
• From msaccess it is possible to export to different formats!
• Tables, queries, ...• Exports can be used to do further data mining:
– Through MSExcell making graphs– To do statistical analysis
Exporting data
Step by step demonstration
• Open a database• Different items in database• Open tables, sorting, filtering• Table design• Relationships• Queries
Query operators
= equals> Larger than< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ...
Query operators
Query operators
and both trueor at least 1 true< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ... >"q*" and <"u*" VOORNAAM René, Robbie, Stefan, Stijn, Tim, Tristam
="r*" or "s*" VOORNAAM Robbie, Stefan, Stijn