Data 101: Fundamentals of Data in GIS

September 2012 GIS ToT Webinar

Page 1: Data 101: Fundamentals of Data in GIS

Data 101

Fundamentals of data in a GIS

Page 2: Data 101: Fundamentals of Data in GIS

Overview

Role of data

Data structures and schemas

Metadata

Linking data

Issues of confidentiality

Page 3: Data 101: Fundamentals of Data in GIS

Review

Page 4: Data 101: Fundamentals of Data in GIS

90 percent rule

90% Data Preparation

10% Mapping

90% of the cost, time and effort will be devoted to data preparation

Page 5: Data 101: Fundamentals of Data in GIS

90% Rule

Data Preparation

Collecting

Cleaning

Validating

Formatting

Linking with other data

Mapping

Map design

Categorization decisions

Production

Page 6: Data 101: Fundamentals of Data in GIS

GIS analysis is only as strong as the data used.

Page 7: Data 101: Fundamentals of Data in GIS

Strategies for strong data

Accuracy

Timeliness

Properly structured

Properly documented

Page 8: Data 101: Fundamentals of Data in GIS

Data accuracy

Data should accurately reflect reality

In GIS there are two types of accuracy to be concerned with:

Spatial accuracy

Items located correctly

Attribute accuracy

Attributes are correct and properly linked to geography

Page 9: Data 101: Fundamentals of Data in GIS

Spatial accuracy

Hotel Suryaa

Real Location

Page 10: Data 101: Fundamentals of Data in GIS

Spatial Accuracy and Scale

Hotel Suryaa

Page 11: Data 101: Fundamentals of Data in GIS

Attribute Accuracy

Is the data associated with the location accurate?

Is it linked to the right geographic entity?

Page 12: Data 101: Fundamentals of Data in GIS

Attribute Accuracy

Page 13: Data 101: Fundamentals of Data in GIS

Timeliness

Is the data for the time period of interest?

Boundaries change

New features created

Features change

Page 14: Data 101: Fundamentals of Data in GIS

Data Structure

Proper data structure is necessary in order to effectively use data

Software must know how to read the data, and query it.

The structure of the data is also known as the data schema

Page 15: Data 101: Fundamentals of Data in GIS

Data Schema

For most programs, data will need to be stored in a row and column format

GIS programs expect well-formed data in the following schema:

One record per geographic unit

Geographic units don’t repeat in records

Variables are stored in columns

No blank cells unless data is missing
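
The schema rules above can be checked mechanically before data is loaded into a GIS. The following is a minimal sketch, assuming the table has already been read into a list of dictionaries; the function name `check_schema` and the sample rows are hypothetical and only illustrate the idea.

```python
def check_schema(rows, key):
    """Return (duplicate_keys, blank_cells) for a list of dict records.

    duplicate_keys: geographic units that appear in more than one record.
    blank_cells: (row_index, column_name) pairs whose value is empty.
    """
    seen = set()
    duplicates = []
    blanks = []
    for i, row in enumerate(rows):
        unit = row[key]
        if unit in seen:
            duplicates.append(unit)
        seen.add(unit)
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                blanks.append((i, column))
    return duplicates, blanks


# Hypothetical sample: "West" is repeated and one population cell is blank.
rows = [
    {"district": "North", "population": 24015},
    {"district": "West", "population": 31154},
    {"district": "West", "population": ""},
]
duplicates, blanks = check_schema(rows, "district")
print(duplicates)  # ['West']
print(blanks)      # [(2, 'population')]
```

A blank cell is not always an error, since the rule allows blanks where data is genuinely missing; the check only lists them so they can be reviewed.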

Page 16: Data 101: Fundamentals of Data in GIS

Data Schema

Population                      China        India        United States  Indonesia
Total                           1339724852   1210193422   312417000      237556363
Percent of World’s Population   19.23%       17.37%       4.48%          3.41%
Population Density              140/km²      368/km²      32/km²         121/km²

Poor data schema

Columns are geographic units

Variables are rows

Page 17: Data 101: Fundamentals of Data in GIS

Blank Cells

Duplicate District Names

Page 18: Data 101: Fundamentals of Data in GIS

Proper Data Schema

One record per geographic unit

Columns are variables
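
A table laid out the other way around, such as the country population table on the earlier slide, can be restructured so that each geographic unit becomes one record. This is a sketch only: the nested dictionary `wide` and the helper `to_records` are illustrative names, and the figures are copied from that slide.

```python
# Poor schema: variables are the outer keys (rows), countries are the columns.
wide = {
    "Total": {
        "China": 1339724852,
        "India": 1210193422,
        "United States": 312417000,
        "Indonesia": 237556363,
    },
    "Density per km2": {
        "China": 140,
        "India": 368,
        "United States": 32,
        "Indonesia": 121,
    },
}


def to_records(wide_table):
    """Turn {variable: {unit: value}} into one record per geographic unit."""
    units = {}
    for variable, by_unit in wide_table.items():
        for unit, value in by_unit.items():
            # Create the record the first time a unit is seen, then add the variable.
            units.setdefault(unit, {"Country": unit})[variable] = value
    return list(units.values())


records = to_records(wide)
for record in records:
    print(record)
```

Each printed record now has the country as its identifier and the variables as columns, which is the layout the slide describes as a proper schema.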

Page 19: Data 101: Fundamentals of Data in GIS

Metadata

Data about data

Provides information on:

Source of data

Who created it

When it was created

Coordinate system and datum

Usage and sharing restrictions

Page 20: Data 101: Fundamentals of Data in GIS

Metadata

Metadata is especially important with spatial data because of issues of:

Spatial accuracy

Coordinate systems and datums

Confidentiality

Timeliness

Page 21: Data 101: Fundamentals of Data in GIS

Metadata formats

International standard

ISO 19115

Mandatory elements

Schema for metadata

Countries may have their own national standards that are compatible with the ISO standard but provide extra elements

Page 22: Data 101: Fundamentals of Data in GIS

Metadata Example

Page 23: Data 101: Fundamentals of Data in GIS

Data Types

Text

Numeric

Coordinates

Programs assign each variable a specific type, which affects how the program handles the data

Page 24: Data 101: Fundamentals of Data in GIS

Data Types

Text

Arithmetic cannot be conducted on values in text fields

Numeric

Arithmetic permitted

May require user to declare number of decimal places before entering data

This can be important when storing coordinates
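
A short sketch of why the type and the number of decimal places matter. It uses a population value and a latitude that appear elsewhere in this deck; the figure of roughly 111 km per degree of latitude is an approximation used here only to give a sense of scale.

```python
# The same digits stored as text and as a number behave differently.
text_value = "24015"
numeric_value = 24015

print(text_value + text_value)        # text is concatenated: '2401524015'
print(numeric_value + numeric_value)  # numbers are added: 48030

# Storing a coordinate with too few decimal places moves the point.
lat = 28.67171
lat_2dp = round(lat, 2)  # 28.67

# Assumption: one degree of latitude is roughly 111 km (111,000 m).
error_m = abs(lat - lat_2dp) * 111_000
print(f"{error_m:.0f} m")  # about 190 m
```

With only two decimal places the stored point ends up on the order of a couple of hundred meters from the original location, which is a large spatial error at street or building scale.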

Page 25: Data 101: Fundamentals of Data in GIS

Linking data

Key field

The field that contains information common between tables

Tables are linked using the key field

Tables can’t be linked if their key fields are of two different types
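
The type rule can be seen in a few lines. In this sketch the lookup table uses numeric codes while the incoming key is text; the codes and areas are illustrative values, and the conversion with `int` is just one way to align the types.

```python
# Area table keyed by a numeric district code (illustrative values).
area_by_code = {100: 243, 200: 310, 300: 602}

# The same code arriving from another table as text.
text_key = "100"

print(text_key in area_by_code)       # False: "100" (text) is not 100 (number)
print(int(text_key) in area_by_code)  # True once both keys are the same type
```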

Page 26: Data 101: Fundamentals of Data in GIS

District Population Male Pop Female Pop

North 24015 14409 9606

West 31154 16202 14952

South 62442 29972 32470

District Area (sq km)

North 243

West 310

South 602

District is the key field

District Population Male Pop Female Pop Area (sq km)

North 24015 14409 9606 243

West 31154 16202 14952 310

South 62442 29972 32470 602
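
The link shown in these tables can be sketched in plain Python. The helper name `join` is hypothetical, and a GIS or database would normally perform this step itself; the sketch only shows what the key field is doing.

```python
population = [
    {"District": "North", "Population": 24015, "Male Pop": 14409, "Female Pop": 9606},
    {"District": "West", "Population": 31154, "Male Pop": 16202, "Female Pop": 14952},
    {"District": "South", "Population": 62442, "Male Pop": 29972, "Female Pop": 32470},
]
area = [
    {"District": "North", "Area (sq km)": 243},
    {"District": "West", "Area (sq km)": 310},
    {"District": "South", "Area (sq km)": 602},
]


def join(left, right, key):
    """Attach the columns of `right` to each row of `left` with the same key value."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:  # rows without a match are left out
            joined.append({**row, **match})
    return joined


joined = join(population, area, "District")
for row in joined:
    print(row)
```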

Page 27: Data 101: Fundamentals of Data in GIS

Linking data

Linking using text fields can be problematic

Variations in spelling

Page 28: Data 101: Fundamentals of Data in GIS

District Population Male Pop Female Pop

North Kinley 24015 14409 9606

West 31154 16202 14952

South 62442 29972 32470

District Area (sq km)

N. Kinley 243

West 310

South 602

The two tables have different spellings for the district North Kinley

District Population Male Pop Female Pop Area (sq km)

West 31154 16202 14952 310

South 62442 29972 32470 602
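
Because a record that fails to match simply disappears from the linked table, it helps to list the key values that have no partner before linking. A minimal sketch, with the hypothetical helper `unmatched`:

```python
population_names = ["North Kinley", "West", "South"]
area_names = ["N. Kinley", "West", "South"]


def unmatched(left_keys, right_keys):
    """Return the key values found in only one of the two tables."""
    left_only = sorted(set(left_keys) - set(right_keys))
    right_only = sorted(set(right_keys) - set(left_keys))
    return left_only, right_only


left_only, right_only = unmatched(population_names, area_names)
print(left_only)   # ['North Kinley']
print(right_only)  # ['N. Kinley']
```

Any value that appears in either list needs to be reconciled by hand or replaced with a shared code before the tables are linked.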

Page 29: Data 101: Fundamentals of Data in GIS

Linking data

Linking using numeric fields is often more reliable and less vulnerable to variations and other issues

Countries often use numeric codes for administrative units to get around problems with spelling variations

If standardized national codes exist, it is a good idea to include them in data

The national statistics bureau or census agency often manages such codes

Page 30: Data 101: Fundamentals of Data in GIS

District Dist code Population Male Pop Female Pop

North Kinley 100 24015 14409 9606

West 200 31154 16202 14952

South 300 62442 29972 32470

District Dist code Area (sq km)

N. Kinley 100 243

West 200 310

South 300 602

Dist code is the key field

District Dist code Population Male Pop Female Pop Area (sq km)

North 100 24015 14409 9606 243

West 200 31154 16202 14952 310

South 300 62442 29972 32470 602
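
Linking on the code succeeds even though the district names disagree. The sketch below links on `Dist code` and also reports any names that differ between the two tables, which can be a useful cleaning step; the variable names are illustrative and only some of the columns are included.

```python
population = [
    {"District": "North Kinley", "Dist code": 100, "Population": 24015},
    {"District": "West", "Dist code": 200, "Population": 31154},
    {"District": "South", "Dist code": 300, "Population": 62442},
]
area = [
    {"District": "N. Kinley", "Dist code": 100, "Area (sq km)": 243},
    {"District": "West", "Dist code": 200, "Area (sq km)": 310},
    {"District": "South", "Dist code": 300, "Area (sq km)": 602},
]

area_by_code = {row["Dist code"]: row for row in area}

joined = []
name_conflicts = []
for row in population:
    match = area_by_code[row["Dist code"]]
    joined.append({**row, "Area (sq km)": match["Area (sq km)"]})
    if match["District"] != row["District"]:
        # Same code, different spelling: record it for review.
        name_conflicts.append((row["Dist code"], row["District"], match["District"]))

print(len(joined))     # 3: no district is lost
print(name_conflicts)  # [(100, 'North Kinley', 'N. Kinley')]
```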

Page 31: Data 101: Fundamentals of Data in GIS

Advantage of numeric codes

Can manage hierarchy effectively

North District Code 100

District Province Code

North Coast 101

North Mountain 103

North Savanna 105

(Map labels: Savanna, Mountain, Coast)
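
One way to read the example is that a province code carries its district code inside it. The sketch below assumes, based only on the codes shown on this slide, that the last two digits number the provinces within a district; real national coding schemes differ, so the rule in `district_code` is an assumption.

```python
def district_code(province_code):
    """Derive the parent district code, assuming the last two digits identify the province."""
    return (province_code // 100) * 100


provinces = {101: "Coast", 103: "Mountain", 105: "Savanna"}

# Group provinces under their parent district using the code alone.
by_district = {}
for code, name in provinces.items():
    by_district.setdefault(district_code(code), []).append(name)

print(by_district)  # {100: ['Coast', 'Mountain', 'Savanna']}
```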

Page 32: Data 101: Fundamentals of Data in GIS

Linking data key points

Key fields must be of the same type

Text fields can be problematic due to spelling variations

Numeric fields are often a more reliable key field

Unique geographic codes, if available in a country, are often the best option for making linkages

Page 33: Data 101: Fundamentals of Data in GIS

Data and confidentiality issues

Important issue when working with spatial data

Discuss issues of confidentiality and spatial tools

Present strategies for protecting confidentiality

Page 34: Data 101: Fundamentals of Data in GIS

Confidentiality

Protecting identity of individuals

Requirement

Informed consent agreements

Ethical research

Page 35: Data 101: Fundamentals of Data in GIS

Overt disclosure

The act of explicitly making data available that breaches confidentiality commitments.

Page 36: Data 101: Fundamentals of Data in GIS

Deductive Disclosure

45-year-old female

Has 5 children

Works for General Electric in Delhi

28.67171, 77.21211
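
Deductive disclosure can be illustrated by counting how many records still match as each attribute is added. The records below are entirely made up for this sketch; only the attributes of the first one come from the slide.

```python
# Hypothetical survey records, invented for illustration.
records = [
    {"age": 45, "sex": "F", "children": 5, "employer": "General Electric", "city": "Delhi"},
    {"age": 45, "sex": "F", "children": 5, "employer": "Other", "city": "Delhi"},
    {"age": 45, "sex": "F", "children": 2, "employer": "Other", "city": "Mumbai"},
    {"age": 45, "sex": "M", "children": 5, "employer": "Other", "city": "Delhi"},
    {"age": 30, "sex": "F", "children": 1, "employer": "Other", "city": "Delhi"},
]

# Each step adds one more piece of released information.
steps = [
    ("45 year old female", lambda r: r["age"] == 45 and r["sex"] == "F"),
    ("has 5 children", lambda r: r["children"] == 5),
    ("works for General Electric in Delhi",
     lambda r: r["employer"] == "General Electric" and r["city"] == "Delhi"),
]

remaining = records
counts = []
for label, keep in steps:
    remaining = [r for r in remaining if keep(r)]
    counts.append(len(remaining))
    print(f"{label}: {len(remaining)} matching record(s)")
```

In this invented sample the candidates shrink from three to two to one. Once a single record remains, the person can be identified even though no name was released, and a coordinate pair can narrow the set in the same way.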

Page 37: Data 101: Fundamentals of Data in GIS

Spatial Data

Overt disclosure

Makes deductive disclosure easier

Page 38: Data 101: Fundamentals of Data in GIS

Geoprivacy

“[an] individual’s right to prevent disclosure of the location of one’s home, workplace, daily activities or trips.”

Kwan, Casas, Schmitz. “Protection of Geoprivacy and Accuracy of Spatial Information: How Effective Are Geographical Masks?” Cartographica, Vol. 39, No. 2.

Page 39: Data 101: Fundamentals of Data in GIS

Four Principles

Protection of Confidentiality

Social-Spatial Linkage

Data Sharing

Data Preservation

VanWey, Rindfuss, Gutmann, Entwisle, Balk. “Confidentiality and Spatially Explicit Data: Concerns and Challenges.” PNAS, Vol. 102, No. 43.

Page 40: Data 101: Fundamentals of Data in GIS

1. Protection of Confidentiality

Fundamental to ethical research

Information that might lead to physical, emotional, financial or other harm

Protection of information that discloses identity

Page 41: Data 101: Fundamentals of Data in GIS

2. Social-Spatial Linkage

All human activity takes place on earth

Understanding that adds context and perspective

Key to advancement of science

Essential for understanding the diffusion of behaviors

Page 42: Data 101: Fundamentals of Data in GIS

3. Data Sharing

Essential on both scientific and financial grounds

Provide access to data for other researchers

Condition of funders

Page 43: Data 101: Fundamentals of Data in GIS

4. Data Preservation

Data available in the future

How long should data be deemed “sensitive”?

When, if ever, can it be released?

Page 44: Data 101: Fundamentals of Data in GIS

Strategies

Page 45: Data 101: Fundamentals of Data in GIS

Random Perturbations

Random shifting of point locations

Pros: Easy (relatively) to do

Cons: Loses original location, introduces error
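
A minimal sketch of random perturbation, assuming the points are in a projected coordinate system measured in meters. The function name `perturb`, the maximum shift, and the sample points are all hypothetical; a real project would choose the shift distance based on its own confidentiality requirements.

```python
import math
import random


def perturb(points, max_shift, seed=None):
    """Move each (x, y) point a random distance (up to max_shift) in a random direction."""
    rng = random.Random(seed)
    shifted = []
    for x, y in points:
        angle = rng.uniform(0, 2 * math.pi)
        distance = rng.uniform(0, max_shift)
        shifted.append((x + distance * math.cos(angle),
                        y + distance * math.sin(angle)))
    return shifted


# Hypothetical point locations in meters.
points = [(1000.0, 2000.0), (1500.0, 2500.0)]
shifted = perturb(points, max_shift=100.0, seed=1)
for original, moved in zip(points, shifted):
    print(original, "->", moved)
```

The shift is irreversible unless the original points are kept separately, which is the "lose original location" cost noted above.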

Page 46: Data 101: Fundamentals of Data in GIS

Affine Transformation

Change scale

Rotate

Shift a set distance

Combination

Pros: Easy to do

Cons: Easy to undo, can impact some types of analysis
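
The "easy to undo" drawback can be shown directly. This sketch applies a scale, rotation, and shift to planar coordinates and then reverses them; the function names and parameter values are illustrative assumptions.

```python
import math


def affine(points, scale, angle_deg, dx, dy):
    """Rotate by angle_deg, scale, then shift each (x, y) point."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(scale * (x * cos_a - y * sin_a) + dx,
             scale * (x * sin_a + y * cos_a) + dy) for x, y in points]


def undo_affine(points, scale, angle_deg, dx, dy):
    """Reverse affine(): remove the shift and scale, then rotate back."""
    a = math.radians(-angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    restored = []
    for x, y in points:
        u, v = (x - dx) / scale, (y - dy) / scale
        restored.append((u * cos_a - v * sin_a, u * sin_a + v * cos_a))
    return restored


original = [(10.0, 20.0), (30.0, 5.0)]
masked = affine(original, scale=2.0, angle_deg=90, dx=100.0, dy=-50.0)
recovered = undo_affine(masked, scale=2.0, angle_deg=90, dx=100.0, dy=-50.0)
print(masked)
print(recovered)  # matches the original points up to rounding
```

Anyone who learns or estimates the four parameters can recover every original location, because the same transformation is applied to all points.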

Page 47: Data 101: Fundamentals of Data in GIS

Aggregate

Point locations are aggregated to higher unit of analysis

Pros: Easy to do

Cons: Requires sufficient data points, finer data variations will be lost
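
Aggregation can be sketched as counting points per grid cell. Square grid cells stand in here for whatever higher unit is used in practice (districts, census tracts), and the minimum count of 3 used to suppress sparse cells is an arbitrary assumption for illustration.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size):
    """Count points per square grid cell; cells are identified by (column, row)."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical point locations in meters.
points = [(120, 80), (450, 910), (480, 990), (1510, 40)]
counts = aggregate_to_grid(points, cell_size=1000)
print(dict(counts))  # {(0, 0): 3, (1, 0): 1}

# Release only cells with enough points to avoid singling anyone out.
min_count = 3  # assumed threshold
released = {cell: n for cell, n in counts.items() if n >= min_count}
print(released)  # {(0, 0): 3}
```

The cell holding a single point is withheld, which illustrates why aggregation needs sufficient data points, and the individual positions inside the released cell are no longer recoverable.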

Page 48: Data 101: Fundamentals of Data in GIS

Despatialize

Remove Coordinate System

Use Euclidean space

Pros: Simple, keeps relative position and placement

Cons: Loses contextual data
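
One possible reading of despatializing is to re-express the points relative to an arbitrary local origin and discard the link to real-world coordinates. The sketch assumes the input is already in a planar (projected) system in meters; the function name and sample values are hypothetical.

```python
import math


def despatialize(points):
    """Shift points so the smallest x and y become 0, dropping the real-world origin."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [(x - min_x, y - min_y) for x, y in points]


# Hypothetical projected coordinates in meters.
points = [(500120.0, 3172080.0), (500450.0, 3172910.0), (501510.0, 3172040.0)]
local = despatialize(points)
print(local)  # [(0.0, 40.0), (330.0, 870.0), (1390.0, 0.0)]

# Distances between points are unchanged, so relative position is kept.
d_before = math.hypot(points[1][0] - points[0][0], points[1][1] - points[0][1])
d_after = math.hypot(local[1][0] - local[0][0], local[1][1] - local[0][1])
print(math.isclose(d_before, d_after))  # True
```

The points keep their spacing and arrangement, but nothing ties them to roads, boundaries, or other layers any more, which is the loss of contextual data listed above.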

Page 49: Data 101: Fundamentals of Data in GIS

Nothing

Do not collect or release data

Cold room or on-site analysis only

Pros: Maintains all of the original spatial data

Cons: Complicated, limits data sharing, limits social-spatial link

Page 50: Data 101: Fundamentals of Data in GIS
Page 51: Data 101: Fundamentals of Data in GIS

“Ignoring is unacceptable”

Can get lost in the excitement about GIS

Those who collect data must think about the confidentiality issues

Data users must also think about how their analysis may increase the risk of deductive disclosure.

Page 52: Data 101: Fundamentals of Data in GIS

Key points

Confidentiality issues arise when spatial context is included in data.

It’s important to protect confidentiality. People have an expectation that their identities are protected.

There are strategies that can preserve confidentiality, but there is no “one-size-fits-all” solution.