Computing in Archaeology Basic Statistics Week 8 (25/04/07) © Richard Haddlesey

Page 1

Computing in Archaeology

Basic Statistics

Week 8 (25/04/07) © Richard Haddlesey www.medievalarchitecture.net

Page 2

Aims

To familiarise ourselves with KEY statistical terms and their meanings

To understand the use of stats in archaeology

To assign variables the appropriate levels of measurement at the recording stage

Page 3

Key texts

Page 4

Basic Stats

Batch = the whole set of items being studied (e.g. post holes)

Case = a single item in the batch (e.g. one post hole, identified by its post hole ID)

Variables = the properties recorded for each case (e.g. length, area, diameter)
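A batch can be pictured as a list of cases, each case holding one value per variable. The sketch below is a minimal Python illustration of that reading; the `PostHole` class, its field names, the units and all the values are hypothetical, invented for the example.

```python
from dataclasses import dataclass


@dataclass
class PostHole:
    """One case: a single post hole and the variables recorded for it."""
    post_hole_id: str  # identifies the case (a nominal label)
    length: float      # assumed unit: cm
    area: float        # assumed unit: cm^2
    diameter: float    # assumed unit: cm


# The batch: all cases of the same kind (invented values, for illustration only).
batch = [
    PostHole("PH1", 42.0, 1010.0, 30.5),
    PostHole("PH2", 38.5, 850.0, 28.0),
    PostHole("PH3", 45.0, 1170.0, 33.0),
]

# One variable read across every case in the batch.
diameters = [case.diameter for case in batch]
print(f"{len(batch)} cases; diameters: {diameters}")
```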

Page 5

Variables

Variables are measured according to one of FOUR levels:

1. Nominal = arbitrary name

2. Ordinal = ordered sequence with no fixed distance between values

3. Interval = ordered sequence with a fixed distance between values

4. Ratio = ordered sequence with a fixed distance and a fixed datum (true zero)
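The levels form a hierarchy: each one keeps the properties of the level below and adds one more. The minimal sketch below encodes that ordering and, following the later slides on the mode, median and mean, which averages are meaningful at each level. `Level`, `valid_averages` and `ratios_meaningful` are illustrative names, not part of any library.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels of measurement, in increasing order of information."""
    NOMINAL = 1   # arbitrary names
    ORDINAL = 2   # ordered, no fixed distance
    INTERVAL = 3  # ordered, fixed distance, no true datum
    RATIO = 4     # ordered, fixed distance, true datum


def valid_averages(level: Level) -> list[str]:
    """Averages treated as meaningful at each level in these slides."""
    averages = ["mode"]              # usable at every level
    if level >= Level.ORDINAL:
        averages.append("median")    # needs an ordering
    if level >= Level.INTERVAL:
        averages.append("mean")      # needs fixed distances
    return averages


def ratios_meaningful(level: Level) -> bool:
    """'Twice as much' only makes sense when there is a true datum."""
    return level == Level.RATIO


for lvl in Level:
    print(lvl.name, valid_averages(lvl), ratios_meaningful(lvl))
```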

Page 6

Vince NOIR

N = Nominal, O = Ordinal, I = Interval, R = Ratio

Page 7

Nominal examples

Condition Age Diameter Length Context Period

Page 8

Ordinal examples

Condition
1. Excellent
2. Good
3. Fair
4. Poor

Here "2" lies between "1" and "3", but the step from 1 to 2 is unlikely to be the same size as the step from 2 to 3

Page 9

Interval examples

Period
1. Late Bronze (1200-650 BC)
2. Early Iron (649-100 BC)
3. Late Iron (100 BC onwards)

Here, if we have three artefacts dated 150 BC (a), 300 BC (b) and 450 BC (c), b lies an equal distance between a and c, but b is not twice as old as a, nor is c three times as old.

This is because there is no true datum: the zero point of the calendar is arbitrary.

Page 10

Ratio examples

Age instead of period:
• an age of 1000 years is twice an age of 500 years
• 20 kg is twice 10 kg

Ratio is the highest level of measurement because it has a true datum (zero point)
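The interval/ratio difference can be checked with a little arithmetic on the three artefact dates. This sketch assumes the "present" is AD 2007 (the year of this lecture) and, for simplicity, ignores the missing year 0 between 1 BC and AD 1; `bc_to_years_ago` and `PRESENT_AD` are hypothetical names.

```python
PRESENT_AD = 2007  # assumed reference year (the lecture date)


def bc_to_years_ago(year_bc: int) -> int:
    """Convert a BC date to an age in years before PRESENT_AD.
    Simplification: ignores the missing year 0 between 1 BC and AD 1."""
    return PRESENT_AD + year_bc


a, b, c = 150, 300, 450  # artefact dates, in years BC

# Interval scale: differences between dates are meaningful...
print(b - a, c - b)                            # 150 150
# ...but dividing calendar dates does not give a ratio of ages.
print(c / a)                                   # 3.0
age_a, age_c = bc_to_years_ago(a), bc_to_years_ago(c)
print(age_a, age_c, round(age_c / age_a, 2))   # 2157 2457 1.14

# Ratio scale: ages count from a real zero (now), so ratios are valid.
print(1000 / 500)                              # 2.0
```

Differences between calendar dates survive the conversion unchanged, but the apparent "three times" disappears once the dates are expressed as ages measured from a real zero.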

Page 11

Mortlake style bowl

Fengate style bowl

Grooved ware jar

Nominal, Ordinal and Interval

Page 12

Note!

Avoid using 0 and 1 to record yes/no variables, as we may need to know whether a value means "no" or "no data"

Also, when recording presence or absence, you may wish to add a "missing" category to avoid confusion
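One way to follow this advice is to give "missing" its own explicit code rather than reusing 0 or a blank. The feature being recorded, the codes and the sherd IDs below are hypothetical.

```python
from collections import Counter

# Three explicit codes keep "absent" distinct from "not recorded".
PRESENT, ABSENT, MISSING = "present", "absent", "missing"

# Hypothetical presence/absence record of decoration on four sherds.
records = {
    "sherd_01": PRESENT,
    "sherd_02": ABSENT,
    "sherd_03": MISSING,  # too abraded to tell: no data, not "no"
    "sherd_04": ABSENT,
}

counts = Counter(records.values())
print(counts)

# Proportion present among the sherds that could actually be assessed.
assessed = counts[PRESENT] + counts[ABSENT]
print(counts[PRESENT] / assessed)
```

Had the unrecorded sherd been entered as 0 ("no"), the proportion present would drop from 1/3 to 1/4, a distortion caused purely by the coding.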

Page 13

Further distinction

Nominal and Ordinal
• categorical
• qualitative

Interval and Ratio
• continuous
• quantitative

Page 14

Coding

Nominal and Ordinal variables often need coding, via a keyword index, to minimise errors:
• con = context
• str = stray find
• set = settlement
• bur = burial

Avoid 1, 2, 3, etc., as you will have to keep looking up their meanings, which is time-consuming

Page 15

Coding

NOTE!

EVERY DATA VALUE MUST HAVE A CODE AND ONLY ONE CODE!
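A keyword index maps naturally onto a dictionary, and the "a code and only one code" rule can be checked automatically. The four codes are taken from the slides; `KEYWORD_INDEX`, `check_codes`, the rule of splitting a cell on whitespace to detect multiple codes, and the sample entries are assumptions for illustration.

```python
# Keyword index from the slides: each short code stands for one category.
KEYWORD_INDEX = {
    "con": "context",
    "str": "stray find",
    "set": "settlement",
    "bur": "burial",
}


def check_codes(values: list[str]) -> list[tuple[int, str]]:
    """Return (row, value) pairs that break the rule
    'every data value must have a code and only one code'."""
    problems = []
    for row, value in enumerate(values):
        codes = value.split()  # assumption: multiple codes are space-separated
        if len(codes) != 1 or codes[0] not in KEYWORD_INDEX:
            problems.append((row, value))
    return problems


entries = ["con", "bur", "", "set str", "Con"]  # hypothetical entries
print(check_codes(entries))      # [(2, ''), (3, 'set str'), (4, 'Con')]
print(KEYWORD_INDEX[entries[1]])  # burial
```

Row 2 has no code, row 3 has two, and row 4 uses a capital letter that is not in the index, so all three are flagged for correction.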

Page 16

Grouping

Good for periods, as in:
• Late Bronze (1200-650 BC)
• Early Iron (649-100 BC)
• Late Iron (100 BC onwards)

NOTE: it is better to record a date as a continuous variable (e.g. 780 BC), then group it for output (e.g. Late Bronze)
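Recording the date itself and deriving the period afterwards can be sketched as a small grouping function. The period boundaries come from the slides; placing the shared boundary year 100 BC in Early Iron, and the label for dates before 1200 BC, are assumptions, and `period_for` is a hypothetical name. Dates are given as positive numbers of years BC.

```python
def period_for(year_bc: int) -> str:
    """Group a date (positive integer, years BC) into the slides' periods.
    Assumptions: 100 BC falls in Early Iron; dates before 1200 BC are
    outside the scheme; AD dates are not handled."""
    if year_bc > 1200:
        return "before Late Bronze"
    if year_bc >= 650:
        return "Late Bronze"   # 1200-650 BC
    if year_bc >= 100:
        return "Early Iron"    # 649-100 BC
    return "Late Iron"         # after 100 BC


dates_bc = [780, 1100, 420, 150, 60]  # hypothetical recorded dates
for d in dates_bc:
    print(d, "BC ->", period_for(d))
```

Because the raw date is kept, the groups can be redrawn later (for example with different period boundaries) without re-recording anything.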

Page 17

Good Practice

Always keep a "CLEAN" version of the original data set

Page 18

Exploring the data

Page 19

Context FNO Taxon Bone z1 z2 z3 z4 z5 z6 F/U L/R art. sex NISP chop cut m1 m2 m3 m4
269 58 bs mn 0 0 0 0 0 0 - r - - 1 35.9 14.6
722 191 eq sc 1 1 1 1 1 1 f r 2 - 1 78.2 40.7 55.6
722 191 eq sc 1 1 1 1 1 1 f l 2 - 1 78.7 41.4 48.5
371 102 eq sc 1 1 1 1 1 1 f r - - 1 45.0 58.0 52.9
722 191 eq cal 1 1 1 1 1 0 f r 2 - 1 90.6 45.0
722 191 eq mp 1 1 1 0 0 0 f l 2 - 1 41 45.6 40.3 28.7
722 191 eq mp 1 1 1 0 0 0 f r 2 - 1 42 46.0 39.5 29.4
722 191 eq mp 1 1 1 0 0 0 f r 2 - 1 46.0 39.7 28.5
285 72 bs cal 1 1 1 1 1 0 f r - - 1 1 1 137.5 46.3
722 191 eq mp 1 1 1 0 0 0 f l 2 - 1 42 46.3 40.0 29.2
722 191 eq pp 1 1 1 0 0 0 f l 2 - 1 71 48.7 45.0 32.5
722 191 eq pp 1 1 1 0 0 0 f r 2 - 1 71 48.8 45.2 32.5
722 191 eq pp 1 1 1 0 0 0 f r 2 - 1 68 49.0 45.0 34.1
722 191 eq pel 1 1 1 1 1 1 f l 2 - 1 60.1 52.2
722 191 eq ast 1 1 1 1 0 0 - r 2 - 1 51 53 44.9
722 191 eq ast 1 1 1 1 0 0 - l 2 - 1 51 54 44.4 52.7
722 191 eq mciii 1 1 1 1 1 1 f r 2 - 1 187 179 43.7 28.6
722 191 eq mciii 1 1 1 1 1 1 f l 2 - 1 187 180 42.8
722 191 eq mtiii 1 1 1 1 1 1 f l 2 - 1 229 223 41.4 39.1
722 191 eq mtiii 1 1 1 1 1 1 f r 2 - 1 229 223 42.8 39.5
722 191 eq hum 1 1 1 1 1 1 f/f r 2 - 1 232 30.8
722 191 eq rad 1 1 1 1 1 1 f/f l 2 - 1 274 71.7 64.2

example data set

Page 20

univariate frequency table

species   frequency
cattle    187
sheep     109
pig       78
horse     21
Total     395
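A univariate frequency table is simply a count of how often each value of one variable occurs. As a sketch, the Taxon column of the example data set on the previous slide (codes as given there; their expansions are not stated on the slide) can be tallied with Python's `collections.Counter`.

```python
from collections import Counter

# Taxon column of the example data set, in row order (22 rows).
taxon = ["bs"] + ["eq"] * 7 + ["bs"] + ["eq"] * 13

freq = Counter(taxon)
total = sum(freq.values())

# Print a simple two-column frequency table, most frequent first.
print(f"{'taxon':<8}{'frequency':>10}")
for name, n in freq.most_common():
    print(f"{name:<8}{n:>10}")
print(f"{'Total':<8}{total:>10}")
```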

Page 21

species   pits   ditches   Total
cattle    67     120       187
sheep     63     46        109
pig       41     37        78
horse     3      18        21
Total     174    221       395

bivariate frequency table

Page 22

bivariate frequency table

species   pits         ditches      Total
cattle    67 (39%)     120 (54%)    187
sheep     63 (36%)     46 (21%)     109
pig       41 (24%)     37 (17%)     78
horse     3 (2%)       18 (8%)      21
Total     174 (100%)   221 (100%)   395
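The percentages in this table are column percentages: each count is divided by its column total, so pits and ditches can be compared even though they produced different numbers of bones. A minimal sketch using the counts from the table:

```python
# Counts from the bivariate table (species x feature type).
counts = {
    "cattle": {"pits": 67, "ditches": 120},
    "sheep":  {"pits": 63, "ditches": 46},
    "pig":    {"pits": 41, "ditches": 37},
    "horse":  {"pits": 3,  "ditches": 18},
}
features = ["pits", "ditches"]

# Column totals: all bones from each feature type.
col_totals = {f: sum(row[f] for row in counts.values()) for f in features}

# Column percentages, rounded to whole numbers as on the slide.
col_pct = {
    species: {f: round(100 * row[f] / col_totals[f]) for f in features}
    for species, row in counts.items()
}

print(col_totals)  # {'pits': 174, 'ditches': 221}
for species, pct in col_pct.items():
    print(species, pct)
```

Because each value is rounded separately, the pits column actually sums to 101%; rounded percentages need not add up to exactly 100.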

Page 23

Multivariate

These tend to operate on a table, or matrix, of items described in terms of a set of variables

Page 24

Pictorial displays for categorical data

Page 25

bar chart: cattle, sheep, pig, horse (y-axis: %)

Page 26

multiple bar chart: cattle, sheep, pig, horse in pits and ditches (y-axis: %)

Page 27

pie chart

Page 28

Pictorial displays for continuous data

Page 29

histogram: Bd (mm), 49.0 to 72.0, against count, for Hunt's House and Monkton

Page 32

Basic descriptive statistics:
• mode
• median
• mean
• range
• variance
• standard deviation

Page 33

pottery fragments (weights in grams): 2, 2, 3, 5, 8

Page 34

pottery fragments (weights in grams): 2, 2, 3, 5, 8

Mode = 2

Page 35

Mode

The mode is the only way to measure the average (typical value) of Nominal data

If there are two modes, the data are bimodal (e.g. 1, 2, 3, 3, 6, 6, 7, 8, 9 has modes 3 and 6)

Three modes = trimodal, etc.
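Python's standard `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it reports bimodal and trimodal data directly and also works on nominal codes. The numeric lists come from the slides; the list of find-type codes is hypothetical.

```python
from statistics import multimode

weights = [2, 2, 3, 5, 8]          # pottery fragments (g)
print(multimode(weights))          # [2]

bimodal = [1, 2, 3, 3, 6, 6, 7, 8, 9]
print(multimode(bimodal))          # [3, 6]

# The mode also works for nominal data, e.g. keyword codes.
find_types = ["con", "bur", "con", "set", "str", "con"]
print(multimode(find_types))       # ['con']
```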

Page 36

pottery fragments (weights in grams): 2, 2, 3, 5, 8

Mode = 2

Median = 3

Page 37

Median

Best for ordinal and above

If the number of values is even, the median is halfway between the two middle values

(1, 2, 3, 4, 5, 6, 7, 8: median = (4 + 5)/2 = 4.5)
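`statistics.median` follows the rule above, averaging the two middle values when the count is even. For ordinal codes that halfway value may have no meaning; one option, offered here as a suggestion rather than the slides' method, is `statistics.median_low`, which returns the lower of the two middle values. The condition scores are hypothetical.

```python
from statistics import median, median_low

print(median([2, 2, 3, 5, 8]))            # 3 (odd count: the middle value)
print(median([1, 2, 3, 4, 5, 6, 7, 8]))   # 4.5 = (4 + 5) / 2

# Ordinal condition codes (1 = Excellent ... 4 = Poor): a value halfway
# between two ranks is not a category, so take the lower middle value.
condition = [1, 2, 2, 3, 4, 4]
print(median_low(condition))              # 2
```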

Page 38

pottery fragments (weights in grams): 2, 2, 3, 5, 8

Mode = 2

Median = 3

Mean = (2+2+3+5+8)/5 = 4

Page 39

Mean

The most commonly used average; it only works for interval and ratio data

It is the most important measure of position because many further statistical analyses are based on it

Page 40

Conclusion

It is important to understand that the mode, median and mean are three quite different measures of position, which can give three different values when applied to the same data set

          2, 2, 3, 5, 8    2, 2, 3, 5, 6, 8
Mode      2                2
Median    3                4
Mean      4                4.333
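Both columns of the comparison can be reproduced with the standard `statistics` module; the data are the two lists from the slide.

```python
from statistics import mean, median, multimode

for data in ([2, 2, 3, 5, 8], [2, 2, 3, 5, 6, 8]):
    # Mode (all most-frequent values), median and mean of each data set.
    print(data, "mode:", multimode(data), "median:", median(data),
          "mean:", round(mean(data), 3))
```

Adding a single value (6) leaves the mode unchanged but moves both the median and the mean, which shows how differently the three measures respond to the same change.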

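Page 41

The skew

Three distribution shapes: symmetrical, positive skew, negative skew

With a positive skew the long tail lies to the right and the mean is usually pulled above the median; the pottery weights 2, 2, 3, 5, 8 (mode 2, median 3, mean 4) show this pattern.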

Page 42

Measures of variability – the spread

Page 43

pottery fragments (weights in grams): 2, 2, 3, 5, 8

Range = max - min = 8 - 2 = 6

• Very simple and of limited use, since it depends only on the two most extreme values

Page 44

variance

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n}$

key: $x$ = each value; $\bar{x}$ = the mean; $n$ = the number of values; $\sum$ = the sum of

Page 45

pottery fragments (weights in grams): 2, 2, 3, 5, 8

(Mean = (2+2+3+5+8)/5 = 4)

variance ($s^2$)

$s^2 = \dfrac{(2-4)^2 + (2-4)^2 + (3-4)^2 + (5-4)^2 + (8-4)^2}{5} = \dfrac{4 + 4 + 1 + 1 + 16}{5} = \dfrac{26}{5}$

$s^2 = 5.2$

Page 46

variance

standard deviation

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

Page 47

pottery fragments (weights in grams): 2, 2, 3, 5, 8

variance ($s^2$) = 5.2

standard deviation $s = \sqrt{\text{variance}} = \sqrt{5.2} \approx 2.28$
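The range, variance and standard deviation of the pottery weights can be checked step by step. The slides divide by n (the population variance); many textbooks, and Python's `statistics.variance` and `statistics.stdev`, divide by n - 1 for a sample, which gives 26/4 = 6.5 here, so this sketch computes both.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

weights = [2, 2, 3, 5, 8]  # pottery fragments (g)

data_range = max(weights) - min(weights)        # 8 - 2 = 6

# Variance as on the slides: mean squared deviation, dividing by n.
m = sum(weights) / len(weights)                 # 4.0
squared_devs = [(x - m) ** 2 for x in weights]  # 4, 4, 1, 1, 16
var_n = sum(squared_devs) / len(weights)        # 26 / 5 = 5.2
sd_n = math.sqrt(var_n)                         # about 2.28

print(data_range, var_n, round(sd_n, 2))

# The standard library agrees (population versions divide by n).
print(pvariance(weights), round(pstdev(weights), 2))
# Sample versions divide by n - 1 instead: 26 / 4 = 6.5.
print(variance(weights), round(stdev(weights), 2))
```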

Page 48

Summary

Variables are measured according to one of FOUR levels:

1. Nominal = arbitrary name

2. Ordinal = ordered sequence with no fixed distance between values

3. Interval = ordered sequence with a fixed distance between values

4. Ratio = ordered sequence with a fixed distance and a fixed datum (true zero)

Page 49

Summary

Measures of position (average/typical)
• Mode
• Median
• Mean

Measures of variability (spread)
• Range
• Variance
• Standard Deviation