STAT 305: Chapter 1STAT 305: Chapter 1
Part IIPart II
Amin ShiraziAmin Shirazi
Course page:Course page:ashirazist.github.io/stat305.github.ioashirazist.github.io/stat305.github.io
1 / 331 / 33
Why Engineers Study StatisticsWhy Engineers Study Statistics
Chapter 1: Introduction, ContinuedChapter 1: Introduction, Continued
Chapter 2: Data CollectionChapter 2: Data Collection
2 / 332 / 33
Section 1.2Basic Terminology, Continued
3 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresThe most basic way to think about data is to imagine howthe the raw observations could be organized oncecollected.
Collected data can be referred to as a data set. If the dataset is simple enough, we can store it in a data table or flatfile. Traditional data tables store values relating to a singleobservation/unit/individual as a row of the table. Eachcolumn in the table represents a value for some observedcharacterstic observed.
Example: Failure time of lightbulbs
A single brand and model of lightbulb is being examinedfor average failure time. Five bulbs were run until theyburned out and their lifetime was recorded in hours. Thefirst bulb lasted 521.4 hours, the second bulb lasted 501.2hours, the third bulb lasted 541.8 hours, the fourth bulblasted 498.1 hours, and the fifth bulb lasted 528.2 hours.
4 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresExample: Failure time of lightbulbs, continued
Assembling the results in a data table could look like this:
Bulb Number Failure Time (hours)1 521.42 501.23 541.84 498.15 528.2
Each bulb tested gets its own row - which row is attachedto which bulb is identified by the first column. The onlyfeature being observed is failure time - so only one columnof observations are recorded for each bulb.
Notice:
Failure Time is a quantitative continuous variable.This is a univariate data set.
5 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresExample: Type of bill, date of payment, and paymentamount for Mediacom
Customer Type Date AmountJohn Doe Internet 01-05-2015 110.00John Doe Phone 01-15-2015 10.00John Doe Internet 02-05-2015 110.00John Doe Phone 02-15-2015 10.00John Doe Internet 03-05-2015 110.00John Doe Phone 03-15-2015 10.00... ... ... ...John Doe Internet 01-05-2016 110.00John Doe Phone 01-15-2016 10.00Jane Doe Internet 04-12-2015 90.00Jane Doe Internet 05-12-2015 90.00... ... ... ...Jane Doe Internet 01-12-2016 90.00
Notice:
Type of bill is is a Qualitative variable.Amount paid is quantitative discrete. 6 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresExample: Machine Parts
Suppose we get a shipment of 5000 machineparts and would like to verify that the shipmentmeets the standards the machinist agreed to.We take out 100 parts and examine themcarefully. To verify that the parts are as strongas we anticipated, we measure the "Rockwellhardness" with a machine that is accurate tothe first decimal place. We also examine eachpart for scratches and record it weight. Further,we run the part in a test machine to determineif it works correctly.
In this case, we are gathering 4 values on each part. So forinstance, the first of the 100 parts we examine could havea measured Rockwell hardness of 3.2, no scratches, aweight of 1.7562 g, and it works correctly. The second ofthe 100 parts we examine could have a measuredRockwell hardness of 3.1, no scratches, a weight of 1.7901g, and does not work correctly.
7 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresThe data as recorded by the researcher might look like this
Part identifier: 1/100 Rockwell Hardness: 3.2 scratches: no weight (g): 1.7562 functioning: yes
Part identifier: 2/100 Rockwell Hardness: 3.1 scratches: no weight (g): 1.7901 functioning: no
...
Part identifier: 100/100 Rockwell Hardness: 3.4 scratches: no weight (g): 1.7651 functioning: yes
8 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresWhich we could turn into structured data table like this:The data as recorded by the researcher might look like this
part rockwell_hardness weight scratches functioning1 3.2 1.7562 no yes2 3.1 1.7901 no no. . . . .. . . . .. . . . .100 3.4 1.7651 no yes
When data is arranged like this, with each sampling uniton its own row, the data is said to be in wide format.
9 / 33
What andWhy
Terms
DataStructures
Types of Data StructuresHowever, we could also structure a data table like this:
part measurement value1 Rockwell 3.21 weight 1.75621 scratches no1 functioning yes2 Rockwell 3.12 weight 1.79012 scratches no2 functioning no. . .. . .. . .100 functioning yes
When data is arranged like this, with each sampling uniton its own row, the data is said to be in long format.
10 / 33
What andWhy
Terms
DataStructures
Factorial StudiesFactorial Studies involve scenarios in whichseveral process variables are indentified asbeing of interest and data are collected underdifferent settings of these process variables.
We call the process variables factors and thepossible settings for a process variable itslevels
Complete Factorial Studies are factorialstudies where data is collected from eachpossible combination of the levels of the factors(also known as Full Factorial Studies).
Partial(Fractional) Factorial Studies arefactorial studies where data is collected fromsome (but not all) possible combinations of thelevels of the factors.
11 / 33
What andWhy
Terms
DataStructures
Factorial Studies ExampleA pair of chemists, Walter and Jessie, areattempting to synthesize a chemical productand consider purity to be the most importantquality. There are three environments availableto them ( RV, a basement, and a laboratory) andtwo precursors (Chemical compound)(pseudoephedrine/methylamine). They are bothwilling to try all their options in order to get thebest results.
What parts of this synthesis are being treated asvariables which can be controlled at the start of theexperiment?
What are the possible values for each of thesevariables?
How many ways can the variables be combined?
12 / 33
What andWhy
Terms
DataStructures
Factorial Studies Example, cont
Here are all the possible combinations of the factors:
cook environment precursor walter RV psuedoephedrine walter RV methylamine walter basement psuedoephedrine walter basement methylamine walter lab psuedoephedrine walter lab methylamine jessie RV psuedoephedrine jessie RV methylamine jessie basement psuedoephedrine jessie basement methylamine jessie lab psuedoephedrine
(# of Cooks) ⋅ (# of Environments) ⋅ (# of Precursors) = 2 ⋅ 3 ⋅ 2 = 12
13 / 33
What andWhy
Terms
DataStructures
Factorial Studies Example, cont
If we collect data from each of these combinations, wehave performed a A Complete Factorial Study
14 / 33
What andWhy
Terms
DataStructures
Factorial Studies Example, cont
After testing each scenario, Walter and Jessie decide thatthe best combination to use is Walt as cook in the lab withmethylamine. However, a new "chemist" Victor has joinedthe group and is going to try to be the cook and "follow therecipe" in the lab. Jessie also tries a new environment,South America.
If we consider the all the past combinations to be partof this new study, how many combinations of factorlevels are now possible?
15 / 33
What andWhy
Terms
DataStructures
Victor never works in the RV, the basement, or SouthAmerica.
Walter never works in South America.
16 / 33
What andWhy
Terms
DataStructures
Factorial Studies Example, cont
cook env precursor1. walt RV pseudo2. walt RV methylamine3. walt basement pseudo4. walt basement methylamine5. walt lab pseudo6. walt lab methylamine7. jessie RV pseudo8. jessie RV methylamine9. jessie basement pseudo10. jessie basement methylamine11. jessie lab pseudo12. jessie lab methylamine13. jessie so. am. methylamine14. jessie so. am. pseudo15. victor lab methylamine16. victor lab pseudo 17 / 33
What andWhy
Terms
DataStructures
Factorial Studies Example, cont
In this case, we would have a Fractional Factorial Study -a factorial study in which no data is collected for somepossible combinations.
18 / 33
Section 1.3Section 1.3Measurement: It's Importance and Di�cultyMeasurement: It's Importance and Di�culty
19 / 3319 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Or Engineering For That Matter
Success in statistical engineering studies requires theability to measureMethods of measurements are available for somephysical properties (length, mass, temperature, ...)
Often, the behavior of anengineering system canbe adequately characterized in terms of suchproperties
If it cannot, engineers must carefully define what isabout the system that needs observing and then createa suitable method of measurement.
20 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Example:
Two students wanted to conduct a factorial studycomparing joint strengths for combinations of threedifferent woods and three glues.
Didn't know how to have access to strength-testingequipment, so invented their own.Suspend a large container from one of the pieces ofwood involved and poured water into it until theweight was sufficient to break the joint.Knowing the volume of the water poured into thecontainer and the density of the water, they coulddetermine the force required to break the joint.
21 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
22 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Measurement and its importance and di�culty
Validity: appropriately represent the feature ofinterest.
Variation is always present in collecting data.
Some come from the the objects under study asthey are never alike(that might be of interest tosee if the variation is due to the object)Some of it is due to the fact that the measurementprocesses have their own inherent variablity.
23 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Measurement and its importance and di�culty
Precision: the amount of variation in repeatedmeasurement of the same object
A measurement system is called precise if it producessmall variation in repreated measurement of the sameobject
Precision is the internal consistency of a measurementsystem: typically, it can be improved only with basicchanges in the configuration of the system.
24 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Measurement and its importance and di�culty
Precision of a measurement is important, but for manypurposes it alone is not adequate.
Accuracy: or Unbiasedness; how close ameasurement is to the true value "on average".
Accuracy is the agreement of a measuring system withsome external standard. It can be changed withoutextensive physical change in a measurement method. So,we calibrate to improve accuracy.
25 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Measurement and its importance and di�culty
Calibration of a system against the standard (bringingthe measurement system in line with the standard)can be
As simple as comparing the measurement systemto a standardDeveloping an appropriate conversion schemeand then using converted values in place ofrecording observed measurements.
26 / 33
What andWhy
Terms
Measure
Key Words
If You Can't Measure, You Can't DoStatistics
Accuracy VS. Precision
Accuracy: how close a measurement is to the truevalue "on average"Precision: the amount of variation in repeatedmeasurement of the same objectComparing measurement to target shooting
27 / 33
Section 1.4Section 1.4Mathematical ModelsMathematical Models
28 / 3328 / 33
What andWhy
Terms
Measure
MathModels
Mathematical Models and Data AnalysisA discussion on the relationships of mathematics to thephysical words and to engineering statistics.
Mathematical Model: A description of aphysical system using mathematical conceptsand language (in terms of symbols, equations,numbers, and the like)
Identifying mathematical relationships between partsof a system allows us to describe complexity in simpleterms.An effective mathematical model is the one which issimple and has predictive ability.
29 / 33
What andWhy
Terms
Measure
MathModels
Mathematical Models and Data AnalysisExample: Height of an Object in Projectile Motion
We can describe the relationship between height of aprojectile and time as
where
is the initial height, is the initial vertical velocity, and
is the (constant) acceleration due to gravity
h t
h = h0 + vh ⋅ t − gt2, t ≥ 0,1
2
h0vh
g
30 / 33
What andWhy
Terms
Measure
MathModels
Mathematical Models and Data AnalysisExample: Height of an Object in Projectile Motion
31 / 33
What andWhy
Terms
Measure
MathModels
Example: Height of an Object in Projectile Motion, cont.
However, this is not what we see in real life for a varietyof reasons. This model assumes
1. is constant as the ball falls, while actually dependson the distance between the object and earth,
2. is a known to infinite accuracy, while we would beusing a value that is estimated,
3. Gravity is the only force acting on the object, ignoringdrag force, electrical attractions, etc.
4. There are no other changes in the system (forinstance, changes in air pressure)
We can fix these by writing a better relationship or we canaccept that some things won't be known and use astochastic model - a mathematical model that specificallyallows for variation (or "randomness"). Understandinghow these stochastic models work is a major focus of thiscourse.
g g
g
32 / 33
What andWhy
Terms
Measure
MathModels
What's my pointWe cannot say there is no variation in themeasurement or the relation under study is justaffected by the components defined in themathematical model.
We can control some parts of the variation byplanning the data collection process
There is always some error out of control which arestochastic (random)
Statistical methods help to deal with this randomness
33 / 33
Top Related