Statistical Theory of Estimation

Post on 10-Jan-2017


The Phone Keypad Number Puzzle


Presenter
Presentation Notes
The sum of the numbers along any line of the keypad (vertical, horizontal, or diagonal) is always divisible by 3. The key is in the remainders.

1 / 3 => remainder 1

2 / 3 => remainder 2

3 / 3 => remainder 0
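The remainder argument can be checked by brute force. The sketch below assumes the puzzle covers only the 1 to 9 block of the keypad (the 0 key is left out) and that "diagonal" means the two main diagonals; `KEYPAD` and `keypad_lines` are illustrative names, not from the slides.

```python
# Keys 1-9 laid out as on a phone keypad (assumption: the 0 key is excluded).
KEYPAD = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]


def keypad_lines(grid):
    """Return every row, column, and main diagonal of a square grid."""
    n = len(grid)
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(n)] for c in range(n)]
    diagonals = [
        [grid[i][i] for i in range(n)],
        [grid[i][n - 1 - i] for i in range(n)],
    ]
    return rows + cols + diagonals


lines = keypad_lines(KEYPAD)
sums = [sum(line) for line in lines]

for line, total in zip(lines, sums):
    # Remainders of the individual keys, then of the whole line.
    key_remainders = [digit % 3 for digit in line]
    print(line, "key remainders:", key_remainders, "sum:", total, "sum % 3:", total % 3)
```

Each row and diagonal mixes the remainders 1, 2 and 0, while each column repeats one remainder three times, so every line's remainders add up to a multiple of 3.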


Why Does a Circle Have 360 Degrees?

Babylonians

Presenter
Presentation Notes
Why not 100 or 500 or even 720? It could have been the Greeks as well.


What does 360 resemble?


Presenter
Presentation Notes
An Earth year is roughly 365 days, so every day the sun appears to move about 1/365 of the way along a huge circle all the way around the Earth (called the ecliptic). If you lived a few millennia ago and did not have modern instruments to measure this accurately, you would think it was 360. Makes sense?

Babylonian Calendar

Presenter
Presentation Notes
Also, 360 is a lovable number. It is divisible by 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20, 24, 30, 36, 40, 45, 60, 72, 90, 120, 180 and of course 360. That makes 360 a really convenient number, because it means we can divide a circle into 2, 3, 4, 5, 6, 8, 9, 10, 12, and so on equal parts. It makes solving problems by hand (which, mind you, was the only way to solve problems thousands of years ago) much easier.
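The divisor list above is easy to reproduce. This is a minimal sketch; `divisors` is a helper written for illustration, and the other candidates (100, 500, 720) are the alternatives raised earlier in the notes.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


for candidate in (100, 360, 500, 720):
    divs = divisors(candidate)
    print(candidate, "has", len(divs), "divisors:", divs)
```

Counting 1 as well, 360 has 24 divisors, far more than 100 or 500.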

TJ Gokcen @tjgokcen

Estimating the maximum of a discrete uniform distribution from sampling data


German Tank Problem

Presenter
Presentation Notes
During the course of the war, the Western Allies made sustained efforts to determine the extent of German production and approached this in two major ways: conventional intelligence gathering and statistical estimation. In many cases, statistical analysis substantially improved on conventional intelligence. In some cases, conventional intelligence was used in conjunction with statistical methods, as was the case in estimation of Panther tank production just prior to D-Day.
Presenter
Presentation Notes
The Allied command structure had thought the Panzer V (Panther) tanks seen in Italy, with their high-velocity, long-barreled 75 mm/L70 guns, were unusual heavy tanks and would only be seen in northern France in small numbers. But it was important to know how many of these tanks were being produced and then determine where they could be sent. Shortly before D-Day, rumors indicated that large numbers of Panzer V tanks were being used.
Presenter
Presentation Notes
The Allies attempted to estimate the number of tanks being produced. To do this, they used the serial numbers on captured or destroyed tanks. The principal numbers used were gearbox numbers, as these fell in two unbroken sequences. Chassis and engine numbers were also used, though their use was more complicated. Various other components were used to cross-check the analysis. Similar analyses were done on tires, which were observed to be sequentially numbered.
Presenter
Presentation Notes
Analysis of wheels from two tanks (32 road wheels each, 64 road wheels total) yielded an estimate of 270 tanks produced in February 1944, substantially more than had previously been suspected. German records after the war showed production for the month of February 1944 was 276. The statistical approach proved to be far more accurate than conventional intelligence methods, and the phrase "German tank problem" became accepted as a descriptor for this type of statistical analysis. And how successful were they in estimating?
Presenter
Presentation Notes
As can be seen, they were very close to the real numbers. So how did they do it?

Estimators

A rule telling you how to calculate a special type of statistic: one that tells you not only about the properties of a sample of data, but also about the properties of the entire population from which the sample was drawn.

Population Maximum

an estimator rule that will help us estimate the value of the largest integer in the bag using only the values in the sample

Presenter
Presentation Notes
Population vs Sample data

A bag of tiles with numbers on them

10, 23, 17, 9, 35, 3

Keep in mind we have 42 tiles in the bag

How do we come up with population maximum?

- Twice the biggest integer = 2 x 35 = 70

- Twice the mean value = about 16 x 2 = 32

- Twice the median value

- Put all the numbers in numerical order: 3, 9, 10, 17, 23, 35

Presenter
Presentation Notes
- To find the median value, take the number in the middle, or take the two numbers in the middle and use their mean

- Median is (10 + 17) / 2 = 13.5, and 13.5 x 2 = 27

Presenter
Presentation Notes
All of the values are way off. So how do we calculate this?
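The three naive guesses can be recomputed in a few lines. This is a minimal sketch using the sample from the slides and the bag size of 42 stated earlier; the dictionary keys are just descriptive labels.

```python
from statistics import mean, median

sample = [10, 23, 17, 9, 35, 3]
true_max = 42  # number of tiles in the bag, as stated on the slide

naive_estimates = {
    "twice the biggest integer": 2 * max(sample),
    "twice the mean": 2 * mean(sample),
    "twice the median": 2 * median(sample),
}

for name, estimate in naive_estimates.items():
    print(f"{name}: {estimate:.1f} (off by {estimate - true_max:+.1f})")
```

Doubling the maximum overshoots badly, while doubling the mean or the median undershoots; both of the latter even land below 35, a value already seen in the sample.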

$$\text{pop max} = \text{sample max} + \frac{\text{sample max}}{\text{sample size}} - 1$$

Presenter
Presentation Notes
Right, so let’s plug in our numbers

Our numbers: 10, 23, 17, 9, 35, 3

$$\text{pop max} = \text{sample max} + \frac{\text{sample max}}{\text{sample size}} - 1 = 35 + \frac{35}{6} - 1 \approx 40$$

Presenter
Presentation Notes
The population maximum is estimated to be equal to the sample maximum… plus a little bit more. And that little bit more is basically equal to the average gap between the numbers in the sample. Of course, the real formula is a bit more complicated than this.
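As a rough sketch, the rule "sample maximum plus the average gap" can be written as a small function. The names `estimate_population_max` and `average_gap` are made up for this example, and the gap is computed on the assumption that the tiles are numbered 1, 2, 3, ... with no repeats.

```python
def average_gap(sample):
    """Average number of unseen values sitting below each sampled value.

    With tiles numbered from 1, there are (max - size) unseen numbers below
    the sample maximum, spread over `size` gaps.
    """
    return (max(sample) - len(sample)) / len(sample)


def estimate_population_max(sample):
    """pop max = sample max + sample max / sample size - 1."""
    if not sample:
        raise ValueError("sample must contain at least one value")
    sample_max = max(sample)
    sample_size = len(sample)
    return sample_max + sample_max / sample_size - 1


sample = [10, 23, 17, 9, 35, 3]
estimate = estimate_population_max(sample)
print(round(estimate, 2))            # 39.83
print(round(average_gap(sample), 2))  # 4.83
```

The two functions agree by construction: 35 / 6 - 1 is the same quantity as (35 - 6) / 6, so the estimate is the sample maximum of 35 plus an average gap of about 4.83.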
Presenter
Presentation Notes
Frequentist approach to estimating the number of tanks
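The slide's figure is not part of this transcript. One way to illustrate the frequentist view is to check by simulation that the rule "sample max + sample max / sample size - 1" is right on average. The sketch below assumes tiles numbered 1 to 42 drawn without replacement, six at a time; the function names, the seed, and the trial count are arbitrary choices for the example.

```python
import random


def estimate_population_max(sample):
    """Sample max plus the average gap: m + m / k - 1."""
    return max(sample) + max(sample) / len(sample) - 1


def average_estimate(population_max, sample_size, trials, seed=0):
    """Average the estimator over many random draws without replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tiles = range(1, population_max + 1)
    total = 0.0
    for _ in range(trials):
        total += estimate_population_max(rng.sample(tiles, sample_size))
    return total / trials


avg = average_estimate(population_max=42, sample_size=6, trials=20_000)
print(f"average estimate over 20,000 draws: {avg:.2f}")
```

Any single draw can miss by several tiles, but the long-run average should sit very close to the true maximum of 42, which is what "unbiased" means here.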
Presenter
Presentation Notes
Bayesian analysis to estimate the number of tanks
Presenter
Presentation Notes
Final formula (or thereabouts)

• Number of bugs
• Number of user stories
• Number of user story points
• Team Capacity

• It’s abused
• Never taken as an estimate, always as the final number
• Used to stress out developers
• Scope Creep

• What is the aim of the project?
• What do we expect to get out of it?
• Where does the project fit within the organization?
• What other areas does it impact?

• What is my team’s capacity?
• Do we need to hire more people or outsource?
• Launch a start up for this project?
• Where does marketing come in?

• For iterations etc. estimating is sufficient, because you will be making granular decisions

• Otherwise budgeting, especially with a lack of granularity, is a better fit

Presenter
Presentation Notes
For more accurate estimates, you need more granularity. The more granular it gets, the more waterfall it gets.

• Budget using a top-down approach
• Let’s say we’re building an online bookstore

• Shopping Cart
• Browse Books
• Search Books
• Manage Inventory
• Preview Inside of Book

Presenter
Presentation Notes
Do we have enough information to answer “How much is this going to cost?” Probably not. We need to get more granular; this is too high level. Let’s break down Search Books into details like By Author, By Title, etc. If we have some experience building such a component, and a balanced team should, then we can come up with some timelines. Let’s take a look at them.
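A top-down breakdown like this can be rolled back up into a single range. The sketch below is only illustrative: the week figures are hypothetical placeholders, not numbers from the talk, and `roll_up` is a made-up helper name.

```python
# Hypothetical (low, high) effort ranges in weeks. Every number here is a
# placeholder for illustration; the talk does not give actual figures.
search_books = {
    "By Author": (1, 2),
    "By Title": (1, 2),
}


def roll_up(items):
    """Sum per-item (low, high) ranges into one overall (low, high) range."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


low_weeks, high_weeks = roll_up(search_books)
print(f"Search Books: {low_weeks} to {high_weeks} weeks")
```

Keeping each item as a range rather than a single number preserves the uncertainty all the way up to the feature level.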

• Should we build this software?

• Do we have enough info to answer: Should we make this software?

Presenter
Presentation Notes
If our budget is $500k, then we do have enough information. The answer is no, we can’t afford it. If our budget is $5M, then we do have enough information. The answer is yes, we can afford it. If our budget is $2.5M, then we do not have enough information.
Presenter
Presentation Notes
If your budget is somewhere in between, then more information is needed. Prioritize topics: Required vs Nice to Have. Then we get the confidence levels. http://www.stridenyc.com/ballpark
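The three budget cases can be sketched as a comparison between a budget and a ballpark cost range. The range below ($1M to $4M) is an assumption chosen only so that the three budgets from the notes fall into the three outcomes; the notes do not state the actual range, and `can_we_afford_it` is a made-up name.

```python
def can_we_afford_it(budget, low_estimate, high_estimate):
    """Compare a budget against a ballpark cost range.

    Returns "no" when even the low end exceeds the budget, "yes" when the
    budget covers the high end, and "need more information" in between.
    """
    if low_estimate > high_estimate:
        raise ValueError("low_estimate must not exceed high_estimate")
    if budget < low_estimate:
        return "no"
    if budget >= high_estimate:
        return "yes"
    return "need more information"


# Assumed ballpark range (hypothetical; not given in the source).
LOW, HIGH = 1_000_000, 4_000_000

for budget in (500_000, 5_000_000, 2_500_000):
    print(f"${budget:,}: {can_we_afford_it(budget, LOW, HIGH)}")
```

Only the middle case justifies spending more time on estimation; in the other two, the coarse range already answers the question.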
Presenter
Presentation Notes
You are essentially building your MVP and spending as little time on budgeting and estimating as possible.