Smqa unit iii

UNIT - III

GOOD ESTIMATES

• Process Prediction

• It guides our decision making
– before the development begins,
– through the development process,
– during the transition of the product to the customer,
– and while the software is being maintained.

Mr. M. E. Patil S.S.B.T COET, Bambhori

• A prediction is useful only if it is reasonably accurate.

• Predictions are not expected to be exact, but they should be close enough to the eventual actual numbers that we can make sound judgments.


What is an estimate ?

• A prediction is a range or window, rather than a single number.

• An estimate is not a target; it is a probabilistic assessment, so the value produced as an estimate is really the center of a range.


• From the graph we can compute the probability that any project based on the given requirements will be completed within a time interval [t1, t2].

• The probability is simply the area under the curve between t1 and t2.


• Formally, the estimate is defined as the median of the (unknown) distribution.

• To indicate the window, an estimate should be presented as a triple:
– the most likely value (i.e. the median of the distribution),
– plus the lower and upper bounds of the value.
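As a rough illustration of presenting an estimate as a triple, the sketch below uses durations of comparable past projects as a stand-in for the unknown distribution. The data, the 10th/90th percentile bounds, and the function names are all assumptions made for this example, not part of the slides.

```python
from statistics import median, quantiles

def estimate_triple(durations, lower_pct=10, upper_pct=90):
    """Return (lower bound, median, upper bound) from observed durations."""
    # quantiles(n=100) returns the 99 percentile cut points.
    cuts = quantiles(durations, n=100, method="inclusive")
    return cuts[lower_pct - 1], median(durations), cuts[upper_pct - 1]

def probability_between(durations, t1, t2):
    """Empirical stand-in for the area under the curve between t1 and t2."""
    inside = sum(1 for d in durations if t1 <= d <= t2)
    return inside / len(durations)

# Hypothetical completion times (months) of comparable past projects.
past_durations = [9, 10, 11, 11, 12, 12, 12, 13, 14, 15, 17, 20]

low, mid, high = estimate_triple(past_durations)
print(f"estimate: {mid} months (window {low:.1f} to {high:.1f})")
print(f"P(10 <= t <= 14) = {probability_between(past_durations, 10, 14):.2f}")
```

With real data the window would normally come from a fitted distribution. The empirical fraction is used here only to keep the example self-contained.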


Evaluating the estimate accuracy

• Suppose E is the estimated value and A is the actual value.

• The relative error in the estimate is RE = (A - E)/A.

• If the estimate is greater than the actual value, the relative error is negative.

• If the estimate is less than the actual value, the relative error is positive.


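Mean relative error for n projects

$$\overline{RE} = \frac{1}{n} \sum_{i=1}^{n} RE_i$$


Mean magnitude of relative error is

$$MMRE = \frac{1}{n} \sum_{i=1}^{n} \left| RE_i \right|$$

If this is small, our estimates are good.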
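The accuracy measures can be sketched in a few lines of Python. The project figures are invented for illustration. The mean relative error lets over- and under-estimates cancel, whereas the mean magnitude of relative error does not.

```python
def relative_error(actual, estimate):
    """RE = (A - E) / A; negative when the estimate exceeds the actual."""
    return (actual - estimate) / actual

def mean_relative_error(pairs):
    """Average RE over (actual, estimate) pairs; signs may cancel."""
    return sum(relative_error(a, e) for a, e in pairs) / len(pairs)

def mmre(pairs):
    """Mean magnitude of relative error over (actual, estimate) pairs."""
    return sum(abs(relative_error(a, e)) for a, e in pairs) / len(pairs)

# Hypothetical (actual, estimated) effort in person-months.
projects = [(100, 80), (50, 60), (200, 200), (40, 50)]

print("mean RE:", mean_relative_error(projects))
print("MMRE:   ", mmre(projects))
```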


Cost Estimation problems and approaches

• Novelty

• Politics

• Technology change

• Price to win


Current approaches to cost estimation

• Expert opinion

• Analogy

• Decomposition

• Models


Bottom-up and top-down estimation

• Bottom-up estimation begins with the lowest-level parts of the product or tasks and provides an estimate for each; these are then combined into an overall estimate.

• Top-down estimation begins with the overall process or product. A full estimate is made first, and the estimates for the components are then calculated from it.
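The two directions can be contrasted with a small sketch. The component names, effort figures, and percentage shares are hypothetical and only show the mechanics.

```python
def bottom_up(component_estimates):
    """Sum the estimates of the lowest-level parts into an overall estimate."""
    return sum(component_estimates.values())

def top_down(total_estimate, shares):
    """Split one overall estimate across components by assumed shares."""
    return {name: total_estimate * share for name, share in shares.items()}

# Hypothetical component estimates in person-months.
components = {"ui": 4.0, "database": 6.0, "reports": 2.0}
print("bottom-up total:", bottom_up(components))

# Hypothetical overall estimate and the share given to each component.
shares = {"ui": 0.3, "database": 0.5, "reports": 0.2}
print("top-down split: ", top_down(15.0, shares))
```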


Models for Effort and Cost.

• By deploying the models in the process, estimators can examine how accurate each model is, so that they can fine-tune it to improve accuracy for future projects.


Two types of models for estimating effort

• Cost Models:- Provide direct estimates of effort or duration. Most cost models are based on empirical data reflecting factors that contribute to overall cost.

• These models have one primary input (usually a measure of product size) and a number of secondary adjustment factors (cost drivers).

• Cost Drivers:- Are characteristics of the project, process, products or resources that are expected to influence effort or duration in some way.


• Constraint Models:- Demonstrate the relationship over time between two or more parameters of effort, duration or staffing level.

• The Rayleigh Curve is used as a constraint model in several commercial products.


Regression-based Models

• By collecting data from previous projects and examining the relationships among the attribute measures captured, software engineers hypothesized that some factors could be related by an equation.


• The regression is performed using the logarithm of project effort (measured in person-months) on the Y axis and the logarithm of project size (measured in thousands of lines of code) on the X axis.

• Transforming the linear equation
– $\log E = \log a + b \log S$

from the log-log domain to the real domain yields an exponential relationship:
– $E = a S^{b}$
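A minimal sketch of the regression step, assuming the model form E = a * S^b and ordinary least squares on the logarithms. The data points are synthetic (generated from a = 3.0, b = 1.1), so the fit recovers those values exactly; real project data would leave residual error.

```python
import math

def fit_power_law(sizes, efforts):
    """Least-squares fit of log E = log a + b log S; returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                       # slope in the log-log domain
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back to real domain
    return a, b

# Synthetic history: size in KLOC, effort in person-months.
sizes = [5, 10, 20, 50, 100]
efforts = [3.0 * s ** 1.1 for s in sizes]

a, b = fit_power_law(sizes, efforts)
print(f"E = {a:.2f} * S^{b:.2f}")
```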


• If size were a perfect predictor of effort then every point of the graph would lie on the line of the equation, with residual error of 0.

• In reality there is usually significant residual error.

• The next step is to identify the factors that cause the variation between predicted and actual effort.

• A factor analysis can help to identify these additional parameters.


• The adjusted model is $E = a S^{b} F$, where F is the effort adjustment factor, computed as the product of the cost driver values.

• This computation of F is valid only when the individual adjustment factors are independent.
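The adjustment factor can be sketched as a simple product, assuming the adjusted model has the form E = a * S^b * F. The cost driver names and multipliers below are hypothetical, and multiplying them is only justified if the drivers are independent.

```python
import math

def adjusted_effort(a, b, size_kloc, cost_drivers):
    """E = a * S**b * F, with F the product of the cost driver values."""
    f = math.prod(cost_drivers.values())  # F = 1 when there are no drivers
    return a * size_kloc ** b * f

# Hypothetical cost driver multipliers (values above 1 increase effort).
drivers = {"required_reliability": 1.15, "team_experience": 0.90}

print("nominal effort: ", adjusted_effort(3.0, 1.12, 10, {}))
print("adjusted effort:", adjusted_effort(3.0, 1.12, 10, drivers))
```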


COCOMO

• Constructive Cost Model

• Original COCOMO :- Effort

• It is a collection of three models:

• 1. Basic model:- can be applied when little about the project is known.

• 2. Intermediate model:- is used after requirements are specified.

• 3. Advanced model:- is used when design is complete.


• All three take the same form, $E = a S^{b} F$, where

• E is the effort in person-months,

• S is the size measured in thousands of delivered source instructions (KDSI), and

• F is the adjustment factor, equal to 1 for the basic model.


• The values of a and b, listed in the table, depend on the development mode, which is determined by the type of software under construction.


• Organic system:- involves data processing.

• Embedded system:- contains real-time software that is an integral part of a larger, hardware-based system.

• Semi-detached system:- is somewhere in between organic and embedded.


Original COCOMO :- Duration
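The coefficient table and the duration equation are not reproduced in this transcript. The sketch below uses the values commonly quoted for Boehm's Basic COCOMO (effort E = a * KDSI^b with F = 1, duration D = c * E^d); treat them as an assumption to be checked against the table in the original slides.

```python
# Commonly quoted Basic COCOMO coefficients per development mode:
# (a, b) for effort in person-months, (c, d) for duration in months.
BASIC_COCOMO = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kdsi, mode):
    """Return (effort in person-months, duration in months)."""
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kdsi ** b        # adjustment factor F is 1 in the basic model
    duration = c * effort ** d
    return effort, duration

for mode in BASIC_COCOMO:
    effort, duration = basic_cocomo(32, mode)
    print(f"{mode:14s} effort={effort:7.1f} pm  duration={duration:5.1f} months")
```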


COCOMO 2.0

• In stage 1, when the project is using prototypes to resolve high-risk issues such as user interfaces and software and system interaction, COCOMO 2.0 estimates size in object points.

• At stage 2, the designers consider whether to use different architectures and concepts of operation. There is still not sufficient information to support fine-grained effort and duration estimates. For stage 2, COCOMO 2.0 uses function points as a size measure.


• By stage 3, development has started and far more information is available. This stage matches the original COCOMO model, in that sizing can be done in lines of code and many factors can be estimated with a certain degree of comfort. COCOMO 2.0 also uses models of reuse and incorporates maintenance, breakage and more.


Putnam’s SLIM model:-

• In 1978 the US Army required a method to estimate total effort and delivery time for massive projects.

• Putnam produced the SLIM model, to be used on projects containing in excess of 70,000 lines of code.

• The SLIM equation can be altered for smaller projects.


• Putnam’s model assumes that the effort for software development is distributed like a group of Rayleigh curves, one for each major development activity.

• Putnam used some empirical observations about productivity levels to produce his software equation from the basic Rayleigh curve formula.


• This equation relates size (lines of code) to several variables:
– a technology factor, C,
– total project effort measured in person-years, K,
– and elapsed time to delivery, td, measured in years.


• The relationship is expressed as $S = C K^{1/3} t_d^{4/3}$.

• In theory, td is the point where the Rayleigh curve reaches a maximum. In practice, the technology factor can take on up to 20 values, and the equation cannot be used unless size can be estimated, the technology factor agreed on, and either K or td held constant.


• As the SLIM software equation includes a fourth power, it has strong implications for resource allocation on large projects.
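To see where the fourth power comes from, the software equation can be solved for effort. This worked step assumes the usual form of Putnam's equation, $S = C K^{1/3} t_d^{4/3}$, with S the size, C the technology factor, K the effort and $t_d$ the delivery time.

```latex
\begin{align*}
S &= C\,K^{1/3}\,t_d^{4/3}
  &&\Longrightarrow&
K &= \frac{(S/C)^{3}}{t_d^{4}} \\[4pt]
\frac{K_{\text{compressed}}}{K}
  &= \left(\frac{t_d}{0.9\,t_d}\right)^{4}
   = \frac{1}{0.9^{4}} \approx 1.52
\end{align*}
```

Under that assumption, for fixed size and technology factor, shortening the schedule by 10% raises the predicted effort by roughly 52%.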


• D0, a constant called the manpower acceleration, is defined by $D_0 = K / t_d^{3}$ and takes a value depending upon the type of project.

• Ex: 12.3 for systems interacting with others

• 15 for stand-alone systems

• 27 for re-implementations of existing systems

• Using these two equations we can find effort or duration.
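A minimal sketch of solving the two SLIM relations for effort and duration. It assumes the standard forms S = C * K^(1/3) * td^(4/3) and D0 = K / td^3, which combine to K = (S/C)^(9/7) * D0^(4/7). The size and the technology factor value are hypothetical.

```python
def slim_effort_and_duration(size_loc, tech_factor, d0):
    """Solve the software equation and the manpower acceleration equation.

    Returns (K in person-years, td in years).
    """
    k = (size_loc / tech_factor) ** (9 / 7) * d0 ** (4 / 7)
    td = (k / d0) ** (1 / 3)
    return k, td

# Hypothetical project: 100,000 LOC, technology factor 5000,
# stand-alone system (D0 = 15).
size, c, d0 = 100_000, 5000, 15
k, td = slim_effort_and_duration(size, c, d0)
print(f"effort K = {k:.1f} person-years, delivery time td = {td:.2f} years")
```

Substituting the results back into both equations is a quick way to check the algebra.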


• Combining the two equations, we can derive $K = (S/C)^{9/7} D_0^{4/7}$.

• SLIM uses different Rayleigh curves for
– design and code,
– test and validation,
– maintenance,
– management.


Problems with existing modeling methods

• Model Structure
– Most researchers and practitioners agree that product size is the key to establishing the effort needed to create the product.
– But the exact relationship between size and effort is not known.
– Most of the models suggest that effort is approximately proportional to size, but they include an adjustment for diseconomy of scale,
– so that larger projects are less productive.


• Overly Complex Models
– An organization’s specific features can affect its productivity.
– Many models incorporate adjustment factors, like COCOMO’s cost drivers and SLIM’s technology factor, to provide the flexibility needed to capture these differences.


This generalized approach has fallen short of its promises

• Using the COCOMO cost drivers does not always improve the estimation accuracy.

• It is not simple to obtain an accurate estimate of the technology factor, and SLIM estimates are very sensitive to the technology factor.

• Cost drivers are not independent but are treated as such.


• Cost driver values are normally based on subjective assessments of the influence of some factor on overall project effort.

• Current models include various adjustment factors, enabling the estimator to cope with many kinds of project. However, the projects undertaken by a single group are normally very similar, so only a few factors need to be considered.


Product size estimation:-

• Most models need an estimation of the size of the product.

• This variable is not measurable early in the life cycle.

• Models like COCOMO and SLIM need size in lines of code (LOC), but LOC cannot be measured from the requirements or from an invitation to bid for the project.

• Estimating LOC is often difficult.


Dealing with the Problems of Current Estimation Methods:-

• Local data definition:-
– The first and most critical step in improving cost estimation in a particular environment is to use size and effort measures that are defined consistently across the environment.
– The measures must be understood by all who supply or use them, and two people measuring the same product should produce essentially the same number or rating.


Calibration:-

• Calibration significantly improves the accuracy of all models.

• The calibration process has two parts:
– ensuring that the values supplied to the model are consistent with the model's needs and expectations;
– readjusting the model coefficients using data from past projects, so that the model matches the basic productivity found in the new environment.
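One simple way to readjust a coefficient is sketched below, assuming a model of the form E = a * S^b in which the exponent b is kept and only a is re-estimated from local data. The function name and the past-project figures are hypothetical; real calibration procedures vary by model.

```python
import math

def calibrate_a(past_projects, b):
    """Re-estimate a in E = a * S**b from (size, effort) pairs, keeping b fixed.

    Uses the least-squares solution in the log domain:
    log a = mean(log E - b * log S).
    """
    logs = [math.log(effort) - b * math.log(size)
            for size, effort in past_projects]
    return math.exp(sum(logs) / len(logs))

# Hypothetical local history: (size in KLOC, effort in person-months).
history = [(8, 19.0), (15, 33.0), (40, 101.0)]

local_a = calibrate_a(history, b=1.05)
print(f"locally calibrated a = {local_a:.2f}")
```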


Independent Estimation Group:-

• DeMarco states that it is useful to assign estimating responsibility to a particular group of people.

• This group offers estimates to all projects, storing the outcomes of all data capture and analysis in a historical database


Reduce input subjectivity:-

• Subjectivity of ratings and early estimates of code size can add to the inaccuracy of estimates.

• There is interest in different size measurements that match better the likely size of the final product.


Preliminary estimates and re-estimates:-

• Early estimates involve using incomplete information.

• Estimation is likely to improve by performing two steps: basing preliminary estimates on measures of available products, and re-estimating as more information becomes available.

• It is likely that the first estimates are founded on expert opinion and analogy. There are various approaches available to improve the outcomes of using expert opinion and analogy.


Group estimation:-

• One simple approach is to gain the opinion of a group of experts instead of an individual

• This allows the views of individuals with diverse project experience to be incorporated in the estimation.


• Putnam and Fitzsimmons suggest that a group estimate should not be just a simple average of the individual estimates.
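The weighting proposed on this slide is not reproduced in the transcript. Purely as an illustration of an alternative to the simple average, the sketch below applies the widely used three-point weighting (lowest + 4 × most likely + highest) / 6 to a group's estimates. Using the median as the most likely value is an assumption of this example, and the result may differ from the formula on the original slide.

```python
from statistics import mean, median

def weighted_group_estimate(individual_estimates):
    """Three-point weighting of a group's estimates (illustrative only)."""
    low = min(individual_estimates)
    high = max(individual_estimates)
    likely = median(individual_estimates)  # assumed proxy for 'most likely'
    return (low + 4 * likely + high) / 6

# Hypothetical effort estimates (person-months) from four experts.
estimates = [10, 12, 14, 30]

print("simple average:   ", mean(estimates))
print("weighted estimate:", weighted_group_estimate(estimates))
```

The weighted form is less sensitive to a single extreme estimate than the simple average.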


• Estimation by Analogy

• Re- Estimation

• Alternative size measures for cost estimation


Locally developed cost models

• Decompose cost elements

• Formulate cost theory

• Collect data

• Analyze data and evaluate model

• Check model


Basics of Reliability Theory

• The theory of software reliability has its roots in the more general theory of systems and hardware reliability.

• The basic problem of reliability theory is to predict when a system will eventually fail.

• With hardware reliability it is typically component failures due to physical wear that we are interested in.

• Such failures are probabilistic in nature, in that we do not know exactly when a component will fail, but we know that it will fail.


• So we can attach a probability to the likelihood that the component will fail at a certain time.

• We can use the same idea with software and produce a basic model of component reliability: a probability density function (pdf) f of t (written f(t)) that describes our uncertainty about when the component will fail.


• Suppose it is known that a component has a maximum life of 10 hours.

• It will fail within 10 hours of use.

• Suppose it is just as likely to fail in the first 2 minutes of use as in the last 2 minutes of the 10 hours.


The pdf f(t) for this behavior is shown in the figure


• The function f(t) is defined to be 1/10 for any t between 0 and 10, and 0 for any t > 10.

• In general, for any x we can define the uniform pdf over the interval [0, x] to be 1/x for any t in [0, x] and 0 elsewhere.
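The uniform pdf is easy to check numerically. The sketch below uses the 10-hour example from the slides and confirms that equal-length intervals carry equal failure probability; the function names are chosen for this example only.

```python
def uniform_pdf(t, x):
    """Uniform pdf on [0, x]: 1/x inside the interval, 0 elsewhere."""
    return 1.0 / x if 0 <= t <= x else 0.0

def uniform_failure_probability(t1, t2, x):
    """P(t1 <= T <= t2) for T uniform on [0, x]: area under the flat pdf."""
    lo = max(0.0, min(t1, x))
    hi = max(0.0, min(t2, x))
    return max(0.0, hi - lo) / x

max_life = 10.0          # hours
two_minutes = 2 / 60     # hours

first = uniform_failure_probability(0.0, two_minutes, max_life)
last = uniform_failure_probability(max_life - two_minutes, max_life, max_life)
print("P(fail in first 2 min):", first)
print("P(fail in last 2 min): ", last)
```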


Uniform pdf

• The uniform distribution has various limitations for reliability modeling.

• It only applies when the failure time is bounded.

• In many situations no such bound exists, and we need a pdf that reflects that there may be an arbitrarily long time to failure.


The figure below illustrates an unbounded pdf that reflects the idea that failures occur purely randomly in time. The function is expressed as an exponential function, $f(t) = \lambda e^{-\lambda t}$.


• We want to know how long a component will behave correctly before it fails.

• We want to know the probability of failure from time 0 to a given time t.

• The distribution function F(t) is the probability of failure between 0 and t, stated as $F(t) = \int_{0}^{t} f(s)\,ds$.


• We say that a component survives until it fails the first time, so we can think of survival as the opposite concept to failure.

• Thus, we define the reliability function R(t) as:

• R(t) = 1 - F(t)

• This function gives the probability that the component will function properly up to time t.
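For a concrete case, the sketch below assumes the exponential pdf f(t) = λ·e^(-λt), for which the failure distribution has the closed form F(t) = 1 - e^(-λt). The rate value is hypothetical.

```python
import math

def exp_pdf(t, lam):
    """Exponential pdf with failure rate lam."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def failure_distribution(t, lam):
    """F(t): probability of failure between 0 and t."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

def reliability(t, lam):
    """R(t) = 1 - F(t): probability of surviving up to time t."""
    return 1.0 - failure_distribution(t, lam)

lam = 0.01  # hypothetical failure rate: one failure per 100 hours on average
for t in (10, 100, 500):
    print(f"t={t:4d} h  F(t)={failure_distribution(t, lam):.3f}  "
          f"R(t)={reliability(t, lam):.3f}")
```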


• The mean time to failure (MTTF) is the mean of the probability density function. We can calculate the mean of the pdf f(t) as $E(T) = \int t\, f(t)\,dt$.

• The median time to failure is the point in time t at which the probability of failure after t is the same as the probability of failure before t.
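As a worked example, the mean and the median can be computed for the two pdfs discussed here, assuming the exponential pdf $f(t) = \lambda e^{-\lambda t}$ and the uniform pdf on $[0, 10]$ hours.

```latex
\begin{align*}
\text{Exponential:}\quad
\mathrm{MTTF} &= \int_{0}^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda},
&
F(m) = 1 - e^{-\lambda m} = \tfrac{1}{2}
  &\;\Longrightarrow\; m = \frac{\ln 2}{\lambda} \\[6pt]
\text{Uniform on } [0,10]:\quad
\mathrm{MTTF} &= \int_{0}^{10} \frac{t}{10}\,dt = 5,
&
F(m) = \frac{m}{10} = \tfrac{1}{2}
  &\;\Longrightarrow\; m = 5
\end{align*}
```

For the exponential case the median is smaller than the mean, because the long right tail pulls the mean upwards.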


Reliability problem for scenario of attempting to fix failures after each occurrence

• For each i, there is a new random variable $T_i$ that represents the time of the i-th failure.

• Each $T_i$ has its own probability density function $f_i$.

• Here we would anticipate the probability density function $f_{i+1}$ to be different from the probability density function $f_i$.

• We would expect the mean of $f_{i+1}$ to be greater than the mean of $f_i$, as newer components are less likely to fail than older ones.

• In such a situation we have reliability growth: successive observed failure times tend to grow.


• A system runs successfully for a time and then it fails.

• Once the failure occurs there is a need to repair the fault.

• It is therefore useful to know the mean time to repair (MTTR) for the component that has failed.

• Combining this time, during which the system is unavailable to users, with the mean time to failure (MTTF) gives the mean time between failures (MTBF):

• MTBF = MTTF + MTTR

• Availability is the probability that a component is operating at a given point in time. Pressman defines availability as $\text{Availability} = \dfrac{MTTF}{MTTF + MTTR} \times 100\%$.
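The two measures can be computed directly. This sketch assumes the usual definitions MTBF = MTTF + MTTR and availability = MTTF / (MTTF + MTTR), returned as a fraction rather than a percentage. The hour values are hypothetical.

```python
def mtbf(mttf, mttr):
    """Mean time between failures: running time plus repair time."""
    return mttf + mttr

def availability(mttf, mttr):
    """Fraction of time the component is operating."""
    return mttf / (mttf + mttr)

# Hypothetical component: fails every 98 hours on average, 2 hours to repair.
print("MTBF:        ", mtbf(98, 2), "hours")
print("Availability:", f"{availability(98, 2):.1%}")
```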


The software reliability problem

• There are many reasons for software to fail, but none involves wear and tear.

• Usually software fails due to a design problem.

• Other failures occur when the code is written or when changes are introduced to a working system.

• These changes arise from a new design, changed requirements, a revised design, or corrections to an existing problem.

• These faults do not cause a failure immediately.

• The failures are triggered only by certain states and inputs.


• When changes are implemented without introducing new faults, fixing a problem increases the overall reliability of the system.

• When hardware fails, the problem is fixed by replacing the failed components with new or repaired ones.

• And the system is restored to its previous reliability.

• Rather than growing, the reliability is just maintained.


• The difference between hardware and software reliability is the difference between intellectual failure (due to design faults) and physical failure.

• Assumptions about software reliability
– The software is operating in a real or simulated environment.
– When software failures occur, attempts are made to find and fix the faults that caused them.

• We are not computing an exact time for the next failure;

• we are using the past history to help us make a prediction of the failure time.

• All attempts to measure reliability, however expressed, are examples of prediction.


Parametric Reliability Growth Models

• We are modeling the reliability of a program that operates in a real or simulated user environment, and faults are fixed after failures occur.

• We make two further assumptions about our program:
– Executing the program involves selecting inputs from some space I (the totality of all possible inputs).
– The program transforms the inputs into outputs (comprising a space O).


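The Jelinski-Moranda model

The J-M model is the best known of the reliability growth models.

It assumes that, for each i, the time to the i-th failure is exponentially distributed: $F_i(t_i) = 1 - e^{-\lambda_i t_i}$, with $\lambda_i = (N - i + 1)\Phi$.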


• Here N is the initial number of faults, and Φ is the contribution of each fault to the overall failure rate.

• Thus the underlying model is the exponential model, so that the type 1 uncertainty is random and exponential.

• There is no type 2 uncertainty in this model: it assumes that fault detection and correction begin when a program contains N faults and that fixes are perfect (in that they correct the fault causing the failure and introduce no new faults).

• The model also assumes that all faults have the same rate.


• Since the hazard rate for the exponential distribution is λ, it follows that the graph of the JM hazard rate looks like a step function.

• Between the (i-1)-th and i-th failures, the hazard rate is (N - i + 1)Φ.

• The inference procedure for JM is maximum likelihood estimation.
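The step-shaped hazard rate can be tabulated directly from the expression (N - i + 1)Φ. The values of N and Φ below are hypothetical; in practice they would be estimated from observed failure data by maximum likelihood, which this sketch does not attempt.

```python
def jm_hazard_rate(i, n_faults, phi):
    """Hazard rate between the (i-1)-th and i-th failures: (N - i + 1) * phi."""
    if not 1 <= i <= n_faults:
        raise ValueError("i must be between 1 and the initial number of faults")
    return (n_faults - i + 1) * phi

def jm_expected_times(n_faults, phi):
    """Expected time to each successive failure (mean of the exponential)."""
    return [1.0 / jm_hazard_rate(i, n_faults, phi)
            for i in range(1, n_faults + 1)]

n_faults, phi = 5, 0.02  # hypothetical parameters
for i, t in enumerate(jm_expected_times(n_faults, phi), start=1):
    rate = jm_hazard_rate(i, n_faults, phi)
    print(f"failure {i}: hazard rate {rate:.2f}, expected wait {t:.1f}")
```

Each fix lowers the hazard rate by the same amount Φ, so the expected waiting times grow, which is the reliability growth the model is meant to capture.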


Criticisms of this model

• 1. The sequence of rates is considered by the model to be purely deterministic.
– This assumption is not realistic.

• 2. The model assumes all faults contribute equally to the hazard rate.
– Faults vary dramatically in their contribution to program unreliability.

• 3. We will show that the reliability predictions obtained from the model are poor.
– They are usually too optimistic.


Littlewood model:-

• The Littlewood model attempts to be a more realistic fault model than Jelinski-Moranda by treating the hazard rates of the individual faults as independent random variables.

• Whereas Jelinski-Moranda is represented with equal steps, Littlewood has steps of differing sizes.

• Both the Jelinski-Moranda and Littlewood models belong to a general class called exponential order statistic models.


• In this kind of model, the faults can be seen as competing risks: at any point in time, any of the remaining faults can cause a failure, but the chance that it will be a specific fault is determined by the hazard rate for that fault.

• It can be shown that the times, $X_i$, at which the faults show themselves are independent, identically distributed random variables.

• For the J-M model, this distribution is exponential with parameter Φ.


For the Littlewood model, these times have a Pareto distribution.
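The formula itself is not reproduced in this transcript. Under the usual statement of the Littlewood model, in which each fault's rate $\phi$ is drawn from a gamma distribution with shape $\alpha$ and rate $\beta$ (an assumption not stated on the slides), the Pareto form follows by averaging the exponential survival probability over $\phi$:

```latex
\begin{align*}
P(X > x) &= \mathbb{E}\!\left[e^{-\phi x}\right]
          = \left(\frac{\beta}{\beta + x}\right)^{\alpha} \\[4pt]
F(x) &= 1 - \left(\frac{\beta}{\beta + x}\right)^{\alpha},
\qquad
f(x) = \frac{\alpha\,\beta^{\alpha}}{(\beta + x)^{\alpha + 1}}
\end{align*}
```

The heavier tail compared with the exponential reflects that some faults have very small rates and may take a long time to show themselves.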

