Regression Project

SMBA-Project Report 1 Relationship between Literacy and Population exposed to Primary Schools In Uttar Pradesh Statistical Modeling for Business Analytics PROJECT REPORT Submitted By: Nishu Navneet (12125032) Sushil Panigrahi (12125048) Mangesh Dharwad(12125026)


  • SMBA-Project Report 1

    Relationship between Literacy and Population exposed to Primary Schools

    In Uttar Pradesh

    Statistical Modeling for Business Analytics PROJECT REPORT

    Submitted By:

    Nishu Navneet (12125032) Sushil Panigrahi (12125048) Mangesh Dharwad(12125026)

  • SMBA-Project Report 2

    INDEX

    Introduction ..... 3

    Data Collection and References ..... 4

    Variables ..... 5

    Tools and Methods ..... 7

    Analysis ..... 8

    Observation and Conclusion ..... 10

    Result ..... 11

    Annexures ..... 12

  • SMBA-Project Report 3

    Introduction

    Uttar Pradesh is the most populous state in India, accounting for 16.4 per cent of the country's population. It is also the fourth largest state by geographical area, covering 9.0 per cent of the country's area. It has 83 districts, 901 development blocks and 112,804 inhabited villages. The population density of the state is 473 persons per square kilometre, against 274 for the country. The overall literacy rate in Uttar Pradesh stood at 56.27%, with 67% of males and 43% of females literate.

    In India, literacy is defined for people aged seven years and above as the ability to read and write with understanding in any language. The country's poor literacy rate is commonly attributed to the following reasons:

    1. Absence of adequate school infrastructure

    2. Improper facility

    3. Inefficient teaching staff

    4. Existence of Caste-Religion disparity

    5. Poverty

    The purpose of this report is to find the relationship between the literacy of a district in Uttar Pradesh and the number of persons exposed to primary schools in their villages. We have tried to measure its significance while taking several other factors into account.

  • SMBA-Project Report 4

    Data Collection

    The data for this analysis was collected from the government census website, available at:

    http://censusindia.gov.in/Tables_Published/Basic_Data_Sheet.aspx

    Data for each district can be retrieved from the website. The figures were entered into an Excel sheet manually, with care taken to avoid transcription errors.

    All data is taken from the 2001 census.

    Other references:

    http://lawmin.nic.in/ncrwc/finalreport/v2b1-5.htm

    http://upgov.nic.in/upecon.aspx

  • SMBA-Project Report 5

    Variables

    The collected data has been organised into the variables listed below.

    Primary Variables - Collected in raw form

    1. District (String) - Name of the district

    2. Literacy (Scale) - Number of literate persons in the district, according to government standards

    3. Population (Scale) - Population of the district

    4. Males (Scale) - Male population in the district

    5. Hindu (Scale) - Hindu population in the district

    6. Muslim (Scale) - Muslim population in the district

    7. NoOfHouseholds (Scale) - Number of households in the district

    8. TotalVillages (Scale) - Total number of villages in the district

    9. PrimarySchoolsAvail (Scale) - Total number of villages with a primary school

    10. BusServiceAvail (Scale) - Total number of villages with bus service

    Secondary Variables - Derived from the primary variables for further analysis

    11. PrimarySchoolExposure (Scale) - Average population exposed to primary schools

    12. BusServiceExposure (Scale) - Average population exposed to bus service

    13. PctHindu (Scale) - Percentage of Hindu population in the district

    14. PctMuslim (Scale) - Percentage of Muslim population in the district

  • SMBA-Project Report 6

    15. PrimarySchoolExposureSquared (Scale) - Square of average population exposed to primary schools

    16. PrimarySchoolExposureCube (Scale) - Cube of average population exposed to primary schools

    17. NormalLofOfPrimarySchoolExposure (Scale) - Natural logarithm of average population exposed to primary schools

    18. NormalLofOfPrimarySchoolExposureSqaured (Scale) - Square of the natural logarithm of average population exposed to primary schools

    19. NormalLofOfPrimarySchoolExposureCube (Scale) - Cube of the natural logarithm of average population exposed to primary schools

    20. HighPercentageOfMuslimPopulation (Binary) - Value 1 if the district's percentage of Muslim population is above the mean percentage of Muslim population across districts

    21. InteractionBetweenMuslimPopulationAndPopulationExposureToSchool (Numeric) - Product of HighPercentageOfMuslimPopulation and PrimarySchoolExposure, to study their interaction effect

    22. HouseholdSize (Numeric) - Average number of persons per household in the district

    23. HighPercentHouseHoldSize (Binary) - Value 1 if the household size of the district is above the mean household size across districts

    24. InteractionBetweenHighHouseholdSizeAndPSExpsoure (Numeric) - Product of HighPercentHouseHoldSize and PrimarySchoolExposure, to study their interaction effect
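    As a rough illustration of how the secondary variables could be derived from the primary ones, the sketch below computes several of them for one district record. The report does not state the formula for PrimarySchoolExposure; the version used here (district population scaled by the share of villages that have a primary school) is an assumption. The function name `derive_variables`, the dictionary layout, the shortened output keys and the sample figures are likewise hypothetical; the sample is made up and is not census data. The two means are taken from the Descriptive Statistics table in the annexures.

```python
import math

def derive_variables(d, mean_pct_muslim, mean_household_size):
    """Derive secondary variables for one district record (a dict).

    ASSUMPTION: PrimarySchoolExposure = Population * (share of villages
    with a primary school). The report does not give this formula.
    """
    exposure = d["Population"] * d["PrimarySchoolsAvail"] / d["TotalVillages"]
    pct_muslim = 100.0 * d["Muslim"] / d["Population"]
    household_size = d["Population"] / d["NoOfHouseholds"]
    # Binary indicators: 1 when the district is above the cross-district mean.
    high_muslim = 1 if pct_muslim > mean_pct_muslim else 0
    high_household = 1 if household_size > mean_household_size else 0
    return {
        "PrimarySchoolExposure": exposure,
        "PctHindu": 100.0 * d["Hindu"] / d["Population"],
        "PctMuslim": pct_muslim,
        "HouseholdSize": household_size,
        "LnExposure": math.log(exposure),  # natural logarithm
        "HighPercentageOfMuslimPopulation": high_muslim,
        "HighPercentHouseHoldSize": high_household,
        # Interaction terms: binary indicator times the continuous regressor.
        "MuslimInteraction": high_muslim * exposure,
        "HouseholdInteraction": high_household * exposure,
    }

# Made-up district, for illustration only (not census figures).
sample = {
    "Population": 2_000_000, "Hindu": 1_600_000, "Muslim": 380_000,
    "NoOfHouseholds": 300_000, "TotalVillages": 1_000,
    "PrimarySchoolsAvail": 800,
}
row = derive_variables(sample, mean_pct_muslim=18.0329,
                       mean_household_size=6.4465)
print(row)
```

    Because each interaction term is the product of a 0/1 indicator and the exposure, it equals the exposure for districts above the mean and zero otherwise.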

  • SMBA-Project Report 7

    Tools and Methods

    IBM SPSS Statistics software has been used to do regression and other statistical

    analysis.

    The analysis includes

    1. Scatter Plot

    2. Curve Estimation

    3. Linear Regression

    4. Multiple Linear Regression

    5. Polynomial Non-linear Regression

    6. Linear-Log Modeling

    7. Linear-Log Model with Higher Powers

    8. Interaction Between Continuous and Binary Variable

  • SMBA-Project Report 8

    Analysis

    The relationship between Literacy and the population exposed to primary schools can be seen in the scatter plot, which suggests a linear relationship.

    1. Model 1 - Equation 1 in Result (Page Number - 11) depicts a linear relationship of literacy with population exposed to primary schools. The coefficient is statistically significant at the 5% significance level, and the model has an Adjusted R² of 71.7%.

    2. Model 2 - Equation 2 in Result (Page Number - 11) depicts a multiple regression of literacy on population exposed to primary schools, population exposed to bus services and percentage of Hindu population. The coefficients are statistically significant at the 5% significance level for two regressors but not for the population exposed to bus service. The Adjusted R² for the model is 72.90%.

  • SMBA-Project Report 9

    3. Model 3 - Equation 3 in Result (Page Number - 11) depicts a non-linear relationship of literacy with population exposed to primary schools, its square and its cube. None of the three coefficients is statistically significant at the 5% significance level (p = 0.105, 0.273 and 0.206 respectively). The Adjusted R² for the model is 72.20%.

    4. Model 4 - Equation 4 in Result (Page Number - 11) depicts a Linear-Log model of literacy with population exposed to primary schools. The coefficient is statistically significant at the 5% significance level for the natural log of population exposed to primary schools. The Adjusted R² for the model is 65.5%.

    5. Model 5 - Equation 5 in Result (Page Number - 11) depicts a Linear-Log model of literacy with the natural log of population exposed to primary schools, its square and its cube. The coefficients are statistically significant at the 5% significance level for the natural log and for its cube but not for its square, which does not appear among the fitted predictors in the annexure. The Adjusted R² for the model is 71.30%.

    6. Model 6 - Equation 6 in Result (Page Number - 11) depicts a multiple regression of literacy on population exposed to primary schools, high percentage of Muslim population and their interaction term. The coefficient is statistically significant at the 5% significance level for population exposed to primary schools but not for the other two terms. The Adjusted R² for the model is 74.5%.

    7. Model 7 - Equation 7 in Result (Page Number - 11) depicts a multiple regression of literacy on population exposed to primary schools, high percentage of household size and their interaction term. The coefficients are statistically significant at the 5% significance level for population exposed to primary schools and for high percentage of household size (p = 0.050) but not for their interaction. The Adjusted R² for the model is 72.7%.
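    The Adjusted R² figures quoted for each model can be cross-checked against the R Square values in the Result table with the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n = 68 observations and k is the number of regressors. The sketch below is only a sanity check; the function name is illustrative, and differences in the third decimal are to be expected because the R Square inputs are already rounded.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k regressors and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

N_OBS = 68  # number of districts in the regressions

# (label, R Square, number of regressors) as reported in the Result table
models = [
    ("Model 1", 0.721, 1),
    ("Model 2", 0.741, 3),
    ("Model 3", 0.734, 3),
]

for label, r2, k in models:
    print(label, round(adjusted_r_squared(r2, N_OBS, k), 3))
```

    Adding regressors raises k and therefore lowers the adjusted figure unless R Square rises enough to compensate, which is why the richer models gain little over Model 1.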

  • SMBA-Project Report 10

    Observation

    1. There is a significant relationship between Literacy and population exposed to primary schools.

    2. There is no statistically significant association between Literacy and population exposed to bus service.

    3. Percentage of Hindu population is associated with Literacy.

    4. The relationship between Literacy and population exposed to primary schools is linear in nature.

    5. A high percentage of Muslim population is negatively related to Literacy, but the relationship is not statistically significant.

    6. The high-household-size indicator has a positive coefficient (p = 0.050), but there is no significant interaction between household size and population exposed to primary schools.

    Conclusion

    1. From the above analysis we can conclude that the larger the population exposed to primary schools, the higher the literacy in Uttar Pradesh.

    2. A 1% increase in the population exposed to primary schools is associated with an increase of [.01 * 998974.215 = 9989.74215] ~ 10,000 literate persons in a district (Linear-Log model, equation 4).

    3. Decreasing the Household Size (number of people per household) will increase

    the literacy in Uttar Pradesh
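    The arithmetic in conclusion 2 follows from the linear-log form of Model 4. As a brief worked derivation, writing X for the population exposed to primary schools (a shorthand introduced here) and using the coefficient reported in the annexure:

```latex
\begin{aligned}
\text{Literacy} &= \beta_0 + \beta_1 \ln X, \qquad \beta_1 = 998974.215 \\
\Delta\text{Literacy} &= \beta_1\left[\ln(1.01\,X) - \ln X\right] = \beta_1 \ln(1.01) \\
&\approx 0.00995 \times 998974.215 \approx 9940 \\
&\approx 0.01\,\beta_1 = 9989.74 \quad \text{(using } \ln(1+r) \approx r \text{ for small } r\text{)}
\end{aligned}
```

    Either way, the effect is roughly 10,000 additional literate persons for a 1% rise in exposure.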

  • Result

    Regression Models of Literacy in Uttar Pradesh

    Dependent Variable: Literacy; 68 Observations; Significance Level = 10%; insignificant coefficients were shown in red in the original table.

    The intercept row gives unstandardized coefficients; all other rows give standardized coefficients (Beta) as listed in the annexures, with the p-value (Sig.) in the row below. PSE = Population Exposed To Primary Schools.

    Regressor                               (1)           (2)           (3)           (4)           (5)           (6)           (7)
    Intercept (unstandardized)              96260.478     -599561.359   -207629.731   -13046171.64  70834486.19   73238.91      -80423.001
    PSE                                     0.849         0.877         1.85                                      0.931         0.946
      (Sig.)                                0.001         0.001         0.105                                     0.001         0.001
    Square of PSE                                                       -2.647
      (Sig.)                                                            0.273
    Cube of PSE                                                         1.706
      (Sig.)                                                            0.206
    Natural Log of PSE                                                                0.813         -6.402
      (Sig.)                                                                          0.001         0.001
    Square of Natural Log of PSE                                                                    **
    Cube of Natural Log of PSE                                                                      7.219
      (Sig.)                                                                                        0.001
    Population Exposed To Bus Service                     0.030
      (Sig.)                                              0.701
    Percentage of Hindu Population                        0.151
      (Sig.)                                              0.033
    High Percentage Of Muslim Population                                                                          -0.063
      (Sig.)                                                                                                      0.694
    High Pct. Of Muslim Population * PSE                                                                          -0.147
      (Sig.)                                                                                                      0.411
    High Percentage Of HouseholdSize                                                                                            0.32
      (Sig.)                                                                                                                    0.05
    High Pct. Of HouseholdSize * PSE                                                                                            -0.272
      (Sig.)                                                                                                                    0.118
    R Square                                0.721         0.741         0.734         0.661         0.721         0.757         0.739
    Adjusted R Square                       0.717         0.729         0.722         0.655         0.713         0.745         0.727

    (Sig.) = p-Value

    SMBA-Project Report 11

  • Annexures

    Curve Estimation

    Model Summary and Parameter Estimates

    Dependent Variable: Literacy

    Equation       R Square   F         df1   df2   Sig.   Constant          b1           b2          b3
    Linear         .721       170.983   1     66    .000   96260.478         .638
    Logarithmic    .661       128.472   1     66    .000   -13046171.636     998974.215
    Quadratic      .727       86.753    2     65    .000   323034.544        .354         7.442E-8
    Cubic          .734       58.944    3     64    .000   -207629.731       1.390        -5.096E-7   9.829E-14

    The independent variable is Population Exposed to Primary Schools.

    SMBA-Project Report 12

  • Means

    Descriptive Statistics

                                    N     Minimum   Maximum   Mean      Std. Deviation
    Percentage Muslim Population    68    2.95      49.14     18.0329   10.63096
    Percentage Hindu Population     68    47.05     96.21     81.3617   11.00021
    HouseholdSize                   68    5.66      8.36      6.4465    .46777
    Valid N (listwise)              68

    Regression Results

    Linear Regressions

    1. Linear Relation b/w Literacy and Primary School Exposure to the Population

    $\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool}$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .849 (a)   .721       .717                305639.51070

    a. Predictors: (Constant), Population Exposed to Primary Schools

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                       B            Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
    1  (Constant)                               96260.478    92742.929            1.038    .303   -88906.754    281427.710
       Population Exposed to Primary Schools    .638         .049         .849    13.076   .000   .541          .735

    a. Dependent Variable: Literacy
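    The t statistic and confidence bounds in the Coefficients table for this model can be reproduced from B and Std. Error alone: t = B / SE, and the interval is B ± t_crit × SE. The sketch below checks the intercept row and evaluates the fitted line for one hypothetical exposure value. The critical value 1.9966 (two-sided 5%, 66 degrees of freedom) is hard-coded as an assumption, and the function names are illustrative; small mismatches arise because the table values are rounded.

```python
# Approximate two-sided 5% critical value of Student's t with 66 d.f.
# (68 observations minus 2 estimated parameters). Assumed constant.
T_CRIT_66DF = 1.9966

def t_and_ci(b, se, t_crit=T_CRIT_66DF):
    """Return (t statistic, lower bound, upper bound) for one coefficient."""
    half_width = t_crit * se
    return b / se, b - half_width, b + half_width

def predict_literacy(exposure, intercept=96260.478, slope=0.638):
    """Fitted value from Model 1: Literacy = intercept + slope * exposure."""
    return intercept + slope * exposure

# Intercept row of the Coefficients table (B, Std. Error).
t_stat, lower, upper = t_and_ci(96260.478, 92742.929)
print(round(t_stat, 3), round(lower), round(upper))

# Fitted literacy for a hypothetical district with 1,000,000 people exposed.
print(predict_literacy(1_000_000))
```

    The intercept's interval spans zero, which matches its insignificant p-value of .303, whereas the slope's interval (.541 to .735) lies entirely above zero.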

  • 2. Multiple Regressions with linear Regressor

    $\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToBusService} + \beta_3\,\text{PercentageHinduPopulation}$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .861 (a)   .741       .729                299389.95576

    a. Predictors: (Constant), Population Exposed to Primary Schools, Percentage Hindu Population, Population Exposed to Bus Service

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.

    Model                                       B              Std. Error    Beta    t        Sig.
    1  (Constant)                               -599561.359    332102.285            -1.805   .076
       Population Exposed to Bus Service        .073           .190          .030    .386     .701
       Percentage Hindu Population              7875.853       3622.053      .151    2.174    .033
       Population Exposed to Primary Schools    .659           .057          .877    11.526   .000

    a. Dependent Variable: Literacy

  • Non-Linear Regression

    1. Polynomial Regression Model

    $\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToPrimarySchool}^2 + \beta_3\,\text{PopulationExposedToPrimarySchool}^3$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .857 (a)   .734       .722                303187.19985

    a. Predictors: (Constant), Cube of population exposed to primary school, Population Exposed to Primary Schools, Square of Population exposed to primary schools

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                                 B              Std. Error    Beta     t        Sig.   Lower Bound     Upper Bound
    1  (Constant)                                         -207629.731    465363.453             -.446    .657   -1137300.100    722040.638
       Population Exposed to Primary Schools              1.390          .846          1.850    1.643    .105   -.300           3.079
       Square of Population exposed to primary schools    -5.096E-7      .000          -2.647   -1.105   .273   .000            .000
       Cube of population exposed to primary school       9.829E-14      .000          1.706    1.278    .206   .000            .000

    a. Dependent Variable: Literacy

  • 2. Linear-Log Model

    $\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool})$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .813 (a)   .661       .655                337395.93911

    a. Predictors: (Constant), Normal Log of Population Exposed to Primary School

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                                    B                Std. Error     Beta    t         Sig.   Lower Bound      Upper Bound
    1  (Constant)                                            -13046171.636    1258245.056            -10.369   .000   -15558338.945    -10534004.327
       Normal Log of Population Exposed to Primary School    998974.215       88135.391      .813    11.335    .000   823006.230       1174942.201

  • 3. Linear Log Model with powers

    $\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool}) + \beta_2 [\ln(\text{PopulationExposedToPrimarySchool})]^2 + \beta_3 [\ln(\text{PopulationExposedToPrimarySchool})]^3$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .849 (a)   .721       .713                308180.18029

    a. Predictors: (Constant), Cube of Normal Log of Population Exposed to Primary School, Normal Log of Population Exposed to Primary School

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                                            B               Std. Error      Beta     t        Sig.   Lower Bound      Upper Bound
    1  (Constant)                                                    70834486.185    22362519.469             3.168    .002   26173450.826     1.155E8
       Normal Log of Population Exposed to Primary School            -7868268.095    2362248.104     -6.402   -3.331   .001   -12586003.333    -3150532.856
       Cube of Normal Log of Population Exposed to Primary School    14632.725       3895.918        7.219    3.756    .000   6852.039         22413.410

    a. Dependent Variable: Literacy

  • Interaction between Independent Variables

    1. Continuous and Binary Variable

    $\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfMuslimPopulation} + \beta_3\,(\text{HighPercentageOfMuslimPopulation} \times \text{PopulationExposedToPrimarySchool})$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .870 (a)   .757       .745                290132.86319

    a. Predictors: (Constant), InteractionBetweenMuslimPopulationAndPopulationExposureToSchool, Population Exposed to Primary Schools, Binary Variable of High Muslim Population

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                                                 B             Std. Error    Beta    t        Sig.   Lower Bound    Upper Bound
    1  (Constant)                                                         73238.912     110327.732            .664     .509   -147166.071    293643.894
       Population Exposed to Primary Schools                              .700          .062          .931    11.247   .000   .575           .824
       Binary Variable of High Muslim Population                          -74442.482    188655.158    -.063   -.395    .694   -451324.484    302439.521
       InteractionBetweenMuslimPopulationAndPopulationExposureToSchool    -.079         .096          -.147   -.828    .411   -.271          .112

    a. Dependent Variable: Literacy
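    A model with a binary indicator D and its interaction with a continuous regressor X amounts to two straight lines: intercept β0 and slope β1 when D = 0, and intercept β0 + β2 and slope β1 + β3 when D = 1. The sketch below applies this to the unstandardized coefficients of the Muslim-population model; the function name `group_lines` is illustrative and not part of the report.

```python
def group_lines(b0, b1, b2, b3):
    """Split Literacy = b0 + b1*X + b2*D + b3*(D*X) into two lines.

    Returns ((intercept, slope) for D = 0, (intercept, slope) for D = 1).
    """
    return (b0, b1), (b0 + b2, b1 + b3)

# Unstandardized B values from the Coefficients table for this model.
low_group, high_group = group_lines(73238.912, 0.700, -74442.482, -0.079)

print("D = 0:", low_group)
print("D = 1:", tuple(round(v, 3) for v in high_group))
```

    Since neither the indicator (Sig. = .694) nor the interaction (Sig. = .411) is statistically significant, the two lines cannot be told apart with this data.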

  • 2. Continuous and Binary Variable

    $\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfHouseholdSize} + \beta_3\,(\text{HighPercentageOfHouseholdSize} \times \text{PopulationExposedToPrimarySchool})$

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .860 (a)   .739       .727                300233.86148

    a. Predictors: (Constant), Interaction Between HighHousehold Size and Exposure to Primary Schools, Population Exposed to Primary Schools, Binary Variable of High Household size

    Coefficients (a)

    B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the bounds are the 95.0% confidence interval for B.

    Model                                                                        B             Std. Error    Beta    t        Sig.   Lower Bound    Upper Bound
    1  (Constant)                                                                -80423.001    126561.439            -.635    .527   -333258.541    172412.539
       Population Exposed to Primary Schools                                     .711          .066          .946    10.705   .000   .578           .844
       Binary Variable of High Household size                                    365077.504    182345.455    .320    2.002    .050   800.582        729354.427
       Interaction Between HighHousehold Size and Exposure to Primary Schools    -.152         .096          -.272   -1.586   .118   -.344          .039

    a. Dependent Variable: Literacy
