SMBA-Project Report 1
Relationship between Literacy and Population exposed to Primary Schools
In Uttar Pradesh
Statistical Modeling for Business Analytics PROJECT REPORT
Submitted By:
Nishu Navneet (12125032), Sushil Panigrahi (12125048), Mangesh Dharwad (12125026)
INDEX
Introduction
Data Collection and References
Variables
Tools and Methods
Analysis
Observation and Conclusion
Result
Annexures
Introduction
Uttar Pradesh is the most populous state in India, accounting for 16.4 per cent of
the country's population. It is also the fourth largest state by geographical area,
covering 9.0 per cent of the country's area. It has 83 districts, 901 development
blocks and 112,804 inhabited villages. The population density of the state is 473
persons per square kilometre, against 274 for the country as a whole. The literacy
rate in Uttar Pradesh stood at 56.27% overall, with 67% of males and 43% of
females literate.
In India, literacy is defined for people aged seven years and above as the ability
to read and write with understanding in any language. India's poor literacy rate is
commonly attributed to the following reasons:
1. Absence of adequate school infrastructure
2. Inadequate facilities
3. Inefficient teaching staff
4. Caste and religious disparities
5. Poverty
The purpose of this report is to find the relationship between the literacy of a
district in Uttar Pradesh and the number of persons exposed to primary schools in
its villages. We measure the significance of this relationship while taking several
other factors into account.
Data Collection
The data for the analysis were collected from the government census website:
http://censusindia.gov.in/Tables_Published/Basic_Data_Sheet.aspx
Data for each district can be obtained from this website. The data were entered
into an Excel sheet manually, with care taken to avoid transcription errors.
All data are from the 2001 census.
Other references:
http://lawmin.nic.in/ncrwc/finalreport/v2b1-5.htm
http://upgov.nic.in/upecon.aspx
Variables
The collected data have been organised into the variables listed below.
Primary Variables - Collected in raw form
1. District (String) Name of the district
2. Literacy (Scale) Number of literate persons in the district, according to
government standards.
3. Population (Scale) Population of the district
4. Males (Scale) Male population in the district
5. Hindu (Scale) Hindu population in the district
6. Muslim (Scale) Muslim population in the district
7. NoOfHouseholds (Scale) Number of households in the district
8. TotalVillages (Scale) Total number of villages present in the district
9. PrimarySchoolsAvail (Scale) Total number of villages with the facility of
primary schools
10. BusServiceAvail (Scale) Total number of villages with bus service
availability.
Secondary Variables - Derived from the primary variables for further
analysis
11. PrimarySchoolExposure (Scale) Average population exposed to primary
schools
12. BusServiceExposure (Scale) Average population exposed to Bus service
13. PctHindu (Scale) Percentage of Hindu population in the district
14. PctMuslim (Scale) Percentage of Muslim Population in the district
15. PrimarySchoolExposureSquared (Scale) Square of Average population
exposed to primary schools
16. PrimarySchoolExposureCube (Scale) Cube of Average population
exposed to primary schools
17. NormalLofOfPrimarySchoolExposure (Scale) Natural logarithm of
Average population exposed to primary schools
18. NormalLofOfPrimarySchoolExposureSqaured (Scale) Square of the natural
logarithm of Average population exposed to primary schools
19. NormalLofOfPrimarySchoolExposureCube (Scale) Cube of the natural
logarithm of Average population exposed to primary schools
20. HighPercentageOfMuslimPopulation (Binary) Value 1 if the district's
percentage Muslim population is above the mean percentage Muslim
population across districts, otherwise 0
21. InteractionBetweenMuslimPopulationAndPopulationExposureToSchool
(Numeric) Product of the variables
HighPercentageOfMuslimPopulation and PrimarySchoolExposure, used to study
the interaction effect
22. HouseholdSize (Numeric) Average number of persons per household in
the district
23. HighPercentHouseHoldSize (Binary) - Value 1 if the district's household
size is above the mean household size across districts, otherwise 0
24. InteractionBetweenHighHouseholdSizeAndPSExpsoure (Numeric) -
Product of the variables
HighPercentHouseHoldSize and
PrimarySchoolExposure, used to study their interaction effect
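The secondary variables are simple transformations of the primary ones. The sketch below shows one way to compute them in plain Python. It assumes that each district is a dict keyed by the report's variable names, that "above the mean" means above the unweighted mean across districts, and that HouseholdSize = Population / NoOfHouseholds. PrimarySchoolExposure is taken as an input because its formula is not spelled out here. The function name and the two districts in the usage lines are hypothetical.

```python
import math
from statistics import mean


def add_secondary_variables(districts):
    """Add the report's derived variables to each district dict (in place).

    Assumes each dict already holds Population, Hindu, Muslim, NoOfHouseholds
    and a precomputed PrimarySchoolExposure (taken as given, not derived here).
    """
    for d in districts:
        d["PctHindu"] = 100 * d["Hindu"] / d["Population"]
        d["PctMuslim"] = 100 * d["Muslim"] / d["Population"]
        d["HouseholdSize"] = d["Population"] / d["NoOfHouseholds"]  # assumed definition
        x = d["PrimarySchoolExposure"]
        d["PrimarySchoolExposureSquared"] = x ** 2
        d["PrimarySchoolExposureCube"] = x ** 3
        log_x = math.log(x)  # natural logarithm
        d["NormalLofOfPrimarySchoolExposure"] = log_x
        d["NormalLofOfPrimarySchoolExposureSqaured"] = log_x ** 2
        d["NormalLofOfPrimarySchoolExposureCube"] = log_x ** 3

    # Binary indicators: 1 if the district is above the (unweighted) mean across districts.
    mean_pct_muslim = mean(d["PctMuslim"] for d in districts)
    mean_hh_size = mean(d["HouseholdSize"] for d in districts)
    for d in districts:
        high_muslim = 1 if d["PctMuslim"] > mean_pct_muslim else 0
        high_hh = 1 if d["HouseholdSize"] > mean_hh_size else 0
        d["HighPercentageOfMuslimPopulation"] = high_muslim
        d["HighPercentHouseHoldSize"] = high_hh
        # Interaction terms: binary indicator times the continuous exposure.
        x = d["PrimarySchoolExposure"]
        d["InteractionBetweenMuslimPopulationAndPopulationExposureToSchool"] = high_muslim * x
        d["InteractionBetweenHighHouseholdSizeAndPSExpsoure"] = high_hh * x
    return districts


# Two hypothetical districts (illustrative numbers, not census data).
example = add_secondary_variables([
    {"Population": 1000, "Hindu": 800, "Muslim": 200, "NoOfHouseholds": 200,
     "PrimarySchoolExposure": 600},
    {"Population": 2000, "Hindu": 1200, "Muslim": 800, "NoOfHouseholds": 250,
     "PrimarySchoolExposure": 1500},
])
print(example[1]["HighPercentageOfMuslimPopulation"], example[1]["HouseholdSize"])
```

Because the binary indicators are defined relative to the sample mean, they are computed in a second pass once all districts are known.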
Tools and Methods
IBM SPSS Statistics was used for the regression and other statistical analyses.
The analysis includes:
1. Scatter Plot
2. Curve Estimation
3. Linear Regression
4. Multiple Linear Regression
5. Polynomial Non-linear Regression
6. Linear-Log Modeling
7. Linear-Log Model with Higher Powers
8. Interaction Between Continuous and Binary Variable
Analysis
The relationship between literacy and the population exposed to primary schools
can be seen in a scatter plot, which suggests a linear relationship.
1. Model 1 - Equation 1 in the Result table depicts a linear relationship between
literacy and the population exposed to primary schools. The coefficient is
statistically significant at the 5% level, and the model has an adjusted R² as
high as 71.7%.
2. Model 2 - Equation 2 in the Result table is a multiple regression of literacy on
the population exposed to primary schools, the population exposed to bus services
and the percentage of Hindu population. The coefficients on the population exposed
to primary schools and on the percentage of Hindu population are statistically
significant at the 5% level, but the coefficient on the population exposed to bus
service is not. The adjusted R² for the model is 72.90%.
3. Model 3 - Equation 3 in the Result table is a polynomial (non-linear)
relationship between literacy and the population exposed to primary schools, its
square and its cube. None of the three coefficients is statistically significant at
the 5% level (p = 0.105 for the linear term, 0.273 for the square and 0.206 for
the cube). The adjusted R² for the model is 72.20%.
4. Model 4 - Equation 4 in the Result table is a linear-log model of literacy on
the natural log of the population exposed to primary schools. The coefficient is
statistically significant at the 5% level. The adjusted R² for the model is
65.5%.
5. Model 5 - Equation 5 in the Result table is a linear-log model of literacy on
the natural log of the population exposed to primary schools, its square and its
cube. The coefficients on the natural log and on its cube are statistically
significant at the 5% level. The squared term does not appear among the fitted
predictors in the annexure output, so no estimate is reported for it. The adjusted
R² for the model is 71.30%.
6. Model 6 - Equation 6 in the Result table is a multiple regression of literacy on
the population exposed to primary schools, a binary indicator of a high percentage
of Muslim population, and their interaction term. Only the coefficient on the
population exposed to primary schools is statistically significant at the 5% level.
The adjusted R² for the model is 74.5%.
7. Model 7 - Equation 7 in the Result table is a multiple regression of literacy on
the population exposed to primary schools, a binary indicator of high household
size, and their interaction term. The coefficients on the population exposed to
primary schools and on the high-household-size indicator are statistically
significant at the 5% level (the latter only marginally, p = 0.050), but the
interaction term is not. The adjusted R² for the model is 72.7%.
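All seven models were estimated in SPSS. As an illustration of what each fit computes, the sketch below solves the ordinary least squares normal equations in plain Python and reports R² and adjusted R². It is a minimal sketch, not the SPSS procedure. The function name `ols` is hypothetical, and the toy data are invented to match the shape of Models 6 and 7 (exposure x, a binary d and the product x·d).

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows (one per observation) holding the k regressor values,
    without a constant column; y is the list of responses. Returns
    (coefficients, r_squared, adjusted_r_squared), coefficients[0] = intercept.
    """
    n, k = len(X), len(X[0])
    p = k + 1
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend the constant
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    coef = [M[i][p] for i in range(p)]
    fitted = [sum(c * a for c, a in zip(coef, row)) for row in A]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2


# Hypothetical toy data shaped like Models 6 and 7: regressors x, d and x*d.
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 1, 0, 1, 0, 1, 0, 1]
y = [10 + 2 * xi + 5 * di - 0.5 * xi * di for xi, di in zip(x, d)]
coef, r2, adj_r2 = ols([[xi, di, xi * di] for xi, di in zip(x, d)], y)
print([round(c, 6) for c in coef], round(r2, 6))
```

Because the toy responses are an exact linear function of the regressors, the fit recovers the generating coefficients and R² is 1. With real district data the residual sum of squares is what drives R² below 1.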
Observation
1. There is a significant relationship between Literacy and population exposed to
primary schools.
2. No significant association was found between literacy and the population exposed to bus service (p = 0.701 in Model 2).
3. The percentage of Hindu population is positively associated with literacy (p = 0.033 in Model 2).
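4. The relationship between literacy and the population exposed to primary schools
is approximately linear: adding the square and cube raises the adjusted R² only from
71.7% to 72.2%, and none of the polynomial terms is statistically significant.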
5. A high percentage of Muslim population is negatively related to literacy, but
the relationship is not statistically significant (p = 0.694).
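6. The high-household-size indicator has a positive coefficient (p = 0.050), while
its interaction with the population exposed to primary schools is negative but not
statistically significant (p = 0.118), so there is no significant interaction
between household size and population exposed to primary schools.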
Conclusion
1. From the above analysis we conclude that the larger the population exposed to
primary schools, the higher the literacy in a district of Uttar Pradesh.
2. From the linear-log model (Model 4), a 1% increase in the population exposed to
primary schools is associated with an increase of about 0.01 × 998974.215 ≈ 9,990,
i.e. roughly 10,000 literates in a district of Uttar Pradesh.
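3. Household size may also matter: in Model 7 the estimated effect of a high
household size turns negative in districts with high exposure to primary schools,
but because the interaction term is not significant (p = 0.118), the suggestion
that decreasing the household size (number of people per household) would increase
literacy in Uttar Pradesh remains tentative.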
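The 1% figure comes from the linear-log form Literacy = β0 + β1 ln(X): raising X to 1.01X changes predicted literacy by exactly β1 ln(1.01), which is approximately 0.01 β1. The sketch below compares the two using the Model 4 slope from the annexure. The helper name is hypothetical.

```python
import math

BETA_LN = 998974.215  # Model 4 coefficient on ln(population exposed to primary schools)


def change_for_pct_increase(beta, pct):
    """Change in the linear-log prediction when X rises by pct percent.

    Exact: beta * ln(1 + pct/100); the usual approximation: beta * pct/100.
    """
    exact = beta * math.log(1 + pct / 100)
    approx = beta * pct / 100
    return exact, approx


exact, approx = change_for_pct_increase(BETA_LN, 1)
print(f"approximate: {approx:.2f}, exact: {exact:.2f}")
```

The exact value is about 9,940, about 0.5% below the approximation. Either figure rounds to roughly 10,000 additional literates per district.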
Result
Regression Model of Literacy in Uttar Pradesh
Dependent variable: Literacy; 68 observations; significance level = 10%.
The intercept is the unstandardized coefficient; regressor entries are standardized
coefficients (Beta) with the p-value (Sig.) in parentheses. ns = not significant.

Intercept (unstandardized): (1) 96260.478; (2) -599561.359; (3) -207629.731; (4) -13046171.636; (5) 70834486.185; (6) 73238.912; (7) -80423.001
Population Exposed To Primary Schools: (1) 0.849 (<0.001); (2) 0.877 (<0.001); (3) 1.850 (0.105) ns; (6) 0.931 (<0.001); (7) 0.946 (<0.001)
Square of Population Exposed To Primary Schools: (3) -2.647 (0.273) ns
Cube of Population Exposed To Primary Schools: (3) 1.706 (0.206) ns
Natural Log of Population Exposed To Primary Schools: (4) 0.813 (<0.001); (5) -6.402 (0.001)
Square of Natural Log of Population Exposed To Primary Schools: (5) not in the fitted model
Cube of Natural Log of Population Exposed To Primary Schools: (5) 7.219 (<0.001)
Population Exposed To Bus Service: (2) 0.030 (0.701) ns
Percentage of Hindu Population: (2) 0.151 (0.033)
High Percentage Of Muslim Population: (6) -0.063 (0.694) ns
High Percentage Of Muslim Population × Population Exposed To Primary Schools: (6) -0.147 (0.411) ns
High Percentage Of HouseholdSize: (7) 0.320 (0.050)
High Percentage Of HouseholdSize × Population Exposed To Primary Schools: (7) -0.272 (0.118) ns
R Square: (1) 0.721; (2) 0.741; (3) 0.734; (4) 0.661; (5) 0.721; (6) 0.757; (7) 0.739
Adjusted R Square: (1) 0.717; (2) 0.729; (3) 0.722; (4) 0.655; (5) 0.713; (6) 0.745; (7) 0.727
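The adjusted R² values can be reproduced from R² with the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n = 68 districts and k is the number of regressors actually fitted. The sketch below is a consistency check added for illustration, not part of the original SPSS workflow. R² is rounded to three decimals, so agreement is only to about ±0.001.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N = 68  # districts in the sample
# (model, R^2 from the annexure, number of regressors in the fitted model)
MODELS = [(1, 0.721, 1), (2, 0.741, 3), (3, 0.734, 3), (4, 0.661, 1),
          (5, 0.721, 2), (6, 0.757, 3), (7, 0.739, 3)]

for model, r2, k in MODELS:
    print(f"Model {model}: adjusted R^2 = {adjusted_r2(r2, N, k):.3f}")
```

For Model 5, k = 2 reproduces the reported 0.713, whereas k = 3 would give about 0.708. This is consistent with the annexure listing only the log and cubed-log terms as predictors. For Model 4, R² = 0.661 with one regressor gives about 0.656, in line with the annexure value of 0.655.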
Annexures
Curve Estimation
Model Summary and Parameter Estimates
Dependent Variable: Literacy
Equation      R Square   F         df1   df2   Sig.   Constant        b1           b2          b3
Linear        .721       170.983   1     66    .000   96260.478       .638
Logarithmic   .661       128.472   1     66    .000   -13046171.636   998974.215
Quadratic     .727       86.753    2     65    .000   323034.544      .354         7.442E-8
Cubic         .734       58.944    3     64    .000   -207629.731     1.390        -5.096E-7   9.829E-14
The independent variable is Population Exposed to Primary Schools.
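The F column follows from R Square and the degrees of freedom through F = (R²/df1) / ((1 - R²)/df2). The sketch below recomputes F for the four curve fits as a cross-check. Small differences come from R² being printed to three decimals. The function name is hypothetical.

```python
def f_from_r2(r2, df1, df2):
    """Overall regression F statistic implied by R^2 and its degrees of freedom."""
    return (r2 / df1) / ((1 - r2) / df2)


# Equation: (R Square, df1, df2, F reported by SPSS), copied from the table above.
CURVE_FITS = {
    "Linear": (0.721, 1, 66, 170.983),
    "Logarithmic": (0.661, 1, 66, 128.472),
    "Quadratic": (0.727, 2, 65, 86.753),
    "Cubic": (0.734, 3, 64, 58.944),
}

for name, (r2, df1, df2, f_reported) in CURVE_FITS.items():
    implied = f_from_r2(r2, df1, df2)
    print(f"{name:12s} implied F = {implied:8.2f}   reported F = {f_reported}")
```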
Means
Descriptive Statistics
Variable                       N    Minimum   Maximum   Mean      Std. Deviation
Percentage Muslim Population   68   2.95      49.14     18.0329   10.63096
Percentage Hindu Population    68   47.05     96.21     81.3617   11.00021
HouseholdSize                  68   5.66      8.36      6.4465    .46777
Valid N (listwise)             68
Regression Results
Linear Regressions
1. Linear Relation between Literacy and Primary School Exposure of the Population
$$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool}$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .849a   .721       .717                305639.51070
a. Predictors: (Constant), Population Exposed to Primary Schools
Coefficients(a)
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                 B           Std. Error   Beta   t        Sig.   Lower Bound   Upper Bound
(Constant)                              96260.478   92742.929           1.038    .303   -88906.754    281427.710
Population Exposed to Primary Schools   .638        .049         .849   13.076   .000   .541          .735
a. Dependent Variable: Literacy
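The t and confidence-interval columns follow from B and its standard error: t = B / SE and the 95% interval is B ± t* × SE, where t* is the two-sided 95% critical value of Student's t with n - k - 1 = 68 - 1 - 1 = 66 degrees of freedom. The sketch below hard-codes t* ≈ 1.99656, an approximate table value assumed here rather than taken from the report, and reproduces the constant's row. The function name is hypothetical.

```python
T_CRIT_66 = 1.99656  # approx. two-sided 95% critical value of Student's t, 66 df (assumed)


def t_and_ci(b, se, t_crit=T_CRIT_66):
    """Return the t statistic and the 95% confidence interval for a coefficient."""
    return b / se, (b - t_crit * se, b + t_crit * se)


t_const, (lo_const, hi_const) = t_and_ci(96260.478, 92742.929)
print(round(t_const, 3), round(lo_const, 1), round(hi_const, 1))
```

Applied to the slope (.638, .049), the same function gives t ≈ 13.02 rather than 13.076, because B and its standard error are printed rounded to three decimals.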
2. Multiple Regression with Linear Regressors
$$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToBusService} + \beta_3\,\text{PercentageHinduPopulation}$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .861a   .741       .729                299389.95576
a. Predictors: (Constant), Population Exposed to Primary Schools,
Percentage Hindu Population, Population Exposed to Bus Service
Coefficients(a)
Model 1                                 B             Std. Error   Beta   t        Sig.
(Constant)                              -599561.359   332102.285          -1.805   .076
Population Exposed to Bus Service       .073          .190         .030   .386     .701
Percentage Hindu Population             7875.853      3622.053     .151   2.174    .033
Population Exposed to Primary Schools   .659          .057         .877   11.526   .000
a. Dependent Variable: Literacy
Non-Linear Regression
1. Polynomial Regression Model
$$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{PopulationExposedToPrimarySchool}^2 + \beta_3\,\text{PopulationExposedToPrimarySchool}^3$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .857a   .734       .722                303187.19985
a. Predictors: (Constant), Cube of population exposed to primary
school, Population Exposed to Primary Schools, Square of Population
exposed to primary schools
Coefficients(a)
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                           B             Std. Error   Beta     t        Sig.   Lower Bound    Upper Bound
(Constant)                                        -207629.731   465363.453            -.446    .657   -1137300.100   722040.638
Population Exposed to Primary Schools             1.390         .846         1.850    1.643    .105   -.300          3.079
Square of Population exposed to primary schools   -5.096E-7     .000         -2.647   -1.105   .273   .000           .000
Cube of population exposed to primary school      9.829E-14     .000         1.706    1.278    .206   .000           .000
a. Dependent Variable: Literacy
2. Linear-Log Model
$$\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool})$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .813a   .661       .655                337395.93911
a. Predictors: (Constant), Normal Log of Population Exposed to Primary
School
Coefficients(a)
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                              B               Std. Error    Beta   t         Sig.   Lower Bound     Upper Bound
(Constant)                                           -13046171.636   1258245.056          -10.369   .000   -15558338.945   -10534004.327
Normal Log of Population Exposed to Primary School   998974.215      88135.391     .813   11.335    .000   823006.230      1174942.201
3. Linear-Log Model with Powers
$$\text{Literacy} = \beta_0 + \beta_1 \ln(\text{PopulationExposedToPrimarySchool}) + \beta_2 [\ln(\text{PopulationExposedToPrimarySchool})]^2 + \beta_3 [\ln(\text{PopulationExposedToPrimarySchool})]^3$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .849a   .721       .713                308180.18029
a. Predictors: (Constant), Cube of Normal Log of Population Exposed to
Primary School, Normal Log of Population Exposed to Primary School
Coefficients(a)
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                                      B              Std. Error     Beta     t        Sig.   Lower Bound     Upper Bound
(Constant)                                                   70834486.185   22362519.469            3.168    .002   26173450.826    1.155E8
Normal Log of Population Exposed to Primary School           -7868268.095   2362248.104    -6.402   -3.331   .001   -12586003.333   -3150532.856
Cube of Normal Log of Population Exposed to Primary School   14632.725      3895.918       7.219    3.756    .000   6852.039        22413.410
a. Dependent Variable: Literacy
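Because the coefficients on ln X and (ln X)³ in this model have opposite signs, the fitted curve is not monotonic. Differentiating Literacy = β0 + β1 ln X + β3 (ln X)³ gives dLiteracy/dX = (β1 + 3β3 (ln X)²)/X, which changes sign where (ln X)² = -β1/(3β3). The sketch below evaluates this with the estimates above, omitting β2 because the squared term is not among the fitted predictors. It is an illustrative derivation rather than part of the original analysis. It does not establish where the districts' exposure values lie relative to the turning point. The function names are hypothetical.

```python
import math

B1 = -7868268.095  # coefficient on ln(X), X = population exposed to primary schools
B3 = 14632.725     # coefficient on [ln(X)]^3


def slope_in_log(log_x):
    """dLiteracy / d ln(X) for Literacy = b0 + B1*ln(X) + B3*ln(X)^3."""
    return B1 + 3 * B3 * log_x ** 2


def slope_in_x(x):
    """dLiteracy / dX, by the chain rule: slope_in_log(ln X) / X."""
    return slope_in_log(math.log(x)) / x


# The slope is zero where ln(X)^2 = -B1 / (3 * B3).
log_turn = math.sqrt(-B1 / (3 * B3))
x_turn = math.exp(log_turn)
print(round(log_turn, 3), round(x_turn))
```

Below the turning point (ln X ≈ 13.39, i.e. X of roughly 650,000) the fitted curve slopes downward; above it, upward.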
Interaction between Independent Variables
1. Continuous and Binary Variable (High Muslim Population)
$$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfMuslimPopulation} + \beta_3\,(\text{HighPercentageOfMuslimPopulation} \times \text{PopulationExposedToPrimarySchool})$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .870a   .757       .745                290132.86319
a. Predictors: (Constant),
InteractionBetweenMuslimPopulationAndPopulationExposureToSchool,
Population Exposed to Primary Schools, Binary Variable of High
Muslim Population
Coefficients(a)
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                                           B            Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
(Constant)                                                        73238.912    110327.732           .664     .509   -147166.071   293643.894
Population Exposed to Primary Schools                             .700         .062         .931    11.247   .000   .575          .824
Binary Variable of High Muslim Population                         -74442.482   188655.158   -.063   -.395    .694   -451324.484   302439.521
InteractionBetweenMuslimPopulationAndPopulationExposureToSchool   -.079        .096         -.147   -.828    .411   -.271         .112
a. Dependent Variable: Literacy
2. Continuous and Binary Variable (High Household Size)
$$\text{Literacy} = \beta_0 + \beta_1\,\text{PopulationExposedToPrimarySchool} + \beta_2\,\text{HighPercentageOfHouseholdSize} + \beta_3\,(\text{HighPercentageOfHouseholdSize} \times \text{PopulationExposedToPrimarySchool})$$
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .860a   .739       .727                300233.86148
a. Predictors: (Constant), Interaction Between HighHousehold Size and
Exposure to Primary Schools, Population Exposed to Primary Schools,
Binary Variable of High Household size
Coefficients(a)
Lower and Upper Bound give the 95.0% confidence interval for B.
Model 1                                                             B            Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
(Constant)                                                          -80423.001   126561.439           -.635    .527   -333258.541   172412.539
Population Exposed to Primary Schools                               .711         .066         .946    10.705   .000   .578          .844
Binary Variable of High Household size                              365077.504   182345.455   .320    2.002    .050   800.582       729354.427
Interaction Between HighHousehold Size and Exposure to Primary Schools   -.152   .096         -.272   -1.586   .118   -.344         .039
a. Dependent Variable: Literacy
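In both interaction models the binary variable D shifts the intercept and the slope: Literacy = β0 + β1 X + β2 D + β3 D·X. The slope on exposure X is therefore β1 when D = 0 and β1 + β3 when D = 1, and the predicted gap between the two groups at exposure X is β2 + β3 X. The sketch below computes these quantities from the unstandardized estimates above, as rounded in the SPSS output. The function and dictionary names are hypothetical.

```python
# Unstandardized estimates (b1: exposure, b2: binary D, b3: D x exposure) from the annexure.
MODELS = {
    "Model 6 (high Muslim population)": (0.700, -74442.482, -0.079),
    "Model 7 (high household size)": (0.711, 365077.504, -0.152),
}


def slope(b1, b3, d):
    """Slope of Literacy on exposure for a district with binary value d (0 or 1)."""
    return b1 + b3 * d


def group_gap(b2, b3, exposure):
    """Predicted Literacy difference (D = 1 minus D = 0) at a given exposure."""
    return b2 + b3 * exposure


def break_even_exposure(b2, b3):
    """Exposure at which the predicted gap between the two groups is zero."""
    return -b2 / b3


for name, (b1, b2, b3) in MODELS.items():
    print(f"{name}: slope D=0 {slope(b1, b3, 0):.3f}, slope D=1 {slope(b1, b3, 1):.3f}, "
          f"gap is zero at exposure {break_even_exposure(b2, b3):,.0f}")
```

For Model 7 the gap is zero at an exposure of about 2.4 million. The fitted effect of a high household size is positive below that level and negative above it, but since β3 is not significant (p = 0.118), this pattern is not statistically established. For Model 6, β2 and β3 are both negative, so the fitted gap is negative at every positive exposure.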