
DATA MINING IN HIGHER EDUCATION

Marshall University Graduate College

Degree - MS in Technology Management with

Emphasis in Information Security

Submitted on 10/27/2010 Advisor - Dr. Tracy Christofero

Capstone Project

By: Debra Elliotte

[email protected]




Abstract

There has always been a need for private monies to supplement state and federal funds in public education. Private monies fund scholarships, buildings and equipment either partially or completely. State colleges such as Marshall University received less and less from the government and relied more and more on private donors to make up the difference. Advancement officers looked for new techniques to expand their reach and use their resources efficiently. One way to increase efficiency was to understand constituents: why they donated and what their connection to the University was. Data mining and donor modeling tools provided the means to understand constituents and how best to use that information in fundraising efforts. The analysis identified characteristics likely to elicit a donation from a non-donor and used Rapid Insight’s Analytics software to produce the statistical models.

Keywords: Data mining, algorithm, fundraising, philanthropic, donor modeling, pattern analysis.



Acknowledgements

I thoroughly enjoyed my graduate education; even the hard parts. I received a great

education. I met and worked with some wonderful individuals who deserve my gratitude.

My professors were always helpful and supportive regardless of the number of questions I

asked. Drs. Christofero and Logan in particular come to mind, but I learned from all. In Dr.

Larsen’s class, I not only improved my writing skills but my approach to getting along in the

world as well, and Professor Biros sent me further along my path of understanding data

mining.

I met students along the way who were a joy. Some were just partners to

commiserate with; others were partners to work on projects. I learned from them all,

and developed great friendships.

I want to thank Dr. Lynne Mayer who started me down the path to obtain my degree,

Dr. Ron Area, CEO of The Marshall University Foundation, who gave me permission to use

the data from the alumni database for this project, and Rapid Insight Software Corporation

for making the software available free to students. I also want to thank Rebecca Samples and

Barbara Hicks for their support.

I want to thank Peter Wylie for his support and guidance and Kevin MacDonnell for

his informative blog about data mining; I learned a great deal from both.

Last, but not least, I want to thank my family for their patience and support.



Table of Contents

Abstract ................................................................................................................................1

Acknowledgements ..............................................................................................................2

Table of Contents .................................................................................................................3

List of Figures ......................................................................................................................5

List of Tables .......................................................................................................................6

Terms and Definitions..........................................................................................................7

Introduction ..........................................................................................................................8

Background ..................................................................................................................9

Problem Statement .....................................................................................................13

Topic Selection ..........................................................................................................15

Literature Review ......................................................................................................15

Research Methods .....................................................................................................23

Results .......................................................................................................................25

Discussion and Evaluation ........................................................................................32

Conclusions ........................................................................................................................35

Future Work ...............................................................................................................36

References ..........................................................................................................................37

Appendix A A&S Study Aggregated Results ...................................................................40

Appendix B Marshall University Alumni Database Variables .........................................41

Appendix C Rates for Greeks and non-Greeks Schools C – F .........................................42

Appendix D Median Dollars for Greeks and non-Greeks Schools C – F .........................43

Appendix E Giving By Marital Status and Class Year Schools B – E ............................44

Appendix F Giving by Marital Status and Class Year Schools F-H.................................45



Appendix G Degree Breakdown .......................................................................................46

Appendix H Major Breakdown ........................................................................................47

Appendix I Lifetime Giving and Age ...............................................................................48

Appendix J Donor Indicator and Degrees .........................................................................49

Appendix K Donor Response Model ...............................................................................50

Appendix L Likelihood Model .........................................................................................51

Appendix M Distribution of Alumni by State .................................................................52



List of Figures

Figure 1 - Understanding .................................................................................................. 14

Figure 2 - Deep Pockets .................................................................................................... 16

Figure 3 - Lifetime Greeks Bearing Gifts School A ......................................................... 18

Figure 4 - Lifetime Greeks Bearing Gifts School B ......................................................... 18

Figure 5 - Median Greeks Bearing Gifts School A ........................................................... 20

Figure 6 - Median Greeks Bearing Gifts School B ........................................................... 20

Figure 7 - Cluster Chart ................................................................................................... 22

Figure 8 - Giving and Age ................................................................................................ 26

Figure 9 - Giving and Years since Graduation ................................................................. 27

Figure 10 - Giving By Degree .......................................................................................... 28

Figure 11 - Donating and Student Activity ....................................................................... 29



List of Tables

Table 1 - Donor Response Model ..................................................................................... 30

Table 2 - Donor Response Model Continued ................................................................... 31



Terms and Definitions

Algorithm – a set of rules for solving a problem

CAE – Council for Aid to Education

CASE – Council for Advancement and Support of Education

Dependent variable – value to predict

FICO – Fair Isaac Corporation, a credit-scoring model

FY – Fiscal Year

Greeks – alumni who were members of fraternities and sororities

Hard credit – total outright dollars given

Mining model – a combination of one or more algorithms and data

Multivariate – the multiple variables used in statistical analysis

Text mining – extracting values from free text data

Univariate – the single variable used in statistical analysis



Introduction

Monies from private sources have been of major importance to colleges and

universities. Private monies, unlike government allocations, which were usually prescribed or

at least closely regulated, frequently represented unrestricted spending and were often a

source of institutional discretionary funds. These funds were a source for innovations and

risk taking. These discretionary funds frequently provided the margin of excellence that

separated one institution from another. Voluntary support played a critical role in

balancing institutional budgets and as the availability of those sources diminished,

institutions looked for ways to keep donors engaged (Leslie & Ramey, 1988).

The best institutions knew how to collaborate with alumni, friends and parents.

Such collaboration required knowing and understanding their interests and behaviors,

then tracking involvement such as giving, advocacy and relationships. Analytical tools

identified key characteristics indicating when people were ready to give and helped

fundraisers understand behavioral patterns critical to donor retention (Birkholz, 2008).

Donors gave to what they cared about, to what they valued. Nonprofit

organizations that understood this knew they must bring together donor values and

corresponding institutional needs for the organization to be successful. If giving were

simply a matter of assets and income, fundraising would be easy, but that is not the case.

Fundraisers must understand why people gave. Analytical tools helped fundraisers do this

(Birkholz, 2008).

Data mining was an analytical tool. It located patterns and relationships in data

that were useful to make valid predictions and draw meaningful conclusions about that



data. These patterns and relationships became the basis for building predictive and

descriptive models. Predictive models forecasted explicit values from known results.

Descriptive models described patterns and created meaningful subgroups (Larsen, 2009).

Most individuals have interacted with a predictive model when applying for

credit. The Fair Isaac Corporation (FICO) score is such an example. This score predicted

a statistical likelihood of loan repayment by analyzing characteristics of individuals who

do and do not pay back loans. When banks loaned money, an individual’s financial

ability and likelihood to pay were key criteria (Birkholz, 2008).

Fundraisers assessed donors according to financial ability and likelihood of

making a major gift. A donor’s connection to the institution and/or alignment of values

between the institution and the donor were key criteria (Birkholz, 2008).

Corporations such as Best Buy, J.P. Morgan and Volkswagen conducted analyses

of who purchased their products and services. Their goal was to produce key groups for

specific marketing campaigns. When fundraisers developed strategies for alumni, faculty

and community members they used similar customizations (Birkholz, 2008).

Background

Data mining produced new information by building a real world model using

existing data. The result was a description of patterns and relationships in the data to use

for prediction (Two Crows, 2004).

Data mining is a technical process driven by a business goal (Khabaza, 2009).

The steps listed below encompass both the technical process and business components

involved in data mining (Han & Kamber, 2006; Larsen, 2009):

• Define the problem or goal



• Determine patterns to mine

• Clean, prepare and select the data

• Train the model

• Validate

• Deploy

Business knowledge and data knowledge drove the first step. Individuals who

developed the problem statement or goal had to understand the business, for instance the

business of fundraising. They had to understand the information being mined in order to

answer questions like, “What do the codes in this field mean?” Without this, the problem

statement would be ill-defined and the results incorrectly interpreted (Khabaza, 2009). A

focused statement describing the problem to solve was best (Two Crows, 2004). For

example, “Are constituents who attend University events repeatedly more likely to give

money than those who attend less often?”

The next step was to determine the kind of patterns to mine. This also required

that the data be understood, but from a different point of view. Data could have a

numerical value, e.g., number of gifts, or could be categorical, e.g., donor/non-donor.

Categorical data could be ordinal, i.e., having a meaningful order, such as high, medium,

low; or nominal, i.e., unordered, as in the case of zip codes. Having this information

influenced which algorithm(s) to use (Two Crows, 2004).

Data cleansing, preparation and selection were next. At the time of this writing,

there was a vast amount of information captured in databases. Choosing useful fields

depended on the problem statement, required understanding about the data captured, and



what information it provided. If the business goal was to increase sales in a particular part

of the country, then data about current sales in that area would be needed (Larsen, 2009).

Data cleansing, preparation and selection were steps unique to the data. Initially,

the problem statement guided data selection. If the model was to predict event

attendance, data about past event attendance was needed. However, not all of the data

available was useful, and some could be incorrect or missing, e.g., a blank in a marital

code, or a gender code indicating male but a name prefix indicating female. Inspecting the

data uncovered these problems, and decisions had to be made about what to do with incorrect

or missing values. Discarding those records could result in a sample giving an

inaccurate picture of the data. On the other hand, the fact that a field contained no data could

be significant, e.g., perhaps the field captured information about only a small subset of

individuals.

Missing values were particularly troublesome because not all mining methods

accommodated data that contained missing values (Cios, Kurgan, Pedrycz & Kurgan,

2007). One approach for fixing missing values was to calculate a substitute value such as

a marital code of married for two linked records missing that information (Two Crows,

2004). Realistically, not all problems could be fixed, but being aware of them

allowed some discrepancies to be corrected. Once cleansed and selected, the data were loaded

into a database and made available to data mining software algorithms. There were

numerous algorithms used in data mining; however, not all were applicable to every

situation.
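
The substitution approach described above (assigning a marital code of married to two linked records that both lack one) can be sketched as follows. This is a minimal illustration, not the procedure used in this project: the field names `marital` and `spouse_id`, the code `"M"` and the sample records are all assumptions made for the example.

```python
# Hypothetical records keyed by constituent id; 'spouse_id' links two records.
records = {
    101: {"name": "A. Smith", "marital": None, "spouse_id": 102},
    102: {"name": "B. Smith", "marital": None, "spouse_id": 101},
    103: {"name": "C. Jones", "marital": None, "spouse_id": None},
    104: {"name": "D. Brown", "marital": "S", "spouse_id": None},
}


def impute_marital(recs):
    """Return a copy in which a missing marital code becomes 'M' (assumed
    code for married) when the record is linked to a spouse record that
    links back to it. Other missing codes are left as they are."""
    fixed = {rid: dict(rec) for rid, rec in recs.items()}
    for rid, rec in fixed.items():
        spouse = recs.get(rec["spouse_id"])  # None when there is no link
        linked_both_ways = spouse is not None and spouse["spouse_id"] == rid
        if rec["marital"] is None and linked_both_ways:
            rec["marital"] = "M"
    return fixed


cleaned = impute_marital(records)
```

Records 101 and 102 receive the substitute value, while record 103 stays missing because nothing in the data supports a guess. Leaving such values missing, rather than filling them arbitrarily, keeps the "no data" signal available to the model.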

Classification algorithms identified characteristics that determined where an item

fit, for instance, classifying a loan applicant as a good or bad credit risk, or identifying a

potential new donor. For the latter, the algorithm examined previous donor data to

discover attributes that distinguished donors from non-donors. Those distinguishing

attributes forecasted the value of a donor indicator attribute (dependent variable) in future

cases (Larsen, 2009).

Regression algorithms predicted a continuous value. Regression looked at trends

in order to predict ones that might continue, e.g., donations over a span of fiscal years.

Looking at donations and gift dates over a span of years might reveal that donations were

seasonal (Larsen, 2009).
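
As a rough illustration of predicting a continuous value from a trend, the sketch below fits an ordinary least-squares line to yearly donation totals and extends it one year. The fiscal years and totals are invented for the example and are not Marshall University figures; a real regression model would normally use more predictors than the year alone.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical donation totals (in thousands of dollars) by fiscal year.
fiscal_years = [2005, 2006, 2007, 2008]
totals = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(fiscal_years, totals)
forecast_2009 = slope * 2009 + intercept  # trend extended one year
```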

Segmentation algorithms divided data into groups having similar characteristics.

If donors were grouped into ranges of giving, the algorithm analyzed other known data

about the groups and looked for interesting similarities between them. Those similarities

became part of the process for dealing with non-donors (Larsen, 2009).

Association algorithms required that the data already had some sort of grouping,

such as multiple classes taken by a student or donors who attended a yearly event over

consecutive years. As with segmentation, association looked for characteristics in other

data known about the group to use when dealing with prospective students or donors

(Larsen, 2009).

After data selection came model training. A data-mining model is a combination

of one or more algorithms and data. The model applied the algorithms to the data to

create the classification, association and regression formulas to solve the business

problem. For example, data with attributes for customers from a chain of stores could be

used to learn who was a good or bad credit risk. The model needed two sets of data; one

set trained the model and the second set validated the model. During validation, the

second set of data, with the known values of the dependent variable withheld, was loaded into

the model, and the model’s predictions were compared with those known values (Larsen, 2009).
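
The train-then-validate idea can be sketched with a deliberately simple "model": a single cutoff on the number of events attended. Everything here is an assumption made for illustration, including the synthetic data (constructed so that the cutoff separates donors perfectly), the one-variable rule and the fixed every-third-record split, which is used only so the example is reproducible; a random split would be more usual.

```python
def split_by_stride(rows, every=3):
    """Hold out every `every`-th row for validation; train on the rest."""
    train = [r for i, r in enumerate(rows) if i % every != 0]
    validate = [r for i, r in enumerate(rows) if i % every == 0]
    return train, validate


def accuracy(rows, threshold):
    """Share of rows where 'events >= threshold' matches the donor flag."""
    hits = sum((events >= threshold) == donor for events, donor in rows)
    return hits / len(rows)


def train_threshold(rows):
    """Pick the cutoff that best separates donors in the training rows."""
    candidates = sorted({events for events, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


# Synthetic (events_attended, is_donor) pairs; not real alumni data.
data = [(events, events >= 3) for events in range(8)] * 5

train, validate = split_by_stride(data)
threshold = train_threshold(train)
validation_accuracy = accuracy(validate, threshold)
```

The validation rows play no part in choosing the cutoff, so the accuracy measured on them indicates how the rule might behave on records the model has not seen.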

The last step was to deploy the model. New data, again with an unknown

dependent value, would be loaded into the model so that the model could discover

patterns and relationships.

Problem Statement

There was an ongoing need for additional monies in higher education. Private

monies funded scholarships, buildings and equipment either partially or completely.

Public education institutions such as Marshall University received less and less financial

support from state and federal governments. Between 2002 and 2005, state funding for

Marshall University dropped $7.4 million. Beginning in 2006, funding increased slightly

each year until 2008. However, when corrected for the Consumer Price Index (CPI)

relative to the base fiscal year 2002, the financial outlook was less appealing. In FY2011,

the University expected to receive funds at FY2008 gross levels with the purchasing power

of FY2005 dollars (Kopp, 2010). See Figure 1.
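
A CPI correction of this kind is conventionally computed as nominal dollars multiplied by the ratio of the base-year index to the current-year index. The sketch below shows that arithmetic only; the appropriation and index values are placeholders chosen for round numbers, not Marshall University's budget or published CPI figures.

```python
def to_base_year_dollars(nominal, cpi_current, cpi_base):
    """Express a nominal amount in base-year purchasing power."""
    return nominal * cpi_base / cpi_current


# Placeholder values for illustration only.
nominal_appropriation = 55.0  # millions of dollars in the later year
cpi_base_year = 100.0         # index in the base fiscal year
cpi_later_year = 110.0        # index in the later fiscal year

real_appropriation = to_base_year_dollars(
    nominal_appropriation, cpi_later_year, cpi_base_year
)
```

With these placeholder numbers, a nominal 55.0 million buys what 50.0 million bought in the base year, which is how flat or slightly rising gross funding can still represent a loss of purchasing power.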



Figure 1 - Understanding (Kopp, 2010)

A survey of 1,027 institutions conducted by the Council for Aid to Education

(CAE) showed that colleges brought in an estimated $27.85 billion in gifts in the 2009

fiscal year. The year before, they raised $31.6 billion. Although grim, the findings came

as no surprise given an economy where donors lost significant portions of their wealth or

were afraid that they would. The survey also found that the share of alumni of record who gave

declined in 2009, falling one percentage point to 10 percent, the lowest level recorded.

This was significant because alumni were the largest source of contributions (Masterson,

2010).
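
The size of these declines can be worked out directly from the figures quoted above. The short calculation below uses only those figures; the prior-year participation rate of 11 percent is inferred from "falling one percentage point to 10 percent" rather than stated separately in the source.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Total gifts, in billions of dollars (Masterson, 2010).
gifts_change = percent_change(31.6, 27.85)

# Alumni participation: a one-percentage-point drop to 10 percent implies
# a prior rate of 11 percent (inferred, see the note above).
participation_change = percent_change(11.0, 10.0)
```

Total gifts fell by roughly 12 percent, and the one-point drop in participation corresponds to a relative decline of about 9 percent in the share of alumni who gave.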

Fundraisers turned to other means as institutions relied on them to supplement

traditional sources. Advancement officers looked for new techniques to expand their

reach and use their resources efficiently. One way to increase this efficiency was to



understand why constituents gave. To accomplish this, fundraisers first had to learn

what they already knew about their constituents.

Topic Selection

The Marshall University Foundation, Inc. was a non-profit, tax-exempt,

educational corporation. The Foundation collaborated with Marshall’s Office of

Development to secure private financial support for the University. The Office of

Development Services produced information for the gift officers in support of their

fundraising activities. The decrease in monies from state government, the lagging economy

and the drop in overall alumni support necessitated new methods for understanding donor

giving. The University’s alumni database appeared to be the best place to begin to gather

this information.

Literature Review

Data mining may have had its roots in industry, but its potential

also attracted the attention of fundraisers. Research done by Wylie (2005) demonstrated

the value of mining the data in an organization’s own database to find prospective

donors. He selected eight schools, private and public, differing in size and student

population. For each school he obtained a random sample of at least 5,000 records from

the alumni database. Each record contained the total giving amount, preferred year of

graduation and marital status. Wylie studied the relationship between giving and

preferred year of graduation, and between giving and marital status. Figure 2 shows the results for

School A.



Figure 2 - Deep Pockets (Wylie, 2005)

This figure highlights several facts (Wylie, 2005):

• The oldest 25 percent of alumni (graduated in 1963 or before) accounted for almost three quarters (73 percent) of the total alumni dollars given.

• Alumni listed in the database as married accounted for a large amount (85 percent) of the total alumni dollars given.

• The youngest 50 percent of alumni (graduated in 1979 or later) accounted for only 9 percent of the total alumni dollars given.

The figures for Schools B through H for this study are in Appendices E and F.

The results in these figures revealed the following (Wylie, 2005):

• At least 90 percent of the money from any alumni population tended to come from people who had been out of school at least 30 years.

• Regardless of the actual marital status, alumni listed as married tended to give much more money than alumni with other marital codes did.

• Alumni out of school at least 30 years and listed as married often gave a huge amount of money compared to any other group classified by marital status and class year.



Wylie (2005) recommended:

• If looking for major gifts, concentrate on people who have been out of school for at least 30 years (but nurture younger alumni).

• Prospect researchers (using a screening service and doing individual research) should focus on these older individuals.

In his article, Greeks Bearing Gifts, Wylie (2007) documented the importance of

capturing other types of information about alumni. In this study, giving information

for Greeks and non-Greeks spanned fifty years and came from six four-year

institutions across the country. Five of the institutions were private and one was a large

public school. All records of solicitable alumni were in the dataset from the smaller

schools, and random samples of at least ten thousand records were in the dataset from the

larger schools. Variables in the datasets included:

• Whether an alumnus was listed as having belonged to a Greek organization

• The preferred year of graduation for each alumnus

• The total lifetime hard credit giving from each alumnus

In this analysis for each school, Wylie (2007) answered four questions:

1. Was there a difference in the rate of lifetime giving between former Greeks and non-Greeks?

2. How did this rate change as a function of the length of time an alumnus was out of school?

3. For Greek and non-Greek donors, was there a difference in the median lifetime giving between the two?

4. How did this difference change as a function of how long they had been out of school?



The first two questions were answered by computing the percentage of Greeks

and non-Greeks who had ever donated at each of eleven five-year intervals since

graduation (5 years or less, 6-10 years, 11-15 years, and so on, up to 50 years or more).

Figures 3 and 4 show results for Schools A and B, respectively.
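
The participation-rate calculation described above can be sketched as a grouped ratio: for each combination of Greek status and interval, the number of alumni who ever gave divided by the number of alumni in that group. The sample records are invented, and the exact interval boundaries (in particular, treating the last band as more than 50 years) are an assumption, since the source lists only some of the eleven bands.

```python
from collections import defaultdict


def interval_label(years_out):
    """Map years since graduation to one of eleven bands:
    '0-5', '6-10', ..., '46-50', '51+' (boundaries assumed)."""
    if years_out <= 5:
        return "0-5"
    if years_out > 50:
        return "51+"
    low = ((years_out - 1) // 5) * 5 + 1
    return f"{low}-{low + 4}"


def giving_rates(alumni):
    """alumni: iterable of (is_greek, years_out, lifetime_giving).
    Returns {(is_greek, band): share of the group who ever gave}."""
    counts = defaultdict(lambda: [0, 0])  # [donors, total]
    for is_greek, years_out, lifetime_giving in alumni:
        cell = counts[(is_greek, interval_label(years_out))]
        cell[0] += int(lifetime_giving > 0)
        cell[1] += 1
    return {key: donors / total for key, (donors, total) in counts.items()}


# Invented sample records, not data from the study.
sample = [
    (True, 3, 0.0), (True, 4, 25.0),
    (False, 2, 0.0), (False, 5, 0.0), (False, 1, 10.0), (False, 3, 0.0),
    (True, 32, 500.0), (False, 31, 0.0), (False, 35, 100.0),
]
rates = giving_rates(sample)
```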

Figure 3 - Lifetime Greeks Bearing Gifts School A (Wylie, 2007)

Figure 4 - Lifetime Greeks Bearing Gifts School B (Wylie, 2007)



The figures for Schools A and B showed a very large difference in giving rates

between Greeks and non-Greeks regardless of the number of years the alumnus had been

out of school. They further showed that lifetime giving rates of non-Greeks out of school more

than 20 years dropped off slightly, i.e., by less than 1 percent, while those for Greeks

stayed about the same.

In none of the schools did the lifetime giving rates of non-Greeks ever exceed

those of the Greeks (Wylie, 2007). In only one instance (Figure 6 for School F, alumni

out of school 5 years or less) were the participation rates the same. The difference in

lifetime participation between Greeks and non-Greeks generally widened as the time since

graduating increased, but not in every case. Sometimes the gap narrowed, but the Greeks

never fell behind non-Greeks in giving. The remaining figures of Lifetime

Participating Rates by Greeks and Non-Greeks since graduation for Schools C through F

are in Appendix C.

Wylie (2007) answered questions 3 and 4 by calculating the median lifetime

giving by age group for Greeks and non-Greeks, but for only those alumni who ever

made a gift.

3. For Greek and non-Greek donors, was there a difference in the median lifetime giving between the two?

4. How did this difference change as a function of how long they had been out of

school?

The results for Schools A and B are in Figures 5 and 6, respectively.
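
The median comparison restricts each group to alumni who ever made a gift before taking the median. A minimal sketch follows; the band labels and amounts are invented for illustration and do not come from the study.

```python
from collections import defaultdict
from statistics import median


def median_giving_among_donors(alumni):
    """alumni: iterable of (is_greek, band, lifetime_giving).
    Non-donors (giving of zero) are excluded before the median is taken."""
    gifts = defaultdict(list)
    for is_greek, band, lifetime_giving in alumni:
        if lifetime_giving > 0:
            gifts[(is_greek, band)].append(lifetime_giving)
    return {key: median(values) for key, values in gifts.items()}


# Invented sample records, not data from the study.
sample = [
    (True, "31-35", 100.0), (True, "31-35", 0.0),
    (True, "31-35", 300.0), (True, "31-35", 50.0),
    (False, "31-35", 20.0), (False, "31-35", 0.0), (False, "31-35", 60.0),
]
medians = median_giving_among_donors(sample)
```

Using the median rather than the mean keeps a few very large gifts from dominating the comparison, which matters because giving totals are typically concentrated among a small number of donors.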



Figure 5 - Median Greeks Bearing Gifts School A (Wylie, 2007)

Figure 6 - Median Greeks Bearing Gifts School B (Wylie, 2007)



These data indicated that Greeks gave more than non-Greeks and that the difference

tended to grow the longer alumni were out of school. Wylie (2007) believed this study

showed Greeks were definitely better givers than non-Greeks. He also pointed out how

important it was for institutions to look at the data in their own databases to learn about

their graduates, and put that information to use in their fundraising efforts. The results

for Schools C through F are in Appendix D.

In a study conducted jointly by the Council for Advancement and Support of

Education (CASE) and Statistical Package for the Social Sciences (SPSS), alumni records

from the Johns Hopkins Zanvyl Krieger School of Arts and Sciences (A&S) were used to

explore data mining (Krieger & Luperchio, 2009). The model tested successfully on

datasets from other educational institutions. Including A&S, the schools that volunteered

data represented a broad range of higher education and giving (Krieger & Luperchio, 2009):

• Seven universities, two colleges and one community college

• Six public and four private institutions

• Eight institutions across the United States, one in Canada, and one in England

• Five self-identified research institutions and two specializing in the liberal arts

• Alumni participation rates ranging from 3 percent to 92 percent



The analysis revealed four distinct patterns of giving by alumni as indicated in

Figure 7 (Krieger & Luperchio, 2009).

Figure 7 - Cluster Chart (Krieger & Luperchio, 2009)

Although the cluster in Figure 7 comprised schools with low lifetime

participation rates between 5 and 15 percent, each institution still had a small subset of

major donors who provided most of the total giving. The high percent of non-donors

limited the ability of the A&S model to reduce non-donor representation in the top

deciles. The few committed major donors created a strong ideal major donor profile that

captured nearly 90 percent of known major donors for schools in this cluster (Krieger &

Luperchio, 2009). The aggregated study results for the remaining clusters are contained

in Appendix A.



The A&S model was successful in developing a profile for the ideal major donor

at each institution, identifying qualified prospects for major gifts and the variables most

closely related with lifetime giving. The analysis also revealed similarities and

differences. Predictors that were most influential for one institution were completely

insignificant for another and some institutions produced stronger predictors than other

institutions, despite having fewer data variables (Krieger & Luperchio, 2009).

In his blog, Kevin MacDonnell (2010) posted 15 top predictors for annual giving.

Among those were class year, home telephone, marital status, employment, events

attended and business telephone. As in the 2009 study done by Krieger and Luperchio,

MacDonnell pointed out there was no “magic list of predictors” that worked everywhere

and always. While some variables such as ‘class year’ and ‘home telephone’ were

important, institutions must explore their own data.

Even though research indicated a correlation between attributes known about

alumni and giving, it also indicated that every situation was unique and that understanding what

individual bits of information meant to a specific institution was most important.

According to McClintock (2004), “Data mining is not just about finding individual

wealthy prospects. This data mining is about truly understanding your prospect pool. It’s

about providing knowledge that informs strategic planning – knowledge that leads to

increased fund-raising results” (p. v).

Research Methods

The data used in this project came from the Marshall University Alumni database.

Alumni in this database are graduates and former attendees of Marshall University and

any constituent, person or non-person who donated to Marshall University. For this



project, only individuals were included. The dataset contained 113,405 individuals

including both donors (23.8 percent) and non-donors (76.2 percent) and 49 variables

including graduation year, degree, major, home telephone, email, employment, event

attendance and several variables capturing previous donation behavior. Appendix B

contains a complete description of the variables.

Several tools provided functionality needed to capture and prepare the data for

modeling. The tables containing alumni information were imported from the Marshall

University Foundation, Inc. (MUFI) Oracle database into Microsoft Access, where one

table containing all needed information was created using SQL queries. The modeling

software needed variables created to represent the existence of a value for categorical

fields, such as email, employment, home telephone number, student activity, direct mail

response and fiscal year donations. These variables contained a 1 if the information was

present in the database and a 0 if it was not.
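
The presence variables described above were built with SQL queries in Microsoft Access; the same idea is shown here in Python purely as an illustration. The field names, the `has_` prefix and the rule that blank strings count as missing are assumptions made for the example.

```python
def presence_flag(value):
    """Return 1 if the field holds a usable value, 0 if it is None or blank."""
    if value is None or str(value).strip() == "":
        return 0
    return 1


# Hypothetical alumni record with a blank email and no employer on file.
record = {"email": "", "home_phone": "304-555-0100", "employer": None}

flags = {f"has_{field}": presence_flag(value) for field, value in record.items()}
```

Flags of this kind let the model use the fact that a telephone number or employer is on file without having to interpret the value itself.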

The data inspection for appropriate content and missing values revealed missing

gender values. These records received a new value based on the name prefix field where

a name prefix existed. Records with unknown data, such as birth year or age, received a

zero or null value depending on the variable data type.
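
Filling a missing gender value from the name prefix might look like the sketch below. The prefix-to-gender mapping and the codes `"M"` and `"F"` are assumptions for illustration; the source does not list which prefixes were used, and prefixes that carry no gender information (such as "Dr.") are left unresolved.

```python
# Assumed mapping; the prefixes actually present in the database may differ.
PREFIX_TO_GENDER = {"mr": "M", "mrs": "F", "ms": "F", "miss": "F"}


def fill_gender(gender, prefix):
    """Keep an existing gender code; otherwise infer one from the prefix,
    returning None when the prefix is absent or not informative."""
    if gender:
        return gender
    if not prefix:
        return None
    key = prefix.strip().rstrip(".").lower()
    return PREFIX_TO_GENDER.get(key)


examples = [
    fill_gender(None, "Mrs."),  # inferred from prefix
    fill_gender("M", "Ms."),    # existing value is kept
    fill_gender(None, "Dr."),   # prefix not informative
    fill_gender("", "Mr"),      # blank treated as missing
]
```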

Rapid Insight’s Analytics predictive modeling software provided the functionality

for producing summary descriptive statistics and data modeling. Descriptive statistics

included mean, min, max, number of records, and standard deviation. This information

provided a check of record counts for specific fields, overall counts of the dataset,

number of observations per field and variable type. The variables are

listed in Appendix B.
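
The summary statistics named above (mean, min, max, number of records and standard deviation) can be reproduced for a single numeric field as sketched below. This is not Rapid Insight's implementation: the sample values are invented, and the use of the sample (n − 1) standard deviation is an assumption, since the source does not say which form the software reports.

```python
from statistics import mean, stdev


def describe(values):
    """Summarize a numeric field, counting None entries as missing."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "min": min(present),
        "max": max(present),
        "std": stdev(present),  # sample standard deviation (assumed)
    }


# Invented graduation years with one missing value.
graduation_years = [1975, 1985, None, 1995]
summary = describe(graduation_years)
```

Comparing `n` and `missing` against the expected record count is a quick way to catch a field that failed to import or is sparsely populated.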



Results

Univariate analysis on several variables provided counts and segment information

about the dataset population. The majority of alumni resided in the Tri-State

area, consisting of West Virginia, Ohio and Kentucky. West Virginia accounted for the

largest share, 50 percent of the population, followed by Ohio with

almost 10 percent and Kentucky with almost 5 percent. Appendix M contains

percentages for the entire population. The average graduation year was 1985, and

average age on record was forty-nine.

The majority of alumni held the following degrees: Bachelor of Arts,

Bachelor of Arts in Business, Bachelor of Science and Bachelor of Science in Nursing.

Almost 64 percent of the alumni fell within less than 2 percent of the declared majors;

9.24 percent listed elementary education, followed by accounting at 4.49 percent, then

management with 4.39 percent. Appendices G and H list the complete degree and major

breakdown. Gender distribution indicated females made up 54 percent and males 45

percent of the alumni dataset. Fifty-four percent were listed as married, 12 percent as single

and, unfortunately, almost 30 percent as unknown, indicating lost information

and a missed opportunity. As Wylie (2005) reported in his article, Where the Alumni

Money Is, a study that included eight different higher education institutions, alumni listed

as married accounted for 86 percent of the total alumni dollars given. In order to present

the most complete picture of alumni, it is important to capture this most basic type of

data.

Multivariate analysis between the donor indicator variable and other

variables underscores the impact of these variables on the potential for giving. This study

of variables identified segments of the dataset likely to become indicators for giving prior

to regression modeling. This analysis highlighted the importance of age and years since

graduation in donating to the University. These results mirror the results found in the

study done by Wylie (2005), which illustrated a correlation between years since

graduation and donor giving. Figures 8 and 9 show the relationship between giving and

age, and giving and years since graduation.

Figure 8 - Giving and Age



Figure 9 – Giving and Years since Graduation

These results, showing a steady increase in giving until age 50, may reflect this

group having increased financial security and increased capacity to donate. Giving levels

off slightly around the same age before increasing again until about age 70. This

offers the opportunity for two different types of solicitations using age groups as the

guiding factor. Appendix I shows a very similar relationship between lifetime giving of

$1,000 or more and age.

The relationship between degree of record and giving to the University indicates

alumni with a Bachelor of Science degree have a likelihood to donate that was not

immediately obvious in the multivariate analysis below.



Figure 10 - Giving By Degree

Figure 10 indicates graduates with a Bachelor of Arts and Bachelor of Arts in

Business have high donor counts. However, the underlying data shown in Appendix J

indicate graduates with a Bachelor of Science degree are frequent donors relative to their

numbers. Their donor count of 2,048 is 30 percent of their overall count of 6,724. The

Bachelor of Arts and Bachelor of Arts in Business have 27 and 31 percent respectively,

but their overall counts of 29,066 and 11,001 are much higher. Here, too, the analysis

reveals information useful in framing solicitations.
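The comparison above rests on a simple ratio: donors with a given degree divided by all alumni with that degree. The sketch below recomputes that ratio from the donor sums and counts listed in Appendix J; the dictionary layout and function name are illustrative, not part of the original analysis.

```python
# (donor count, total alumni) per degree, taken from Appendix J.
DEGREE_COUNTS = {
    "Bachelor of Arts": (7819, 29066),
    "Bachelor of Arts in Business": (3428, 11001),
    "Bachelor of Science": (2048, 6724),
}


def donor_rate(donors, total):
    """Share of alumni with a degree who are coded as donors."""
    return donors / total


rates = {degree: donor_rate(d, t) for degree, (d, t) in DEGREE_COUNTS.items()}

# List degrees from highest to lowest donor rate.
for degree, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{degree}: {rate:.1%}")
```

Ranking by rate rather than by raw donor count is what brings the Bachelor of Science group into view despite its smaller population.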

A multivariate analysis of donor indicator and student activity revealed

disappointing, but expected results. The student activity code indicates alumni

participation in an activity while attending Marshall. Such activities include, among

others, belonging to University-related groups or organizations (fraternities, sororities,

student government), playing sports or being a member of the band. Figure 11 shows the

alumni coded as having participated in a student activity and whether or not they donated

anytime during the past five years.

Figure 11 - Donating and Student Activity

The columns represent alumni donors. A 1 indicates a donor and a 0 indicates a

non-donor. This figure shows that, among alumni who participated in a student

activity, fewer donated than did not donate.

Although this Marshall University student activity code includes Greek

membership as well as other student activities, these results do not reflect the results

described in Greeks Bearing Gifts (Wylie, 2007), showing a strong connection between

Greek memberships and giving. It is important to put this in the context of the overall

counts in the database. Of the 86,329 actual graduates in the dataset, only 12,567 (14

percent) have information in the database indicating they participated in a student

activity. This is another example of being unable to present a complete and accurate

picture of Marshall’s alumni, therefore missing the opportunity to use that information in

the University’s fundraising efforts.

Having evaluated the variables based on their relationship to the donor code, a

logistic regression model provided a comprehensive picture of donor giving, using Rapid

Insight’s Analytics predictive modeling software. The goal was to identify characteristics

useful in framing a donation request that would increase the chances of receiving a

donation from a non-donor. The needed target variable for this case was the donor code

indicator, referred to as a response rate, defined as a binary variable containing a 1 if an

individual ever donated to the institution and a 0 if he or she did not. The Rapid Insight

mining tool identified 10 of the 49 dataset variables related to the response rate variable at a

significance level of p = .01. The model, including variable coefficients and individual

p-values, is located in Table 1. From this analysis, it is evident which variables have a

strong impact on whether an individual will ever provide a donation to Marshall University.

In addition to age, individuals who provide their telephone, email and employment

information have a higher propensity to donate than those individuals who do not.

Variable      Coefficient   p-value
EMAIL          0.6788       0.0000
HPHONE         0.5386       0.0000
EMPLOYMENT     0.5712       0.0000
STUCODE        0.5549       0.0000
OTHCODE        0.3693       0.0000
ALUM          -1.103        0.0000
GRADYR         0.9473       0.0000
ATTEVNT10      0.7484       0.0000
ATTEVNT09      1.610        0.0000
ATTEVNT07      0.7666       0.0000

Table 1 - Donor Response Model



Also of interest are the indicators for event attendance. Event attendance data was

included for FY05 through FY10, but the analysis indicates only FY07 and FY10 were

significant. The odds ratios of 2.1525 and 2.1135, respectively, support this (see Appendix

K). This may be because the movie We Are Marshall, released in late December 2006,

greatly increased Marshall’s visibility nationwide and the momentum continued through

fiscal year 2007. Also during fiscal year 2010, the organization made a concentrated

effort to capture event attendee information and include that in the alumni database.
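In a logistic regression, the odds ratio for a variable is the exponential of its coefficient. The short sketch below applies that standard relationship to the ATTEVNT07 and ATTEVNT10 coefficients from Table 1; the results agree with the odds ratios quoted above to about three decimal places, the small difference being attributable to rounding of the published coefficients. The variable and function names are illustrative.

```python
import math

# Coefficients for the event attendance indicators, copied from Table 1.
TABLE_1_COEFFICIENTS = {"ATTEVNT07": 0.7666, "ATTEVNT10": 0.7484}


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


odds = {name: odds_ratio(beta) for name, beta in TABLE_1_COEFFICIENTS.items()}
for name, value in odds.items():
    print(f"{name}: {value:.4f}")
```

An odds ratio near 2.1 means that, holding the other model variables fixed, the odds of being a donor are roughly twice as high for individuals recorded as attending an event in that fiscal year.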

The model also scored a Bachelor of Science degree strongly, as well as majors in

accounting and journalism (Table 2). Not surprising is the low donation likelihood score

given to the residents of the state of West Virginia. However, the high score for the state of

Connecticut was unexpected since the multivariate analysis indicated first the Tri-State

area, then Ohio, as having individuals likely to make a donation. A second model, using as

the response indicator the likelihood to donate $1,000 or more in one’s lifetime, mirrors the

results seen in the donor response model (see Appendix L). The low score for West Virginia

is supported by the overall low score (-1.103) of the alumni variable indicator as a

characteristic of a donor. The state has the largest population of alumni, yet has low

donation numbers. Of the 40,979 alumni in the state, only 10,877 or 26 percent are donors,

versus 35 percent for both Ohio and Kentucky. Connecticut has significantly fewer alumni in

residence (162), but 92 of them are donors for an impressive 56 percent. When looking at

specific fiscal years and numbers of donors, West Virginia fared much better. Of the 4,708

donors in FY2008, 47 percent were West Virginia residents and in FY2010, 45 percent of the

donors were West Virginia residents. The donor response model considered all the

variables, which included a great deal more information; nevertheless, these numbers

indicate a need for further research to understand the results.

Variable                              Coefficient   p-value
Binary (DEG1, Bachelor of Science)     0.3122       0.0000
Binary (MAJ1, BBA, Accounting)         0.5806       0.0000
Binary (MAJ1, Journalism)              0.9633       0.0000
Binary (STATE, WV)                    -0.2280       0.0000
Binary (STATE, CT)                     2.1699       0.0000

Table 2 - Donor Response Model Continued

Rapid Insight provides a utility to apply the logistic regression to the entire

dataset by scoring all individuals in the dataset and identifying those who currently do not

donate, but have a high likelihood of doing so. The scoring system ranked all individuals

in the dataset between 1 and 10 to indicate a propensity to donate. This model returned

4,201 current non-donors within the first decile, indicating a high propensity to donate.

The model was developed using fifty percent of the dataset and tested on the remaining

fifty percent for accuracy. This resulted in a 76.12 percent concordance rate. The

concordance rate measures model fit. Percentages close to 100 percent indicate a nearly

perfect model.
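To make the scoring and concordance ideas concrete, the sketch below shows one common way to compute them. It is not Rapid Insight's implementation: the concordance definition used here (the share of donor and non-donor pairs in which the donor received the higher score, with ties counted as half) and the equal-sized decile split are assumptions, and the scores and outcomes are invented.

```python
def concordance_rate(scores, outcomes):
    """Share of (donor, non-donor) pairs in which the donor has the higher score.

    Ties count as half a concordant pair; this convention is an assumption.
    """
    donors = [s for s, y in zip(scores, outcomes) if y == 1]
    non_donors = [s for s, y in zip(scores, outcomes) if y == 0]
    pairs = len(donors) * len(non_donors)
    if pairs == 0:
        raise ValueError("need at least one donor and one non-donor")
    concordant = 0.0
    for d in donors:
        for n in non_donors:
            if d > n:
                concordant += 1.0
            elif d == n:
                concordant += 0.5
    return concordant / pairs


def decile_ranks(scores):
    """Rank each score from 1 (highest tenth) to 10 (lowest tenth)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = position * 10 // len(scores) + 1
    return ranks


# Invented model scores and donor outcomes (1 = donor, 0 = non-donor).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.2, 0.15, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

rate = concordance_rate(scores, outcomes)
ranks = decile_ranks(scores)
print(f"concordance: {rate:.2%}")

# Non-donors in the first decile are the prospects of interest.
prospects = [i for i, (r, y) in enumerate(zip(ranks, outcomes)) if r == 1 and y == 0]
print("first-decile non-donors:", prospects)
```

The pairwise loop is quadratic in the number of records, which is acceptable for a sketch but would need a rank-based formulation for a dataset of tens of thousands of alumni.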

Of the 4,201 non-donor records, 633 were in a dataset sent out to an external

wealth screening service to obtain a score indicating their propensity to donate a major

gift to the University. The model also returned 7,136 current donors in the first decile,

and of those, 4,839 were in the dataset sent to the screening company. The information

from the screening service clearly supports the results returned from the scoring model,

indicating it is a successful model for predicting non-donors.

Discussion and Evaluation

The use of predictive modeling offers fundraising organizations the possibility of

new donors. Studies that use information gathered about alumni to create a set of

characteristics identifying an individual or a group of individuals offer the opportunity

to streamline and focus campaigns and solicitations. Equally important, modeling can

provide an institution with the resources to target the best constituents. While studies

show that a valid predictor for one institution may not work at another institution, they

underscore the importance of using existing in-house information to make those

identifications, thus illustrating the importance of capturing that information. The

result of this project reinforces those findings and makes a strong case for the importance

of capturing this information to use in analysis. Multivariate analysis in this study

supported results seen by Wylie (2005) and illustrates the importance of age in

philanthropic contributions. Model results highlighted the connection between attending

events, such as galas, homecoming and alumni weekend, and donating. Individuals

attending these events likely have a strong relationship with Marshall University and are

more inclined to donate. Despite the limited amount of activity-related information

captured in the database, both the donor response model and the lifetime giving model

show a strong association between the propensity to donate money and attendance at the

institution’s activities. The results highlight not only different strategies for donation

requests using degrees, majors, graduation year and age, but also areas for further study

such as donations by state and donations over a span of years.

Even though records in the dataset are missing an accurate marital code, the

multivariate analysis of donor code and marital code reflects the results found by Wylie (2005).

Of the individuals coded as married, 28.9 percent are donors, as compared to single

individuals at 19.9 percent, and those coded as unknown at 14 percent.

The database does not have information such as event attendance for a large

number of events or attendees. Nevertheless, there is still a promising correlation between

event attendance and giving. For example, the 149 individuals who attended an event in

fiscal year 2010 donated $878,832.64 in the same fiscal year. This warrants further

study as there is significant giving information (amounts, dates, areas of interest,

consistent giving) which could reveal giving trends and interests previously unknown.

Additionally, it clearly underscores the need to capture additional information about

alumni. Both the multivariate analysis and the models, which used the same variables,

illustrate the need to increase efforts to capture information about alumni such as event

attendance, marital codes and connections to the University.

The results of the multivariate analysis and models support research done by

others, indicating that using the data in an organization’s own database can yield results

useful to fundraising efforts. Further, there appear to be variables across numerous

studies that consistently associate themselves with the characteristics of donors.

This study uncovered associations between donors and information known about

donors that when applied to non-donors could yield beneficial results for the University’s

fundraising efforts. However, the data also indicate a need for further study so that the

information will be used in the most efficient and successful manner.



Conclusions

While not mainstream within the field of philanthropic giving, the abundance of

research and availability of modeling software indicate that predictive modeling is a

beneficial tool for fundraising organizations. It offers the opportunity to make

informed, statistically supported business and fundraising decisions. Research supports

the use of predictive modeling as a proven means to identify the best prospects, targeting

methods, and segmentation groups. Such a powerful tool is certain to become mainstream

in the near future. As more and more philanthropic organizations utilize modeling, it will

become the preferred tool to identify new donors and fundraise more efficiently. With the

ability to discover hidden patterns and build models to predict behavior, fundraising

organizations can address issues in solicitations, campaigns, marketing and prospect

research.

The analysis and logistic regressions developed throughout this project identified

several key characteristics about donor alumni as well as areas needing improvement in

the database. This information was positive because unknown connections and

relationships became known and their potentials revealed.

When speaking of connections and relationships, the most important are the

connections and relationships established and nurtured by fundraisers with current and

prospective donors. These relationships, measured over time, vary from person to person

and fundraiser to fundraiser, and are difficult to measure statistically. However, their

importance cannot be overstated. Data mining cannot replace or duplicate this most

essential aspect of fundraising, but data mining can enhance and support that process in

such a way that both donor and institution benefit.

Future Work

The most common use of analytics is to identify characteristics useful in locating

major donors, but analytical tools are suited to other areas of fundraising. Some of these

include:

• Home in on variables that are strong indicators for current and future use

• Understand donations in relation to event attendance

• Understand how fundraisers spend their time and what activities translate into a gift

• Determine which tasks contribute to increased giving and which detract

• Predict top donors

• Predict event attendance

• Predict who will be top donors in ten years

• Discover groups using text mining

• Develop an integrated prospecting system

• Uncover giving patterns

• Model phonathon segmentation



References

Birkholz, B. (2008). Fundraising analytics. Using data to guide strategy. John Wiley &

Sons, Inc.: New Jersey.

Cios, J., Kurgan, L., Pedrycz, W. & Swiniarski, R. (2007). Data mining. A knowledge

discovery approach. Springer Science+Business Media, LLC: New York.

Han J., & Kamber M. (2006). Data mining concepts and techniques. (2nd ed.).

Morgan Kaufmann Publishers: New York.

Iwanyk, B., Nichol, J. (Producers), & Nichol, J. (Director). (2006). We Are Marshall

[Motion picture]. United States: Warner Brothers.

Khabaza, T. (2009). Hard hat area: Myths and pitfalls of data mining. Executive brief.

SPSS. Retrieved December 10, 2009, from

ftp://hqftp1.spss.com/pub/web/wp/HHAEB-0209.pdf

Kopp, S. (2010, February). Understanding our budgetary challenges. Communiqué

retrieved August 27, 2010 from

http://www.marshall.edu/president/comm/feb2010.pdf

Krieger, Z., Luperchio, D. (2009). Data mining and predictive modeling in institutional

advancement: How ten schools found success. SPSS Technical report produced

jointly with the Council for the Advancement and Support of Education (CASE) and

SPSS Inc. Retrieved January 2, 2010, from

http://whitepapers.techrepublic.com.com/abstract.aspx?docid=1125993

Larsen, B. (2009). Delivering business intelligence with Microsoft SQL Server 2008.

New York: McGraw-Hill.


Leslie, L., & Ramey G. (1988). Donor behavior and voluntary support for higher

education institutions. The Journal of Higher Education, Vol 59. No. 2 (Mar. –

Apr., 1988), pp. 115-132. Retrieved August 13, 2010 from

http://www.jstor.org/stable/1981689

McClintock, S. Foreword. (2004). Data mining for fund raisers, 2005. By Peter Wylie.

Council for Advancement and Support of Education: Washington, DC, V

MacDonnell, K. (2010, January 22). Four mistakes I have made. Retrieved

January 20, 2010, from message posted to

http://cooldata.wordpress.com/2010/01/22four-mistakes-i-have-made/

MacDonnell, K. (2010, January 6). The 15 top predictors for annual giving. Retrieved

January 20, 2010, from message posted to

http://cooldata.wordpress.com/2010/01/06/the-15-top-predictors-for-annual-giving/

Masterson, K. (2010). Private giving to colleges dropped sharply in 2009. The

Chronicle of Higher Education. Retrieved August 13, 2010, from

http://chronicle.com/article/Private-Giving-to-Colleges/63879/

Two Crows Corporation. (2005). Introduction to data mining and knowledge discovery.

(3rd ed.). [Electronic Booklet]. Potomac, MD. Retrieved February 26, 2008 from

http://www.twocrows.com/index.htm

Wylie, P. (2007). Greeks Bearing Gifts. Retrieved September 7, from

http://www.datadesk.com/products/mediadx/keydonor/Greeks_Bearing_Gifts.pdf

Wylie, P. (2005). Deep pockets. Where the alumni money is. Retrieved September 11,

2010 from http://www.datadesk.com/products/mediadx/keydonor/Deep%20Pockets.pdf


Wylie, P. (2004). Data mining for fund raisers. Council for Advancement and Support of

Education: Washington, DC.



Appendix A

A&S Study Aggregated Results (Krieger & Luperchio, 2009)



Appendix B

Marshall University Alumni Database Variables



Appendix C

Rates for Greeks and non-Greeks Schools C – F (Wylie, 2007)



Appendix D

Median Dollars for Greeks and non-Greeks Schools C – F (Wylie, 2007)



Appendix E

Giving By Marital Status and Class Year Schools B – E (Wylie, 2005)



Appendix F

Giving by Marital Status and Class Year Schools F-H (Wylie, 2005)



Appendix G

Degree Breakdown (Marshall University, 2009)



Appendix H

Major Breakdown (Marshall University, 2009)



Appendix I

Lifetime Giving and Age (Marshall University, 2009)



Appendix J

Donor Indicator and Degrees (Marshall University, 2009)

DEG1                             Y-variable Mean   Y-variable Sum   Count
Bachelor of Applied Science      0                 0                9
Bachelor of Arts                 0.26901           7819             29066
Bachelor of Arts in Business     0.31161           3428             11001
Bachelor of Engineering Scienc   0.5377            164              305
Bachelor of Fine Arts            0.12006           85               708
Bachelor of Science              0.30458           2048             6724
Bachelor of Science in Chemist   0.16129           5                31
Bachelor of Science in Cytotec   0.11765           2                17
Bachelor of Science in Enginee   0.28571           2                7
Bachelor of Science in MedTech   0                 0                25
Bachelor of Science in Nursing   0.20232           331              1636
Bachelor of Social Work          0.09365           28               299



Appendix K

Donor Response Model (Marshall University, 2009)



Appendix L

Likelihood Model (Marshall University, 2009)



Appendix M

Distribution of Alumni by State (Marshall University, 2009)
