Center for African Studies The MasterCard Foundation Scholars Program Database Project Final...
-
Upload
cleopatra-williams -
Category
Documents
-
view
213 -
download
0
Transcript of Center for African Studies The MasterCard Foundation Scholars Program Database Project Final...
Center for African StudiesThe MasterCard Foundation Scholars ProgramDatabase Project Final Presentation
PRESENTED BY TEAM 3MIN LIN
MINCHAO LINTIAN LIUYIFAI NG
HAOCHEN SHANYIGANG WANG
GUO YUYING YANGYE ZHONG
Client Overview• UC Berkeley Center for African Studies (CAS)
• The MasterCard Foundation Program• Provide comprehensive support for Education• For economically disadvantaged students from developing countries in Africa • Financial, academic, social, and career counseling • 2012-2020, 25 new students per year
• Client Need• A new database help better manage the program• Specialized in tracking financial transactions• Also store data and information related to the program• Financial, academic, social, career
Project ReviewDP I
• Familiarized with clients and understood their needs
• Created 1st version of Simplified EER diagram
• Set up schedule for the project
DP II
• Expanded and Revised EER diagram
• Conceptualized 5 queries• Developed Relational Schema
DP III
• Finalized EER Diagram• Created database in Access and
implemented relations into Access
• Altered Queries and developed SQL code
Final Presentation
• Implemented Queries into Access • Modified relational Schema• Did Normalization Analysis
Relational Schema1. Person (PID, Lname, Fname, MI, Birth_Date, Nationality, Gender, Primary_Phone_No, Address)
1a. Alumni (Alumini_PID1, Occupation, Company, Degree, Graduation_Date, Admission_Date)
1b. Student (Student_PID1, SID)
1bb. Pre_Student (Student_PID1, Financial_Background, SAT_Score, TOFEL_Score, High_School_GPA, No_HighSchool_Honor, No_HighSchool_Awards, HID9)
1bc. Student_current_status (SID, Mentor_Employee_PID1, Class_Level, Expected_Graduation_Date, App_Date, College_GPA)
1c. Employee (Employee_PID1, Position)
2. TransitionAccount (TID, Amount, Time, Year, Month, Date, PC_ID3)
2a. Withdrawal (W_TID2)
2aa. Check (Check_TID2, Description, Responsible_Person, Account_No)
2aaa. Account_Detail (Account_No, Routing_No)
2ab. Bearbuy (Bear_TID2 )
2ac. Cash (C_TID2, Responsible_Person)
2ad. BlueCard (BC_TID2)
2b. Deposit (D_TID2)
3. Personal_Card (PC_ID, Year, Month, Day, Time, Amount, Reimbursed, Employee_PID1)
4. Expense (EID, Amount, Year, Month, Day, Description, PMID5)
4a. InternalExpense (I_EID4) 4ab. Tuition (T_EID4, Semester, Degree_Class)
4ac. CourseMaterial (C_EID4, Course_Name, ISBN)4b. ExternalExpense (E_EID4, OID5) 4ba. Summer_Winter_Housing (SWH_EID4, Address, Start_Date, Finish_Date) 4baa. Summer_Winter_Housing_Timeline (Start_Date, Finish_Date, Duration) 4bab. Housing_Information(Address, Landlord, Agent,Contact_Phone_No)4bb. External_Course_Material (EC_EID4, Course_Name, ISBN) 4bc. SEVIS_Fee (S_EID4) 4bd. Travel_Airline_Ticket (TAT_EID4, Flight_No, Airline_Name) 4bda. Ticket_Detail (Flight_No, Airline_Name, Date, Departure_Location, Destination, Class) 4be. Office_Supplies (O_EID4, Supply_Name, Quantity, Lead_Time, Price, Discount)
5. Order (OID, Total_Amount, Description, Product_Name, Quantity, Price, Date,Req_Student_PID1)
5a. OnlineOrder (OOID5, Tracking_No, Discount, OSID)
5b. NormalOrder (NOID5)
6. Item (NOID5, IID, Iname, Description)
7. OnlineShop (OSID7, Website, Shipping_Method)
7a.Website_Email_Address (Website, Supplier_Email_Address ,Description)
Relational Schema (Continue)8. Course (CID, Semester, Professor, Final_Date, Description, Ctitle) 8a. GE (GE_CID8, Category) 8b. LowerDivision (L_CID8) 8c. UpperDivision (U_CID8) 8d. Sections(CID, Section_Number) 8e. Ctitle (CID, Ctitle)9.High_School(HID, Name, School_Type, Year_Founded, Relidious_Affiliation, Academic_Calender, Setting, Student_Faculty_Ratio, Total_Enrollment_No, Gender_Ratio, College_Enrollment_Rate)10.Job (JID, Position, No_Employee, Start_Date, Finish_date, Work_On_Alumini_PID1, Work_on_Student_PID1) 10a.Job_Detail (Position, Job_function, Location, Salary, Major_Req, Skill_Need, Description, Hour_Per_Week, Language_Need, Visa_Req) 10b.Job-Timeline (Start_Date, Finish_date, Duration) 10c. Full_1Time (F_JID10, Degree_Level, Exp_Req, Pre_I_JID10c) 10d. Part_Time (P_JID10, Degree_Level, Exp_Req) 10e. Internship (I_JID10, Class_Level, Referrer)11.Company (Company_ID, Company_Name, Description, Company_Type, Contact_Phone_No, Size, Industry_Type, Email_Address, Website, City, Street, Zip_Code)12.Country (Name, No_Student)13.Event (Event_ID, title, Start_Date, Finish_date, Duration, No_People, Sponsor, City) 13a. CAS_Event (CAS_Event_ID13, Description, Cost) 13b. Other_Event (O_Event_ID13, Description) 13ba. Recruiting_Event. (R_Event_ID13, Industry_Type, Job_Class, Major_Preferred)
14. Grade (Student_PID1, CID8, Grade_Option, Grade, Semester)
15. Survey (Survey_ID, title, date, description)
15a. Alumni_Survey (A_Survey_ID15, Alumini_PID1)
15b. Student_Survey (S_Survey_ID15, Student_PID1)
N to N Relationships:
16.Withdrawal_Pay_Expense (W_TID2, EID4)
17.Employee_Check_Inventory (Employee_PID1, OID5)
18.Student_Take_Course (Student_PID1, CID8, Grade, Grade_Option, Semesters)
18a. Ctitle(CID8, Ctitle)
19.Student_Participate_Event (Student_PID1, Event_ID13)
20.Company_Provide_Job (Company_ID11, JID10)
21.Company_Participate_Recuriting_Event (Company_ID11, Recuriting_Event_ID13)
22.Company_Located_Country (Company_ID11, Country_Name12)
23.Event_Located_Country (Event_ID13, Country_Name12)
Multivalue:
24.Person_Email (PID1, E-mail_Address)
25.Alumni_Major (Alumni_PID1a, Major)
26.Student_Major (Student_PID1b, Major)
27.Course_Midterm_Date (CID8, Year, Month, Day)
Decomposing to 1NF and 2NFCourse (CID, Semester, Professor, Final_Date, Section_Number, Description, Ctitle)
Course (CID, Semester, Professor, Final_Date, Description, Ctitle)Sections(CID, Section_Number)
To 1NF
Student_PID CID Grade Grade_Option Semester Ctitle
To 2NF
Student_Take_Course (Student_PID, CID, Grade, Grade_Option, Semesters)CourseTite(CID, Ctitle)
Student_Take_Course
Decomposing to 3NF
Check_TID2 Description Responsible_Person Account_No Routing_No
Check
To 3NF
Check (Check_TID2, Description, Responsible_Person, Account_No)Account_Detail (Account_No, Routing_No)
Decomposing to BCNF
CID Semester Professor Final_Date Description Ctitle
To BCNF
Course (CID, Semester, Professor, Final_Date, Section_Number, Description)Ctitle (CID, Ctitle)
Course
Query 1: Demand Forecasting and EOQ
Forecast student’s demands for each type of orders in the future based on historical records, then use this data to determine the best order quantity and time interval to place orders on online shops.
1. Well understand the students’ need for each type of supplies on a timely basis.2. Predict the order period and amount so that CAS could plan ahead.
Object
Benefits
Query 1: Demand Forecasting - SQLSELECT Order.ProductName AS Product, sum(Order.Quantity) AS SepTotalQuantityFROM [Order]WHERE Order.Date like "8/*/2013”GROUP BY Order.ProductName;
Step 1: Extract the Data from Access. Get the order quantity of a specific product over a period.
Sample output: A 3*3 matrix include all the product’s order quantity for a Specific period of a specific year
Query 1: Demand Forecasting - ProcessStep 2: Calculate the seasonal factor and monthly demand forecast with a calculator program wrote by Java.
Step 3: Put the result back into SQL to get the Economic order quantity (EOQ) model.
Query 1: EOQ - SQLSELECT DISTINCT Order.ProductName, IIF(Item.Quantity=0,“Yes”,“No”) AS StockOut,
Round(Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))) AS OptimalOrderQ,
IIf(OnlineOrder.ShippingTime>0,Round((OnlineOrder.ShippingTime)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price)),0) AS ReorderPointQuantity,
Round(((Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS OrderCycleDays, OnlineShop.Website AS Website, Order.Date AS OrderDate,
Order.Date +Round(((Sqr(2*(OnlineOrder.ShippingFee) *(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS NextOrderDate
FROM Order, OnlineOrder, OnlineShop, ItemWHERE Order.OID=OnlineOrder.OID AND OnlineOrder.OSID=OnlineShop.OSID AND Item.OID=Order.OIDORDER BY Order.Date;
Check whether the inventory is stocked out or not
Calculate optimal order quantity
Calculate Reorder point
Calculate order cycle and dates
Calculate Next order date
Query 2: Academic Performance
What are the factors that affect students’ academic performance (i.e. GPA) and to what extent?Classes, parents’ education level, Traveling Expense, Course material fee, etc.
1. Understand factors that may affect students’ performance2. Wisely Allocate the fund according to the analysis3. Help student get more successful by planning events accordingly
Object
Benefits
Query 2: Academic Performance - SQL SELECT Student.SID,1.714+0.589*A.Indicator-0.00632*IIf(Student.Degree=“Undergraduate”,1,0)+0.0165*IIf(Student.Gender=“Male”,1,0)+0.000644*Student.SATScore-0.0147*B.NumberEventAttend+0.0528*IIF(Student.Research=“Yes”,1,0) AS ExpGPAFrom Student, (SELECT Student.SID,Count(internship.JID) AS Indicator FROM Student, Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID Group BY Student.SID Union Select Student.SID,0 From Student Where Student.PID NOT IN(SELECT student.PID From Student,Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID))As A,
Select specific variable and combine with coefficients to obtain result by linear regression model
Determine whether a student have ever attend any internships or not (binary variable)Defined as table A
Query 2: Academic Performance – SQL (Cont.) (SELECT Student.SID,Count(StudentParticipateEvent.EventID) AS NumberEventAttend From Student,StudentParticipateEvent Where Student.PID=StudentParticipateEvent.PID Group BY Student.SID) AS BWhere A.SID=Student.SID AND B.SID=Student.SID;
Determine how many events a student have attend (numerical value)Defined as table B
Step 2: Run linear regression over all the variables, then use Akaike Information Criterion to reduce the model to the most efficient model.
Implement with R
Query 3: High School Comparison
Is there significant difference between each aspect of two high schools?
1. Compare the quality of two high schools based on the records of admitted students. 2. Help with decision making when comparing applicants with similar qualifications from the perspective of their high school strengths
Object
Benefits
Query 3: High School Comparison - SQLSELECT Student.PID, Student.HID, Student.CollegeGPA, Student.SATScore, Student.TOFELScore, Student.[HighSchoolEvents#], Student.[HighSchoolAward#]FROM StudentWHERE (((Student.HID)=1)) OR (((Student.HID)=2));
Step 1: Extract the data from Access by SQL
Step 2: Calculate the mean of each corresponding category of all admitted students from these two high schools, then use t-test with unequal variances to get the p-value
Implement with Excel
Query 3: High School Comparison - OutputStep3: Use Holm-Bonferroni method to find out if each difference is significant.
Implement with MATLAB.
[corrected_p, h]=bonf_holm([0.38 0.414 0.0513 0.334 0.257] ,0.5) corrected_p = 1.0020 0.7600 0.2565 1.0280 1.0280h = 0 0 1 0 0
Step4: Sample output from MATLAB
Query 4: Category Expense
What is the distribution of expense on four major categories of external expenses (i.e. Course Material, Travel, Office Supply and Housing, ) that is paid by the program? Generate distribution chart and use statistical tools to analyze these distributions.
1. Track expenses related to students.2. Decide the expense constraints for students on each category.3. Generate clear expense report, could be included in the annual report for the MasterCard Foundation.
Object
Benefits
Query 4: Category Expense - SQLStep 1: Find the total expense of each student
SELECT Student.PID, Sum(Expense.Amount) AS AmountOfSumFROM (OfficeSupply INNER JOIN Expense ON OfficeSupply.EID = Expense.EID) INNER JOIN Student ON Expense.PID = Student.PIDGROUP BY Student.PIDORDER BY Student.PID;
SELECT Count([OfficeSupply Query].PID) AS CountOfPID, Partition([AmountOfSum],0,1100,100) AS Expr1FROM [OfficeSupply Query]GROUP BY Partition([AmountOfSum],0,1100,100);
Step 2: Generate the data for histogram
Step 3: Use Report function to generate graphs, and use Access toolbox to generate the Mean, and Standard Deviation of the distribution. Step 4: If, in most cases, the distribution is bell-distributed, we could use 68-95,99.7 rule, aka Three-sigma rule, to set up expense constraints for students.
Query 4: Category Expense - Output
Results after the second stepCount the number of PID to generate a histogram
Query Results after the first step& The toolbox that could be used to calculate average and variance quickly
Query 5: Alumni & Employment
What are the relationships among an alumni’s employment status, alumni’s GPA, alumni’s education level, number of events an alumni has attend etc?
1. Based on the analysis result, help current students find jobs.2. Find out the most important factor that affect a student’s future employment.
Object
Benefits
Query 5: Alumni & Employment - SQLSELECT Alumni.PID, IIF(Alumni.Degree="Undergraduate",1,0) AS Degree,count(StudentParticipateEvent.EID) AS NumEventAttend, IIF(AlumniMajor.school_of_medicine = "Yes", 1, 0) As Sch_Medicine,IIF(AlumniMajor.school_of_law = "Yes", 1, 0) As Sch_Law, IIF(AlumniMajor.college_of_engineering = "Yes", 1, 0) As Sch_Engi, IIF(AlumniMajor.school_of_optometry = "Yes", 1, 0) As Sch_OptIIF(AlumniMajor.college_of_natural_resource = "Yes", 1, 0) As Sch_Nat,IIF(AlumniMajor.college_of_letter_science = "Yes", 1, 0) As Sch_Science, IIF(AlumniMajor.school_of_information = "Yes", 1, 0) As Sch_Inf,IIF(AlumniMajor.school_of_social_welfare = "Yes", 1, 0) As Sch_welfare, IIF(AlumniMajor.haas_business_school = "Yes", 1, 0) As Sch_has, Company.c_latitude, Company.c_longitudeFROM Alumni, StudentParticipateEvent, AlumniMajor, Company, FullTimeWHERE FullTime.JID = Job.JID AND Company.CompanyID=Job.CompanyID AND Job.PID=Alumni.PID AND Alumni.PID=StudentParticipateEvent.PID AND AlumniMajor.PID=Alumni.PID;
Step 1: Extract the data from Access by SQL
Query 5: Alumni & Employment - ProcessStep 2: Fitting Logistic Regression Model with R
• this model will give the result of the probability for predicted variable to be 1 ( which means this person will get employed or not)
• Potentially 20 candidate models
Step 3: Model Selection• Cross-Validation
Employment = GPA + Event + Degree • AIC ( Akaike Information Criterion)
Employment = GPA + Event + Degree + School of Information• BIC ( Bayesian Information Criterion)
Employment = GPA• Deviance Selection
Employment = GPA + Event + Degree
Query 5: Alumni & Employment - ProcessStep 4: Cut-off Selection
• Find a cut of the predicted probability which will let us judge if the predicted value is 1 or 0
• Method: Building Confusion Matrix• Choosing a cut off probability first • Using confusion Matrix to find the best cut off
• Base on the result we choose 0.58
Query 5: Alumni & Employment - Process
Step 7: Creating XML concatenate with KML plot Alumni’s Company on Google Earth
• Creating an KML plot for Alumni who graduated from different college• Plot those coordinates on Google earth• Get intuition employment status geographically for each college in UC
Berkeley, which will give us an intuition where has higher employment rate for corresponding college’s current student.
• Example for School of Medicine
Future Work & Improvements
• Create forms & reports• Make our database more user-friendly
• Create additional queries: • Other useful Queries• Such as monthly balance check