INTRODUCTION | DP SUMMARIES | QUERIES | NORMALIZATION | FORMS
International Student College Experience Enhancement Program
Team Members: Alice Zhang, Florence Liao, Huan Guo, Jake Magner, Li Shubin, Viraj Mohan, Zahin Ali
Project Background
Project Objective: To design a database for a website that helps international students with various aspects of "settling in" by providing a platform for interaction between students, local communities, cultural organizations, and employers.
Client: XiYiRen, a start-up social utility website, will be using a small part of our expansive project, focusing on Chinese students.
DP I Summary
Progress
• Project background: objective and client description
• Summary of entities involved
• Database capabilities
• Simplified EER diagram with 10 entities, 3 weak entities/relationships, and a superclass/subclass division
DP II Summary
Progress
• Revised simplified EER diagram, including more entities and 30 relationships
• Implementation of queries in relational algebra
• Realized the need for more complex queries utilizing IEOR methods: forecasting, optimal event location, etc.
DP III Summary
Progress
• Revised simplified EER diagram
• Relational schema
• Five queries implemented in SQL and Access
• Focus on client-centric queries
EER
Relational Schema (an attribute written as Name→N is a foreign key referencing relation N)
1. Person(Pid, Fname, Lname, MI, Birth_date, Profile→5)
2. Student(Pid→1, Housing→7, University→14, Pickup_Person→3, Flight, Country→11, price_preference, year, sleep, wakeup, study, friends, outgoing)
3. Community_Member(Pid→1, occupation)
4. Alumni(Pid→1, Class, Occupation, Donation_Amount)
5. Profile(Profile_id, Pic, Email, Phone)
6. Location(Street, City, State, Apt_Suite, Zip, x, y)
7. Housing(Hid, offered_by_person→1, Street→6, Apt_Suite→6, Zip→6, offered_by_org→8, org_profile→5, price, availability_date, furnished, number_rooms, number_bathrooms, water, electricity, garbage, gas, internet, move_in_special)
8. Organization(OrgName, Profile_id→5, Street→6, Apt_Suite→6, Zip→6, type, description)
9. Department(DepName, University→14)
10. Event(EventName, Profile_id→5, Street→6, Apt_Suite→6, Zip→6, description, attendance, date, time)
11. Country(Name, Capital, Population)
12. Language(Name, Countries_spoken_in)
13. Resource(Rid, Owner→1, Price, Quantity)
14. University(Name, student_population, ranking)
15. Donation(Did, Amount, Time, Date, Pid→1)
Relational Schema (contd.)
16. Mentors(Mentor→1, Mentee→2)
17. Student_University(Student→2, University→14)
18. Person_in_Org(Person→1, OrgName→8, OrgProfile→5)
19. RSVP(Person→1, EventName→10, EventProfile→5, SurveyScore)
20. Student_in_Department(Student→2, DepName→9, UniName→14)
21. Person_speaks_language(Person→1, Language→12)
22. Housing_near_Uni(Housing→7, UniName→14)
23. Organization_University(OrgName→8, OrgProfile→5, UniName→14)
24. Org_holds_event(OrgName→8, OrgProfile→5, EventName→10, EventProfile→5)
25. Org_speaks_Language(OrgName→8, OrgProfile→5, Language→12)
26. Org_Country(OrgName→8, OrgProfile→5, Country→11)
27. Dep_sponsors_event(DepName→9, UniName→14, EventName→10, EventProfile→5)
28. Event_speaks_language(EventName→10, EventProfile→5, Language→12)
29. Event_country(EventName→10, EventProfile→5, Country→11)
30. Country_Language(Country→11, Language→12)
31. Alumni_Uni(Pid→4, UniName→14, class_of)
32. Alumni_Dept(Pid→4, DepName→9)
33. Person_gives_donation(Pid→1, Did→15)
34. Roommates(Pid1→1, Pid2→1)
Relational Design
Query 1: Roommate Matching
Description
• Shows all possible roommate combinations, ordered by MatchRating.
• A dorm or off-campus housing facility can use it to pair up students interested in its housing.
Description of Attributes (each on a scale of 1-5)
Sleep: bedtime, early (1) to late (5)
Wakeup: wake-up time, early (1) to late (5)
Outgoing: level of outgoingness (1-5)
Study: preferred place to study, in room (1) to library (5)
Friends: having friends over in the room, never (1) to always (5)
SQL Code (MatchRating is a weighted sum of absolute differences, so a lower value means a closer match; each pair is listed once, best match first)
SELECT P.Fname, P.Lname, Q.Fname, Q.Lname,
       0.2*Abs(S.sleep-R.sleep) + 0.2*Abs(S.wakeup-R.wakeup) + 0.2*Abs(S.outgoing-R.outgoing) + 0.2*Abs(S.study-R.study) + 0.2*Abs(S.friends-R.friends) AS MatchRating
FROM Student AS S, Student AS R, Person AS P, Person AS Q
WHERE S.pid = P.pid AND R.pid = Q.pid AND R.pid < S.pid
ORDER BY 0.2*Abs(S.sleep-R.sleep) + 0.2*Abs(S.wakeup-R.wakeup) + 0.2*Abs(S.outgoing-R.outgoing) + 0.2*Abs(S.study-R.study) + 0.2*Abs(S.friends-R.friends);
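The pairing logic can be sketched outside the database as well. This is a minimal Python illustration, assuming each student is given as a hypothetical name mapped to a tuple of the five 1-5 scores (sleep, wakeup, outgoing, study, friends). It follows the intent described above: every unordered pair appears once, each attribute carries a weight of 0.2, and a lower rating means a closer match, so pairs are listed best first.

```python
from itertools import combinations

# Equal weight on each of the five lifestyle attributes, as in the SQL.
WEIGHT = 0.2


def match_rating(a, b):
    """Weighted sum of absolute differences; lower means more compatible."""
    return sum(WEIGHT * abs(x - y) for x, y in zip(a, b))


def rank_roommate_pairs(students):
    """students: dict of name -> (sleep, wakeup, outgoing, study, friends).

    Returns (name1, name2, rating) for every unordered pair, best match first.
    """
    pairs = [
        (p, q, match_rating(students[p], students[q]))
        for p, q in combinations(sorted(students), 2)
    ]
    return sorted(pairs, key=lambda t: t[2])


if __name__ == "__main__":
    # Hypothetical survey answers for three students.
    demo = {
        "Ana": (1, 1, 1, 1, 1),
        "Bo": (1, 2, 1, 1, 1),
        "Cy": (5, 5, 5, 5, 5),
    }
    for p, q, r in rank_roommate_pairs(demo):
        print(f"{p} & {q}: {r:.1f}")
```

A housing office would then read pairs from the top of the list and skip any pair containing a student who has already been placed.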
Query 2: New Student Forecasting
Description
• Extracts how many new students arrive each year; these counts can then be used to forecast future numbers of students.
• The year table is a one-attribute table containing a list of years.
• Uses the regression equation $Y = a + bX$ with slope $b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$ and intercept $a = \frac{\sum Y - b\sum X}{N}$, where N is the number of tuples, X is the year, and Y is the number of students.
SQL Code
SELECT y.year AS [Year], Count(s.pid) AS Number_Of_Students, u.name AS University
FROM [year] AS y, student AS s, university AS u
WHERE s.year = y.year AND s.university = u.name
GROUP BY y.year, u.name
ORDER BY y.year;
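The regression step can be sketched directly from the sums above. This minimal Python version assumes the (Year, Number_Of_Students) rows for one university have already been pulled out of the query result; the function names and the sample counts are hypothetical.

```python
def fit_linear_trend(years, counts):
    """Least-squares fit of Y = a + bX using the closed-form sums."""
    n = len(years)
    sum_x = sum(years)
    sum_y = sum(counts)
    sum_xy = sum(x * y for x, y in zip(years, counts))
    sum_x2 = sum(x * x for x in years)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n  # intercept
    return a, b


def forecast(years, counts, target_year):
    """Predicted number of students for target_year on the fitted line."""
    a, b = fit_linear_trend(years, counts)
    return a + b * target_year


if __name__ == "__main__":
    # Hypothetical yearly counts for one university.
    years = [2010, 2011, 2012, 2013]
    counts = [10, 12, 14, 16]
    print(fit_linear_trend(years, counts))
    print(forecast(years, counts, 2014))
```

Because X is the raw year, the intercept is a large negative number; only a + bX for years near the data is meaningful.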
Query 3: Event Interest
Description
• Outputs a list of all events that have RSVPs, with their attendance rate (attendance divided by the number of RSVPs), the average surveyed student interest, and an interest metric that multiplies the two.
• Organizations holding events with low attendance but high survey scores may need to consider changing venues or increasing advertising.
SQL Code
SELECT e.EventName, e.Attendance/Count(r.person) AS Attendance_Rate, Avg(r.SurveyScore) AS Surveyed_Interest, Avg(r.SurveyScore)*e.Attendance/Count(r.person) AS Interest_Metric
FROM Event AS e, RSVP AS r
WHERE r.EventName = e.EventName AND r.EventProfile = e.Profile_id
GROUP BY e.EventName, e.Profile_id, e.Attendance
ORDER BY Avg(r.SurveyScore)*e.Attendance/Count(r.person) DESC;
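The three computed columns can be checked by hand with a small Python sketch. It assumes each event is given as a hypothetical name mapped to (attendance, list of RSVP survey scores); events without RSVPs are skipped, which is what an inner join between Event and RSVP does.

```python
def event_interest(events):
    """events: dict of event name -> (attendance, list of RSVP survey scores).

    Returns (name, attendance_rate, surveyed_interest, interest_metric),
    highest interest metric first.
    """
    rows = []
    for name, (attendance, scores) in events.items():
        if not scores:  # no RSVPs: the inner join drops the event
            continue
        rate = attendance / len(scores)  # attendance per RSVP
        interest = sum(scores) / len(scores)  # average survey score
        rows.append((name, rate, interest, interest * rate))
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical events: 10 RSVPs each.
    demo = {"Mixer": (5, [4] * 10), "Career Fair": (9, [3] * 10)}
    for row in event_interest(demo):
        print(row)
```

In this made-up data the Mixer has the higher survey score but half its RSVPs did not attend, which is the "high interest, low attendance" pattern the description flags.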
Query 4: Optimal Event Location
Description
• Selects the optimal potential event location on the UC Berkeley campus relative to attendees' housing locations.
• Uses a P-median approach, which chooses event locations that minimize the total demand-weighted distance.
• Assumes P = 1 and calculates $d_{ij}$, the distance from housing location i to candidate site j, with the Euclidean distance formula: $d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
SQL Code
SELECT e.EventName, l2.street AS Potential_Location, Sum(((l.x-l2.x)^2 + (l.y-l2.y)^2)^0.5) AS distance, Avg(s.EventInterest) AS Demand
FROM Student AS s, RSVP AS p, Housing AS h, Location AS l, Location AS l2, Event AS e
WHERE s.PID = p.person AND p.EventName = e.EventName AND p.EventProfile = e.Profile_id AND s.housing = h.hid AND h.street = l.street AND h.state = l.state AND h.city = l.city AND h.apt_suite = l.apt_suite AND h.zip = l.zip
GROUP BY e.EventName, l2.street
ORDER BY e.EventName, Sum(((l.x-l2.x)^2 + (l.y-l2.y)^2)^0.5);
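With P = 1, the P-median problem reduces to trying every candidate site and keeping the one with the smallest total demand-weighted Euclidean distance. The Python sketch below shows that objective; the demand weights (for example, a student's interest in the event) and all coordinates and site names are hypothetical inputs, not values from the project database.

```python
import math


def best_single_site(demand_points, candidates):
    """1-median: choose the candidate j minimizing sum_i w_i * d_ij.

    demand_points: list of (x, y, weight) for attendee housing locations.
    candidates: dict of site name -> (x, y).
    Returns (site name, total weighted distance).
    """
    def total_cost(site):
        sx, sy = candidates[site]
        # d_ij is the Euclidean distance between housing i and site j.
        return sum(w * math.hypot(x - sx, y - sy) for x, y, w in demand_points)

    best = min(candidates, key=total_cost)
    return best, total_cost(best)


if __name__ == "__main__":
    # Hypothetical housing locations with weights, and two candidate venues.
    housing = [(0, 0, 1), (10, 0, 5)]
    venues = {"Hall A": (0, 0), "Hall B": (10, 0)}
    print(best_single_site(housing, venues))
```

Weighting matters here: Hall B wins because five units of demand sit next to it, even though each hall is the same distance from the other housing point.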
Query 5: Min Airport Pick-up Cost
Description
Assumptions:
(1) Only students who arrive at the airport between 8:00 am and 7:59 pm are taken into account.
(2) Buses leave the airport on the hour.
(3) The opportunity cost of each student waiting an hour for a bus is $10.
(4) Each type I bus has 5 seats and each type II bus has 10 seats.
(5) Only the arrival hour of each student is used (a student arriving at 1:01 pm is treated the same as one arriving at 1:59 pm in this implementation).
(6) A round trip to the airport costs $50 for a five-seat vehicle and $100 for a ten-seat vehicle.
• For each date and airport, extract the number of students $C_i$ arriving in each time interval, $A \le i \le L$, where $C_i$ is the number of students arriving at the airport no earlier than (i-1) o'clock but before i o'clock.
SQL Code
SELECT s.airport AS Airport, s.arr_date AS Arr_Date, s.flight_arr_hour AS Arr_Time, COUNT(*) AS Number_of_Students
FROM student AS s
GROUP BY s.flight_arr_hour, s.arr_date, s.airport;
Formulation
Decision variables: $t_{ij} = 1$ if a type j bus is arranged to pick up students at i o'clock, and $t_{ij} = 0$ otherwise (for $A \le i \le L$, $1 \le j \le 2$).
Objective function (cost minimization):
Subject to: People_constrain {Z in A,B,C,D,E,F,G,H,I,J,K,L}:
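The formulation's objective and constraints are not spelled out here, so the following is not the team's integer program. It is a minimal dynamic-programming sketch of one reading of assumptions (2) to (6): each hour at most one bus of each type departs (matching binary $t_{ij}$), a student who misses the bus at hour i waits one more hour at $10, nobody pays waiting cost for the first departure after landing, and everyone must be picked up by the last departure. The names `min_pickup_cost`, `BUS_TYPES`, and `WAIT_COST` are hypothetical.

```python
import math
from functools import lru_cache
from itertools import product

# Assumed bus types from (4) and (6): (seats, round-trip cost in $).
BUS_TYPES = ((5, 50), (10, 100))
WAIT_COST = 10  # $ per student per extra hour of waiting, from (3)


def min_pickup_cost(arrivals, bus_types=BUS_TYPES, wait_cost=WAIT_COST):
    """arrivals[k] = C_i for the k-th hourly departure (i = A, B, ..., L).

    Returns the minimum total cost of buses plus waiting under the
    assumptions above, or math.inf if no schedule picks everyone up.
    """
    n = len(arrivals)

    @lru_cache(maxsize=None)
    def best(k, waiting):
        # `waiting` students are left over before hour k's arrivals join them.
        if k == n:
            return 0 if waiting == 0 else math.inf
        waiting += arrivals[k]
        result = math.inf
        for use in product((0, 1), repeat=len(bus_types)):  # t_ij for this hour
            seats = sum(u * s for u, (s, _) in zip(use, bus_types))
            cost = sum(u * c for u, (_, c) in zip(use, bus_types))
            left = max(waiting - seats, 0)
            # Students left behind wait one more hour for the next departure.
            result = min(result, cost + wait_cost * left + best(k + 1, left))
        return result

    return best(0, 0)


if __name__ == "__main__":
    # Hypothetical counts for two consecutive hours at one airport.
    print(min_pickup_cost([6, 4]))
```

For [6, 4] the cheapest plan under these assumptions sends a five-seat bus in both hours and lets one student wait an hour ($50 + $10 + $50), which beats a single ten-seat bus in either hour.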
Normalization Analysis: 1NF
R is in 1NF if the domain of every attribute includes only atomic (simple, indivisible) values and the value of any attribute in a tuple is a single value from the domain of that attribute.
Profile(Profile_id, Pic, Emails, Phones), in which Pic, Emails, and Phones can hold several values per profile, is decomposed into:
Pic(Profile_id, Pic)
Email(Profile_id, Email)
Phone(Profile_id, Phone)
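The 1NF decomposition amounts to turning each multivalued attribute into one (Profile_id, value) row per value. A minimal Python sketch, assuming profiles arrive as hypothetical dicts whose multivalued fields are lists:

```python
def to_1nf(profiles, attr):
    """Split a multivalued attribute into (Profile_id, value) rows.

    profiles: dict of Profile_id -> dict whose values may be lists.
    Returns one row per value, which is the shape of the Email/Phone/Pic relations.
    """
    rows = []
    for profile_id, fields in profiles.items():
        for value in fields.get(attr, []):
            rows.append((profile_id, value))
    return rows


if __name__ == "__main__":
    # Hypothetical profiles with several emails and phones.
    demo = {
        1: {"Emails": ["a@x.edu", "a@gmail.com"], "Phones": ["555-0100"]},
        2: {"Emails": ["b@x.edu"], "Phones": []},
    }
    print(to_1nf(demo, "Emails"))  # rows for the Email relation
    print(to_1nf(demo, "Phones"))  # rows for the Phone relation
```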
Normalization Analysis: 2NF
R is in 2NF if R is in 1NF and every nonprime attribute A in R is fully functionally dependent on the primary key of R.
Location(Street, City, State, Apt_Suite, Zip, x, y)
Assumption: Zip determines City and State, so (Street, Apt_Suite, Zip) is a key and City and State depend on only part of it (Zip).
Location1(Street, Apt_Suite, Zip, x, y)
Zip(Zip, City, State)
Organization(OrgName, Profile_id→5, Street→6, Apt_Suite→6, Zip→6, type, description)
Assumption: The name of an organization determines its type, so type depends on only part of the key (OrgName, Profile_id).
OrgName(OrgName, Type)
Organization1(OrgName, Profile_id→5, Street→6, Apt_Suite→6, Zip→6, description)
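Both 2NF violations can be found mechanically with attribute closure: any proper subset of the key whose closure reaches a nonprime attribute is a partial dependency. The Python sketch below assumes FDs are given as (left set, right set) pairs and that the given key is the only candidate key; the FD from the full address to (x, y) is an assumption added for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs set, rhs set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(relation, key, fds):
    """Map each proper subset of key to the nonprime attributes it determines.

    Every entry is a 2NF violation. Assumes key is the only candidate key.
    """
    nonprime = set(relation) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[subset] = determined
    return found


location = {"Street", "City", "State", "Apt_Suite", "Zip", "x", "y"}
location_key = ("Street", "Apt_Suite", "Zip")
location_fds = [
    ({"Zip"}, {"City", "State"}),                  # the slide's assumption
    ({"Street", "Apt_Suite", "Zip"}, {"x", "y"}),  # assumed: address fixes coordinates
]

if __name__ == "__main__":
    print(closure(location_key, location_fds) == location)  # key check
    print(partial_dependencies(location, location_key, location_fds))
```

Subsets containing Zip all show up, but the single-attribute entry ('Zip',) is the one that motivates splitting off Zip(Zip, City, State).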
Normalization Analysis: 3NF
R is in 3NF if R is in 2NF and no nonprime attribute of R is transitively dependent on the primary key.
Housing(Hid, offered_by_person→1, Street→6, Apt_Suite→6, Zip→6, offered_by_org→8, org_profile→5, price, availability_date, furnished, number_rooms, number_bathrooms, water, electricity, garbage, gas, internet, move_in_special, ready_to_move_in)
Assumption: For a housing place to be "ready to move in", it has to have Internet, water, electricity, gas, and garbage service. Hence Hid → {Water, Electricity, Garbage, Gas, Internet} → ready_to_move_in is a transitive dependency.
Housing1(Hid, offered_by_person→1, Street→6, Apt_Suite→6, Zip→6, offered_by_org→8, org_profile→5, price, availability_date, furnished, number_rooms, number_bathrooms, move_in_special, Water, Electricity, Garbage, Gas, Internet)
Ready_to_move_in(ready_to_move_in, Water, Electricity, Garbage, Gas, Internet)
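After the 3NF split, ready_to_move_in is no longer stored with each housing row; it is recovered by joining Housing1 with Ready_to_move_in on the five utility columns. A minimal Python sketch of that lookup, assuming utilities are stored as booleans and using hypothetical function names:

```python
from itertools import product

UTILITIES = ("water", "electricity", "garbage", "gas", "internet")


def build_ready_table():
    """Ready_to_move_in relation keyed by the five utility flags.

    Per the slide's assumption, a place is ready only if all five are provided.
    """
    return {combo: all(combo) for combo in product((False, True), repeat=len(UTILITIES))}


def ready_to_move_in(housing_row, ready_table):
    """Recover the removed attribute by joining on the utility columns."""
    return ready_table[tuple(housing_row[u] for u in UTILITIES)]


if __name__ == "__main__":
    table = build_ready_table()
    # Hypothetical Housing1 row with no gas service.
    row = {"hid": 7, "water": True, "electricity": True, "garbage": True, "gas": False, "internet": True}
    print(ready_to_move_in(row, table))
```

Because the utility flags form the key of Ready_to_move_in, each Housing1 row matches exactly one row in the lookup, so no information is lost by the decomposition.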
Normalization Analysis: BCNF
R is in BCNF if whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
Student(Pid→1, Housing→7, University→14, Pickup_Person→3, Flight, Country→11, price_preference, year, sleep, wakeup, study, friends, outgoing)
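The BCNF definition can be checked directly: for each nontrivial FD X → A, compute the closure of X and see whether it covers the whole relation. The Python sketch below assumes Student's only dependency is Pid → (all other attributes); the extra FD Housing → price_preference in the usage line is purely hypothetical and only shows what a violation looks like.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs set, rhs set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial FDs X -> Y whose left side X is not a superkey of relation."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]


student = {"Pid", "Housing", "University", "Pickup_Person", "Flight", "Country",
           "price_preference", "year", "sleep", "wakeup", "study", "friends", "outgoing"}
# Assumed: Pid is the key and determines every other attribute.
student_fds = [({"Pid"}, student - {"Pid"})]

if __name__ == "__main__":
    print(bcnf_violations(student, student_fds))
    # Hypothetical extra FD, not from the project: Housing -> price_preference.
    print(bcnf_violations(student, student_fds + [({"Housing"}, {"price_preference"})]))
```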
Organization Form
Person Form
Student Report
Questions?