HALL OF FAME INDUCTIONS
TALIA STRNAD
Agenda
Story
Problems
Solutions
Questions
Queries
References
Story
I am the dba at the baseball hall of fame.
My job duties include supervising the different customer service reps that
utilize the database on a daily basis.
At the museum, we use projections to display different information about
hall of fame inductions.
Also, we regularly get calls from organizations requesting statistics from the
hall of fame database.
Since my representatives don’t need access to the entire database, I
created different views for them to search through.
Problems
No problems obtaining the data
The online source converted it to CSV
However, I had to copy/paste the data into Notepad instead of simply downloading it.
CSV format issues
% signs in the data
Code names instead of real names
DECIMAL vs INT datatype
Tried inserting/selecting a value as an integer
If the column datatype is DECIMAL, a decimal value is needed instead of an integer value
Problems with Common Table Expressions for certain math functions
Solutions
Changed data type from decimal to varchar in my table to accept %
Found new data to list names instead of codes
Changed selection to decimal value vs int value
For example: typed 437 instead of 437.0
Researched & found an example query of what I wanted: Common Table
Expression (CTE)
Helpful when you need to reference/join the same data set multiple times.
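One way to keep the % values as text and still do arithmetic on them is to strip the sign and cast at query time. The sketch below is only an illustration: it runs the SQL through Python's built-in sqlite3 module rather than the SQL Server instance used in this project, and the three rows are invented placeholders, not Hall of Fame data.

```python
import sqlite3

# In-memory SQLite database standing in for the project's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_info (name TEXT NOT NULL, percent_of_ballots TEXT)"
)

# Placeholder rows (assumed for illustration only).
conn.executemany(
    "INSERT INTO player_info VALUES (?, ?)",
    [("Player A", "95.1%"), ("Player B", "76.4%"), ("Player C", None)],
)

# Strip the % sign, then cast the remaining text to a number for comparison.
# Rows with a NULL percentage drop out because NULL > 80 is not true.
rows = conn.execute(
    """
    SELECT name,
           CAST(REPLACE(percent_of_ballots, '%', '') AS REAL) AS pct
    FROM player_info
    WHERE CAST(REPLACE(percent_of_ballots, '%', '') AS REAL) > 80
    ORDER BY name
    """
).fetchall()

for name, pct in rows:
    print(name, pct)
```

In SQL Server the same idea would likely be written with CAST(REPLACE(PERCENT_OF_BALLOTS, '%', '') AS DECIMAL(5, 1)); that variant was not part of the original project.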
Questions Attempted
Are there more players than other positions inducted into the hall of fame
since 1936?
Are there more null or not null entries in the votes column?
How many managers have been inducted between 2009 & 2013? Since
2000?
Queries
Created a database
Created tables
Imported data into database/table
Inserted data into table
Created a view
Deleted entries in a table
Created an index
Retrieved data from more than one table
Combined rows from the player_info table based on the common
‘name’ field.
Queries Pt 2
Combined rows from the player_info & player_votes tables based on the common ‘votes’ field.
Found the difference between 2 different counts of data
Counted all the votes in the table
Null values: Vacant (NOT EMPTY)
Not null values: Not_Vacant
View all the results from the view
The year inducted between 2009-2013.
Query Pt 3
Described the table’s structure to check which
columns allow nulls vs not nulls
VOTES & PERCENT_OF_BALLOTS allow NULL
Concepts
Ch 1– Database (P1)
Structure that contains different categories of information and the relationships between these categories.
Use of: Created new database.
Ch 1- Sample Data (Entire chapter)
Set of data collected.
Use of: Retrieved sample data from website.
A sample of data: the view I created of managers inducted
after the year 2000.
Ch 2- Primary Key (P30)
Unique Identifier for the table.
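All columns in the table are functionally dependent on the PK
No subcollection of the columns in the PK (if it’s a collection) also has the first property
The data didn’t come with one, and the tables in the appendix are created without a PRIMARY KEY constraint (SQL Server does not create one automatically for tables or views).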
Ch 2- First (1NF) & Second (2NF) normal forms (P40&44)
Does not contain a repeating group: 1NF
In 1NF and no nonkey column depends on only part of the primary key: 2NF (automatic when the primary key is a single column)
Concepts pt 2
Ch 3- INSERT command (P72)
Adds rows to a table.
Used when importing the data.
Ch 3- Data types (P70)
For each column in the table, specify the type of data to be stored.
For PERCENT_OF_BALLOTS, wanted to use DECIMAL. Needed VARCHAR instead because the values contain a % sign.
Ch 4- Aggregate functions (P112)
Functions that calculate a single value from a set of rows (COUNT, SUM, AVG, MAX, MIN)
Used count to identify number of null values vs not nulls
Ch 4- Operators to retrieve certain sections/subsections of data (P105&108)
BETWEEN OPERATOR in a query on the view to determine managers inducted BETWEEN 2009 & 2013
LIKE OPERATOR in create view query to only include those positions in the view where INDUCTED_AS is LIKE ‘%MAN%’.
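The null vs not null comparison can also be done in a single pass, because COUNT(column) skips NULLs while COUNT(*) counts every row. This is a minimal sketch run through Python's sqlite3 module, not the SQL Server setup from the project, and the rows are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_info (name TEXT NOT NULL, votes TEXT)")

# Placeholder rows (assumed): one row with votes, two without.
conn.executemany(
    "INSERT INTO player_info VALUES (?, ?)",
    [("Player A", "437.0"), ("Player B", None), ("Player C", None)],
)

# COUNT(*) counts all rows; COUNT(votes) counts only rows where votes is not NULL.
vacant, not_vacant = conn.execute(
    """
    SELECT COUNT(*) - COUNT(votes) AS vacant,
           COUNT(votes)            AS not_vacant
    FROM player_info
    """
).fetchone()

print("vacant:", vacant, "not_vacant:", not_vacant)
```

This gives the same two numbers as the separate VACANT and NOT_VACANT queries in the appendix, side by side in one result row.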
Concepts pt 3
Ch 5- Self-join (P143)
Using an alias to join a table to itself
JOINED player_info table to itself with alias
NAME=NAME & INDUCTED_AS LIKE ‘%MAN%’
Ch 5- Correlated subquery (P139)
The subquery involves a table listed in the outer query
Ch 6- Creating new table from an existing (P167-169)
CREATE TABLE command
INSERT INTO command
Ch 6- DELETE COMMAND (P175)
To delete data from the database, use the DELETE COMMAND
DELETE COMMAND query: DELETE FROM PLAYER_VOTES WHERE VOTED_BY LIKE '%RU%';
Correlated subquery query (Ch 5):
SELECT YEAR_INDUCTED, NAME, INDUCTED_AS
FROM PLAYER_INFO
WHERE EXISTS
(SELECT *
FROM AFTER2000
WHERE INDUCTED BETWEEN 2010 AND 2014
AND PLAYER_INFO.NAME = AFTER2000.NAME
AND NAME LIKE '%BOB%'
)
GO
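To see how a correlated subquery with EXISTS behaves, here is a small runnable sketch. It uses Python's sqlite3 module instead of SQL Server, stores the year as an integer for simplicity, and fills the table with invented rows ("Person A" and so on), so treat it as an illustration of the technique rather than the project's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE player_info (
        year_inducted INTEGER NOT NULL,
        name          TEXT    NOT NULL,
        inducted_as   TEXT    NOT NULL
    );

    -- Placeholder rows (assumed for illustration only).
    INSERT INTO player_info VALUES
        (1999, 'Person A', 'Manager'),
        (2010, 'Person B', 'Manager'),
        (2012, 'Person C', 'Player'),
        (2013, 'Person D', 'Manager');

    -- Same idea as the AFTER2000 view: managers inducted after 2000.
    CREATE VIEW after2000 AS
        SELECT year_inducted AS inducted, name, inducted_as AS job_title
        FROM player_info
        WHERE year_inducted > 2000
          AND inducted_as LIKE '%MAN%';
    """
)

# The subquery refers to p.name from the outer query, so it is re-evaluated
# for each outer row; that reference is what makes it "correlated".
matches = conn.execute(
    """
    SELECT p.year_inducted, p.name, p.inducted_as
    FROM player_info AS p
    WHERE EXISTS (
        SELECT 1
        FROM after2000 AS a
        WHERE a.name = p.name
          AND a.inducted BETWEEN 2010 AND 2014
    )
    ORDER BY p.year_inducted
    """
).fetchall()

for row in matches:
    print(row)
```

Only the rows that also appear in the view for 2010 through 2014 come back; the 1999 manager and the 2012 player are filtered out.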
Concepts pt 4
Ch 7- CREATE VIEW COMMAND/defining query (P192)
A program’s/individual user’s picture of the database
Create view AFTER2000 to view the managers inducted after the year 2000
Ch 7-CREATE INDEX COMMAND (P 207)
The main way to increase the efficiency with which data is retrieved from the
database.
Use of: Created index to search more easily through the names &
corresponding votes in descending order.
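A quick way to check whether an index like this is actually picked up is to ask the database for its query plan. The sketch below does that in SQLite through Python's sqlite3 module; SQL Server has its own execution plan tools, and the index name idx_player_votes and the rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_votes (name TEXT NOT NULL, votes REAL)")

# Placeholder rows (assumed for illustration only).
conn.executemany(
    "INSERT INTO player_votes VALUES (?, ?)",
    [("Player A", 437.0), ("Player B", 250.5), ("Player C", None)],
)

# Composite index on name, then votes in descending order.
conn.execute("CREATE INDEX idx_player_votes ON player_votes (name, votes DESC)")

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the human-readable description is the last column of each plan row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, votes FROM player_votes WHERE name = ?",
    ("Player A",),
).fetchall()
plan_details = [row[-1] for row in plan]

for detail in plan_details:
    print(detail)
```

If the index is used, its name shows up in the plan text; on a table of only a few hundred rows the speed difference would be small either way.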
Takeaways
While working with my data, I began to realize there were more
players than any other position.
At first, I couldn’t decide how to use basic math in this assignment.
Then I realized I could use math to describe this trend.
I figured out how to subtract 2 count queries and display the result as
‘difference.’
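The subtraction of two counts can be tried out on a tiny table. This is a hedged sketch using Python's sqlite3 module rather than SQL Server; the column aliases all_positions and players and the four rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_info (name TEXT NOT NULL, inducted_as TEXT NOT NULL)"
)

# Placeholder rows (assumed): three players and one manager.
conn.executemany(
    "INSERT INTO player_info VALUES (?, ?)",
    [
        ("Person A", "Player"),
        ("Person B", "Player"),
        ("Person C", "Player"),
        ("Person D", "Manager"),
    ],
)

# Each CTE produces a one-row count; joining them lets one SELECT subtract them.
total, players, difference = conn.execute(
    """
    WITH c1 AS (SELECT COUNT(*) AS all_positions
                FROM player_info
                WHERE inducted_as IS NOT NULL),
         c2 AS (SELECT COUNT(*) AS players
                FROM player_info
                WHERE inducted_as LIKE '%PLAY%')
    SELECT c1.all_positions,
           c2.players,
           c1.all_positions - c2.players AS difference
    FROM c1 CROSS JOIN c2
    """
).fetchone()

print(total, players, difference)
```

Here the difference is the number of inductees who are not players, which is how the trend of players outnumbering other positions can be put into numbers.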
Sum up
I learned that even with a small data set (312 rows), you can still use the
data to help learn more about databases.
I enjoyed learning about & using baseball hall of fame statistics.
I enjoyed relearning all the database concepts.
References
http://stackoverflow.com/questions/2934192/beginner-sql-question-arithmetic-with-multiple-count-results
http://stackoverflow.com/questions/4740748/when-to-use-common-table-expression-cte
http://www.baseball-reference.com/awards/hof.shtml
Once at this site, select the CSV option. The data will be
converted to CSV in the browser. The downside is that the CSV file is not directly
downloaded, so you must copy/paste the data into a text editor.
Code--Appendix
CREATE DATABASE BASEBALL_STATS;
CREATE TABLE PLAYER_INFO
(
YEAR_INDUCTED VARCHAR(5) NOT NULL,
NAME VARCHAR(40) NOT NULL,
LIFE_SPAN VARCHAR(15) NOT NULL,
VOTED_BY VARCHAR(30) NOT NULL,
INDUCTED_AS VARCHAR(30) NOT NULL,
VOTES VARCHAR(15) NULL,
PERCENT_OF_BALLOTS VARCHAR(10) NULL
)
GO
Code—Pt 2
BULK
INSERT PLAYER_INFO
FROM '\\cpsc231-sqla\SHARED\TS28576\playerInfo.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM PLAYER_INFO
GO
Code—Pt 3
CREATE VIEW AFTER2000 (INDUCTED, NAME, JOB_TITLE) AS
SELECT YEAR_INDUCTED, NAME, INDUCTED_AS
FROM PLAYER_INFO
WHERE YEAR_INDUCTED > 2000
AND INDUCTED_AS LIKE '%MAN%';
GO
CREATE TABLE PLAYER_VOTES
(NAME VARCHAR (50) NOT NULL,
VOTED_BY VARCHAR(15) NOT NULL,
VOTES DECIMAL(4, 1) NULL,
BALLOT_PERCENTAGE VARCHAR(25) NULL
)
GO
Code—Pt 4
INSERT INTO PLAYER_VOTES
SELECT NAME, VOTED_BY, VOTES, PERCENT_OF_BALLOTS
FROM PLAYER_INFO
WHERE VOTES < 300.0
GO
SELECT P1.NAME, P2.NAME, P1.INDUCTED_AS
FROM PLAYER_INFO P1, PLAYER_INFO P2
WHERE P1.NAME = P2.NAME
AND P1.INDUCTED_AS LIKE '%MAN%'
GO
Code—Pt 5
SELECT PLAY.NAME, VOTE.NAME, PLAY.INDUCTED_AS, VOTE.VOTES
FROM PLAYER_INFO PLAY, PLAYER_VOTES VOTE
WHERE PLAY.VOTES = VOTE.VOTES
AND PLAY.INDUCTED_AS LIKE '%PLAY%'
GO
SELECT * FROM AFTER2000 WHERE INDUCTED BETWEEN 2009 AND 2013
GO
SELECT COUNT(*) AS VACANT FROM PLAYER_INFO WHERE VOTES IS NULL;
SELECT COUNT(*) AS NOT_VACANT FROM PLAYER_INFO WHERE VOTES IS NOT NULL;
Code—Pt 6
CREATE INDEX VOTES ON PLAYER_VOTES(NAME, VOTES DESC);
SELECT YEAR_INDUCTED, NAME, INDUCTED_AS
FROM PLAYER_INFO
WHERE EXISTS
(SELECT *
FROM AFTER2000
WHERE INDUCTED BETWEEN 2010 AND 2014
AND PLAYER_INFO.NAME = AFTER2000.NAME
AND NAME LIKE '%BOB%'
)
GO
Code—Pt 7
WITH c1 AS
(SELECT COUNT(*) AS position FROM PLAYER_INFO WHERE INDUCTED_AS IS NOT NULL),
c2 AS
(SELECT COUNT(*) AS PLAYER FROM PLAYER_INFO WHERE INDUCTED_AS LIKE '%PLAY%')
SELECT c1.position, c2.PLAYER, c1.position - c2.PLAYER AS DIFFERENCE
FROM c1, c2
GO
sp_help PLAYER_INFO
GO
DELETE FROM PLAYER_VOTES WHERE VOTED_BY LIKE '%RU%';
Schema Diagram
Data Flow Diagram