Stat 342 - Wk 3 - SFU.cajackd/Stat342/Lect_Wk03.pdf · 2016-09-23 · A clause is any portion of a...


Stat 342 - Wk 3

What is SQL

Proc SQL

'Select' command and 'from' clause

'group by' clause

'order by' clause

'where' clause

'create table' command

'inner join' (as time permits)

Stat 342 Notes. Week 3, Page 1 / 71


Feedback About the lab.

I got some feedback that there is too much to do in the lab in one hour.

To me, this is fine, because it still leaves you with a collection of programs to run later.


What is SQL?

SQL stands for Structured Query Language. It is used to build and read from large central databases. Most of the login systems and profile systems you interact with online use either SQL or something derived from it.

Industry jobs are asking for SQL experience by name, and it takes a short time to master.


When a login is attempted, an SQL mini-program, called a query, is run. (For any trustworthy site, your password is encrypted on YOUR computer first.)


In this query, the data table 'login' contains two variables, 'user' and 'passhash' (encrypted password). The query will only find something that matches BOTH entries.
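The slide showing the query did not survive in these notes, so here is a minimal sketch of what such a lookup could look like. It uses Python's built-in sqlite3 module rather than a real login server, and the sample user, the password, and the helper names hash_password and login_ok are all made up for illustration.

```python
import hashlib
import sqlite3

# In-memory database with a hypothetical 'login' table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("create table login (user text, passhash text)")

def hash_password(password):
    # Plain SHA-256 keeps the sketch short; real sites use salted, slow hashes.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

conn.execute("insert into login values (?, ?)", ("alice", hash_password("s3cret")))

def login_ok(user, password):
    # The query returns a row only if BOTH user and passhash match.
    row = conn.execute(
        "select user from login where user = ? and passhash = ?",
        (user, hash_password(password)),
    ).fetchone()
    return row is not None

print(login_ok("alice", "s3cret"))  # right user, right password
print(login_ok("alice", "wrong"))   # right user, wrong password
```

The '?' placeholders pass the values separately from the query text, which is the usual way to avoid mixing user input into SQL code.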


SQL is used in software that involves both a server (a large central computer) and a client (your personal computer).

When the data is on a server, only the parts returned (output) to the client are visible to the client.


This is good for security, as well as scalability. An SQL database on a server could hold a dataset much larger than a typical hard drive. Also, many people can access and work with the same master copy of the data at the same time.

That master copy, being on the server, can be updated live.


SQL also works across many different software systems, with a minimum of language-specific quirks.


One 'native' version of it is called MySQL.


In SAS, SQL code can be entered and run using the PROC SQL procedure, which we will look at for most of this lecture.


In R, SQL code can be entered through your choice of several packages. A popular one is sqldf because it allows you to run queries on online or local data with very little effort.


Ready to start this chapter in earnest?


About PROC SQL

The SQL procedure is the Base SAS implementation of Structured Query Language. PROC SQL is part of Base SAS software, and you can use it with any SAS data set (table).

We won't be using PROC SQL for the more complex tasks of setting up and dealing with a server, but we will use PROC SQL as an alternative to other SAS procedures or the DATA step.


Why would we work with SQL at all if other PROCs and DATA steps can do most or all of these things already?

1. Sometimes it's just easier or simpler to do something in SQL than to do it another way.

2. SQL code is very similar from system to system, so you could (nearly) take it and copy/paste it into MySQL or R with sqldf and get an identical result. Also, someone familiar with SQL can read your code without knowing SAS.

The general form of proc sql is:


proc sql <SAS options>;

<SQL query>;

Most proc steps end with 'run;' to tell SAS when to stop compiling and start running the code. For SQL queries, the marker to switch from compiling to running is simply ';', so proc sql uses this ending convention instead. (A 'quit;' statement ends the procedure itself.)


The options that can go in this first line include 'outobs', which sets the number of output rows, much as the 'obs=' option does for proc print. A title is set with a separate 'title' statement.

'outobs' in particular is irritating. Other versions of SQL determine the number of rows inside the query with 'limit' or 'top'. If you are copying an SQL query into SAS, these clauses (code parts) will have to be removed and translated.


libname stat342 '/folders/myshortcut/..';

proc sql <SAS options>;

<SQL query>;

Also, if you are using global statements like titles, libraries, and ODS*, these are carried into proc sql just like any other procedure.

However, for most cases where both SAS and SQL have a way of writing something, it's okay to write either way.


For example, SAS allows the use of 'ge' in the place of ' >= ' for 'greater than or equal'.

Usually SQL only allows ' >= ' to be used.

In proc sql, either ' >= ' or ' ge ' can be used.


So...

This works,

proc sql;

select * from libname.datasetname

where income >= 10000;

and so does this,

proc sql;

select * from libname.datasetname

where income ge 10000;


SAS and SQL work very well together. PROC SQL also does some of the additional grunt work for you.

For example,

- When creating a table, usually an SQL query will fail to run if there is already a table with the mentioned name. PROC SQL will do the work of deleting the old table and replacing it with the new one automatically.


- When deleting a table, usually an SQL query will throw an error if the table isn't there. PROC SQL simply ignores it.


About the select statement

The select statement is used to get information from one or more data sets. Most of the work done in SQL uses select in some way.

The simplest select query is

proc sql;

select * from tablename;

This selects all the variables from the dataset 'tablename'.


We can also set the query to return only a few rows, instead of all of them. This is done with the 'outobs' option on the 'proc sql' line, much like limiting the rows shown by proc print.

The following will return the first 10 rows, with every variable, from the dataset 'tablename'.

proc sql outobs=10;

select * from tablename;
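For comparison, here is a minimal sketch of the same idea outside SAS, where the row limit goes inside the query as a 'limit' clause. It assumes Python's built-in sqlite3 module, and the 25 rows of data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (age integer, name text, bees integer)")
# 25 made-up rows so that the limit has something to cut off.
conn.executemany(
    "insert into tablename values (?, ?, ?)",
    [(20 + i, f"person{i}", i * 3) for i in range(25)],
)

# In SQLite, 'limit 10' plays the role that 'outobs=10' plays in proc sql.
rows = conn.execute("select * from tablename limit 10").fetchall()
print(len(rows))
```

This is the kind of translation needed when moving a query between SAS and another SQL system.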


In SQL, * works like _all_ in SAS: it chooses every variable.

We can specify particular variables by putting them between 'select' and 'from'.

The following...

proc sql;

select age, name, bees from tablename;

...will give you a table of every row from tablename with 3 columns, one each for 'age', 'name', and 'bees'.


Spacing is irrelevant to the SQL compiler, so it's good practice to space the query with one clause on each line.

A clause is any portion of a query using a word with a special function, such as 'from', which indicates the name of the input table.

proc sql;

select age, name, bees

from tablename;

The 'from' clause can refer to any data set in SAS, not just ones in the work library.

Libraries are specified as <library>.<dataset>, just like in other data and proc steps.

libname wk03 '/folders/myshortcut/wk03';

proc sql;

select age, name, bees

from wk03.tablename;

You can rename variables when you select them with

<input varname> as <output varname>

This is useful if the original name isn't descriptive enough, or if a name is used multiple times in different tables.

proc sql;

select age as years, name, bees as buzzers

from wk03.tablename;

You can also run simple functions on variables, and name the result something. Simple functions include

mean() Get the average of this variable

count() Get the number of non-missing rows of this variable

first() Value of first row

sum() Get the total of this variable

round() Rounds each value

ucase() Converts any letters to upper case
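A minimal sketch of functions with 'as' names, assuming Python's built-in sqlite3 module and three invented rows. Function names differ a little between systems: SQLite spells the average avg() and the upper-case conversion upper(), where the list above uses mean() and ucase().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (age integer, name text, bees integer)")
conn.executemany(
    "insert into tablename values (?, ?, ?)",
    [(20, "ann", 4), (30, "bob", 10), (40, "cy", 1)],  # made-up rows
)

# Aggregating functions collapse the whole table into a single row.
avg_age, n_rows, total_bees = conn.execute(
    "select avg(age) as mean_age, count(name) as N, sum(bees) as total_bees "
    "from tablename"
).fetchone()

# Row-wise functions return one value for each input row.
upper_names = [
    row[0]
    for row in conn.execute(
        "select upper(name) as NAME from tablename order by name"
    )
]

print(avg_age, n_rows, total_bees, upper_names)
```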


Figure out what this code does

proc sql;

select age, lcase(name) as Name,

count(name) as N

from wk03.tablename;


Hope this isn't too fuzzy for you.

The 'group by' clause

'group by' is a clause you can include in an SQL query to tell it that any aggregation done should output one row for each unique value of the grouping variable. Without it, aggregation will be done on everything that's selected.

Usually when aggregating a variable with mean(), count(), or max(), it is not just the average, count, or largest value of all the values that you're interested in.


If you had a data set of sales at a store, you could be interested in:

- The total value of sales in each department.

- The most expensive item sold each day.

- The number of sales made each day.

- The number of each product that was sold all year.


The total value of sales in each department.

proc sql;

select dept, sum(price) as total

from salesdata

group by dept;

Here, the variable 'total' will have the sum of the values in 'price' for each of the departments.
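To see the grouping with actual numbers, here is a small sketch that runs the same kind of query with Python's built-in sqlite3 module. The five sales rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table salesdata (day text, dept text, price real)")
conn.executemany(
    "insert into salesdata values (?, ?, ?)",
    [
        ("2016-01-02", "toys", 10.0),
        ("2016-01-02", "toys", 4.0),
        ("2016-01-02", "food", 2.5),
        ("2016-01-03", "food", 2.5),
        ("2016-01-03", "toys", 10.0),
    ],  # made-up sales
)

# One output row per unique value of 'dept'.
totals = dict(
    conn.execute("select dept, sum(price) as total from salesdata group by dept")
)
print(totals)
```

Five input rows become two output rows, one per department.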


- The most expensive item sold each day.

- The number of sales made each day.

proc sql;

select day, max(price) as biggest

, count(price) as Nsales

from salesdata

group by day;


- The number of each product that was sold all year.

proc sql;

select productID, count(productID) as Nsales

from salesdata

group by productID;


More than one variable can be used as a grouping variable. If this is done, aggregation will be done on every unique COMBINATION of those variables.

The following would give the total sales made in each department for each day.

proc sql;

select day, dept, sum(price) as total

from salesdata


group by day, dept;

One more point: The 'outobs' setting refers to the table that is OUTPUT. It does not affect the table that is being used to aggregate data.

This query will aggregate information from thousands of sales, but it will show only the first 10 day-dept combinations.

proc sql outobs=10;

select day, dept, sum(price) as total

from salesdata


group by day, dept;
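The difference between limiting the output and limiting the input can be checked on a toy table. This sketch assumes Python's built-in sqlite3 module, uses 'limit' in place of 'outobs', and runs on five invented sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table salesdata (day text, dept text, price real)")
conn.executemany(
    "insert into salesdata values (?, ?, ?)",
    [
        ("2016-01-02", "toys", 10.0),
        ("2016-01-02", "toys", 4.0),
        ("2016-01-02", "food", 2.5),
        ("2016-01-03", "food", 2.5),
        ("2016-01-03", "toys", 10.0),
    ],  # made-up sales
)

query = (
    "select day, dept, sum(price) as total "
    "from salesdata "
    "group by day, dept "
    "order by day, dept"  # fixed order so the limited output is repeatable
)
all_groups = conn.execute(query).fetchall()
# 'limit 2' trims the OUTPUT table; every sale still feeds into the sums.
first_two = conn.execute(query + " limit 2").fetchall()

print(all_groups)
print(first_two)
```

The toys total for the first day is still 14.0 in the limited output, so both toy sales were used even though only two groups are shown.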


Don't be afraid to axolotl questions

The 'order by' clause

The rows that come out of an SQL query can be very disordered. If you are accessing data from a server composed of multiple computers or hard drives, you may even end up with the rows of your data in a different order each time you request it.

The 'order by' clause makes those rows more consistent by dictating which ones should appear on the top.

The syntax for the 'order by' clause is

order by <varname> <asc/desc>

for one variable, and

order by <var1> <asc/desc>, <var2> <asc/desc>


for two variables. Three or more variables works the same way: variable name, then asc or desc, then a comma.

If the ordering variable is numeric, the option 'desc' will put the rows in order of highest to lowest value of that number (desc stands for 'descending order'). 'asc' will order them from lowest to highest (ascending order).

If the ordering variable is text, then ordering will be done alphabetically. (asc is A-Z, desc is Z-A)


If the ordering variable is time based, like a date, then the ordering will be done in chronological order. (asc is earliest first, desc is latest first)

More than one variable can be used as an ordering variable. The order of the first variable counts as more important than that of the second. In other words, the second one is a 'tiebreaker' for the first one.


If there are still ties after all the ordering variables are used, the order of rows for ties is unpredictable. If a local dataset is the 'from' dataset, the order within ties will probably be the order of the rows of that 'from' dataset.

Here, the countries will be arranged by immunization rates in 2000, starting with the highest rate. There are several countries with a 99% immunization rate in the year 2000, so among those the immunization rate in 1995 is used as the tiebreaker.

Any country with a 98% immunization rate in 2000 will show up AFTER any country with a 99% rate in 2000, regardless of 1995 rates.


proc sql;

select Country, Y1995, Y2000

from wk03immunity

order by Y2000 desc, Y1995 desc;
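The tiebreaker behaviour can be tried on a toy version of this table. The sketch assumes Python's built-in sqlite3 module, and the four countries and their rates are invented, not real immunization data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table wk03immunity (Country text, Y1995 integer, Y2000 integer)")
conn.executemany(
    "insert into wk03immunity values (?, ?, ?)",
    [("A", 90, 99), ("B", 95, 99), ("C", 99, 98), ("D", 80, 97)],  # made-up rates
)

# Y2000 decides the order; Y1995 only matters when Y2000 is tied.
order = [
    row[0]
    for row in conn.execute(
        "select Country, Y1995, Y2000 from wk03immunity "
        "order by Y2000 desc, Y1995 desc"
    )
]
print(order)
```

Country C has the highest 1995 rate but still comes after A and B, because its 2000 rate is lower.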

The 'where' clause

The 'where' clause is used to subset the data being used. Only rows that satisfy the 'where' criteria are returned (or processed by grouping).

In R, the closest analogue is the which() command.


In SAS, the closest analogue is if-then statements in the data step.

The following SQL procedure would return the row entry of every person older than 25.

proc sql;

select age, name, bees

from tablename

where age > 25;


More than one condition can be used, but the conditions have to be separated by boolean operators 'and', 'not' and 'or'.

proc sql;

select age, name, bees

from tablename

where age > 25 and age < 35;

Here, two conditions are being placed on the same variable.


More than one variable can be used. The 'where' conditions only need to refer to variables in the table (or tables) being input. They don't even have to be the variables being selected.

proc sql;

select age, name, bees

from tablename

where age > 25 or dogs > 3;
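A small sketch of both kinds of condition, assuming Python's built-in sqlite3 module and four invented rows. Note that 'dogs' is used in the 'where' clause without being selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename (age integer, name text, bees integer, dogs integer)"
)
conn.executemany(
    "insert into tablename values (?, ?, ?, ?)",
    [(20, "ann", 4, 5), (30, "bob", 10, 0), (40, "cy", 1, 4), (22, "dee", 2, 1)],
)  # made-up rows

# Two conditions on the same variable, joined with 'and'.
between = conn.execute(
    "select age, name, bees from tablename "
    "where age > 25 and age < 35 order by name"
).fetchall()

# Conditions on two variables, joined with 'or'; 'dogs' is not in the output.
either = conn.execute(
    "select age, name, bees from tablename "
    "where age > 25 or dogs > 3 order by name"
).fetchall()

print(between)
print(either)
```

Here ann is returned by the second query despite being 20, because she has more than 3 dogs.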


Try to interpret this one!

proc sql;

select age, name, bees

from tablename

where age > 25 and name ne 'Thor';


Criteria from the 'where' clause are applied before aggregation. So any functions and 'group by' code are only run on rows of the table that are part of the 'where' subset.

The following would only total the revenue from sales that were more than $5.00 per item.

proc sql;

select day, dept, sum(price) as total

from salesdata

where price > 5.00

group by day, dept;
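The effect of filtering before aggregating shows up clearly on a toy table. This sketch assumes Python's built-in sqlite3 module and five invented sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table salesdata (day text, dept text, price real)")
conn.executemany(
    "insert into salesdata values (?, ?, ?)",
    [
        ("2016-01-02", "toys", 10.0),
        ("2016-01-02", "toys", 4.0),
        ("2016-01-02", "food", 2.5),
        ("2016-01-03", "food", 2.5),
        ("2016-01-03", "toys", 10.0),
    ],  # made-up sales
)

# Rows with price <= 5.00 are dropped BEFORE the sums are computed.
filtered = conn.execute(
    "select day, dept, sum(price) as total "
    "from salesdata "
    "where price > 5.00 "
    "group by day, dept "
    "order by day, dept"
).fetchall()
print(filtered)
```

The food groups vanish entirely, and the first-day toys total is 10.0 rather than 14.0, because the 4.00 sale never reaches the sum.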


Another Example problem - Interpret this:

proc sql;

title 'Population of ...';

select Continent, sum(Population) as TotPop

from sql.countries

where Population gt 1000000

group by Continent

order by TotPop;

Notice also that the several clauses in this sql query appear in a fixed order. This order cannot be changed.


proc sql <options like outobs>;

title <title>;

select <variables, aggregation, names>

from <data set>

where <condition>

group by <vars>

order by <vars>;


This is also the order in which computation is done. The title and variable names are decided, then the data set is determined, then the conditions on the data set are applied, then aggregation is done, and finally the output is sorted.


Let's get creative!


The 'create table' command

'create table' is not a part of the select command at all, but a completely separate command.

It can be used before a 'select' query to tell SAS to make a new dataset with the output from the select command, rather than printing it as an output table.


Note that all the output of 'select' is a table, even if that table only includes a single value of a single row.

The syntax for the create table command is

proc sql;

create table <new table name> as

<select query>;


'create table' is placed at the beginning of the query, right after 'proc sql;', which is used to tell SAS that a query is approaching.

The 'select query' would normally be output as a table, but instead it is saved as a SAS dataset (or an SQL table in other platforms).

The following code takes our aggregation of sales by day and department and makes it a data set we can review directly.


proc sql;

create table sales_summary as

select day, dept, sum(price) as total

from salesdata

group by day, dept;

...that new aggregation data set can be read more quickly than the raw sales data. It DOES have to be updated with any new information that is added to the original data table.


If a new day of sales is added to 'salesdata', it will not show up in 'sales_summary' automatically.
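This snapshot behaviour is easy to demonstrate. The sketch below assumes Python's built-in sqlite3 module and three invented sales rows; SQLite accepts the same 'create table ... as select' form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table salesdata (day text, dept text, price real)")
conn.executemany(
    "insert into salesdata values (?, ?, ?)",
    [
        ("2016-01-02", "toys", 10.0),
        ("2016-01-02", "food", 2.5),
        ("2016-01-03", "toys", 4.0),
    ],  # made-up sales
)

# Save the aggregation as its own table instead of just printing it.
conn.execute(
    "create table sales_summary as "
    "select day, dept, sum(price) as total from salesdata group by day, dept"
)
before = conn.execute("select count(*) from sales_summary").fetchone()[0]

# A new day of sales arrives in the raw data...
conn.execute("insert into salesdata values ('2016-01-04', 'toys', 7.0)")

# ...but the saved summary is a snapshot and does not change.
after = conn.execute("select count(*) from sales_summary").fetchone()[0]
# Re-running the aggregation on the raw data does pick up the new day.
fresh = conn.execute(
    "select count(*) from (select day, dept from salesdata group by day, dept)"
).fetchone()[0]

print(before, after, fresh)
```

To refresh the summary, the 'create table' query has to be run again.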


Here we would observe only the summaries in 2016. Note the date format.

proc sql;

select * from sales_summary

where day > '31dec2015'd;

Creating a new table not only allows you to save your work in a permanent location, but it can also make more complicated analyses much easier, such as inner joins between three or more datasets.

Inner Joins!

Inner joins are the most popular of several ways to combine two or more datasets. If someone refers simply to a 'join', it is usually an inner join.


A select statement with an inner join takes variables from both datasets and matches them up according to some variable in common, such as userID or date.

Let Key1 be the joining variable.


An inner join makes a new dataset with one row for each matched variable value and the chosen variables from each.


With an inner join, only 1 row is created for each instance where a value from the first join variable matches a value from the second join variable.

There are other kinds of joins, such as left join, right join, full (outer) join, and cross join.

For a cross join, every COMBINATION of one row from table 1 and one row from table 2 is returned. In the case of each table having N ID values, we could end up with N^2 combinations of ID values.

A select statement with an inner join has syntax like this:

select <dataset 1>.<variable>, ...<dataset 2>.<variable>

from <dataset 1>

inner join <dataset 2>

on <condition that matches the two datasets together>;


The most common condition used here is dataset1.variable = dataset2.variable

In the following code, the Y2000 variable is selected from each of two different data sets, teen fertility and school years. The joining variable found in both datasets is 'Country'.

proc sql;

select wk03teenfertility.Country, wk03teenfertility.Y2000 as fertility2000, wk03schoolyears.Y2000 as school2000


from wk03teenfertility

inner join wk03schoolyears

on wk03teenfertility.Country = wk03schoolyears.Country;
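To show which rows survive an inner join, here is a sketch of the same query on two tiny tables. It assumes Python's built-in sqlite3 module, and the countries and values are invented, not the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table wk03teenfertility (Country text, Y2000 real)")
conn.execute("create table wk03schoolyears (Country text, Y2000 real)")
# Made-up values: only B and C appear in BOTH tables.
conn.executemany(
    "insert into wk03teenfertility values (?, ?)",
    [("A", 10.0), ("B", 20.0), ("C", 30.0)],
)
conn.executemany(
    "insert into wk03schoolyears values (?, ?)",
    [("B", 12.0), ("C", 9.0), ("D", 11.0)],
)

joined = conn.execute(
    "select wk03teenfertility.Country, "
    "wk03teenfertility.Y2000 as fertility2000, "
    "wk03schoolyears.Y2000 as school2000 "
    "from wk03teenfertility "
    "inner join wk03schoolyears "
    "on wk03teenfertility.Country = wk03schoolyears.Country "
    "order by wk03teenfertility.Country"
).fetchall()
print(joined)
```

Countries A and D are dropped because each appears in only one of the two tables.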
