Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database...

38
Featherman’s T-SQL Adventures © - Arrays (aka SQL Table Variables) Arrays are used in data management, data manipulation, data compiling, and report generation processes. Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction data from different sources stored on disk can be copied into one or more RAM based arrays so that the data can be comingled and then analyzed. It is typical to have to bring together many different datasources to analyze business performance. Marketers for example often purchase datasets such as DMV data that can be used as an indicant for affluence of a zip code. Buying industry data is also helpful to compare a firm’s performance as compared to the industry. The analyst would use arrays to merge in the purchased data (this is where understanding granularity is key). Once the data is merged, analytics can be run, dashboards updated, and the newly configured data can be written back to disk storage for long-term storage. The process of data retrieval and manipulation can be automated with SQL stored procedure, for example so that a dashboard can be updated every hour. The major benefit of using arrays is that they can take in data from many different sources merged into new formations and then saved back to disk. If you connect reports to this process then the reporting infrastructure can also be automated. To perform analytics and generate insight into business problems the analyst often needs to pull together many different datasets. The insight is generated when new columns of metrics can be built. An example is to merge in weather or credit level data to investigate seasonal changes to automobile sales in different regions. While analysts and DBA’s can pull together data into arrays for data processing for a special research project, DBA’s also spend considerable time automating these processes to 1 Figure 1 courtesy https://softwarerecs.stackexchange.com

Transcript of Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database...

Page 1: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Featherman’s T-SQL Adventures © - Arrays (aka SQL Table Variables)

Arrays are used in data management, data manipulation, data compiling, and report generation processes. Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction data from different sources stored on disk can be copied into one or more RAM based arrays so that the data can be comingled and then analyzed. It is typical to have to bring together many different datasources to analyze business performance. Marketers for example often purchase datasets such as DMV data that can be used as an indicant for affluence of a zip code. Buying industry data is also helpful to compare a firm’s performance as compared to the industry. The analyst would use arrays to merge in the purchased data (this is where understanding granularity is key).

Once the data is merged, analytics can be run, dashboards updated, and the newly configured data can be written back to disk storage for long-term storage. The process of data retrieval and manipulation can be automated with SQL stored procedure, for example so that a dashboard can be updated every hour. The major benefit of using arrays is that they can take in data from many different sources merged into new formations and then saved back to disk. If you connect reports to this process then the reporting infrastructure can also be automated.

To perform analytics and generate insight into business problems the analyst often needs to pull together many different datasets. The insight is generated when new columns of metrics can be built. An example is to merge in weather or credit level data to investigate seasonal changes to automobile sales in different regions. While analysts and DBA’s can pull together data into arrays for data processing for a special research project, DBA’s also spend considerable time automating these processes to perform regular ETL (Extract, transform, Load) processes to support business analysis.

Consider the scenario of a graduating student that needs to choose amongst three job offers in three different cities. They are needing to base their career decision on multiple criteria (e.g., weather data, crime data, housing price data, entertainment opportunity ratings.). The typical thoughtful student would pull all the data for each decision criteria into a spreadsheet to facilitate comparison. An array is similarly a spreadsheet in the memory of the computer server, you can manipulate the data in each cell (intersection of row and column), and you can then aggregate the data in new ways.

The concept in this scenario is the same, once you pull the data together, you can compare it and make new calculations to facilitate decision-making. You could pull data from different datasources into columns of your array. Often by pulling data together you can see the patterns and can have an epiphany. When using an array in a database (SQL calls them table variables) you run SQL SELECT queries to retrieve data from different

1

Figure 1 courtesy https://softwarerecs.stackexchange.com

Page 2: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

databases, and comingle this data, then create new columns of derived analytics. This is the alchemy of business analysts, this is the magical insight the best analysts are famous for. Successful business analysts create new insights and understanding when they create new columns of analytics. In the spreadsheet or array you would create one row per city (being considered for employment). The columns are built from the imported data. Once you have all your important decision criteria implemented in columns of arrays the value that the business analyst adds are the new columns of metrics that you can implement. In the student scenario, for example you would want to create a new column such as a weighted average so you can compare your options numerically.

The point is worth repeating you don’t get promoted because you can do SQL, manage data, and automate processes, these are expected. The analyst or DBA gets promoted when they provide insight and analysis to top management that make problem-solving easier. It’s a merging of business understanding, intuition, technical skills and persistence.

But we are getting ahead of ourselves. Back to the nuts and bolts of SQL Arrays. SQL statements build datasets, and array tables free up the developer to merge data from different datasources). Because you can add columns of data together, they can be at different levels of granularity. This is another innovation that has not been presented in our class together.

Table variables (arrays) have been used by SQL programmers for over 50 years to transfer data from storage to working memory to perform data management, data integrations, and data aggregation. Shuttling data from disk storage to RAM storage for processing is a standard operating procedure for data centers because RAM has always worked a lot faster than secondary storage (RAM processing is measured in nano-seconds compared to milliseconds for disk read/write speed). Arrays are used in many forms of coding to store values, and as a place where you can manipulate and comingle data and build a structure of values.

For our needs arrays tables are used to hold data and are useful to pull data together when other SQL query methodologies do not work.

A recurring reporting problem is that data is needed at different levels of granularity, for example it is common to add a final column with totals or % of total metric. In SSRS, PowerBI or Tableau reports you can add column and row totals easily. If your dataset from a SQL query needs a mixture of detailed and summary data however (as a function of ETL processing) then you can use one or more arrays to mingle data that was calculated at different levels of summarization. Often several layers of aggregation and data shaping are needed in ETL processes.

Array tables excel at allowing you to pull data together, co-mingling data from different sources, into a compiled dataset that is ready for analysis. Bringing disparate data into one organized dataset allows dashboards and reports to be easily generated and updated. A pre-requisite is that you need to have a dimension attribute (column) in common between the different source data tables. The field in common will usually be a primary key such as employeeID, productID, storeID, but can also be textual city, state, or country (for example when compiling econometrics data). When you can’t figure a way to pull data together from different sources (i.e., the joins don’t work, and neither does a sub-query), then pull the data into one or more an array table. If you need more than one array then you can join the array tables inside your query.

2

Page 3: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

When using array tables, in essence you are just pulling data from different sources into one table to compact data for storage and further analysis. For example, array tables can hold data that is culled from different sources, and this array table (aka resultset) can be written to a new permanent SQL database table, and used as the datasource for reports.

Often the compiled data is the datasource for a dashboard, other times the arraytable is the in-memory storage table to hold compiled data in the middle of an Extract, Transform and Load process. The newly compiled array table can then be merged with other array data in an iterative ETL process of data culling, formatting and combining. The data organization can be automated by running stored procedures and the custom built array data can be saved to SQL database tables for permanent storage.

Array tables then are your canvas and you the emerging DBA can paint your masterpiece dataset onto them, column by column.

Arrays are either wide (many columns), tall (many rows) or both. You specify the columns, their names and data types. Next you run a SELECT statement that specifies the data for the columns, and to then insert rows of data into the array.

To improve data management and simplify organization of data assets, a DBA may want to build fewer more complex resultsets, each one serving as the datasource for many reporting needs (rather than build lots of queries, build a few that are multi-purpose!!). The array table then is usually the intermediate step needed to pull together data into a complex dataset (then save the dataset into a new database table.

Let’s get started!

Examples:

USE [Featherman_Analytics];

DECLARE @SubCategory nvarchar(30) = 'Mountain Bikes'

SELECT [Model], Sum(OrderQuantity) as [Total] FROM FactResellersales as sInner JOIN [dbo].[AW_Products_Flattened] as pON p.ProductKey = s.ProductKey

WHERE [Sub Category] = @SubCategoryGROUP BY [Model]

In a later module when we automate the report creation,

Before we jump into arraytables which are a grid of variables where each column can be a different datatype, let’s take another look at declaring and using a single variable in SQL. What is a variable? A variable is a named piece of system memory (in RAM) that is created by, accessible and manipulated by SQL programs and the users of the SQL programs.

The DECLARE @variablename datatype is the way to create a variable. In this line you can set the value of the variable immediately (such as run a sub-query) or use a SET command later on. The program shown to the left allows one textual sub-category value. Here we show the query in test mode (by giving a value to the variable). Future usage of the query will receive values (from a dropdown list in an ASP.NET webpage, Excel, SSRS, or Tableau project) passed in and used in the WHERE statement to filter the results retrieved from the database. It works really well.

3

Page 4: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

we turn the query into a stored procedure. The value for the @subCategory variable would be passed from the reporting interface that is ‘calling’ the stored procedure

For example if the query is saved into a stored procedure called spUnitsForSubCategory then you would run the query and retrieve the data by typing

EXEC spUnitsForSubCategory 'Mountain Bikes'

Variables are use to pass values from one system to the next.

A simple nvarchar variable that can accept a word or phrase. Most variables that programmers use can store one value, a number, a date or a word or set of words. In T-SQL use the DECLARE phrase to create a variable in the memory (RAM) of the development machine. The DECLARE line shown to the left creates a variable of datatype nvarchar that can accept up to 30 characters (letters or numbers).

The idea is that this query would be ‘called’ or ‘invoked’ from Excel or a reporting software which passes in the value for the variable. This is called a parameterized query and can provide different results depending on value passed into the query, similar to a slicer. We can see that the variable is used in the WHERE clause to filter the results. So remember you can always pull a summary value into your dataset by using local variable and sub-queries. If you bring in a variable at a different level of summarization, then you can make a whole new set of metrics, such as percent of total for different categories.

Next in this document we focus on a special type of variable called a table variable because the shape is a grid table. You specify the number of columns, then later make a plan to fill the rows. Arrays (table variables) are used to bring data together from different datasources, such as different databases. After you merge and integrate the data, then you can create new columns of business performance metrics, and this is where you start to add value to your employer and boost your career.

Often the array is used to organize and compile data into a coherent dataset (with new columns of derived data – aka metrics). It is common to save the results of an array into a new database table, then connect a series of reports to that compiled database table.

4

Page 5: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

This module’s Data Management Tip

Its time to do some DBA work copying tables of data from one database to the next. This is much easier than exporting and emailing an excel or csv file. Another technique that DBA’s use is to create a .bak backup file and copy that to a new machine and restore the data onto a different machine that also has SQL Server Database software installed.

Procedure 1: copy data into a DB creating a new table To copy data from one database table into a new database table in the same database or into a different database, use the following procedure. When selecting the data you could change the columns, filter the columns, filter the rows, change granularity, etc. When you copy the data you create a new table. You can however only create the table once, so future data migration into the same table would use procedure #2, as you already have the table created.

SELECT * INTO newtable FROM oldtable---------------------------so if you log into SSMS using your database you could then copy data into your database CREATING A NEW TABLEUSE [MF11YourDatabase]

SELECT *

INTO [dbo].[Australia]

FROM [Featherman_Analytics].[featherman].[BikeSales_Australia]

--------------------------------------------------------------------------------------------------------------------------

Procedure 2: copy data into an existing database table You can copy rows of data from one database table into a second database table as follows (for example when copying data to archives).

INSERT INTO database table SELECT * FROM [database].[table]

You can also copy rows of data from a database into an array. You must already have the array created before you can load some or all the columnsINSERT INTO @array SELECT * FROM [database].[table]

You can also add a GROUP BY to this INSERT INTO query to change the granularity of the data (summarize it) before loading the data into a table or array.

5

What is an array anyway?

When you run a query you can see the results, but that’s not an array. You can’t alter the results of the query or save them unless they are organized into a data structure called an array, or the data is exported.

An array is a spreadsheet like partition of RAM, that can store data into rows and columns.

An array is like a tabular container that you put different data types and data sources into and make soup.

Page 6: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Copying Data: DBA Examples

USE [Featherman_Analytics]

SELECT * INTO Featherman_Analytics.dbo.ManagersList

FROM [AdventureWorksDW2012].[dbo].[DimEmployee]WHERE [Title] LIKE '%manager%'

First we demonstrate code that demonstrates how Featherman (the owner of the Featherman_analytics database) could copy a table of data from AdventureWorksDW2012.In this case both tables are in the same database but you can use this code to copy from one database to the next such as copying data from featherman_analytics into your MF01### database (or similar). You can run a similar query and use YOUR database similarly.

This code creates a new table in the database specified by the INTO statement. The code creates the ManagersList table in the Featherman_Analytics database and inserts the following data INTO it. The data copied in comes FROM the DimEmployee table of AdventureWorksDW2012.

INSERT INTO SalesArchives (SELECT * FROM SalesTransactionsWHERE CustomerID = @p1 AND OrderDate < ‘1/1/2016’)

DELETE FROM SalesTransactionsWHERE CustomerID = @p1 AND OrderDate < ‘1/1/2016’

In this example you copy rows of data to an archives database table, then delete the rows from the transaction table

Here data is first copied from the SalesTransactions table to the SalesArchives table (to keep the transactions fact table from getting too big, slowing down report queries).

After copying the data to archives you can delete it from the SalesTransactions table.

INSERT INTO @RSSales ([ProductKey],[Reseller Units])( SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] GROUP BY [ProductKey])

Pertinent to this module on arrays, this query demonstrates how to load rows of data from database tables into an array named @RSSales. We copy data from database tables INTO an array, to merge and integrate it with other data. Why?

We will use the INSERT INTO arrayname SELECT * FROM SQL table command

6

If you are logged into YOUR database then you can use this technique to move some data into your database account. Log into YOUR database, then USE YOUR database. Your INTO statement should be YOUR database.

Page 7: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Array Examples:

Use the prior information to move and copy data as needed. Now lets see how arrays can make an analyst’s work patterns easier.

First array example: Merging tables of data into one large table. Here we load an array with data from different tables, pulling all the data needed for a report into one array. In this special case each of the database tables have the same schema. This example loads an array with data, it could also be used to load a database table with data from several other database tables

It is common to give regional managers their own datatables on the corporate server to enable simple uploads of business performance data using a simple script. This approach can speed up data management and reporting at the regional level. Notice on the first image that there are six tables of data, one for each country where sales are made (The Rollup table stores the combined data – the opposite of drill-down is rollup).

A common requirement is to need to combine data from the regional BikeSales tables (rollup the data for higher level management) for some further analysis. How would you do that? Copy rows into Excel? No need (and that process is not automated so every time you want to refresh the report, ugh, back to Excel copy/paste purgatory. Behold the UNION query – which is used to add more rows of data to an array (or database) when the schema is the same across all the tables. You can use UNION queries to copy more rows of data into an existing table that is storing similar data.

What we want is the table on the right where you can see the data is merged from the different tables (Australia, Canada, France shown).

This example demonstrates how to use the INSERT INTO, SELECT * FROM, and UNION SELECT commands to pull rows of data from different database tables into an array, then write that array to a new table in the database. In this following example all the tables are in the same database (at corporate HQ) but the tables can also be in different databases. This SQL code should give you some ideas, you can pull different data together, transform it, add metrics then save the data into a database table, finishing by connecting a report to that table. We will of course automate the entire process. Analyst/DBA’s that can automate report delivery for the rest of the company, are heros, because they produce the compiled information managers need to make informed decisions. Better decisions based on data can be a competitive advantage.

USE [Featherman_Analytics]; Example #11) First you create a table array, here we have 4 columns of data

7

Page 8: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

DECLARE @Sales TABLE ([Country] varchar(50), [Product] varchar(50), [Model] varchar(50), [Totals] decimal(10,2) )

INSERT INTO @Sales ([Country], [Product], [Model], [Totals]) SELECT * FROM [featherman].[BikeSales_Australia]

UNION SELECT * FROM [featherman].[BikeSales_Canada]UNION SELECT * FROM [featherman].[BikeSales_France]UNION SELECT * FROM [featherman].[BikeSales_Germany]UNION SELECT * FROM [featherman].[BikeSales_UK]

DECLARE @TotalSales decimal = (SELECT SUM([Totals]) FROM @Sales)

SELECT [Country], [Model], SUM([Totals]) as [Totals], FORMAT(SUM([Totals])/@TotalSales, 'P2') as [% Total]

FROM @Sales

GROUP BY [Country], [Model]ORDER BY [Country], [Model]

/* If you are logged into SSMS as the database owner then you could save the compiled data from the array into a table on your DB as follows. It is common to compile data in arrays then save the compiled dataset into a DB table to facilitate reporting.SELECT * INTO [Featherman_Analytics].[dbo].[BikeSales_rollUP]FROM @Sales*/

(Country, Product, Model, Totals).

2) Next we insert rows of data into that array, all the data from the BikeSales_Australia. Remember the syntax is INSERT INTO @ARRAYTABLE(columns) SELECT * FROM SQL table

3) Next you can add more records (rows) with the UNION SELECT command. Each additional UNION SELECT command works with the INSERT INTO statement.

4) Next we use a sub-query to calculate a value for a % of Total calculation. This is a sub-query of the array table, that computes a grand total and stores that value into a local variable.

5) Now that we have all the data in the array and have totaled one column, we use a SELECT statement to build and display the output. Here we add a new % of total metric column. So you can run a GROUP BY query on data that is in an array? Sure you can also run a PIVOT.

We also do not need to select all the columns to include them in the dataset (the product column was omitted to change the granularity from product to model). The result is a reduction in the # rows. When the data is in an array we can add calculated columns, and shape the data into the format that we need.

6) Finally, when the data is shaped and formatted, use a SELECT * INTO FROM query to save the data back into the database into a new table (which can serve as the datasource of a report or dashboard).

8

Page 9: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

USE [Featherman_Analytics];

DECLARE @Sales TABLE ([Country] varchar(50), [Product] varchar(50), [Model] varchar(50), [Totals] decimal(10,2), [%Total] decimal(10,2))

DECLARE @TotalSalesA decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Australia])DECLARE @TotalSalesC decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Canada])DECLARE @TotalSalesF decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_France])DECLARE @TotalSalesG decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Germany])DECLARE @TotalSalesUK decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_UK])

INSERT INTO @Sales ([Country], [Product], [Model], [Totals], [%Total])

SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals] ,[Product Totals]/@TotalSalesA

FROM [featherman].[BikeSales_Australia]

UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals] ,[Product Totals]/@TotalSalesC FROM [featherman].[BikeSales_Canada]

UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals] ,[Product Totals]/@TotalSalesF FROM [featherman].[BikeSales_France]

UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals] ,[Product Totals]/@TotalSalesG FROM [featherman].[BikeSales_Germany]

UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals] ,[Product Totals]/@TotalSalesUK FROM [featherman].[BikeSales_UK]

SELECT [Country], [Model], SUM([Totals]) as [Totals], SUM([%Total]) as [% Total]FROM @Sales

GROUP BY [Country], [Model]ORDER BY [Country], [Model]

Example #2

What if the manager wants to drill down to see % of totals by model within each country? We would need several local variables.

How do we provide that data grouping? First we add a new column to the array that will hold a new metric. Next we pull the totals from the 5 tables into 5 variables.Next we run our INSERT INTO SELECT FROM UNION SELECT statements.

The insert into loads the first set of rows, the UNION SELECT queries just add more rows.

Each UNION SELECT statement calculates the % of total for that country, for each product in the dataset and populates the values for the new column [%Total]. The final SELECT statement is how you display the compiled data.

A GROUP BY statement is used to change the granularity from product to Model (by removing product from the GROUP BY)

9

Page 10: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

10

Page 11: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #3: This next example again shows that you can pull data from different sources and merge them into an array of your own design. The example is simple, but should give you the insight that you really can pull disparate data into a table for reporting. Here sub-queries to pull data from different datasources into the same array. This is a simple example that could be solved with two variables but extrapolating to a larger scenario, the concept can be very helpful. Here there is no primary key to connect the tables, but the data can still be pulled into adjacent columns of an array, which allows further analysis.

USE [Featherman_Analytics];DECLARE @ChannelWidth TABLE([#SubCategories in Stores] decimal, [#SubCategories on Web] decimal)

--insert the data retrieved below into the specified columns of the arrayINSERT INTO @ChannelWidth ([#SubCategories in Stores], [#SubCategories on Web])

SELECT --this is the sub-query for the first column(SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= rs.[ProductKey])

--this is the sub-query for the second column, (SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactInternetSales] as fisINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= fis.[ProductKey])

--now we display the arraySELECT * FROM @ChannelWidth

The brand manager is wanting to analyze and adjust how many product sub-categories are sold on each channel. Unless you write SQL often, this investigation might be time consuming. Sub-queries and arrays again come to the rescue.

Again you define an array and INSERT values INTO the virtual table using sub-queries, then you use two different SELECT statements that are sub-queries to select

Notice there is no primary key field that is in common between these two tables. We just want to pull values from the tables and display them in a table. Its just a baby step but an important one.

You can put any values you want into columns of an array, they can be totally different. The array can hold different types of unrelated data.

Usually you want to pull data columns about some attribute of interest (e.g., City) so your rows are used to organize the data. If your first column is US state, you could pull different data into columns for each row (each state).

11

Page 12: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Here is the general pattern of using arrays in this module

step 1 declare the @array w/columns. The DECLARE statement creates the columns of the array.

step 2 Optional: create and load local variables with values calculated from database tables using sub-queriesYou can choose to bring in values that are already summarized at this step, or you can bring in all the data into the array, then perform the summary query in step 4 below for example to total column in your array. If you are pulling a value from a different table, perform this at step 2).

step 3 Load some of the columns in your array, usually the dimension attributes and optionally some measuresThe INSERT INTO statement adds the rows to the array, filling some or all of the columns

INSERT INTO @array SELECT column names FROM Database tableOptional: (Use JOINS to integrate data, UNION SELECT to add more rows from different tables, use GROUP BY or PIVOT to change granularity of incoming data). When you load data into an array, often this is where the first data transformation occurs.

Insight: in some cases, you can create and load one array, in other cases you create and load a few arrays, and then copy the compiled data from the arrays into one larger summary array. You might need to use this extra step when the data will not comply.

step 4 now that you have some of the columns of the array loaded, you can transform the data, by calculating new columns of metrics, etc.UPDATE @array SET [Column 1] = [Column 2] * 1.4 This is the step where you create all the columns of analytics, transforming the data to create new insights. THIS IS THE MAIN VALUE ADD for an analyst. Focus on this step. Demonstrate that you understand that your new metrics are needed to create insight, to enable better business performance management. Give the managers the insights they need to make decisions. Learn the metrics that each department and process need.

step 5 Display the results of your analysis, using a SELECT statement. Optional: change the granularity with a GROUP BY()? Optional: add more columns of metrics.SELECT column names from @array

Step 6 Optional: copy your built array dataset into a new database table, or copy data into an existing table (add rows)

12

Page 13: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #4:

The next is a little more powerful query – Often when you do a join of tables you do so to add more columns of metrics or to add rows to make more complete lists. This query tackles a different problem, adding aggregate columns from a second table. One way to do this is to create two virtual tables with the results you want then to join the tables to display the different aggregations. So here we create two virtual tables, select data into them, then finally pull the desired compiled data from the two virtual tables. This following query is MUCH EASIER by utilizing array tables.

Here you can see that the same product key is used in the database tables for both sales channels. The product key is an internal referent number used as the primary key (rather than the ProductAlternateKey] field). This is common when copying data into data warehouses.

In lines 7-12 The standard INSERT INTO SELECT FROM query is used to load rows of data into two columns of the array. This is a SELECT sub-query inside an INSERT INTO statement. Because there are no rows of data yet in the array, we can’t use an UPDATE SET command, we have to start with an INSERT command. So remember the pattern INSERT INTO SELECT FROM

In lines 14-19 because we already have rows of data in the array, we can add data to a column FOR THOSE EXISTING ROWS using the pattern UPDATE SET SELECT FROM

The UPDATE SET does not add more rows of data; rather it fills in a column with values for the rows you already have. Line 16 provides GROUP BY functionality. Line 18 is an INNER JOIN which may present a problem, if you wanted an OUTER JOIN

Line 21 adds two columns in the array together and makes sure the NULLS are removed so that the calculation works.

Lines 24-28 perform a percent of total calculation. If you want to format the column with a % then you have to use a NVARCHAR textual datatype for the column. Using a text column might make problems when you use PowerBI for the final reporting, if further data transformations and measures need to be calculated.

13

Run the code on the next page

Page 14: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

USE [AdventureWorksDW2014]

DECLARE @Combined TABLE ([Product] int, [Reseller Units] decimal, [Internet Units] decimal, [Total Units] decimal, [% Reseller Channel] nvarchar(8), [% Internet Channel] nvarchar(8))

INSERT INTO @Combined ([Product],[Reseller Units])( SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] GROUP BY [ProductKey])

UPDATE @Combined SET [Internet Units] =( SELECT ISNULL(SUM([OrderQuantity]), 0) FROM [dbo].[FactInternetSales] as s WHERE s.[ProductKey] = [Product])

UPDATE @Combined SET [Total Units] = ISNULL([Internet Units], 0) + ISNULL([Reseller Units], 0)

UPDATE @Combined SET [% Reseller Channel] = FORMAT([Reseller Units]/[Total Units], 'p1')

UPDATE @Combined SET [% Internet Channel] =FORMAT([Internet Units]/[Total Units], 'p1')

SELECT * FROM @Combined ORDER BY [Product]

Clearly the strength of using arraytables is the ease of creating calculated fields from the available columns in the arrays. UPDATE SET is your big hammer used to pound the data into shape.

If all the metrics columns needed here were created inside an inner join, then the query would be much harder to write. Especially if the tables do not fit together nicely.

Please note that the fields in the JOIN should have different names, to remove data ambiguity. In your array for the first column, it is strongly recommended that you use a similar but different column names in your array ESPECIALLY for the first key column.

This query did not work when the first column in the array was the same as the column form the underlying database table. The ambiguity would show up in the WHERE statement on line 18. The query worked when the column in the array was changed to [Product] to make it different, and apparently reduce the ambiguity of the query.

Please compare this pattern to the pattern listed on page 11 above.The problem with this query is that it performs an INNER JOIN and that might not produce all the records needed.

14

Page 15: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Saving data from an array into a database table

Line 30 above displays the data in the array

Line 32 creates a new table in the database and copies the data and schema from the array into the database table. You can only execute line 32 once. If you run line 32 more than once then you will receive an error “already have that table” meaning you can’t create it again. So you can only run line 32 once, but it is helpful in that the table is made quickly for you.

If you already have a table in your database and you want to copy rows of data into it, use lines 35-38. You might want to clear out the data that is currently in the database before refreshing the data. To clear out the old data then use code like line 33

Line 35-38 copies the data from the array into the database table. Lines 36, 37 are the columns that you are saving the data into.

This code is a little different because it is an INSERT INTO statement that uses a SELECT FROM WHERE command inside it.

If you want to add one row of values into your array or database table the code would look like

INSERT INTO tablename (column names) VALUES (data values separated by commas). This is shown in the next document.

15

Page 16: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #5:

Here is another way to use arrays. Use sub-queries to load data into two array tables (pulling data from different data sources). Next use the final SELECT query to merge the data from the two arrays. This type of procedure is needed when the prior examples do not work.

Step 1 - declare the array tables and their columns

Step 2 – Select data using normal queries and load the data INTO the arrays

Step 3 Select columns from the two array tables, add metrics

Step 4 – join the arrays together on a common field

DECLARE @ArrayTable1 Table (column1 int, column2 Decimal)

DECLARE @ArrayTable2 Table (column1 int, column2 Decimal)

INSERT INTO @ArrayTable1(SELECT Column1, Sum(Column2)FROM SomeTable GROUP BY Column1)

INSERT INTO @ArrayTable2(SELECT Column1, Count(Column2) FROM SomeTableGROUP BY Column1)

SELECT columns and create metrics using columns from either array @ArrayTable1, or @ArrayTable2

FROM @ArrayTable1 AS T1, @ArrayTable2 AS T2WHERE T1.primarykeyfield = T2primarykeyfield

--You can also replace the prior two lines that connect the table on a common field with the following FULL JOIN

FROM @temptable1 AS T1FULL JOIN @temptable2 AS T2

This query shows conceptually one way to create and combine two tables of retrieved data when you have a common field of interest, the primary key (preferably integer). First you create arrays in step one and then SELECT values and INSERT the retrieved data INTO the arrays in step 2. Finally you use a final SELECT statement to pull data from the two table variables together and to create new columns of derived data so that the resultset has all the calculated or aggregate fields.

The WHERE statement is an inner join in that values are displayed ONLY if there is a value in both tables (both side of the join). The blue code shows the FULL JOIN which can show the data whether there is a match or not in the other table.

The major benefit is that when you pull data from different data sources into arrays you can join them and make new columns of required metrics, the data can be different levels of granularity.

These arrays are staging tables in RAM (common when creating ETL procedures). The code above can be used WHEN YOU KNOW FOR SURE that there is an attribute field in common and there are values in the field in both tables (such as month, or city). Reports often need to give perspective by displaying data at different levels of granularity to examine the scope of a problem, (city, state, country columns can all coexist in adjacent columns).

You can for example show a department is losing money for one store in one city, but the same department is not problematic at the state level. Managers need to know how big problems are.

16

Page 17: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #6

USE [AdventureWorksDW2014];DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] decimal (10,2))DECLARE @INETSales TABLE ([ProductKey] int,[Internet Units] decimal (10,2)) INSERT INTO @RSSales ([ProductKey],[Reseller Units])(SELECT [ProductKey], ISNULL(SUM([OrderQuantity]), 0)FROM [dbo].[FactResellerSales]GROUP BY [ProductKey])

DECLARE @#ItemsRS decimal = (SELECT COUNT(*) FROM @RSSales)PRINT @#ItemsRS

INSERT INTO @INETSales ([ProductKey], [Internet Units])(SELECT [ProductKey], ISNULL(SUM([OrderQuantity]) , 0)FROM [dbo].[FactInternetSales]GROUP BY [ProductKey]) DECLARE @#ItemsIC decimal = (SELECT COUNT(*) FROM @INETSales)PRINT 'Number SKUs Internet Channel ' + CAST(@#ItemsIC AS nvarchar(8))

SELECT rs.[ProductKey], rs.[Reseller Units] AS [Reseller Units], ISNULL(inet.[Internet Units], 0) as [Internet Units], rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0) AS [Total Units] ,FORMAT(rs.[Reseller Units]/(rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0)), 'p1') AS [% Reseller Channel] ,FORMAT(ISNULL(inet.[Internet Units], 0)/(rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0)), 'p1') AS [% Internet Channel] FROM @RSSales AS rsFULL OUTER JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKey--WHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKey

This example uses a full join to get all the product sale data from either sales channel (whereas the prior query gets only the products that are sold on both channels). Try running the query with and without the final WHERE filter.

While this is a FULL JOIN that should get ALL the records from both tables, check the picture on the next page. If you run this query without the WHERE filter, you will see there are records that do not fit. Don’t worry the next example solves this problem. The point is that the FULL OUTER JOIN or FULL JOIN does not work in this case.

The solution is similar to module 5, start the query with the dimension field of interest from the dimension table, not from a fact table.

The rectangle boxes on the left are used to perform a count of the records. The DECLARE uses a sub-query to count the number of records in the array.

The PRINT statement displays the variable with the number of records, this required these two lines. IN the second rectangle on the left a concatenation of text and numbers is performed. You have to convert the number into a text value to display it.

17

Page 18: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

18

Page 19: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #7 – here is a simple example to reinforce the pattern. First use an INSERT INTO SELECT FROM query, then use UPDATE SET queried to fill the rest of the columns with data. The next example solves the INNER JOIN problem allowing an OUTER JOIN (getting all the records from both fact tables

USE [AdventureWorksDW2012];-- Set up your arrayDECLARE @Sales TABLE ([Product] int, [Reseller Units] decimal, [Inet Units] decimal )

-- Declare which columns will be added to INSERT INTO @Sales ([Product], [Reseller Units] )

-- load the first two columns( SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] GROUP BY [ProductKey])

-- Bring in the values for the next column UPDATE @Sales SET [Inet Units] = (SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales]WHERE [ProductKey] = [Product] )

-- Notice the WHERE statement is a JOIN from the database table to the array

-- Now choose what columns to displaySELECT [Product], [Reseller Units], ISNULL([Inet Units], 0) as [INET Units]

FROM @SalesORDER BY [Product]

This is an innovative use of a sub-query that leverages an UPDATE TableName SET [Column name] = value command. We fill a column of data. While the values here go from NULL to a number, the this technique could also be used to update columns of values such as changing prices of products, or adjusting inventory recoreds.

So there are many ways to pull in or update data a) subquery, b) create values in the INSERT INTO statement, c) use an UPDATE SET command.

The amazing part is that the WHERE statement connects the FactInternetSales column to a column in the arraytable. This opens up a lot of possibilities.

We still need to SELECT the values from the arraytable to show them, which gives us a chance to use our ISNULL command (thanks again Andrew Pirie and Leo for the ISNULL code).

19

Page 20: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example 8: This solution demonstrates

USE [AdventureWorksDW2014]

DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] decimal) DECLARE @INETSales TABLE ([ProductKey] int, [Internet Units] decimal)DECLARE @Combined TABLE ([ProductKey] int, [Reseller Units] decimal, [Internet Units] decimal, [Total Units] decimal, [% Reseller Channel] nvarchar(8), [% Internet Channel] nvarchar(8))

INSERT INTO @RSSales ([ProductKey],[Reseller Units])( SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales]GROUP BY [ProductKey] )

INSERT INTO @INETSales ([ProductKey], [Internet Units])( SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactInternetSales]GROUP BY [ProductKey] )

INSERT INTO @Combined ([ProductKey], [Reseller Units], [Internet Units])( SELECT rs.[Product],rs.[Reseller Units], inet.[Internet Units]

FROM @RSSales as rs, @INETSales as inetWHERE rs.ProductKey = inet.ProductKeyORDER BY rs.ProductKey )

UPDATE @Combined SET [Total Units] = [Internet Units] + [Reseller Units]

UPDATE @Combined SET [% Reseller Channel] = FORMAT([Reseller Units]/[Total Units], 'p1')

UPDATE @Combined SET [% Internet Channel] = FORMAT([Internet Units]/[Total Units], 'p1')

SELECT * FROM @Combined

DECLARE @#ItemsRS decimal

This example uses three arrays, sort of a brute force methodology that you might need to use. First you declare the three arrays.

Next use the INSERT INTO commands to load two arrays.

Next use the third INSERT INTO command to merge the data from the first two arrays. The trick is that when you load the @Combined array you are using the following code:

FROM @RSSales as rs, @INETSales as inetWHERE rs.ProductKey = inet.ProductKey

This is a form of an inner join, which cuts down the number of rows that are retrieved. Here the Product key field has to be in both tables. You will not always want an inner join so the next example solves this problem.

One of the most satisfying aspects of working with arrays is the power and ease of use of UPDATE SQL statement.

The UPDATE statement is easy to type, and powerful in its ability to transform data. Use the UPDATE statement to load data into existing columns. Here we first total two columns, which seems simple, but what if the two columns came from different databases from different continents? It might be a great step in the direction of accurate, useful reporting. Did you know the origin of the IS profession is data management and report-making.

The last two UPDATE statements calculate a % by comparing two other columns.

Finally, after you create the compiled and calculated data, you use a SELECT command to produce and display the results.

20

While it is an interesting extension to use more arrays then combine them, this query is still a problem because it pulls the product key from reseller table.

Be careful! Count the records!!

Page 21: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

= (SELECT COUNT(*) FROM @Combined)PRINT @#ItemsRSExample 9: Here is the solution, load the product keys from the products table, The code is on the next page. This example uses one array and solves the problem of not

using an INNER JOIN. The result is that the query operates as a FULL JOIN, here pulling ALL the sales data for each productKey in either table (FactResellerSales, and FactInternetSales).

Lines 4 -7 load all the product keys and product names into the array, this ensures you get all the rows, because they are from the dimension table, NOT from one of the FACT transaction tables.

Lines 9-12 Load the [Reseller Units] column with values that are summarized from the FactResellerSales table. This is a form of GROUP BY query. Notice the nifty use of the ISNULL(, 0) syntax. This makes sure you do not have the pesky NULLS In the column. Ditto lines 14 – 17.

Line 19 totals two other columns. This works because the columns have numbers or a zero in them.

Lines 21 -25 (ditto 27-31) Calculates a % of total for every row in the array. It also removes the possibility of a divide by zero error, and only performs the calculation if the data is available. This is necessary because some of the columns have zeros in them and computers historically do not like divide by zero errors.

21

Page 22: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

USE [AdventureWorksDW2014]DECLARE @Combined TABLE ([Key] int, [Product Name] nvarchar(40), [Reseller Units] decimal, [Internet Units] decimal, [Total Units] decimal, [% Reseller Channel] nvarchar(8), [% Internet Channel] nvarchar(8))

INSERT INTO @Combined ([Key], [Product Name] )(SELECT p.[ProductKey], [EnglishProductName]FROM [dbo].[DimProduct] as pWHERE [FinishedGoodsFlag] =1)

UPDATE @Combined SET [Reseller Units]= (SELECT ISNULL(SUM([OrderQuantity]), 0) FROM [dbo].[FactResellerSales] as rWHERE r.[ProductKey] = [Key])

UPDATE @Combined SET [Internet Units]= (SELECT ISNULL(SUM([OrderQuantity]), 0) FROM [dbo].[FactInternetSales] as iWHERE i.[ProductKey] = [Key])

UPDATE @Combined SET [Total Units] = [Internet Units] + [Reseller Units]

UPDATE @Combined SET [% Reseller Channel] = (CASEWHEN [Reseller Units] = 0 AND [Internet Units] = 0 THEN 'n/a'WHEN [Reseller Units] >= 0 THEN FORMAT([Reseller Units]/[Total Units], 'p1')END)

UPDATE @Combined SET [% Internet Channel] = (CASE WHEN [Internet Units] = 0 AND [Internet Units] = 0 THEN 'n/a' WHEN [Internet Units] >= 0 THEN FORMAT([Internet Units]/[Total Units], 'p1')END)

SELECT * FROM @Combined

This query has all the products all 397 of them. Other queries cut out the products that are sold on the Internet channel, but not sold in the reseller channel. The UPDATE SET command fixed the problem

22

Page 23: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

USE [Featherman_Analytics];DECLARE @Sales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50), [ProductID] int, [Product] nvarchar(50), [Reseller Profit] decimal (10,2), [Internet Profit] decimal (10,2), [Total Profit] decimal (10,2), [Reseller %] decimal (10,2), [Internet %] decimal (10,2), [Channel Preference] nvarchar(50))

INSERT INTO @Sales ([Sub Category], [Model], [ProductID], [Product], [Reseller Profit])SELECT [Sub Category], [Model], p.[ProductKey], [Product], SUM([SalesAmount] - [TotalProductCost]) FROM [dbo].[FactResellerSales] as frsINNER JOIN [Featherman_Analytics].[dbo].[AW_Products_Flattened] as p ON frs.ProductKey = p.ProductKeyGROUP BY [Sub Category], [Model], p.[ProductKey], [Product]

UPDATE @Sales SET [Internet Profit] = (SELECT SUM([SalesAmount] - [TotalProductCost])FROM [dbo].[FactInternetSales] as iINNER JOIN [Featherman_Analytics].[dbo].[AW_Products_Flattened] as p ON i.ProductKey = p.ProductKeyWHERE p.[ProductKey] = [ProductID] )

UPDATE @Sales SET [Total Profit] = ([Reseller Profit] + [Internet Profit])

UPDATE @Sales SET [Reseller %] = (CASEWHEN [Reseller Profit] > 0 THEN ([Reseller Profit]/ [Total Profit])WHEN [Reseller Profit] <=0 THEN 0END)

UPDATE @Sales SET [Internet %] = (CASEWHEN ([Internet Profit] < [Total Profit])

THEN ([Internet Profit] /[Total Profit])WHEN ([Internet Profit] >= [Total Profit]) THEN 1WHEN [Internet Profit] <= 0 THEN 0END) UPDATE @Sales SET [Channel Preference] =(CASE

Example #10: This example leverages our use of UPDATE SET commands inside arrays. As we shall see UPDATE SET commands are easy to create as the expressions can use calculated columns in the expressions

The INSERT INTO SELECT FROM statement compiles the data for the first five columns. Then a series of update set commands are used, the first fills a summary, calculated column using a sub-query. Next in green the [Total Profit] column is calculated by simply adding two other array columns.

Next the [Reseller %] and [Internet%] columns are calculated. This is an analysis of profit, therefore some column values can possibly be negative (look at the reseller profit column to see negative values – l;osing money. A CASE statement is used fix the

23

Page 24: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

WHEN [Reseller %] > .8 THEN 'Reseller dominant'WHEN [Reseller %] BETWEEN .55 AND .799 THEN 'Reseller favored'WHEN [Reseller %] Between .40 AND .50 THEN 'Internet favored'WHEN [Reseller %] < .4 THEN 'Internet dominant 'END)

SELECT * FROM @SalesWHERE [Total Profit] IS NOT NULLORDER BY [Total Profit] DESC

results so that it is not possible to see a value greater than 1. These are great usages of CASE statements.

The last CASE statement within an UPDATE SET fills the [Channel Preference] column. It provides a categorizing column that can be used as a slicer. The categories indicate whether the product generates more profit on the Internet channel, Reseller channel or mix.

Finally the data is displayed on-screen, filtering the list to only those catalog products that had sales. The following chart is possible in Excel.

USE [Featherman_Analytics];

DECLARE @TotalSales decimal(8,0) = (SELECT SUM([TotalSales]) FROM featherman.Customers2020)PRINT @TotalSales

UPDATE featherman.Customers2020 SET PercentTotal = (CASE WHEN TotalSales = 0 THEN 0 WHEN TotalSales > 0 THEN TotalSales/@TotalSalesEND)

What does this code to the left accomplish? It is clever

24

Page 25: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

Example #11 - the total sales booked for each sales rep is compared to the sales quota set for that sales rep. This query then is the basis for a dashboard that measures business performance to goal. Combining the data for employee over time is one way to look at the data, another is to look at each time period and rank/rate the sales rep performance within that time period.

USE [AdventureWorksDW2012];DECLARE @SalesPerformance TABLE ([EmployeeID] int, [Name] nvarchar(75), [CalendarYear] int, [CalendarQuarter] int, [SalesAmountQuota] decimal, [SalesAmountActual] decimal, [Performance to goal] decimal, [Sales KPI] nvarchar(10))

INSERT INTO @SalesPerformance ([EmployeeID], [Name], [CalendarYear], [CalendarQuarter], [SalesAmountQuota] )SELECT q.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]) as Name, [CalendarYear], [CalendarQuarter] , [SalesAmountQuota]FROM [dbo].[FactSalesQuota] as qINNER JOIN [dbo].[DimEmployee] as e ON e.[EmployeeKey] = q.[EmployeeKey]

UPDATE @SalesPerformance SET [SalesAmountActual]= (SELECT SUM([SalesAmount]) FROM [dbo].[FactResellerSales]WHERE [EmployeeKey] = [EmployeeID] AND DATEPART(year, [OrderDate]) = [CalendarYear] AND DATEPART(quarter, [OrderDate]) = [CalendarQuarter] GROUP BY [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]) )

UPDATE @SalesPerformance SET [Performance to Goal]= ([SalesAmountActual] - [SalesAmountQuota])

UPDATE @SalesPerformance SET [Sales KPI]= FORMAT([SalesAmountActual] / [SalesAmountQuota], 'p1')

SELECT * FROM @SalesPerformance

/*SELECT * INTO [Featherman_Analytics].[dbo].[SalesPerformance]FROM @SalesPerformance */

The sales quotas are in the [FactSalesQuota] table and do not have to be aggregated. Quotas are for each quarter for each employee.

The actual sales are in the [FactResellerSales] table and saved by invoice. The invoices have to be aggregated for each employee and totaled for each year and quarter. The GROUP BY statement fills the [SalesAmountActual] column.

Here the UPDATE SET command allows you to calculate two columns of metrics. Notice the [Sales KPI] field is a varchar(10) data type so the FORMAT([Foeld name]), ‘p1’ can be used to nicely format the number into a percent.

Results are below as are some idea for charts.

25

Page 26: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

26

Page 27: Washington State University  · Web view2021. 3. 26. · Arrays are created in RAM of the database server and deployed temporarily to perform these data management tasks. Transaction

A lot of ground was covered in this .docx. Arrays seems to make a lot of data manipulation easier. Sub-queries work great to pull columns of values into the array, even changing the summary level of the data. UNION SELECT commands can be used to add rows of data, and UPDATE SET commands can be used to calculate values to fill columns of data with totals, % of total calculations and other new metrics.

HAPPY SQL Coding! Mauricio Featherman, Ph.D.3-26-21

27