Retail Sales



Page 1: Retail Sales

RETAIL-SALES DWBI IN-STAGE ETL SPECIFICATIONS

SOURCE TABLES

1. STG_CUST_ORD_ITM_DTLS

2. STG_CUST_ORD_DTLS

3. STG_PROD_DTLS

Overview:

The first source table contains very detailed transactional data; the second contains lookup data for orders related to customers; the third contains the products selected by the customers during the transactions in each order.

The first source table has the primary key ITEM_DETAIL_CODE, defined as a sequence of values for each transaction generated by individual customers (CUSTOMER_ID) across different orders. For an individual customer, transactions are identified by ITEM_ID, a sequence of values that is unique within an order and repeated across multiple orders. ORDERNO is an invoice number generated for each visit, whether online or offline; even for repeat customers, an ORDERNO is never repeated. Each item selected by the customers is identified by PRODUCTNO.

The second source table contains order information used as a reference, to look up data coming from the source based on a lookup condition on ORDERNO.

The third source table contains information about the products selected by the customer for each transaction, referenced by the key PRODUCTNO.

Page 2: Retail Sales

Data flow diagram for RET-SALES-ETL-IN-STAGE

Construct the staging layer for target table loading with the following steps in the stage working database.

Step 1: Define a filter condition ORDER_STATUS = 'Y' at the source side and select the required columns from the source: ITEM_ID, ORDERNO, CUSTOMER_ID, and PRODUCTNO, along with key performance indicators such as QTY, ITEM_PRICE, SALES_PRICE, ITEM_COST, and VAT (if applicable).

Step 2: Design the business logic for each transaction in the following way:

ITEM_AMOUNT = QTY * ITEM_PRICE;

ITEM_DISC = QTY * (ITEM_PRICE - SALES_PRICE);

ITEM_FINAL_AMOUNT = QTY * SALES_PRICE * VAT;
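The Step 2 expressions can be sketched as a row-level transformation. Python is used here purely for illustration; treating VAT as a multiplier that defaults to 1 when not applicable is an assumption, since the spec leaves its handling open:

```python
def derive_item_measures(row):
    """Apply the Step 2 business logic to one staged item record.

    Expects the columns selected in Step 1: QTY, ITEM_PRICE,
    SALES_PRICE, and optionally VAT (assumed to be a multiplier).
    """
    qty = row["QTY"]
    item_price = row["ITEM_PRICE"]
    sales_price = row["SALES_PRICE"]
    vat = row.get("VAT", 1)  # assumption: multiplier of 1 when VAT is not applicable

    row["ITEM_AMOUNT"] = qty * item_price                # gross amount
    row["ITEM_DISC"] = qty * (item_price - sales_price)  # discount given
    row["ITEM_FINAL_AMOUNT"] = qty * sales_price * vat   # as written in the spec
    return row
```

For example, QTY = 2, ITEM_PRICE = 10, SALES_PRICE = 8, VAT = 1.05 gives ITEM_AMOUNT = 20, ITEM_DISC = 4, and ITEM_FINAL_AMOUNT = 16.8.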

Step 3: Take references from the lookup tables STG_CUST_ORD_DTLS and STG_PROD_DTLS on ORDERNO and PRODUCTNO to pick up updated information for the data coming from the source.

Step 4: Select surrogate keys from all possible dimensions with reference to the source table and connect to the target table, to load detailed data with arithmetic operations based on parameters and variables that select the required data from the source, on a daily basis.

The fact table below is a transactional-grain fact table that maintains data daily, in an incremental manner, for ad-hoc and standard requirements.

Page 3: Retail Sales

TARGET TABLE 1

1. WH_ITEM_FACT

WH_CUSTOMER_ID        NUMBER(20)
WH_RETAILER_ID        NUMBER(20)
WH_TIME_ID            NUMBER(20)
WH_REGION_ID          NUMBER(20)
WH_MARKETPLC_ID       NUMBER(20)
WH_AGENT_ID           NUMBER(20)
WH_CURRENCY_ID        NUMBER(20)
WH_SALES_PERSON_ID    NUMBER(20)
WH_PRODUCT_ID         NUMBER(20)
WH_LEGAL_ENTITY_ID    NUMBER(20)
WH_SESSION_ID         NUMBER(20)
WH_ORDER_METHOD_ID    NUMBER(20)
WH_SESSION_TYPE_ID    NUMBER(20)
WH_STORE_ID           NUMBER(20)
ORDER_NO              NUMBER(20)
ORDER_DATE            DATE
ITEM_ID               NUMBER(20)
ITEM_QTY              NUMBER(20)
ITEM_PRICE            NUMBER(20,2)
ITEM_COST             NUMBER(20,2)
ITEM_SALES_PRICE      NUMBER(20,2)
ITEM_VAT              NUMBER(20,2)
ITEM_AMOUNT           NUMBER(20,2)
ITEM_DISC             NUMBER(20,2)
ITEM_FINAL_AMOUNT     NUMBER(20,2)
SHIPPED_DATE          DATE
DELIVERY_DATE         DATE
DELIVERY_STATUS       VARCHAR2(1)

Page 4: Retail Sales

Data flow diagram for standard database (Working Database):

1. Business Logic Implementation at Item Level

The data flow diagram below shows how to design the ETL logic for the RETAIL-SALES Data Mart, loaded on a daily basis. The grain of the fact table is ITEM_ID with ORDERNO and ORDER_DATE. Use a filter to eliminate unwanted data at the source itself; the filter condition is ORDER_STATUS = 'Y', which eliminates orders cancelled in the OLTP systems by end-users.

After cleansing and scrubbing, arrange the source data in the required format using the available lookup tables, based on lookup conditions on common columns. Finally, control the transactional data with the required condition: define a filter condition on the delivery status equal to 'Y' coming from STG_CUST_ORD_DTLS.

Figure 1: Dataflow Diagram for Working Database at Item Level.

Page 5: Retail Sales

Dimension Modeling:

Design the dimension model as a Star Schema with the available dimension tables and fact table. There are fourteen dimension tables and one fact table in the target system, listed below.

Dimension tables:

1. WH_CUSTOMER_DIM

2. WH_PRODUCT_DIM

3. WH_TIME_DIM

4. WH_ORDER_METHOD_DIM

5. WH_RETAILER_DIM

6. WH_AGENT_DIM

7. WH_MARKET_PLACE_DIM

8. WH_STORE_DIM

9. WH_SALES_PERSON_DIM

10. WH_SESSION_DIM

11. WH_SESSION_TYPE_DIM

12. WH_LEGAL_ENTITY_DIM

13. WH_SUPPLIER_DIM

14. WH_ORDER_DIM

Figure 2: Dimension modeling diagram for Sales Data Mart for Level 1 requirements.

Page 6: Retail Sales

Explanation:

For Level 1 requirements, the target table is loaded with very detailed data, with the necessary conditions and data manipulation using the necessary formulas, based on references from the dimension tables. It is semi-non-volatile in nature, meaning it allows modifications coming from the source systems and is loaded daily in an incremental manner. It is mainly constructed for the standard daily requirements of customer order management.

Find data for each ORDERNO for all transactions (ITEM_ID) generated by the CUSTOMER_ID, based on various dimensions such as PRODUCT_ID, ORDER_DATE, etc.

Before loading data to the target table, control the data based on location; the record direction then changes. Take references from the various dimension tables and connect them to the target definition.

Business Logic Implementation for Order Level:

This business logic maintains overall information for each order generated by the customer; transactional data is aggregated at the order level, irrespective of the products selected by the customer, on a daily basis and loaded in an incremental manner.

The main metrics used for order-level requirements are shown below:

1. ORDER_AMOUNT = SUM(ITEM_AMOUNT)

2. ORDER_DISC = SUM(ITEM_DISC)

3. ORDER_FINAL_AMOUNT = SUM(ITEM_FINAL_AMOUNT)

4. ORDER_QTY = SUM(ITEM_QTY)
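As a minimal sketch, this order-level rollup is a group-by on ORDERNO over the item fact rows (Python for illustration; the assumption that ORDER_FINAL_AMOUNT sums the item-level ITEM_FINAL_AMOUNT column follows the metric list above):

```python
from collections import defaultdict

def aggregate_orders(item_rows):
    """Roll item-level fact rows up to one record per ORDERNO."""
    totals = defaultdict(lambda: {"ORDER_AMOUNT": 0.0, "ORDER_DISC": 0.0,
                                  "ORDER_FINAL_AMOUNT": 0.0, "ORDER_QTY": 0})
    for r in item_rows:
        o = totals[r["ORDERNO"]]
        o["ORDER_AMOUNT"] += r["ITEM_AMOUNT"]
        o["ORDER_DISC"] += r["ITEM_DISC"]
        o["ORDER_FINAL_AMOUNT"] += r["ITEM_FINAL_AMOUNT"]
        o["ORDER_QTY"] += r["ITEM_QTY"]
    return dict(totals)
```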

How is VAT added to ORDER_AMOUNT? Use a lookup flat file supplied by the business people, which contains data in the following way:

Page 7: Retail Sales

MIN_ORDER_AMOUNT (USD)   MAX_ORDER_AMOUNT (USD)   VAT (%)
100                      250                      0.9
250                      500                      1.49
500                      1000                     2.49
1000                     2000                     4.49

Define the lookup condition on the input order amount with multiple lookup ports in the following way:

If ORDER_AMOUNT >= MIN_ORDER_AMOUNT AND ORDER_AMOUNT <= MAX_ORDER_AMOUNT

then add the associated VAT to ORDER_FINAL_AMOUNT using the following formula:

ORDER_VAT_AMOUNT = ORDER_FINAL_AMOUNT * VAT / 100

Take the output of the expression stage after cleansing and aggregate the data with the necessary aggregate functions based on ORDERNO, with reference to the associated dimension keys. Design the logic to calculate VAT for ORDER_AMOUNT; if ORDER_AMOUNT is outside the specified lookup values, the lookup stage returns a null value.

Working with null values: if the lookup condition is not satisfied, the stage passes null values to the next level, and any calculation defined on a null value yields null. Use the DECODE function to overcome null values in the following way:

DECODE (VAT, NULL, 0, VAT);

Page 8: Retail Sales

Design the output column:

FINAL_ORDER_AMOUNT = ORDER_FINAL_AMOUNT + ORDER_FINAL_AMOUNT * DECODE(VAT, NULL, 0, VAT) / 100;
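The slab lookup and the DECODE-style null handling can be sketched together (a hypothetical Python illustration; taking the first matching slab where the published boundaries touch is an assumption about the lookup behaviour):

```python
# Slab boundaries from the business-supplied flat file.
VAT_SLABS = [  # (MIN_ORDER_AMOUNT, MAX_ORDER_AMOUNT, VAT percent)
    (100, 250, 0.9),
    (250, 500, 1.49),
    (500, 1000, 2.49),
    (1000, 2000, 4.49),
]

def lookup_vat(order_amount):
    """Return the VAT percentage, or None on a lookup miss
    (amount below 100 or above 2000)."""
    for lo, hi, vat in VAT_SLABS:
        if lo <= order_amount <= hi:
            return vat
    return None

def final_order_amount(order_final_amount, order_amount):
    """FINAL_ORDER_AMOUNT with DECODE(VAT, NULL, 0, VAT)-style null handling."""
    vat = lookup_vat(order_amount)
    vat = 0.0 if vat is None else vat  # DECODE(VAT, NULL, 0, VAT)
    return order_final_amount + order_final_amount * vat / 100
```

A lookup miss therefore adds no VAT instead of propagating a null through the calculation, which is exactly what the DECODE wrapper achieves.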

TARGET TABLE 2

WH_ORDER_FCT

WH_TIME_ID            NUMBER(20)
WH_CUSTOMER_ID        NUMBER(20)
WH_AGENT_ID           NUMBER(20)
WH_RETAILER_ID        NUMBER(20)
WH_STORE_ID           NUMBER(20)
WH_ORDER_METHOD_ID    NUMBER(20)
WH_REGION_ID          NUMBER(20)
WH_MARKET_PLACE_ID    NUMBER(20)
WH_CURRENCY_ID        NUMBER(20)
ORDER_DATE            DATE
ORDER_QTY             NUMBER(20)
ORDER_AMOUNT          NUMBER(20,2)
ORDER_DISC            NUMBER(20,2)
ORDER_VAT             NUMBER(20,2)
ORDER_FINAL_AMOUNT    NUMBER(20,2)
FINAL_ORDER_AMOUNT    NUMBER(20,2)
SHIPPED_DATE          DATE
DELIVERY_DATE         DATE

Page 9: Retail Sales

2. Business Logic Implementation at Product Sales Level (Level 2):

The product sales fact table is used to analyze the business process on a daily basis, in an incremental manner. The diagram below shows the RETAIL-PRODUCT-SALES Data Mart, loaded daily. It holds fully summarized data, based on the following requirements, with reference to the possible dimensions:

a. Sales Revenue

b. Daily Net Profit

c. Daily Gross Profit

d. Total Quantity Sold

e. Total Orders, etc.

These metrics are calculated and loaded in an incremental manner for daily requirements. Arrange the data retailer-wise, based on location and market place. At the end of the year the target should contain 365 days of information, which makes it very convenient for the business people to use for decision-making at any point in time. The grain of the fact table is PRODUCT_ID with all possible dimensions.

Define a filter condition on DELIVERY_STATUS = 'Y', use the necessary aggregate functions (three levels of aggregation are needed here), and load data to the target table with surrogate references to the required dimensions.

Page 10: Retail Sales

TARGET TABLE 3

3. WH_PRODUCT_SALES_FACT

WH_RETAILER_ID        NUMBER(20)
WH_TIME_ID            NUMBER(20)
WH_REGION_ID          NUMBER(20)
WH_MARKETPLC_ID       NUMBER(20)
WH_AGENT_ID           NUMBER(20)
WH_CURRENCY_ID        NUMBER(20)
WH_SALES_PERSON_ID    NUMBER(20)
WH_PRODUCT_ID         NUMBER(20)
WH_LEGAL_ENTITY_ID    NUMBER(20)
WH_ORDER_METHOD_ID    NUMBER(20)
WH_SESSION_TYPE_ID    NUMBER(20)
WH_STORE_ID           NUMBER(20)
ORDER_DATE            DATE
TOTAL_REVENUE         NUMBER(20,2)
TOTAL_QTY             NUMBER(20)
TOTAL_ORDERS          NUMBER(20)
TOTAL_NET_PROFIT      NUMBER(20,2)
TOTAL_GROSS_PROFIT    NUMBER(20,2)
TRANS_DATE            DATE

Dimension modeling for RETAIL-PRODUCT-SALES:

Design the dimension model for the retail product sales star schema with the conformed dimensions used in the retail item sales star schema; together they appear as an integrated schema, also called a galaxy schema, shown below.

Page 11: Retail Sales

Figure 3: Dimension modeling diagram for Sales Data Mart for Level 2 requirements.

Explanation:

For Level 2, use three aggregations to find product-level sales information such as daily SALES_REVENUE, NET_PROFIT, GROSS_PROFIT, etc., based on dimension key references from the various dimension tables, with the filter condition delivery status = 'Y' (not null). From this data, business people can understand which kinds of products are bought most rapidly by the customers.

Summary:

From the above explanation, Level 1 is used for customer order management and Level 2 for product sales analysis.

Page 12: Retail Sales

Data flow diagram for RET-SALES-ETL-POST-STAGE

In this stage, load non-volatile data from the working database to the DWH in a periodic manner. Use periodic snapshot fact tables to load aggregated information for analysis based on different dimensions. This is useful for overall Sales Analysis, Traffic Analysis, and Market-Basket Analysis.

Market Basket Analysis:

Market-basket analysis is used to analyze which combinations of products are purchased most frequently by the customers, that is, the same combination of products selected by different customers in a particular period of time. For this requirement, design ETL logic to load aggregated data to the target system by combination of CUSTOMER_ID and PRODUCT_ID, based on the following measures over a time period:

1. TOTAL_AMOUNT_SOLD

2. TOTAL_QTY_SOLD

This fact table maintains semi-summarized data with the customers' last transaction date. In this requirement we can also derive CUSTOMER_TYPE based on the following condition:

If TOTAL_AMOUNT_SOLD >= 5000 dollars OR TOTAL_QTY_SOLD >= 150 units

then CUSTOMER_TYPE = 'prime', otherwise 'non-prime'.

Source Table:

1. WH_ITEM_FACT

Step 1: Select the required columns from the source; define a filter condition on the time id, for example WH_TIME_ID BETWEEN 1 AND 31 if January data is needed from the source.

Page 13: Retail Sales

Step 2: Define the necessary aggregate functions, calculated at the CUSTOMER_ID and PRODUCT_ID level with the necessary group-by columns, based on the last order date. For this requirement do not enable the group-by port for ORDER_DATE.

1. TOTAL_AMOUNT_SOLD = SUM(ITEM_FINAL_AMOUNT)

2. TOTAL_QTY_SOLD = SUM(ITEM_QTY)

Step 3: Define a filter condition to find the customer type with the following condition:

If TOTAL_AMOUNT_SOLD >= 5000 dollars OR TOTAL_QTY_SOLD >= 150 units

then CUSTOMER_TYPE = 'prime', otherwise 'non-prime'.
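The Step 3 classification is a simple threshold test; as a sketch (Python for illustration, thresholds taken from the condition above):

```python
def customer_type(total_amount_sold, total_qty_sold):
    """Flag a customer/product combination as prime or non-prime,
    using the 5000-dollar / 150-unit thresholds from the spec."""
    if total_amount_sold >= 5000 or total_qty_sold >= 150:
        return "prime"
    return "non-prime"
```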

Step 4: Provide a rank for each customer based on market place, and finally load the required data to the target table.

Page 14: Retail Sales

Target table 4

AGGR_CUST_SALES_FCT

WH_CUSTOMER_ID        NUMBER(20)
WH_PRODUCT_ID         NUMBER(20)
WH_AGENT_ID           NUMBER(20)
WH_RETAILER_ID        NUMBER(20)
WH_CURRENCY_ID        NUMBER(20)
WH_MARKETPLACE_ID     NUMBER(20)
WH_ORDER_METHOD_ID    NUMBER(20)
WH_REGION_ID          NUMBER(20)
LAST_ORDER_DATE       DATE
TOTAL_AMOUNT_SOLD     NUMBER(20,2)
TOTAL_QTY_SOLD        NUMBER(20,2)
CUSTOMER_TYPE         VARCHAR2(30)
TRANSDATE             DATE
RANK                  NUMBER(20)

Note: arrange the records based on rank, and set the rank limit equal to 1000 at the rank transformation stage.

Dataflow diagram for Product Sales Analysis:

In this dataflow diagram, design ETL logic to analyze the sales achieved across different categories of products in different market places, based on retailer and agent combined. For this requirement, choose the WH_PRODUCT_SALES_FCT fact table and load a bulk amount of data to the target table in a periodic manner with the necessary aggregate functions. Select the required periodic fact table WH_AGGR_PRODUCT_SALES_FCT from the target system.

Page 15: Retail Sales

Take references from the different dimension tables and aggregate the incoming data as shown below:

1. TOTAL_SALES_REVENUE = SUM(SALES_REVENUE)

2. TOTAL_NET_PROFIT = SUM(NET_PROFIT)

3. TOTAL_GROSS_PROFIT = SUM(GROSS_PROFIT)

4. TOTAL_QTY_SOLD = SUM(TOTAL_QTY)

Target Table 5

WH_AGGR_PRODUCT_SALES_FCT

WH_PRODUCT_ID         NUMBER(20)
WH_SALES_PERSON_ID    NUMBER(20)
WH_RETAILER_ID        NUMBER(20)
WH_AGENT_ID           NUMBER(20)
WH_MARKETPLACE_ID     NUMBER(20)
WH_CURRENCY_ID        NUMBER(20)
WH_REGION_ID          NUMBER(20)
WH_ORDER_METHOD_ID    NUMBER(20)
LAST_ORDER_DATE       DATE
TOTAL_SALES_REVENUE   NUMBER(20,2)
TOTAL_NET_PROFIT      NUMBER(20,2)
TOTAL_GROSS_PROFIT    NUMBER(20,2)
TOTAL_QTY             NUMBER(20)
RANK                  NUMBER(20)

Page 16: Retail Sales

Note: Do not use dimension tables as references when loading data into the aggregate fact tables, because the references were already taken when loading data into the working database.

Summary: target table 4 is used for Market-Basket Analysis, whereas target table 5 is used for Product Sales Analysis.

Divergence table for Traffic Analysis

Traffic Analysis is used to find the number of customers in different locations and what they are interested in selecting; here we can find the increase or decrease in the number of customers, revenue, net profit, gross profit, etc., based on the granularity of the time period. There is no need to take references from dimension tables; calculate measures based on week, month, quarter, and year, which means the grain of this fact table is the week. It is an independent table, also called a snapshot table or materialized view.

Source tables:

1. WH_ITEM_FACT

2. WH_PRODUCT_SALES_FCT

Step 1: Define a mapping variable $$WEEK with an initial value equal to 1.

Step 2: Select the required columns from the source table WH_ITEM_FACT; define a filter condition to select the required data, shown below:

TO_NUMBER(TO_CHAR(TO_DATE(WH_ITEM_FCT.ORDER_DATE, 'MM/DD/YYYY'), 'WW')) = $$WEEK

Page 17: Retail Sales

In the same way for the second source table, select the required columns from WH_PRODUCT_SALES_FCT; define a filter condition to select the required data, shown below:

TO_NUMBER(TO_CHAR(TO_DATE(WH_PRODUCT_SALES_FCT.ORDER_DATE, 'MM/DD/YYYY'), 'WW')) = $$WEEK

Step 3: Define the variable expression $$WEEK = $$WEEK + 1.
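Oracle's 'WW' format places week 1 at 1 January and counts fixed 7-day blocks (not ISO weeks), so the $$WEEK filter above can be mimicked as follows (Python for illustration; ORDER_DATE is assumed already parsed to a date):

```python
from datetime import date

def week_of_year(d):
    """Equivalent of TO_NUMBER(TO_CHAR(d, 'WW')): week 1 starts on
    1 January, and each week is a fixed 7-day block of the year."""
    return (d.timetuple().tm_yday - 1) // 7 + 1

def rows_for_week(rows, week):
    """Apply the $$WEEK filter to the ORDER_DATE column."""
    return [r for r in rows if week_of_year(r["ORDER_DATE"]) == week]
```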

Step 4: Design logic to derive WEEK, MONTH, QUARTER, and YEAR from the last ORDER_DATE.

Step 5: Aggregate the incoming data with the necessary aggregate functions, in three aggregation stages, to find the required results based on LOCATION and time period, shown below.

1. TOTAL_CUSTOMERS = COUNT(DISTINCT WH_ITEM_FCT.WH_CUSTOMER_ID) based on time period and location;

2. TOTAL_ORDERS = SUM(WH_PRODUCT_SALES_FCT.TOTAL_ORDERS) based on time period and location;

3. TOTAL_REVENUE = SUM(WH_PRODUCT_SALES_FCT.SALES_REVENUE) based on time period and location;

4. TOTAL_NET_PROFIT = SUM(WH_PRODUCT_SALES_FCT.NET_PROFIT) based on time period and location;

Page 18: Retail Sales

5. TOTAL_GROSS_PROFIT = SUM(WH_PRODUCT_SALES_FCT.GROSS_PROFIT) based on time period and location.
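The DISTINCT COUNT in aggregation 1 can be sketched with a set per (location, period) group (Python for illustration; using CITY and WEEK as the grouping keys is an assumption based on the target table 6 columns):

```python
from collections import defaultdict

def total_customers_by_period(item_rows):
    """DISTINCT COUNT of WH_CUSTOMER_ID grouped by (CITY, WEEK);
    a set per group deduplicates repeat customers automatically."""
    seen = defaultdict(set)
    for r in item_rows:
        seen[(r["CITY"], r["WEEK"])].add(r["WH_CUSTOMER_ID"])
    return {key: len(ids) for key, ids in seen.items()}
```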

Target table 6

WH_AGGR_SALES_FCT

YEAR                  NUMBER(20)
QTR                   NUMBER(20)
MONTH                 NUMBER(20)
WEEK                  NUMBER(20)
COUNTRY               VARCHAR2(30)
STATE                 VARCHAR2(30)
CITY                  VARCHAR2(30)
TOTAL_REVENUE         NUMBER(20,2)
TOTAL_NET_PROFIT      NUMBER(20,2)
TOTAL_GROSS_PROFIT    NUMBER(20,2)
TOTAL_ORDERS          NUMBER(20)
TOTAL_CUSTOMERS       NUMBER(20)
TRANSDATE             DATE

Summary: Transaction-grain fact tables are loaded on a daily basis to maintain very detailed data in a specified order, whereas periodic fact tables are loaded on a periodic basis and allow aggregate functions. Here four star schemas are constructed and modeled as a galaxy schema, meaning they are all integrated together. This is called the Sales Data Mart, used for customer order management, Sales Analysis, and Market-Basket Analysis.

Page 19: Retail Sales

Summary of the project:

1. Number of dimension tables: 14

2. Number of fact tables: 5

3. Number of divergence tables: 1

4. Number of lookup tables: 4

Use SCD Type-1 for following dimension tables,

a. WH_ORDER_DIM

b. WH_STORE_DIM

c. WH_REGION_DIM

d. WH_MARKET_PLACE_DIM

e. WH_ORDER_METHOD_DIM

f. WH_SUPPLIER_DIM

Use SCD Type-2 for following dimension tables,

a. WH_CUSTOMER_DIM

b. WH_RETAILER_DIM

c. WH_AGENT_DIM

d. WH_SALES_PERSON_DIM

e. WH_SESSION_TYPE_DIM

f. WH_LEGAL_ENTITY_DIM

Page 20: Retail Sales

The currency dimension WH_CURRENCY_DIM is a monster dimension; there is no need to construct ETL logic for this dimension, as it is used directly in the OLAP systems for reporting. For the time dimension WH_TIME_DIM, do not construct ETL logic either; just generate a sequence for this dimension.

Use incremental loading for the transaction-grain fact tables loaded on a daily basis, shown below:

a. WH_ITEM_FCT

b. WH_ORDER_FCT

c. WH_PRODUCTS_SALES_FCT

Use periodic loading for the periodic snapshot fact tables and divergence tables, shown below:

a. WH_AGGR_CUSTOMER_SALES_FCT

b. WH_AGGR_PRODUCT_SALES_FCT

c. WH_AGGR_SALES_FCT (divergence table).

Choose a parallel processing mechanism for staging database construction. Always load fresh data into staging on every execution, in order to replace the existing data in staging.