MODIFY your way of thinking when it comes to anomalous data formats
Steve Simon, State Street Corporation
What we shall examine during this hour
• Data files of different formats.
• Examine ways and means of massaging the different formats into one ‘usable’ format.
• Examine ways of “manufacturing” records to facilitate generating end user reports.
A bit of history
• While working at a major airline a few years back, I encountered a problem: one of our databases contained flight information with a start date of the service and a planned termination date for the service.
Orig  Dest  Origin City  Dest City       Start Date  End Date
ALB   SDF   ALBANY NY    LOUISVILLE KY   20070122    20080403
ABQ   LBB   LUBBOCK TX   ALBUQUERQUE NM  20070101    20071031
• Our booking database on the other hand contained ‘daily records’ of the seating status of each class, for each flight segment (which could consist of one or more legs).
• This necessitated breaking the data shown above down into a “record per day” format.
Date Orig Dest The Key
20070101 ABQ LBB 20070101ABQLBB
20070102 ABQ LBB 20070102ABQLBB
20070103 ABQ LBB 20070103ABQLBB
20070104 ABQ LBB 20070104ABQLBB
With that key we could effect a join to the “Available Seating” database.
Available Seating
KEY             F  Y  M  N  Q  S
20070101ABQLBB  9  8  1  2  3  9
20070102ABQLBB  2  5  2  7  3  4
20070103ABQLBB  4  4  1  8  9  5
20070104ABQLBB  9  2  1  1  3  4
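To make the idea of “The Key” concrete, here is a minimal sketch in Python (not FOCUS) of building the concatenated key and matching daily records against the seating figures shown above. The function names `make_key` and `join_seating` are hypothetical, and the sketch assumes an inner join, that is, daily records without a seating entry are dropped.

```python
from typing import Dict, List, Tuple

def make_key(date: str, orig: str, dest: str) -> str:
    """Concatenate date (YYYYMMDD), origin and destination into one key."""
    return f"{date}{orig}{dest}"

# Daily schedule records (date, origin, destination) from the table above.
daily: List[Tuple[str, str, str]] = [
    ("20070101", "ABQ", "LBB"),
    ("20070102", "ABQ", "LBB"),
    ("20070103", "ABQ", "LBB"),
    ("20070104", "ABQ", "LBB"),
]

# Available seating by key; values are the classes F, Y, M, N, Q, S.
seating: Dict[str, Tuple[int, ...]] = {
    "20070101ABQLBB": (9, 8, 1, 2, 3, 9),
    "20070102ABQLBB": (2, 5, 2, 7, 3, 4),
    "20070103ABQLBB": (4, 4, 1, 8, 9, 5),
    "20070104ABQLBB": (9, 2, 1, 1, 3, 4),
}

def join_seating(daily_rows, seats):
    """Inner join (an assumption): keep daily rows whose key has seating data."""
    joined = []
    for date, orig, dest in daily_rows:
        key = make_key(date, orig, dest)
        if key in seats:
            joined.append((date, orig, dest) + seats[key])
    return joined

rows = join_seating(daily, seating)
```

Each joined row carries the date, the city codes and the six class counts, which is the shape the final report needs.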
The raw data
FILEDEF RAW DISK C:/ibi/apps/steve/AirlineSchedule.txt
FILEDEF AIRLINE DISK C:\ibi\apps\steve\AIRLINE.FOC
-RUN
CREATE FILE AIRLINE
-RUN
MODIFY FILE AIRLINE
FIXFORM DEPARTURE/3 DEPARTURECITY/50 ARRIVAL/3 ARRIVALCITY/50
FIXFORM STARTDATE/A8 ENDDATE/A8
MATCH WITH-UNIQUES DEPARTURE ARRIVAL
ON MATCH REJECT
ON NOMATCH INCLUDE
DATA ON RAW
END
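For readers who do not work in FOCUS, the load step can be sketched in Python. This is an illustration only: it assumes that the two FIXFORM lines together describe one 122-character record, and the names `LAYOUT`, `parse_record` and `load` are hypothetical. The duplicate handling mirrors ON MATCH REJECT / ON NOMATCH INCLUDE.

```python
from typing import Dict, Iterable, List, Tuple

# (field name, width) pairs taken from the FIXFORM statements.
LAYOUT: List[Tuple[str, int]] = [
    ("DEPARTURE", 3), ("DEPARTURECITY", 50),
    ("ARRIVAL", 3), ("ARRIVALCITY", 50),
    ("STARTDATE", 8), ("ENDDATE", 8),
]

def parse_record(line: str) -> Dict[str, str]:
    """Slice one fixed-width line into named fields, trimming padding."""
    record = {}
    pos = 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

def load(lines: Iterable[str]):
    """Keep the first record per (DEPARTURE, ARRIVAL); count later ones as rejected."""
    table: Dict[Tuple[str, str], Dict[str, str]] = {}
    rejected = 0
    for line in lines:
        rec = parse_record(line)
        key = (rec["DEPARTURE"], rec["ARRIVAL"])
        if key in table:
            rejected += 1      # ON MATCH REJECT
        else:
            table[key] = rec   # ON NOMATCH INCLUDE
    return table, rejected
```

Calling `load` with the lines of the raw file returns the in-memory table plus the number of rejected duplicates.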
Creating that “record per day”
FILEDEF AIRLINE1 DISK C:/ibi/apps/steve/AIRLINE.OUTTT
-RUN
MODIFY FILE AIRLINE
COMPUTE STARTDATE1/YYMD = 0;
COMPUTE ENDDATE1/YYMD = 0;
COMPUTE STARTCITY/A50=;
COMPUTE ENDCITY/A50=;
COMPUTE STARTCODE/A3=;
COMPUTE ENDCODE/A3=;
COMPUTE TEMPDATE/YYMD=0;
PERFORM EXTRACT1
FILEDEFs and variable initialization
We shall utilize the Scratch Pad Area (SPA)
Get the data from the database, record by record
CASE EXTRACT1
NEXT WITH-UNIQUES DEPARTURE ARRIVAL
ON NEXT ACTIVATE DEPARTURECITY ARRIVALCITY STARTDATE ENDDATE
ON NEXT COMPUTE STARTDATE1 = D.STARTDATE;
ON NEXT COMPUTE ENDDATE1 = D.ENDDATE;
ON NEXT COMPUTE STARTCITY = D.DEPARTURECITY;
ON NEXT COMPUTE ENDCITY = D.ARRIVALCITY;
ON NEXT COMPUTE STARTCODE = D.DEPARTURE;
ON NEXT COMPUTE ENDCODE = D.ARRIVAL;
ON NEXT COMPUTE TEMPDATE = D.STARTDATE;
ON NEXT PERFORM EXTRACT2
ON NONEXT GOTO EXIT
ENDCASE
Start date greater than end date?
Yes: quit case
No: write the record to file
CASE EXTRACT2
IF TEMPDATE GT ENDDATE1 THEN PERFORM EXTRACT1;
TYPE ON AIRLINE1
"<TEMPDATE><STARTCODE><ENDCODE><STARTCITY> <ENDCITY>"
COMPUTE TEMPDATE = TEMPDATE + 1;
GOTO EXTRACT2
ENDCASE
DATA
END
-RUN
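The two cases above amount to a loop that starts at the start date, emits one record, adds a day, and stops once the running date passes the end date. A minimal Python sketch of that logic follows; it is an illustration under my own naming (`parse_yyyymmdd` and `daily_records` are hypothetical) and not a translation of the MODIFY procedure.

```python
from datetime import date, datetime, timedelta
from typing import Iterator, Tuple

def parse_yyyymmdd(text: str) -> date:
    """Convert a YYYYMMDD string such as '20070101' to a date."""
    return datetime.strptime(text, "%Y%m%d").date()

def daily_records(orig: str, dest: str, start: str, end: str) -> Iterator[Tuple[str, str, str]]:
    """Yield one (date, orig, dest) record per day, end date included."""
    current = parse_yyyymmdd(start)
    last = parse_yyyymmdd(end)
    while current <= last:            # stop once the running date passes the end date
        yield current.strftime("%Y%m%d"), orig, dest
        current += timedelta(days=1)  # the equivalent of TEMPDATE + 1

# The ABQ-LBB service from the schedule table: 20070101 through 20071031.
records = list(daily_records("ABQ", "LBB", "20070101", "20071031"))
```

Working with real dates rather than integers means month and year boundaries are handled by the date arithmetic itself, which is what the YYMD format provides on the FOCUS side.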
The output
Where do we go from here?
The available seating table resides in a SQL Server database
Load this data into our SQL Server data repository
Create INSERT statements
FILEDEF ROUTECOUNT DISK C:/ibi/apps/steve/AIRLINE.OUTTT
-RUN
APP HOLD steve
TABLE FILE ROUTECOUNT
PRINT *
ON TABLE HOLD AS RECCOUNT
END
-SET &LLINES = &LINES;
-START111
-SET &FILENUM = 1;
-SET &CURRENTCTR =0;
-SET &FIRSTLINE = 'INSERT INTO DailyFlights(Date,Start,Destination,';
-SET &FIRSTLINE1 = 'StartCity,DestinationCity)';
-SET &SECONDLINE =;
-SET &THIRDLINE = ;
-SET &APOST = HEXBYT(39,'A1');
-SET &DATEE=;
-SET &STARTC=;
-SET &DESTC=;
-SET &SCDEST=;
-SET &ECDEST=;
Write the SQL “USE” statements
FILEDEF ROUTECOUNT1 DISK C:/ibi/apps/steve/AIRLINE.OUTTT
FILEDEF SCHEDULE DISK C:/ibi/apps/steve/AIRLINE.SQL1
-RUN
-WRITE SCHEDULE USE FUSE2007
-WRITE SCHEDULE GO
-WRITE SCHEDULE BEGIN TRANSACTION
Read all records & write to file
-REPEAT LOOPER FOR &I FROM 1 TO &LLINES STEP 1
-READ ROUTECOUNT1 &A.2 &DATEE.10 &C.1 &STARTC.3 &A.1 &DESTC.3 &B.1 &SCDEST.50,
- &CA.1 &ECDEST.50
-SET &SECONDLINE = ' VALUES (' || &APOST || &DATEE || &APOST;
-SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &STARTC || &APOST;
-SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &DESTC || &APOST;
-SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &SCDEST || &APOST;
-SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &ECDEST || &APOST;
-SET &SECONDLINE = &SECONDLINE || ');';
-WRITE SCHEDULE &FIRSTLINE
-WRITE SCHEDULE &FIRSTLINE1
-WRITE SCHEDULE &SECONDLINE
-LOOPER
-WRITE SCHEDULE COMMIT TRANSACTION
The Insert Statements
USE FUSE2007
GO
BEGIN TRANSACTION
INSERT INTO DailyFlights(Date,Start,Destination,StartCity,DestinationCity) VALUES ('2006/11/30','ABE','MHT','ALLENTOWN, PA','MANCHESTER, NH');
INSERT INTO DailyFlights(Date,Start,Destination,StartCity,DestinationCity) VALUES ('2006/12/01','ABE','MHT','ALLENTOWN, PA','MANCHESTER, NH');
…..
COMMIT TRANSACTION
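The script-building step can also be sketched in Python, for comparison with the Dialogue Manager loop. This is an illustration, not the code used in the session: `sql_literal` and `build_script` are hypothetical names, and doubling embedded apostrophes is an extra safeguard of mine that the original loop does not perform (a city name containing an apostrophe would otherwise break the statement).

```python
from typing import Iterable, List, Tuple

# (date, start code, destination code, start city, destination city)
Row = Tuple[str, str, str, str, str]

def sql_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded apostrophes."""
    return "'" + value.replace("'", "''") + "'"

def build_script(rows: Iterable[Row],
                 database: str = "FUSE2007",
                 table: str = "DailyFlights") -> List[str]:
    """Return the lines of one transaction with an INSERT per row."""
    columns = "Date,Start,Destination,StartCity,DestinationCity"
    lines = [f"USE {database}", "GO", "BEGIN TRANSACTION"]
    for row in rows:
        values = ",".join(sql_literal(v) for v in row)
        lines.append(f"INSERT INTO {table}({columns}) VALUES ({values});")
    lines.append("COMMIT TRANSACTION")
    return lines

script = build_script([
    ("2006/11/30", "ABE", "MHT", "ALLENTOWN, PA", "MANCHESTER, NH"),
    ("2006/12/01", "ABE", "MHT", "ALLENTOWN, PA", "MANCHESTER, NH"),
])
```

Writing `"\n".join(script)` to a file gives a script of the same shape as the one shown above.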
The 50 million foot view
Raw Data -> Sequential File -> SQL Statements -> FileSystemWatcher & SSIS Load Package
cd C:\Program Files\Microsoft SQL Server\90\DTS\Binn
DTExec /f "C:\AirlineScheduleLoad\AirlineSchedule\AirlineSchedule\bin\LoadSchedule.dtsx"
Query created by join
JOIN DATEM AND START AND DESTINATION IN DAILYFLIGHTS TO DATEE AND START AND DESTINATION IN BOOKINGS AS J1
-RUN
DEFINE FILE DAILYFLIGHTS
CITYPAIR/A10 = START || '-' || DESTINATION;
END
TABLE FILE DAILYFLIGHTS
PRINT DATEM AS 'Date'
CITYPAIR AS 'City Pair'
STARTCITY AS 'Origin'
DESTINATIONCITY AS 'Destination'
FCLASS AS 'F'
YCLASS AS 'Y'
MCLASS AS 'M'
NCLASS AS 'N'
QCLASS AS 'Q'
SCLASS AS 'S'
BY START NOPRINT
BY DESTINATION NOPRINT
BY DATEM NOPRINT
ON TABLE SUBHEAD
"Orange Free State Airlines"
"Flight Schedule"
…..
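For those who think in SQL, the three-field join and the CITYPAIR define can be approximated as follows. This is a sketch under several assumptions: SQLite stands in for SQL Server, the Bookings column names are borrowed from the FOCUS field names (the real table layout is not shown here), the join is treated as an inner join, and the city columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the SQL Server database
conn.executescript("""
CREATE TABLE DailyFlights ("Date" TEXT, "Start" TEXT, "Destination" TEXT);
CREATE TABLE Bookings ("Datee" TEXT, "Start" TEXT, "Destination" TEXT,
                       FClass INTEGER, YClass INTEGER, MClass INTEGER,
                       NClass INTEGER, QClass INTEGER, SClass INTEGER);
""")
conn.executemany("INSERT INTO DailyFlights VALUES (?, ?, ?)", [
    ("20070102", "ABQ", "LBB"),
    ("20070101", "ABQ", "LBB"),
])
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("20070101", "ABQ", "LBB", 9, 8, 1, 2, 3, 9),
    ("20070102", "ABQ", "LBB", 2, 5, 2, 7, 3, 4),
])

# Join on date, start and destination; build the city pair; sort as the BY fields do.
QUERY = """
SELECT d."Date",
       d."Start" || '-' || d."Destination" AS CityPair,
       b.FClass, b.YClass, b.MClass, b.NClass, b.QClass, b.SClass
FROM DailyFlights AS d
JOIN Bookings AS b
  ON  b."Datee" = d."Date"
  AND b."Start" = d."Start"
  AND b."Destination" = d."Destination"
ORDER BY d."Start", d."Destination", d."Date"
"""
report = conn.execute(QUERY).fetchall()
```

Each row of `report` holds the date, the city pair and the six class counts, sorted by start, destination and date.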
During this hour we
• Examined data files of different formats.
• Examined ways and means of massaging the different formats into one ‘usable’ format.
• Examined ways of “manufacturing” records to facilitate generating end user reports.
• Verified that the data was correct.
During this hour we
• Saw that there were many “different” ways to modify anomalous data into the format of your choice, to produce the reports that you require.
Which really goes to show that you can...
MODIFY your way of thinking when it comes to anomalous data formats
Steve Simon, State Street Corporation
PowerPoint presentation & code samples may be found at: http://cid-4c765fc825912e4d.skydrive.live.com/browse.aspx/Public
or by email [email protected]