Master the Essentials of Data Prep and Data Blending - Inspire Europe 2017
ESSENTIALS OF DATA PREP & DATA BLENDING
Presented by Ben Gomez
FORWARD-LOOKING STATEMENTS
This presentation includes “forward-looking statements” within the meaning of the Private Securities Litigation Reform Act of 1995. These forward-looking statements may be identified by the use of terminology such as “believe,” “may,” “will,” “intend,” “expect,” “plan,” “anticipate,” “estimate,” “potential,” or “continue,” or other comparable terminology. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product availability, growth and financial metrics and any statements regarding product roadmaps, strategies, plans or use cases. Although Alteryx believes that the expectations reflected in any of these forward-looking statements are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect, and actual results or outcomes could differ materially from those projected or assumed in the forward-looking statements. Alteryx’s future financial condition and results of operations, as well as any forward-looking statements, are subject to risks and uncertainties, including but not limited to the factors set forth in Alteryx’s press releases, public statements and/or filings with the Securities and Exchange Commission, especially the “Risk Factors” sections of Alteryx’s Quarterly Report on Form 10-Q. These documents and others containing important disclosures are available at www.sec.gov or in the “Investors” section of Alteryx’s website at www.alteryx.com. All forward-looking statements are made as of the date of this presentation and Alteryx assumes no obligation to update any such forward-looking statements.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are only intended to outline Alteryx’s general product direction. They are intended for information purposes only, and may not be incorporated into any contract. This is not a commitment to deliver any material, code, or functionality (which may not be released on time or at all) and customers should not rely upon this presentation or any such statements to make purchasing decisions. The development, release, and timing of any features or functionality described for Alteryx’s products remains at the sole discretion of Alteryx.
AGENDA
Handling Input Data
• Caching
• Sampling
• Input Macro
Building Workflows Efficiently
• Evaluate your data
• Document for clarity
• Simplify the process
Important Details
• Testing your work
PRESENTER
To watch a recording of this session from Inspire Europe 2017, visit
alteryx.com/inspire-europe-2017-tracks
HANDLING INPUT DATA
Best Practice – Use caching when you don’t need live data
• Currently available only for relational database inputs
• Caching – 53 seconds
• Caching – 1.9 seconds!
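In Alteryx, caching is switched on in the tool itself. Outside Alteryx, the same trade-off (fast but possibly stale data) can be sketched in Python with only the standard library. The function name `cached_query` and the JSON cache file are assumptions for illustration, not an Alteryx API, and the in-memory SQLite table stands in for a real relational source.

```python
import json
import os
import sqlite3
import tempfile

def cached_query(conn, sql, cache_path):
    """Return the rows for `sql`, reusing a local cache file if one exists.

    Sketch only: the cache never expires, so delete the file whenever
    you need live data again.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            # JSON stores rows as lists; turn them back into tuples
            return [tuple(row) for row in json.load(f)]
    rows = conn.execute(sql).fetchall()
    with open(cache_path, "w") as f:
        json.dump(rows, f)
    return rows

# Usage with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

query = "SELECT id, amount FROM sales ORDER BY id"
cache_file = os.path.join(tempfile.mkdtemp(), "sales_cache.json")
first = cached_query(conn, query, cache_file)      # runs the query, writes the cache
conn.execute("INSERT INTO sales VALUES (3, 5.0)")  # the source changes afterwards
second = cached_query(conn, query, cache_file)     # served from the cache: still 2 rows
```

The second call never touches the database, which is exactly why caching is only appropriate when you do not need live data.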
HANDLING INPUT DATA
Best Practice - Sample data to speed up processing during development
HANDLING INPUT DATA
Best Practice - Sample your data
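• Use the database's own sampling features, so only a subset of rows leaves the database
    • PostgreSQL (roughly 5% of the table's data pages):
      SELECT * FROM table TABLESAMPLE SYSTEM (5)
    • SQL Server (exactly 5,000 random rows, but the whole table is scanned and sorted):
      SELECT TOP 5000 * FROM table ORDER BY NEWID()
    • Oracle (roughly 5% of rows):
      SELECT * FROM table SAMPLE(5)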
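A minimal sketch of switching between a sampled query during development and the full query in production, using SQLite from Python's standard library. SQLite has no `TABLESAMPLE` clause, so this assumes the random-sort-plus-limit pattern from the SQL Server example; the `build_query` helper and its `dev_mode` flag are hypothetical.

```python
import sqlite3

def build_query(table, dev_mode, sample_rows=100):
    """Return a sampled query while developing and a full query otherwise.

    `table` must come from trusted code: it is interpolated into the SQL.
    The sample is a random sort plus a row limit (SQLite has no TABLESAMPLE).
    """
    if dev_mode:
        return f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT {int(sample_rows)}"
    return f"SELECT * FROM {table}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(i * 1.5,) for i in range(10_000)])

sample = conn.execute(build_query("orders", dev_mode=True)).fetchall()
full = conn.execute(build_query("orders", dev_mode=False)).fetchall()
```

Keeping the switch in one place makes it easy to remember to turn sampling off before the final run.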
DEMOS
• Input Macros
• Data Profiling
• Document for clarity
• Simplify the Process
BUILDING WORKFLOWS EFFICIENTLY
Best Practice – Evaluate your data
• The Browse tool can help identify hidden data problems that produce invalid results and slow you down:
• Duplicate records
• Missing values
• Unexpected characters
• Invalid values or ranges
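The Browse tool surfaces these problems interactively. As a rough code equivalent, the four checks above can be sketched as a small profiling function over a list of dict records. The function name, the valid age range and the choice of `str.isprintable()` as the test for unexpected characters are all assumptions made for illustration.

```python
from collections import Counter

def profile(records, numeric_field, valid_range, text_field):
    """Count a few common hidden data problems in a list of dict records."""
    # Duplicate records: identical field/value pairs seen more than once
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    # Missing values: None or empty string in any field
    missing = sum(1 for r in records for v in r.values() if v is None or v == "")
    # Invalid values or ranges: numbers outside the expected bounds
    low, high = valid_range
    out_of_range = sum(
        1 for r in records
        if r.get(numeric_field) is not None and not low <= r[numeric_field] <= high
    )
    # Unexpected characters: tabs, line breaks, non-breaking spaces and so on
    unexpected_chars = sum(
        1 for r in records
        if isinstance(r.get(text_field), str) and not r[text_field].isprintable()
    )
    return {"duplicates": duplicates, "missing": missing,
            "out_of_range": out_of_range, "unexpected_chars": unexpected_chars}

records = [
    {"id": 1, "name": "Alice", "age": 34},
    {"id": 1, "name": "Alice", "age": 34},    # duplicate record
    {"id": 2, "name": "Bob\t", "age": None},  # stray tab, missing age
    {"id": 3, "name": "Carol", "age": 212},   # impossible age
]
report = profile(records, "age", (0, 120), "name")
```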
HANDLING INPUT DATA
Best Practice – Utilize Input Macros for frequently-used sources
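An Input Macro packages a frequently used source so every workflow reads it the same way. The code analogue is a single reusable loader function; the sketch below assumes a SQLite `orders` table, and the `load_orders` name and `STANDARD_COLUMNS` tuple are hypothetical.

```python
import sqlite3

STANDARD_COLUMNS = ("order_id", "region", "amount")  # hypothetical shared schema

def load_orders(conn, region, columns=STANDARD_COLUMNS):
    """Reusable input: one place that knows the source table, the standard
    column list and the filter. Column names come from code, never from
    user input; the region value is passed as a bound parameter."""
    sql = f"SELECT {', '.join(columns)} FROM orders WHERE region = ? ORDER BY order_id"
    return conn.execute(sql, (region,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "EU", 10.0, "a"), (2, "US", 4.0, "b"), (3, "EU", 7.5, "c")],
)
eu_orders = load_orders(conn, "EU")
```

If the source moves or gains a column, only `load_orders` changes, not every workflow that uses it.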
BUILDING WORKFLOWS EFFICIENTLY
Best Practice – Document for clarity
BUILDING WORKFLOWS EFFICIENTLY
Best Practice – Use sorts sparingly; sorting data is expensive.
When data is joined on fields, both inputs are sorted in full, unless the data was already sorted on those fields and no later operation has invalidated that order.
The Unique tool also performs a sort, so watch for extra Unique tools.
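The slides do not describe the Join tool's internals beyond the statement above. To show why pre-sorted inputs save work, here is a generic sort-merge join sketched in Python; it is a textbook technique, not Alteryx's implementation, and it counts how many sorts it actually performs.

```python
def is_sorted(rows, key):
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))

def merge_join(left, right, key):
    """Inner sort-merge join of two lists of dicts on `key`.

    Returns (joined_rows, sorts_performed). A side is only sorted when it
    is not already in key order.
    """
    sorts = 0
    if not is_sorted(left, key):
        left = sorted(left, key=lambda r: r[key])
        sorts += 1
    if not is_sorted(right, key):
        right = sorted(right, key=lambda r: r[key])
        sorts += 1
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ...and pair every left-hand row with that key against it
            while i < len(left) and left[i][key] == lk:
                joined.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return joined, sorts

customers = [{"id": 2, "name": "Bob"}, {"id": 1, "name": "Alice"}]  # unsorted
orders = [{"id": 1, "total": 10}, {"id": 2, "total": 20}, {"id": 3, "total": 30}]

joined, sorts = merge_join(customers, orders, "id")     # sorts the left side once
rejoined, resorts = merge_join(joined, orders, "id")    # already in order: no sort
```

The second join reuses the order produced by the first, which is the saving the slide refers to; inserting an operation that reorders rows between the two joins would force the sort again.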
BUILDING WORKFLOWS EFFICIENTLY
Best Practice – Simplify the Process
Only keep the data you are using
• Don’t keep fields that you don’t need
• Don’t create spatial objects until you’re ready to use them; discard them once you are done
• Don’t keep duplicate fields
• Set data aside and rejoin it later
• Best Practice: Add a record id field early that can be used to rejoin records later
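A minimal sketch of the set-aside-and-rejoin pattern in plain Python: number the records first, carry only the fields the calculation needs, then rejoin the rest on the record ID. The field names and the 20% uplift are illustrative assumptions.

```python
def add_record_id(rows):
    """Number rows 1..n so they can be matched up again later."""
    return [{"record_id": n, **row} for n, row in enumerate(rows, start=1)]

rows = add_record_id([
    {"name": "Alice", "comment": "long free-text comment", "amount": 10.0},
    {"name": "Bob", "comment": "another long comment", "amount": -5.0},
    {"name": "Carol", "comment": "yet another comment", "amount": 2.5},
])

# Set aside the wide fields the calculation does not need
set_aside = {r["record_id"]: {"name": r["name"], "comment": r["comment"]} for r in rows}
slim = [{"record_id": r["record_id"], "amount": r["amount"]} for r in rows]

# Process only the slim data (the 20% uplift is purely illustrative)
processed = [
    {**r, "amount_with_tax": round(r["amount"] * 1.2, 2)}
    for r in slim
    if r["amount"] > 0
]

# Rejoin the set-aside fields on the record ID
result = [{**r, **set_aside[r["record_id"]]} for r in processed]
```

Because the ID is added before any filtering, rows that were dropped along the way (Bob here) simply do not come back in the rejoin.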
BUILDING WORKFLOWS EFFICIENTLY
Best Practice – Simplify the Process
Separate formulas with distinctly different tasks
• One task per formula, unless the calculations are closely related or build on each other.
• Easier to understand the process
• Easier to debug
• Easier to split out parts of the data
• Easier to copy and paste specific functionality.
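The same principle carries over to code: one small function per distinct task, composed at the end. This Python sketch is only an analogy for separate formulas; the task names, the discount logic and the size thresholds are hypothetical.

```python
def clean_name(name):
    """Task 1: collapse repeated whitespace and fix capitalisation."""
    return " ".join(name.split()).title()

def net_amount(gross, discount_rate):
    """Task 2: apply a discount, rounded to cents."""
    return round(gross * (1 - discount_rate), 2)

def size_band(amount):
    """Task 3: bucket an amount (thresholds are illustrative)."""
    if amount >= 1000:
        return "large"
    if amount >= 100:
        return "medium"
    return "small"

def transform(row):
    # net_amount feeds size_band, so those two build on each other
    net = net_amount(row["gross"], row["discount"])
    return {"name": clean_name(row["name"]), "net": net, "band": size_band(net)}

output = transform({"name": "  ada   LOVELACE ", "gross": 200.0, "discount": 0.1})
```

Each task can be tested, debugged or copied on its own, which mirrors the benefits listed on the slide.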
IMPORTANT DETAILS
Best Practice – Build in tests to make sure your work is correct
• Create tests for your assumptions:
• Number of records
• Results of calculations
• Presence of duplicates
• Eliminates the need for visual verification
• Prevents unnoticed errors down the road
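A minimal sketch of these three kinds of test in Python. The `check` and `validate` helpers are hypothetical; they raise an error explicitly rather than using `assert`, because `assert` statements are removed when Python runs with `-O`.

```python
def check(condition, message):
    """Fail loudly; unlike `assert`, this is not stripped by `python -O`."""
    if not condition:
        raise AssertionError(message)

def validate(input_rows, output_rows, key, amount_field="amount"):
    # Number of records: e.g. a lookup join should neither add nor drop rows
    check(len(output_rows) == len(input_rows),
          f"expected {len(input_rows)} records, got {len(output_rows)}")
    # Results of calculations: the total must survive the transformation
    total_in = sum(r[amount_field] for r in input_rows)
    total_out = sum(r[amount_field] for r in output_rows)
    check(abs(total_in - total_out) < 1e-9, f"total changed: {total_in} -> {total_out}")
    # Duplicates or not: the key must stay unique
    keys = [r[key] for r in output_rows]
    check(len(keys) == len(set(keys)), f"duplicate values found in '{key}'")

source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]
enriched = [{**r, "region": "EU"} for r in source]
validate(source, enriched, "id")  # passes silently

# A join that fans out (duplicate lookup keys) is caught immediately
fanned_out = enriched + [enriched[0]]
try:
    validate(source, fanned_out, "id")
    caught = None
except AssertionError as err:
    caught = str(err)
```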
IMPORTANT DETAILS
Best Practice - Limit data movement
• Where is your data?
• Where is your processing?
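A sketch of the difference between moving the data to the processing and moving the processing to the data. An in-memory SQLite database involves no network, so the number of rows returned is used here only as a stand-in for how much data would travel from a real database server.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU" if i % 2 else "US", 1.0) for i in range(10_000)],
)

# Option A: move every row to the client, then aggregate there
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = defaultdict(float)
for region, amount in rows:
    client_totals[region] += amount

# Option B: aggregate where the data lives; only the results move
db_rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
db_totals = dict(db_rows)
```

Both options give the same totals, but option B returns 2 rows instead of 10,000.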
alteryx.com/trial
Ready to bring these incredible and tangible benefits to your organization?
Download a FREE Trial of Alteryx and start making your data work for you, instead of you working for your data.