2010 Data Miner Survey

  • 8/7/2019 2010 Data Miner Survey

    1/13

    Karl Rexer, PhD

    President, Rexer Analytics

    www.RexerAnalytics.com

    2010 Data Miner Survey Highlights: The Views of 735 Data Miners

    Predictive Analytics World

    Washington, DC, October 2010

    2010 Rexer Analytics 2

    2010 Data Miner Survey: Overview

    Fourth annual survey

    47 questions

    10,000+ invitations emailed

    plus newsgroups, vendors, and snowball referrals

    Respondents: 735 data miners

    from 60 countries

    Note: Data from tool vendors was excluded from many analyses.

    [Pie chart: respondents by type. Categories: Corporate, Consultants, Academics, NGO / Govt, Vendors. Slice values as extracted: 33%, 31%, 12%, 5%, 19%.]

    Respondents by region:
      North America 45%: USA 40%, Canada 4%
      Europe 36%: Germany 7%, UK 5%, France 4%, Poland 4%
      Asia Pacific 12%: India 4%, Australia 3%, China 2%
      Central & South America 4%: Colombia 2%, Brazil 1%
      Middle East & Africa 3%: Israel 1%, Turkey 1%

    Fields Applying Data Mining

    Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)

      CRM / Marketing      41%
      Financial            29%
      Academic             25%
      Insurance            15%
      Telecommunications   15%
      Retail               14%
      Pharmaceutical       13%
      Technology           13%
      Medical              11%
      Manufacturing        10%
      Internet-based       10%
      Government           10%

    CRM / Marketing, Financial, and Academic are the most commonly reported fields. This has been consistent since the 2007 survey.

    Many data miners work in several fields.

    Data Mining Algorithms

    Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)

      Decision Trees            69%
      Regression                68%
      Cluster Analysis          60%
      Time Series               32%
      Neural Nets               31%
      Factor Analysis           27%
      Text Mining               26%
      Association Rules         25%
      Ensemble Models           22%
      Support Vector            21%
      Bayesian                  21%
      Anomaly Detection         16%
      Survival Analysis         14%
      Rule Induction            13%
      Social Network Analysis   12%
      Genetic Algorithms        11%
      Link Analysis              9%
      Uplift Modeling            9%
      MARS                       8%

    Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. This is very consistent, year to year.

    However, a wide variety of algorithms are being used.

    Usage by respondent type:
                        Corporate   Consultants   Academic   NGO / Govt
      Ensemble Models      21%          27%          20%         18%
      Uplift Modeling      10%          12%           4%          5%

    Text Mining

    About a third of data miners currently incorporate text mining into their analyses, and another third plan to.

    [Chart: Text Miners, Plan to Start Text Mining, No Plans to Conduct Text Mining. Values as extracted: 30%, 34%, 36%.]

    [Chart, axis 0% to 60%: "The focus of our text mining is to extract key themes (sentiment analysis)", "We use text fields as inputs / predictors in a larger model", "We use text mining as part of social network analyses". Values as extracted: 55%, 59%, 21%.]

    Software Used
      STATISTICA Text Miner     19%
      IBM SPSS Modeler          17%
      SAS Text Miner             9%
      IBM SPSS Text Analytics    7%
      Rapid Miner                6%
      Provalis Wordstat          2%
      GATE                       2%
      KXEN                       2%
      Oracle Text or ODM         1%
      Megaputer Text Analyst     1%
      Autonomy                   1%
      Other                     35%

    Computing Environments

    Question: What are the computing environments/platforms on which data mining/analytics occurs at your company/organization? (Check all that apply)

    Laptop PC (with data & processing locally): Overall 35%; Corporate 28%, Consultant 36%, Academic 46%, NGO/Govt 42%, Vendor 44%
    Laptop PC (with data & processing on server, mainframe or cloud): Overall 24%; Corporate 29%, Consultant 24%, Academic 15%, NGO/Govt 32%, Vendor 37%
    Desktop PC/Workstation (with data & processing locally): Overall 49%; Corporate 43%, Consultant 49%, Academic 58%, NGO/Govt 58%, Vendor 35%
    Desktop PC/Workstation (with data & processing on server, mainframe or cloud): Overall 39%; Corporate 48%, Consultant 36%, Academic 25%, NGO/Govt 47%, Vendor 39%
    Local Server: Overall 26%; Corporate 28%, Consultant 30%, Academic 19%, NGO/Govt 29%, Vendor 45%
    Centralized Mainframe/Server: Overall 18%; Corporate 20%, Consultant 16%, Academic 14%, NGO/Govt 32%, Vendor 26%
    Cloud Computing: Overall 7%; Corporate 5%, Consultant 10%, Academic 7%, NGO/Govt 3%, Vendor 14%

    A lot of data mining happens on desktop and laptop computers. Frequently the data and processing are local (not on servers, mainframe or cloud).

    Only a small minority of data mining is on the cloud.

    Analytic Capability & Data Quality

    Analytic capability: There's room to improve if we're going to "Compete on Analytics."

    Data quality: 48% rate it strong or very strong (same as last year); 16% rate it poor or very poor (13% last year).

    Data Quality Question: How do you rate the quality of data available for analysis at your company/organization?

    Analytic Capability Question: How do you rate the analytic capabilities of your company/organization?

    [Chart: rating distributions for the two questions. Segment values as extracted: 13%, 35%, 30%, 20% and 8%, 40%, 35%, 13%.]

    Overcoming Challenges: Best Practices

    Top challenges facing data miners:
      Dirty data: #1 challenge every year, 2007-2010
      Explaining data mining to others: always in the top 4 challenges, 2007-2010
      Difficult access to data: always in the top 3 challenges, 2007-2010

    This year survey respondents provided Best Practices for overcoming these challenges.
      E.g., Dirty Data: Use anomaly detection to flag records to put before subject matter experts.
      E.g., Dirty Data: All projects begin with low-level data reports showing counts of records, verification of keys (uniqueness, widows/orphans), and distributions of field contents. These reports are echoed back to the data content experts.
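The low-level data report described in the second Dirty Data example could look roughly like the sketch below. It is an illustration only, not the respondents' actual tooling. The function name `profile`, the sample `customers` and `orders` records, and the field names are all hypothetical. The sketch also assumes one reading of "widows/orphans": an orphan is a child record whose key has no matching parent, and a widow is a parent key that no child record refers to.

```python
from collections import Counter


def profile(parents, children, key, fk):
    """Build a low-level data report for a parent table and a child table.

    parents, children: lists of dicts, one dict per record (hypothetical layout).
    key: name of the parent key field; fk: child field that refers to that key.
    """
    key_counts = Counter(row[key] for row in parents)
    referenced = {row[fk] for row in children}

    # Distribution of values for every non-key field in the parent records.
    distributions = {}
    for row in parents:
        for field, value in row.items():
            if field != key:
                distributions.setdefault(field, Counter())[value] += 1

    return {
        # Counts of records in each table.
        "record_counts": {"parents": len(parents), "children": len(children)},
        # Key uniqueness check: keys that occur more than once.
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        # Orphans (assumed meaning): child rows whose key has no parent.
        "orphans": [row for row in children if row[fk] not in key_counts],
        # Widows (assumed meaning): parent keys with no child rows.
        "widows": sorted(k for k in key_counts if k not in referenced),
        "distributions": distributions,
    }


# Hypothetical sample data, invented for this illustration.
customers = [
    {"customer_id": 1, "region": "North America"},
    {"customer_id": 2, "region": "Europe"},
    {"customer_id": 2, "region": "Europe"},  # duplicate key
    {"customer_id": 3, "region": "Asia Pacific"},  # no orders refer to it
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 9},  # refers to a missing customer
]

report = profile(customers, orders, key="customer_id", fk="customer_id")
for name, value in report.items():
    print(f"{name}: {value}")
```

A report like this is meant to be read by the people who know the data, so the printed output matters more than the code: the duplicate keys, orphans, and widows are the items to echo back to the data content experts.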

    See the list of Best Practices at www.RexerAnalytics.com in early November.

    Data Mining Software

    Survey Questions:
      What data mining/analytic tools did you use in 2009? (Rate each as never, occasionally, or frequently)
      What one Data Mining software package do you use most frequently?

    [Chart panels: Overall, Corporate, Consultants, Academics, NGO / Govt]

    The average data miner reports using 4.6 software tools.

    R is used by the most data miners (43%).

    STATISTICA is the primary data mining tool chosen most often (18%).

    Satisfaction with Data Mining Tools

    Question: Please rate your overall satisfaction with your primary Data Mining software package.

    [Chart legend: 2010, 2009. Chart note: Sample size < 20.]

    STATISTICA received the highest satisfaction ratings. Consistent with the 2009 findings, R and SPSS Modeler users are also quite satisfied.

    About 80% of STATISTICA and R users also report that they are extremely likely to stay with these primary tools over the next 3 years. This is reported by only 42-45% of SAS, SPSS Statistics, and SAS-EM users, and only 18% of Weka users.

    Continued Use question (not graphed): What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?

    Data Mining and the Economy

    Question: How will the number of data mining projects your organization conducts in 2010 compare to what has been typical in the past few years?

    [Chart: Number of Data Mining Projects in 2010]

    There is a strong market for data mining: 73% of data miners foresee increases in the number of data mining projects.

    Offshoring of data mining is also increasing: it is reported by 14% of data miners this year (8% last year).

    Offshoring Question (not graphed): Has your company moved any data mining or other analytics to another country to take advantage of lower wages in the destination country?

    Future Trends in Data Mining

    Question: What do you envision as the primary future trends in data mining? (open-ended survey question)

    Number of respondents:
      Growth in Data Mining Adoption   50
      Text Mining                      32
      Social Network Analysis          32
      Automation                       26
      Cloud Computing                  15
      Data Visualization               15
      Tools Get Easier to Use          12
      Scaling to Bigger Data           11

    How to Get More Information

    Questions? Talk with me at PAW. Call or email me if you don't see me in the hallways.

    Copy of these slides: Available now

    2010 Data Miner Survey Summary Report (Free): Available in early November

    Available at PAW website or email me

    Best Practices for overcoming data mining challenges: Available in early November at www.RexerAnalytics.com

    Karl Rexer, PhD

    [email protected]

    617-233-8185