Fast, Accurate Creation of Data Validation Formats by End-User Developers. Christopher Scaffidi, Brad Myers, Mary Shaw. Carnegie Mellon University.
Date posted: 15-Jan-2016

Page 1: Fast, Accurate Creation of Data Validation Formats by End-User Developers

Christopher Scaffidi, Brad Myers, Mary Shaw

Carnegie Mellon University

Page 2: Contextual inquiry: What challenges do end users face?

Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hours each)

Background Toped Evaluation New Opportunities

Page 3: One person’s task: validate web forms, but he didn’t know JavaScript / regexps

Is the input valid? “EDSH 225”

Is the input questionable? “GATE 225”

Or is it obviously invalid? “412-555-5444”

Page 4: Hurricane Katrina “Person Locator” site: Many inputs unvalidated

Page 5: Spreadsheets contain lots of typos: inconsistent formatting & invalid strings

• Above: part of an actual spreadsheet on our university’s web site
• Plenty of invalid strings in users’ spreadsheets during contextual inquiry
• For thousands of other examples: EUSES Spreadsheet Corpus

Page 6: Needed: a usable mechanism for implementing validation

Page 7: Coming Up…

• Background
  – Formative pilot study
  – Related work
• Toped
• Evaluations
  – Usability
  – Expressiveness
• New opportunities

Page 8: Formative pilot study

• Motivation: Exploring the “gulf of execution” for data
  – User has to figure out how to map intentions to the features provided by a computer system
  – Poor “closeness of mapping” impedes system use
  – Before designing the system, probe the concepts and terminology familiar to users
• Asked 4 administrative assistants to verbally describe two kinds of data
  – American mailing addresses
  – University project numbers

Page 9: Formative pilot study

• Participants identified and named the parts of data
  – E.g.: street address, city, state, zip code
  – They hierarchically refined parts until sub-parts became small enough that they lacked names
• At that point, they described parts with constraints
  – Constraints were sometimes “soft”: not always true
  – They used adverbs of frequency to indicate softness
    • E.g.: “usually” or “sometimes”
• Implications
  – Users describe data in terms of constrained parts
  – Valid data sometimes violate certain constraints

Page 10: Alternate approaches: limited support for expressing constraints on structured strings

• Grammars based on sequences of characters
  – Context-free grammars (CFGs)
    • Grammex
    • Apple data detectors (CFGs + regexps)
  – Regular expressions (regexps)
    • SWYN regexp editor
• Lapis patterns: constrained structured strings
  – Intentionally designed to support outlier finding

@PhoneNumber is Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/ ignoring nothing
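For comparison, a plain regexp can express the same phone-number shape as the Lapis pattern above. The following is a minimal Python sketch, not code from Lapis or Toped; the names PHONE_NUMBER and is_phone_number are made up for illustration.

```python
import re

# Hypothetical regexp counterpart of the Lapis pattern above:
# three digits, a literal hyphen, then four digits.
PHONE_NUMBER = re.compile(r"\d\d\d-\d\d\d\d")


def is_phone_number(text: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return PHONE_NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["555-5444", "412-555-5444", "EDSH 225"]:
        print(candidate, is_phone_number(candidate))
```

A bare regexp like this only answers yes or no. It cannot flag an input as merely questionable, which is the gap that soft constraints are meant to fill.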

Page 11: Toped: A form fill-in UI to mediate between users and grammars

1. Name
2. Describe
3. Test
4. Save

Page 12: The system generates an augmented CFG from the format description

A part that almost always has 1-8 lowercase letters:

#WORD : #CHLIST : COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
#CHLIST : #CH | #CH #CHLIST
#CH : a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z

• More compact than a pure CFG
• More expressive than a pure CFG
  – Some constraints are impossible to represent as a CFG
  – Some constraints need to be soft
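The split between a hard grammar and a soft constraint can be sketched in Python as follows. This is an illustration under assumptions, not Toped's implementation: SoftConstraint, parses_as_chlist, WORD_CONSTRAINTS and check_word are hypothetical names, and the parser is hand-written for this one grammar.

```python
import string
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SoftConstraint:
    description: str
    holds: Callable[[str], bool]
    strength: int  # 0-100, the number in braces in the grammar, e.g. {90}


def parses_as_chlist(text: str) -> bool:
    """Hard part: #CHLIST is one or more #CH, and #CH is a lowercase letter."""
    return len(text) >= 1 and all(ch in string.ascii_lowercase for ch in text)


# Soft part: COUNT(#CH)>=1 && COUNT(#CH)<=8 with strength 90.
WORD_CONSTRAINTS = [
    SoftConstraint("has 1-8 letters", lambda s: 1 <= len(s) <= 8, 90),
]


def check_word(text: str) -> Optional[List[SoftConstraint]]:
    """Return None if the string does not parse at all;
    otherwise return the soft constraints that it violates."""
    if not parses_as_chlist(text):
        return None
    return [c for c in WORD_CONSTRAINTS if not c.holds(text)]


if __name__ == "__main__":
    for word in ["toped", "validation", "EDSH"]:
        result = check_word(word)
        if result is None:
            print(word, "-> does not parse")
        else:
            print(word, "-> violated:", [c.description for c in result])
```

In this sketch a string that parses but breaks the length constraint is still accepted as a word; it just carries a record of the violated constraint, which later scoring can use.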

Page 13: Testing strings against grammars

• Downgrade a parse if it violates constraints
  – Penalty = 1 - (strength of constraint)/100
  – Multiply penalties
  – Propagate penalties up parse tree
  – Choose best parse (i.e., parse with least penalties)
• Show error messages
  – Track violated constraints, concatenate into message
    • If parse fails completely, show portions of format description that were used to generate unsatisfied CFG productions
  – End-user development tools may offer user option of overriding some errors, depending on penalties
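The penalty arithmetic above can be worked through in a few lines of Python. This is a sketch under one assumption: the product of penalties is treated as a score between 0 and 1, so the "parse with least penalties" is read as the parse with the highest product. The function names are hypothetical.

```python
from math import prod
from typing import Dict, List


def penalty(strength: int) -> float:
    """Penalty factor for violating one constraint of the given strength (0-100)."""
    return 1 - strength / 100


def parse_score(violated_strengths: List[int]) -> float:
    """Multiply the penalty factors of all violated constraints.
    A parse with no violations keeps a score of 1."""
    return prod(penalty(s) for s in violated_strengths)


def best_parse(candidates: Dict[str, List[int]]) -> str:
    """Pick the candidate parse with the highest score (assumed reading of
    'least penalties'). Keys are parse names, values are violated strengths."""
    return max(candidates, key=lambda name: parse_score(candidates[name]))


if __name__ == "__main__":
    # Hypothetical candidate parses of one string.
    candidates = {"parse A": [90], "parse B": [50], "parse C": [90, 50]}
    for name, strengths in candidates.items():
        print(name, round(parse_score(strengths), 3))
    print("best:", best_parse(candidates))
```

Under this reading, violating a strength-90 constraint scales a parse's score by about 0.1, and a strength of 100 would scale it to 0, which behaves like a hard constraint.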

Page 14: Showing error messages after testing strings against the generated CFGs

Page 15: Usability: Does Toped help users to implement string validation?

• Between-subjects lab experiment
  – Direct comparison system: Lapis
  – (We also compare results to those of the SWYN study; see paper)
• Recruited 17 participants (9 Toped, 8 Lapis)
  – Approx. half were administrative assistants, approx. half were master’s students (mostly information systems), distributed roughly equally across tools
  – 1 participant misinterpreted instructions (=> 8 Toped & 8 Lapis)

Page 16: Usability: Does Toped help users to implement string validation?

• Study structure
  – Background questionnaire
  – Tutorial (30 min)
  – 3 tasks (20 min)
  – User satisfaction questionnaire
• Detail of a task:
  – Validate 1 kind of data
    • Phone numbers, mailing addresses, company names
  – User goal: For each kind, find typos in 25 strings
    • Randomly drawn from EUSES spreadsheet corpus
    • And we also retained 25 more strings per kind for further accuracy tests

Page 17: Usability: Users were nearly 2 times as fast and found 3 times as many typos

Measure                     Toped   Lapis   Relative improvement   Significant? (Mann-Whitney)
Tasks completed             2.79    1.75    60%                    p<0.01
Typos identified
  On 75 visible strings     16.50   5.75    187%                   p<0.01
  On all 150 strings        31.25   9.50    229%                   p<0.01
F1 accuracy measure
  On 75 visible strings     0.74    0.51    45%                    No
  On all 150 strings        0.68    0.46    48%                    No
User satisfaction           3.78    3.06    24%                    p=0.02

Toped also compares favorably to SWYN regexp editor – see paper
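The "relative improvement" column can be recomputed from the two means, and the F1 measure has a standard definition as the harmonic mean of precision and recall. The sketch below assumes that relative improvement means (Toped / Lapis - 1) as a percentage; the function names are hypothetical, and the slides do not report the precision and recall values behind the F1 scores.

```python
def relative_improvement(toped: float, lapis: float) -> int:
    """Percentage by which the Toped mean exceeds the Lapis mean, rounded."""
    return round((toped / lapis - 1) * 100)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Means copied from the table above: (Toped, Lapis).
ROWS = {
    "Tasks completed": (2.79, 1.75),
    "Typos, 75 visible strings": (16.50, 5.75),
    "Typos, all 150 strings": (31.25, 9.50),
    "F1, 75 visible strings": (0.74, 0.51),
    "F1, all 150 strings": (0.68, 0.46),
    "User satisfaction": (3.78, 3.06),
}

if __name__ == "__main__":
    for name, (toped, lapis) in ROWS.items():
        print(f"{name}: {relative_improvement(toped, lapis)}%")
```

Recomputed this way, every row agrees with the table except tasks completed, which comes out at 59% rather than 60%; the slide's figure was presumably computed from unrounded means.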

Page 18: Expressiveness: Does Toped provide adequate primitives for validating real data?

• Logged data typed by 4 users into browser (3 weeks)
  – For each text string, we recorded:
    • A label for the text field (e.g.: “Phone”)
    • A regexp summarizing the string (e.g.: \d\d\d-\d\d\d-\d\d\d\d)
• Examined data, wrote scripts to cluster strings
  – 94% of the 5897 strings were in 19 clusters
  – Each cluster had 1-2 formats
• Used Toped to create formats
  – Omitted 5 clusters that were for “general text”, usernames or passwords (so we could post format descriptions online)
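The logging and clustering steps above could look roughly like the following Python sketch. It is a guess at the approach, not the authors' scripts: the slides only show that digits were summarized as \d, so mapping letters to [A-Za-z] and keeping other characters literally are assumptions, as are the names summarize and count_formats.

```python
import string
from collections import Counter
from typing import Iterable, Tuple


def summarize(text: str) -> str:
    """Replace each character by a class so the raw string need not be stored.
    Digits become \\d (as in the slide); letters become [A-Za-z] (assumption);
    every other character is kept literally, unescaped."""
    parts = []
    for ch in text:
        if ch in string.digits:
            parts.append(r"\d")
        elif ch in string.ascii_letters:
            parts.append("[A-Za-z]")
        else:
            parts.append(ch)
    return "".join(parts)


def count_formats(entries: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each (field label, summary) pair occurs."""
    return Counter((label, summarize(text)) for label, text in entries)


if __name__ == "__main__":
    # Hypothetical log entries: (field label, typed string).
    log = [("Phone", "412-555-5444"), ("Phone", "412-555-0000"), ("Phone", "555-5444")]
    for (label, summary), n in count_formats(log).most_common():
        print(label, summary, n)
```

Grouping by label and summary gives a first cut at the clusters; strings with the same label but different summaries would correspond to the alternative formats within a cluster.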

Page 19: Expressiveness: Does Toped provide adequate primitives for validating real data?

• Overall, successful
  – We were able to create formats for each kind of data
  – The formats identified many probable typos
• Ideas for improvements
  – Ways to reuse constraints from format to format
  – Primitives for kinds of parts: numeric, word-like, …

Page 20: Data Description Editor Toped+: an improved editor

Page 21: Contributions and New Opportunities

• Toped: UI to mediate between users & grammars
  – Enables users to work faster & more effectively
  – Adequately expressive for validating many kinds of data
  – Provided a start for a new line of similar editor tools
• New Opportunities (aka “Future Work”)
  – Extending Toped+ to automatically reformat data [IUI’09]
  – Providing a repository for sharing formats (in progress)
  – Developing new ways to make use of the ability to identify strings that violate soft constraints

Page 22: Thank You…

• To Margaret Burnett, Brad Myers, Valentina Grigoreanu, Mary Beth Rosson, Mary Shaw and others in the EUSES Consortium for feedback over the years

• To NSF for funding

• To ISEUD 2009 for this opportunity to present

Page 23: Toped+: key improvements vs Toped in terms of Cognitive Dimensions

• Better closeness of mapping
  – Constraints “belong” to parts in all formats
• Higher juxtaposability
  – Easy to view & compare multiple formats
• Lower error-proneness
  – Helps prevent senseless combinations of constraints
• Lower viscosity
  – Drag-and-drop / copy-and-paste speeds up edits
• Improved progressive evaluation
  – User can test each part individually