Cultivating Research Taste
(illustrated via a journey in Program Synthesis research)
Programming Languages Mentoring Workshop 2015
Sumit Gulwani, Microsoft Research, Redmond
Slide 2
Dimensions in Research
Problem Definition
- Advisor's interest and funding, internship, course project
- Intersection with your collaborators' interests
- Next logical advance in your current portfolio
- Talk to potential customers, market surveys
Solution Strategy
- Develop new techniques vs. apply existing techniques
- Cross-disciplinary
Impact
- Paper, tool, awards, media
- Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!
Slide 3
Program Synthesis
Goal: Synthesize a program in the underlying domain-specific language (DSL) from user intent using some search algorithm.
An old problem, but more significant today:
- Diverse computational platforms & programming languages.
- Enabling technology: better algorithms & faster machines.
Synthesis can revolutionize end-user programming if we:
- target the right set of application domains, such as data manipulation;
- allow the right intent specification mechanism: examples, natural language;
- tame the huge search space for real-time interaction: domain-specific search algorithms.
PPDP 2010 [Invited talk paper]: Dimensions in Program Synthesis.
Slide 4
Graduation Advice (2005)
George Necula, UC-Berkeley: "You will have too many problems to solve; you can't pursue them all. Make thoughtful choices."
Slide 5
From Program Verification to Program Synthesis
Given a statement s, a precondition P, and a postcondition Q:
- Forward dataflow analysis: from s and P, compute Q.
- Backward dataflow analysis: from s and Q, compute P.
- Program synthesis: from P and Q, compute s.
Nebojsa Jojic, MSR Redmond (2005)
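The three directions on this slide can be made concrete in a toy setting. The sketch below is a hypothetical illustration, not the analyses from the talk: it assumes a single integer variable, predicates given as finite sets of values, and a small hand-picked statement library (`STATEMENTS` and `DOMAIN` are illustrative names). Forward analysis computes the image of P, backward analysis computes the pre-image of Q within a bounded domain, and synthesis searches the library for a statement that maps every state in P into Q.

```python
# Toy model: a state is a single integer x, a predicate is a finite set of
# states, and a statement is a Python function from state to state.
DOMAIN = range(-8, 9)  # finite universe used for backward analysis

# Hypothetical statement library for the synthesis direction.
STATEMENTS = {
    "x := x + 1": lambda x: x + 1,
    "x := x - 1": lambda x: x - 1,
    "x := 2 * x": lambda x: 2 * x,
    "x := -x": lambda x: -x,
}


def forward(s, P):
    """Forward analysis: from s and P, compute Q (the exact image of P)."""
    return {s(x) for x in P}


def backward(s, Q):
    """Backward analysis: from s and Q, compute P (the pre-image within DOMAIN)."""
    return {x for x in DOMAIN if s(x) in Q}


def synthesize(P, Q):
    """Synthesis: from P and Q, list every library statement s with image(s, P) ⊆ Q."""
    return [name for name, s in STATEMENTS.items() if forward(s, P) <= Q]
```

With P = {1, 2, 3} and Q = {2, 4, 6}, `synthesize` returns `['x := 2 * x']`. Practical synthesizers replace this naive enumeration with far more scalable search.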
Slide 6
Synthesis using SAT/SMT Constraint Solvers
Venkie, MSR Bangalore (2006): "Try using SAT solvers, which have been engineered to solve huge instances."
Program synthesis is an extremely hard combinatorial search task!
Slide 7
Initial Results in Program Synthesis
Approach: Reduce synthesis to solving SAT/SMT constraints.
Results: Managed to synthesize a wide variety of programs from logic specs:
- Bit-vector algorithms (e.g., turn off the rightmost one bit) [PLDI 2011, ICSE 2010]
- SIMD algorithms (e.g., vectorization of CountIf) [PPoPP 2013]
- Undergraduate textbook algorithms (e.g., sorting, dynamic programming) [POPL 2010]
- Program inverses (e.g., deserializers from serializers) [PLDI 2011]
- Graph algorithms (e.g., bipartiteness check) [OOPSLA 2010]
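To give a feel for the bit-vector example, here is a minimal sketch that recovers a program for "turn off the rightmost one bit". It deliberately swaps in brute-force enumeration plus exhaustive checking over 8-bit values for the SAT/SMT encoding used in the cited work, and the search space `x OP u(x)` (built from the hypothetical tables `UNARY` and `BINARY`) is an assumption chosen to keep the example tiny.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def spec(x):
    """Reference behavior: clear the lowest set bit of x (0 stays 0)."""
    for i in range(WIDTH):
        if x & (1 << i):
            return x & ~(1 << i)
    return 0


# Hypothetical search space: expressions of the form  x OP u(x).
UNARY = {
    "x + 1": lambda x: (x + 1) & MASK,
    "x - 1": lambda x: (x - 1) & MASK,
    "-x": lambda x: (-x) & MASK,
    "~x": lambda x: (~x) & MASK,
}
BINARY = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def synthesize():
    """Return the first candidate that agrees with spec on all 2**WIDTH inputs."""
    for op_name, op in BINARY.items():
        for u_name, u in UNARY.items():
            if all(op(x, u(x)) == spec(x) for x in range(1 << WIDTH)):
                return f"x {op_name} ({u_name})"
    return None
```

The search returns `x & (x - 1)`, the classic idiom. Exhaustive checking only works here because the bit width is small; a constraint solver instead reasons symbolically about all inputs at once.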
Slide 8
Mid-life Awakening (2010)
Software developers vs. end users: end users outnumber developers by two orders of magnitude.
Slide 14
Flash Fill: Domain-Specific Language
  Guarded Expression   G := Switch((b1, e1), ..., (bn, en))
  Boolean Expression   b := c1 ∧ ... ∧ cn
  Atomic Predicate     c := Match(vi, r, k)
  Trace Expression     e := Concatenate(f1, ..., fn)
  Atomic Expression    f := s                    // Constant String
                          | SubStr(vi, p1, p2)
                          | Loop(λw: e)
  Index Expression     p := k                    // Constant Integer
                          | Pos(r1, r2, k)       // k-th position in string whose left/right side matches with r1/r2
  Regular Expression   r := TokenSequence(T1, ..., Tn)
POPL 2011: Automating String Processing in Spreadsheets using Input-Output Examples; Sumit Gulwani.
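One way to read this grammar is as a set of data types with an evaluator. The sketch below covers Switch, conjunctions of Match predicates, Concatenate, constant strings, and SubStr. It assumes constant integer positions (Pos and Loop are omitted), approximates a token sequence by a Python regex, reads Match(vi, r, k) as "vi contains at least k matches of r", and indexes input columns from 0. All class and function names are hypothetical, not Flash Fill's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class ConstStr:        # f := s
    s: str


@dataclass
class SubStr:          # f := SubStr(v_i, p1, p2), here with constant p1, p2
    i: int             # 0-based index of input column v_i
    p1: int
    p2: int


@dataclass
class Concatenate:     # e := Concatenate(f1, ..., fn)
    fs: list


@dataclass
class Match:           # c := Match(v_i, r, k): v_i has at least k matches of r
    i: int
    r: str             # Python regex standing in for TokenSequence(T1, ..., Tn)
    k: int


@dataclass
class Switch:          # G := Switch((b1, e1), ..., (bn, en))
    cases: list        # pairs (b, e); b is a list of Match read as c1 ∧ ... ∧ cn


def eval_f(f, vs):
    if isinstance(f, ConstStr):
        return f.s
    return vs[f.i][f.p1:f.p2]


def eval_e(e, vs):
    return "".join(eval_f(f, vs) for f in e.fs)


def eval_c(c, vs):
    return len(re.findall(c.r, vs[c.i])) >= c.k


def eval_G(G, vs):
    for b, e in G.cases:
        if all(eval_c(c, vs) for c in b):  # an empty conjunction is true
            return eval_e(e, vs)
    return None  # no guard holds: the program is undefined on this input
```

For instance, a Switch whose first case is guarded by Match(v, `\d`, 4) and returns SubStr(v, 0, 4) maps "2015-06-13" to "2015", while an input failing the guard falls through to the next case.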
Slide 15
Substring Operator
Let w = SubStr(s, p, p′), where p = Pos(r1, r2, k) and p′ = Pos(r1′, r2′, k′).
[Figure: string s with w between positions p and p′; around p, r1 matches the text w1 on its left and r2 matches the text w2 on its right; around p′, r1′ matches w1′ and r2′ matches w2′.]
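The Pos operator can be sketched on top of a tokenizer. This is an approximation under stated assumptions: strings are split into maximal digit runs (NumTok), maximal letter runs (AlphaTok), and single other characters; a regular expression r is a list of token names; and candidate positions are token boundaries. The names `tokenize`, `pos`, and `substr` are hypothetical.

```python
import re

# Assumption: maximal digit runs, maximal letter runs, and single other chars.
TOKEN_RE = re.compile(r"(?P<NumTok>\d+)|(?P<AlphaTok>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def tokenize(s):
    """Return (kind, start, end) triples; other characters are their own kind."""
    return [(m.group() if m.lastgroup == "other" else m.lastgroup, m.start(), m.end())
            for m in TOKEN_RE.finditer(s)]


def pos(s, r1, r2, k):
    """Pos(r1, r2, k): the k-th token boundary t (1-based; negative k counts
    from the end) where the tokens left of t end with r1 and the tokens
    right of t start with r2. Returns None if there is no such position."""
    toks = tokenize(s)
    boundaries = sorted({0, len(s)} | {start for _, start, _ in toks})
    hits = []
    for t in boundaries:
        left = [kind for kind, _, end in toks if end <= t]
        right = [kind for kind, start, _ in toks if start >= t]
        left_ok = len(left) >= len(r1) and left[len(left) - len(r1):] == r1
        if left_ok and right[:len(r2)] == r2:
            hits.append(t)
    idx = k - 1 if k > 0 else k
    if k == 0 or not -len(hits) <= idx < len(hits):
        return None
    return hits[idx]


def substr(s, p1, p2):
    """SubStr(s, p1, p2) with positions computed by pos()."""
    return s[p1:p2]
```

On "(425)-706-7709", Pos([], [NumTok], 2) is 6 and Pos([NumTok], [], 2) is 9, so SubStr yields "706"; with k = -1, Pos([], [NumTok], -1) picks the start of the last number, position 10.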
Slide 16
Syntactic String Transformations: Example
Format phone numbers:
  Input v1          Output
  (425)-706-7709    425-706-7709
  510.220.5586      510-220-5586
  235 7654          425-235-7654
  745-8139          425-745-8139
Program: Switch((b1, e1), (b2, e2)), where
  b1 ≡ Match(v1, NumTok, 3)
  b2 ≡ ¬Match(v1, NumTok, 3)
  e1 ≡ Concatenate(SubStr2(v1, NumTok, 1), ConstStr("-"), SubStr2(v1, NumTok, 2), ConstStr("-"), SubStr2(v1, NumTok, 3))
  e2 ≡ Concatenate(ConstStr("425-"), SubStr2(v1, NumTok, 1), ConstStr("-"), SubStr2(v1, NumTok, 2))
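As a sanity check, the program on this slide can be transcribed into plain Python and run on the four examples. The transcription assumes NumTok is a maximal digit run and that SubStr2(v, r, c) returns the c-th match of r in v; `match`, `substr2`, and `phone_program` are illustrative names, not Flash Fill's API.

```python
import re

NUM_TOK = r"\d+"  # assumption: NumTok approximated as a maximal digit run


def match(v, r, k):
    """Match(v, r, k): v contains at least k matches of r."""
    return len(re.findall(r, v)) >= k


def substr2(v, r, c):
    """SubStr2(v, r, c): the c-th (1-based) match of r in v."""
    return re.findall(r, v)[c - 1]


def phone_program(v1):
    """Switch((b1, e1), (b2, e2)) from the slide, written as plain Python.
    Inputs with fewer than two numbers raise IndexError (undefined output)."""
    if match(v1, NUM_TOK, 3):  # b1
        return (substr2(v1, NUM_TOK, 1) + "-" + substr2(v1, NUM_TOK, 2)
                + "-" + substr2(v1, NUM_TOK, 3))  # e1
    # b2 = ¬Match(v1, NumTok, 3)
    return "425-" + substr2(v1, NUM_TOK, 1) + "-" + substr2(v1, NUM_TOK, 2)  # e2


EXAMPLES = [
    ("(425)-706-7709", "425-706-7709"),
    ("510.220.5586", "510-220-5586"),
    ("235 7654", "425-235-7654"),
    ("745-8139", "425-745-8139"),
]

for v1, expected in EXAMPLES:
    assert phone_program(v1) == expected
```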
Slide 17
Flash Fill: Search Algorithm
Goal: Given input-output pairs $(i_1, o_1), (i_2, o_2), (i_3, o_3), (i_4, o_4)$, find $P$ such that $P(i_1) = o_1$, $P(i_2) = o_2$, $P(i_3) = o_3$, $P(i_4) = o_4$.
Algorithm:
1. Learn the set $S_1$ of trace expressions such that $\forall e \in S_1,\ [\![e]\!]\, i_1 = o_1$. Similarly compute $S_2$, $S_3$, $S_4$. Let $S = S_1 \cap S_2 \cap S_3 \cap S_4$.
2(a). If $S \neq \emptyset$, then the result is $S$.
Challenge: Each $S_j$ may contain a huge number of expressions.
Key Idea: A DAG-based data structure allows for succinct representation and manipulation of $S_j$.
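The DAG idea can be sketched for a single example. Under the simplifying assumption that SubStr uses constant positions (Flash Fill instead stores sets of Pos expressions on each edge), nodes are positions in the output string and each edge (j, k) is labeled with every atomic expression producing that slice; a program is a path from the first to the last node plus one label per edge. `build_dag` and `count_programs` are hypothetical names.

```python
def build_dag(inp, out):
    """Succinctly represent every Concatenate(f1, ..., fn) mapping inp to out.

    Nodes are positions 0..len(out) in the output; edge (j, k) carries the
    set of atomic expressions that produce out[j:k] from inp.
    """
    edges = {}
    for j in range(len(out)):
        for k in range(j + 1, len(out) + 1):
            piece = out[j:k]
            labels = {("ConstStr", piece)}
            start = inp.find(piece)
            while start != -1:  # every occurrence of piece in the input
                labels.add(("SubStr", start, start + len(piece)))
                start = inp.find(piece, start + 1)
            edges[(j, k)] = labels
    return edges


def count_programs(edges, n):
    """Number of programs represented: label-weighted paths from 0 to n."""
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        ways[k] = sum(ways[j] * len(edges[(j, k)]) for j in range(k))
    return ways[n]
```

The structure has one edge per output slice, i.e., n(n+1)/2 edges for an output of length n, while the number of represented programs grows exponentially. For the output "425-706-7709" (n = 12) that is 78 edges standing for at least 2^11 = 2,048 programs, since there are 2^11 ways to split the output into consecutive pieces and every edge carries at least a ConstStr label. In Flash Fill, intersecting two such structures (for $S_1 \cap S_2$) uses a product-style construction over pairs of nodes; this sketch omits it.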
Slide 18
Flash Fill: Search Algorithm
Goal: Given input-output pairs $(i_1, o_1), (i_2, o_2), (i_3, o_3), (i_4, o_4)$, find $P$ such that $P(i_1) = o_1$, $P(i_2) = o_2$, $P(i_3) = o_3$, $P(i_4) = o_4$.
Algorithm:
1. Learn the set $S_1$ of trace expressions such that $\forall e \in S_1,\ [\![e]\!]\, i_1 = o_1$. Similarly compute $S_2$, $S_3$, $S_4$. Let $S = S_1 \cap S_2 \cap S_3 \cap S_4$.
2(a). If $S \neq \emptyset$, then the result is $S$.
2(b). Else find a smallest partition, say $\{S_1, S_2\}, \{S_3, S_4\}$, such that $S_1 \cap S_2 \neq \emptyset$ and $S_3 \cap S_4 \neq \emptyset$.
3. Learn Boolean formulas $b_1, b_2$ such that $b_1$ maps $i_1, i_2$ to true, and $b_2$ maps $i_3, i_4$ to true.
4. The result is $\mathrm{Switch}((b_1, S_1 \cap S_2), (b_2, S_3 \cap S_4))$.
Search Methodology: Reduce learning of an expression to learning of its sub-expressions (divide-and-conquer!).
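Steps 1 through 4 can be sketched end to end with explicit finite sets instead of DAGs. Everything here is an assumption for illustration: the candidate space is two hand-written programs, guards are picked from a fixed table of predicates, and each guard must also be false on the other blocks' inputs so that a first-match Switch behaves correctly. `partitions`, `learn`, `PROGRAMS`, and `PREDICATES` are hypothetical names; the examples are the phone numbers from the earlier slide.

```python
import re


def partitions(items):
    """Yield every set partition of the list items, as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part  # first in a block of its own
        for i in range(len(part)):  # or first joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def learn(examples, programs, predicates):
    """Return [(guard, consistent_programs), ...], or None on failure."""
    # Step 1: S_j = names of candidate programs consistent with example j.
    S = [{name for name, p in programs.items() if p(i) == o} for i, o in examples]
    # Step 2(a): some program is consistent with every example.
    common = set.intersection(*S)
    if common:
        return [(None, sorted(common))]
    # Step 2(b): smallest partition whose blocks have non-empty intersections.
    idx = list(range(len(examples)))
    for part in sorted(partitions(idx), key=len):
        if all(set.intersection(*(S[j] for j in block)) for block in part):
            break
    else:
        return None
    # Step 3: a guard true on the block's inputs and false on all others.
    result = []
    for block in part:
        inside = [examples[j][0] for j in block]
        outside = [examples[j][0] for j in idx if j not in block]
        guard = next((name for name, b in predicates.items()
                      if all(b(i) for i in inside) and not any(b(i) for i in outside)),
                     None)
        if guard is None:
            return None
        result.append((guard, sorted(set.intersection(*(S[j] for j in block)))))
    return result  # Step 4: read as Switch((guard_1, set_1), (guard_2, set_2), ...)


def nums(v):
    return re.findall(r"\d+", v)


PROGRAMS = {  # hypothetical, hand-written candidate space
    "three_numbers": lambda v: "-".join(nums(v)[:3]) if len(nums(v)) >= 3 else None,
    "prefix_425": lambda v: "425-" + "-".join(nums(v)[:2]) if len(nums(v)) >= 2 else None,
}
PREDICATES = {
    "Match(v1,NumTok,3)": lambda v: len(nums(v)) >= 3,
    "¬Match(v1,NumTok,3)": lambda v: len(nums(v)) < 3,
}
EXAMPLES = [
    ("(425)-706-7709", "425-706-7709"),
    ("510.220.5586", "510-220-5586"),
    ("235 7654", "425-235-7654"),
    ("745-8139", "425-745-8139"),
]
```

On the four phone-number examples no single candidate is consistent, so `learn(EXAMPLES, PROGRAMS, PREDICATES)` picks the partition {i1, i2}, {i3, i4} and pairs `three_numbers` with Match(v1,NumTok,3) and `prefix_425` with ¬Match(v1,NumTok,3), mirroring the Switch program shown earlier.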
Slide 19
Ranking (Rishabh Singh)
General principles:
- Prefer shorter programs: fewer conditionals; shorter string expressions and regular expressions.
- Prefer programs with fewer constants.
Strategies:
- Baseline: Pick any minimal-sized program that uses the minimal number of constants.
- Machine learning: Programs are scored using a weighted combination of program features. The weights are learned from training data.
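The two strategies can be contrasted in a few lines. The feature set (size, number of constants, number of conditionals), the `Candidate` type, the tie-breaking order in the baseline, and the pairwise perceptron update are all assumptions made for this sketch; the slide only states that weights over program features are learned from training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    size: int          # e.g., number of DSL operators in the program
    constants: int     # number of constant strings/integers
    conditionals: int  # number of Switch cases


def baseline(cands):
    """Baseline: a minimal-size program, ties broken by fewer constants."""
    return min(cands, key=lambda c: (c.size, c.constants))


def features(c):
    return (c.size, c.constants, c.conditionals)


def score(w, c):
    """Weighted combination of program features (higher is better)."""
    return sum(wi * fi for wi, fi in zip(w, features(c)))


def train(pairs, epochs=10):
    """Pairwise perceptron: each pair is (intended program, wrong program)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for good, bad in pairs:
            if score(w, good) <= score(w, bad):  # misranked: move w toward good
                w = [wi + g - b for wi, g, b in zip(w, features(good), features(bad))]
    return w


def learned(w, cands):
    return max(cands, key=lambda c: score(w, c))
```

With a constant-heavy candidate of size 2 and a substring-based candidate of size 3, the baseline prefers the shorter, constant-heavy one; a single training pair marking the substring-based program as intended yields weights [1.0, -2.0, 0.0], which flip the choice.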
Slide 20
Experimental Comparison of Various Ranking Strategies
  Strategy    Average # of examples required
  Baseline    4.17
  Learning    1.48
Technical Report: Predicting a correct program in Programming by Example; Singh, Gulwani.
Slide 21
User Interaction Model
Current Flash Fill model:
- Auto-prediction avoids the discoverability issue.
- The user inspects the output and may provide additional examples.
Show programs in any desired language (after conversion from the DSL).
Paraphrase programs in English.
Computer-initiated interactivity:
- Highlight less confident entries in the output.
- Ask directed questions based on distinguishing inputs.
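Computer-initiated interactivity can be sketched by comparing the candidate programs that are still consistent with the user's examples. This is an illustrative approximation: "confidence" is taken to be the fraction of candidates that agree on the most common output, and the two candidate programs and the input rows are hypothetical.

```python
from collections import Counter


def distinguishing_inputs(programs, inputs):
    """Inputs on which the still-consistent candidate programs disagree."""
    return [i for i in inputs if len({p(i) for p in programs}) > 1]


def confidence(programs, i):
    """Fraction of candidates that produce the most common output on input i."""
    top_count = Counter(p(i) for p in programs).most_common(1)[0][1]
    return top_count / len(programs)


# Hypothetical candidates: both map the example "Sumit Gulwani" to "Gulwani".
def last_word(v):
    return v.split()[-1]


def second_word(v):
    return v.split()[1]


CANDIDATES = [last_word, second_word]
ROWS = ["Sumit Gulwani", "Ana Maria Lopez", "Rishabh Singh"]

for row in distinguishing_inputs(CANDIDATES, ROWS):
    choices = sorted({p(row) for p in CANDIDATES})
    print(f"For {row!r}, which output did you mean: {choices}?")
```

Both candidates agree on "Rishabh Singh" but disagree on "Ana Maria Lopez" ("Lopez" vs. "Maria"), so only that row would be highlighted or turned into a directed question.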
Slide 23
Initial Success: Media articles & blog posts
Slide 24
Broader Impact
Defined a new research trajectory, which keeps me busy with a passionate sense of purpose:
- End-user programming using examples and natural language
- Intelligent tutoring systems
Slide 25
Conclusion
Dimensions in research: problem definition, solution strategy, impact.
Cultivating research taste is a journey.
- Mine involved: program analysis -> program synthesis -> program synthesis for end users using examples.
Once you develop it, you start a new journey.
- Mine involves having fun with cross-disciplinary research in frameworks for end-user programming using examples & NL, and in intelligent tutoring systems.