Cultivating Research Taste (illustrated via a journey in Program Synthesis research)



  • Slide 1
  • Cultivating Research Taste (illustrated via a journey in Program Synthesis research). Programming Languages Mentoring Workshop 2015. Sumit Gulwani, Microsoft Research, Redmond.
  • Slide 2
  • Dimensions in Research
      Problem Definition: advisor's interest and funding, internship, course project; intersection with your collaborators' interests; next logical advance in your current portfolio; talking to potential customers, market surveys.
      Solution Strategy: develop new techniques vs. apply existing techniques; cross-disciplinary.
      Impact: paper, tool, awards, media; personal happiness.
      Cultivating research taste is a journey! Once you develop it, you start on another journey!
  • Slide 3
  • Program Synthesis
      Goal: Synthesize a program in the underlying domain-specific language (DSL) from user intent using some search algorithm.
      An old problem, but more significant today:
        Diverse computational platforms and programming languages.
        Enabling technology: better algorithms and faster machines.
      Synthesis can revolutionize end-user programming if we:
        target the right set of application domains, such as data manipulation;
        allow the right intent specification mechanism, such as examples or natural language;
        can tame the huge search space for real-time interaction, using domain-specific search algorithms.
      PPDP 2010 [Invited talk paper]: Dimensions in Program Synthesis.
  • Slide 4
  • Graduation Advice (2005), George Necula, UC-Berkeley: "You will have too many problems to solve; you can't pursue them all. Make thoughtful choices."
  • Slide 5
  • From Program Verification to Program Synthesis
      Setting: statement s, precondition P, postcondition Q.
      Forward dataflow analysis: from s and P, compute Q.
      Backward dataflow analysis: from s and Q, compute P.
      Program synthesis: from P and Q, compute s.
      Nebojsa Jojic, MSR Redmond (2005)
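The three directions listed on this slide can be illustrated with a toy model. This is only a sketch under strong assumptions that are not in the talk: states are small integers, a statement is a Python function on states, P and Q are predicates, and brute-force enumeration stands in for any real dataflow analysis. The candidate statements are hypothetical.

```python
# Toy model: a "statement" is a function on integer states, and P and Q are
# predicates over a small finite state space (an assumption for illustration).
STATES = range(-5, 6)

def forward(s, P):
    """From s and P, compute Q: the set of states reachable from P via s."""
    return {s(x) for x in STATES if P(x)}

def backward(s, Q):
    """From s and Q, compute P: the states from which s ends in Q."""
    return {x for x in STATES if Q(s(x))}

def synthesize(P, Q, candidates):
    """From P and Q, compute s: the first candidate mapping all of P into Q."""
    for name, s in candidates.items():
        if all(Q(s(x)) for x in STATES if P(x)):
            return name
    return None

CANDIDATES = {  # hypothetical candidate statements
    "x := x - 1": lambda x: x - 1,
    "x := x + 1": lambda x: x + 1,
}
non_negative = lambda x: x >= 0
positive = lambda x: x > 0
increment = CANDIDATES["x := x + 1"]

print(forward(increment, non_negative))
print(backward(increment, positive))
print(synthesize(non_negative, positive, CANDIDATES))
```

Synthesis is the hardest of the three here because it searches over statements rather than over states, which is the combinatorial problem the following slides address.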
  • Slide 6
  • Synthesis using SAT/SMT Constraint Solvers
      Venkie, MSR Bangalore (2006)
      Try using SAT solvers, which have been engineered to solve huge instances.
      Program synthesis is an extremely hard combinatorial search task!
  • Slide 7
  • Initial results in program synthesis
      Approach: Reduce synthesis to solving SAT/SMT constraints.
      Results: Managed to synthesize a wide variety of programs from logic specs.
        Bit-vector algorithms (e.g., turn off the rightmost one bit) [PLDI 2011, ICSE 2010]
        SIMD algorithms (e.g., vectorization of CountIf) [PPoPP 2013]
        Undergraduate book algorithms (e.g., sorting, dynamic programming) [POPL 2010]
        Program inverses (e.g., deserializers from serializers) [PLDI 2011]
        Graph algorithms (e.g., bipartiteness check) [OOPSLA 2010]
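To make the bit-vector example concrete, here is a minimal sketch of synthesizing "turn off the rightmost one bit" on 8-bit values. It deliberately swaps the SAT/SMT reduction named on the slide for plain brute-force enumeration over a tiny, hand-picked component library; the library, the 8-bit width, and the function names are all assumptions made for illustration.

```python
import operator

MASK = 0xFF  # work on 8-bit values so that all inputs can be checked

def spec(x):
    """Reference specification: clear the lowest set bit of x."""
    for i in range(8):
        if (x >> i) & 1:
            return x & ~(1 << i)
    return 0

# Hypothetical component library: one unary and one binary component per program.
UNARY = {
    "x - 1": lambda x: (x - 1) & MASK,
    "x + 1": lambda x: (x + 1) & MASK,
    "-x": lambda x: (-x) & MASK,
    "~x": lambda x: (~x) & MASK,
}
BINARY = {"&": operator.and_, "|": operator.or_, "^": operator.xor}

def synthesize():
    """Return every program 'x op (unary)' that agrees with spec on all inputs."""
    found = []
    for op_name, op in BINARY.items():
        for u_name, u in UNARY.items():
            if all((op(x, u(x)) & MASK) == spec(x) for x in range(256)):
                found.append(f"x {op_name} ({u_name})")
    return found

print(synthesize())
```

Enumeration works here only because the program space has twelve candidates; a constraint solver is what lets the same idea scale to larger component libraries and wider bit-vectors.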
  • Slide 8
  • Mid-life Awakening (2010): Software developers -> End users (two orders of magnitude more users)
  • Slide 9
  • Dimensions in Research (repeated from Slide 2): Problem Definition; Solution Strategy; Impact. Cultivating research taste is a journey! Once you develop it, you start on another journey!
  • Slide 10
  • Problem Definition: Inspired by Excel help forums
  • Slide 11
  • Typical help-forum interaction
      Input                   Desired output
      300_w5_aniSh_c1_b       w5
      300_w30_aniSh_c1_b      w30
      Formulas suggested:
        =MID(B1,5,2)
        =MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
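The difference between the two formulas can be seen in a short Python sketch. The function names are hypothetical; the first mirrors the fixed-position formula (1-based start 5, length 2) and the second mirrors the idea of taking the text between the first and second underscore.

```python
def fixed_positions(cell: str) -> str:
    """Equivalent of MID(cell, 5, 2): two characters starting at 1-based position 5."""
    return cell[4:6]

def between_first_two_underscores(cell: str) -> str:
    """Text between the first and the second underscore."""
    first = cell.index("_")
    second = cell.index("_", first + 1)
    return cell[first + 1:second]

for cell in ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"]:
    print(cell, fixed_positions(cell), between_first_two_underscores(cell))
```

The fixed-position version happens to work on the first row but truncates the second, which is why the forum answer needs the longer, position-finding formula.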
  • Slide 12
  • Flash Fill (Excel 2013 feature)
  • Slide 13
  • Dimensions in Research (repeated from Slide 2): Problem Definition; Solution Strategy; Impact. Cultivating research taste is a journey! Once you develop it, you start on another journey!
  • Slide 14
  • Flash Fill: Domain Specific Language
      Guarded expression   G := Switch((b_1, e_1), ..., (b_n, e_n))
      Boolean expression   b := c_1 ∧ ... ∧ c_n
      Atomic predicate     c := Match(v_i, r, k)
      Trace expression     e := Concatenate(f_1, ..., f_n)
      Atomic expression    f := s                       // constant string
                              | SubStr(v_i, p_1, p_2)
                              | Loop(λw: e)
      Index expression     p := k                       // constant integer
                              | Pos(r_1, r_2, k)        // k-th position in the string whose left/right side matches r_1/r_2
      Regular expression   r := TokenSequence(T_1, ..., T_n)
      POPL 2011: Automating String Processing in Spreadsheets using Input-Output Examples; Sumit Gulwani.
  • Slide 15
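  • Substring Operator
      Let w = SubString(s, p, p') where p = Pos(r_1, r_2, k) and p' = Pos(r_1', r_2', k').
      [Figure: string s with positions p and p' marking the ends of w; w_1 and w_2 are the substrings to the left and right of p, and w_1' and w_2' those around p'.]
      r_1 matches w_1 and r_2 matches w_2; r_1' matches w_1' and r_2' matches w_2'.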
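A minimal sketch of the position and substring operators may help here. It assumes a simplified semantics: Python regular expressions stand in for the DSL's token sequences, positions are 0-based string indices, and k is a positive 1-based count. None of these choices are confirmed by the slides.

```python
import re

def pos(s, r1, r2, k):
    """k-th index t such that r1 matches a suffix of s[:t] and r2 a prefix of s[t:]."""
    left = re.compile("(?:" + r1 + r")\Z")   # anchor r1 at the end of the left part
    right = re.compile(r2)
    hits = [t for t in range(len(s) + 1)
            if left.search(s[:t]) and right.match(s, t)]
    return hits[k - 1]

def substring(s, p_start, p_end):
    """The part of s between two positions."""
    return s[p_start:p_end]

def token_between_underscores(s):
    p = pos(s, "_", "", 1)        # just after the first underscore
    p_prime = pos(s, "", "_", 2)  # just before the second underscore
    return substring(s, p, p_prime)

print(token_between_underscores("300_w5_aniSh_c1_b"))
print(token_between_underscores("300_w30_aniSh_c1_b"))
```

Because positions are described by what surrounds them rather than by fixed offsets, the same expression handles both rows from the help-forum example.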
  • Slide 16
  • Syntactic String Transformations: Example
      Task: format phone numbers.
      Input v_1           Output
      (425)-706-7709      425-706-7709
      510.220.5586        510-220-5586
      235 7654            425-235-7654
      745-8139            425-745-8139
      Program: Switch((b_1, e_1), (b_2, e_2)), where
        b_1 ≡ Match(v_1, NumTok, 3)
        b_2 ≡ ¬Match(v_1, NumTok, 3)
        e_1 ≡ Concatenate(SubStr2(v_1, NumTok, 1), ConstStr("-"), SubStr2(v_1, NumTok, 2), ConstStr("-"), SubStr2(v_1, NumTok, 3))
        e_2 ≡ Concatenate(ConstStr("425-"), SubStr2(v_1, NumTok, 1), ConstStr("-"), SubStr2(v_1, NumTok, 2))
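The phone-number program can be transcribed into Python roughly as follows. This is a sketch under assumptions: NumTok is taken to be a maximal run of digits, SubStr2(v, r, c) is read as the c-th occurrence of r in v, and Match(v, r, k) is read as "v contains at least k occurrences of r".

```python
import re

NUM_TOK = r"\d+"  # assumed meaning of NumTok: a maximal run of digits

def substr2(v, token, c):
    """c-th (1-based) occurrence of the token in v."""
    return re.findall(token, v)[c - 1]

def match(v, token, k):
    """True when v contains at least k occurrences of the token."""
    return len(re.findall(token, v)) >= k

def concatenate(*parts):
    return "".join(parts)

def format_phone(v1):
    # Switch((b1, e1), (b2, e2)): take the first branch whose guard holds.
    if match(v1, NUM_TOK, 3):
        return concatenate(substr2(v1, NUM_TOK, 1), "-",
                           substr2(v1, NUM_TOK, 2), "-",
                           substr2(v1, NUM_TOK, 3))
    return concatenate("425-", substr2(v1, NUM_TOK, 1), "-",
                       substr2(v1, NUM_TOK, 2))

for v in ["(425)-706-7709", "510.220.5586", "235 7654", "745-8139"]:
    print(v, "->", format_phone(v))
```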
  • Slide 17
  • Flash Fill: Search Algorithm
      Goal: Given input-output pairs (i_1, o_1), (i_2, o_2), (i_3, o_3), (i_4, o_4), find P such that P(i_1) = o_1, P(i_2) = o_2, P(i_3) = o_3, P(i_4) = o_4.
      Algorithm:
        1. Learn set S_1 of trace expressions s.t. ∀e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
        2(a). If S ≠ ∅ then result is S.
      Challenge: Each S_j may have a huge number of expressions.
      Key Idea: We have a DAG based data-structure that allows for succinct representation and manipulation of S_j.
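One way to picture the DAG idea is the sketch below. It is an assumption-laden simplification, not the data structure from the paper: nodes are positions in the output string, the edge (i, j) stores every atomic expression that produces out[i:j], and only constant strings and constant-position substrings are considered. Every path from node 0 to the last node is one Concatenate program.

```python
def build_dag(inp, out):
    """Map each edge (i, j) to the atomic expressions that generate out[i:j]."""
    edges = {}
    for i in range(len(out)):
        for j in range(i + 1, len(out) + 1):
            piece = out[i:j]
            exprs = [("ConstStr", piece)]
            start = inp.find(piece)
            while start != -1:  # every occurrence of the piece in the input
                exprs.append(("SubStr", start, start + len(piece)))
                start = inp.find(piece, start + 1)
            edges[(i, j)] = exprs
    return edges

def count_programs(edges, n):
    """Number of Concatenate programs, i.e. labelled paths from node 0 to node n."""
    ways = [1] + [0] * n
    for j in range(1, n + 1):
        ways[j] = sum(ways[i] * len(edges[(i, j)]) for i in range(j))
    return ways[n]

dag = build_dag("ab", "ab")
print(len(dag), count_programs(dag, 2))
```

The number of edges grows quadratically with the output length while the number of represented programs grows exponentially, which is the sense in which the representation is succinct.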
  • Slide 18
  • Flash Fill: Search Algorithm
      Goal: Given input-output pairs (i_1, o_1), (i_2, o_2), (i_3, o_3), (i_4, o_4), find P such that P(i_1) = o_1, P(i_2) = o_2, P(i_3) = o_3, P(i_4) = o_4.
      Algorithm:
        1. Learn set S_1 of trace expressions s.t. ∀e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
        2(a). If S ≠ ∅ then result is S.
        2(b). Else find a smallest partition, say {S_1, S_2}, {S_3, S_4}, s.t. S_1 ∩ S_2 ≠ ∅ and S_3 ∩ S_4 ≠ ∅.
        3. Learn boolean formulas b_1, b_2 s.t. b_1 maps i_1, i_2 to true, and b_2 maps i_3, i_4 to true.
        4. Result is: Switch((b_1, S_1 ∩ S_2), (b_2, S_3 ∩ S_4))
      Search Methodology: Reduce learning of an expression to learning of sub-expressions (Divide-and-Conquer!)
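The steps above can be sketched on the phone-number example. Everything concrete here is an assumption made to keep the sketch small: the program space is three hand-written candidates instead of a learned DAG, the partition is built greedily (so it is not guaranteed to be the smallest), and guards are chosen from two fixed predicates and must also be false on the other groups' inputs.

```python
import re

def nums(v):
    return re.findall(r"\d+", v)

CANDIDATES = {  # hypothetical stand-ins for trace expressions
    "join3": lambda v: "-".join(nums(v)[:3]) if len(nums(v)) >= 3 else None,
    "prefix425": lambda v: "425-" + "-".join(nums(v)[:2]) if len(nums(v)) >= 2 else None,
    "identity": lambda v: v,
}
PREDICATES = {  # hypothetical stand-ins for boolean formulas
    "Match(v1,NumTok,3)": lambda v: len(nums(v)) >= 3,
    "not Match(v1,NumTok,3)": lambda v: len(nums(v)) < 3,
}

def learn(i, o):
    """Step 1: the set S_j of candidates consistent with one example."""
    return {name for name, f in CANDIDATES.items() if f(i) == o}

def partition(examples):
    """Steps 1-2: greedily group examples whose program sets intersect."""
    groups = []  # each group is [inputs, common_programs]
    for i, o in examples:
        s = learn(i, o)
        for g in groups:
            if g[1] & s:
                g[0].append(i)
                g[1] = g[1] & s
                break
        else:
            groups.append([[i], s])
    return groups

def synthesize(examples):
    """Steps 3-4: attach a guard to each group and return the Switch cases."""
    groups = partition(examples)
    if len(groups) == 1:
        return [("True", groups[0][1])]
    cases = []
    for inputs, progs in groups:
        others = [i for g in groups if g[0] is not inputs for i in g[0]]
        guard = next(name for name, p in PREDICATES.items()
                     if all(p(i) for i in inputs) and not any(p(i) for i in others))
        cases.append((guard, progs))
    return cases

EXAMPLES = [("(425)-706-7709", "425-706-7709"), ("510.220.5586", "510-220-5586"),
            ("235 7654", "425-235-7654"), ("745-8139", "425-745-8139")]
print(synthesize(EXAMPLES))
```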
  • Slide 19
  • Ranking
      General principles:
        Prefer shorter programs: fewer conditionals; shorter string expressions and regular expressions.
        Prefer programs with fewer constants.
      Strategies:
        Baseline: Pick any minimal-sized program using a minimal number of constants.
        Machine learning: Programs are scored using a weighted combination of program features. Weights are learned using training data.
      Rishabh Singh
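The two strategies can be contrasted with a small sketch. The feature set, the candidate programs, and especially the weights are hypothetical; in the approach described on the slide the weights would be learned from training data rather than written by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Features:
    size: int          # overall expression size
    constants: int     # number of constants used
    conditionals: int  # number of Switch branches beyond the first

# Hypothetical candidates that are all consistent with the user's examples.
CANDIDATES = {
    "A": Features(size=1, constants=1, conditionals=0),
    "B": Features(size=3, constants=0, conditionals=0),
}

def baseline_rank(candidates):
    """Baseline: a minimal-sized program, ties broken by fewer constants."""
    return min(candidates, key=lambda n: (candidates[n].size, candidates[n].constants))

# Hand-written placeholder weights; negative because each feature is a cost.
WEIGHTS = {"size": -1.0, "constants": -4.0, "conditionals": -3.0}

def score(f):
    """Weighted combination of program features."""
    return (WEIGHTS["size"] * f.size
            + WEIGHTS["constants"] * f.constants
            + WEIGHTS["conditionals"] * f.conditionals)

def learned_rank(candidates):
    return max(candidates, key=lambda n: score(candidates[n]))

print(baseline_rank(CANDIDATES), learned_rank(CANDIDATES))
```

With these made-up numbers the two strategies disagree: the baseline takes the smallest program even though it relies on a constant, while the weighted score penalizes the constant more heavily than the extra size.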
  • Slide 20
  • Experimental Comparison of various Ranking Strategies
      Strategy     Average # of examples required
      Baseline     4.17
      Learning     1.48
      Technical Report: Predicting a correct program in Programming by Example; Singh, Gulwani
  • Slide 21
  • User Interaction Model
      Current Flash Fill model:
        Auto-prediction avoids discoverability issue.
        User inspects output and may provide additional examples.
      Show programs in any desired language (after conversion from DSL).
      Paraphrase in English.
      Computer-initiated interactivity:
        Highlight less confident entries in the output.
        Ask directed questions based on distinguishing inputs.
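A distinguishing input is one on which the remaining candidate programs disagree, so asking the user about it removes at least one candidate. The sketch below assumes candidates are plain Python functions and reuses the help-forum strings as data; the third input and the function names are hypothetical.

```python
def distinguishing_inputs(programs, inputs):
    """Inputs on which at least two candidate programs produce different outputs."""
    return [x for x in inputs if len({p(x) for p in programs}) > 1]

def fixed_positions(cell):
    return cell[4:6]

def between_underscores(cell):
    first = cell.index("_")
    return cell[first + 1:cell.index("_", first + 1)]

ROWS = ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b", "300_w7_x"]
print(distinguishing_inputs([fixed_positions, between_underscores], ROWS))
```

Rows where all candidates agree need no attention from the user, which is also a natural basis for highlighting only the less confident entries.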
  • Slide 22
  • Dimensions in Research (repeated from Slide 2): Problem Definition; Solution Strategy; Impact. Cultivating research taste is a journey! Once you develop it, you start on another journey!
  • Slide 23
  • Initial Success: Media articles & Blogposts
  • Slide 24
  • Broader Impact
      Defined a new research trajectory, which keeps me busy with a passionate sense of purpose:
        End-user programming using examples and natural language
        Intelligent tutoring systems
  • Slide 25
  • Conclusion
      Dimensions in research: problem definition, solution strategy, impact.
      Cultivating research taste is a journey. Mine involved: program analysis -> program synthesis -> program synthesis for end users using examples.
      Once you develop it, you start a new journey. Mine involves having fun with cross-disciplinary research in:
        Frameworks for end-user programming using examples & NL
        Intelligent tutoring systems
  • Slide 26
  • Backup Slides for Flash Fill Demo
  • Slides 27-49 (no text in transcript)