Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09]...
Transcript of Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09]...
![Page 1: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/1.jpg)
Automatically Synthesizing SQL Queries
from Input-Output Examples
Sai Zhang
University of Washington
Joint work with: Yuyin Sun
![Page 2: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/2.jpg)
Goal: making it easier for non-expert users
to write correct SQL queries
2
• Non-expert database end-users
– Business analysts, scientists, marketing managers, etc.
can describe what the query task is do not know how write a correct query
This paper: bridge the gap!
![Page 3: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/3.jpg)
An example
SELECT name, MAX(score)
FROM student, enrolled
WHERE student.stu_id = enrolled.stu_id
GROUP BY student.stu_id
HAVING COUNT(enrolled.course_id) > 1
name stu_id
Alice 1
Bob 2
Charlie 3
Dan 4
stu_id course_id score
1 504 100
1 505 99
2 504 96
3 501 60
3 502 88
3 505 68
Table: student Table: enrolled
name MAX(score)
Alice 100
Charlie 88
Output table
Find the name and the maximum course score of each student
enrolled in more than 1 course.
The correct SQL query:
![Page 4: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/4.jpg)
Existing solutions for querying a database
• General programming languages
+ powerful
− learning barriers
• GUI tools
+ easy to use
− limited in customization and personalization
− hard to discover desired features in complex GUIs
4
![Page 5: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/5.jpg)
SELECT name, MAX(score)
FROM student, enrolled
WHERE student.stu_id = enrolled.stu_id
GROUP BY student.stu_id
HAVING COUNT(enrolled.course_id) > 1
name stu_id
Alice 1
Bob 2
Charlie 3
Dan 4
stu_id course_id score
1 504 100
1 505 99
2 504 96
3 501 60
3 502 88
3 505 68
Table: student Table: enrolled
name MAX(score)
Alice 100
Charlie 88
Output table
SQLSynthesizer
Our solution: programming by example
![Page 6: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/6.jpg)
How do end-users use SQLSynthesizer?
6
Real, large database tables Desired output result
SQL?
Small, representative
Input-output examples
SQLSynthesizer
![Page 7: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/7.jpg)
SQLSynthesizer’s advantages
7
• Fully automated − Only requires input-output examples
− No need of annotations, hints, or specification of any form
• Support a wide range of SQL queries − Beyond the “select-from-where” queries [Tran’09]
![Page 8: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/8.jpg)
Expressiveness
Ease
of
Use
GUI tools
SQLSynthesizer
Programming
languages
Comparison of solutions
![Page 9: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/9.jpg)
Outline
• Motivation
• A SQL Subset
• Synthesis Approach
• Evaluation
• Related Work
• Conclusion
9
![Page 10: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/10.jpg)
Designing a SQL subset
10
The full SQL language
A SQL Subset
• 1000+ pages specification
• PSPace-Completeness [Sarma’10]
• Some features are rarely used
SQLSynthesizer’s focus: a widely-used SQL subset
![Page 11: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/11.jpg)
How to design a SQL subset?
11
The full SQL language
A SQL Subset
• Previous approaches:
– Decided by the paper authors [Kandel’11] [Tran’09]
• Our approach:
– Ask experienced IT professionals for the most widely-used SQL features
?
![Page 12: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/12.jpg)
Our approach in designing a SQL subset
1. Online survey: eliciting design requirement
2. Designing the SQL subset
3. Follow-up interview: obtaining feedback
− Ask each participant to select 10 most widely-used SQL features
− Got 12 responses
− Ask each participant to rate the sufficiency of the subset
Supported SQL features
1) SELECT.. FROM…WHERE
2) JOIN
3) GROUP BY / HAVING
4) Aggregators (e.g., MAX, COUNT, SUM, etc)
5) ORDER BY
Supported in the previous work
[Tran’09]
Not sufficient at all Completely sufficient
0 5
Average rating: 4.5
![Page 13: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/13.jpg)
Our approach in designing a SQL subset
1. Online survey: eliciting design requirement
1. Designing the SQL subset
2. Follow-up interview: obtaining feedback
− Ask each participant to select 10 most widely-used SQL features
− Got 12 respondents
− Ask each participant to rate the sufficiency of the subset
Supported SQL features
- SELECT.. FROM…WHERE
- JOIN
- GROUP BY / HAVING
- Aggregators (e.g., MAX, COUNT, SUM, etc)
- ORDER BY
Supported in the previous work
[Tran’09]
Not sufficient at all Completely sufficient
0 5
Average rating: 4.5
The SQL subset is enough to
write most common queries.
![Page 14: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/14.jpg)
Outline
• Motivation
• Language Design
• Synthesis Approach
• Evaluation
• Related Work
• Conclusion
14
![Page 15: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/15.jpg)
SQLSynthesizer Workflow
15
Input-Output
Examples SQLSynthesizer Queries
Select the desired query, or provide more examples
Input
tables
Output
table
![Page 16: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/16.jpg)
SQLSynthesizer Workflow
16
Input-Output
Examples SQLSynthesizer Queries
Select the desired query, or provide more examples
Input
tables
Output
table
A SQL query
![Page 17: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/17.jpg)
SQLSynthesizer Workflow
17
Input-Output
Examples SQLSynthesizer Queries
Select the desired query, or provide more examples
Input
tables
Output
table
Filter Project Combine
![Page 18: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/18.jpg)
SQLSynthesizer Workflow
18
Input-Output
Examples SQLSynthesizer Queries
Select the desired query, or provide more examples
Input
tables
Output
table
Filter Project
SELECT name, MAX(score)
FROM student, enrolled
WHERE student.stu_id = enrolled.stu_id
GROUP BY student.stu_id
HAVING COUNT(enrolled.course_id) > 1 Query condition
Join condition
Projection columns
Join
condition
Query
condition
Projection
columns
A complete SQL:
Combine
![Page 19: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/19.jpg)
Multiple solutions
19
Input
tables Output
table
Query 2
Query 1
Query 3
…
![Page 20: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/20.jpg)
Multiple solutions
20
Input
tables Output
table
Filter Project
Combine
… …
Computes all solutions, ranks them, and shows them to the user.
![Page 21: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/21.jpg)
Key techniques
21
Input
tables
Output
table
Filter Project
Join
condition
Query
condition
Projection
columns
Combine
1. Combine: Exhaustive search over legal combinations
(e.g., cannot join columns with different types)
2. Filter: A machine learning approach to infer query conditions
3. Project: Exhaustive search over legal columns (e.g., cannot apply AVG to a string column)
![Page 22: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/22.jpg)
22
Cast as a rule learning problem:
Finding rules that can perfectly divide a search space
into a positive part and a negative part
All rows in the
joined table
Rows contained
in the output table
The rest of
the rows
Learning query conditions
![Page 23: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/23.jpg)
Search space: the joined table
23
name stu_id
Alice 1
Bob 2
Charlie 3
Dan 4
stu_id course_id score
1 504 100
1 505 99
2 504 96
3 501 60
3 502 88
3 505 68
Table: student
Table: enrolled
Name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 3 505 68
Join on the stu_id column
The joined table
(inferred in the
Combine step)
![Page 24: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/24.jpg)
Finding rules selecting rows contained in
the output table
24
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
The joined table
name MAX(score)
Alice 100
Charlie 88
Output table
![Page 25: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/25.jpg)
Finding rules selecting rows containing
the output table
25
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
The joined table
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68 name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
![Page 26: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/26.jpg)
Finding rules selecting rows containing
the output table
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
The joined table
No good rules!
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
![Page 27: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/27.jpg)
Solution: computing additional features
• Key idea:
– Expand the search space with additional features
• Enumerate all possibilities that a table can be aggregated
• Precompute aggregation values as features
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 3 505 68
Suppose grouping it by stu_id
MAX(score) SUM
(score)
COUNT
(course_id)
100 199 2
100 199 2
96 96 1
88 216 3
88 216 3
88 216 3
additional features
...
The joined table
![Page 28: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/28.jpg)
Finding rules without additional features
28
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Bob 2 504 96
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
The joined table
No good rules!
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
![Page 29: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/29.jpg)
Finding rules with additional features
The joined table
name stu_id course_id score COUNT(course_id) MIN(score)
Alice 1 504 100 2 99
Alice 1 505 99 2 99
Bob 2 504 96 1 96
Charlie 3 501 60 3 60
Charlie 3 502 88 3 60
Charlie 4 505 68 3 60
COUNT(course_id) > 1
(after groupping by stu_id)
SELECT name, MAX(score)
FROM student, enrolled
WHERE student.stu_id =
enrolled.stu_id
GROUP BY student.stu_id
HAVING COUNT(enrolled.course_id) > 1
after the table is grouped by stu_id
…
name stu_id course_id score
Alice 1 504 100
Alice 1 505 99
Charlie 3 501 60
Charlie 3 502 88
Charlie 4 505 68
![Page 30: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/30.jpg)
Ranking multiple SQL queries
• Occam’s razor principle: rank simpler queries higher
– A simpler query is less likely to overfit the examples
• Approximate a query’s complexity by its text length
30
name age score
Alice 20 100
Bob 20 99
Charlie 30 99
name
Alice
Bob
Query 1: select name from student where age < 30
Query 2: select name from student where
name = ‘Alice’ || name = ‘Bob’
Input table: student Output table
![Page 31: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/31.jpg)
Outline
• Motivation
• Language Design
• Synthesis Approach
• Evaluation
• Related Work
• Conclusion
31
![Page 32: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/32.jpg)
Research Questions
• Success ratio in synthesizing SQL queries?
• What is the tool time cost?
• How much human effort is needed in writing examples?
• Comparison to existing techniques.
32
![Page 33: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/33.jpg)
Benchmarks
• 23 SQL query related exercises from a classic textbook
– All exercises in chapters 5.1 and 5.2
• 5 forum questions
– Can be answered by using standard SQL
(Most questions are vendor-specific)
– 2 questions contain example tables
33
![Page 34: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/34.jpg)
Evaluation Procedure
• Rank of the correct SQL query
• Tool time cost
• Manual cost
– Example size, time cost, and the number of interaction rounds
(All experiments are done by the second author)
34
Input-Output
Examples SQLSynthesizer Queries
Select the desired query, or provide more examples
![Page 35: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/35.jpg)
Results: success ratio
35
28 SQL questions
Fail on 8
questions
Succeed on 20 questions
5 forum questions
23 textbook
exercises
Succeed on 5
forum questions
Succeed on 15
exercises
Fail on 8
exercises
Require writing sub-queries, which are not
supported in SQLSynthesizer
The correct query ranks 1st
in all succeeded questions
![Page 36: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/36.jpg)
Result: tool time cost
• On average, 8 seconds per benchmark
– Min: 1 second, max: 120 seconds
– Roughly proportional to the #table and #column
36
![Page 37: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/37.jpg)
Results: manual cost
• Example size
– 22 rows, on average (min: 8 rows, max: 52 rows)
• Time cost in writing examples
– 3.6 minutes per benchmark, on average
(min: 1 minute, max: 7 minutes)
• Number of interaction rounds
– 2.3 rounds per benchmark, on average
(min: 1 round, max: 5 rounds)
37
![Page 38: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/38.jpg)
Comparison with an existing approach
• Query-by-Output (QBO) [Tran’09]
– Support simple “select-from-where” queries
– Use data values as machine learning features
38
name age score
Alice 20 100
Bob 20 99
Charlie 30 80
name
Alice
Bob
select name from student
where age < 30
Table: student
age MAX(score)
20 100
30 80 select age, max(score) from student
group by age
Output table
![Page 39: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/39.jpg)
Query-by-Output vs. SQLSynthesizer
39
Query-by-Output SQLSynthesizer
28 SQL questions
28 SQL questions
Fail on 26 questions
Succeed on 2 questions Succeed on 20 questions
Fail on 8
questions
- Many realistic SQL queries use aggregation features.
- Users are unlikely to get stuck on simple “select-from-where” queries.
![Page 40: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/40.jpg)
Experimental Conclusions
• Good success ratio (71%)
• Low tool time cost
– 8 seconds on average
• Reasonable manual cost
– 3.6 minutes on average
– 2.3 interaction rounds
• Outperform an existing technique
– Success ratio: QBO (7%) vs. SQLSynthesizer (71%)
40
![Page 41: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/41.jpg)
Outline
• Motivation
• Language Design
• Synthesis Approach
• Evaluation
• Related Work
• Conclusion
41
![Page 42: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/42.jpg)
Related Work
• Reverse engineering SQL queries
Query-by-Examples [Zloof’75]
A new GUI with a domain-specific language to write queries
Query-by-Output [Tran’09]
Uses data values as features, and supports a small SQL subset.
View definition Synthesis [Sarma’10]
Theoretical analysis, and is limited to 1 input/output table.
• Automated program synthesis
PADS [Fisher’08], Wrangler [Kandel’11], Excel Macro [Harris’11],
SQLShare [Howe’11], SnippetSuggest [Khoussainova’11],
SQL Inference from Java code [Cheung’13]
− Targets different problems, or requires different input.
− Inapplicable to SQL synthesis
42
![Page 43: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/43.jpg)
Outline
• Motivation
• Language Design
• Synthesis Approach
• Evaluation
• Related Work
• Conclusion
43
![Page 44: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/44.jpg)
Contributions
• A programming-by-example technique
– Synthesize SQL queries from input-output examples
– Core idea: using machine learning to infer query conditions
• Experiments that demonstrate its usefulness
– Accurate and efficient
• Inferred correct answers for 20 out of 28 SQL questions
• 8 seconds for each question
– Outperforms an existing technique
• The SQLSynthesizer implementation
http://sqlsynthesizer.googlecode.com
44
![Page 45: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/45.jpg)
[Backup Slides]
45
![Page 46: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/46.jpg)
The most widely-used SQL features
46
0 2 4 6 8 10 12
RIGHT JOIN
NOT EXIST
UNION
NOT NULL
LIKE
BETWEEN
LEFT JOIN
HAVING
INNER JOIN
FULL JOIN
MIN
MAX
IN
SUM
AVG
DISTINCT
COUNT
ORDER BY
GROUP BY
SELECT... FROM..
Number of votes
21 features
Aggregation features
The standard select ..
from.. where.. feature
Joining features
Existential features
Value matching features
![Page 47: Automatically Synthesizing SQL Queries from Input …• Query-by-Output (QBO) [Tran’09] –Support simple “select-from-where” queries –Use data values as machine learning](https://reader035.fdocuments.in/reader035/viewer/2022063003/5f748c0f568a0733603b5c6e/html5/thumbnails/47.jpg)
Design a SQL subset
47
0 2 4 6 8 10 12
RIGHT JOIN
NOT EXIST
UNION
NOT NULL
LIKE
BETWEEN
LEFT JOIN
HAVING
INNER JOIN
FULL JOIN
MIN
MAX
IN
SUM
AVG
DISTINCT
COUNT
ORDER BY
GROUP BY
SELECT... FROM..
Special joins
Wildcard matching
Sub-query
Covered features
Uncovered features
Covered 15 features