"Error Recovery" by @alaz at scalaby#8
Parsers Error Recovery for Practical Use
Alexander Azarov
[email protected] / Osinka.ru
February 18, 2012
Context
Osinka
- Forum of “BB” kind (2.5M+ pages/day, 8M+ posts total)
- User generated content: BBCode markup
- Migrating to Scala
- Backend
Plan
- Why parser combinators?
- Why error recovery?
- Example of error recovery
- Results
BBCode
Few BBCode tags
| BBCode | HTML |
|---|---|
| `[b]bold[/b]` | `<b>bold</b>` |
| `[i]italic[/i]` | `<i>italic</i>` |
| `[url=href]text[/url]` | `<a href="href">text</a>` |
| `[img]href[/img]` | `<img src="href"/>` |
BBCode example
- Example of BBCode
Example
```
[quote="Nick"]original [b]text[/b][/quote]
Here it is the reply with [url=http://www.google.com]link[/url]
```
Task
Why parser combinators, 1
- Regexp maintenance is a headache
- Bugs extremely hard to find
- No markup errors
Why parser combinators, 2
One post source, many views
- HTML render for Web
- textual view for emails
- text-only short summary
- text-only for full-text search indexer
Why parser combinators, 3
Post analysis algorithms
- links (e.g. automated spam analysis)
- images
- whatever structure analysis we’d want
Universal AST
One AST
- different printers
- various traversal algorithms
Problem
Sounds great. But.
This all looks like a perfect world. But what’s the catch?
Humans. They make mistakes.
Example
```
[quote][url=http://www.google.com][img]http://www.image.com[/url[/img][/b]
```
User-Generated Content: Problem
Erroneous markup
- People make mistakes,
- but no one wants to see an empty post.
- We have to show something meaningful in any case.
Black or White World
- Scala parser combinators assume valid input
- Parser result: Success | NoSuccess
- no error recovery out of the box
Solution
Error recovery: our approach
- Our parser never breaks
- It generates “error nodes” instead
Approach: Error nodes
- Part of AST, FailNode contains the possible causes of the failure
- They are meaningful
  - for highlighting in editor
  - to mark posts having failures in markup (for moderators/other users to see this)
Approach: input & unpaired tags
- Treat all input except known tags as text
  - E.g. [tag]text[/tag] is a text node, because [tag] is not a known tag
- Unpaired tags are matched as the last choice and reported as markup errors
Example
Trivial BBCode markup
Example (Trivial "one tag" BBCode)
```
Simplest [font=bold]BBCode [font=red]example[/font][/font]
```
- has only one tag, font
- though it may have an argument
Corresponding AST
AST
```scala
trait Node

case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
```
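The "one AST, different printers, various traversals" idea from the Task slides can be sketched against this AST. This is only an illustration, not the talk's code: the `Views` object, the `<span class="font-...">` HTML mapping and the `fontArgs` traversal are assumptions made up for the example.

```scala
// AST as on the slide.
trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node

// Hypothetical printers and one traversal over the same tree.
object Views {
  // HTML printer. The <span class="font-..."> mapping is an assumption.
  def html(nodes: List[Node]): String = nodes.map {
    case Text(t)            => escape(t)
    case Font(Some(a), sub) => s"""<span class="font-${escape(a)}">${html(sub)}</span>"""
    case Font(None, sub)    => s"<span>${html(sub)}</span>"
    case _                  => "" // node types this sketch does not know about
  }.mkString

  // Text-only printer (summary, e-mail, search index): drops all markup.
  def plain(nodes: List[Node]): String = nodes.map {
    case Text(t)      => t
    case Font(_, sub) => plain(sub)
    case _            => ""
  }.mkString

  // Traversal: collect every font argument used anywhere in the tree.
  def fontArgs(nodes: List[Node]): List[String] = nodes.flatMap {
    case Font(arg, sub) => arg.toList ++ fontArgs(sub)
    case _              => Nil
  }

  // Minimal HTML escaping for text and attribute values.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;")
      .replace(">", "&gt;").replace("\"", "&quot;")
}
```

Each view is just another pattern match over the same tree, so adding a view does not touch the parser.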
Parser
BBCode parser
```scala
// fontOpen and fontClose are the tag regexes; they are not shown on this slide.
lazy val nodes = rep(font | text)

// Text: any run of characters up to the next font tag ("(?s)." also matches newlines).
lazy val text =
  rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ {
    texts => Text(texts.mkString)
  }

// Font element: opening tag, nested nodes, closing tag.
lazy val font: Parser[Node] = {
  fontOpen ~ nodes <~ fontClose ^^ {
    case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
  }
}
```
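The slide leaves out `fontOpen`, `fontClose` and the `parse` entry point that the tests call. A minimal way to complete the parser might look like the sketch below. It assumes the separate `scala-parser-combinators` module is on the classpath. The two regexes, the `StrictBBCode` object name and the `Either`-returning `parse` are guesses chosen to fit the `fontOpen(_, arg)` pattern and the `Right(...)`/`'left` expectations in the tests, not the talk's actual definitions.

```scala
import scala.util.parsing.combinator.RegexParsers

trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node

object StrictBBCode extends RegexParsers {
  // Whitespace is part of a post, so it must not be skipped.
  override val skipWhitespace = false

  // ASSUMPTION: tag regexes. Two groups, so `fontOpen(_, arg)` binds the
  // inner one; arg is null when the tag carries no "=argument".
  val fontOpen  = """\[font(=([^\]]+))?\]""".r
  val fontClose = """\[/font\]""".r

  lazy val nodes: Parser[List[Node]] = rep(font | text)

  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) }

  lazy val font: Parser[Node] =
    fontOpen ~ nodes <~ fontClose ^^ {
      case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
    }

  // ASSUMPTION: entry point shaped after the tests (Right = AST, Left = message).
  def parse(in: String): Either[String, List[Node]] =
    parseAll(nodes, in) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

With this strict grammar any unpaired tag makes the whole parse fail, which is the "black or white" behaviour the recovery slides set out to fix.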
Valid markup
ScalaTest
```scala
describe("parser") {
  it("keeps spaces") {
    parse(" ") must equal(Right(Text(" ") :: Nil))
    parse(" \n ") must equal(Right(Text(" \n ") :: Nil))
  }
  it("parses text") {
    parse("plain text") must equal(Right(Text("plain text") :: Nil))
  }
  it("parses bbcode-like text") {
    parse("plain [tag] [fonttext") must equal(Right(Text("plain [tag] [fonttext") :: Nil))
  }
}
```
Invalid markup
ScalaTest
```scala
describe("error markup") {
  it("results in error") {
    parse("t[/font]") must be('left)
    parse("[font]t") must be('left)
  }
}
```
Recovery: Extra AST node
FailNode
```scala
case class FailNode(reason: String, markup: String) extends Node
```
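Because `FailNode` is an ordinary AST node, the uses named on the "Error nodes" slide (highlighting in the editor, flagging posts for moderators) become plain tree traversals. The sketch below is hypothetical: the `ErrorViews` object, its method names and the `<<markup: reason>>` marker format are invented for illustration.

```scala
trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

// Hypothetical consumers of error nodes.
object ErrorViews {
  // Traversal: every error node in the tree, in document order.
  def failures(nodes: List[Node]): List[FailNode] = nodes.flatMap {
    case f: FailNode  => List(f)
    case Font(_, sub) => failures(sub)
    case _            => Nil
  }

  // Flag for moderators: does the post contain broken markup?
  def hasMarkupErrors(nodes: List[Node]): Boolean = failures(nodes).nonEmpty

  // Editor view: print the BBCode back, wrapping each broken fragment
  // in a marker. The <<...>> format is an assumption.
  def highlight(nodes: List[Node]): String = nodes.map {
    case Text(t)                  => t
    case Font(Some(a), sub)       => s"[font=$a]${highlight(sub)}[/font]"
    case Font(None, sub)          => s"[font]${highlight(sub)}[/font]"
    case FailNode(reason, markup) => s"<<$markup: $reason>>"
    case _                        => ""
  }.mkString
}
```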
Recovery: helper methods
Explicitly return FailNode
```scala
protected def failed(reason: String) = FailNode(reason, "")
```
Enrich FailNode with markup
```scala
// Run p; if it produced a FailNode, attach the exact input slice p consumed.
protected def recover(p: => Parser[Node]): Parser[Node] =
  Parser { in =>
    val r = p(in)
    // Lazy: only needed (and only valid) when p succeeded.
    lazy val markup = in.source.subSequence(in.offset, r.next.offset).toString
    r match {
      case Success(node: FailNode, next) =>
        Success(node.copy(markup = markup), next)
      case other =>
        other
    }
  }
```
Recovery: Parser rules
- never break (provide “lone tag” parsers)
- return FailNode explicitly if needed
nodes
```scala
lazy val nodes = rep(node | missingOpen)
lazy val node = font | text | missingClose
```
“Missing open tag” parser
Catching a lone [/font]
```scala
def missingOpen = recover {
  fontClose ^^^ { failed("missing open") }
}
```
Argument check
font may restrict which arguments it accepts
```scala
lazy val font: Parser[Node] = recover {
  fontOpen ~ rep(node) <~ fontClose ^^ {
    case fontOpen(_, arg) ~ subnodes =>
      if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
      else failed("arg incorrect")
  }
}
```
Passes markup error tests
ScalaTest
```scala
describe("recovery") {
  it("reports incorrect arg") {
    parse("[font=b]t[/font]") must equal(Right(
      FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
    ))
  }
  it("recovers extra ending tag") {
    parse("t[/font]") must equal(Right(
      Text("t") :: FailNode("missing open", "[/font]") :: Nil
    ))
  }
}
```
Passes longer tests
ScalaTest
```scala
it("recovers extra starting tag in a longer sequence") {
  parse("[font][font]t[/font]") must equal(Right(
    FailNode("missing close", "[font]") :: Font(None, Text("t") :: Nil) :: Nil
  ))
}
it("recovers extra ending tag in a longer sequence") {
  parse("[font]t[/font][/font]") must equal(Right(
    Font(None, Text("t") :: Nil) :: FailNode("missing open", "[/font]") :: Nil
  ))
}
```
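Assembled into one object, the recovery pieces from the slides could look roughly like this. It is a sketch under assumptions, not the code from the repository. The slides never show `missingClose`, the tag regexes, `allowedFontArgs` or `parse`, so those four are hypothetical fill-ins, as is the `RecoveringBBCode` name. `missingClose` is written here as the mirror image of `missingOpen`. The `scala-parser-combinators` module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

object RecoveringBBCode extends RegexParsers {
  override val skipWhitespace = false

  // ASSUMPTIONS: regexes and the argument whitelist are not on the slides.
  val fontOpen  = """\[font(=([^\]]+))?\]""".r
  val fontClose = """\[/font\]""".r
  val allowedFontArgs = Set("bold", "red")

  protected def failed(reason: String): FailNode = FailNode(reason, "")

  // As on the slide: attach the consumed input to a FailNode result.
  protected def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
    val r = p(in)
    lazy val markup = in.source.subSequence(in.offset, r.next.offset).toString
    r match {
      case Success(node: FailNode, next) => Success(node.copy(markup = markup), next)
      case other                         => other
    }
  }

  lazy val nodes: Parser[List[Node]] = rep(node | missingOpen)
  lazy val node: Parser[Node] = font | text | missingClose

  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) }

  lazy val font: Parser[Node] = recover {
    fontOpen ~ rep(node) <~ fontClose ^^ {
      case fontOpen(_, arg) ~ subnodes =>
        if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
        else failed("arg incorrect")
    }
  }

  def missingOpen: Parser[Node] = recover { fontClose ^^^ failed("missing open") }

  // ASSUMPTION: a lone opening tag, tried only after font and text failed.
  def missingClose: Parser[Node] = recover { fontOpen ^^^ failed("missing close") }

  // ASSUMPTION: entry point shaped after the tests.
  def parse(in: String): Either[String, List[Node]] =
    parseAll(nodes, in) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

At every input position one of the three alternatives applies: a lone opening tag, a lone closing tag, or at least one character of text. So in this sketch `nodes` always consumes the whole input and `parse` never returns `Left`.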
Examples source code
- Source code, specs: https://github.com/alaz/slides-err-recovery
Results
Production use outlines
- It works reliably
- Lower maintenance costs
- Performance (see next slides)
- Beware: Scala parser combinators are not thread-safe.
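
The slide does not say how the thread-safety problem was handled. One common precaution, assuming the problem is mutable state held inside a parser instance, is never to share an instance between threads. The sketch below keeps one parser per thread. `WordParser` is a hypothetical stand-in for the real grammar, and `PerThread` is an invented name. The `scala-parser-combinators` module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical stand-in for the real grammar: splits input into words.
class WordParser extends RegexParsers {
  lazy val words: Parser[List[String]] = rep("""\S+""".r)
  def run(in: String): List[String] = parseAll(words, in).getOrElse(Nil)
}

object PerThread {
  // One parser instance per thread, so no parser state is shared.
  private val local: ThreadLocal[WordParser] = new ThreadLocal[WordParser] {
    override def initialValue(): WordParser = new WordParser
  }

  def parse(in: String): List[String] = local.get.run(in)
}
```

Creating a fresh parser object for every call would work as well, at the cost of re-initialising the grammar's lazy vals each time.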
Performance
- The biggest problem is performance.
Benchmarks (real codebase)
|  |  | PHP | Scala |
|---|---|---|---|
| Typical | 8k | 5.3ms | 51ms |
| Big w/err | 76k | 136ms | 1245ms |
- Workaround: caching
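
The caching workaround is not detailed on the slide. A minimal sketch of the idea is to memoise the expensive parse-and-render step, keyed by the post source. `RenderCache` and its `render` parameter are hypothetical names. A production cache would more likely be keyed by post id and revision and would need an eviction policy, neither of which is shown here.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical cache in front of an expensive parse-and-render function.
class RenderCache(render: String => String) {
  // Unbounded, thread-safe map from post source to rendered output.
  private val cache = TrieMap.empty[String, String]

  // Render on first request, then serve the stored result.
  // Under a race, render may run more than once for the same source.
  def apply(source: String): String =
    cache.getOrElseUpdate(source, render(source))
}
```

Usage: wrap the slow function once, e.g. `val cached = new RenderCache(slowRender)`, and call `cached(source)` wherever the rendered post is needed.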
Surprise!
Never give up
- find a good motivator instead (e.g. presentation for Scala.by)
Performance: success story
- Want performance? Do not use Lexical
- Forget those scary numbers!
Benchmarks (real codebase)
|  |  | PHP | Scala | Scala, without Lexical |
|---|---|---|---|---|
| Typical | 8k | 5.3ms | 51ms | 16ms |
| Big w/err | 76k | 136ms | 1245ms | 31ms |
- Thank you, Scala.by!
Thank you
- Email: [email protected]
- Twitter: http://twitter.com/aazarov
- Source code, specs: https://github.com/alaz/slides-err-recovery