Domain-speciﬁc Languageslaemmel/slecourse/slides/dsls.pdf · This slide has been extracted from...

© 2012-14, Ralf Lämmel, Software Languages Team, University of Koblenz-Landau, and collaborators (see 1st slide)

Domain-specific Languages Course "Software Language Engineering"

University of Koblenz-Landau

Department of Computer Science

Ralf Lämmel

Software Languages Team

1


What’s a DSL?

2


An example of a DSL

3

...102:45:44 Aldrin: Okay. Engine Stop.102:45:45 Aldrin: ACA out of Detent.102:45:46 Armstrong: Out of Detent. Auto.102:45:47 Aldrin: Mode Control, both Auto. Descent Engine Command Override Off. Engine Arm, Off. 413 is in.102:45:57 Duke: We copy you down, Eagle.102:45:58 Armstrong (on-board): Engine arm is off. (Pause) Houston, Tranquility Base here. The Eagle has landed....

Simplified, domain-specific grammar and terminology focused on “Moon landing”.

This slide has been extracted from “Creating DSLs in Java” by Venkat Subramaniam, JavaWorld.com.


Domain-specific notation

4


matrixA.multiply(matrixB);

matrixA * matrixB

✘

✔


Domain-specific?

5

The word domain in DSL refers to "an area or sphere of knowledge, influence, or activity." Focusing on a domain gives you a context -- a logical framework within which you can evolve models for an application.



CSS

6

A:hover { color: #FF0000; text-decoration: none; }



Make

7

GCC=cxx -I.all : clean compilecompile : myprogmyprog: MyProg.cc Util.o $(GCC) -o myprog MyProg.cc Util.oUtil.o : Util.h Util.cc $(GCC) -c Util.ccclean : /bin/rm -f myprog Util.o



GPL ←→ DSL

8

This slide has been extracted from “Domain Specific Languages” by Lukas Renggli (course slides). See license at the end of the deck.


General-purpose language

Turing complete

Well understood and widely used

Applicable to a wide range of problems

9



GPL Pros

Excellent support through IDEs

Easy to find experienced developers

Growth through libraries

10



GPL Cons

Growth of language is (often) not possible

Can be very verbose

Lack of abstractions

11



Domain-specific language

Small language targeted at a particular problem domain

Expressive in its own domain

Often declarative

12



DSL ProsTailored to a particular application domain [Mernik 2005] Expressive and easy to use [Hudak 1998] Little languages, little maintenance, increased productivity [Deursen 1997] Communication with domain experts [Fowler, Domain-specific languages, Addison-Wesley 2010]

13



DSL ConsExtra investment

Language engineering Learning a new language

Weak IDE support Editor, Debugger, Refactoring

Migration might be difficult Evolving into generality

14



Aspects of DSL implementation

15


Implementation styles

16

Internal: APIs, combinator libraries, fluent APIs

External: a different language independent of host language

Embedded: extend host language with syntax & semantics

Workbenches support embedded language development.


External vs. internal DSLs

17

An external or free-standing DSL is designed to be independent of any particular language.

An internal or embedded DSL, on the other hand, is designed and implemented using a host language.



External

18

‣ DSLs that use a different syntax to the main language that uses them (if any).

‣ Examples– make, flex, yacc, bison

– XPath, SQL

– sed, awk



External Pros

19

‣ Language Designer– Tools for compiler construction can be used

‣ Language User– Simpler to use than a GPL



External Cons

20

‣ Language Designer– Expensive to implement

‣ Language User– Weak tool support

– Yet another language to learn

– Often targeted towards a particular GPL

– Often difficult to closely integrate with GPL



CSS (external DSL)

21

A:hover { color: #FF0000; text-decoration: none; }



Makefile (external DSL)

22

GCC=cxx -I.all : clean compilecompile : myprogmyprog: MyProg.cc Util.o $(GCC) -o myprog MyProg.cc Util.oUtil.o : Util.h Util.cc $(GCC) -c Util.ccclean : /bin/rm -f myprog Util.o



Ant (external DSL)

23

<project name="AnExampleProject" default="jarit" basedir="."><property name="src" location="src"/><property name="build" location="build"/><property name="distrib" location="distrib"/><target name="compile" description="compile your Java code from src into build" ><javac srcdir="${src}" destdir="${build}"/></target><target name="jarit" depends="compile" description="jar it up" ><jar jarfile="${distrib}/AnExampleProject.jar" basedir="${build}"/></target></project>



Internal

24

‣ DSLs that share the same syntax to the main language that uses them.

‣ Examples– Parsec (Haskell)

– PetitParser (Smalltalk)

– rake, rspec (Ruby)

– jQuery (JavaScript)

– pypy (Python)



Internal

25

‣ A subset of the host language is used.

‣ Popular in Lisp, Scheme, Ruby, Smalltalk and JavaScript.



Internal Pros

26

‣ Language Designer– No special tools required

– No new grammar required

‣ Language User– Intermixable with GPL

– Tools continue to work

– No new language to learn



Internal Cons

27

‣ Limited expressivity

‣ Unnecessary syntactic noise

‣ Constrained by host language



Rake (internal in Ruby)

28

ORIGINAL = 'input.dat'BACKUP = 'input.dat.bak'task :default => BACKUPfile BACKUP => ORIGINAL do |task|cp task.prerequisites[0], task.nameend



Gant (internal in Groovy)

29

#slightly modified version of example from http://gant.codehaus.org/

includeTargets << gant.targets.Clean

cleanPattern << [ '**/*' , '**/*.bak' ]

cleanDirectory << 'build'

target (stuff : 'A target to do some stuff') {

println 'Stuff'

depends clean

echo message : 'A default message from Ant'

otherStuff()

}

target (otherStuff : 'A target to do some other stuff') {

println 'OtherStuff'

echo message : 'Another message from Ant'

clean()

}

setDefaultTarget stuffGenerates

“ant” (XML).



Parser combinators in Parsec (internal)

30


Internal DSL approaches

31

‣ Fluent APIs‣ Operator Overloading

– C++, C#, Smalltalk, Python, Ruby, ...

‣ Annotations– Java, C#, Smalltalk, Python, ...

‣ Parse Tree Manipulation– C#, Smalltalk

‣ Macros / Program generation– LISP, Scheme, Template Haskell, Converge


This slide has been extracted from “Dynamic Language Embedding” by Lukas Renggli (PhD thesis). See license at the end of the deck.

Taxonomy for internal, external and embedded languages��

��

��

��

��

��

�� ! ! " " �"'�%"� � �"�(��&�!��%��'�)��(&��#��'��#&'� �"��(�� ,��"'��%�'��&��! �&& ,� �"'#�'��#&' � �"��(�� "� � '## &� �(' � '��% � &,"'�+ � �"� � &�!�"'��& � �&&'%��' ,��#"&'%��"��

�� " " ! ! �+'�%"� � �"�(��&��%�� "��$�"��"'�#� � '��#&' � �"��(�� &�!��&�'��!��-�( '�'#��"'��%�'��"'#�'��#&'� �"�(��"��)� #$!�"'�'## &�

�� " " " " �!�� "�(��&��#!��"��'��)�"'��&�#��"'�%�"� ��"��+'�%"� � �"�(��&� �� ,��"��!�� "��(��(&�&�'��&�!��+��('�� %�$%�&�"'�'�#"��&�'��#&'��"��"'��%�'�&�*� �*�'��'## &�

�� +#"#!,��#%��"'�%"� � �+'�%"� ��"��!��"�(��&�

�� #�"��&#�'��,��#*�)�%��%��'��'## &�#��'��#&'��)� #$!�"'��")��%#"!�"'� �"'��%�'�#"�*�'��'��#&'� �"�(��&��-�( '�

�� "��'*��"�'��&��'*#��+'%�!�&�*��-"��!�� "�(��&� (�� *��+'�"��#&'� �"�(��*�'��"�*�&,"'�+��"��&�!�"'��&��"�(��*#%��"��&�&($$#%'�'��)� #$!�"'�#��!�� "�(��&��,�"'%#�(��"��#!!#"�%�$%�&�"'�'�#"��"��,��"'��%�'�"��!( '�$ �� "�(��&�"'#��#!!#"�'## &�'� �#%��+�!$ �� '#%&�'��)�"'��#��'��&'%��' �"�(��-"�'�#"&��"��('#!�'�� ,�$%#)��&,"'�+�� '�"�� ('#��#!$ �'�#"��"��%%#%��#%%��'�#"� �� ,� ��&�"� ��(��%��"��(&��'#&'�$�'�%#(��$��&�#��#��!$ �!�"'��"��%�"'� �"�(��&�

��)��#�(&��#(%�%�&��%��#"��!�� "�(��&��(&��'��,��#!��"��'��&'%�"�'��#��#'�� "'�%"� ��"��+'�%"� � �"�(��&� ��%"�� %"�� $#�"'�#('�'��'��!��$$%#��&� ��'#��''�%�%�(&��#��+�&'�"��#&'� �"�(��'(%�&��"��'## &� �"��&��"�-��"' ,�%��(��)� #$!�"'��"��'%��"�"��#&'&� �"�$%��'��#*�)�%� �"�(��*#%��"��&��#�"#'� �)�%��'��+�&'�"��'## &��('�$%#)��'��%�#*"��")�%#"!�"'� ��'�"�'��,��"'%#�(��"#"�&'�"��%�� "�(��%�$%�&�"'��'�#"��"��'�(&�$#&��#!$�'�� ',�$%#� �!&�*�'��+�&'�"��#��

�


Language workbenches

33

‣ IDEs designed for building DSLs.

‣ Common representation of host and domain specific languages.

‣ Examples– Spoofax, Sugar/J

– JetBrains Meta Programming System (MPS)

– openArchitectureWare



Simple patterns for internal DSL implementation

36


© 2012-14, Ralf Lämmel, Software Languages Team, University of Koblenz-Landau, and collaborators (see 1st slide) 37

Example

Processor p =new Processor(2, Processor.Type.i386);

Disk d1 =new Disk(150, Disk.UNKNOWN_SIZE, null);

Disk d2 =new Disk(75, 7200, Disk.Interface.SATA);

return new Computer(p, d1, d2);



Method chains

38

computer() .processor() .cores(2) .i386() .end() .disk() .size(150) .end() .disk() .size(75) .speed(7200) .sata() .end() .end();

Indentation only for inspiration. The “end” methods return the “parent”. The other methods

return “this”.



Imperative sequences

39

computer(); processor(); cores(2); i386(); disk(); size(150); disk(); size(75); speed(7200); sata(); end();

Indentation only for inspiration. The methods

must switch focus.



Nested method calls

40

computer( processor( cores(2), Processor.Type.i386), disk( size(150)), disk( size(75), speed(7200), Disk.Interface.SATA));

These are static methods which replace “functional

constructors”.



Literal Collections

41

[:computer, [:processor, [:cores, 2], [:type, :i386]], [:disk, [:size, 150]], [:disk, [:size, 75], [:speed, 7200], [:interface, :sata]]]



Closures

42

computer() do | c | c.processor() do | p | p.cores(2) p.i386() end c.disk().size(150) c.disk() do | d | d.size(75) d.speed(7200) d.sata() end end

b.computer() do b.processor() do b.cores(2) b.i386() end b.disk().size(150) b.disk() do b.size(75) b.speed(7200) b.sata() end end



For comparison: translation

43


computer: processor: cores 2 i386 disk: size 150 disk: size 75 speed 7200 sata

Processor p = new Processor(2, Processor.Type.i386);

Disk d1 = new Disk(150, Disk.UNKNOWN_SIZE, null);

Disk d2 = new Disk(75, 7200, Disk.Interface.SATA);

return new Computer(p, d1, d2);

Parse Generate

DSL script Semantic model Generated code


Fluency?

44


Do something with each name in a collection!

45


for(int i = 0; i < names.size(); i++){ String name = (String) name.get(i); //...}

Not quite fluent.



46


Iterator iter = name.iterator();while(iter.hasMore()){ String name = (String) iter.next(); //...}

Still not too fluent.



47


for(String name : names){ // ...}

That’s fluent.


Fluency in Groovy

48


names.each { name -> //...}

That’s very fluent.


Fluency in Java

49


JSONObject json = new JSONObject().accumulate("key1", "value1").accumulate("key2", "value2");

Context


Repeated context in Java

50


java.util.ArrayList<String> cart = new java.util.ArrayList<String>();

cart.add("Milk");cart.add("Juice");cart.add("Apple");

System.out.printf("My cart has %d items.", cart.size());


Conceivable approach with method chaining

51


ChainingArrayList<String> cart = new ChainingArrayList<String>();

cart.add("Milk").add("Juice").add("Apple");

System.out.printf("My cart has %d items.", cart.size());


“With” in Groovy

52


cart = []

cart.with { add "Milk" add "Juice" add "Apple"

println "My cart has $size items."}


“With” in JS

53


cart = new java.util.ArrayList()

with(cart){ add("Milk") add("Juice") add("Apple")

println("My cart has " + size() + " items.")}


��

� ��

��

��

� ��

��

��

��

�� 2+ 1&,+��".2"+ " ! ! ! ! ! ! ! ! !�� 2+ 1&,+��"01&+$ ! ! ! ! ! ! ! ! !�� 2+ 1&,+��%�&+&+$ ! ! ! ! ! ! ! ! !�� &$%"/��/!"/��2+ 1&,+0 "# "# "# "# ! ! ! ! !�� +$2�$"��&1"/�)0 ! ! ! ! ! ! ! ! !�� -"/�1,/��3"/),�!&+$ # ! ! # # ! ! ! !�� "1��++,1�1&,+0 # # ! ! # "# # # !�� /,$/�*��"+"/�1&,+ # "# "# # # ! "# # "#�� /,��/,$/�**&+$ "# "# # # # ! "# # "#

�� %"��--)& ��&)&16�,#�3�/&,20�&+1"/+�)�)�+$2�$"�-�11"/+0�&+� ,**,+�-/,�$/�**&+$�)�+$2�$"0� # +,1�02--,/1"!� "# -�/1)6�-,00&�)"�20&+$�4,/(�/,2+!0� )&��/�/&"0�,/�)�+$2�$"�"51"+0&,+0� ! #2))�02--,/1�

��

�2+ 1&,+�0".2"+ "0��/"�1%"�*,01��0& �#,/*�,#�� 0"/&"0�,#�#2+ 1&,+� �))0�&020"!�1,�-"/#,/*��0".2"+ "�,#�� 1&,+0�,/� ,+9$2/�1&,+�01"-0� �%"�#2+ 1&,+0�1%�1*�("�2-�02 %��+�&+1"/+�)�)�+$2�$"��/"�202�))6�3&0&�)"�,+��0-" &9 �,�'" 1�,/�&+��0-" &9 �0 ,-"�,+)6� 1,��3,&!�1%"�-,))21&,+�,#�1%"�$),��)�+�*"0-� "�

�%&)"�#2+ 1&,+�0".2"+ "0��/"�01/�&$%1#,/4�/!�1,�&*-)"*"+1�&+�*,01�)�+$2�$"0� 1%"60%,4�3�/&,20�)&*&1�1&,+0�&+�-/� 1& "� �+"�02 %�)&*&1�1&,+�&0�1%�1�1%"� ,*-)"1"�&+1"/�#� "�%�0�1,��"�&*-)"*"+1"!��1��0&+$)"�-)� "� �2/1%"/*,/"� &#�1%"�)�+$2�$"�&0�20"!�1,!"0 /&�"�0,*"1%&+$�*,/"�1%�+��:�1�)&01� 1%"+�1%"�&*-)"*"+1�1&,+�+""!0�1,�(""-�1/� (,#�� ,+1"51� �%"��3��"5�*-)"�&+��,4)"/80�� ,,(�20"0��/�&1/�/6�4%&1"0-� "0�1,3&02�)&7"�1%"�+"01&+$� 1%&0�%�0�%,4"3"/�+,�*"�+&+$�#,/�1%"�)�+$2�$"�&10")#�

��

��

��

� ��

��

��

�%"��*�))1�)(�-/,$/�**&+$�)�+$2�$"�-/,3&!"0��2+&.2"�06+1� 1& �)� ,+01/2 1� �))"!�� 1%�1��)),40�,+"�1,�0"+!�*2)1&-)"�*"00�$"0�1,�1%"�0�*"�/" "&3"/� �%&0�*�("01%"�#2+ 1&,+�0".2"+ "�� ,**,+�-�11"/+�&+��*�))1�)(� �%"��$/&11"�*"1�*,!")

��


Embedding approaches

55


Taxonomy for pidgin, creole and argot embedded languages

��

��

�� &��')&�)�$$ %��#�%�,�� *��&%��)%��. +��+��&)$��%��*+),�+,)�&��')&�)�$ +� *�+0' ��##0�*'�� 3��,* %��*�+�&��),#�*��##��+��)�$$�)� �� &��')&�)�$$ %��#�%�,��*�) ��*�+��$��% %��&��')&�)�$ +��%��*'�� 3��,* %��-�) &,*�+��% (,�*� $&*+�&�+�%� +� *�� -�%�+�)&,��&�,$�%+�+ &%� �)��)�%�� $'#�$�%+�+ &%� &)��%&+�+ &%�#�*�$�%+ �*� �&��+��)�*0%+�/��%��*�$�%+ ��3%��')&�)�$$ %��#�%�,�� ++� ��

�� &��')&�)�$$ %��#�%�,�� *�%&+�&%#0�� -�%��0� +*�*0%+�/� �,+��#*&+�)&,��*&�+.�)��# �)�) �*� ��+�%��*+�%��)��# �)�)0��&)$*�+��&�0�&��.&)�*�,*�� %��')&�)�$$ %��#�%�,�� + &%�#�# �)�) �*�')&- ��*�)- ��*��&)�*'�� 0�+�*"*��*�+�&��+ -��# �)�) �*��3%�*�+��-&��,#�)0��-�#&'�)��%�,*��

��-��3%��+�/&%&$0�&�� )�%+�+0'�*�&��$��#�%�,��*��%��.��*�*�**��&.�.�##��/ *+ %��'')&��*�*,''&)+�+�� )��-�#&'$�%+��%�� %+��)�+ &%�. +��/ *+ %��+&&#*� ��-��&'+��+�)$ %&#&�0��)&$�%�+,)�#�#�%�,��*��%# *+�� %��#� �� %�+��&$� %�&��%�+,)�#�#�%�,�� 1' �� %2� *�1��)�$$�+ ��##0�* $�'# 3��&)$�&��#�%�,��,*��&)��&$$,% ��+ &%��+.��%�'�&'#��%&+�*��) %��&$$&%�#�%�,��2 ��1�)�&#�2� *�1��$&+��)�+&%�,��&)$��)&$�+��&%+��+�&��+.&#�%�,��*�+�)&,��%��)# �)�' �� %�*+��2 �%�1�)�&+2� *��1!�)�&%�&)�*#�%��&��'�)+ �,#�)��)&,'�&)��#�**2 ��.�##��%��+��

��

��

��

��

� � � ! " " � ' �� %� *��* $'# 3��&)$�&��+��&*+�#�%�,�� +� %+)&�,��*��%�.�-&��,#�)0��%��%�.�*�$�%+ �*�+&�+��&��

�� " " " � �)�&#��%��*�+��*0%+�/�&��+��&*+�#�%�,��%��+��)��&)��#*&�+��-&��,#�)0��%��3%�*�%�.�*�$�%+ �*�

�� ! ! " �%��)�&+��%��*�+��*�$�%+ �*�&��+��/ *+ %��#�%�,��. +��&,+��+ %�� +*�*0%+�/�

�� /&%&$0��&)�' �� %� �)�&#��%��)�&+��$��#�%�,��*�

�� ' �� %��%�*�+��*0%+�/�&��+��&*+�#�%�,��+&��/+�%�� +*�*�$�%+ �*��' %�## *� �� *�" %��&��$��#�%�,��)�,*�*��# $ +��'�)+�&��+��&*+�*0%+�/��%��&$� %�*� +�. +��%�.�-&��,#�)0� �%�+�� )�* $'#�*+��&)$' �� %*��%�� $'#�$�%+��0� %+�)')�+ %��&*+�#�%�,��+,)�*� *,��*# +�)�#��))�0*�&)�*+) %�*� � .�##�"%&.%��/�$'#�� *�+��&)$�+�*+) %��&��+�� ,%�+ &%� %�+��*+�%��)�� # �)�)0�


The remaining combinations

��

�� !0#-*#�',20-"3!#1��!-+.*#2#*7�,#5�17,2�6� 7�"#:,',%�'21�-5,�%0�++�0�,"��!312-+�20�,1$-0+�2'-,�2-�2&#�&-12�*�,%3�%#�2&�2�"#:,#1�2&#�1#+�,2'!1��-0�#6�+.*#� �� #'(#0 �� !-+ ',#1��5'2&��!-,4#,'#,2�17,2�62-��!!#11�0#*�2'-,�*�"�2� �1#1��,"�.0-!#11��*1-�.�01#0�%#,#0�2-01�13!&�1�� 00� �� !�,� #�!-,1'"#0#"�!0#-*#1� �*2&-3%&�2&#7��0#�+-12*7'+.*#+#,2#"�#62#0,�*�2-�2&#�&-12�*�,%3�%#�

�� ,��0%-2�31#1�2&#�#6'12',%�&-12�*�,%3�%#�17,2�6� 32�!&�,%#1�'21�1#+�,2'!1��,��0%-2�0#',2#0.0#21�2&#�1#+�,2'!1�-$�4�*'"�&-12�*�,%3�%#�!-"#� 5&#0#�1�.'"�%',�!-"#�'1�-,*7�17,2�!2'!�**7�!-00#!2�&-12�!-"#�9�'2�&�1�+#�,',%�-,*7�$-02&#�.'"%',� �0%-21��0#�!-++-,*7�31#"�2-�'+.*#+#,2�,#5�!0-11!322',%�*�,�%3�%#�$#�230#1� 13!&��1�20�,1�!2'-,�*�+#+-07�-0�!-,2',3�2'-,�.�11',%�127*#�5'2&-32�2&#�*�,%3�%#�31#0�,##"',%�2-� #��5�0#�-$�2&#�!&�,%#� ��!0-�1712#+1�,"��1.#!2�-0'#,2#"�$0�+#5-0)1��0#�5#**�),-5,�+#!&�,'1+1�$-0�!&�,%',%2&#� #&�4'-0�-$�2&#�&-12�*�,%3�%#�5'2&-32�2-3!&',%�'21�17,2�6�

�,�#+ #""#"�*�,%3�%#�+312�#'2&#0�',20-"3!#�,#5�17,2�6�2-�2&#�&-12�*�,%3�%#�$-02&#�!-,!#.21�'2�',20-"3!#1��!0#-*#�� -0�'2�+312��"-.2�2&#�&-12�17,2�6��1�'1� �$�2&#�&-1217,2�6�'1�0#31#"� '2�+312�#'2&#0� #�-4#0*-�"#"� 0#',2#0.0#2',%�2&#�17,2�6�',��,-4#*5�7��.'"%',�� -0�'2�+312��*2#0�2&#�1#+�,2'!1�-$�2&#�&-12��0%-2��

�5-�-2&#0�.-11' *#�!-+ ',�2'-,1�-$�2&#��220' 32#1�-$�-30�2�6-,-+7��0#�*'12#"�', �� *# �� &'*#�',2#0,�*�*�,%3�%#1��0#��1.#!'�*�!�1#�-$�#+ #""#"�*�,%3�%#1�2&#727.'!�**7�"-�,-2�0#/3'0#�1.#!'�*�2--*�13..-02��,"�2&31��0#�,-2�',�2&#�$-!31�-$�2&'1"'11#02�2'-,� �302&#0+-0#� 5#�2�)#�2&#�&-12�*�,%3�%#��1�%'4#,�

��

��

��

��

�� ! " ! �,2#0,�*�*�,%3�%#1��0#�!-,:,#"� 7�2&#�&-12�17,2�6�2&#7�+�)#��!0#�2'4#�31#�-$�2&#�&-12�*�,%3�%#��,""#:,#��,#5�4-!� 3*�07�

�� ! ! ! �&#�&-12�*�,%3�%#�'1��%#,#0�*�.30.-1#�.0-%0�+�+',%�*�,%3�%#�27.'!�**7�$-**-5',%��*�,%3�%#�12�,�"�0"� �-12�*�,%3�%#1�&�4#�:6#"�17,2�6��,"�1#�+�,2'!1�

�� &#�0#+�',',%�.-11' *#�!-+ ',�2'-,1�-$ �� *# ��

�#��0%3#�2&�2�2&#�!�2#%-0'8�2'-,�-$�#+ #""#"�*�,%3�%#1�',2-�.'"%',� !0#-*#��,"�0%-2�*�,%3�%#1�'1�!-+.*#2#� �1'"#�$0-+�',2#0,�*��,"�&-12�*�,%3�%#1� 2&#�0#+�',�',%�-2&#0�!-+ ',�2'-,1�!&�,%#�2&#�17,2�6� 32�*#2�2&#�1#+�,2'!1�3,!&�,%#"� �&'1

�

(New syntax w/o semantics does not make sense.)


SLE meets NLP

In the domain of natural language, a “pidgin” is “a grammatically simplified form of a language used for communication between people not sharing a common language”; a “creole” is “a mother tongue formed from the contact of two languages through an earlier pidgin stage”; an “argot” is a “jargon or slang of a particular group or class” [Jewell and Abate, 2005].


Heterogenous language integration

59

‣ At least one of the languages (metalanguage or DSL) is largely, or completely ignorant of the existence of the other languages.

‣ Examples– Preprocessors

– Stratego/XT, TXL



Homogenous language integration

60

‣ All languages are specifically designed to work with each other.

‣ Examples– LISP/Scheme Macros, Template Haskell

– Nemerle, Metalua (Lua), xTc (C)

– Converge, Diesel



Multiple context-dependent languages

Switching between different languages should be possible at arbitrary points and not enforce the use of special syntactic markers. It should be practicable to mix and match different language extensions and the host language. Language changes should be scopable at a fine-grained level, to make it possible to use several otherwise conflicting language extensions in the same compilation unit.

Source: “Embedding Languages Without Breaking Tools” by Lukas Renggli, Tudor Girba, Oscar Nierstrasz, ECOOP 2010.


Homogeneous tool integration

A uniform tool set is important to software engineers. To ease the development and use of embedded languages all development activities should happen in the same familiar programming environment of the host language. No special code browser, editors, debuggers or source control should be necessary; all existing tools should continue to work transparently with different languages. To facilitate debugging a precise bi- directional connection between the original source, the various transformation stages and the final executable code should be maintained.



Wanted!

63


Multiple context-dependent languages

Homogenous tool integration (editors, debuggers, etc.)


Table 2. Comparison of di�erent systems for embedded language authoring. Weindicate custom host languages with parentheses.

Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage

Helvetia Smalltalk X X X X X X

ExtensibleCompilers

Java Annotation Processing Java X · X X · XableJ Java X X X X · ·Dryad Java X X X X · ·JastAddJ Java X X X X · ·Polyglot Java X X X X · XXoc C X X X X · ·

MetaProgrammingSystems

Cola (various) · X · X · XConverge (Python) · X · X · XMetaOCaml OCaml X · X X · XScheme Scheme X · X X X X

LanguageWorkbenches

JetBrains MPS (Java) · X · X X XIntentional Software (C#) ? X ? ? X ?openArchitectureWare Java · X · · · ·Java Development Tools Java · · · · X ·IDE Metatooling Platform Java · · · · X ·WholePlatform Java · X · · · ·Katahdin (C#) · X · X · XCeteva XMF (Java) · X · X · X

LanguageTransformationSystems

Khepera C X X X · · ·MontiCore Java · X · · · ·MetaBorg X X X X · ·

2.2 Meta-Programming Systems

Meta-Programming Systems are programming languages that come with meta-programming facilities targeted at code generation.

Cola [14] implements an open object model for experimentation with di�erentprogramming paradigms. Cola is bootstrapped in itself using a Smalltalk-likelanguage called Pepsi. Jolt is a Lisp-like language that serves as a commonabstract syntax and executable representation for other languages. OMeta [15] isan object-oriented pattern matcher based on Parsing Expression Grammars thatis used to transform new languages to Jolt. Even if all languages are built on


Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage


ExtensibleCompilers




LanguageWorkbenches










Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage


ExtensibleCompilers




LanguageWorkbenches








Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage


ExtensibleCompilers




LanguageWorkbenches








Extensible Compilers are best described as open toolboxes that provide entry points into the tool chain to extend and change the host language. Meta-Programming Systems are programming languages that come with metaprogramming facilities targeted at code generation.



Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage


ExtensibleCompilers




LanguageWorkbenches








Type

System

Host

Language

Pidgin

Cre

ole

Arg

ot

Multiple

Languages

HomogeneousTools

HomogeneousLanguage


ExtensibleCompilers




LanguageWorkbenches








Language Workbenches are characterized by a specialized IDE with a well-defined workflow to specify and use different languages. Language designers are required to follow clearly defined steps to describe syntax, semantics and editor behavior of a new language. Language transformation systems define languages through the transformation and composition of language models.


Homogenous embedding with HELVETIA

67

HELVETIA is an extensible system that intercepts the compilation pipeline of the Smalltalk host language to seamlessly integrate language extensions.




2. The specification of the cells and their content is repetitive and rather hardto read.

3. The instantiation of di�erent shapes is cumbersome as in this case the hostlanguage syntax is rather verbose.

The following code addresses these issues:

row = grow.row = fill.column = grow.column = fill.(1 , 1) = label

text: [ :each | each name ];borderColor: #black;borderWidth: 1.

(1 , 2) - (2 , 1) = rectangleborderColor: #black;borderWidth: 1;width: 200;height: 100.

Listing 2. Pidgin: Eliminating syntactic noise.

While the above code is syntactically valid and is parsed by the standardSmalltalk parser, it is not semantically valid. For example numbers do notimplement a , method, and column is an unknown variable.

In our implementation (described in Section 4.1), the above pidgin exampleis transformed transparently into the code from Listing 1. This kind of transfor-mation simplifies the amount and complexity of source code significantly. Thetransformation is specified at the AST level using two transformation rules thatare applied by the compiler after parsing.

3.2 A Creole: Mondrian

The pidgin shows an improvement over the original Smalltalk code, but our goalis to obtain an even more concise CSS-like language as in the listing below:

shape {cols: #grow, #fill;rows: #grow, #fill;

}label {

position: 1 , 1;text: [ :each | each name ];borderColor: #black;borderWidth: 1;

}rectangle {

position: 1 , 2;colspan: 2;













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


5.1 Homogeneous Language Integration

Rules

<parse> <transform> <attribute>

SourceCode

SmalltalkParser

SemanticAnalysis

BytecodeGenerator

ExecutableCode

Traditional Smalltalk Compiler

Pidgin

CreoleArgot

Fig. 3. The code compilation pipeline showing multiple interception paths.

In Smalltalk the unit of compilation is the method. Whenever a method issaved, it automatically triggers the Smalltalk compiler within the same envi-ronment. The source code and compiled methods as well as the compiler areall available at runtime. As seen in Figure 3 we have enhanced the standardcompilation pipeline to be able to intercept the data passed from one compilationstep to the other. We are able to perform source-to-source transformations orto bypass the regular Smalltalk parser altogether. Furthermore we are able toperform AST transformations either before, instead of, or after semantic analysis.

The rules to intercept this transformation pipeline are defined using annotatedmethods. These methods constitute conventional Smalltalk code that is called atcompile time [38]. The interception rules allow us not only to modify data in thepipeline but also to bypass conventional components.

– A rule marked with <parser> allows one to intercept the parsing of the sourcecode. The result of a parser rule can be either a new source string (in case ofa source-to-source transformation) or a Smalltalk AST (in which the originalSmalltalk parser is skipped).

– A rule marked with <transform> is performed on the AST after parsing andbefore semantic analysis. It allows developers to apply arbitrary transforma-tions on the AST. Furthermore, it is possible to change the default semanticanalysis and instead perform a custom one.

– A rule marked with <attribute> is performed after symbol resolution andbefore bytecode generation. This makes it possible to perform transformationson the attributed AST as well.

Compilation errors are handled by the standard toolchain. Since all datapassed from one step to the next carries information on its original source location,

The code compilation pipeline showing multiple interception paths.


Helvetia Pros

‣ Tools continue to work

‣ Homogenous Embedding

‣ Relatively simple implementation

‣ Incremental development of DSLs


Helvetia Cons

‣ Pidgin DSL is constrained by syntax

‣ Creole DSL requires you to write a parser and specify a transformation

‣ Language definitions are in “scripts”, there is no real model behind it

Pidgin

In the domain of natural language, a pidgin is a grammatically simplified form of a language used for communication between people not sharing a common language.


Floating Point Numbers

digit = "0" | "1" | ... | "9" ; number = [ "-" ] digit { digit } [ "." digit { digit } ] ;


JParsec Parser (plain internal DSL)

final Pattern digit = Patterns.range('0', '9');

final Pattern number = Patterns.seq(Patterns.isChar('-').optional(), digit.many(1), Patterns.seq(Patterns.isChar('.'), digit.many(1)).optional());


PetitParser (Advanced internal DSL)

digit ^ $0 asParser / $1 asParser / ... / $9 asParser

number ^ $- asParser optional , self digit plus , ($. asParser , self digit plus) optional


Pidgin PetitParser (Proper embedding)

digit $0 / $1 / ... / $9

number $- optional , digit plus , ($. , digit plus) optional


Pidgin transformations

DSLTreePattern new expression: '`#literal' do: [ :ast | ``(`,(ast) asParser) ]

DSLTreePattern new expression: '`variable' do: [ :ast | ``(self `,(ast name)) ]

DSLTreePattern new expression: '`.statement' do: [ :ast | ``(^ `,(ast statement)) ]

Creole

In the domain of natural language, a creole is a mother tongue formed from the contract of two languages through an earlier pidgin stage.

Creole PetitParser

digit = "0" | "1" | ... | "9" ; number = [ "-" ] digit { digit } [ "." digit { digit } ] ;

EBNF Parser

production = identifier "=" choice ";" ; choice = sequence { "/" sequence } ;sequence = element { element } ;element = option / repetition / literal / identifier ;option = "[" choice "]" ;repetition = "{" choice "}" ;literal = '"' string '"' / "'" string "'" ;


EBNFParser

EBNFCompiler EBNFHighlighter

production: Object

choice: Object

option: Object

repetition: Object

...

option: ASTNode

repetition: ASTNode

...

literal: Color

identifier: Color

option = "[" choice "]" ;

super repetition ==> [ :ast | ``(`,(ast second) optional) ]


Homogenous tool integration













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


Traditional Smalltalk debugger with language specific syntax highlighting stepping through an EBNF for parsing.



Quasiquoting (needed for metaprogramming)

82


http://101companies.org/wiki/Quasi-quotation

http://101companies.org/wiki/Quasi-quotation



Quasiquote Meta Level

Sourcecode Base Level

Quasiquote `` Unquote `,


Lisp Scheme

Template Haskell

QuasiquoteSmalltalk

Quote `

Quasiquote `` [| ... |] ``

Unquote , ${ ... } `,

Splice ,@ $< ... > `@


Quasiquote example (Compute code for x^3)

raise: aNode to: anInteger anInteger = 0 ifTrue: [ ^ ``1 ]. anInteger = 1 ifTrue: [ ^ aNode ]. ^ ``(`,(self raise: aNode to: anInteger - 1) * `,aNode)

power3: aNumber ^ `@(self raise: ``aNumber to: 3)

aNode would be an AST node for an expression.


Quasiquote decompiled

raise: aNode to: anInteger anInteger = 0 ifTrue: [ ^ RBLiteralNode value: 1 ]. anInteger = 1 ifTrue: [ ^ aNode ]. ^ RBMessageNode receiver: (self raise: aNode to: anInteger - 1) selector: #* arguments: (Array with: aNode)

power3: aNumber ^ aNumber * aNumber * aNumber


More HELVETIA examples

87


HELVETIA is an extensible system that intercepts the compilation pipeline of the Smalltalk host language to seamlessly integrate language extensions.



A UML package shape in Mondrian.

Package Name

(1, 2) (2, 2)

(2, 1)y = 1

y = 2

x = 1 x = 2

Fig. 1. A UML package shape in Mondrian.

aBuilder row grow. " defines row sizing "aBuilder row fill.

aBuilder column grow. " define column sizing "aBuilder column fill.

aBuilder x: 1 y: 1 add: (LabelShape new " define the cells "text: [ :each | each name ];borderColor: #black;borderWidth: 1;yourself).

aBuilder x: 1 y: 2 w: 2 h: 1 add: (RectangleShape newborderColor: #black;borderWidth: 1;width: 200;height: 100;yourself)

Listing 1. Traditional Mondrian Forms Builder API.

Mondrian provides an internal DSL that o�ers a high-level interface forcomposing visualizations. While it makes the composition easy, there is still aconsiderable amount of syntactic noise that makes the script hard to read.

We incrementally bend the syntax of the host language towards a moresuitable DSL, first by creating a pidgin, and then by creating a creole. Our finalgoal is to be able to define the visualizations using a simple syntax resemblingcascading style-sheets (CSS), which o�ers a compact notation to programmersand designers to declaratively specify layout and design of web sites.

If we have a look at the code in Listing 1 we see that the noise is causedby certain semantic elements that are required to make this example run asSmalltalk code conforming to the original Mondrian API. We discover threethings that are repetitive and that could be simplified:

1. The variable aBuilder is referenced in every rule as an entry point to constructand configure the di�erent parts of the forms.

Package Name

(1, 2) (2, 2)

(2, 1)y = 1

y = 2

x = 1 x = 2











Traditional Mondrian Forms Builder API(Example of internal DSL)



Package Name

(1, 2) (2, 2)

(2, 1)y = 1

y = 2

x = 1 x = 2











Issues with the code

1. The variable aBuilder is referenced in every rule as an entry point to construct and configure the different parts of the forms.2. The specification of the cells and their content is repetitive and rather hard to read.3. The instantiation of different shapes is cumbersome as in this case the host language syntax is rather verbose.














}label {


}rectangle {


90



Package Name

(1, 2) (2, 2)

(2, 1)y = 1

y = 2

x = 1 x = 2











Pidgin: Eliminating syntactic noise

While the above code is syntactically valid and is parsed by the standard Smalltalk parser, it is not semantically valid. For example numbers do not implement a , method, and column is an unknown variable.



The transformation for the pidgin













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


4 Specifying Embedded Languages with

In this section we present the specifications of the three examples given in theprevious section.

4.1 Specifying the Mondrian Pidgin

The syntax of the Mondrian pidgin can be parsed by the traditional parser of theSmalltalk host language. However, we need to apply several transformations toget the semantics right. We define a set of transformation rules that are appliedby Helvetia after parsing the code from Listing 2:

1

2 <transform>3 ^ TreeRule new4 expression: 'row = �@expr';5 expression: 'column = �@expr';6 action: [ :ast |7 ast swapWith: ��(aBuilder8 �,(ast receiver)9 �,(ast at: '�@expr')) ]

10

11

12 <transform>13 ^ TreeRule new14 expression: '(�@x , �@y) = �@expr';15 expression: '(�@x , �@y) - (�@w , �@h) = �@expr';16 action: [ :ast |17 ast swapWith: ��(aBuilder18 x: �,(ast at: '�@x')19 y: �,(ast at: '�@y')20 w: �,(ast at: '�@w' ifAbsent: [ 1 ])21 h: �,(ast at: '�@h' ifAbsent: [ 1 ])22 add: ��(�,(Shapes at: (context at: '�var') name)23 new �,(ast at: '�@expr'))) ]

The transformation rules are split into two methods. Each of these methodsis tagged with the method annotation <transform> (lines 2 and 12), so that thecompiler knows that it has to apply these transformations before performingsemantic analysis. Each rule consists of two match expressions (lines 4–5 and14–15) to find particular parse-tree nodes. This functionality is part of theRefactoring Engine [34] and is provided by the host environment. In our contextthese patterns match the specific constructs we introduced in Listing 2.

The action blocks (lines 6–9 and 16–23) perform a transformation on thematched AST node. For example the first action block transforms expressionsof the form row = grow into aBuilder row grow. It does so by using a syntacticextension of Smalltalk with partial evaluation [35]. Everything that follows thequasiquote meta-character �� is delayed in execution and represents the AST of

The transformation rules are split into two methods. Each of these methods is tagged with the method annotation <transform> (lines 2 and 12), so that the compiler knows that it has to apply these transformations before performing semantic analysis. Each rule consists of two match expressions (lines 4–5 and 14–15) to find particular parse-tree nodes. This functionality is part of the Refactoring Engine [...] and is provided by the host environment. [...] these patterns match the specific constructs we introduced [...].



The transformation for the pidgin













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


4 Specifying Embedded Languages with

In this section we present the specifications of the three examples given in theprevious section.

4.1 Specifying the Mondrian Pidgin

The syntax of the Mondrian pidgin can be parsed by the traditional parser of theSmalltalk host language. However, we need to apply several transformations toget the semantics right. We define a set of transformation rules that are appliedby Helvetia after parsing the code from Listing 2:

1

2 <transform>3 ^ TreeRule new4 expression: 'row = �@expr';5 expression: 'column = �@expr';6 action: [ :ast |7 ast swapWith: ��(aBuilder8 �,(ast receiver)9 �,(ast at: '�@expr')) ]

10

11

12 <transform>13 ^ TreeRule new14 expression: '(�@x , �@y) = �@expr';15 expression: '(�@x , �@y) - (�@w , �@h) = �@expr';16 action: [ :ast |17 ast swapWith: ��(aBuilder18 x: �,(ast at: '�@x')19 y: �,(ast at: '�@y')20 w: �,(ast at: '�@w' ifAbsent: [ 1 ])21 h: �,(ast at: '�@h' ifAbsent: [ 1 ])22 add: ��(�,(Shapes at: (context at: '�var') name)23 new �,(ast at: '�@expr'))) ]

The transformation rules are split into two methods. Each of these methodsis tagged with the method annotation <transform> (lines 2 and 12), so that thecompiler knows that it has to apply these transformations before performingsemantic analysis. Each rule consists of two match expressions (lines 4–5 and14–15) to find particular parse-tree nodes. This functionality is part of theRefactoring Engine [34] and is provided by the host environment. In our contextthese patterns match the specific constructs we introduced in Listing 2.

The action blocks (lines 6–9 and 16–23) perform a transformation on thematched AST node. For example the first action block transforms expressionsof the form row = grow into aBuilder row grow. It does so by using a syntacticextension of Smalltalk with partial evaluation [35]. Everything that follows thequasiquote meta-character �� is delayed in execution and represents the AST of

The action blocks (lines 6–9 and 16–23) perform a transformation on the matched AST node. For example the first action block transforms expressions of the form row = grow into aBuilder row grow. [...]. Everything that follows the quasiquote meta-character `` is delayed in execution and represents the AST of the enclosed expression at runtime. Similarly everything that follows the unquote meta-character `, is again executed when performing the code and is used to combine smaller delayed values (e.g., from matched AST nodes) to larger ones. A third operator to compile, evaluate and splice in the result at compile-time is available too, but not used in the examples of this paper.














}label {


}rectangle {

position: 1 , 2;colspan: 2;borderColor: #black;borderWidth: 1;width: 200;height: 100;

}

Listing 3. Creole: A CSS-like syntax.

The code above does not follow Smalltalk syntax. At this point, the assumptionof a pidgin relying on the host syntax starts to get in our way.

The solution is to allow the definition of a new parser that handles the creolesyntax. We typically also want to integrate the new language constructs withthe host language or with other language constructs. In our example, the codetext: [ :each | each name ] provides such a case in which we parameterize theshape specification with a Smalltalk expression.

As shown in Section 4.2, Helvetia o�ers a mechanism for writing a customparser that can also include productions external to the language at hand. Likethis we can accommodate any syntax.

3.3 An Argot: Transactional Memory

In our previous work [31] we have presented a solution for introducing softwaretransactional memory (STM) [32,33] at the language level of dynamic program-ming languages without requiring changes to the virtual machine. We achievedthis by patching the compiler. As a result we were able to make all applications,libraries and system code transaction-aware. However since our changes at thecompiler level were rather ad hoc, we lost the ability to use the debugger withina transaction.

Introducing STM into an existing language provides a concrete use case forchanging the execution semantics without changing the syntax of the language.A piece of library code that is used as part of transactional code should continueto work without requiring any adaptation. Unlike a pidgin, which bends the hostsyntax in ways that break the semantics, an argot more subtly reinterprets thesemantics of otherwise valid code.

In the example below the global counter value is incremented by 1. The codedoes not ensure mutual exclusion, thus it might happen that some of the updatesare lost when multiple threads run this method concurrently.

value := value + anInteger

When running the above method from within a transaction, the change isdeferred to the end of the transaction, instead of incrementing the variableimmediately. This allows the system to check for conflicts and revert the changesif necessary. Thus, even if the source code looks exactly the same, its behav-ior changes. We describe the transformations applied to transactional code inSection 4.3.

93



Package Name

(1, 2) (2, 2)

(2, 1)y = 1

y = 2

x = 1 x = 2











Creole: A CSS-like syntax













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


The code above does not follow Smalltalk syntax. At this point, the assumption of a pidgin relying on the host syntax starts to get in our way.

The solution is to allow the definition of a new parser that handles the creole syntax. We typically also want to integrate the new language constructs with the host language or with other language constructs. In our example, the code text: [ :each | each name ] provides such a case in which we parameterize the shape specification with a Smalltalk expression.



Parser of the Mondrian Creole













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


the enclosed expression at runtime. Similarly everything that follows the unquotemeta-character �, is again executed when performing the code and is used tocombine smaller delayed values (e.g., from matched AST nodes) to larger ones.A third operator to compile, evaluate and splice in the result at compile-time isavailable too, but not used in the examples of this paper.

The two mentioned methods are all that is needed to implement the Mondrianpidgin. The swapWith: method call replaces the matched AST node with the newcode. Since all AST nodes carry information about their original source origin,a debugger is able to step through and properly highlight the recomposed codefragments. Newly generated code is marked as hidden, so that the user of thepidgin does not see it in the debugger.

4.2 Specifying the Mondrian Creole

In contrast to a pidgin, a creole requires a custom parser and Helvetia o�ersthe possibility to define one. For example, for the creole we presented in Listing 3we define the following grammar rules defined as individual methods of the classCSSParser:

= { rule }= selector "{" declarations "}"

= #identifier= declaration { ";" declaration }

= #keywordMessage

This grammar looks very similar to Extended Backus-Naur Form (EBNF)[36]. In fact, it is a DSL for parser generators implemented in Helvetia. Asan extension to EBNF we allow productions to reference grammar rules ofother languages. The name of external grammar rules are prefixed with a hashcharacter #. For example, the CSS selector is simply a Smalltalk identifier, andthe declaration of a property is a keyword message (a Smalltalk method namewith arguments, but without receiver) of the host language.

Next we create a new subclass of CSSParser called CSSTranslator, to reuse theabstract grammar definition and to augment it with productions to transform theparse tree nodes to the host language AST [37]. Again we use quasiquoting to buildthe AST of the host language. Two of CSSTranslator’s parse tree transformationslook like in the following listing. The other grammar productions are similarlydefined.

1

2 ^ super rules ==> [ :ast | ��(buildOn: aBuilder �,ast) ]3

4

5 ^ super rule ==> [ :ast | self transform: (ast at: 'selector')declarations: (ast at: 'declarations') ]

This assigns semantic actions to the productions defined in the superclass.The argument ast is a collection of parse nodes built by the grammar productions

This grammar looks very similar to Extended Backus-Naur Form (EBNF) [...]. In fact, it is a DSL for parser generators implemented in Helvetia. As an extension to EBNF we allow productions to reference grammar rules of other languages. The name of external grammar rules are prefixed with a hash character #. For example, the CSS selector is simply a Smalltalk identifier, and the declaration of a property is a keyword message (a Smalltalk method name with arguments, but without receiver) of the host language.



Translator of the Mondrian Creole













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


the enclosed expression at runtime. Similarly everything that follows the unquotemeta-character �, is again executed when performing the code and is used tocombine smaller delayed values (e.g., from matched AST nodes) to larger ones.A third operator to compile, evaluate and splice in the result at compile-time isavailable too, but not used in the examples of this paper.

The two mentioned methods are all that is needed to implement the Mondrianpidgin. The swapWith: method call replaces the matched AST node with the newcode. Since all AST nodes carry information about their original source origin,a debugger is able to step through and properly highlight the recomposed codefragments. Newly generated code is marked as hidden, so that the user of thepidgin does not see it in the debugger.

4.2 Specifying the Mondrian Creole

In contrast to a pidgin, a creole requires a custom parser and Helvetia o�ersthe possibility to define one. For example, for the creole we presented in Listing 3we define the following grammar rules defined as individual methods of the classCSSParser:

= { rule }= selector "{" declarations "}"

= #identifier= declaration { ";" declaration }

= #keywordMessage

This grammar looks very similar to Extended Backus-Naur Form (EBNF)[36]. In fact, it is a DSL for parser generators implemented in Helvetia. Asan extension to EBNF we allow productions to reference grammar rules ofother languages. The name of external grammar rules are prefixed with a hashcharacter #. For example, the CSS selector is simply a Smalltalk identifier, andthe declaration of a property is a keyword message (a Smalltalk method namewith arguments, but without receiver) of the host language.

Next we create a new subclass of CSSParser called CSSTranslator, to reuse theabstract grammar definition and to augment it with productions to transform theparse tree nodes to the host language AST [37]. Again we use quasiquoting to buildthe AST of the host language. Two of CSSTranslator’s parse tree transformationslook like in the following listing. The other grammar productions are similarlydefined.

1

2 ^ super rules ==> [ :ast | ��(buildOn: aBuilder �,ast) ]3

4

5 ^ super rule ==> [ :ast | self transform: (ast at: 'selector')declarations: (ast at: 'declarations') ]

This assigns semantic actions to the productions defined in the superclass.The argument ast is a collection of parse nodes built by the grammar productions

We create a new subclass of CSSParser called CSSTranslator, to reuse the abstract grammar definition and to augment it with productions to transform the parse tree nodes to the host language AST [...]. Again we use quasiquoting to build the AST of the host language. Two of CSSTranslator’s parse tree transformations look like in the following listing. The other grammar productions are similarly defined.



Parser registration for the Mondrian Creole













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


CSSParser

rules()rule()selector()declarations()declaration()

CSSCompiler

rules()rule()...

CSSHighlighter

rules()rule()...

Fig. 2. The CSS Parser Hierarchy.

of the superclass. Lines 3–4 use quasiquoting to define the method header and toembed the AST nodes of the rules into its body. Lines 7–9 call the helper methodbuild:declarations: with the selector token and a collection of declarationmessages to build a Smalltalk AST.

To tell the system to use our custom parser instead of the default one, we usea method annotation <parse> on the classes where we want to use the customsyntax. The code in Listing 3 is parsed, transformed and eventually compiled tobytecode identical to the methods we manually wrote in Listing 1 and Listing 2.

<parse>^ CSSTranslator

One small problem at this point is that the syntax highlighter in the code editoris broken. As before, we create a new subclass of CSSParser named CSSHighlighter

that underlines the selectors and dispatches to standard Smalltalk highlightingfor the definitions. Again this is achieved by overriding the appropriate methodsof CSSParser. For example, the selector method is defined in CSSHighlighter likethis:

^ super selector ==> [ :ast | ast -> TextEmphasis underlined ]

Using a <highlight> annotation we declare the handler to be responsible forsyntax highlighting of the CSS code:

<highlight>^ CSSHighlighter

Adding a pidgin or creole over an existing framework can simplify its use andreduce a considerable amount of syntactic noise. Without touching the original



Highlighting for the Mondrian Creole













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


CSSParser


CSSCompiler

rules()rule()...

CSSHighlighter

rules()rule()...











CSSParser


CSSCompiler

rules()rule()...

CSSHighlighter

rules()rule()...













Homogenous tool integration













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


Traditional Smalltalk debugger with language specific syntax highlighting stepping through a mixture of Smalltalk and the Mondrian creole.

the error location is determined automatically and it is revealed to the the userthrough the traditional means of the compiler. For example, when a variableis undeclared, its occurrence is highlighted and the user is asked to correct theproblem.

5.2 Homogeneous Tool Integration

To control syntax highlighting, we use the rule database to change the defaulthighlighting. The traditional Smalltalk syntax highlighter is only applied tonormal Smalltalk methods. As soon as there is a custom parser involved, thea�ected part of the source code remains black unless a custom highlighteris provided. The annotation <highlight> is used to define a highlighting rule.Figure 4 shows the result of the custom highlighter in the source pane of thedebugger. Helvetia provides similar extension points for code-completion andcontextual menus.

Fig. 4. Traditional Smalltalk debugger with language specific syntax highlightingstepping through a mixture of Smalltalk and the creole defined in Section 3.2.

The lack of dedicated tools to find and fix bugs in a new language is one ofthe major drawbacks when designing and using embedded languages. Since ourapproach uses the code abstraction of the host language, the standard debuggingtools continue to work. One can set breakpoints as in a conventional methods.Stepping through code written in a mixture of languages poses no problem either.The AST of a debugged method carries information about the source range inthe original code. Generated code either reuses the source ranges of the parentnode, or has no source range and is therefore invisible in the debugger. With thisinformation from the original AST the debugger is able to accurately highlight



An Argot: Transactional Memory













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


borderColor: #black;borderWidth: 1;width: 200;height: 100;

}

Listing 3. Creole: A CSS-like syntax.

The code above does not follow Smalltalk syntax. At this point, the assumptionof a pidgin relying on the host syntax starts to get in our way.

The solution is to allow the definition of a new parser that handles the creolesyntax. We typically also want to integrate the new language constructs withthe host language or with other language constructs. In our example, the codetext: [ :each | each name ] provides such a case in which we parameterize theshape specification with a Smalltalk expression.

As shown in Section 4.2, Helvetia o�ers a mechanism for writing a customparser that can also include productions external to the language at hand. Likethis we can accommodate any syntax.

3.3 An Argot: Transactional Memory

In our previous work [31] we have presented a solution for introducing softwaretransactional memory (STM) [32,33] at the language level of dynamic program-ming languages without requiring changes to the virtual machine. We achievedthis by patching the compiler. As a result we were able to make all applications,libraries and system code transaction-aware. However since our changes at thecompiler level were rather ad hoc, we lost the ability to use the debugger withina transaction.

Introducing STM into an existing language provides a concrete use case forchanging the execution semantics without changing the syntax of the language.A piece of library code that is used as part of transactional code should continueto work without requiring any adaptation. Unlike a pidgin, which bends the hostsyntax in ways that break the semantics, an argot more subtly reinterprets thesemantics of otherwise valid code.

In the example below the global counter value is incremented by 1. The codedoes not ensure mutual exclusion, thus it might happen that some of the updatesare lost when multiple threads run this method concurrently.

value := value + anInteger

When running the above method from within a transaction, the change isdeferred to the end of the transaction, instead of incrementing the variableimmediately. This allows the system to check for conflicts and revert the changesif necessary. Thus, even if the source code looks exactly the same, its behav-ior changes. We describe the transformations applied to transactional code inSection 4.3.

Revise semantics of read/write access use STM for consistent

access in parallel code.

When running the above method from within a transaction, the change is deferred to the end of the transaction, instead of incrementing the variable immediately. This allows the system to check for conflicts and revert the changes if necessary. Thus, even if the source code looks exactly the same, its behavior changes.



Specifying the Transactional Memory Argot













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


framework we are able to provide di�erent language skins that might be anappealing alternative to the internal DSL that was used before.

The changes shown in this section are all that is needed to also a�ect thedebugger. Figure 4 shows a live result of stepping through the execution of thescript building the UML package shape.

4.3 Specifying the Transactional Memory Argot

In a nutshell, our software transactional memory implementation works as follows.We compile every method in the system twice, once for the transactional andonce for the non-transactional context. On the transactional code we applytwo transformations: (1) all state access is reified to be dispatched through thetransactional context, and (2) method names and method sends are prefixedwith __atomic__. Furthermore we use method annotations to disable or customizethese transformations in certain places, such as when primitive code is called orin the transactional infrastructure itself. A transaction is started by assigninga transaction manager to a thread-local variable and by calling an __atomic__

method. At the end of a transaction the cached changes are atomically checkedfor conflicts and applied to the involved objects. The transaction boundaries arehandled at the language level using the reflective facilities of the host language.For details on the implementation of the semantic model please refer to ourprevious work [31].

Through the use of Helvetia the argot specification becomes simple. Thecomplete set of transformation rules are presented below:

1

2 <attribute>3 ^ ConditionRule new4 if: [ :context | context isTransactional ]5 then: (TreeRule new6 expression: '�@receiver �@msg: �@args' do: [ :ast |7 ast swapWith: ��(�,(ast at: '�@receiver')8 �,('__atomic__' , (ast at: '�@msg:'))9 �,(ast at: '�@args')) ];

10 expression: '�var := �@expr' do: [ :ast |11 ast swapWith: ��(self12 atomicInstVarAt: �,(ast binding index)13 put: �,(ast at: '�@expr')) ];14 expression: '�var' do: [ :ast |15 ast swapWith: ��(self16 atomicInstVarAt: �,(ast binding index)) ])

The code uses the <attribute> method annotation (line 2), to tell the com-piler that the rules are expected to run after the symbols have been resolved(attributed). Line 4 makes sure that the transformation is only performed whencompiling code for the transactional context. Lines 5–16 implement the actualtransformations, exemplified in the table below:

The code uses the <attribute> method annotation (line 2), to tell the compiler that the rules are expected to run after the symbols have been resolved (attributed). Line 4 makes sure that the transformation is only performed when compiling code for the transactional context. Lines 5–16 implement the actual transformations.



Registration of the argot













}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {














}label {


}rectangle {


6–9 Transform Message Sendsself printString � self __atomic__printString

10–13 Transform Instance Variable Writevalue := 'Atomic' � self atomicInstVarAt: 2 put: 'Atomic'

14–16 Transform Instance Variable Readvalue � self atomicInstVarAt: 2

All message sends are prepended with __atomic__ to ensure that the executionstays in the atomic context. All state accesses, such as instance variable reads andwrites, are transformed to message sends and dispatched through the transactionmanager. This allows us to delay modifications to objects, so that the changesare only visible within the current transaction. The number 2 in the examplesabove refer to the index of the named instance variable value. This slot index isretrieved from the attributed AST.

To trigger the compilation of a transactional and a non-transaction version ofevery method we hook into the parser using the <parser> annotation. We copythe compilation context and spawn a new compilation path for the transactioncontext. (Details follow in Section 5.)

<parser>aContext isTransactional ifFalse: [

aContext copybeTransactional;perform ].

^ nil

The implementation of transactional memory using Helvetia has numerousadvantages over our previous implementation: The code base is not integratedinto the compiler in an ad hoc manner anymore, but through clearly definedHelvetia extension points. In our previous implementation, we directly patchedin several places the existing compiler. Following the support of Helvetia, thetransformations are specified at a single place that is external to the compiler. As aconsequence the code is nicely modularized and more concise. Without additionalwork the use of the debugger becomes viable, because the transformations preservelocation integrity. In our previous example it was not possible to transparentlystep through transactional code, as the tools would display the generated code.The new implementation takes advantage of Helvetia and single steppingthrough transactional code looks exactly the same as the regular code.

5 Leveraging the Host Toolchain with

Helvetia manages to integrate multiple embedded languages with existingtools by leveraging and intercepting the existing toolchain and the underlyingrepresentation of the host language.Helvetia provides hooks to intercept parsing,AST transformation and semantic analysis of the standard compilation toolchain.

To trigger the compilation of a transactional and a non-transaction version of every method we hook into the parser using the <parser> annotation. We copy the compilation context and spawn a new compilation path for the transaction context.


Wrap-up

102


Summary

103

• DSL implementation concerns syntax, semantics, debugging, highlighting, traceability, and possibly others.

• DSLs may be implemented internally, externally, or via embedding.

• Programming environments get increasingly capable of non-trivial embedding approaches.


Applies to material by Lukas Renggli.

http://creativecommons.org/licenses/by-sa/2.5/

Domain-speciﬁc Languageslaemmel/slecourse/slides/dsls.pdf · This slide has been extracted from...

Documents

Transcript of Domain-speciﬁc Languageslaemmel/slecourse/slides/dsls.pdf · This slide has been extracted from...