Some Developments for the R Engine
Luke Tierney
Department of Statistics & Actuarial Science
University of Iowa
November 10, 2011
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 1 / 32
Introduction
R is a language for data analysis and graphics.
R is based on the S language developed by John Chambers and others at Bell Labs.
R is widely used in the field of statistics and beyond, especially in university environments.
R has become the primary framework for developing and making available new statistical methodology.
Many (over 3,000) extension packages are available through CRAN or similar repositories.
History and Development Model
R is an Open Source project.
Originally developed by Robert Gentleman and Ross Ihaka in the early 1990s for a Macintosh computer lab at U. of Auckland, NZ.
Developed by the R-core group since mid-1997:
  Douglas Bates, John Chambers, Peter Dalgaard,
  Robert Gentleman, Seth Falcon, Kurt Hornik,
  Stefano Iacus, Ross Ihaka, Friedrich Leisch,
  Uwe Ligges, Thomas Lumley, Martin Maechler,
  Duncan Murdoch, Paul Murrell, Martyn Plummer,
  Brian Ripley, Deepayan Sarkar, Duncan Temple Lang,
  Luke Tierney, Simon Urbanek
Strong community support through
  contributed extension packages
  mailing lists and blogs
  contributed documentation and task views
Introduction
This talk outlines some recent developments in the core R engine I have worked on and developments I expect to work on over the next 12 to 18 months:
  Byte code compilation of R code.
  Taking advantage of multiple cores for
    basic vectorized operations
    simple matrix operations.
  Increasing the limit on the size of vector data objects.
Byte Code Compilation
The standard R evaluation mechanism
  parses code into a parse tree when the code is read
  evaluates code by interpreting the parse trees.
Most low level languages (e.g. C, FORTRAN) compile their source code to native machine code.
Some intermediate level languages (e.g. Java, C#) and many scripting languages (e.g. Perl, Python) compile to a simpler language called byte code.
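
As a small illustration of the standard evaluation mechanism described above, the following R sketch parses a line of source text into a parse tree and then evaluates it. The expression and the variable names `tree` and `result` are chosen here for illustration and are not from the talk.

```r
# Parse source text into an unevaluated expression (a parse tree).
tree <- parse(text = "1 + 2 * 3")[[1]]

# The tree is an ordinary R object: a call to `+` whose second
# argument is itself a call to `*`.
print(class(tree))
print(as.list(tree))

# Evaluating the expression interprets the tree.
result <- eval(tree)
print(result)
```

Because the tree is a regular R object, the interpreter has to inspect and dispatch on each node every time the code runs; this is the overhead that compiling to byte code aims to reduce.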
Byte Code Compilation
Byte code is the machine code for a virtual machine.
Virtual machine code can then be interpreted by a simpler, more efficient interpreter.
Virtual machines, and their machine code, are usually specific to the languages they are designed to support.
Various strategies for further compiling byte code to native machine code are also sometimes used.
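
To see what byte code for R's virtual machine looks like, one option is to compile a function with `cmpfun` and inspect it with `disassemble`, both from the `compiler` package. This is only a sketch: the function `sum_sq` is a made-up example, and the printed instruction listing depends on the R version.

```r
library(compiler)

# Hypothetical example function: sum of squares 1^2 + ... + n^2.
sum_sq <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + i^2
  s
}

# Compile the closure to byte code.
sum_sq_c <- cmpfun(sum_sq)

# Show the virtual machine instructions and constants.
print(disassemble(sum_sq_c))

# The compiled version should compute the same value as the original.
print(identical(sum_sq(10), sum_sq_c(10)))
```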
Byte Code Compilation
Efforts to add byte code compilation to R have been underway for some time.
The first release of the compiler occurred with R 2.13.0 this spring.
Some improvements in the virtual machine interpreter were released with R 2.14.0 this fall.
The current compiler and virtual machine produce good improvements in a number of cases.
Better results should be possible with some improvements to the virtual machine and are currently being explored.
Compiler Operation
The compiler can be called explicitly to compile single functions or files of code:
  cmpfun compiles a function
  cmpfile compiles a file to be loaded by loadcmp
It is also possible to have package code compiled when a package is installed.
  Use --byte-compile when installing, or specify the ByteCompile option in the DESCRIPTION file.
  R 2.14.0 by default compiles R code in all base and recommended packages.
Alternatively, the compiler can be used in a JIT mode where
  functions are compiled on first use
  loops are compiled before they are run
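
The explicit and JIT modes listed above can be tried out roughly as follows. This is a minimal sketch, assuming the `compiler` package as shipped with R; the functions `f` and `g`, the temporary file names, and the JIT level 3 are illustrative choices, not taken from the talk.

```r
library(compiler)

# 1. Compile a single function explicitly.
f <- function(x) x + 1            # hypothetical function
fc <- cmpfun(f)
print(fc(1))

# 2. Compile a file of code, then load the compiled result.
src <- tempfile(fileext = ".R")   # hypothetical source file
out <- tempfile(fileext = ".Rc")  # compiled output file
writeLines("g <- function(x) 2 * x", src)
cmpfile(src, out)
loadcmp(out)                      # defines g in the global environment
print(g(21))

# 3. Switch on JIT compilation; enableJIT() returns the previous
#    level, which is restored afterwards.
old_level <- enableJIT(3)
enableJIT(old_level)
```

Recent R releases enable the JIT by default, so the third step mainly shows how the level can be queried and changed.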
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32
![Page 35: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/35.jpg)
Compiler Operation
The compiler can be called explicitly to compile single functions orfiles of code:
cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp
It is also possible to have package code compiled when a package isinstalled.
Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.
Alternatively, the compiler can be used in a JIT mode where
functions are compiled on first useloops are compiler before they are run
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32
![Page 36: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/36.jpg)
Compiler Operation
The compiler can be called explicitly to compile single functions orfiles of code:
cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp
It is also possible to have package code compiled when a package isinstalled.
Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.
Alternatively, the compiler can be used in a JIT mode where
functions are compiled on first useloops are compiler before they are run
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32
![Page 37: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/37.jpg)
Compiler Operation
The compiler can be called explicitly to compile single functions orfiles of code:
cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp
It is also possible to have package code compiled when a package isinstalled.
Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.
Alternatively, the compiler can be used in a JIT mode where
functions are compiled on first useloops are compiler before they are run
Compiler Operation
The current compiler includes a number of optimizations, such as
constant folding
special instructions for most SPECIALs, many BUILTINs
inlining simple .Internal calls: e.g.
dnorm(y, 2, 3)
is replaced by
.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))
special instructions for many .Internals
The compiler is currently most effective for code used on scalar data or short vectors where interpreter overhead is large relative to actual computation.
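One rough way to observe this on a scalar loop is to time an interpreted and a compiled copy of the same function. This is only a sketch: the function scalar_loop and the loop length are made up, and timings depend on the machine and the R version. The JIT is switched off for the comparison because recent versions of R enable it by default, which would otherwise compile the "interpreted" copy as well.

```r
library(compiler)

# Turn the JIT off so the uncompiled function really runs in the interpreter.
old_level <- enableJIT(0)

# Hypothetical scalar-heavy function: little real work per iteration.
scalar_loop <- function(n) {
    s <- 0
    for (i in seq_len(n)) s <- s + i * 2
    s
}
scalar_loop_c <- cmpfun(scalar_loop)

n <- 200000
t_interp <- system.time(r_interp <- scalar_loop(n))[["elapsed"]]
t_comp <- system.time(r_comp <- scalar_loop_c(n))[["elapsed"]]

# Restore the previous JIT level.
enableJIT(old_level)

cat("interpreted:", t_interp, " compiled:", t_comp, "\n")
```

Both versions compute the same value; only the running time should differ.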
A Simple Example
R Code
f <- function(x) {
    s <- 0.0
    for (y in x)
        s <- s + y
    s
}
VM Assembly Code
    LDCONST 0.0
    SETVAR s
    POP
    GETVAR x
    STARTFOR y L2
L1: GETVAR s
    GETVAR y
    ADD
    SETVAR s
    POP
    STEPFOR L1
L2: ENDFOR
    POP
    GETVAR s
    RETURN
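The byte code for a function like f can be inspected with disassemble() from the compiler package. This is a sketch for exploration only: the printed form is an internal representation, and its instruction names and operands may differ from the simplified listing above and between R versions.

```r
library(compiler)

f <- function(x) {
    s <- 0.0
    for (y in x)
        s <- s + y
    s
}

# Compile explicitly, then print the byte code of the compiled function.
fc <- cmpfun(f)
disassemble(fc)

# The compiled function is called like the original.
total <- fc(c(1, 2, 3.5))
```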
Some Performance Results
Timings for some simple benchmarks on an x86_64 Ubuntu laptop:
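| Benchmark | Interp. | Comp. | Speedup | Exper. | Speedup |
|-----------|---------|-------|---------|--------|---------|
| p1        | 32.19   | 7.98  | 4.0     | 1.47   | 21.9    |
| sum       | 6.72    | 1.86  | 3.6     | 0.59   | 11.4    |
| conv      | 14.48   | 4.30  | 3.4     | 0.81   | 17.9    |
| rem       | 56.82   | 23.68 | 2.4     | 4.77   | 11.9    |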
Interp., Comp. are for the development version of R
includes some variable lookup improvements for compiled code
Exper.: experimental version using
separate instructions for vector, matrix indexing
typed stack to avoid allocating intermediate scalar values
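The speedup columns are consistent with the ratio of the interpreted time to the compiled or experimental time, rounded to one decimal place. The sketch below recomputes them from the timings in the table; the data frame and its column names are introduced here only for illustration.

```r
# Timings copied from the table above.
timings <- data.frame(
    benchmark = c("p1", "sum", "conv", "rem"),
    interp = c(32.19, 6.72, 14.48, 56.82),
    comp = c(7.98, 1.86, 4.30, 23.68),
    exper = c(1.47, 0.59, 0.81, 4.77)
)

# Speedup relative to the interpreter, rounded as in the table.
timings$speedup_comp <- round(timings$interp / timings$comp, 1)
timings$speedup_exper <- round(timings$interp / timings$exper, 1)

print(timings)
```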
Future Directions
The current virtual machine uses a stack-based design.
An alternative approach might use a register-based design.
Some additional optimizations currently being explored:
avoiding the allocation of intermediate values when possible
more efficient variable lookup mechanisms
more efficient function calls
possibly improved handling of lazy evaluation
Some promising preliminary results are available.
Other possible directions include
Partial evaluation when some arguments are constants
Intra-procedural optimizations and inlining
Declarations (sealing, scalars, types, strictness)
Machine code generation using LLVM or other approaches
Collaborations with computer scientists are being explored.
Maintainability is a key concern.
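To make the idea of a stack-based design concrete, here is a toy stack machine written in R. It is not R's byte-code engine: the function run_stack_vm and its instruction encoding are hypothetical, and only a handful of instructions without loops are handled. Each instruction pushes onto or pops from an operand stack, and every intermediate value passes through that stack.

```r
# Toy stack machine for illustration only (hypothetical, not R's real VM).
run_stack_vm <- function(code, vars = list()) {
    stack <- list()
    for (ins in code) {
        op <- ins[[1]]
        if (op == "LDCONST") {
            stack[[length(stack) + 1]] <- ins[[2]]        # push a constant
        } else if (op == "GETVAR") {
            stack[[length(stack) + 1]] <- vars[[ins[[2]]]] # push a variable's value
        } else if (op == "SETVAR") {
            vars[[ins[[2]]]] <- stack[[length(stack)]]     # store top of stack, keep it
        } else if (op == "POP") {
            stack[[length(stack)]] <- NULL                 # discard top of stack
        } else if (op == "ADD") {
            n <- length(stack)
            result <- stack[[n - 1]] + stack[[n]]
            stack[[n]] <- NULL                             # pop both operands,
            stack[[n - 1]] <- result                       # push their sum
        } else if (op == "RETURN") {
            return(stack[[length(stack)]])
        } else {
            stop("unknown instruction: ", op)
        }
    }
    stop("program ended without RETURN")
}

# Straight-line program for: s <- 0.0; s <- s + y; s
program <- list(
    list("LDCONST", 0.0), list("SETVAR", "s"), list("POP"),
    list("GETVAR", "s"), list("GETVAR", "y"), list("ADD"),
    list("SETVAR", "s"), list("POP"),
    list("GETVAR", "s"), list("RETURN")
)
result <- run_stack_vm(program, vars = list(y = 5))
```

A register-based design would instead name the source and destination of each operation explicitly, which typically means fewer but wider instructions.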
![Page 74: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/74.jpg)
Parallelizing Vector and Matrix Operations
Most modern computers feature two or more processor cores.
It is expected that tens of cores will soon be common.
A common question:
How can I make R use more than one core for my computation?
There are many easy answers.
But this is the wrong question.
The right question:
How can we take advantage of having more than one core to get ourcomputations to run faster?
This is harder to answer.
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32
![Page 75: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/75.jpg)
Parallelizing Vector and Matrix Operations
Most modern computers feature two or more processor cores.
It is expected that tens of cores will soon be common.
A common question:
How can I make R use more than one core for my computation?
There are many easy answers.
But this is the wrong question.
The right question:
How can we take advantage of having more than one core to get ourcomputations to run faster?
This is harder to answer.
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32
![Page 76: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/76.jpg)
Parallelizing Vector and Matrix Operations
Most modern computers feature two or more processor cores.
It is expected that tens of cores will soon be common.
A common question:
How can I make R use more than one core for my computation?
There are many easy answers.
But this is the wrong question.
The right question:
How can we take advantage of having more than one core to get ourcomputations to run faster?
This is harder to answer.
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32
![Page 77: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/77.jpg)
Parallelizing Vector and Matrix Operations
Most modern computers feature two or more processor cores.
It is expected that tens of cores will soon be common.
A common question:
How can I make R use more than one core for my computation?
There are many easy answers.
But this is the wrong question.
The right question:
How can we take advantage of having more than one core to get ourcomputations to run faster?
This is harder to answer.
Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32
![Page 78: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/78.jpg)
Some Approaches to Parallel Computing
Two possible approaches:
Explicit parallelization:
uses some form of annotation to specify parallelism
packages snow, multicore, parallel.
Implicit parallelization:
automatic, no user action needed
I will focus on implicit parallelization of
basic vectorized math functions
basic matrix operations (e.g. colSums)
BLAS
![Page 91: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/91.jpg)
Parallelizing Vectorized Operations: An Idealized View
Basic idea for computing f(x[1:n]) on a two-processor system:
Run two worker threads.
Place half the computation on each thread.
Ideally this would produce a two-fold speed up.
[Figure: timeline comparing Sequential (one block of length n) with Parallel (two blocks of length n/2).]
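The idealized split can be sketched in a few lines. This is a minimal illustration in Python, not the R engine's own code; `apply_chunk` and `parallel_map_two` are hypothetical names. CPython's global interpreter lock means pure-Python work like this will not actually run twice as fast, so the sketch shows only how the work is partitioned and joined.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def apply_chunk(f, x, lo, hi):
    """Apply f elementwise to x[lo:hi] and return the results as a list."""
    return [f(v) for v in x[lo:hi]]


def parallel_map_two(f, x):
    """Compute f over x using two worker threads, half the elements each."""
    n = len(x)
    mid = n // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(apply_chunk, f, x, 0, mid)
        second = pool.submit(apply_chunk, f, x, mid, n)
        # Waiting on both results is the synchronization point.
        return first.result() + second.result()


x = [float(i) for i in range(1, 11)]
assert parallel_map_two(math.sqrt, x) == [math.sqrt(v) for v in x]
```

The call to `result()` on each future is where the caller waits for both halves to finish before using the combined vector.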
![Page 92: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/92.jpg)
Parallelizing Vectorized Operations: A More Realistic View
Reality is a bit different:
[Figure: timeline comparing Sequential (n) with Parallel (two blocks of n/2).]
There is
synchronization overhead
sequential code and use of shared resources (memory, bus, ...)
uneven workload
Parallelizing will only pay off if n is large enough.
For some functions, e.g. qbeta, n ≈ 10 may be large enough.
For some, e.g. qnorm, n ≈ 1000 is needed.
For basic arithmetic operations n ≈ 30000 may be needed.
Careful tuning to ensure improvement will be needed.
Some aspects will depend on architecture and OS.
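One way to act on these thresholds is to stay sequential below a per-function cutoff. The sketch below is in Python for illustration only. The table `MIN_PARALLEL_N` restates the rough figures quoted above, while `choose_threads` and its default cutoff are hypothetical and do not describe the R engine's actual interface.

```python
# Rough break-even vector lengths quoted on the slide; real values would
# need tuning per architecture and OS.
MIN_PARALLEL_N = {
    "qbeta": 10,
    "qnorm": 1000,
    "arithmetic": 30000,
}


def choose_threads(func_name, n, max_threads=2, default_cutoff=30000):
    """Return how many threads to use for a vector of length n.

    Below the cutoff the synchronization overhead is assumed to outweigh
    the gain, so the work stays on one thread.
    """
    cutoff = MIN_PARALLEL_N.get(func_name, default_cutoff)
    return max_threads if n >= cutoff else 1


print(choose_threads("qbeta", 50))           # 2
print(choose_threads("qnorm", 50))           # 1
print(choose_threads("arithmetic", 10**6))   # 2
```

Keeping the cutoffs in a table separate from the dispatch logic makes it easy to swap in values measured on a particular machine.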
![Page 103: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/103.jpg)
Parallelizing Vectorized Operations: Some Experimental Results
[Figure: four panels plotting CPU time in milliseconds (axis 0.00 to 0.30) against n (axis 0 to 1000). Panels: qnorm, Linux/AMD/x86_64; pgamma, Linux/AMD/x86_64; qnorm, Mac OS X/Intel/i386; pgamma, Mac OS X/Intel/i386.]
![Page 104: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/104.jpg)
Parallelizing Vectorized Operations: Some Experimental Results
Some observations:
Times are roughly linear in vector length.
Intercepts on a given platform are roughly the same for all functions.
If the slope for P processors is $s_P$, then at least for P = 2 and P = 4, $s_P \approx s_1 / P$.
Relative slopes of functions seem roughly independent of OS/architecture.
A simple calibration strategy:
Compute relative slopes once, or average across several setups.
For each OS/architecture combination compute the intercepts.
The appropriate time to run calibration code is still open.
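These observations suggest a simple linear cost model. As an assumption made here for illustration, take the sequential time as $s_1 n$ and the time on $P$ processors as $a + s_1 n / P$, where $a$ is the platform intercept. The two are then equal at $n^* = a / (s_1 (1 - 1/P))$. The Python sketch below fits a slope and intercept to timings by least squares and computes that break-even length. The function names and all timing numbers are invented for the example and are not measurements from the talk.

```python
def fit_line(ns, times):
    """Least-squares fit of times ~ intercept + slope * n."""
    m = len(ns)
    mean_n = sum(ns) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, times))
    slope = sxy / sxx
    return mean_t - slope * mean_n, slope


def break_even_n(intercept, seq_slope, p):
    """Length n at which seq_slope * n equals intercept + seq_slope * n / p."""
    return intercept / (seq_slope * (1.0 - 1.0 / p))


# Synthetic timings in milliseconds, invented for illustration only.
ns = [200, 400, 600, 800, 1000]
seq_times = [0.0002 * n for n in ns]          # one thread, no overhead
par_times = [0.05 + 0.0001 * n for n in ns]   # two threads, 0.05 ms overhead

_, s1 = fit_line(ns, seq_times)
a, s2 = fit_line(ns, par_times)
print(round(s2 / s1, 3))               # slope ratio, close to 1/P = 0.5
print(round(break_even_n(a, s1, 2)))   # about 500
```

Under this model the relative slopes could be fitted once, leaving only the intercept `a` to be measured for each OS/architecture combination, which matches the calibration strategy outlined above.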
![Page 113: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/113.jpg)
Parallelizing Vectorized Operations: Implementation
Need to use threads.
One possibility: use raw pthreads.
Better choice: use OpenMP.
OpenMP consists of:
compiler directives (#pragma statements in C)
a runtime support library
Most commercial compilers support OpenMP.
Current gcc versions support OpenMP; newer ones do a better job.
MinGW for Win32 also supports OpenMP; an additional pthreads download is needed.
Support for Win64 is now also available and should be in the toolchain soon.
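The two parts of OpenMP named above can be seen side by side in a few lines of C. This is a minimal sketch, not code from R or pnmath: the function names `available_threads` and `scale_by_two` are made up for illustration. It relies on the standard `_OPENMP` macro so that the same file still compiles, and runs sequentially, with a compiler that lacks OpenMP support.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* declarations for the runtime support library */
#endif

/* Runtime library part: ask how many threads a parallel region may use.
 * Falls back to 1 when the compiler was not run with OpenMP enabled. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Directive part: y[i] = 2 * x[i]. A compiler without OpenMP simply
 * ignores the pragma and runs the loop on one thread. */
void scale_by_two(const double *x, double *y, size_t n)
{
    long i;
#pragma omp parallel for default(shared) private(i)
    for (i = 0; i < (long) n; i++)
        y[i] = 2.0 * x[i];
}
```

With gcc the directive takes effect only when the file is compiled with `-fopenmp`; without that flag the result is the same, just computed sequentially.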
Parallelizing Vectorized Operations: Implementation
Basic loop for a one-argument function:
    #pragma omp parallel for if (P > 0) num_threads(P) \
        default(shared) private(i) reduction(&&:naflag)
    for (i = 0; i < n; i++) {
        double ai = a[i];
        MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);
    }
Steps in converting to OpenMP:
check f is thread-safe; modify if not
rewrite loop to work with the OpenMP directive
test without OpenMP, then enable OpenMP
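The loop above depends on a macro whose definition is not shown, so the following is only a self-contained sketch of the same pattern under stated assumptions. The names `map1` and `demo_f` are hypothetical, and the loop body is a guess at the macro's intent: flag a NaN result only when the input was not already NaN. The sketch also makes two choices of its own that differ from the slide's clauses: it combines per-thread flags with an `||` reduction, and it parallelizes only when `P > 1`.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical thread-safe function: NaN for negative inputs, x / 2 otherwise. */
double demo_f(double x)
{
    return x < 0.0 ? NAN : x / 2.0;
}

/* Apply f to a[0..n-1], storing results in y. Returns 1 if f produced a
 * NaN from a non-NaN input, 0 otherwise. P is the requested thread count. */
int map1(double (*f)(double), const double *a, double *y, long n, int P)
{
    int naflag = 0;
    long i;
    if (P < 1)
        P = 1;   /* num_threads needs a positive value */
#pragma omp parallel for if (P > 1) num_threads(P) \
    default(shared) private(i) reduction(||:naflag)
    for (i = 0; i < n; i++) {
        double ai = a[i];
        double yi = f(ai);
        if (isnan(yi) && !isnan(ai))
            naflag = 1;   /* each thread sets its own copy of the flag */
        y[i] = yi;
    }
    return naflag;
}
```

Returning the flag instead of warning inside the loop lets the caller signal a single warning from the main thread once the parallel region has finished. Compiling the same file without OpenMP first, as the conversion steps suggest, checks the rewritten loop before any threading is involved.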
Parallelizing Vectorized Operations: Implementation
Some things that are not thread-safe:
use of global variables
R memory allocation
signaling warnings and errors
user interrupt checking
creating internationalized messages (calls to gettext)
Random number generation is also problematic.
Functions in nmath that have not been parallelized yet:
Bessel functions (partially done)
Wilcoxon, signed rank functions (may not make sense)
random number generators
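The first item in the list of thread-safety problems, use of global variables, is often the easiest to fix. As a hedged illustration (none of these functions come from nmath; `poly_global`, `poly_local`, and `eval_all` are invented names), the sketch below contrasts a polynomial evaluator that keeps its accumulator in a file-level variable with a variant that keeps it on the stack, and then uses the safe variant inside a parallel loop.

```c
#include <stddef.h>

/* Scratch variable shared by every caller: two threads calling
 * poly_global at the same time would overwrite each other's state. */
static double acc;

/* NOT thread-safe: Horner evaluation using the shared accumulator. */
double poly_global(const double *coef, int k, double x)
{
    acc = 0.0;
    for (int j = k - 1; j >= 0; j--)
        acc = acc * x + coef[j];
    return acc;
}

/* Thread-safe: identical arithmetic, but the accumulator is local. */
double poly_local(const double *coef, int k, double x)
{
    double local = 0.0;
    for (int j = k - 1; j >= 0; j--)
        local = local * x + coef[j];
    return local;
}

/* Only the local-state variant may be called from a parallel loop. */
void eval_all(const double *coef, int k, const double *x, double *y, long n)
{
    long i;
#pragma omp parallel for default(shared) private(i)
    for (i = 0; i < n; i++)
        y[i] = poly_local(coef, k, x[i]);
}
```

The other items on the list (allocation, warnings, interrupt checks, gettext calls) cannot be fixed this way and generally have to be moved outside the parallel region.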
Parallelizing Vectorized Operations: Availability
Package pnmath is available at
http://www.stat.uiowa.edu/~luke/R/experimental/
This requires a version of gcc that:
supports OpenMP
allows dlopen to be used on libgomp.so
A version using just pthreads is available in pnmath0.
Loading these packages replaces built-in operations with parallelized ones.
For Linux and Mac OS X, predetermined intercept calibrations are used.
For other platforms, a calibration test is run at package load time.
The calibration can be run manually by calling calibratePnmath.
Hopefully we will be able to include this in R soon.
Parallelizing Simple Matrix Operations
Very preliminary results for colSums on an 8-core Linux machine:
[Figure: scatter plot of colSums time (vertical axis, 2.5e-05 to 5.0e-05) against size (horizontal axis, 0 to 15000); the legend lists 1, 2, 4, 6, and 8.]
Preliminary results on OS X indicate cutoff levels may be much higher.
Part of the implementation work was done by Xiao Yang.
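Timings of this kind can be collected roughly as sketched below. The sketch uses only base R and measures whatever colSums the running R provides, not a parallel version. The helper name `time_colsums`, the column count, the sizes, and the repetition count are illustrative assumptions, and "size" is assumed to mean the total number of matrix elements; none of this is the setup behind the plot.

```r
# Hypothetical helper: average elapsed time of colSums() on a matrix
# holding n elements in a fixed number of columns.
time_colsums <- function(n, ncol = 10, reps = 200) {
  x <- matrix(runif(n), ncol = ncol)   # n is assumed to be a multiple of ncol
  elapsed <- system.time(for (i in seq_len(reps)) colSums(x))[["elapsed"]]
  elapsed / reps
}

# Sizes chosen to span the range of the plot's horizontal axis (an assumption).
sizes <- seq(1000, 15000, by = 2000)
timings <- data.frame(size = sizes,
                      time = vapply(sizes, time_colsums, numeric(1)))
print(timings)
```

For operations this short the timer resolution matters, which is why each measurement averages over many repetitions.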
![Page 152: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/152.jpg)
Parallelizing Simple Matrix Operations
Some issues to consider:
Again, using too many processor cores for small problems can slow the computation down.
colSums can be parallelized by rows or columns:
Handling groups of columns in parallel produces identical results to a sequential version.
Handling groups of rows in parallel changes numerical results slightly (floating point addition is not associative).
rowSums is slightly more complex since locality of reference (column-major storage) needs to be taken into account.
A number of other basic operations can be handled similarly.
Simple uses of apply and sweep might also be handled along these lines.
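The difference between the two ways of splitting colSums can be illustrated in plain R. The sketch below runs the groups one after another rather than in threads, so it shows only the arithmetic, not any speedup. The helper names `split_indices` and `row_sums_by_column` and the choice of four groups are assumptions made for illustration.

```r
set.seed(1)
x <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)

# Hypothetical helper: split 1..n into k contiguous groups of indices.
split_indices <- function(n, k) {
  split(seq_len(n), rep(seq_len(k), each = ceiling(n / k), length.out = n))
}

# Column groups: each group produces complete sums for its own columns,
# so every column is summed exactly as in the sequential version.
col_groups <- split_indices(ncol(x), 4)
by_cols <- unlist(lapply(col_groups, function(j) colSums(x[, j, drop = FALSE])),
                  use.names = FALSE)

# Row groups: each group produces partial sums for every column; adding
# the partial sums afterwards changes the order of the additions.
row_groups <- split_indices(nrow(x), 4)
partials <- lapply(row_groups, function(i) colSums(x[i, , drop = FALSE]))
by_rows <- Reduce(`+`, partials)

identical(by_cols, colSums(x))           # same additions in the same order
isTRUE(all.equal(by_rows, colSums(x)))   # agreement only up to rounding
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # FALSE with IEEE 754 doubles

# Column-major storage: element (i, j) sits at position i + (j - 1) * nrow.
x[3, 2] == x[3 + (2 - 1) * nrow(x)]

# rowSums that follows the storage order: walk the columns one by one and
# accumulate into a running total per row.
row_sums_by_column <- function(m) {
  total <- numeric(nrow(m))
  for (j in seq_len(ncol(m))) total <- total + m[, j]
  total
}
isTRUE(all.equal(row_sums_by_column(x), rowSums(x)))
```

Walking the columns in order touches memory contiguously, which is the locality concern raised for rowSums.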
![Page 159: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/159.jpg)
Using a Parallel BLAS
Most core linear algebra calculations use the Basic Linear Algebra Subprograms library (BLAS).
R has supported using a custom BLAS implementation for some time.
Both Intel and AMD provide sequential and threaded accelerated BLAS implementations.
ATLAS and Goto's BLAS also come in sequential and threaded versions.
Very preliminary testing suggests that the Intel threaded BLAS works well for small and large matrices.
Anecdotal evidence, which may no longer apply, suggests that this may not be true of some other threaded BLAS implementations.
More testing is needed.
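One simple form such testing could take is to time matrix products of several sizes under each BLAS of interest. The sketch below is a minimal base R version of that idea; the helper name `time_matmul` and the matrix sizes are assumptions, and the "BLAS" entry of `extSoftVersion()` is assumed to be present only in sufficiently recent versions of R, so its use is guarded.

```r
# Report which BLAS the running R is linked against, where R exposes it.
info <- extSoftVersion()
if ("BLAS" %in% names(info)) cat("BLAS library:", info[["BLAS"]], "\n")

# Hypothetical helper: average elapsed time of an n x n matrix product.
time_matmul <- function(n, reps = 3) {
  a <- matrix(rnorm(n * n), n, n)
  b <- matrix(rnorm(n * n), n, n)
  system.time(for (i in seq_len(reps)) a %*% b)[["elapsed"]] / reps
}

# Small and large cases, since a threaded BLAS may behave differently on each.
for (n in c(10, 100, 500)) {
  cat(sprintf("n = %4d  time = %.6f s\n", n, time_matmul(n)))
}
```

Running the same script against a sequential and a threaded BLAS gives a first impression of where threading helps and where it hurts.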
![Page 168: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/168.jpg)
Parallelization and Compilation
Compilation may be useful for parallelizing vector operations:
Many vector operations occur in compound expressions, like
exp(-0.5*x^2)
A compiler may be able to fuse these operations:
[Figure: timing diagram of the SQUARE, SCALE, and EXP steps under three schemes, labeled Sequential, Interpreted, and Compiled, fused.]
This will allow gains from parallelizing compound operations on shorter vectors.
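The idea of fusing the three steps can be sketched in plain R. The functions below are illustrative assumptions, not compiler output: `unfused` makes three full passes over the data, `fused` applies all three steps to one element at a time, and `fused_chunked` shows the shape a parallel fused version might take, with each chunk a candidate for its own worker. An explicit R loop is slow, so this shows structure only, not speed.

```r
x <- seq(-3, 3, length.out = 1000)

# Unfused: three separate passes, each producing a full-length temporary.
unfused <- function(x) {
  squared <- x^2             # SQUARE
  scaled  <- -0.5 * squared  # SCALE
  exp(scaled)                # EXP
}

# Fused: one pass; each element goes through all three steps in turn.
fused <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- exp(-0.5 * x[i]^2)
  out
}

# Fused and chunked: each contiguous chunk could be handed to a separate
# worker, which would then be started and synchronized only once.
fused_chunked <- function(x, chunks = 4) {
  groups <- split(x, rep(seq_len(chunks),
                         each = ceiling(length(x) / chunks),
                         length.out = length(x)))
  unlist(lapply(groups, fused), use.names = FALSE)
}

all.equal(unfused(x), fused(x))
all.equal(unfused(x), fused_chunked(x))
```

With fusion the per-element work is three operations instead of one, so the fixed cost of starting workers is amortized sooner.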
![Page 170: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/170.jpg)
Big Data
Automated data acquisition in science and commerce is producing huge amounts of data.
Big Data is a hot topic in the popular and trade press.
Some categories:
fit into memory
fit on one machine's disk storage
require multiple machines to store
Smaller large data sets can be handled by standard methods if enough memory is available.
Very large data sets require specialized methods and algorithms.
R should be able to handle smaller large data problems on machines with enough memory.
![Page 179: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics](https://reader033.fdocuments.in/reader033/viewer/2022051912/60029070b6bbe3686467adb7/html5/thumbnails/179.jpg)
Limit on Vector Object Size
Currently the total number of elements in a vector cannot exceed $2^{31} - 1 = 2{,}147{,}483{,}647$.
This limit represents the largest possible 32-bit signed integer.
For numeric (double precision) data this means the largest possible vector is about 16 GB.
This is fairly large, but is becoming an issue with larger data sets with many variables on 64-bit platforms.
Can this limit be raised without breaking too much existing R code and requiring the rewriting of too much C code?
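The arithmetic behind these figures can be checked directly in R. This is a small sketch using only base R; the variable names are illustrative, and the 8 bytes per element is the usual size of a double.

```r
max_len <- .Machine$integer.max           # largest 32-bit signed integer
max_len == 2^31 - 1                       # TRUE

bytes_per_double <- 8
size_bytes <- bytes_per_double * max_len  # computed in double precision
size_bytes / 2^30                         # just under 16 (GiB)

# Integer arithmetic cannot step past the limit: the result is NA,
# accompanied by a warning (suppressed here).
suppressWarnings(max_len + 1L)
```

The last line hints at why raising the limit is delicate: any code that stores lengths or indices in 32-bit integers would overflow in the same way.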
Some Considerations
For all practical purposes on all current architectures the C int type and the FORTRAN integer type are 32-bit signed integers.
The R source code uses C int or FORTRAN integer types in many places that would need to be changed to a wider type.
The R memory manager is easy enough to change.
Finding all the other places in the C code implementing R where int would need to be changed to a wider type, and making sure it is not changed where it should not be, is hard.
External code used by R is also a problem, in particular the BLAS.
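To make the problem with int-based external interfaces concrete, here is a minimal C sketch of one possible workaround: a vector with a 64-bit length is handed to an int-length routine in pieces. This is an assumption for illustration, not an approach the talk proposes. `int_length_sum` is a hypothetical stand-in for an external routine, and `fits_in_int` and `chunked_sum` are likewise invented names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for an external routine whose length argument
   is a C int, as is typical of BLAS-style interfaces. */
double int_length_sum(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Returns 1 if n can be passed as a C int without truncation, else 0. */
int fits_in_int(int64_t n)
{
    return n >= 0 && n <= (int64_t)INT_MAX;
}

/* Process a vector with a 64-bit length by handing the int-based routine
   pieces of at most max_chunk elements each. */
double chunked_sum(const double *x, int64_t n, int max_chunk)
{
    assert(max_chunk > 0);
    double total = 0.0;
    int64_t done = 0;
    while (done < n) {
        int64_t left = n - done;
        int len = left < max_chunk ? (int)left : max_chunk;
        total += int_length_sum(x + done, len);
        done += len;
    }
    return total;
}
```

Chunking only works for operations that decompose over pieces of the vector, such as sums. Routines that need the whole array at once would need a different approach.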
Some Possible Directions
A possible strategy:
Change length fields in internal headers to support longer vectors.
Change standard field accessors to signal an error if long vectors are used.
Add new accessors that allow long vectors.
Gradually introduce long vector support into the R internals.
The initial header change will require recompiling all C code but no further code changes.
After the header change, support for large vectors can be introduced incrementally in R itself and in packages.
It may eventually be necessary to introduce a long integer data type or change the integer type from 32 to 64 bits.
It may also be sufficient to store larger integers as double precision floating point numbers.
If the integer representation is changed, a possible direction to explore is whether smaller integer types could be added (one byte and two byte, for example).
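The accessor strategy above might look roughly like the following C sketch. Everything here is hypothetical: `vec_header`, `short_length`, `long_length`, and the status codes are invented for illustration and are not R's actual header layout or API, and a status code stands in for R's real error signalling. The last function illustrates storing larger integers as doubles, assuming IEEE 754 double precision, which represents every integer of magnitude up to 2^53 exactly.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical vector header with the length field widened to 64 bits. */
typedef struct {
    int64_t length;   /* number of elements; may exceed INT_MAX */
} vec_header;

/* Status codes standing in for a real error mechanism. */
enum { LEN_OK = 0, LEN_TOO_LONG = 1 };

/* Standard accessor: keeps the old int interface and reports an error
   instead of silently truncating the length of a long vector. */
int short_length(const vec_header *v, int *out)
{
    if (v->length > INT_MAX)
        return LEN_TOO_LONG;
    *out = (int)v->length;
    return LEN_OK;
}

/* New accessor: code that has been updated for long vectors uses this. */
int64_t long_length(const vec_header *v)
{
    return v->length;
}

/* Returns 1 if n is within the range where IEEE 754 doubles represent
   every integer exactly (magnitude at most 2^53), else 0. */
int exact_as_double(int64_t n)
{
    const int64_t limit = INT64_C(1) << 53;
    return n >= -limit && n <= limit;
}
```

Code that still calls the old accessor keeps working for ordinary vectors and fails loudly on long ones, which is what would allow support to be introduced one function at a time.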
Status of Explorations
Initial experiments are promising.
In some cases just changing data types will be sufficient.
In other cases more may be needed, such as
ability to interrupt computations
more stable numerical algorithms
Some more experimentation is needed.
If successful, the header changes may occur within the next few weeks.
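As one reading of what "more may be needed" could mean in practice, the C sketch below sums a long vector with a 64-bit index, uses Kahan compensated summation to limit rounding error over many elements, and polls an interrupt flag periodically. The choice of Kahan summation, the flag, and the function name are assumptions made for this example and are not taken from the talk or from R's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag that a signal handler or front end would set. */
volatile int interrupt_requested = 0;

/* Sum n doubles using Kahan compensated summation.
   Returns 0 and stores the sum in *result on success;
   returns 1 without touching *result if an interrupt was requested. */
int interruptible_sum(const double *x, int64_t n, double *result)
{
    const int64_t check_every = 1000000;  /* poll the flag once per million elements */
    double sum = 0.0;
    double comp = 0.0;                    /* running compensation for lost low-order bits */

    for (int64_t i = 0; i < n; i++) {
        if (i % check_every == 0 && interrupt_requested)
            return 1;
        double y = x[i] - comp;
        double t = sum + y;
        comp = (t - sum) - y;
        sum = t;
    }
    *result = sum;
    return 0;
}
```

Polling only every so many iterations keeps the cost of the check small while still letting a user stop a loop over billions of elements.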
Summary
This talk has outlined several areas I believe are important and to which I hope I can make some contributions in the near future.
The R development model is quite distributed: other R developers are working on a wide range of other areas.
Fortunately conflicts are rare and the different efforts, so far at least, have merged together quite successfully.