Amis Consulting LLP

1977-1981 : Research in Combustion/Fluids.

1983-1991 : Scientific computing image processing

1992-1997 : UK Healthcare / Imperial College

1997-2003 : Dotcom Boom (and bust) !!!

2003- : Financial Systems.

Currently involved in High Performance Computing and (of course) Big Data, as well as all the other stuff.

Worked with a variety of technologies

● Languages (in anger) : Fortran / C / Ada / Perl / Python / Lisp / Java / PHP / Groovy / NodeJS

… our GOTO languages remain C and Perl but ???

● Back-ends : Unix (not just Linux) and Windows (so some .NET)

● Databases : Both relational and NoSQL (Redis, Mongo, Neo4J)

● Moving into the cloud : AWS (MapReduce, Redshift), Google App Engine

Then along came R …

● At King's in the late 2000s

● Interest was in HPC (mainly CUDA) applied to financial systems.

● Started using Matlab but was looking for a similar package for personal/company use.

● GNU Octave and R both fitted the bill, R won – at the time.

● Looked at (and was impressed by) Python

History

● Gang of “four”:

– Jeff Bezanson, Viral Shah

– Stefan Karpinski, Alan Edelman

● Started at MIT in 2010

● First release February 2012

● Still actively maintained by G4

● MIT using Julia in courses (on YouTube)

What happened to Ada?

● Designed 1977-83 for the US DoD in order to supersede the 100s of languages the DoD used.

● Mandated its use in 1987.

● Dropped the mandate in 1997.

● Still used in air traffic control systems such as iFacts, GNATS.

● Nearest meetup group is in Stockholm.

Runners and Riders

Current field:

1. Runners: Matlab, R, Python

2. Riders: C/C++, Java

3. Outsiders: Scala, Clojure

4. Non-starter: Perl

What makes a good Data Science Language? (1)

● Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.

● Be free, open-source and platform independent.

● Be fast and efficient.

● Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.

● Have a strong type system, and be statically typed with good compile-time type checking and type safety.

● Have reasonable type inference.

● Have a REPL for interactive use

What makes a good Data Science Language? (2)

● Have good tool support - including build tools, doc tools, testing tools, and an intelligent IDE.

● Have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design

● Allow imperative programming for occasions where it makes sense.

● Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.

● Have excellent built-in data capabilities.

● Have comprehensive math and statistical routines.

Comparison with Matlab

● Julia syntax is similar to Matlab but its construction is purposely very different.

● Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.

● The focus on matrices led to some important differences in Matlab's design compared to general-purpose programming languages such as Julia.

● Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
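As a minimal sketch of the last point (not taken from the slides; the variable names are purely illustrative), the following shows Matlab-style matrix literals side by side with list comprehensions:

```julia
# Matlab-style matrix literals: spaces separate columns, semicolons separate rows
A = [1 2; 3 4]

B = [A A]     # horizontal concatenation, giving a 2x4 matrix
C = [A; A]    # vertical concatenation, giving a 4x2 matrix

# List comprehensions build arrays from a generating expression
squares = [i^2 for i in 1:5]              # one-dimensional
table   = [i * j for i in 1:3, j in 1:3]  # two-dimensional (3x3 matrix)
```

The two-index comprehension produces a matrix directly, which is one place where Julia goes beyond the Matlab syntax it borrows.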

Comparison with R

● Origins as open-source clone of S+.

● Still seen as a “statistical” DSL.

● R is single threaded and hard to speed up.

● Introduced the data frame structure which is also present in Julia

● Julia also has an RDatasets package.

● R has very good graphics and data visualisation support.

● Julia has a Google group: julia-stats.

● Julia can call R modules using the Rif package.

Comparison with Python

● Python now seen by many as the Data Science language.

● Strength lies in its community support.

● Modules such as numpy, scipy, matplotlib and pandas are very powerful.

● Speed up using PyPy

● Mature frameworks such as Django

● Julia approach is co-operation not confrontation, via PyCall and also IJulia (IPython)

What makes Julia special?

● It is written in Julia, apart from a small core, and the code is available to look at.

● The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).

● It has been designed for parallelism / distributed computation

● It takes every opportunity to cooperate rather than confront.

● Julia intends to combine the best from MATLAB, R and Python into one language that is to be consistent, well designed and fast.

Special features

• Easy installation

• JIT compilation

• Built-in package manager

• Coroutines and green threads

• Multiple dispatch

• Dynamic type system

• Meta programming with Lisp-like macros

• Call C functions directly

• Call Python functions (PyCall)

• Best-of-breed C and Fortran libraries

• Unicode support
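Multiple dispatch is probably the least familiar item in this list, so here is a minimal sketch of the idea. The types and functions (`Shape`, `Circle`, `Rect`, `area`, `describe`) are hypothetical names invented for illustration, and the syntax assumes Julia 1.x (earlier releases spelled the type declarations differently).

```julia
# Hypothetical types, for illustration only (Julia 1.x syntax)
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rect <: Shape
    w::Float64
    h::Float64
end

# One generic function, with a separate method per argument type
area(c::Circle) = pi * c.r^2
area(r::Rect)   = r.w * r.h

# The method is selected from the types of ALL arguments, not just the first
describe(a::Circle, b::Circle) = "two circles"
describe(a::Rect, b::Rect)     = "two rectangles"
describe(a::Shape, b::Shape)   = "generic shape pair"   # fallback method
```

Calling `describe(Circle(1.0), Rect(1.0, 1.0))` matches neither specialised method and so falls through to the `Shape, Shape` method. New methods can be added later, including from other packages, without touching the type definitions.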

The ones to read …

● Parallel computing

– http://docs.julialang.org/en/latest/manual/parallel-computing

● Metaprogramming

– http://docs.julialang.org/en/latest/manual/metaprogramming

● Networking and streams

– http://docs.julialang.org/en/latest/manual/networking-and-streams

● Calling C and Fortran code

– http://docs.julialang.org/en/latest/manual/calling-c-and-fortran-code

Modules and packages

● Julia has its own built-in package manager

● There are (now) 250+ packages.

● These include:

– Statistics

– Graphics

– System tools

– Database

– Web and Cloud

– Simulation

● It's quite easy to add your own package (via GitHub)

100+ contributors, 1000+ mailing list subscribers, 175+ packages

AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units, ValueDispatch, WAV, WebSockets, Winston, YAML, ZMQ, Zlib

Julia does have graphics!

● Winston (Standard 2D graphics)

● Gadfly (Like 'ggplot2')

● Gaston (Uses gnuplot as graphics engine)

● PyPlot (Uses Python's matplotlib via PyCall)

● Plotly (http://plot.ly/api)

Simulated Stock Market

julia> plothist(randn(100000), 100)

julia> plot(cumsum(randn(10000)))

What’s missing?

● Cached package loading

– At present all modules are compiled on the fly

– Preloading would reduce startup times

● Better database connectivity

– Uses ODBC

– Simple database support via SQLite

– No native Oracle, MySQL or PostgreSQL

● More comprehensive NoSQL support

– Packages for Mongo, Redis.

– JSON package helps with CouchDB, Neo4j

Familiar syntax for Matlab/Octave users

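# Note: in Julia 1.x, trace is spelled tr (LinearAlgebra) and mean/std live in Statistics
function randmatstat(t; n=10)
    v = zeros(t)
    w = zeros(t)
    for i = 1:t
        a = randn(n,n)
        b = randn(n,n)
        c = randn(n,n)
        d = randn(n,n)
        P = [a b c d]      # n x 4n, blocks side by side
        Q = [a b; c d]     # 2n x 2n, blocks in a 2x2 grid
        v[i] = trace((P'*P)^4)
        w[i] = trace((Q'*Q)^4)
    end
    std(v)/mean(v), std(w)/mean(w)
end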

Simulating an Asian Option

using Winston   # provides FramedPlot, Curve, add

S0  = 100;      # Spot price
K   = 102;      # Strike price
r   = 0.05;     # Risk free rate
q   = 0.0;      # Dividend yield
v   = 0.2;      # Volatility
tma = 0.25;     # Time to maturity
T   = 100;      # Number of time steps
dt  = tma/T;    # Time increment

S = zeros(Float64,T); S[1] = S0;
dW = randn(T)*sqrt(dt);

# Step the price forward one time increment at a time
for t = 2:T
    S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t])
end

x = collect(1:T);   # one x value per time step
p = FramedPlot(title = "Random Walk, drift 5%, volatility 20%")
add(p, Curve(x,S,color="red"))
display(p)
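The listing above simulates a single price path and does not go on to price the option. As a rough sketch of how the pricing step might look, the function below repeats the same update over many paths and averages the discounted payoff. It assumes an arithmetic-average call payoff, max(mean(S) - K, 0), which the slides do not state; `asian_call`, `nsteps`, `npaths` and `rng` are hypothetical names, and the syntax targets Julia 1.x.

```julia
using Random, Statistics

# Hypothetical helper (not from the slides): Monte Carlo estimate of an
# arithmetic-average Asian call, using the same per-step update as the
# single-path simulation.
function asian_call(S0, K, r, q, v, tma;
                    nsteps=100, npaths=100_000, rng=MersenneTwister(1234))
    dt = tma / nsteps
    payoffs = zeros(npaths)
    for p in 1:npaths
        S = float(S0)
        total = 0.0
        for t in 1:nsteps
            dW = randn(rng) * sqrt(dt)
            S *= 1 + (r - q - 0.5 * v^2) * dt + v * dW + 0.5 * v^2 * dW^2
            total += S                      # accumulate for the path average
        end
        # Assumed payoff: average price over the path against the strike
        payoffs[p] = max(total / nsteps - K, 0.0)
    end
    return exp(-r * tma) * mean(payoffs)    # discount the mean payoff
end

price = asian_call(100, 102, 0.05, 0.0, 0.2, 0.25)
println("Estimated Asian call price: ", price)
```

Whether the initial spot price is included in the average is a modelling choice; this sketch averages only the simulated steps.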

Random Walk on Julia Studio

Going further …

● Start with the julialang.org website

● Install Julia and read the documentation

● Look at the training material

– http://julialang.org/teaching/

● Try the Julia Studio

● Read/subscribe to the Google Groups lists

– julia-users, julia-stats, julia-opt, julia-dev

● Join the LJuUG

– http://www.meetup.com/London-Julia-User-Group

My Benchmarks

Language       Timing (C = 1)   Asian Option
C              1.0              1.681
Julia          1.41             1.680
Python (v3)    32.67            1.671
R              154.3            1.646
Octave         789.3            1.632

Results for 100,000 runs of 100 steps (C ~ 0.73 s)

Samsung RV711 laptop with an i5 processor and 4 GB RAM running CentOS 6.5 (Final)