Amis Consulting LLP
1977-1981 : Research in Combustion/Fluids.
1983-1991 : Scientific computing / image processing
1992-1997 : UK Healthcare / Imperial College
1997-2003 : Dotcom Boom (and bust) !!!
2003- : Financial Systems.
Currently involved in High Performance Computing and (of course) Big Data, as well as all the other stuff.
Worked with a variety of technologies
● Languages (in anger) : Fortran / C / Ada / Perl / Python / Lisp / Java / PHP / Groovy / NodeJS
… our GOTO languages remain C and Perl but ???
● Back-ends : Unix (not just Linux) and Windows (so some .NET)
● Databases : Both relational and NoSQL (Redis, Mongo, Neo4j)
● Moving into the cloud : AWS (Map-reduce, Redshift), Google App Engine
Then along came R …
● At King's in the late 2000s
● Interest was in HPC (mainly CUDA) applied to financial systems.
● Started using Matlab but was looking for a similar type of package for personal/company usage.
● GNU Octave and R both fitted the bill; R won, at the time.
● Looked at (and was impressed by) Python
History
● Gang of “four”:
– Jeff Bezanson, Viral Shah
– Stefan Karpinski, Alan Edelman
● Started at MIT in 2010
● First release February 2012
● Still actively maintained by G4
● MIT using Julia in courses (on YouTube)
What happened to Ada?
● Designed 1977/83 for the US DoD in order to supersede the hundreds of languages the DoD used.
● Its use was mandated in 1987.
● The mandate was dropped in 1997.
● Still used in air traffic control systems such as iFacts, GNATS.
● Nearest meetup group is in Stockholm.
Runners and Riders
Current field:
1. Runners: Matlab, R, Python
2. Riders: C/C++, Java
3. Outsiders: Scala, Clojure
4. Non-starter: Perl
What makes a good Data Science Language? (1)
● Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.
● Be free, open-source and platform independent.
● Be fast and efficient.
● Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.
● Have a strong type system, and be statically typed with good compile-time type checking and type safety.
● Have reasonable type inference.
● Have a REPL for interactive use
What makes a good Data Science Language? (2)
● Have good tool support - including build tools, doc tools, testing tools, and an intelligent IDE.
● Have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design
● Allow imperative programming for occasions where it makes sense.
● Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.
● Have excellent built-in data capabilities.
● Have comprehensive math and statistical routines.
Comparison with Matlab
● Julia syntax is similar to Matlab but its construction is purposely very different.
● Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.
● The focus on matrices led to some important differences in Matlab's design compared to general-purpose programming languages such as Julia.
● Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
Comparison with R
● Origins as open-source clone of S+.
● Still seen as a “statistical” DSL.
● R is single threaded and hard to speed up.
● Introduced the data frame structure which is also present in Julia
● Julia also has an RDatasets package.
● R has very good graphic and data visualisation support.
● Julia has a Google group: julia-stats.
● Julia can call R modules using the Rif package.
Comparison with Python
● Python now seen by many as the Data Science language.
● Strength lies in its community support.
● Modules such as numpy, scipy, matplotlib and pandas are very powerful.
● Speed up using PyPy
● Mature frameworks such as Django
● Julia's approach is co-operation, not confrontation, via PyCall and also IJulia (IPython)
What makes Julia special?
● It is written in Julia, apart from a small core, and the code is available to look at.
● The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).
● It has been designed for parallelism / distributed computation
● It takes every opportunity to cooperate rather than confront.
● Julia intends to combine the best from MATLAB, R and Python into one language that is to be consistent, well designed and fast.
Special features
• Easy installation
• JIT compilation
• Built-in package manager
• Coroutines and green threads
• Multiple dispatch
• Dynamic type system
• Meta programming with Lisp-like macros
• Call C functions directly
• Call Python functions: (PyCall)
• Best-of-breed C and Fortran libraries
• Unicode support
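Multiple dispatch is probably the least familiar item in the list above: the implementation that runs is chosen from the types of all the arguments, not just the first. The toy below emulates the idea in Python (one of the languages compared in this talk) purely as an illustration. The names `MultiMethod` and `combine` are hypothetical, and the lookup matches exact types only, whereas Julia does this natively and also takes subtypes into account.

```python
class MultiMethod:
    """Toy multiple dispatch: pick an implementation from ALL argument types.

    Illustration only. Lookup is by exact type, with no subtype matching.
    """

    def __init__(self, name):
        self.name = name
        self.table = {}  # maps a tuple of argument types to an implementation

    def register(self, *types):
        def decorator(func):
            self.table[types] = func
            return func
        return decorator

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        try:
            impl = self.table[key]
        except KeyError:
            names = ", ".join(t.__name__ for t in key)
            raise TypeError(f"no method {self.name}({names})") from None
        return impl(*args)


combine = MultiMethod("combine")


@combine.register(int, int)
def _combine_int_int(a, b):
    return a + b          # two numbers: add them


@combine.register(str, int)
def _combine_str_int(s, n):
    return s * n          # text and a count: repeat the text


@combine.register(int, str)
def _combine_int_str(n, s):
    return s * n          # same operation, arguments the other way round


if __name__ == "__main__":
    print(combine(2, 3))
    print(combine("ab", 2))
    print(combine(2, "ab"))
```

The point of the design is that `combine(2, "ab")` and `combine("ab", 2)` reach different implementations even though the function name is the same, which single dispatch on the first argument alone cannot express.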
The ones to read …
● Parallel computing
– http://julia.readthedocs.org/en/latest/manual/parallel-computing
● Metaprogramming
– http://docs.julialang.org/en/latest/manual/metaprogramming
● Networking and streams
– http://docs.julialang.org/en/latest/manual/networking-and-streams
● Calling C and Fortran code
– http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code
Modules and packages
● Julia has its own built-in package manager
● There are (now) 250+ packages.
● These include:
– Statistics
– Graphics
– System tools
– Database
– Web and Cloud
– Simulation
● It's quite easy to add your own package (via GitHub)
100+ contributors, 1000+ mailing list subscribers, 175+ packages
AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units, ValueDispatch, WAV, WebSockets, Winston, YAML, ZMQ, Zlib
Julia does have graphics!
● Winston (Standard 2D graphics)
● Gadfly (Like 'ggplot2')
● Gaston (Uses gnuplot as graphics engine)
● PyPlot (Uses IPython/matplotlib.py)
● Plotly (http://plot.ly/api)
Simulated Stock Market
julia> plothist(randn(100000), 100)
julia> plot(cumsum(randn(10000)))
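The two REPL commands on this slide draw a histogram of normal random numbers and a random walk built as their cumulative sum. For readers coming from Python, the sketch below shows the same two computations using only the standard library, with no plotting. The function names `random_walk` and `histogram` are hypothetical helpers rather than part of any package named in the talk, and `histogram` assumes the values are not all identical.

```python
import random
from itertools import accumulate


def random_walk(n_steps, seed=None):
    """Cumulative sum of n_steps standard-normal increments."""
    rng = random.Random(seed)
    steps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    return list(accumulate(steps))


def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        # Clamp the index so that the maximum value lands in the last bin.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [rng.gauss(0.0, 1.0) for _ in range(100000)]
    counts = histogram(draws, 100)      # data behind the histogram plot
    walk = random_walk(10000, seed=1)   # data behind the "stock market" plot
    print(len(counts), len(walk))
```

The bin counts and the walk are the data the two plots would display; passing a seed makes a run repeatable.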
What’s missing?
● Cached package loading
– At present all modules are compiled on the fly
– Preloading would reduce startup times
● Better database connectivity
– Uses ODBC
– Simple d/b support via SQLite
– No native Oracle, MySQL or PostgreSQL
● More comprehensive NoSQL support
– Packages for Mongo, Redis.
– JSON package helps with CouchDB, Neo4j
Familiar syntax for Matlab/Octave users
function randmatstat(t; n=10)
    v = zeros(t)
    w = zeros(t)
    for i = 1:t
        a = randn(n,n)
        b = randn(n,n)
        c = randn(n,n)
        d = randn(n,n)
        P = [a b c d]
        Q = [a b; c d]
        v[i] = trace((P'*P)^4)
        w[i] = trace((Q'*Q)^4)
    end
    std(v)/mean(v), std(w)/mean(w)
end
Simulating an Asian Option
using Winston   # assumed source of FramedPlot, Curve and add

S0 = 100;       # Spot price
K = 102;        # Strike price
r = 0.05;       # Risk free rate
q = 0.0;        # Dividend yield
v = 0.2;        # Volatility
tma = 0.25;     # Time to maturity
T = 100;        # Number of time steps
dt = tma/T;     # Time increment

S = zeros(Float64,T); S[1] = S0;
dW = randn(T)*sqrt(dt);
# Step the price forward one time increment at a time
for t = 2:T
    S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t])
end
x = linspace(1, T, T);   # one x value per time step
p = FramedPlot(title = "Random Walk, drift 5%, volatility 20%")
add(p, Curve(x,S,color="red"))
display(p)
Random Walk on Julia Studio
Going further …
● Start with the julialang.org website
● Install Julia and read the documentation
● Look at the training material
– http://julialang.org/teaching/
● Try the Julia Studio
● Read/subscribe to Google-groups sites
– julia-users, julia-stats, julia-opt, julia-dev
● Join the LJuUG
– http://www.meetup.com/London-Julia-User-Group
My Benchmarks
Language       Timing (c = 1)   Asian Option
c              1.0              1.681
julia          1.41             1.680
python (v3)    32.67            1.671
R              154.3            1.646
Octave         789.3            1.632
Results for 100,000 runs of 100 steps, (c ~ 0.73 s)
Samsung RV711 laptop with an i5 processor and 4 GB RAM running CentOS 6.5 (Final)
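The benchmark code itself is not shown in these slides. As a rough idea of what each run involves, here is a minimal Python sketch that reuses the parameters and the update step from the "Simulating an Asian Option" slide. The payoff definition (a discounted arithmetic-average call) and the function name `asian_call_price` are assumptions made for illustration, not the author's benchmark code.

```python
import math
import random


def asian_call_price(n_paths, s0=100.0, k=102.0, r=0.05, q=0.0, v=0.2,
                     tma=0.25, steps=100, seed=None):
    """Monte Carlo estimate of an arithmetic-average Asian call price.

    Assumed payoff: max(mean(path) - k, 0), discounted at rate r over tma.
    """
    rng = random.Random(seed)
    dt = tma / steps
    sqrt_dt = math.sqrt(dt)
    drift = (r - q - 0.5 * v * v) * dt
    total_payoff = 0.0
    for _ in range(n_paths):
        s = s0
        path_sum = s                 # the path starts at the spot price
        for _ in range(steps - 1):
            dw = rng.gauss(0.0, 1.0) * sqrt_dt
            # Same update as the slide: drift, diffusion and dW^2 correction
            s *= 1.0 + drift + v * dw + 0.5 * v * v * dw * dw
            path_sum += s
        average = path_sum / steps
        total_payoff += max(average - k, 0.0)
    return math.exp(-r * tma) * total_payoff / n_paths


if __name__ == "__main__":
    # Far fewer paths than the 100,000 used for the benchmark table
    print(asian_call_price(2000, seed=1))
```

The doubly nested loop with scalar arithmetic is the kind of code where interpreted languages tend to lose time, which would be consistent with the spread in the timing column.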