My BITE Fellowship Edward Challis. This is a picture of me:

This is my background:

• Postdoc in machine learning with David Barber at UCL.

• Short 6 month postdoc in using machine learning methods to detect disease in fMRI scans.

• PhD in scalable approximate inference for Bayesian linear models with David Barber at CSML.

• 2 years working in finance.

• MSc in Artificial Intelligence.

• Maths degree.

• School and childhood.

Time…

Machine Learning?

Machine Learning is the study of algorithms that use data to improve their ability to perform some task.

[Figure: three panels labelled Iteration 1, Iteration 2 and Iteration 3]
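As a rough illustration of "using data to improve at a task", here is a minimal sketch of gradient descent fitting a single weight. The toy data, the learning rate and the function names are all assumptions made for this example; they are not taken from the slides.

```python
# Toy data generated from y = 2x (an assumption, purely for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mean_squared_error(w, xs, ys):
    """Task performance: how badly does y = w * x fit the data?"""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(w, xs, ys, learning_rate=0.05):
    """Use the data to nudge w in the direction that lowers the error."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - learning_rate * grad


w = 0.0  # initial guess
losses = [mean_squared_error(w, xs, ys)]
for iteration in range(1, 4):
    w = gradient_step(w, xs, ys)
    losses.append(mean_squared_error(w, xs, ys))
    print(f"Iteration {iteration}: w = {w:.3f}, error = {losses[-1]:.3f}")
```

Each pass over the data lowers the error, which is the sense in which the algorithm "improves" with iterations.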

BITE Fellowship

Received an email, liked the idea, started looking for good places to do the internship.

Martin Goodson

Skimlinks

Skimlinks’ Data

• Skimlinks collects loads of data:

  • Links to products on publisher pages.

  • Clicks on links in pages.

  • Purchases of products from product links.

• But in its raw form this data looks like:

Data processing and machine learning

How do we convert raw log files into understandable concepts such as products, topics, themes and intents?

The primary problem is that the datasets are extremely large: 1 TB+ for most sub-problems.
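To give a flavour of the first step from raw logs to concepts, here is a minimal sketch that parses log lines into per-product click and purchase counts. The real Skimlinks log format is not shown in the slides, so the tab-separated layout, the example URLs and the function names below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw log lines: timestamp, event type, product URL (tab-separated).
# The real log format is not shown in the slides; this layout is an assumption.
RAW_LOG = [
    "2015-01-01T10:00:00\tclick\thttp://shop.example/red-shoes",
    "2015-01-01T10:00:05\tclick\thttp://shop.example/red-shoes",
    "2015-01-01T10:01:00\tpurchase\thttp://shop.example/red-shoes",
    "2015-01-01T10:02:00\tclick\thttp://shop.example/blue-hat",
    "malformed line without tabs",
]


def parse_line(line):
    """Turn one raw line into a record, or None if it is malformed."""
    parts = line.split("\t")
    if len(parts) != 3:
        return None
    timestamp, event_type, url = parts
    return {"timestamp": timestamp, "event": event_type, "url": url}


def summarise(lines):
    """Count clicks and purchases per product URL, skipping bad lines."""
    counts = defaultdict(lambda: {"click": 0, "purchase": 0})
    for line in lines:
        record = parse_line(line)
        if record is None or record["event"] not in ("click", "purchase"):
            continue
        counts[record["url"]][record["event"]] += 1
    return dict(counts)


summary = summarise(RAW_LOG)
for url, events in summary.items():
    print(url, events)
```

On a single machine this is trivial; the difficulty described above comes from doing the same thing when the log no longer fits on one machine.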

What I worked on:

• Because the datasets are so large, you can only work on them using cluster computing.

• Skimlinks is a leader in the UK in the adoption of the Apache Spark cluster computing framework.

• My secondment at Skimlinks focused on implementing and applying machine learning algorithms on large clusters of large machines running Apache Spark to extract meaningful information from log files.
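One common way machine learning algorithms are adapted to a cluster is to have each machine compute a partial result on its own slice of the data and then add the partial results together. The sketch below imitates that pattern in plain Python for a one-weight linear model. It is not Spark code and not the method used at Skimlinks; the partitions, toy data and function names are assumptions made for illustration.

```python
from functools import reduce

# Toy dataset split into "partitions", standing in for data spread over machines.
# Points are (x, y) pairs generated from y = 2x (an assumption for illustration).
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(4.0, 8.0)],
]


def partial_gradient(w, partition):
    """Map step: one partition returns (sum of gradients, number of points)."""
    grad_sum = sum(2 * (w * x - y) * x for x, y in partition)
    return grad_sum, len(partition)


def combine(a, b):
    """Reduce step: add two partial results; the order does not matter."""
    return a[0] + b[0], a[1] + b[1]


def distributed_gradient(w, partitions):
    """Average gradient over all points, computed partition by partition."""
    partials = (partial_gradient(w, p) for p in partitions)
    grad_sum, count = reduce(combine, partials)
    return grad_sum / count


w = 0.0
for _ in range(20):
    w -= 0.05 * distributed_gradient(w, partitions)
print(f"learned weight: {w:.4f}")
```

Because the reduce step is a plain sum, only two numbers per partition need to travel between machines, however large each partition is. On a real cluster the map and reduce steps would be run by the framework rather than by a local loop.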

Why this was great for me:

• In academia it's hard to get your hands on such large and interesting datasets.

• Skimlinks works on the largest datasets of any startup in London I know of.

• Distributed computing is the future.

• Adapting my ML knowledge into this domain is fascinating and challenging.

• Real problems are hard and subtle. Experience is required to solve them.

Why this was great for them:

• Skimlinks is always looking for ways to improve the ‘intelligence’ of its products. My ML knowledge helped the team approach and solve some of their hard problems.

• During the secondment we built systems that effectively processed TBs of data into useful and interpretable knowledge about products and content.

Future plans

• My relationship with Skimlinks will continue. I’m now working with them part-time as a Data Scientist.

• This year we plan to publish papers on the methods we’ve developed.

• Skimlinks are UK experts in Apache Spark – I want to increase my expertise in this domain and do further research into ML on Spark.

• Skimlinks have deepened their relationship with CSML – new masters and postdoc projects are in the pipeline.

Thank you BITE!! Especially: Ryan for being so helpful throughout, Prof Izzat Darwazeh for making all this happen, and Skimlinks + Martin Goodson for being great hosts.