Transcript of wcohen/10-605/sgd-mf.pdf

Schedule for near future….

[Schedule table; previous lecture: SGD]

Midterm
•  Will cover all the lectures scheduled through today
•  There are some sample questions up already from previous years – syllabus is not very different for first half of course
•  Problems are mostly going to be harder than the quiz questions
•  Questions often include material from a homework – so make sure you understand a HW if you decided to drop it
•  Closed book and closed internet
•  You can bring in one sheet – 8.5x11 or A4 paper, front and back

Wrap-up on iterative parameter mixing

[Slide: the iterative parameter mixing paper, NAACL 2010]

Recap: Parallelizing perceptrons – take 2

[Diagram: instances/labels are split into example subsets 1–3; each shard computes a local weight vector (w1, w2, w3 – the local vk’s) starting from the previous w; these are combined by some sort of weighted averaging into a new w]


Recap: Iterative Parameter Mixing

Parallel Perceptrons – take 2

Idea: do the simplest possible thing iteratively (see the sketch below).
•  Split the data into shards
•  Let w = 0
•  For n = 1, …
   •  Train a perceptron on each shard with one pass, starting with w
   •  Average the weight vectors (somehow) and let w be that average

Extra communication cost:
•  redistributing the weight vectors
•  done less frequently than if fully synchronized, more frequently than if fully parallelized

All-Reduce
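Below is a minimal sketch of iterative parameter mixing in Python/NumPy, assuming the data is already split into `shards` of `(x, y)` pairs with y in {-1, +1}; the function names and the size-proportional mixing weights are illustrative choices, not code from the lecture.

```python
import numpy as np

def perceptron_pass(shard, w):
    """One pass of the perceptron over a shard, starting from weights w."""
    w = w.copy()
    for x, y in shard:
        if y * np.dot(w, x) <= 0:   # mistake
            w += y * x              # standard perceptron update
    return w

def iterative_parameter_mixing(shards, dim, epochs=10):
    w = np.zeros(dim)                                 # let w = 0
    sizes = np.array([len(s) for s in shards], dtype=float)
    mu = sizes / sizes.sum()                          # one choice of mixing weights
    for _ in range(epochs):
        # in a real system, each call below runs on a different worker
        local = [perceptron_pass(shard, w) for shard in shards]
        w = sum(m * v for m, v in zip(mu, local))     # weighted average
    return w
```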

ALL-REDUCE

Introduction
•  Common pattern (see the sketch below):
   –  do some learning in parallel
   –  aggregate local changes from each processor to shared parameters
   –  distribute the new shared parameters back to each processor
   –  and repeat….
•  AllReduce is implemented in MPI, and also in VW code (John Langford) in a Hadoop-compatible scheme

[Diagram: MAP vs. ALLREDUCE]
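A minimal sketch of the pattern using MPI’s AllReduce via mpi4py (not VW’s Hadoop implementation); `compute_local_update` is a hypothetical stand-in for whatever learning each worker does on its shard.

```python
# Run with e.g.:  mpirun -np 4 python allreduce_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def compute_local_update(w, rank):
    """Hypothetical stand-in for learning on this worker's local shard."""
    rng = np.random.default_rng(rank)
    return -0.01 * rng.standard_normal(w.shape)   # pretend local gradient step

w = np.zeros(100)                       # shared parameters, replicated on every worker
for epoch in range(10):
    local = compute_local_update(w, comm.Get_rank())   # learn in parallel
    total = np.empty_like(local)
    comm.Allreduce(local, total, op=MPI.SUM)           # aggregate local changes
    w += total / comm.Get_size()                       # every worker now holds the same new w
```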


Gory details of VW Hadoop-AllReduce
•  Spanning-tree server:
   –  Separate process constructs a spanning tree of the compute nodes in the cluster and then acts as a server
•  Worker nodes (“fake” mappers):
   –  Input for worker is locally cached
   –  Workers all connect to spanning-tree server
   –  Workers all execute the same code, which might contain AllReduce calls:
      •  Workers synchronize whenever they reach an all-reduce

Hadoop AllReduce

[Slide image; annotation: don’t wait for duplicate jobs]

Second-order method - like Newton’s method


[Slide: results on an ad click-through dataset — 2^24 features, ~100 non-zeros/example, 2.3B examples; each example is a user/page/ad and conjunctions of these, positive if there was a click-thru on the ad]

[Slide: 50M examples; an explicitly constructed kernel → 11.7M features, 3,300 nonzeros/example; old method: SVM, 3 days — reporting the time to get to a fixed test error]

Matrix Factorization


Recovering latent factors in a matrix

[Figure: an n-rows × m-columns matrix V with entries v11 … vij … vnm]

Recovering latent factors in a matrix

[Figure: V (n × m) ≈ W H, where W is n × K with rows (x1, y1) … (xn, yn) and H is K × m with rows (a1 … am) and (b1 … bm); here K = 2]
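One concrete way to produce such factors is a rank-K truncated SVD; the toy NumPy sketch below (illustrative only — the later slides use SGD instead) just shows what “V ≈ W H” means.

```python
import numpy as np

n, m, K = 6, 5, 2
rng = np.random.default_rng(0)
V = rng.random((n, K)) @ rng.random((K, m))   # a matrix that really is rank 2

U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :K] * s[:K]     # n x K: rows play the role of the (x_i, y_i) factors
H = Vt[:K, :]            # K x m: rows play the role of the (a_j), (b_j) factors

print(np.allclose(W @ H, V))   # True: the product reconstructs V
```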

What is this for?

[Figure: the same rank-K factorization V ≈ W H as above]

MF for collaborative filtering

What is collaborative filtering?

[Several slides of examples, shown as images]

Recovering latent factors in a matrix

[Figure: an n-users × m-movies matrix V with entries v11 … vij … vnm; V[i,j] = user i’s rating of movie j]

Recovering latent factors in a matrix

[Figure: the same factorization for ratings — V (n users × m movies) ≈ W H, with user factors (xi, yi) and movie factors (aj, bj); V[i,j] = user i’s rating of movie j]

MF for image modeling


Data: many copies of an image, rotated and shifted (a matrix with one image per row)

Image “prototypes”: a smaller number of row vectors (green = negative)

Reconstructed images: linear combinations of prototypes

MF for images

[Figure: V is 1000 images × 10,000 pixels, V[i,j] = pixel j in image i; factored with 2 prototypes (PC1, PC2), i.e. V ≈ W H with W 1000 × 2 and H 2 × 10,000]

MF for modeling text


•  The Neatest Little Guide to Stock Market Investing
•  Investing For Dummies, 4th Edition
•  The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
•  The Little Book of Value Investing
•  Value Investing: From Graham to Buffett and Beyond
•  Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
•  Investing in Real Estate, 5th Edition
•  Stock Investing For Dummies
•  Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

[Image from the same LSA tutorial; annotation: TFIDF counts would be better]

Recovering latent factors in a matrix

[Figure: the n-documents × m-terms doc–term matrix V, V[i,j] = TFIDF score of term j in doc i, factored as V ≈ W H with document factors (xi, yi) and term factors (aj, bj)]
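A minimal sketch of this text example, assuming scikit-learn is available: build a TFIDF doc–term matrix from a few of the book titles above and factor it with a rank-2 truncated SVD (i.e., LSA). Variable names are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

titles = [
    "The Neatest Little Guide to Stock Market Investing",
    "The Little Book of Value Investing",
    "Rich Dad's Guide to Investing",
    "Investing in Real Estate, 5th Edition",
    "Rich Dad's Advisors: The ABCs of Real Estate Investing",
]

V = TfidfVectorizer().fit_transform(titles)   # n docs x m terms, V[i,j] = TFIDF score
svd = TruncatedSVD(n_components=2)
W = svd.fit_transform(V)                      # n x 2 document factors (x_i, y_i)
H = svd.components_                           # 2 x m term factors (a_j, b_j)

# Titles about the same topic (real estate vs. stock market) end up close
# together in the 2-D latent space given by the rows of W.
print(W)
```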


[Figure (two slides): the book titles plotted in the 2-D latent space — real-estate titles (“Investing for Real Estate”, “Rich Dad’s Advisors: The ABCs of Real Estate Investing …”) are grouped together, as are stock-market titles (“The Little Book of Common Sense Investing …”, “The Neatest Little Guide to Stock Market Investing”)]

MF is like clustering


k-means as MF

[Figure: k-means as MF — the original data set X (n examples) ≈ Z M, where Z is an n × r matrix of 0/1 indicators for the r clusters and M holds the cluster means]
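A toy NumPy sketch of this view, assuming the cluster assignments are already known: the reconstruction Z @ M replaces each example by its cluster mean, and the squared reconstruction error is exactly the k-means objective for that assignment.

```python
import numpy as np

X = np.array([[0.0, 0.1], [0.2, 0.0],    # two points near (0, 0)
              [5.0, 5.1], [4.9, 5.0]])   # two points near (5, 5)
assign = np.array([0, 0, 1, 1])          # cluster of each example (given)

r = 2
Z = np.eye(r)[assign]                    # n x r matrix of 0/1 cluster indicators
M = np.vstack([X[assign == k].mean(axis=0) for k in range(r)])   # cluster means

print(Z @ M)                             # each row replaced by its cluster mean
print(np.linalg.norm(X - Z @ M) ** 2)    # the k-means objective for this Z, M
```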

How do you do it?

[Figure: the rank-K factorization V ≈ W H again]

talk pilfered from → …..

KDD 2011

Recovering latent factors in a matrix

[Figure: V (n users × m movies) ≈ W H with rank r; W holds the user factors, H the movie factors; V[i,j] = user i’s rating of movie j]

[Several image-only slides, including an example of MF for image denoising]

Matrix factorization as SGD

[Slide: the SGD algorithm for MF, with annotations pointing out the step size and asking “why does this work?”]

Matrix factorization as SGD — why does this work? Here’s the key claim:

[Slide: the key claim, shown as equations]

Checking the claim
•  Think of SGD for logistic regression
   –  the LR loss compares y and ŷ = dot(w, x)
   –  MF is similar, but now we update both w (the user weights) and x (the movie weights) — see the sketch below
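A minimal sketch of that update for squared loss on a single observed rating V[i, j] (no regularization, which the actual loss may include); the prediction is dot(W[i], H[:, j]), and both the user factors and the movie factors get a gradient step.

```python
import numpy as np

def sgd_step(V, W, H, i, j, lr=0.01):
    """One SGD step on the single entry V[i, j] with squared loss."""
    err = V[i, j] - W[i, :] @ H[:, j]   # residual for this one rating
    w_old = W[i, :].copy()              # keep the old user factors
    W[i, :] += lr * err * H[:, j]       # update user i's factors
    H[:, j] += lr * err * w_old         # update movie j's factors

def sgd_epoch(V, observed, W, H, lr=0.01):
    """One pass over the observed (i, j) entries in random order."""
    for i, j in np.random.permutation(observed):
        sgd_step(V, W, H, i, j, lr)
```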

What loss functions are possible?
•  N1, N2: diagonal matrices, sort of like IDF factors for the users/movies
•  “generalized” KL-divergence (written out below)
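For reference, the “generalized” KL-divergence usually meant here is the standard NMF form, comparing V to the reconstruction WH entry by entry (written from the standard definition, not copied from the slide):

```latex
D(V \,\|\, WH) \;=\; \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} \;-\; V_{ij} \;+\; (WH)_{ij} \right)
```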

[Two more slides: “What loss functions are possible?” — equations only]

ALS = alternating least squares

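A minimal ALS sketch for the fully observed, unregularized case, alternately solving a least-squares problem for W with H fixed and for H with W fixed; real recommender ALS sums only over observed entries and adds a ridge penalty.

```python
import numpy as np

def als(V, K, iters=20, seed=0):
    n, m = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((n, K))
    H = rng.random((K, m))
    for _ in range(iters):
        # W-step: minimize ||V - W H||^2 over W with H fixed
        W = np.linalg.lstsq(H.T, V.T, rcond=None)[0].T
        # H-step: minimize ||V - W H||^2 over H with W fixed
        H = np.linalg.lstsq(W, V, rcond=None)[0]
    return W, H
```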

talk pilfered from → …..

KDD 2011


More detail….
•  Randomly permute the rows/cols of the matrix
•  Chop V, W, H into blocks of size d × d
   –  m/d blocks in W, n/d blocks in H
•  Group the data:
   –  Pick a set of blocks with no overlapping rows or columns (a stratum)
   –  Repeat until all blocks in V are covered
•  Run the SGD training (see the sketch below):
   –  Process strata in series
   –  Process blocks within a stratum in parallel
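A sketch of the stratum scheduling only (taking d to be the number of blocks per dimension, as in the KDD 2011 paper); the per-block SGD is stubbed out, and a real implementation would update the corresponding blocks of W and H in place rather than on copies.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_block(V_block, W_block, H_block):
    """Hypothetical stand-in for running SGD on one block."""
    pass

def dsgd_epoch(V, W, H, d):
    n, m = V.shape
    row_blocks = np.array_split(np.arange(n), d)   # row indices of each block row
    col_blocks = np.array_split(np.arange(m), d)   # column indices of each block column
    for s in range(d):                             # process strata in series
        # stratum s: blocks (b, (b + s) mod d) share no rows or columns
        stratum = [(b, (b + s) % d) for b in range(d)]
        with ThreadPoolExecutor() as pool:         # blocks within a stratum in parallel
            for bi, bj in stratum:
                pool.submit(train_block,
                            V[np.ix_(row_blocks[bi], col_blocks[bj])],
                            W[row_blocks[bi], :],
                            H[:, col_blocks[bj]])
```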

More detail….

[Slide: the DSGD pseudocode from the paper; note: the paper’s Z is our V]

More detail….
•  Initialize W, H randomly
   –  not at zero ☺
•  Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
•  Pick the strata sequence by permuting the rows and columns of M, and using M’[k,i] as the column index of row i in sub-epoch k
•  Use “bold driver” to set the step size (see the sketch below):
   –  increase the step size when the loss decreases (in an epoch)
   –  decrease the step size when the loss increases
•  Implemented in Hadoop and R/Snowfall

[Figure: M = the strata-scheduling matrix]
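A tiny sketch of the bold-driver rule; the 1.05 and 0.5 factors are illustrative, not from the slides.

```python
def bold_driver(step, prev_loss, curr_loss, grow=1.05, shrink=0.5):
    """Adjust the SGD step size once per epoch based on the change in loss."""
    if curr_loss < prev_loss:
        return step * grow      # loss decreased: be bolder
    return step * shrink        # loss increased: back off
```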

[Plot: wall clock time — 8 nodes, 64 cores, R/snow]


[Plot: number of epochs]


[Plot: varying the rank, 100 epochs for all]

Hadoop scalability

[Plots: Hadoop process setup time starts to dominate]
