Reading Group: Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features
Presenter: Xiang Zhang
02/05/2023
Main Content
1. Background
2. Abstract
3. Why choose it
4. What's the main idea
5. How it works
6. Implementation
7. Experimentation
8. Conclusions
9. Experience
1. Background
Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features
Authors: Microsoft Research
Keywords: neural networks, deep learning, CNN, etc.
Conference: KDD 2016
KDD (rank A*): Conference on Knowledge Discovery and Data Mining
SIGKDD: Special Interest Group on Knowledge Discovery and Data Mining
KDD 2017: Aug. 13-17, Halifax, Canada; submission deadline December 9, 2016
2. Abstract
Combinatorial features: important and useful, but manual crafting is time-consuming, requires experience, and gives no accuracy guarantee, especially given the large scale, variety, and volume of features
Contribution: a deep neural network that combines features automatically
Tools: Computational Network Toolkit (CNTK), on a multi-GPU platform
3. Why choose it
Automatically combines features (reducing dimensionality)
Web-scale (massive data)
Handles features of different types and dimensions
Better performance
4. What's the main idea
What is feature extraction?
Individual features
An individual measurable property of a phenomenon being observed (a representation of the data)
Combinatorial features
Defined in the joint space of individual features; they give the model shorter training time, a simpler structure, and better generalization
Manually: requires time and experience
Automatically
5. How it works
Model Architecture
Embedding layer j: X_j^O = max(0, W_j · X_j^I + b_j)
W_j is an m_j × n_j weight matrix, X_j^I is the n_j × 1 input, and b_j and X_j^O are m_j × 1 vectors; with m_j < n_j, the layer reduces the feature dimension from n_j to m_j.
5.1 Embedding layers
X_j^O = max(0, W_j · X_j^I + b_j)
Activation function: defines the output of a node given its input
Common choices: ReLU, logistic, tanh, sigmoid
Rectified linear unit (ReLU): keeps all elements non-negative
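The embedding layer above can be sketched in a few lines of NumPy. The feature size (10,000), embedding size (256), and random weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def embedding_layer(x, W, b):
    """Per-feature embedding: X_O = max(0, W @ X_I + b).

    W has shape (m_j, n_j) with m_j < n_j, so the sparse input x
    (length n_j) is projected down to a dense vector of length m_j,
    and the ReLU keeps every output element non-negative.
    """
    return np.maximum(0.0, W @ x + b)

# Hypothetical sizes: a 10,000-dim one-hot feature embedded into 256 dims.
rng = np.random.default_rng(0)
n_j, m_j = 10_000, 256
W = rng.normal(scale=0.01, size=(m_j, n_j))
b = np.zeros(m_j)
x = np.zeros(n_j)
x[42] = 1.0                    # one-hot input
emb = embedding_layer(x, W, b)
print(emb.shape)               # (256,)
```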
Stacking: X^O = [X_0^O, X_1^O, …, X_K^O]
5.2 Stacking layers
Stacking rules (threshold: 256):
• If n_j > 256: embed first (m_j set to 256), then stack
• If n_j ≤ 256: stack directly, without embedding
Feature counts: K input features in total; n are embedded and then stacked, K − n are stacked without embedding, and all K end up stacked
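The stacking rule above can be sketched as follows; the per-feature sizes and the random embedding weights are made-up illustrations, and only the 256 threshold comes from the slides:

```python
import numpy as np

THRESHOLD = 256  # features larger than this are embedded before stacking

def stack_features(features, embeddings):
    """Concatenate per-feature vectors into one stacked vector X^O."""
    stacked = []
    for j, x in enumerate(features):
        if len(x) > THRESHOLD:        # high-dim: embed, then stack
            W, b = embeddings[j]
            x = np.maximum(0.0, W @ x + b)
        stacked.append(x)             # low-dim: stack directly
    return np.concatenate(stacked)

rng = np.random.default_rng(0)
feats = [rng.random(10_000), rng.random(50)]  # one large, one small feature
embs = {0: (rng.normal(scale=0.01, size=(256, 10_000)), np.zeros(256))}
x_stacked = stack_features(feats, embs)
print(x_stacked.shape)                         # (306,) = 256 + 50
```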
X^{OR} = F(X^{IR}, {W_0, W_1}, {B_0, B_1}) + X^{IR}
5.3 Residual layers
• Inputs and outputs have the same size
• This is among the first uses of the Residual Unit beyond image recognition
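A minimal sketch of the residual unit, assuming F is two fully connected ReLU layers whose output matches the input size so the shortcut can be added back; the sizes and weights are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_unit(x, W0, b0, W1, b1):
    """Residual unit sketch: output = ReLU(F(x) + x).

    Because F maps back to the input size, the input x can be
    added to F(x) directly (the shortcut connection).
    """
    h = relu(W0 @ x + b0)   # first inner layer
    f = W1 @ h + b1         # second inner layer, back to input size
    return relu(f + x)      # add the shortcut, then apply ReLU

rng = np.random.default_rng(0)
d, hidden = 306, 128
W0, b0 = rng.normal(scale=0.05, size=(hidden, d)), np.zeros(hidden)
W1, b1 = rng.normal(scale=0.05, size=(d, hidden)), np.zeros(d)
x = rng.random(d)
y = residual_unit(x, W0, b0, W1, b1)
print(y.shape)              # (306,) — same size as the input
```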
5.4 Scoring layers
Sigmoid function applied to the final residual output X^{IR}: σ(x) = 1 / (1 + e^{−x})
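The scoring layer's sigmoid can be written directly; this is a generic sketch of the function itself, not code from the paper:

```python
import math

def sigmoid(z):
    """Scoring-layer squashing: map a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5
```

Larger scores map monotonically toward 1, so the output can be read as a click probability.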
logloss = −(1/N) Σ_{i=1}^{N} ( y_i · log(p_i) + (1 − y_i) · log(1 − p_i) )
5.5 Objective function
Objective function: the loss function, or its negative (when maximizing)
N: number of samples; y_i: label of sample i; p_i: model prediction for sample i
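The log-loss objective above can be checked with a tiny sketch; the two-sample input is a made-up example:

```python
import math

def log_loss(y, p):
    """Log loss: -(1/N) * sum(y_i*log(p_i) + (1-y_i)*log(1-p_i))."""
    n = len(y)
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / n

# Confident, correct predictions give a small loss.
print(round(log_loss([1, 0], [0.9, 0.1]), 4))   # 0.1054
```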
5.6 Early Crossing vs. Late Crossing
Deep Crossing: crosses features early
DSSM (Deep Semantic Similarity Model): crosses features late
6. Implementation
Software: Computational Network Toolkit (CNTK)
Built on the same theoretical foundation as TensorFlow
Hardware: multi-GPU platform
Training time drops from 24 days (1 GPU) to 20 hours (32 GPUs)
7. Experimentation
7.1 Dataset
7.2 Performance on a Pair of Text Inputs
DSSM: late crossing
DP: early crossing
Deep Crossing (DC) outperforms DSSM
Production model: a model used in sponsored search, taken as the baseline
Performance: DSSM < DC < Production
DC's main advantage: it can handle many individual features
7.3 Beyond Text Input
Using all features together works best
The counting feature alone is weak
The counting feature is useful when combined with other features
Performance varies considerably with the number of features, and log loss fluctuates widely across feature combinations, so feature selection is meaningful.
7.4 Comparison with Production Models
The production model is trained on 2.2 billion samples
DC performs better with a much smaller dataset
DC is easier to build and maintain
8. Conclusions
Deep Crossing works well at combining features automatically at large scale
It requires less time and less hand-crafting experience
9. Experience
• Deep learning models (LSTM, CNN, etc.) can also extract features automatically; their efficiency could be compared with this model
• Raw data could be used for training instead of hand-picked individual features
• The approach could be applied in other domains, such as mobile sensing and recommender systems
Thanks!
Questions?