Quantization of Social Data for Friend Advertisement Recommendation System

Lynne Grewe and Sushmita Pandey, California State University East Bay, [email protected]


Goal
Using social data to make social advertisement recommendations.
[Diagram: Social Network (user and friends), Advertisements, and PPARS (social network application). Sample output: "Your friends Nathan and Marty will like this"]

The Problems
What is the social data?
Which social data is usable/best?
How do we capture and analyze it?
How do we relate social data to advertisements?
How do we deliver a social advertisement?

The Environment
Social networks: MySpace, Facebook, Hi5, Orkut, LinkedIn, Netlog, and more

Overview of Talk
PPARS overview
Data problem of multiple networks
Example of data
Parsing
Quantization
Results
Advertisement recommendation results
Future work

Our System Overview
PPARS = Peer Pressure Advertisement Recommendation System
[Diagram: data input; front end (get user and friends quantized); process groups; peer pressure ad selection (group/ad matches and socialize); model ads; quantized ads; user ad choice; user-origin]

Social Data
Every network can provide different social data.
Two main splits: Facebook and OpenSocial (the majority of the others).

OpenSocial is an open standard adopted by over 30 containers and growing, with an international audience.
It allows standardized access.
Popular containers include MySpace, LinkedIn, Google, Yahoo!, etc.
Corporate support comes from Google, Yahoo!, IBM, Microsoft, and more.

Data Fields
About Me, Activities, Addresses, Age, Body_type, Books, Cars, Children, Current_Location, Date_Of_Birth, Drinker, Emails, Ethnicity, Fashion, Food, Gender, Happiest_when, Has_app, Heroes, Humor, ID, Interests, Job_interests, Jobs, Languages_Spoken, Living_Arrangement, Looking_for, Movies, Music, Name, Network Presence, Nick Name, Pets, Phone, Political Views, Profile song, Profile url, Profile video, Quotes, Relationship status, Religion, Romance, Scared Of, Schools, Sexual Orientation, Sports, Status, Tags, Thumbnail Url, Time Zone, Turn Ons, Turn Offs, TV Shows, URLs

Some Example Data
About Me: Ok, so I am a graduate of with degrees in Philosophy, and Religion. I currently live in with my wife and daughter. I enjoy Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends.

Some Example Data
Age: 33
Books: The Professor and the Madman, Plato, Aristotle, Locke, Hume, Kant, luscombe
Movies: Things to do in Denver when yer dead, The Departed, Encino Man, Real Genius
Music: Very Eclectic, including Pennywise, Disturbed, System of a Down, Linkin Park, Senses Fail, Mudvayne, Goldfinger, and a bunch of others I am sure I cannot remember at this time
Music: allen to // chimaira // sw1tched // bleed the sky // destiny // 40 below summer // endo // nothingface // enhancer // watcha // lamb of god // soilwork // skrape // flaw // unearth // slodust // deftones // raunchy // devildriver // reveille // american head charge // nonpoint // stutterfly // factory 81 // in flames // (hed) p.e. // dry kill logic // primer 55 // 36 crazyfists // sevendust // taproot // candiria // bionic jive // funeral for a friend // .....
Television: Smallville, heros

Some Example Data
Interests: Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends.

Some Example Data
Status: Married
Status: In a Relationship
Smoker: No
Drinker: Yes
Heroes: Father
Heroes: Freie Stelle als Held zu vergeben, Bewerbungen bitte an mich... [German: "Hero position vacant, please send applications to me..."]
Looking_for: Networking , Friends
Ethnicity: White / Caucasian
Children: Proud parent
Sexual_Orientation: Straight

Some Example Data
Schools:
University Of Nevada-Reno, Reno, NV. Graduated: N/A. Degree: Master's Degree. Major: Hydrogeology.

2007 to Present
Purdue University-Main Campus, West Lafayette, Indiana. Graduated: 2003. Student status: Alumni. Degree: Bachelor's Degree. Major: Philosophy. Minor: CPT. Clubs: Purdue Student Government Liberal Arts Student Council. Greek: Delta Chi.
2001 to 2003
Reed Hs, Sparks, NV. Graduated: N/A. Student status: Alumni. Degree: High School Diploma.

Social Data: which?
Not all networks provide access to the same data.
Users can keep information private.
Not all data is social.
Not all data is directly useful for advertisers.

Data
Current_Location, Date_Of_Birth, Addresses, Phone: not typically available / private
Not all data is social
Not all data is directly useful for advertisers
ID, Name, Has_app, Nick_Name, Network Presence, Profile url, Profile song, Profile video, Thumbnail URL, URLs
Drinker, Emails, Ethnicity, Fashion, Food

Infrequent data
For our scheme we need data that users have in common, so that we can reason over a common feature space.
Data that is NOT frequent: Cars, Fashion, Food, Humor, Political Views, Pets, Heroes

Social Data: which
First pass: selection based on network availability and commonality, user prevalence, and estimated advertisement usefulness.
Balance between a small sample space and feature dimensionality.

About Me, Activities, Age, Gender, Books, TV, Music, Looking For, Drinker, Relationship, Ethnicity, Religion, Language, Interests

Date_Of_Birth

Smoker

PPARS Front End
[Diagram: user-origin data and friend data (Friend 1, Friend 2, ... Friend X) → PARSING → individual social data tokens → QUANTIZATION (codebooks, ontology codebook, web services) → set of user and friend quantized data vectors. Sample inputs: "I like cars, have 2 kids,..", "Movies: Star Wars", "Age = 30 .."]

Parsing
Create small social data tokens to pass to Quantization.
[Diagram: raw social data → null data test → hierarchical segmentation (split by . / ! / ?, then by :, then by -, then by ;, then by ,) → individual social data tokens]
Example: "I like lots of movies. Like:Star Wars, Star Wars II, Jaws.And I love Harrison Fords acting." is split into:
  I like lots of movies
  Like
  Star Wars
  Star Wars II
  Jaws
  And I love Harrison Fords acting.

Parsing Example
About Me input = "I work as an engineer at Motorola. I work in the peripherals department and do chip design. I am doing some management."
Resulting social data tokens:
  I work as an engineer at Motorola
  I work in the peripherals department and do chip design
  I am doing some management

Parsing Example
Interests input = "Internet, Movies, Reading, Karaoke,Building alternate communities"
Resulting social data tokens:
  Internet
  Movies
  Reading
  Karaoke
  Language
  Building alternative communities

Parsing Example
Music input = "Bands: Superdrag, Weezer, The Doors, The Beach Boys, Journey Solo Artists: Billy Joel, Albums: Appetite for Destruction - Guns & Roses; Blue - Weezer"
Resulting social data tokens:
  Bands
  Superdrag
  The Doors
  Cheap Trick
  The Beach Boys
  Journey Solo Artists
  Billy Joel
  Albums
  Appetite for Destruction
  Guns & Roses
  Blue
  Weezer
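The hierarchical segmentation described on the Parsing slide can be sketched roughly as follows. This is a minimal illustration, not the PPARS implementation: the function name `parse_tokens` and the `DELIMITER_LEVELS` list are hypothetical, and the sketch assumes each delimiter level is applied to every piece produced by the previous level.

```python
import re

# Delimiter levels in the order listed on the Parsing slide (assumed to be
# applied one level at a time, coarsest first).
DELIMITER_LEVELS = [r"[.!?]", r":", r"-", r";", r","]


def parse_tokens(raw):
    """Split raw social data into small tokens, one delimiter level at a time."""
    # Null data test: nothing to parse for missing or blank input.
    if raw is None or not raw.strip():
        return []
    pieces = [raw]
    for pattern in DELIMITER_LEVELS:
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(re.split(pattern, piece))
        pieces = next_pieces
    # Trim whitespace and drop empty fragments left by adjacent delimiters.
    return [p.strip() for p in pieces if p.strip()]


tokens = parse_tokens(
    "I like lots of movies. Like:Star Wars, Star Wars II, Jaws."
    "And I love Harrison Fords acting."
)
```

Unlike the slide's example, this sketch drops the trailing period on the last token. Splitting on every "-" would also break hyphenated names, which is one reason the deck lists syntax rules around delimiters as future work.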

Lost formatting of the line return between "Journey" and "Solo Artists".

Parsing
Simple technique of segmentation.
Future work: include semantics of phrases to detect potential headings, and syntax rules around delimiters such as ":".

Quantization
Take a social data token and translate it into a numerical feature vector.
"I like cars" → Cars = 0.2
For each social data field we need to create meaningful feature vector elements.
For each social data field we need to come up with techniques/algorithms to translate the raw social data token into support for its different feature vector elements.

Quantization: feature vector
Pattern recognition and matching are later parts of PPARS.
They need numerical representations of our user and friend social data, and also of the ads.
"I like cars" = ??? What ad??
Cars = 0.2 → ad with Cars around 0.2

Quantization: feature vector
For each social data element (such as About Me, Gender, Movies) we have designed its own feature vector.
It is the result of the technique used to quantize the input social token data.
It is also the result of studying keywords/trends in a user database of sample social tokens.

To understand this, let's first discuss the techniques used to quantize social data tokens as they relate to the type of data element.

Quantization and Social Data Type
Numerical Data
Data is naturally numerical, e.g. age, date of birth.
Or it can be quickly and effectively translated into a number in some defined range:
  Address can be translated into latitude and longitude
  Phone is again limited in digits
  Time zone again has predefined ranges
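As a small illustration of the numerical case, an age can be mapped onto a defined range. The bounds `AGE_MIN` and `AGE_MAX`, the function name `quantize_age`, and the use of -1 for missing data are assumptions made for this sketch; the talk does not state the range PPARS uses.

```python
# Assumed bounds for illustration; the talk does not give the actual range.
AGE_MIN, AGE_MAX = 13, 100


def quantize_age(age):
    """Map an age onto [0.0, 1.0]; return -1.0 when no usable age is available."""
    if age is None:
        return -1.0
    # Clamp out-of-range values so the result always stays inside [0, 1].
    clamped = max(AGE_MIN, min(AGE_MAX, age))
    return (clamped - AGE_MIN) / (AGE_MAX - AGE_MIN)


age_feature = quantize_age(33)
```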

Categorizable Data
Data where there is a predefined accepted taxonomy, e.g. movies and their genre.
Data where categories can be derived through sample analysis and advertisement goals. Examples: interests, about me, food, fashion.

Indexed Data
This is data that has defined sets of values specific to either the container or OpenSocial.
Example: smoker = yes, no, occasionally, quit, never
Other examples: gender, relationship, drinker, sexual orientation

Other
This is data for which we cannot easily derive an algorithm for categorizing. Examples: Profile Image, Profile Song URL, etc.

Collapsing of Data
Some data fields have almost the same meaning, or their content typically overlaps greatly:
  About Me and Interests (and even Status)
  Age and Date of Birth

Categorizable Data
This is the bulk of the data fields: About Me, Interests, Music, Movies, TV, Books, Looking For, Religion, Ethnicity, Language
Determine feature elements:
  Accepted standard taxonomies
  Web service taxonomies
  Advertisement-driven taxonomies


Categorization: Web Service
For some of our social data fields we can use popular web services to convert social data tokens into search hits that carry categorized information.
Example: Internet Video Archive (IVA) and IMDB
Use the movie genre.

IVA movie search by actor Robert Redford
http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford
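A query URL of the shape shown above can be assembled from a social data token with the standard library. This is only a sketch: the helper name `actor_search_url` and the placeholder developer id are hypothetical, the endpoint and parameter names are copied from the slide, and no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as they appear on the slide.
IVA_ACTOR_SEARCH = "http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx"


def actor_search_url(developer_id, token):
    """Build the actor-search URL for one social data token."""
    # urlencode escapes spaces as "+", matching the form shown on the slide.
    query = urlencode({"DeveloperId": developer_id, "SearchTerm": token})
    return f"{IVA_ACTOR_SEARCH}?{query}"


# "YOUR-DEVELOPER-ID" is a placeholder, not a real key.
url = actor_search_url("YOUR-DEVELOPER-ID", "Robert Redford")
```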

Some of the results (field values only):
THE UNFORESEEN, English, United States, Two Birds Films, 3018, Not Rated, Documentary, 13

IVA movie search continued
9/16/2008, 2/29/2008, Laura Dunn 36635, Robert Redford 7105, Willie Nelson 8591, Ann Richards 36642, Gary Bradley 36637

IVA movie search continued
9/16/2008, http://videodetective.com/titledetails.aspx?publishedid=947964, -1, -1, -1, false, 2008, http://content.internetvideoarchive.com/content/photos/1250/05253626_.jpg, 164, 3/20/2008 8:00:00 AM, Movie, 947964, 4/22/2011 1:57:00 PM

AND MORE !!!!

Selected field: GENRE

IVA genres: our movie feature elements
VideoCategory: Not Assigned, Western, Action-Adventure, Children's, Comedy, Drama, Family, Horror, Musical, Mystery-Suspense, Non-Fiction, Sci-Fi, War, Health/Workout, Documentary, Thriller, Biography, Romance

Movie Quantization
For each social data token (e.g. "Adam Sandler", "Star Wars") we can get multiple hits.
Example, Robert Redford, first 8 hits:
  Drama = 5
  Western = 1
  Documentary = 2
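The genre tally in the example above can be sketched as a simple count over search hits. The hit records here are hypothetical stand-ins (the real service returns many more fields), and the `genre` key and `tally_genres` name are assumptions for illustration.

```python
from collections import Counter


def tally_genres(hits):
    """Count how many search hits fall into each genre."""
    return Counter(hit["genre"] for hit in hits if hit.get("genre"))


# Hypothetical stand-in for the first 8 hits of an actor search.
genres = ["Drama"] * 5 + ["Western"] + ["Documentary"] * 2
hits = [{"title": f"movie {i}", "genre": g} for i, g in enumerate(genres)]

evidence = tally_genres(hits)
```

Each genre count then acts as evidence for the corresponding movie feature element. How the counts are scaled into feature values is not stated in the talk.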

Issues:
  How do we know whether a token is an actor name, movie title, director, or something else?
  Multiple hits for an actor or director: what do we do? (Use them all as evidence.)
  Multiple hits for a movie title: what do we do? (Take the first hit.)
These genres become our movie feature elements.

Order of Movie Quantization
Given any social data element parsed from the user's MOVIE data, we cannot know a priori whether it is a title or an actor's or director's name. It may even be a genre of movies the user likes.
  1. Title search (take first hit)
  2. Actor search (evidence all)
  3. Director search (evidence all)
  4. Keyword matching (see next)

Quantization Result 1
Input: Up,Forrest Gump,Rear Window,District 9,Pac-Man,WALLE,My Flesh and Blood, MacMusical,
Yields: MOVIE_FAMILY=0.6, MOVIE_SCIFI=0.2, MOVIE_DOCUMENTARY=0.4, MOVIE_THRILLER=0.2

Quantization using other services
TV: IMDB, http://www.imdb.com/search/title?title_type=tv_series&title=
Books: Google Books Search, http://books.google.com/books/feeds/volumes?
Music: IVA's music API, http://api.internetvideoarchive.com/Music/**

Quantization via Keyword Matching
What do we do when there is no predetermined taxonomy and no services for database hits?
Natural Language Processing techniques

Currently we employ a simple (but effective and efficient) technique of keyword matching/lookup:
  Create a database of predetermined phrases/keywords
  Use a lookup scheme to quantize social data token(s)
[Diagram: individual social data tokens → codebooks (ontology codebook) → set of user and friend quantized data vectors. Examples: "I work as an engineer" → About Me lookup??; "Watch a lot of drama" → Movies lookup??]

Keyword Database
Used on: About Me / Interests, Religion, Ethnicity, Looking For, Language, Relationship

Secondary use: Books, TV, Music, Movies, when the service fails to provide any hits.

Keyword Database Creation
Manual scanning of hundreds (at the starting level) of user profiles
Domain-specific expert (human) knowledge
Dictionaries and taxonomies when they exist
Issue: how to determine weights for every entry. Expert determined (consistency) or all equal valued (no sense of importance).
Issue: at the very beginning level, can we create a dictionary for everything? No. Are there more advanced NLP techniques?

Some arbitrary Keyword DB entries
Field     Feature element  Keyword   Weight
ABOUT_ME  HOME             Cats      0.2
ABOUT_ME  HOME             Children  0.2
ABOUT_ME  HOME             Daughter  0.2
ABOUT_ME  HOME             Dog       0.2
ABOUT_ME  HOME             home      0.5

Some arbitrary Keyword DB entries
Field     Feature element  Keyword     Weight
ABOUT_ME  ENTERTAINMENT    Shopping    0.2
ABOUT_ME  ENTERTAINMENT    Shows       0.2
ABOUT_ME  ENTERTAINMENT    Sing        0.2
ABOUT_ME  ENTERTAINMENT    Ski         0.2
ABOUT_ME  ENTERTAINMENT    Songwriter  0.2

Keyword DB: evidence weight
Issue: how to determine weights for every entry. Expert determined (consistency) or all equal valued (no sense of importance).
System options: DB weights can take on different values; there is an option to run with all weights equal.

Keyword DB: ??
Issue: at the very beginning level, can we create a dictionary for everything? No. Are there more advanced NLP techniques to explore for inferences?

While users can (and do) write anything, remember that we are focused on advertisement recommendation, so the scope of our language is limited to hits related to our feature vector elements. This is a constrained problem.
Home, Entertainment, Smoking, Work, Social, Movies, TV, Shopping, Books, etc.: these are the kinds of areas we are concerned with.

Types of Keyword Matching
STRICT: the social data token must exactly match a DB entry.
  Token "Drama" vs. DB entry "Drama": match
  Token "I like Drama" vs. DB entry "Drama": no match (X)
DB_ENTRY_CONTAINS_DATA_ELEMENT: the data token must exist inside the DB entry.
  Token "Drama" vs. DB entry "Drama and Comedy": match
DB_ENTRY_PARTOF_DATA_ELEMENT: part of the data token matches the DB entry (this further segments the data token).
  Token "I like Drama" vs. DB entry "Drama": match
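The three matching types can be sketched as one small predicate. The mode names come from the slides; everything else (the `MatchMode` enum, the `matches` function, case-insensitive comparison, and plain substring tests) is an assumption made for illustration rather than the PPARS code.

```python
from enum import Enum


class MatchMode(Enum):
    STRICT = "strict"
    DB_ENTRY_CONTAINS_DATA_ELEMENT = "entry_contains_token"
    DB_ENTRY_PARTOF_DATA_ELEMENT = "entry_part_of_token"


def matches(token, db_entry, mode):
    """Return True when a social data token matches a keyword DB entry."""
    # Assumption: comparison ignores case and surrounding whitespace.
    t, e = token.strip().lower(), db_entry.strip().lower()
    if not t or not e:
        return False
    if mode is MatchMode.STRICT:
        return t == e
    if mode is MatchMode.DB_ENTRY_CONTAINS_DATA_ELEMENT:
        return t in e  # the token exists inside the DB entry
    if mode is MatchMode.DB_ENTRY_PARTOF_DATA_ELEMENT:
        return e in t  # the DB entry is part of the token
    raise ValueError(f"unknown mode: {mode}")


hit = matches("I like Drama", "Drama", MatchMode.DB_ENTRY_PARTOF_DATA_ELEMENT)
```

A plain substring test is the simplest choice but over-matches (for example, "work" inside "network"); a word-boundary check would be stricter.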

Quantization Results: different kinds of Keyword Matching
Input: "I am a student and I work and love cars"
Output with STRICT: no hits
  ABOUT_ME_ENTERTAINMENT = -1
  ABOUT_ME_WORK = -1
  ABOUT_ME_HOME = -1
  ABOUT_ME_SOCIAL = -1
  ABOUT_ME_FOOD = -1

Quantization Results: different kinds of Keyword Matching
Input: "I am a student and I work and love cars"
Output with DB_ENTRY_CONTAINS_DATA_ELEMENT: no hits
  ABOUT_ME_ENTERTAINMENT = -1
  ABOUT_ME_WORK = -1
  ABOUT_ME_HOME = -1
  ABOUT_ME_SOCIAL = -1
  ABOUT_ME_FOOD = -1

Quantization Results: different kinds of Keyword Matching
Input: "I am a student and I work and love cars"
Output with DB_ENTRY_PARTOF_DATA_ELEMENT:
  keyword = student → ABOUT_ME_WORK = 0.2
  keyword = work → ABOUT_ME_WORK = 0.5
  keyword = cars → ABOUT_ME_ENTERTAINMENT = 0.2
  keyword = LOVE → ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2
Resulting feature values:
  ABOUT_ME_ENTERTAINMENT = 0.2
  ABOUT_ME_WORK = 0.7
  ABOUT_ME_HOME = 0.2
  ABOUT_ME_SOCIAL = 0.2
  ABOUT_ME_FOOD = -1
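The result above is consistent with summing the weight of every keyword hit per feature element and leaving untouched elements at -1. A minimal sketch under that assumption follows; the weights are the ones shown in the output above, while the table layout, the names `KEYWORD_DB`, `FEATURES`, and `quantize`, and the whole-word matching are hypothetical.

```python
# Hypothetical keyword DB rows: (feature element, keyword, weight).
KEYWORD_DB = [
    ("ABOUT_ME_WORK", "student", 0.2),
    ("ABOUT_ME_WORK", "work", 0.5),
    ("ABOUT_ME_ENTERTAINMENT", "cars", 0.2),
    ("ABOUT_ME_HOME", "love", 0.2),
    ("ABOUT_ME_SOCIAL", "love", 0.2),
]
FEATURES = [
    "ABOUT_ME_ENTERTAINMENT", "ABOUT_ME_WORK", "ABOUT_ME_HOME",
    "ABOUT_ME_SOCIAL", "ABOUT_ME_FOOD",
]
NO_HIT = -1.0  # marks a feature element with no evidence


def quantize(tokens):
    """Sum the weights of every DB keyword found as a word in any token."""
    vector = {name: NO_HIT for name in FEATURES}
    for token in tokens:
        words = token.lower().split()
        for feature, keyword, weight in KEYWORD_DB:
            if keyword in words:
                # First hit replaces the -1 marker; later hits add evidence.
                base = 0.0 if vector[feature] == NO_HIT else vector[feature]
                vector[feature] = base + weight
    return vector


result = quantize(["I am a student and I work and love cars"])
```

With all weights set to the same value, the same function would model the "all weights equal" system option mentioned earlier.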

Quantization Results 2: using DB_ENTRY_PARTOF_DATA_ELEMENT
Input: "Fell in love with computers at 11, never got over it... Nonetheless, I have always understood that human problems are solved by people, not technology. My lifes work has been to empower communities to design and build their own solutions."
6 data tokens from parsing
RESULTS:
  ABOUT_ME_ENTERTAINMENT = 0.2
  ABOUT_ME_WORK = 0.5
  ABOUT_ME_HOME = 0.2
  ABOUT_ME_SOCIAL = 0.2
  ABOUT_ME_FOOD = -1

Quantization Result 3: good null results
Input: "i am xing ju. test ABOUT ME for opensocial."
Parsed results:
  i am xing ju
  test ABOUT ME for opensocial
NO keyword DB hits:
  ABOUT_ME_ENTERTAINMENT = -1
  ABOUT_ME_WORK = -1
  ABOUT_ME_HOME = -1
  ABOUT_ME_SOCIAL = -1
  ABOUT_ME_FOOD = -1

Quantization Results: garbage in, garbage out
Input: "LoL really dude that is the way to be" → no hits

Is this garbage? LoL = lots of love. Could you interpret this as someone interested in social / friends?? Future: deeper interpretation / semantic analysis?

Indexed
Smoker, Drinker, Gender, Relationship (some networks), Looking for (some networks), etc.
Example for Drinker:
  opensocial.Enum.Drinker.HEAVILY
  opensocial.Enum.Drinker.NO
  opensocial.Enum.Drinker.OCCASIONALLY
  opensocial.Enum.Drinker.QUIT
  opensocial.Enum.Drinker.QUITTING
  opensocial.Enum.Drinker.REGULARLY
  opensocial.Enum.Drinker.SOCIALLY
  opensocial.Enum.Drinker.YES
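Indexed data lends itself to a direct lookup table. The sketch below maps the Drinker enum keys listed above to feature values; the numbers, the ordering they imply, and the names `DRINKER_CODEBOOK` and `quantize_drinker` are illustrative assumptions, since the talk does not list the values PPARS assigns.

```python
# Illustrative values only; the talk does not list the numbers PPARS assigns.
DRINKER_CODEBOOK = {
    "NO": 0.0,
    "QUIT": 0.1,
    "QUITTING": 0.2,
    "OCCASIONALLY": 0.4,
    "SOCIALLY": 0.5,
    "YES": 0.6,
    "REGULARLY": 0.8,
    "HEAVILY": 1.0,
}


def quantize_drinker(enum_key):
    """Look up an OpenSocial Drinker enum key; -1.0 marks missing or unknown data."""
    if not enum_key:
        return -1.0
    return DRINKER_CODEBOOK.get(enum_key.strip().upper(), -1.0)


drinker_feature = quantize_drinker("SOCIALLY")
```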

Quantized Feature Vector
107 elements
Normalize to 0 to 1.0 (near)

Advertisement Description
Experts manually determine the feature vector weighting for each ad.
Future: automate this from a survey / input directly from the advertiser.
Is there a way to analyze the ad message or image (image understanding)? Will the results even match the advertiser's goals?

PPARS: Advertisement Matching
Not the focus of this talk.
Currently doing variations on KNN with different forms of clustering.
Early results with a small advertising database and a beginning keyword database look good.
What kinds of groups: groups with the user in them or not? Based only on in-common feature elements or not?
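To make the matching idea concrete, here is a bare nearest-neighbor ranking of ads against a quantized group vector. It is not the PPARS matcher: it omits clustering entirely, the Euclidean distance over shared feature elements is an assumed choice, and the ad names, feature names, and values are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance over feature elements present (>= 0) in both vectors."""
    shared = [k for k in a if k in b and a[k] >= 0 and b[k] >= 0]
    if not shared:
        return math.inf  # nothing in common to compare
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))


def nearest_ads(group_vector, ads, k=3):
    """Return the names of the k ads whose vectors are closest to the group vector."""
    ranked = sorted(ads, key=lambda name: distance(group_vector, ads[name]))
    return ranked[:k]


# Hypothetical data, for illustration only.
ads = {
    "sports car": {"CARS": 0.2, "HOME": 0.0},
    "minivan": {"CARS": 0.9, "HOME": 0.8},
}
best = nearest_ads({"CARS": 0.2, "HOME": -1.0}, ads, k=1)
```

Comparing only the feature elements both vectors share is a simplification: vectors with few elements in common can look artificially close, which relates to the slide's open question about using only in-common feature elements.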

PPARS: Advertisement Delivery
An area of future work could be effective delivery of the social message related to the selected ad. For now we use a simple form of direct delivery.
Example: based on a grouping of same gender and age and strong likes in interests on home.

PPARS: Advertisement Delivery
Example: based on a grouping of same gender and age and drinking.
This is a grouping the user is not part of, only friends.
"Your friends Nathan and Marty will like this"

PPARS: Advertisement Delivery
Here the grouping is loose, related only by gender and very loosely by age, so the advertisement match is not great.
Question: should we only serve to strong groups?

Analysis of Advertisement Results
Groupings are tight when the data allows.
Matches to advertisements at the levels best, top 10, etc. are correct.

Future Work
Parsing: more syntax and semantics (NLP).

Parsing: differences in different languages.

Quantization: extend to Natural Language Understanding, in addition to or as a replacement for keyword matching; study the effects of different evidence accumulation.

Data extrapolation: use inference to create hits in more feature elements.