Detecting Fake Engagement on Instagram
Indira Sen
linkedin/in/indira-sen-8a6068140
@drealcharbar fb.com/indira.sen.31
Dr. Ponnurangam Kumaraguru (Chair)
Thesis Committee
- Dr. Anwitaman Datta, NTU Singapore
- Mr. Nitendra Rajput, InfoEdge
- Dr. Ponnurangam Kumaraguru, IIIT Delhi (Chair)
Likes on Instagram
3,363 likes
Likes on Instagram
1,008 likes
Why is Engagement Important on Instagram?
Why Fake Likes?
- ‘Influencers’ are compensated based on engagement: likes and comments
- This creates an incentive to artificially inflate engagement metrics by purchasing likes through like markets or like-back networks
- Inflated like counts fool potential brands or advertisers into hiring ‘unworthy’ influencers
Motivation
- Influencer marketing is a $1B industry
- Fake influencers landed deals over $500
Core Thesis Question
- How do we automatically detect fraudulent likes on Instagram?
- Organic likes: likers who engage with content; genuine reach
- Inorganic likes: likers bought from marketplaces; artificial reach
- Understanding properties of genuine liking behaviour B : {b1, b2, …, bn}
- Reducing the effect of likes which do not match B
Thesis Outline
- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion
What is a Like Instance?
- Given a poster S whose post p has been liked by liker L, we define a like instance as the tuple (L, p, S)
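The like-instance tuple (L, p, S) can be written down as a small record type. This is only a sketch: the field names and the string identifiers are illustrative assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikeInstance:
    """One like instance (L, p, S): liker L liked post p made by poster S."""
    liker: str   # L, hypothetical account identifier
    post: str    # p, hypothetical post identifier
    poster: str  # S, hypothetical account identifier


def like_instances(post: str, poster: str, likers: list[str]) -> list[LikeInstance]:
    """Expand one post's liker list into one like instance per liker."""
    return [LikeInstance(liker, post, poster) for liker in likers]


# Made-up example: one post by "alice" liked by two accounts
instances = like_instances("post_1", "alice", ["bob", "carol"])
```

Treating each (liker, post, poster) triple as its own record means a classifier can label individual likes rather than whole accounts.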
Research Aim
- Find the features of liker L, post p and poster S that determine the probability of L genuinely liking post p
- Identify the true reach of a poster by determining the fake likes received on the posted content
Possible Reasons for Genuine Liking
- Homepage: followees’ posts
- Explore: Instagram’s recommendations
- Likes of followees
Possible Reasons for Genuine Liking
- Explore: based on photos you liked
- Explore: based on people you follow
- Explore: similar to accounts you interact with
Possible Reasons For Genuine Liking
- Poster is a followee
- Poster is a followee of a followee
- Topical interests in common
How to get Fake Likes
- Marketplaces
- Like Back collusion networks
- Link Farming hashtags
- Bots
Architecture Diagram
- Inputs: 1) liker metadata and last 18 posts, 2) poster metadata and last 18 posts, 3) post metadata
- Training: fake likes and other likes (training data) → features → machine learning model
- Prediction: random unknown likes → features → model → Fake / Not Fake
- Diagram labels: α and 1 - α
Data Collection: Fake Likes (diagram)
- Purchased fake likes → honeypot → victim? → Fake Likes 1: likes given by honeypot victims
- Fake Likes 2: likes on videos with views = 0
- Instagram featured users → snowball sample to 1M → random sample of 500 → not a honeypot victim? → Other Likes
Data Collection: Fake Likes
- Honeypots trap fake likers bought through a service
- If a user falls for a honeypot, we monitor their liking behaviour
Data Collection: Other Likes
- Randomly sample 500 users from 1M users who are not honeypot victims
        #Likes   #Posts   #Likers   #Posters
Fake    10,417    8,408       500      7,715
Other   11,810   11,644       500      7,631
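The sampling step for the "other" likers could look roughly like the sketch below. The user identifiers, the toy population size and the fixed seed are assumptions made for illustration; the slides only state that 500 users who are not honeypot victims are drawn at random from the 1M snowball sample.

```python
import random


def sample_other_likers(users, honeypot_victims, k=500, seed=0):
    """Draw k users uniformly at random, excluding known honeypot victims."""
    candidates = sorted(set(users) - set(honeypot_victims))
    return random.Random(seed).sample(candidates, k)


# Toy stand-in for the 1M snowball-sampled users (hypothetical identifiers)
users = [f"user_{i}" for i in range(10_000)]
victims = {f"user_{i}" for i in range(0, 10_000, 7)}
sample = sample_other_likers(users, victims)
```

Sorting the candidates before sampling keeps the draw reproducible for a given seed.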
Understanding Fake Likes
- Hypotheses indicative of fake liking behaviour
- Validate with a two-sample Kolmogorov-Smirnov (KS) test
- Network effect: liker is a follower of the poster, or a follower of a follower of the poster
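As a minimal sketch of the validation step, the two-sample KS statistic is the largest gap between the empirical CDFs of a feature in the two groups. The feature values below are made up, and the code computes only the statistic, not a p-value; in practice a library routine such as `scipy.stats.ks_2samp` would likely be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up values of one hypothetical feature for fake and other like instances
fake = [0.0, 0.0, 0.1, 0.1, 0.2]
other = [0.3, 0.4, 0.5, 0.6, 0.9]
d = ks_statistic(fake, other)
```

A statistic near 1 means the feature separates the two groups almost completely; a value near 0 means the distributions are nearly identical.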
Liker is Follower of Poster
- Green edges: liker relationship
- Red edges: liker - follower relationship
- Other likes have a higher proportion of follower-likers
Network Effects
- 90% of fake like instances have only 0.25 of followee likes
- [Figure: other likes vs. fake likes; plot annotations 90% and 56%]
Interest Overlap
- A user will like a post if she shares topical interests with the post
- Affinity: the lower the affinity, the higher the overlap
Extracting Topics
- Bio, post text and post image
- Wikification for text; DenseCap for images
[Figure: example image topics and post caption topics]
Interest Overlap
- A user will like a post if she shares topical interests with the post
- Affinity is non-commutative
Affinity
- Affinity outperforms Jaccard distance in terms of discernibility
- Post image topics are strong indicators of genuine liking
- Our metric captures semantic relationships between entities, unlike traditional distance metrics
- 90% of other likes have an average affinity of 0.5
- 90% of fake likes have an average affinity of 0.74
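For comparison, the Jaccard baseline can be sketched as below. The slides do not define the affinity metric itself, so only the baseline is shown, and the topic sets are invented for illustration.

```python
def jaccard_distance(topics_a, topics_b):
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(topics_a), set(topics_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


liker_topics = {"dog", "hiking", "coffee"}  # made-up liker interests
post_topics = {"puppy", "trail", "coffee"}  # made-up post topics
d = jaccard_distance(liker_topics, post_topics)
```

Jaccard distance only counts exact matches, so related topics such as "dog" and "puppy" contribute nothing to the overlap. It is also symmetric, whereas the slides describe affinity as non-commutative.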
Other Features
- Celebrities tend to get more likes (engagement)
- Genuine likers will keep coming back: repeated likers
- Link Farming hashtags: #like4like, #l4l, #like2follow
- Topical hashtags
- Posting activity of liker (Badri et al, CIKM’16) and poster
- Profile picture of liker: egghead profiles (cheap to create)
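The link-farming hashtag feature could be extracted along these lines. This is a sketch: the tag set contains only the three hashtags named on the slide, and the regular expression and function name are illustrative assumptions.

```python
import re

# Tags named on the slide; a real list would likely be longer (assumption)
LINK_FARMING_TAGS = {"like4like", "l4l", "like2follow"}
HASHTAG = re.compile(r"#(\w+)")


def link_farming_tags(caption):
    """Return the link-farming hashtags found in a caption, case-insensitively."""
    tags = {tag.lower() for tag in HASHTAG.findall(caption)}
    return tags & LINK_FARMING_TAGS


# Made-up caption
found = link_farming_tags("Sunset at the beach #travel #Like4Like #l4l")
```

The size of the returned set, or simply whether it is non-empty, could then serve as a feature of the post.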
Automatic Detection of Fake Likes
- Using the features described and a set of ML classifiers
- Fake likes : Other likes ratio → 1:2
- An SVM with an RBF kernel gives the best performance
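For readers unfamiliar with the RBF kernel, the sketch below shows the similarity function such an SVM relies on. The `gamma` value and the feature vectors are assumptions for illustration; the slides do not report hyperparameters. Libraries such as scikit-learn expose this kernel through `SVC(kernel="rbf")`.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Made-up feature vectors
same = rbf_kernel([0.5, 0.74], [0.5, 0.74])          # identical points
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1)  # points 5 units apart
```

Identical feature vectors have similarity 1, and the similarity decays towards 0 as the vectors move apart, which lets the SVM draw non-linear decision boundaries.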
Classification Model
- Performance:
Class   Precision   Recall   F1-score
0       0.93        0.96     0.945
1       0.895       0.825    0.86
total   0.92        0.925    0.92
- Manual inspection of 100 false negatives found that 70 of them had high topical overlap
- The liker interest set was small: a limitation of the affinity metric
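The per-class F1-scores in the performance table follow from the reported precision and recall, since F1 is their harmonic mean. A quick check, using only the numbers from the table:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_class_0 = f1_score(0.93, 0.96)    # rounds to 0.945 at three decimals
f1_class_1 = f1_score(0.895, 0.825)  # rounds to 0.86 at two decimals
```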
In the Wild Experiment
- Random sample of 134,669 like instances
- Posts categorized into: food, fashion, outdoors, merchandise, people, gadgets, pets, captioned
- We find 8,557 fake likes
- Manual analysis of 100 of these finds 78 to be fake
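The arithmetic behind these figures is worked through below. It uses only the numbers on the slide, where 1,34,669 is written with Indian digit grouping and equals 134,669; the variable names are illustrative.

```python
total_likes = 134_669         # like instances in the in-the-wild sample
flagged_fake = 8_557          # instances labelled fake
checked, confirmed = 100, 78  # manual check of flagged instances

flagged_share = flagged_fake / total_likes  # about 6.35% of all like instances
manual_precision = confirmed / checked      # 78% of the checked sample
```

So roughly one like instance in sixteen was flagged, and about four in five of the manually checked flags held up.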
Reach Estimation
- Enable advertisers to make better decisions
- Reduce the effect of fake likes a poster may have received
- Measure deviation in reach
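One way the deviation could be computed is sketched below. The slides do not give a formula, so the definition used here (the share of a poster's likes that the classifier flags as fake) is an assumption, as are the function name and the example flags.

```python
def reach_deviation(is_fake_flags):
    """Share of a poster's likes flagged as fake (assumed definition of deviation)."""
    observed = len(is_fake_flags)
    if observed == 0:
        return 0.0
    projected = sum(1 for fake in is_fake_flags if not fake)  # likes kept as genuine
    return (observed - projected) / observed


# Made-up classifier output for one poster's 8 likes
flags = [False, True, False, False, True, False, False, False]
deviation = reach_deviation(flags)
```

Under this assumed definition, a poster with no flagged likes has a deviation of 0, and a poster whose likes are all flagged has a deviation of 1.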
Who receives fake likes?
- Users posting about merchandise, outdoors (including travel posts) and people (posts containing faces) have the highest deviation from the projected reach
Who receives fake likes?
- Categories with the highest deviation: merchandise, outdoors (including travel posts) and people
- Most posters do not have high deviation, while some users have very high deviation
Do Popular Users have more Fake Likes?
- No, users with lower follower counts, who may be trying to gain a following, have higher deviation
- ‘Micro Influencers’ have higher deviation
Conclusion
- Automated method to detect fake like instances
- Performs well at identifying unseen fake likes on Instagram
- Find true reach of a user
- Helps advertisers and brands identify users with genuine, meaningful reach
Challenges, Limitations and Future Work
- Availability of labeled data; approximations using honeypots
- Data collection constraints; integrating network features
- Improve affinity; improve precision (dynamic features)
- Fine-grained topical recommendations for brands and advertisers
Acknowledgement
- Anupama Aggarwal, PhD Scholar, IIIT Delhi
- Committee members
- Srishti Gupta, Divyansh Agarwal, Neha Jawalkar, Sonu Gupta, Kushagra Bhargava
- Siddharth Singh, Shiven Mian
- Members of Precog
- Family and friends