Florian Daniel
Thesis proposals: Ideas for research from social bots to beer
February 25, 2019
Detection of harmful social bots
Identification of harmful communication patterns
Conversational screen readers
Domain-specific content extraction
(Social) Bot = algorithmically driven entity that behaves like a human in online communications
[Figure: actions of social bots, the types of abuse they may lead to, and example cases; numbers in parentheses are shown as they appear in the figure]
Actions: Talk with user, Redirect user, Write post, Comment post, Forward post, Like message, Follow user, Create user
Action categories: Chat, Post, Endorse, Participate
Abuse (either not regulated or regulated by law): Denigrate (10), Be grossly offensive (3), Be indecent or obscene (4), Be threatening (2), Disclose sensitive facts (1), Make false allegations (6), Deceive, Invade space, Spread misinformation, Clone profile, Mimic interest, Spam
Examples: AASlangBoostJuice, Geico, SethRich, Puma, Oreo, MSTay, DeathThreat, eCommerce, CustomerSvc, WiseShibe, DatingIvana, Trump, ColludingBots, PolarBotInstagres, InstaClone, JasonSlotkin6, SMSsex
F. Daniel, C. Cappiello, B. Benatallah. Bots Acting Like Humans: Understanding and Preventing Harm. IEEE Internet Computing, 2019, accepted for publication. https://ieeexplore.ieee.org/document/8611348
Empirical study shows: bots may cause harm to humans
What bots do
When abuse happens
What else can they do?
What else can go wrong?
Detection of harmful social bots
[Figure labels: Account, Tweets; Human, Harm 1, Harm 2, ..., Harm N]
How do we identify and classify bots according to the harm they may cause?
What has been done so far?
Not Safe For Work
Profile picture stolen from real people (usually women)
Personalized profile preferences aimed at emulating actual users
Catchy profile description paired with unsafe URL
Tweets aim to redirect users to untrusted URLs
Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter. MSc Thesis, Politecnico di Milano, December 2018.
News-Spreader
High number of interactions
Propaganda hashtags and messages in description
Lots of (political) news retweeting
Propagandistic profile picture and tile
Spam-Bot
High number of tweets with high frequency
Corporate link in description paired with messages of job offers (product sales)
Repetitive tweets aimed at spamming URLs
Profile picture with corporate logo
Profile tile with catchy messages
Fake-Follower
Little or no content posted (usually none)
Following as main interaction
Poor description (usually empty)
Optional (re)tweeting activity
Optional profile picture (usually missing)
Poor profile customization
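The profile traits listed for the four bot types (empty description, missing picture, URLs in description and tweets, repetitive tweets, following as main activity) can be turned into numeric features. The following is a minimal sketch of that idea, not the feature set used in the thesis; the dictionary keys and feature names are hypothetical and do not correspond to Twitter API fields.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def profile_features(profile, tweets):
    """Turn a profile dict and a list of tweet texts into numeric features.

    The keys "description", "has_profile_picture" and "following_count"
    are hypothetical names chosen for this sketch.
    """
    description = profile.get("description", "")
    n_tweets = len(tweets)
    tweets_with_url = sum(1 for text in tweets if URL_PATTERN.search(text))
    return {
        # Fake-follower traits: empty description, missing picture
        "empty_description": int(not description.strip()),
        "missing_picture": int(not profile.get("has_profile_picture", False)),
        # Spam-bot / NSFW traits: links in description and tweets
        "url_in_description": int(bool(URL_PATTERN.search(description))),
        "url_tweet_ratio": tweets_with_url / n_tweets if n_tweets else 0.0,
        # Repetitive tweets give a low ratio of distinct texts
        "distinct_tweet_ratio": len(set(tweets)) / n_tweets if n_tweets else 0.0,
        "tweet_count": n_tweets,
        "following_count": profile.get("following_count", 0),
    }
```

A profile with no description, no picture and no tweets would score 1 on the first two features and 0 on everything tweet-related, which matches the Fake-Follower description above.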
Thesis ingredients
Creation of datasets
Feature selection and engineering
Multi-class ensemble classifier
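To illustrate what a multi-class ensemble classifier does, here is a minimal hard-voting sketch in plain Python. The slides do not say which ensemble method or base models the thesis uses, so the `ThresholdRule` base classifier, the feature names and the thresholds are all assumptions made for illustration.

```python
from collections import Counter


class ThresholdRule:
    """Toy base classifier: predicts `label` when a feature reaches a threshold."""

    def __init__(self, feature, threshold, label, default="human"):
        self.feature = feature
        self.threshold = threshold
        self.label = label
        self.default = default

    def predict_one(self, features):
        value = features.get(self.feature, 0)
        return self.label if value >= self.threshold else self.default


class MajorityVoteEnsemble:
    """Hard-voting ensemble: each classifier votes for one class label."""

    def __init__(self, classifiers):
        self.classifiers = list(classifiers)

    def predict_one(self, features):
        votes = Counter(clf.predict_one(features) for clf in self.classifiers)
        # Ties go to the label that was voted for first.
        return votes.most_common(1)[0][0]


# Hypothetical rules and thresholds, not values from the thesis.
ensemble = MajorityVoteEnsemble([
    ThresholdRule("url_tweet_ratio", 0.8, "spam-bot"),
    ThresholdRule("tweet_count", 1000, "spam-bot"),
    ThresholdRule("empty_description", 1, "fake-follower"),
])
print(ensemble.predict_one({"url_tweet_ratio": 0.9, "tweet_count": 5000}))
```

In practice the base classifiers would be trained models rather than hand-written rules; the voting step stays the same.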
Figure 7.3: Client-server architecture
[...] Inception neural network. Users with fewer than 10 tweets, or fewer than 10 tweets with embedded images, require a shorter timespan to be classified, on average up to 5 seconds. Figure 7.4 shows the breakdown of computational time, along with the processes involved. It is based on the timespan needed to classify users with at least 100 tweets and at least 10 tweets with images.
7.2.1 Engine
The engine of the web application is a Python 3 script. It performs all the steps described in the pipeline execution section 6.2. The models, which were previously fitted with data, serialized and stored, are loaded by the Python script and perform a single prediction at a time. In addition to the models we built for the classification, the pre-trained convolutional neural network for NSFW recognition has been added to the pipeline to run inference on the media content posted by the examined user. The first step consists in calling the Twitter APIs to retrieve the user's data and most recent tweets, up to 100. The script then handles the [...]
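The engine steps described in this excerpt (load a stored model, retrieve up to 100 recent tweets, run a single prediction) can be sketched as follows. This is an offline illustration under stated assumptions: the "model" is a plain dictionary of parameters, `fake_fetch` stands in for the Twitter API call, and `pickle` is assumed as the serialization format, which the excerpt does not specify.

```python
import io
import pickle

MAX_TWEETS = 100  # the excerpt mentions the most recent tweets, up to 100


def load_model(stream):
    """Load previously serialized model parameters (only unpickle trusted data)."""
    return pickle.load(stream)


def predict(model, tweets):
    """Toy prediction: flag the user if most tweets contain the model's keyword."""
    if not tweets:
        return "human"
    hits = sum(model["keyword"] in text.lower() for text in tweets)
    return model["label"] if hits > len(tweets) / 2 else "human"


def classify_user(model, fetch_tweets, user_id):
    """Retrieve the user's recent tweets and run one prediction."""
    tweets = fetch_tweets(user_id)[:MAX_TWEETS]
    return predict(model, tweets)


# Stand-ins so the sketch runs offline: an in-memory "stored" model and a
# fake fetch function instead of a real Twitter API call.
stored = io.BytesIO(pickle.dumps({"keyword": "buy now", "label": "spam-bot"}))
model = load_model(stored)


def fake_fetch(user_id):
    return ["Buy now! http://example.com"] * 150


print(classify_user(model, fake_fetch, "some_user"))
```

Passing the fetch function as an argument keeps the classification logic testable without network access.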
Web app
Can we add more types of harm?
How to train classes as users use BotBuster?
Which potential harms can we identify inside the code of the bots?
Bot code repositories
GitHub
Identification of harmful communication patterns
Harmful code patterns
Next: patterns search engine + web site for users
[Figure: matrix relating bot actions and patterns (rows) to abuses (columns); the individual cell markings are not recoverable from this transcript]
Action: Pattern
Follow: Indiscriminate follow, Whitelist-based follow, Blacklist-based follow, Phantom follow
Like: Indiscriminate like, Whitelist-based like, Blacklist-based like, Mass like
Tweet: Fixed-content tweet, AI-generated tweet, Trusted source tweet
Mention: Indiscriminate mention, Opt-in mention, Targeted mention, Whitelist-based mention, Blacklist-based mention
Retweet: Indiscriminate retweet, Whitelist-based retweet, Blacklist-based retweet, Mass retweet
Talk to: Indiscriminate talk, Fixed-content talk, AI-generated talk, Talk with opt-in, Targeted talk
Pause: Mimic human, Satisfy API constraints
Store: Store persistently
Abuse (columns): Invade space, Disclose sensitive facts, Denigrate, Be grossly offensive, Be indecent or obscene, Be threatening, Make false allegations, Deceive, Spam, Spread misinformation, Mimic interest, Clone profile
Legend: Enables, Prevents, Vulnerable to content abuse, Vulnerable to trust abuse
Fig. 3: Potential effects of actions and patterns on the users in online communications: patterns either enable, prevent or are vulnerable to abuses. For example, following an account with a denigrating or offending username may perpetuate and endorse the denigration or offense.
Andrea Millimaggi and Florian Daniel. On Social Bots Behaving Badly: Empirical Study of Code Patterns on GitHub. Submitted for publication to ICWE 2019, under review.
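To make the difference between follow patterns concrete, here is a minimal sketch of three of them as a single decision function. The slides give only the pattern names, so the semantics below (follow everyone, follow only listed accounts, follow everyone except listed accounts) are an assumed reading of those names; "phantom follow" is left out because its meaning cannot be inferred from the name alone.

```python
def should_follow(account, pattern, whitelist=frozenset(), blacklist=frozenset()):
    """Decide whether a bot follows `account` under a given follow pattern.

    Assumed semantics for this sketch:
    - "indiscriminate": follow any account
    - "whitelist": follow only accounts on the whitelist
    - "blacklist": follow any account that is not on the blacklist
    """
    if pattern == "indiscriminate":
        return True
    if pattern == "whitelist":
        return account in whitelist
    if pattern == "blacklist":
        return account not in blacklist
    raise ValueError(f"unknown follow pattern: {pattern}")


# Hypothetical account names.
candidates = ["news_daily", "offensive_handle", "friend_42"]
blocked = {"offensive_handle"}
followed = [a for a in candidates if should_follow(a, "blacklist", blacklist=blocked)]
print(followed)
```

Under this reading, an indiscriminate follow would also follow the account with the offending username, which is the kind of endorsement the figure caption warns about, while a list-based pattern gives the bot developer a way to prevent it.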
Python
What else can we learn from the code of bots? Is it possible to trace back from messages to code?
Conversational screen readers
Amazon Alexa
Google Assistant
Can we extract conversational knowledge from webpages? What about “talking” to websites?
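One possible first step toward "talking" to a website is extracting the elements a voice assistant could read out or navigate to. The sketch below uses Python's standard `html.parser` to collect headings and links; it is only an illustration of the idea, and the choice of headings and links as "conversational knowledge" is an assumption, not a method from the proposal. Links nested inside headings are captured as links only.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect headings and links that a voice interface could read out."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.items = []        # tuples of (kind, text, href)
        self._current = None   # tag whose text is being captured
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "a":
            self._current = tag
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces
            text = " ".join("".join(self._text).split())
            if text:
                kind = "link" if tag == "a" else "heading"
                self.items.append((kind, text, self._href))
            self._current = None


def page_outline(html):
    """Return the headings and links found in an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

A dialog manager could then answer a question such as "what is on this page?" by reading the headings, and offer the link texts as navigation options.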
Domain-specific content extraction
How effectively can we extract recipes from free text? And how do we do it?
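As a baseline for this question, ingredient mentions with an explicit quantity and unit can be pulled out of free text with a regular expression. This is a deliberately simple sketch, not the approach of the proposal: the unit list is a small hypothetical sample, and only a single-word ingredient name after the unit is captured.

```python
import re

# Quantity (fraction, integer or decimal), a unit from a small sample list,
# an optional "of", and a one-word ingredient name.
INGREDIENT = re.compile(
    r"(?P<quantity>\d+/\d+|\d+(?:[.,]\d+)?)\s*"
    r"(?P<unit>g|kg|ml|l|cups?|tbsp|tsp)\b\s+"
    r"(?:of\s+)?(?P<name>[a-z]+)",
    re.IGNORECASE,
)


def extract_ingredients(text):
    """Return quantity/unit/name dicts for each ingredient mention found."""
    return [
        {
            "quantity": match["quantity"],
            "unit": match["unit"].lower(),
            "name": match["name"].lower(),
        }
        for match in INGREDIENT.finditer(text)
    ]


sample = "Mix 200 g of flour with 1/2 cup sugar and 2 tbsp butter."
for ingredient in extract_ingredients(sample):
    print(ingredient)
```

Mentions without a unit ("2 large eggs") or with multi-word names ("olive oil") are missed or truncated by this pattern, which is one way to see why the question of how effectively recipes can be extracted needs a more careful answer than a regular expression.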
Typical data science steps
Domain understanding
Data collection
Manual inspection of data
Hypotheses formulation
Feature engineering, data labeling
Algorithm engineering (from AI/machine learning to statistics)
Validation and hypotheses verification
Online tool
http://www.floriandaniel.it
Soft skills
Proposals
Template
Florian Daniel