Blackpink Data
Release 1.0.0
Marco Fantauzzo
May 04, 2022
CONTENTS:
1 Indices and tables
2 How to Build
2.1 Set up your machine
2.1.1 Python
2.1.2 Clone the repository
2.1.3 Install dependencies
2.1.4 Set API keys as environment variables
2.1.5 Twitter API keys
2.1.6 YouTube API key
2.1.7 Spotify API key
2.1.8 Instagram USERNAME and PASSWORD
2.2 Fork
2.3 Run
2.3.1 First run
2.3.2 Standard run
2.3.3 Parameters
2.3.4 Schedule the bot
3 Modules
3.1 Main script
3.2 Tweet
3.3 Utils
3.4 Birthdays
3.5 YouTube
3.6 Instagram
3.7 Spotify
3.8 Billboard Charts
Python Module Index
Index
CHAPTER
ONE
INDICES AND TABLES
• genindex
• modindex
• search
CHAPTER
TWO
HOW TO BUILD
2.1 Set up your machine
2.1.1 Python
Make sure you have installed:
• Python 3.8
• pip
Please note that the spotify.py module, which is based on the spotipy library, does not seem to work well on Windows, so I suggest using Linux or WSL on Windows. All the following commands assume that you are in a Linux-like environment.
2.1.2 Clone the repository
Run: git clone https://github.com/marco97pa/Blackpink-Data.git
For more info see this guide
Then cd to the new directory
2.1.3 Install dependencies
Run pip3 install -r requirements.txt to install all the required libraries
2.1.4 Set API keys as environment variables
The project is composed of different modules such as instagram.py, youtube.py and more. Each module is used to get data from a different source. To get this data you need the corresponding API keys.
2.1.5 Twitter API keys
Go to the Twitter Developers page, log in, go to Dashboard and create a new app with read and write permissions. Then copy the generated keys and set them as environment variables by running these lines (replace the placeholders with your actual key values):
export TWITTER_CONSUMER_KEY='xxxx'
export TWITTER_CONSUMER_SECRET='xxxx'
export TWITTER_ACCESS_KEY='xxxx'
export TWITTER_ACCESS_SECRET='xxxx'
2.1.6 YouTube API key
Go to Google Developers and follow their instructions on how to get an API key for YouTube. Then copy the generated key and set it as an environment variable by running this line (replace the placeholder with your actual key value):
export YOUTUBE_API_KEY='xxxx'
2.1.7 Spotify API key
Go to the Spotify Developer Dashboard, create a new app and get the API keys. Then set them as environment variables by running these lines:
export SPOTIPY_CLIENT_ID='xxxx'
export SPOTIPY_CLIENT_SECRET='xxxx'
2.1.8 Instagram USERNAME and PASSWORD
You can set your username and password like this:
export INSTAGRAM_ACCOUNT_USERNAME='xxxxxx'
export INSTAGRAM_ACCOUNT_PASSWORD='xxxxxx'
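Before the first run it can help to confirm that every variable from sections 2.1.5 to 2.1.8 is actually set. The following is a minimal sketch of such a check; the helper name missing_variables is hypothetical and is not part of the project.

```python
import os

# Variable names exactly as exported in sections 2.1.5 to 2.1.8.
REQUIRED_VARIABLES = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
    "YOUTUBE_API_KEY",
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "INSTAGRAM_ACCOUNT_USERNAME",
    "INSTAGRAM_ACCOUNT_PASSWORD",
]


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; the project does not ship it.
    """
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Report what is missing in the current shell session.
missing = missing_variables()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All API keys are set.")
```

If a source is disabled with one of the parameters described in section 2.3.3, its variables are presumably not needed, so the list can be trimmed accordingly.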
2.2 Fork
By editing the data.yaml file you can make the script work with a different artist group.
For example, you could make a BTS Data Bot by editing the provided sample_data.yaml file and saving it as data.yaml.
Fill in data.yaml with all the data you know. Leave fields empty or write placeholder data if you don't know some details: they will be overwritten with the real values the first time the script runs.
With minimal or no code edits, the script could work even for single artists and not only groups.
2.3 Run
2.3.1 First run
Assuming that you have a valid data.yaml file in the same directory as the script, run:
python3 main.py -no-tweet
For the first run it is important to use the -no-tweet option to prevent a flood of tweets in your timeline. You should also check that everything is fine by looking at the command line output and the data.yaml file.
2.3.2 Standard run
From then on, you can just run:
python3 main.py
It will tweet any changes in the dataset.
2.3.3 Parameters
By passing one or more parameters, you can disable individual source modules. The parameters currently allowed are:
• -no-instagram: disables Instagram source
• -no-youtube: disables YouTube source
• -no-spotify: disables Spotify source
• -no-birthday: disables birthdays events source
• -no-twitter: disables Twitter source (used for reposting)
Remember that -no-twitter is different from -no-tweet:
-no-tweet prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.
2.3.4 Schedule the bot
If you want the bot to run 24/7, you should set the script to run (for example) every 5 minutes to check for updates. Look at How to schedule tasks on Linux using crontab to get an idea of how to do it.
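Where crontab is not available, a small Python loop can stand in for it. This is a sketch of that alternative, not the method the project recommends; the function run_periodically and its parameters are hypothetical, and the 300-second interval simply mirrors the "every 5 minutes" example above.

```python
import subprocess
import time


def run_periodically(command, interval_seconds=300, max_runs=None,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `command` repeatedly, waiting `interval_seconds` between runs.

    Hypothetical helper: `max_runs`, `runner` and `sleep` exist only so the
    loop can be bounded and exercised without waiting. Returns the number of
    runs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # check=False: a failed run should not stop the following ones.
        runner(command, check=False)
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Example call (runs until interrupted, so it is left commented out):
# run_periodically(["python3", "main.py"])
```

Unlike crontab, this loop stops when the process is killed or the machine reboots, so crontab remains the more robust choice on Linux.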
CHAPTER
THREE
MODULES
3.1 Main script
main.check_args()
Checks the arguments passed by the command line
By passing one or more parameters, you can disable individual source modules.
The parameters currently allowed are:
• -no-instagram: disables Instagram source
• -no-youtube: disables YouTube source
• -no-spotify: disables Spotify source
• -no-birthday: disables birthdays events source
• -no-twitter: disables Twitter source (used for reposting)
Remember that -no-twitter is different from -no-tweet:
-no-tweet prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.
Returns: A dictionary that contains all the sources and their state (enabled or disabled, True or False)
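The mapping from flags to a dictionary of source states can be sketched as follows. This illustrates the behaviour described above and is not the actual body of main.check_args(): the function name parse_sources, the dictionary keys and the separate tweeting flag are all assumptions.

```python
# Flags as documented above; the dictionary keys are hypothetical names.
FLAGS = {
    "-no-instagram": "instagram",
    "-no-youtube": "youtube",
    "-no-spotify": "spotify",
    "-no-birthday": "birthday",
    "-no-twitter": "twitter",
}


def parse_sources(argv):
    """Return (sources, tweeting_enabled) for a list of command line arguments.

    `sources` maps each source name to True (enabled) or False (disabled).
    `tweeting_enabled` becomes False when -no-tweet is passed.
    """
    sources = {name: True for name in FLAGS.values()}
    tweeting_enabled = True
    for arg in argv:
        if arg in FLAGS:
            sources[FLAGS[arg]] = False
        elif arg == "-no-tweet":
            tweeting_enabled = False
    return sources, tweeting_enabled


# In a script this would typically be called with sys.argv[1:].
sources, tweeting_enabled = parse_sources(["-no-spotify", "-no-tweet"])
print(sources, tweeting_enabled)
```

Keeping -no-tweet separate from the source dictionary reflects the distinction drawn above: it silences posting without disabling any source.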
main.load_group()
Reads the data.yaml YAML file
Data about a group is stored inside the data.yaml file in the same directory as the script
Returns: A dictionary that contains all the information about the group
main.write_group(group)
Writes the data.yaml YAML file
Data about a group is stored inside the data.yaml file in the same directory as the script
Args: group: dictionary that contains all the information about the group
3.2 Tweet
tweet.check_duplicates(message)
Checks the tweet message against the 3 latest user tweets to avoid duplicate posts
Args: message: a string containing the message to be posted
Returns: Boolean which signals True if a duplicate is found
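A minimal sketch of the comparison, assuming the texts of the recent tweets have already been retrieved as plain strings. The function name is_duplicate and the whitespace normalization are assumptions; the real function works on tweet objects and may compare them differently.

```python
def is_duplicate(message, recent_texts):
    """Return True if `message` matches one of the recent tweet texts.

    Hypothetical helper: surrounding whitespace is ignored, everything else
    must match exactly.
    """
    normalized = message.strip()
    return any(normalized == text.strip() for text in recent_texts)


recent = ["New video is out!", "Happy birthday!", "1M followers reached"]
print(is_duplicate("Happy birthday! ", recent))
print(is_duplicate("2M followers reached", recent))
```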
tweet.edit_image(filename, text, text_size=200, crop=False)
Edit an image by adding a text (uses the Pillow module)
Args:
• filename: filename of the image to be modified
• text: text to be added
• text_size (optional): size of the text (default: 200)
• crop (optional): if enabled removes black bars from a video thumbnail (16:9 over 4:3)
tweet.remove_URLs(text)
Remove URLs from a text string
Args: text: any text containing URL(s)
Returns: the same text without URL(s)
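URL removal of this kind is commonly done with a regular expression. The sketch below shows one way to do it; the pattern, the function name strip_urls and the whitespace cleanup are assumptions, not the module's actual implementation.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Return `text` without URLs.

    Hypothetical helper. Note that it also collapses every run of whitespace
    (including line breaks) into a single space.
    """
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())


print(strip_urls("New video https://youtu.be/abc out now"))
```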
tweet.retrieve_own_tweets(num=3)
Retrieves recent tweets made by the bot.
Args: num: an integer with the number of tweets to retrieve.
Returns: a list of tweet objects
tweet.set_test_mode()
Enables the test mode
Prevents tweets from being posted. They are still printed in the console. This is really useful for debugging purposes.
tweet.twitter_post(message)
Post a message on Twitter (uses the Tweepy module)
Args: message: a string containing the message to be posted
tweet.twitter_post_image(message, filename, text, text_size=200, crop=False)
Post a photo with message on Twitter (uses the Tweepy module)
Args:
• message: a string containing the message to be posted
• filename: filename of the image to be posted
tweet.twitter_repost(artist)
Retweets the latest tweets of a given account
Args: artist: a dictionary with all the details of the artist
Returns: a dictionary containing all the updated data of the artist
3.3 Utils
utils.convert_num(mode, num)
Converts a number to a given number scale
Example: convert_num("100K", 600000) returns 6
Args:
• mode: (string) the scale for the conversion ("100K", "M", "10M", "100M", "B")
• num: the number to be converted
Returns: the converted number
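The documented example (600000 on the "100K" scale gives 6) is consistent with an integer division by the size of the scale. The sketch below works under that assumption; the name to_scale and the use of floor division are guesses rather than the module's actual code.

```python
# Size of each documented scale.
SCALES = {
    "100K": 100_000,
    "M": 1_000_000,
    "10M": 10_000_000,
    "100M": 100_000_000,
    "B": 1_000_000_000,
}


def to_scale(mode, num):
    """Return how many whole units of the scale `mode` fit in `num`.

    Hypothetical helper; raises KeyError for an unknown mode.
    """
    return num // SCALES[mode]


print(to_scale("100K", 600000))
```

Comparing the converted value of an old count with that of a new count is one way to detect that a milestone has been crossed.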
utils.display_num(num, short=False, decimal=False)
Converts a number into a readable format
Args:
• num: the number to be converted
• short (optional): flag to get a long or short literal (“Mln” vs “million”)
• decimal (optional): flag to print also the first decimal digit (19.1 vs 19)
Returns: a string with a number in a readable format
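A sketch of how the two flags could interact. Only "Mln" and "million" come from the description above; the function name readable, the "Bln"/"billion" pair, the truncation of the decimal digit and the fallback for smaller numbers are assumptions.

```python
# (size, short literal, long literal), largest first.
# "Bln"/"billion" is assumed by analogy with "Mln"/"million".
UNITS = [
    (1_000_000_000, "Bln", "billion"),
    (1_000_000, "Mln", "million"),
]


def readable(num, short=False, decimal=False):
    """Return `num` as text such as '19 million' or '19.1 Mln'.

    Hypothetical helper. Integer arithmetic is used so the decimal digit is
    truncated, never rounded up past a value that was not reached.
    """
    for size, short_name, long_name in UNITS:
        if num >= size:
            tenths = num * 10 // size
            digits = f"{tenths // 10}.{tenths % 10}" if decimal else str(tenths // 10)
            return f"{digits} {short_name if short else long_name}"
    # Below one million: plain number with thousands separators.
    return f"{num:,}"


print(readable(19_100_000))
print(readable(19_100_000, short=True, decimal=True))
```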
utils.download(url, filename)
Downloads a file, given a URL and a filename
Args:
• url: URL of the file to download
• filename: name of the file to save
utils.download_image(url)
Downloads an image, given a URL
The image is saved in the download.jpg file
Args: url: URL of the image to download
3.4 Birthdays
birthdays.check_birthdays(group)
Checks if today is the birthday of a member of the group
It tweets if it is someone's birthday
Args: group: a dictionary with all the details of the group
Returns: a dictionary containing all the updated data of the group
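The date comparison can be sketched as below. The layout of the group data is not documented here, so the list of members with "name" and "birthday" keys, and the function name birthdays_today, are purely hypothetical.

```python
from datetime import date


def birthdays_today(members, today=None):
    """Return the names of members whose birthday falls on `today`.

    Hypothetical helper: each member is assumed to be a dict with a "name"
    string and a "birthday" datetime.date. Only month and day are compared.
    """
    today = today or date.today()
    return [
        member["name"]
        for member in members
        if (member["birthday"].month, member["birthday"].day) == (today.month, today.day)
    ]


# Placeholder members, not real data.
members = [
    {"name": "Member A", "birthday": date(1995, 1, 3)},
    {"name": "Member B", "birthday": date(1996, 1, 16)},
]
print(birthdays_today(members, today=date(2022, 1, 3)))
```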
3.5 YouTube
youtube.youtube_check_channel_change(old_channel, new_channel, hashtags)
Checks if there is any change in the number of subscribers or total views of the channel
It compares the old channel data with the new (already fetched) data.
Args:
• old_channel: dictionary that contains all the old data of the channel
• new_channel: dictionary that contains all the updated data of the channel
• hashtags: hashtags to add to the Tweet
Returns: a dictionary with updated data of the channel
youtube.youtube_check_videos_change(name, old_videos, new_videos, hashtags)
Checks if there is any new video
It compares the old videos list of the artist with the new (already fetched) videos list. It tweets if there is a new release or if a video reaches a new views goal.
Args:
• name: name of the channel
• old_videos: list that contains all the old videos
• new_videos: list that contains all the updated videos
• hashtags: hashtags to append to the Tweet
Returns: new_videos
youtube.youtube_data(group)
Runs all the YouTube related tasks
It scrapes data from YouTube for the whole group and the single artists
Args: group: dictionary with the data of the group to scrape
Returns: the same group dictionary with updated data
youtube.youtube_get_channel(api, channel_id)
Gets details about a channel
Args:
• api: The YouTube instance
• channel_id: the ID of that channel on YouTube
Returns: a dictionary containing all the scraped data of that channel
youtube.youtube_get_videos(api, playlist_id, name)
Gets videos from a playlist
Args:
• api: The YouTube instance
• playlist_id: the ID of the playlist on YouTube
• name: name of the channel owner of the playlist
Returns: a list of videos
3.6 Instagram
instagram.clean_caption(caption)
Removes unnecessary parts of an Instagram post caption
It removes all the hashtags and converts tags to plain text (@marco97pa -> marco97pa)
Args: caption: a text
Returns: the same caption without hashtags and tags
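The two transformations described above can be sketched with regular expressions. The patterns, the function name tidy_caption and the final whitespace cleanup are assumptions, not the module's actual code.

```python
import re


def tidy_caption(caption):
    """Return `caption` without hashtags and with @tags turned into plain names.

    Hypothetical helper. Runs of whitespace (including line breaks) are
    collapsed into single spaces as a side effect.
    """
    # Drop every hashtag: '#' followed by word characters.
    no_hashtags = re.sub(r"#\w+", "", caption)
    # Keep the username but drop the leading '@'.
    plain_tags = re.sub(r"@(\w[\w.]*)", r"\1", no_hashtags)
    return " ".join(plain_tags.split())


print(tidy_caption("Thanks @marco97pa #blackpink #data"))
```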
instagram.download_profile_pic(url)
Downloads an image, given a URL
The image is saved in the download.jpg file
Args: url: URL of the image to download
instagram.instagram_data(group)
Runs all the Instagram related tasks
It scrapes data from Instagram for the whole group and the single artists
Args: group: dictionary with the data of the group to scrape
Returns: the same group dictionary with updated data
instagram.instagram_last_post(artist, user_id)
Gets the last post of a profile
It tweets if there is a new post, that is, if the timestamp of the latest stored post does not match the timestamp of the latest fetched post
Args:
• user_id: a profile ID
• artist: a dictionary with all the details of the artist
Returns: a dictionary containing all the updated data of the artist
instagram.instagram_profile(artist)
Gets the details of an artist on Instagram
It tweets if the artist reaches a new followers goal
Args: artist: a dictionary with all the details of the artist
Returns:
• a dictionary containing all the updated data of the artist
• a Profile ID
3.7 Spotify
spotify.check_new_songs(artist, collection, hashtags)
Checks if there is any new song
It compares the old discography of the artist with the new (already fetched) discography. It tweets if there is a new release or a new featuring by the artist.
Args:
• artist: dictionary that contains all the data about the single artist
• collection: dictionary that contains all the updated discography of the artist
• hashtags: hashtags to append to the Tweet
Returns: an artist dictionary with updated discography details
spotify.get_artist(spotify, artist, hashtags)
Gets details about an artist
It tweets if the artist reaches a new goal of followers on Spotify
Args:
• spotify: The Spotify instance
• artist: dictionary that contains all the data about the single artist
• hashtags: hashtags to append to the Tweet
Returns: an artist dictionary with updated profile details
spotify.get_discography(spotify, artist)
Gets all the releases of an artist
A release is a single, EP, mini-album or album: Spotify simply calls them all "albums"
Example:
• DDU-DU-DDU-DU of BLACKPINK is a single
• SQUARE UP of BLACKPINK is a mini-album
• THE ALBUM of BLACKPINK is (really) an album
It also gets releases where the artist is featured. Example:
• Sour Candy is a song of Lady Gaga, but BLACKPINK are featured
Spotify also makes many "clones" of the same album: there could be extended albums or albums that had tracks added later. Each of these creates a duplicate of the same album, so this function also tries to clean up the discography by removing duplicates.
Args:
• spotify: The Spotify instance
• artist: dictionary that contains all the data about the single artist
Returns: a dictionary with updated discography details
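The criteria the function uses to spot duplicates are not documented here. One simple approach, sketched below as an assumption, is to keep only the first release seen for each name, compared case-insensitively. The function name remove_duplicate_albums and the "id"/"name" keys are hypothetical.

```python
def remove_duplicate_albums(albums):
    """Return `albums` with later entries of an already seen name removed.

    Hypothetical helper: each album is assumed to be a dict with a "name"
    key. Names are compared ignoring case and surrounding whitespace, and the
    original order is preserved.
    """
    seen = set()
    unique = []
    for album in albums:
        key = album["name"].casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(album)
    return unique


albums = [
    {"id": "1", "name": "THE ALBUM"},
    {"id": "2", "name": "The Album "},
    {"id": "3", "name": "SQUARE UP"},
]
print([album["id"] for album in remove_duplicate_albums(albums)])
```

A name-only comparison would not catch clones whose titles differ, such as an edition with an added suffix, so the real cleanup may need extra rules.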
spotify.link_album(album_id)
Generates a link to an album
Args: album_id: ID of the album
Returns: The link to that album on Spotify
spotify.link_artist(artist_id)
Generates a link to an artist
Args: artist_id: ID of the artist
Returns: The link to that artist on Spotify
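Both helpers most likely build on the public open.spotify.com URL scheme. The sketch below assumes so and uses its own function names (album_url, artist_url), which are not the module's; the ID in the example is a placeholder.

```python
SPOTIFY_BASE_URL = "https://open.spotify.com"


def album_url(album_id):
    """Return the public Spotify web link for an album ID."""
    return f"{SPOTIFY_BASE_URL}/album/{album_id}"


def artist_url(artist_id):
    """Return the public Spotify web link for an artist ID."""
    return f"{SPOTIFY_BASE_URL}/artist/{artist_id}"


# "abc123" is a placeholder, not a real Spotify ID.
print(album_url("abc123"))
print(artist_url("abc123"))
```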
spotify.login()
Logs in to Spotify
Uses the client credentials authorization flow. The following API keys must be set as environment variables:
• SPOTIPY_CLIENT_ID
• SPOTIPY_CLIENT_SECRET
You can request API keys on the Spotify Developer Dashboard
See https://spotipy.readthedocs.io/en/2.16.1/#authorization-code-flow for more details
spotify.spotify_data(group)
Runs all the Spotify related tasks
It scrapes data from Spotify for the whole group and the single artists
Args: group: dictionary with the data of the group to scrape
Returns: the same group dictionary with updated data
3.8 Billboard Charts
billboard_charts.billboard_data(group)
Gets Billboard charts of a group
It starts all the tasks needed to get the latest data and, if there are changes, tweet updates. Data is updated once a day.
Args:
• group: dictionary that contains all the data about the group
Returns: the same group dictionary with updated data
billboard_charts.get_artist_rank(artist, chart)
Gets the Billboard Hot 100 chart and tries to find an artist
Args:
• artist: the artist to look for
Returns: a string containing the list of songs found in the chart (it can be empty)
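Searching a chart for an artist could look roughly like this. The shape of the chart data is not documented here, so the list of entries with "rank", "title" and "artist" keys, the function name songs_in_chart and the output format are all hypothetical.

```python
def songs_in_chart(artist, entries):
    """Return one line per chart entry credited to `artist`.

    Hypothetical helper: `entries` is assumed to be a list of dicts with
    "rank", "title" and "artist" keys. The match is a case-insensitive
    substring test, so featured credits are found too. Returns an empty
    string when nothing matches.
    """
    wanted = artist.casefold()
    found = [
        f"#{entry['rank']} {entry['title']}"
        for entry in entries
        if wanted in entry["artist"].casefold()
    ]
    return "\n".join(found)


# Placeholder chart, not real data.
entries = [
    {"rank": 1, "title": "Song A", "artist": "Artist X"},
    {"rank": 33, "title": "Song B", "artist": "Artist Y Featuring Artist X"},
    {"rank": 50, "title": "Song C", "artist": "Artist Z"},
]
print(songs_in_chart("artist x", entries))
```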
PYTHON MODULE INDEX
b
billboard_charts
birthdays
i
instagram
m
main
s
spotify
t
tweet
u
utils
y
youtube
INDEX
B
billboard_charts (module)
billboard_data() (in module billboard_charts)
birthdays (module)
C
check_args() (in module main)
check_birthdays() (in module birthdays)
check_duplicates() (in module tweet)
check_new_songs() (in module spotify)
clean_caption() (in module instagram)
convert_num() (in module utils)
D
display_num() (in module utils)
download() (in module utils)
download_image() (in module utils)
download_profile_pic() (in module instagram)
E
edit_image() (in module tweet)
G
get_artist() (in module spotify)
get_artist_rank() (in module billboard_charts)
get_discography() (in module spotify)
I
instagram (module)
instagram_data() (in module instagram)
instagram_last_post() (in module instagram)
instagram_profile() (in module instagram)
L
link_album() (in module spotify)
link_artist() (in module spotify)
load_group() (in module main)
login() (in module spotify)
M
main (module)
R
remove_URLs() (in module tweet)
retrieve_own_tweets() (in module tweet)
S
set_test_mode() (in module tweet)
spotify (module)
spotify_data() (in module spotify)
T
tweet (module)
twitter_post() (in module tweet)
twitter_post_image() (in module tweet)
twitter_repost() (in module tweet)
U
utils (module)
W
write_group() (in module main)
Y
youtube (module)
youtube_check_channel_change() (in module youtube)
youtube_check_videos_change() (in module youtube)
youtube_data() (in module youtube)
youtube_get_channel() (in module youtube)
youtube_get_videos() (in module youtube)