Lyrics Web Scraping and Text Mining Analysis

Transcript of Lyrics Web Scraping and Text Mining Analysis

Page 1: Lyrics Web Scraping and Text Mining Analysis

Zhaoyuan He Yihua Yang Qinyan Li Anwesan Pal

ECE 143: Group 2

Lyrics Web Scraping and Text Mining Analysis

Page 2: Lyrics Web Scraping and Text Mining Analysis

Contents

➢ Introduction

➢ Web Scraping

➢ Data Cleaning

➢ Data Visualization

➢ Text Mining

➢ Conclusion

Page 3: Lyrics Web Scraping and Text Mining Analysis

Introduction

➢Goal:

To study the top 100 songs on the Billboard year-end charts from 1959 to 2018

➢Dataset:

1. Wiki – Billboard year-end 100:

https://en.wikipedia.org/wiki/Billboard_Year-End

2. Years - 1959-2018

3. Number of songs - 60x100 = 6000

➢Methodology:

1. Extract data from various websites

2. Choose relevant variables, such as artist nationality, lyrics, genre, etc.

3. Perform Data Cleaning, Analysis and Text Mining
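The dataset described above can be pictured as one record per charted song. The sketch below is only an illustration: the class name `SongRecord` and its field names are assumptions, not taken from the slides. It also checks the 60 x 100 = 6000 arithmetic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FIRST_YEAR, LAST_YEAR = 1959, 2018  # year range stated on the slide
SONGS_PER_YEAR = 100                # year-end top 100


@dataclass
class SongRecord:
    """One charted song (hypothetical schema; field names are assumed)."""
    year: int
    rank: int
    song: str
    artist: str
    nationality: Optional[str] = None                # filled in from Wikipedia
    genres: List[str] = field(default_factory=list)  # filled in from DBpedia
    lyrics: Optional[str] = None                     # filled in from Genius


years = list(range(FIRST_YEAR, LAST_YEAR + 1))  # both end years included
expected_total = len(years) * SONGS_PER_YEAR

print(len(years), expected_total)
```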

Page 4: Lyrics Web Scraping and Text Mining Analysis

Web Scraping - 4 main components

➢Part I: Rank, Song, Artist - obtained from Wikipedia

➢Part II: Nationality - obtained from Wikipedia

➢Part III: Genres - obtained from DBpedia resources

➢Part IV: Lyrics - obtained from the Genius database
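As a rough illustration of Part I, the sketch below pulls rank, song and artist out of an HTML table using only the Python standard library. The slides do not say which scraping tools the group used. The column layout, the `SAMPLE_HTML` snippet and its placeholder song and artist names are assumptions made so the example runs offline; a real run would fetch each year's page first.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) is still part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder table (assumed layout: rank, quoted title, artist).
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>No.</th><th>Title</th><th>Artist(s)</th></tr>
<tr><td>1</td><td>"<a href="#">Song A</a>"</td><td><a href="#">Artist A</a></td></tr>
<tr><td>2</td><td>"<a href="#">Song B</a>"</td><td><a href="#">Artist B</a></td></tr>
</table>
"""


def parse_chart(html):
    """Return a list of {rank, song, artist} dicts, skipping the header row."""
    parser = TableRowParser()
    parser.feed(html)
    _header, *body = parser.rows
    return [
        {"rank": int(rank), "song": title.strip('"'), "artist": artist}
        for rank, title, artist in body
    ]


chart = parse_chart(SAMPLE_HTML)
print(chart)
```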

Page 5: Lyrics Web Scraping and Text Mining Analysis

Web Scraping

Data Cleaning Needed!

Page 6: Lyrics Web Scraping and Text Mining Analysis

Data Cleaning

➢Nationality: 128 different nationalities listed by Wikipedia, categorized into 37 nationalities

➢Genre: 489 genres listed by DBpedia, categorized into 17 main genre classes

➢Lyrics: Removal of periods, punctuation, incomplete words
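A minimal sketch of these cleaning steps, under stated assumptions: the lookup tables `NATIONALITY_MAP` and `GENRE_MAP` are tiny hypothetical stand-ins for the group's actual 128-to-37 and 489-to-17 mappings, which the slides do not list. The slides also do not define how "incomplete words" were detected, so that step is left out.

```python
import re
import string

# Hypothetical excerpts of the raw-label -> class mappings (not from the slides).
NATIONALITY_MAP = {
    "american": "United States",
    "english": "United Kingdom",
    "british": "United Kingdom",
}
GENRE_MAP = {
    "pop rock": "Rock",
    "hard rock": "Rock",
    "dance-pop": "Pop",
}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_label(raw, mapping, default="Other"):
    """Map a raw scraped label onto its broader class (case-insensitive)."""
    return mapping.get(raw.strip().lower(), default)


def clean_lyrics(text):
    """Lowercase, drop ASCII punctuation (periods included), collapse whitespace."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", no_punct).strip()


print(normalize_label("  American ", NATIONALITY_MAP))
print(normalize_label("Dance-Pop", GENRE_MAP))
print(clean_lyrics("Hello, world... It's  me!\n"))
```

Labels missing from a mapping fall back to "Other" here; whether the original analysis had such a catch-all class is not stated.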

Page 7: Lyrics Web Scraping and Text Mining Analysis

Number of songs

➢By Country:

US tops the chart!
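A count like this can be produced with a simple tally over the cleaned records. The sketch assumes each record is a dict with a `nationality` key (a hypothetical field name) and uses made-up rows.

```python
from collections import Counter


def songs_per_country(records):
    """Return (country, song count) pairs, most frequent first."""
    return Counter(rec["nationality"] for rec in records).most_common()


# Made-up rows for illustration only.
country_rows = [
    {"song": "A", "nationality": "United States"},
    {"song": "B", "nationality": "United Kingdom"},
    {"song": "C", "nationality": "United States"},
    {"song": "D", "nationality": "Canada"},
]

ranking = songs_per_country(country_rows)
print(ranking)
```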

Page 8: Lyrics Web Scraping and Text Mining Analysis

Average length of lyrics

➢By Year:

Increasing trend!
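One way to compute such an average is sketched below. It assumes that "length" means the number of whitespace-separated words; the slides do not say whether words or characters were counted. It also assumes each record is a dict with `year`, `genre` and `lyrics` keys. The rows are made up.

```python
from collections import defaultdict


def average_length_by(records, key):
    """Mean lyric length in words, grouped by records[i][key]."""
    totals = defaultdict(lambda: [0, 0])  # group -> [word total, song count]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += len(rec["lyrics"].split())
        bucket[1] += 1
    return {group: total / count
            for group, (total, count) in sorted(totals.items())}


# Made-up rows for illustration only.
length_rows = [
    {"year": 1960, "genre": "Pop", "lyrics": "la la la"},
    {"year": 1960, "genre": "Rock", "lyrics": "oh yeah"},
    {"year": 2010, "genre": "Pop", "lyrics": "one two three four five six"},
]

by_year = average_length_by(length_rows, "year")
print(by_year)
```

Passing `"genre"` as the key gives the same average grouped by genre instead of by year.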

Page 9: Lyrics Web Scraping and Text Mining Analysis

Average length of lyrics

➢By Genre:

Caribbean music has the longest lyrics!

Page 10: Lyrics Web Scraping and Text Mining Analysis

Text Mining

➢Part I: N-grams - Most frequent sets of words that occur next to each other

Unigram: love

Bigram: love love

Trigram: love love love

Love is the way forward!
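N-gram counting can be sketched with the standard library alone. The two toy lyric strings are invented, and tokenizing by whitespace is an assumption. N-grams are counted per song so that no n-gram spans two different songs.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n adjacent tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(texts, n, k=3):
    """Return the k most frequent n-grams across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.split(), n))
    return counts.most_common(k)


# Invented toy lyrics, already cleaned and lowercased.
toy_lyrics = [
    "love love love is all around",
    "give me love love love",
]

for n in (1, 2, 3):
    print(n, top_ngrams(toy_lyrics, n, k=1))
```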

Page 11: Lyrics Web Scraping and Text Mining Analysis

Text Mining

➢Part II: Sentiment Analysis - Sentiment Intensity Analyzer library

Negative sentiments creeping in!
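The slides name a "Sentiment Intensity Analyzer" library but do not show how it was called. The sketch below does not use that library. It swaps in a deliberately simple word-list scorer to show the general idea of scoring each song and averaging per year. The `POSITIVE` and `NEGATIVE` word sets and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Tiny invented word lists; a real lexicon is far larger and weighted.
POSITIVE = {"love", "happy", "good", "sweet"}
NEGATIVE = {"hate", "sad", "bad", "cry"}


def polarity(text):
    """(positive words - negative words) / total words, in [-1, 1]."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def mean_polarity_by_year(records):
    """Average the per-song polarity within each year."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["year"]].append(polarity(rec["lyrics"]))
    return {year: sum(vals) / len(vals)
            for year, vals in sorted(buckets.items())}


# Made-up rows for illustration only.
toy_songs = [
    {"year": 1965, "lyrics": "sweet love so good"},
    {"year": 2015, "lyrics": "bad love makes me cry"},
]

trend = mean_polarity_by_year(toy_songs)
print(trend)
```

A plain word-count scorer like this ignores negation and intensity, which dedicated sentiment libraries typically handle.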

Page 12: Lyrics Web Scraping and Text Mining Analysis

Text Mining

➢Part II: Sentiment Analysis

Page 13: Lyrics Web Scraping and Text Mining Analysis

Text Mining

➢Part III: TF-IDF - Top words encountered for top-3 genres

Hip-hop and Pop have more colloquial word usage!
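A minimal TF-IDF sketch follows, assuming each genre's lyrics are concatenated into one document and using the plain weighting tf x log(N / df). The slides do not state which TF-IDF variant or library was used, and the toy genre texts are invented.

```python
import math
from collections import Counter


def tfidf_top_words(docs, k=3):
    """docs maps label -> text; returns label -> top-k (word, score) pairs."""
    counts = {label: Counter(text.lower().split())
              for label, text in docs.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    result = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[label] = ranked[:k]
    return result


# Invented toy documents, one per genre.
toy_genres = {
    "Pop": "love baby baby dance",
    "Hip-hop": "love money yeah money",
    "Rock": "love road guitar road",
}

top = tfidf_top_words(toy_genres, k=1)
print(top)
```

In this toy data "love" appears in every genre, so its score is zero everywhere and the genre-specific words rise to the top. That is the property that makes TF-IDF useful for contrasting genres.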

Page 14: Lyrics Web Scraping and Text Mining Analysis

Conclusion

➢Data gathered about Top 100 Billboard songs from 1959-2018

➢Data Cleaning for Lyrics, Nationality, Genre of song

➢Text Mining - N-gram, Sentiment Analysis, TF-IDF

➢More Text Mining - Word Cloud, Parts of Speech Analysis

Page 15: Lyrics Web Scraping and Text Mining Analysis

THANK YOU FOR LISTENING!

Any questions?
