Kanye West is one of the most influential entertainers of the past decade. His creative exploits in music production have overflowed into many facets of pop culture - including fashion, social media influence, and publicity entertainment. Kanye's attention-grabbing endeavors coalesce into an exceptional body of work, regardless of whether you choose to view his persona in a positive or negative light. Before the self-proclaimed deity comparisons fueled by the visibility of celebrity life, “Yeezus” was a pink-poloed, backpack-strapped producer creating the musical platforms that other rappers used to communicate. So… since ’Ye himself picked up the mic, what has he said? What type of insight can we garner by analyzing the sentiment and context of his lyrics?

NOTE: I have added this dataset to Kaggle and I have included a notebook in R to help anyone get started with text analysis!

Discography Analysis

One of the first things that stands out is that “god” is the 10th most frequently used unique word across all of Kanye’s albums. It is easy to conclude this is due to infatuation with himself, but consider that the context of his lyrics has changed throughout the years. For example, The College Dropout references “god” in a much different way than later albums such as Yeezus. On “I Am a God” (Yeezus), Kanye attempts to restore some humility following his self-comparison to The Lord: “I am a god / Even though I am a man of God / My whole life in the hand of God”. The size of each text fragment is scaled to its frequency of use.
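For anyone who wants to reproduce this kind of word ranking outside of R, here is a minimal Python sketch. It is not the code behind the numbers above: the `top_words` helper and the tiny `STOP_WORDS` set are assumptions made up for illustration, and a real analysis would use a much fuller stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"i", "a", "am", "of", "the", "in", "my", "even", "though", "and", "to"}

def top_words(lyrics, n=10):
    """Return the n most frequent words in a lyric string, ignoring stop words."""
    # Lowercase first so "God" and "god" are counted as the same word.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

sample = "I am a god / Even though I am a man of God / My whole life in the hand of God"
print(top_words(sample, n=3))
```

On the three lines quoted above, “god” comes out on top with three uses, which is the same kind of count that drives the scaled text sizes.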

Let’s take a further look into the emotions behind Mr. West’s lyrics. This is possible by pairing the positive or negative emotional association of each word with its frequency of use. “Shit” and “love” are the top 2 most frequently used words associated with an emotional sentiment. If that doesn’t highlight the dichotomous nature of Kanye’s personality, I don’t know what does. Here are the top 10 most frequently used positive and negative words in Kanye’s discography:
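The idea of pairing a word's emotional label with its frequency can be sketched in a few lines of Python. Treat this as an illustration only: the `LEXICON` below is a toy dictionary I made up for the example, not the sentiment lexicon used for the results in this post, and the sample text is placeholder text rather than real lyrics.

```python
import re
from collections import Counter

# Toy sentiment lexicon (an assumption); a real lexicon has thousands of entries.
LEXICON = {"love": "positive", "good": "positive", "bad": "negative", "worry": "negative"}

def sentiment_word_counts(text):
    """Count how often each lexicon word appears, split by sentiment label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"positive": Counter(), "negative": Counter()}
    for token in tokens:
        label = LEXICON.get(token)  # None for words without a sentiment label
        if label is not None:
            counts[label][token] += 1
    return counts

counts = sentiment_word_counts("love is good love is bad no worry love")
print(counts["positive"].most_common(10))
print(counts["negative"].most_common(10))
```

Words that are not in the lexicon are simply skipped, so the totals depend heavily on which lexicon you choose.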

Sentiment Exploration

Are there any interesting positive/negative emotional fluctuations across the chronological development of individual albums? Most likely due to the explicit content associated with most rap and hip-hop lyrics, all albums have a net negative sentiment. In the table below, albums are ordered chronologically, sentiment is the positive count minus the negative count, and normalized sentiment is that net value divided by the number of songs on the album.

Album | Negative | Positive | Net sentiment | Album length (songs) | Normalized sentiment
College Dropout | 321 | 195 | -126 | 21 | -6.00
Late Registration | 278 | 134 | -144 | 21 | -6.86
Graduation | 261 | 146 | -115 | 16 | -7.19
808s & Heartbreak | 135 | 97 | -38 | 12 | -3.17
My Beautiful Dark Twisted Fantasy | 380 | 159 | -221 | 13 | -17.00
Watch The Throne | 356 | 153 | -203 | 16 | -12.69
Cruel Summer | 268 | 137 | -131 | 12 | -10.92
Yeezus | 184 | 64 | -120 | 10 | -12.00
The Life Of Pablo | 307 | 144 | -163 | 20 | -8.15
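
The normalization in the table is simple arithmetic, and it can be checked with a short Python sketch. The counts are copied from the table above; the function and variable names are my own and are only there for illustration.

```python
# (album, negative count, positive count, number of songs), copied from the table above
ALBUMS = [
    ("College Dropout", 321, 195, 21),
    ("Late Registration", 278, 134, 21),
    ("Graduation", 261, 146, 16),
    ("808s & Heartbreak", 135, 97, 12),
    ("My Beautiful Dark Twisted Fantasy", 380, 159, 13),
    ("Watch The Throne", 356, 153, 16),
    ("Cruel Summer", 268, 137, 12),
    ("Yeezus", 184, 64, 10),
    ("The Life Of Pablo", 307, 144, 20),
]

def normalized_sentiment(negative, positive, songs):
    """Net sentiment (positive minus negative) divided by the number of songs."""
    return (positive - negative) / songs

rows = [(name, normalized_sentiment(neg, pos, songs)) for name, neg, pos, songs in ALBUMS]
most_negative = min(rows, key=lambda row: row[1])
least_negative = max(rows, key=lambda row: row[1])

for name, score in rows:
    print(f"{name}: {score:.2f}")
```

By this measure, My Beautiful Dark Twisted Fantasy is the most negative album per song and 808s & Heartbreak is the least negative.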

It is worthwhile to take a closer look at the net sentiment ranking of songs within specific albums. The modern internet age of musical artistry has produced a plethora of singles that often lack the narrative depth of a full-length album. Kanye has his fair share of single releases, but they are usually tied to a storyline within an album. Take a look at the sentiment fluctuations within his most recent album, The Life of Pablo.

Contextual Analysis

The sentiment analysis is interesting; however, it is also important to consider the context of the lyrics. In the previous analysis, words are scored on an individual basis and then summarized into general sentiment trends. A more informative approach is to explore the context of words based on word-word associations. For example, if Kanye sings (autotunes?): “Baby don’t worry about it”, the context is actually positive, because a negating word (“don’t”) comes right before a negatively ranked word (“worry”). To look further into how this affects the analysis, check out the relationship between some common negating words and the words that follow them within the lyric.

These “inverse bigrams” can be thought of as sentimentally ranked in the wrong direction. Further investigation reveals that the net sentiment of these incorrectly ranked word pairs is -66. “No – no” is probably not contextually positive, as it is most likely a repeated pattern. Some of these bigrams are also not contextually positive terms due to the bilateral use of swear words. Removing these cases reduces the net contribution of inverse bigrams to only -5. We can assume these misclassifications occur at a similar rate within both positive and negative designations.
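
To make the “inverse bigram” idea concrete, here is a rough Python sketch of how such pairs could be found and how much sentiment they misattribute. The `NEGATIONS` set and the `SCORES` dictionary are toy assumptions for illustration, not the word lists behind the -66 and -5 figures.

```python
import re
from collections import Counter

# Illustrative negating words and toy word scores (both are assumptions).
NEGATIONS = {"no", "not", "never", "don't", "can't", "won't"}
SCORES = {"worry": -1, "bad": -1, "love": 1, "good": 1}

def negated_bigrams(lyrics):
    """Count (negating word, scored word) pairs that appear back to back."""
    # Normalize curly apostrophes so "don’t" matches the entry "don't".
    tokens = re.findall(r"[a-z']+", lyrics.lower().replace("’", "'"))
    pairs = Counter()
    for first, second in zip(tokens, tokens[1:]):
        if first in NEGATIONS and second in SCORES:
            pairs[(first, second)] += 1
    return pairs

def misattributed_sentiment(pairs):
    """Sum of word-level scores that were assigned despite a preceding negation."""
    return sum(SCORES[second] * count for (_, second), count in pairs.items())

pairs = negated_bigrams("Baby don’t worry about it")
print(pairs, misattributed_sentiment(pairs))
```

Here “worry” contributes -1 on its own even though the lyric reads as reassuring, which is exactly the kind of misclassification described above.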

Word Network Visualization

Finally, let’s look at a network of bigrams to see how words are related to one another and how frequently they are used together. The direction of each arrow indicates lyric word order, and the color density of each arrow indicates frequency of lyrical use.

This is just a snapshot of a few of the most interesting nodes within the network. A majority of nodes are uninteresting because many have word relations based on use in individual songs, and show up within the network because of repeated use (think “30 hours” or “Kanye’s new workout plan”; chorus repetition is not counted). The most interesting grouping to consider is the node which displays the word associations between “black”, “real”, “music”, “nigga”, which begs consideration of associated themes within Kanye’s music.
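
The data behind a network like this is just a table of directed word pairs and their counts. The Python sketch below shows one way to build it. It is an assumption-laden illustration: dropping duplicate lines is only a crude stand-in for ignoring chorus repetition, the `min_count` threshold is arbitrary, and the sample lines are placeholder text rather than real lyrics.

```python
import re
from collections import Counter, defaultdict

def bigram_edges(lines, min_count=2):
    """Count directed word pairs, keeping only pairs seen at least min_count times."""
    edges = Counter()
    # dict.fromkeys drops exact duplicate lines while preserving their order.
    for line in dict.fromkeys(lines):
        tokens = re.findall(r"[a-z']+", line.lower())
        edges.update(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in edges.items() if n >= min_count}

def adjacency(edges):
    """Turn edge counts into {source word: {following word: count}}."""
    graph = defaultdict(dict)
    for (source, target), n in edges.items():
        graph[source][target] = n
    return dict(graph)

lines = ["real music here", "real music there", "real music here", "more real talk"]
edges = bigram_edges(lines)
graph = adjacency(edges)
print(graph)
```

Each key points to the words that follow it, which matches the arrow direction in the network, and the counts play the role of the arrow color density.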

Many thanks to Julia Silge and David Robinson’s Tidy Text Mining with R for a great introduction to text data analysis.