We can learn a lot about people by snooping around their social media accounts. A quick way to do social network analysis on Twitter is to use R, an open-source language and environment for statistical computing.
First, you will need a Twitter account. You can create one by visiting www.twitter.com. Then you will have to create a Twitter application, which provides the consumer key, consumer secret, access token, and access secret used below. To create a Twitter app, follow the directions on this website: http://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/
Next comes the hard part: installing twitteR in R. Let's just say this can get a little frustrating.
The R packages needed to connect to Twitter are devtools, ROAuth, and twitteR (later we will also use tm and wordcloud for the text analysis). The devtools package makes developing R packages easier by providing functions that simplify many common tasks. We need it here because it lets us install R packages directly from GitHub repositories, in this case the twitteR package, which provides an R interface to the Twitter API. The ROAuth package handles OAuth authentication, which is how users authenticate themselves so that R can connect to the Twitter API.
If you are having trouble installing devtools, check out this YouTube video: http://youtu.be/enPPMHr5SrM. Even though it is narrated in Chinese, it is easy to follow.
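Because installation is the frustrating part, it can help to check which packages are still missing before going further. Here's a minimal sketch; `missing_packages()` is a hypothetical helper (not part of devtools or any other package), and the list includes tm and wordcloud, which this tutorial uses later.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  # system.file(package = p) returns "" when package p is not installed
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(paths)]
}

needed <- c("devtools", "ROAuth", "twitteR", "tm", "wordcloud")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

This only reports what is missing; it does not install anything, so you can decide whether each package should come from CRAN or GitHub.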
```r
install.packages("devtools")            # install devtools from CRAN (first time only)
library(devtools)
install_github("geoffjentry/twitteR")   # install the twitteR package from GitHub
install.packages("ROAuth")              # install the ROAuth package
library(twitteR)                        # load twitteR
library(ROAuth)                         # load ROAuth

# Twitter OAuth endpoints
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
```
NOTE: Make sure the URLs use https, not http. If you use http, the twitteR authentication process will not work.
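Since a plain http URL breaks the authentication, you can fail fast before the handshake. A minimal sketch; `check_https()` is a hypothetical helper, not part of twitteR or ROAuth.

```r
# Hypothetical helper: stop with an error if any URL does not start with https://
check_https <- function(urls) {
  bad <- urls[!startsWith(urls, "https://")]
  if (length(bad) > 0) {
    stop("These URLs must use https: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# the three OAuth endpoints used in this tutorial
check_https(c(
  "https://api.twitter.com/oauth/request_token",
  "https://api.twitter.com/oauth/access_token",
  "https://api.twitter.com/oauth/authorize"
))
```

In the tutorial's own code you could call `check_https(c(reqURL, accessURL, authURL))` right after defining the three URLs.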
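```r
# keys and tokens from your Twitter App (placeholders shown)
consumerKey <- "XXXXXXXXXXXXXXXXXXXXXX"
consumerSecret <- "XXXXXXXXXXXXXXXXXXXXXXXXX"
access_token <- "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
access_secret <- "XXXXXXXXXXXXXXXXXXXXXXXXXXX"

# necessary step for Windows: download a CA certificate bundle
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

# to get your consumerKey and consumerSecret see the twitteR documentation for instructions
# OAuthFactory$new() expects named arguments
cred <- OAuthFactory$new(consumerKey = consumerKey,
                         consumerSecret = consumerSecret,
                         requestURL = reqURL,
                         accessURL = accessURL,
                         authURL = authURL)

# necessary step for Windows: handshake using the certificate bundle
cred$handshake(cainfo = "cacert.pem")

# save for later use (Windows)
save(cred, file = "twitter_authentication.Rdata")

# register the credentials with twitteR so searchTwitter() can use them
setup_twitter_oauth(consumerKey, consumerSecret, access_token, access_secret)

# once saved, next time all you have to do is this:
# load("twitter_authentication.Rdata")
# setup_twitter_oauth(consumerKey, consumerSecret, access_token, access_secret)
```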
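The save-and-reload step works because `save()` writes R objects to a file under their names and `load()` restores them into the workspace under those same names. A minimal sketch of that round trip, assuming a placeholder list (not a real OAuth object) and a temporary file standing in for twitter_authentication.Rdata:

```r
# temporary path standing in for "twitter_authentication.Rdata"
cred_file <- tempfile(fileext = ".Rdata")

# placeholder object, not a real OAuthFactory credential
demo_cred <- list(consumerKey = "XXXX", consumerSecret = "YYYY")
save(demo_cred, file = cred_file)

rm(demo_cred)                  # simulate starting a fresh R session
restored <- load(cred_file)    # load() returns the names of the objects it restored
print(restored)                # prints "demo_cred"
print(exists("demo_cred"))     # prints TRUE
```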
Hooray, we are done connecting R to Twitter! Now we are going to look at the hashtag #MakeAMovieSmarter. It's from one of my favorite shows, @Midnight with Chris Hardwick.
It's a game show in which celebrity comedians compete with each other in made-up categories. The #MakeAMovieSmarter category asks players to take an existing movie title and change it into something smart.
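```r
install.packages("tm", dependencies = TRUE)   # text-mining package
install.packages("wordcloud")                 # word cloud plotting
library(twitteR)
library(tm)
library(wordcloud)                            # also attaches RColorBrewer (brewer.pal)

# search for up to 1000 recent tweets containing the hashtag
midnight <- searchTwitter("#MakeAMovieSmarter", n = 1000)

# extract the text of each tweet
midnight_text <- sapply(midnight, function(x) x$getText())

# build a corpus and clean it: lower-case, strip punctuation, drop English stop words
midnight_corpus <- Corpus(VectorSource(midnight_text))
midnight_corpus <- tm_map(midnight_corpus, content_transformer(tolower))
midnight_corpus <- tm_map(midnight_corpus, removePunctuation)
midnight_corpus <- tm_map(midnight_corpus, removeWords, stopwords("english"))

# term-document matrix: rows are words, columns are tweets
tdm <- TermDocumentMatrix(midnight_corpus)
m <- as.matrix(tdm)
word_freqs <- sort(rowSums(m), decreasing = TRUE)

# create a data frame with words and their frequencies
dm <- data.frame(word = names(word_freqs), freq = word_freqs, stringsAsFactors = FALSE)

# plot words in decreasing order of frequency
wordcloud(dm$word, dm$freq, random.order = FALSE, colors = brewer.pal(8, "Dark2"))
```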
The result is a word cloud that visualizes the most frequently used words in the tweets.
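Two things are worth checking once the cloud appears: what the cleaning steps actually do to the text, and whether the search hashtag itself, which nearly every returned tweet contains, ends up dominating the picture. Here's a minimal base-R sketch under those assumptions; `word_frequencies()` and `top_terms()` are hypothetical helpers (not part of tm or wordcloud), the example texts are made up, and the regular expressions only approximate tm's `removePunctuation` and `removeWords`.

```r
# Hypothetical helpers in base R that roughly mirror the tm steps above.

# lower-case, strip punctuation, drop stop words, then count and sort words
word_frequencies <- function(texts, stop_words = character(0)) {
  texts <- tolower(texts)                                   # like content_transformer(tolower)
  texts <- gsub("[[:punct:]]+", "", texts)                  # like removePunctuation
  words <- unlist(strsplit(texts, "[[:space:]]+"))          # split each text into words
  words <- words[nzchar(words) & !(words %in% stop_words)]  # like removeWords(stopwords())
  counts <- sort(table(words), decreasing = TRUE)           # like sort(rowSums(m), decreasing = TRUE)
  setNames(as.integer(counts), names(counts))
}

# drop unwanted terms from a word/freq data frame and keep the n most frequent
top_terms <- function(dm, drop = character(0), n = 100) {
  dm <- dm[!(dm$word %in% drop), , drop = FALSE]
  dm <- dm[order(dm$freq, decreasing = TRUE), , drop = FALSE]
  head(dm, n)
}

# made-up example texts standing in for tweets
texts <- c("#MakeAMovieSmarter The Matrix, Reloaded!",
           "the matrix of ideas #MakeAMovieSmarter",
           "Matrix")
freqs <- word_frequencies(texts, stop_words = c("the", "of"))
example_dm <- data.frame(word = names(freqs), freq = freqs, stringsAsFactors = FALSE)

# remove the search hashtag itself and keep the top 10 words
filtered <- top_terms(example_dm, drop = "makeamoviesmarter", n = 10)
print(filtered)
```

With the real data you could pass `top_terms(dm, drop = "makeamoviesmarter")` to `wordcloud()` instead of `dm`; `wordcloud()`'s own `max.words` and `min.freq` arguments can also limit how many words are drawn. One difference to keep in mind: `TermDocumentMatrix()` by default drops words shorter than three characters, which this sketch does not.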