Scraping tweets from Twitter using R

We can learn a lot about people from snooping around their social media accounts. A quick way to do social network analysis on Twitter is by using R. R is an open source language and environment for statistical computing.

First you will need a Twitter account, which you can create on the Twitter website. Then you will have to create a Twitter application. To create a Twitter app, follow the directions from this website:

Next comes the hard part: installing twitteR in R. Let's just say that this can get a little bit frustrating.

The R packages needed for this tutorial are devtools, ROAuth, and twitteR. The devtools package makes developing packages in R easier by providing functions that simplify many common tasks. We are installing it because it lets us install R packages from GitHub accounts, in this case the twitteR package, which provides an interface to the Twitter API that we can use for network analysis of Twitter accounts. The ROAuth package allows users to authenticate themselves in order to connect with the Twitter API.
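Before installing anything, it can help to see which of these packages are already present. The following is a minimal sketch in base R; the helper name missing_packages is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("devtools", "ROAuth", "twitteR")
to_install <- missing_packages(needed)
print(to_install)  # an empty result means everything is already installed
```

Anything that shows up in to_install still needs to be installed with the commands below.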


If you are having trouble installing devtools, you should check out this YouTube video: Even though it is narrated in Chinese, it is easy to follow.

 library(devtools)  #load devtools, which provides install_github
 install_github("geoffjentry/twitteR")  #install the twitteR package from GitHub
 install.packages("ROAuth")  #install the ROAuth package
 library(twitteR)  #load the library
 library(ROAuth)  #load the library

reqURL <- ""

accessURL <- ""

authURL <- ""

NOTE: Make sure each URL uses https and not http. If you do not use https, the authentication process of twitteR will not work.
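One way to catch this mistake early is to check the URL scheme before starting the authentication. The sketch below uses base R only; the three addresses are placeholders on example.com, not the real Twitter endpoints, and uses_https is a hypothetical helper name.

```r
# Hypothetical helper: TRUE for each URL that starts with "https://".
uses_https <- function(urls) {
  startsWith(urls, "https://")
}

# Placeholder addresses for illustration only; substitute your own values.
oauth_urls <- c(
  reqURL    = "https://example.com/oauth/request_token",
  accessURL = "https://example.com/oauth/access_token",
  authURL   = "http://example.com/oauth/authorize"  # deliberately wrong
)

ok <- uses_https(oauth_urls)
print(names(oauth_urls)[!ok])  # names of the URLs that still use plain http
```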





#necessary step for Windows
 download.file(url="", destfile="cacert.pem")

#consumerKey and consumerSecret come from your Twitter application; see the twitteR documentation for instructions
 cred <- OAuthFactory$new(consumerKey=consumerKey, consumerSecret=consumerSecret,
                          requestURL=reqURL, accessURL=accessURL, authURL=authURL)

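#necessary step for Windows
 cred$handshake(cainfo="cacert.pem")
 #save for later use for Windows
 save(cred, file="twitter_authentication.Rdata")
 registerTwitterOAuth(cred)  #older twitteR versions; newer ones use setup_twitter_oauth

#once saved, next time all you have to do is this
 load("twitter_authentication.Rdata")
 registerTwitterOAuth(cred)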

Hooray, we are done connecting R to Twitter. Now we are going to look at the hashtag #MakeAMovieSmarter. It's from one of my favorite shows, @Midnight with Chris Hardwick.

It's a game show where celebrity comedians compete with each other in made-up categories. The #MakeAMovieSmarter hashtag is about taking an existing movie title and changing it into something smart.

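install.packages("tm", dependencies=TRUE)
install.packages("wordcloud")  #needed for the wordcloud() call below
library(tm)            #Corpus, tm_map, TermDocumentMatrix
library(wordcloud)     #wordcloud
library(RColorBrewer)  #brewer.pal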


midnight <- searchTwitter("#MakeAMovieSmarter", n = 1000)
midnight_text = sapply(midnight, function(x) x$getText())
midnight_corpus = Corpus(VectorSource(midnight_text))
midnight_corpus <- tm_map(midnight_corpus, content_transformer(tolower))
midnight_corpus <- tm_map(midnight_corpus, removePunctuation)
midnight_corpus <- tm_map(midnight_corpus, removeWords, stopwords("english"))

tdm <- TermDocumentMatrix(midnight_corpus)
m <- as.matrix(tdm)
word_freqs = sort(rowSums(m), decreasing = TRUE) 

# create a data frame with words and their frequencies
dm = data.frame(word = names(word_freqs), freq = word_freqs)
wordcloud(dm$word, dm$freq, random.order = FALSE, colors = brewer.pal(8, "Dark2"))

The result is a word cloud that visualizes the most frequent words in the tweets.
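To see what the cleaning and counting steps are doing, here is a rough base R sketch of the same idea on three invented tweets. It does not use tm or wordcloud, the sample texts and the tiny stopword list are made up for illustration, and tm's own transformations differ in detail.

```r
# Invented sample tweets, for illustration only.
tweets <- c(
  "The Quantum Godfather #MakeAMovieSmarter",
  "The Godfather of Physics! #makeamoviesmarter",
  "Physics of the Caribbean #MakeAMovieSmarter"
)

# A tiny stand-in stopword list (tm's stopwords() is much longer).
stop_words <- c("the", "of")

clean <- tolower(tweets)                  # lower-case everything
clean <- gsub("[[:punct:]]", "", clean)   # strip punctuation, including '#'
words <- unlist(strsplit(clean, "\\s+"))  # split on whitespace
words <- words[!words %in% stop_words]    # drop stopwords

# Count each word and sort from most to least frequent.
word_freqs <- sort(table(words), decreasing = TRUE)
dm <- data.frame(word = names(word_freqs),
                 freq = as.vector(word_freqs))
print(dm)
```

Because the hashtag itself appears in every tweet, it ends up at the top of the table; you may want to add it to the stopword list before drawing the word cloud.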



