Subsetting Cookbook in R

So, I’ve just finished up the R Programming course that is a part of Coursera’s Johns Hopkins Data Science specialization, and I must say that it did some judo Mortal Kombat moves on my mind. This course is not beginner friendly, but I’ve learned a lot and I think it’s safe to say that I am becoming a master at subsetting and filtering data in R. In retrospect, if you are planning to take this specialization, you should do the Getting and Cleaning Data course before you start the R Programming course.

A collection of notes on how to select different rows and columns within R.

R has four main data structures to store and manipulate data: vectors, matrices, data frames, and lists. So far in my on-again, off-again relationship with R I have mostly worked with data frames. I will be using the airquality dataset that comes preinstalled with R.

Selecting

data("airquality")

names(airquality)
 [1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
head(airquality)
 Ozone Solar.R Wind Temp Month Day
 1 41 190 7.4 67 5 1
 2 36 118 8.0 72 5 2
 3 12 149 12.6 74 5 3
 4 18 313 11.5 62 5 4
 5 NA NA 14.3 56 5 5
 6 28 NA 14.9 66 5 6

 Subsetting data by index.

When subsetting, the placement of the comma is important.

To extract a specific observation (row), put the comma to the right of the row index. DON’T FORGET THE COMMA.

 airquality[1,]

 Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1

# First two rows and all columns

airquality[1:2,]
 Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2

 

To extract a specific variable (column), put the comma to the left of the column index.

# First column and all rows

airquality[,1]
 [1] 41 36 12 18 NA 28 23 19 8 NA 7 16 11 14 18 14 34 6 30 11 1 11 4 32
 [25] NA NA NA 23 45 115 37 NA NA NA NA NA NA 29 NA 71 39 NA NA 23 NA NA 21 37
 [49] 20 12 13 NA NA NA NA NA NA NA NA NA NA 135 49 32 NA 64 40 77 97 97 85 NA
 [73] 10 27 NA 7 48 35 61 79 63 16 NA NA 80 108 20 52 82 50 64 59 39 9 16 78
 [97] 35 66 122 89 110 NA NA 44 28 65 NA 22 59 23 31 44 21 9 NA 45 168 73 NA 76
[121] 118 84 85 96 78 73 91 47 32 20 23 21 24 44 21 28 9 13 46 18 13 24 16 13
[145] 23 36 7 14 30 NA 14 18 20
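One thing worth knowing about the single-column case above: by default R simplifies a one-column selection to a plain vector, which is why the output prints as a long run of numbers instead of a table. A minimal sketch of how to keep the data frame shape with the drop argument (the object names ozone_vec and ozone_df are just made up for this example):

```r
# Selecting one column by index simplifies the result to a vector by default.
ozone_vec <- airquality[, 1]
is.data.frame(ozone_vec)   # FALSE

# drop = FALSE keeps the result as a one-column data frame.
ozone_df <- airquality[, 1, drop = FALSE]
is.data.frame(ozone_df)    # TRUE
dim(ozone_df)              # 153 rows, 1 column
```

This matters mostly when you pass the result to a function that expects a data frame.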

# First two columns and all rows

airquality[,1:2]
 Ozone Solar.R
1 41 190
2 36 118
3 12 149
4 18 313
5 NA NA
6 28 NA
7 23 299

You can also select a column from data frame using the variable name with a dollar sign

 airquality$Ozone

 [1] 41 36 12 18 NA 28 23 19 8 NA 7 16 11 14 18 14 34 6 30 11 1 11 4 32
 [25] NA NA NA 23 45 115 37 NA NA NA NA NA NA 29 NA 71 39 NA NA 23 NA NA 21 37
 [49] 20 12 13 NA NA NA NA NA NA NA NA NA NA 135 49 32 NA 64 40 77 97 97 85 NA
 [73] 10 27 NA 7 48 35 61 79 63 16 NA NA 80 108 20 52 82 50 64 59 39 9 16 78
 [97] 35 66 122 89 110 NA NA 44 28 65 NA 22 59 23 31 44 21 9 NA 45 168 73 NA 76
[121] 118 84 85 96 78 73 91 47 32 20 23 21 24 44 21 28 9 13 46 18 13 24 16 13
[145] 23 36 7 14 30 NA 14 18 20

These are all the observations from the Ozone column
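As far as I can tell, the dollar sign is one of three equivalent ways to pull a single column out as a vector. A small sketch comparing them (col_name is a hypothetical variable I made up to show when the double brackets come in handy):

```r
# Three ways to get the Ozone column as a vector.
by_dollar  <- airquality$Ozone
by_bracket <- airquality[["Ozone"]]
by_index   <- airquality[, "Ozone"]

identical(by_dollar, by_bracket)  # TRUE
identical(by_dollar, by_index)    # TRUE

# [[ ]] is handy when the column name is stored in a variable.
col_name <- "Temp"   # hypothetical variable used for illustration
head(airquality[[col_name]])
```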

Extracting multiple columns from a data frame

The general pattern is df[,c("A","B","E")] (source: Stack Overflow).

head(airquality[,c("Ozone","Temp")])
 Ozone Temp
1 41 67
2 36 72
3 12 74
4 18 62
5 NA 56
6 28 66

You can also filter data when you are making a selection by using basic logic statements

Basic Logic statements

Operator    Description
==          equal
!=          not equal
>           greater than
<           less than
>=          greater than or equal
<=          less than or equal
!           NOT
&           AND
|           OR
%in%        tests whether each element of its first argument matches any element of its second (returns TRUE or FALSE)

Logical Function    Description
which.min           index of the minimum value
which.max           index of the maximum value

Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90.

myset <- airquality[airquality$Ozone > 31 & airquality$Temp > 90, ]
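One caveat with a logical filter like the one above: Ozone contains NA values, and a comparison against NA yields NA, which R can turn into all-NA rows in the result. A minimal sketch of two common ways to drop those rows, using which() or subset() (the names hot_ozone and hot_ozone2 are made up for this example):

```r
# which() keeps only the positions where the condition is TRUE, dropping NAs.
hot_ozone <- airquality[which(airquality$Ozone > 31 & airquality$Temp > 90), ]

# subset() treats NA in the condition as FALSE, so it selects the same rows.
hot_ozone2 <- subset(airquality, Ozone > 31 & Temp > 90)

anyNA(hot_ozone$Ozone)                 # FALSE
nrow(hot_ozone) == nrow(hot_ozone2)    # TRUE
```

I find subset() easier to read, while which() stays closer to the bracket style used in the rest of these notes.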

Extract the subset of rows of the data frame where Month values equal 5,7,8
airquality[airquality$Month %in% c(5,7,8),]
What is the mean of "Temp" when "Month" is equal to 6?
june <-airquality[airquality$Month==6,]
mean(june$Temp, na.rm=TRUE)
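If you want the same answer for every month at once, one option is tapply(), which applies a function to each group. This is just a sketch of an alternative to subsetting month by month, and temp_by_month is a name I made up:

```r
# Mean Temp for every month at once; the names are the Month values.
temp_by_month <- tapply(airquality$Temp, airquality$Month, mean, na.rm = TRUE)
temp_by_month

# Pull out June (Month == 6) by name.
temp_by_month["6"]
```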

What was the maximum ozone value in the month of May (i.e. Month is equal to 5)?
may<-airquality[airquality$Month==5,]
may[which.max(may$Ozone),]
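The which.max() line returns the whole row. If you only need the value itself, a short sketch using max() with na.rm = TRUE works too (max_ozone is a made-up name):

```r
may <- airquality[airquality$Month == 5, ]

# na.rm = TRUE is needed because Ozone has missing values in May.
max_ozone <- max(may$Ozone, na.rm = TRUE)
max_ozone                          # 115

# The day on which that maximum was recorded.
may[which.max(may$Ozone), "Day"]   # 30
```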


More Resources

Practice with subsetting with R

Subsetting by string

Was Library School Worth It?

So I’ve been out of school for about 9 months now and it kind of sucks. Just a little bit. Truth be told, I have a love/hate relationship with school in general. I miss the safety that you get with consistency, but it can only do so much to prepare you for the world. I remember the day I told my parents I was going to graduate school for Library and Information Science. And it went a little like this…

[GIF]

 

When I go to job interviews or explain to someone that I want to be in the field of data analytics and that Library and Information Science is a good fit, it goes like this…

[GIF]

So I am going to tell you why going to library school nurtured my data science skills. Three reasons why:

  1. Become a better researcher
  2. Public Service Skills
  3. Communication Skills

Become a better researcher

Day one of library school you learn that Google ain’t all that. Google is a powerful search engine, but if you do not narrow your search down to a well-defined question, your search results fall flat. When I was working at the university library I had to help a few undergrads with their research by using what we call a reference interview: helping students ask the right questions. I tend to use it on myself a lot these days to help define projects for my data science portfolio.

Public Service Skills

As a librarian you are a public servant. You serve the people, which includes everyone no matter what creed, race, or gender the person happens to be. I took an Intro to Web Development class during my graduate career. It was completely different from the web development courses from Codecademy, Team Treehouse and freeCodeCamp in that it was about accessibility. Never once when I taught myself web development did I think about how a blind or deaf person uses the web.

Library school just made me more aware of what’s going on in the world in general. Most of the people who attend GSLIS come from history and social science. It got me out of my comfort zone. Most of the time I would look at tech sites like CSS-Tricks, Stack Overflow, A List Apart and Hacker News. My friends influenced me to listen to NPR, CNN, The Read and just think about the world around me. How can I use technology to make a positive impact on society?

Communication Skills

In undergrad it was all about homework…

[GIF]

 

In library school it was mostly about final projects instead of bombarding you with homework assignments. The only person policing your education is you. Every class involved a presentation. To be able to not only convey your ideas to an entire room of people but to keep them interested as well is an art. I also had to write argumentative essays. For example, in one of my classes I had to develop a strategic plan to present to stakeholders for building an inclusive digital community. I also had to write documentation on technologies, for example how to go about doing diagnostics on Chromebooks that cannot connect to the wifi network.

I know that my alma mater is going to focus more on the hard science of Information Science going forward. But the soft skills that I learned from my library courses have given me an edge as well.

 

Plotting Graphs with ggvis

Grammar of Graphics

In linguistics, grammar is the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. (https://en.wikipedia.org/wiki/Grammar)

The grammar of graphics uses basically the same concept, but instead of building sentences that are the foundation of paragraphs, which lead on to works of literature, we are building graphs.

One grammar of graphics tool is ggvis, a data visualization package for R.

The grammar for ggvis is

graph =  data + coordinate system + properties + mark

[pre]

<data>  %>% 
  ggvis(~<x property>,~<y property>, 
        fill = ~<fill property>, size=~<size property>) %>% 
  layer_<marks>()

[/pre]
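To make the template above concrete, here is a minimal sketch that fills in each slot using the built-in mtcars data. The choice of cyl as the fill property is my own assumption for illustration, and the plot is stored in a made-up object called p instead of being drawn right away:

```r
library(ggvis)

# data = mtcars, x = wt, y = mpg, fill = number of cylinders, mark = points
p <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%
  layer_points()

# Typing p at the console in an interactive session renders the plot.
```

Swapping layer_points() for another layer_ function changes the mark while the data and properties stay the same.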

Three common charts are shown in this tutorial:

  • Bar Charts
  • Line Charts
  • Scatter Charts

Bar Charts

The bar chart is used when comparing the mean or percentages of 8 or more different groups.

[pre]

mtcars%>% ggvis(~ wt, ~mpg) %>% layer_bars()

[/pre]

mtcars_bar

Line Charts

Line charts are used to illustrate trends over time.

[pre]

mtcars%>% ggvis(~ wt, ~mpg) %>% layer_lines()


[/pre]

mtcars_lines.png

Scatter Plots

Scatter plots are used to depict how different objects settle around a mean based on 2 to 3 different dimensions. This allows for quick and easy comparisons between competing variables. Scatter plots show how much one variable is affected by another.

[pre]

mtcars%>% ggvis(~ wt, ~mpg) %>% layer_points()


[/pre]
mpg_points.png

First I exported data from the Basketball-Reference site. For this example I am going to use
Jimmy Butler's statistics from 2015-2015. I am just going to plot Butler's game score for each game.
This statistic was invented by John Hollinger to provide a rough measure of a player's 
performance in a given game.  The scale upon which the player's game score is based is 
the same as points scored.  If a player has a game score of 40, that is amazing, 
while a game score of 10 is average.(http://www.sportingcharts.com/dictionary/nba/game-score-statistic.aspx)

Install the ggvis package and load it with library() in order to use it.
[pre]
install.packages("ggvis")
library(ggvis)
[/pre]
Import the data using the read.csv function. Make sure you set the optional stringsAsFactors parameter to FALSE.

[pre]
butler<- read.csv("jimmy_butler.csv", stringsAsFactors=FALSE)
[/pre]
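To see what stringsAsFactors actually changes, here is a small sketch that reads the same CSV text both ways. The two rows and the Opp column are made up for illustration and are not taken from the real jimmy_butler.csv file:

```r
# Hypothetical two-game excerpt; the values are made up for illustration.
csv_text <- "G,Opp,GmSc\n1,NYK,15.3\n2,CLE,21.0"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$Opp)  # "character"
class(as_fct$Opp)  # "factor"
```

Keeping text columns as plain character vectors tends to make later coercion and filtering less surprising.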

Explore the data. In this instance I am looking at the column that holds Jimmy Butler's game score.

[pre]
butler$GmSc 
[/pre]

Subset the observations
[pre]
butler2<- butler[1:65,]
[/pre]

Attach the data frame to the search path.
The attach() function in R can be used to make objects within data frames accessible with fewer keystrokes.
When I was coercing the Game Score data to numeric, I got an "invalid subscript type integer" error in ggvis.
After searching through the pages of Stack Overflow, I learned that the dplyr package doesn't like the usage of '$'. Refer to the column by its bare name (or use '[') instead, e.g.:
[pre]
attach(butler2)
# Scatter plot (butler_points)
butler2 %>% ggvis(~G, ~as.numeric(GmSc)) %>% layer_points()
# Bar chart (butler_bar)
butler2 %>% ggvis(~G, ~as.numeric(GmSc)) %>% layer_bars()
# Line chart (butler_line)
butler2 %>% ggvis(~G, ~as.numeric(GmSc)) %>% layer_lines()
[/pre]
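Another way to sidestep the coercion inside the plotting call is to convert the column once, before plotting. A minimal sketch with a made-up stand-in for butler2 (the values and the "Did Not Play" entry are assumptions for illustration, not the real data):

```r
# Hypothetical stand-in for butler2; GmSc was read in as character.
butler_demo <- data.frame(
  G    = 1:4,
  GmSc = c("15.3", "21.0", "Did Not Play", "9.8"),
  stringsAsFactors = FALSE
)

# Convert once; non-numeric entries become NA (the warning is expected).
butler_demo$GmSc <- suppressWarnings(as.numeric(butler_demo$GmSc))

# Keep only the games that have a numeric game score.
played <- butler_demo[!is.na(butler_demo$GmSc), ]
played
```

With the column already numeric, the plotting formula can use ~GmSc directly.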

Data Science Resources

I wanted to create a quick blog post which will be a redux of a blog post I did two years ago. This is for people who are like me, people who want to practice their data science skills but are too broke to shell out $16,000 for a data science bootcamp. Luckily some of these bootcamps post all of their resources on GitHub.

The Data Science Summer School (DS3) is a bootcamp hosted by Microsoft and geared towards undergraduates in the New York area. Microsoft also has dataset resources for practicing your machine learning algorithms.

CS 109, Harvard's Intro to Data Science: even though this is not a bootcamp, this class has the most comprehensive materials on this list. It has lecture notes, videos, and assignments. A Harvard education without the price or the accreditation.

Kevin Markham, founder of dataschool.io, teaches a General Assembly data science bootcamp, and each session is posted on his GitHub page.

Kevin Markham also has his own YouTube channel where he teaches scikit-learn.

Other resources that are not in bootcamp/ structured class format

The most important resource in this list is Hadley Wickham’s tidy data tutorial. As a statistics major in undergrad I was blessed/cursed with never having to deal with messy data until I went to grad school. It’s something that we all need to learn.

A very brief tutorial on plotting graphs with R’s ggplot2 library or Python’s seaborn library.

Finally, to keep up to date with data science news, check out this curated list of data science blogs.

WordPress RoadMap for Noobs

Doing WordPress modifications can be a complete pain. This blog post is a quick roadmap to lay a foundation for WordPress development. For those of you who don’t know WordPress, it’s a content management system, similar to Drupal or Joomla.

First thing, learn HTML and CSS. If you are looking to do any kind of web development you are going to have to start with HTML/CSS. Good places to start are Free Code Camp, w3schools, and Khan Academy.

Second, learn the basics of PHP. I’m not saying you need to be an expert, but you need to know enough to build a simple website. Trust me, looking at WordPress code will be a lot easier if you have a solid foundation in the basics.

http://www.w3schools.com/php/

http://adambrown.info/b/widgets/easy-php-tutorial-for-wordpress-users/

The next thing you should know about is the WordPress Hierarchy. Why is it important to learn the WordPress Hierarchy? It is a diagram that shows the order in which individual pages are rendered in WordPress. If you want to customize an existing WordPress theme, it will help you decide which template file needs to be edited. Take a day or two to marinate on it; it looks complicated at first but things will make sense.

http://wphierarchy.com/

https://developer.wordpress.org/themes/basics/template-hierarchy/

You should also download a plugin called “Show Current Template”. It shows you the name of the PHP file the theme is using for that page.

All roads lead to the Codex. The Codex is your best friend. The WordPress Codex is the online manual for WordPress, so if you want to write a function, plugin, or page template, you can find all the answers you seek there.

https://codex.wordpress.org/

If you need any additional help just ask someone.

http://stackoverflow.com/

http://wordpress.stackexchange.com/

https://wordpress.org/support/

 

 

 

Job hunting thoughts and rants

I recently got hired to do a contract job for the upcoming year. Since the new year is coming, it’s time for reflection. The number one thing I can say about job hunting is that it sucks. There are plenty of BuzzFeed articles about it.

buzzfeed

Hopefully this blog post can give hope to people that are in this situation. Here are some things that I learned along the way while searching for the elusive full-time job.

1. Netflix is your best friend!

It takes about 6 months to a year to find a job. Meaning a lot of free time to catch up on your shows.

2. Prepare, prepare, prepare.

There are plenty of resources out there. The behavioral questions that you will receive are going to be practically the same.

https://www.themuse.com/

http://www.monster.com/

http://www.job-hunt.org/

3. You are interviewing them as well

Don’t be intimidated by an interview. You are judging them as well. If it’s not a good fit, then it’s not a good fit. I’ve had interviews where my potential coworker told me that he did not really like his job. I’ve had an interviewer discriminate against me because I live on the south side of Chicago. You don’t want to work for people like that. The only thing worse than not having a job is having work-related stress. You can learn more about a company’s culture at https://www.glassdoor.com/

Number 4 is for all the people of color out there. You guys probably already know this but I just have to say it.

4. We do not live in a post racial society.

I am sure all of you know about the story of Jose/ Joe. If not check out the video below. Despite this you just have to keep going and make sure that you have a good portfolio and references.

http://www.buzzfeed.com/adriancarrasquillo/meet-jose-zamora-the-guy-who-changed-his-name-to-joe-to-get#.gpnD1nzm5Z

 

5. Network

Everyone that I know that has a job got their job because they knew someone. If you don’t have a good network go on  http://www.meetup.com/ . You can go to tech meetups or even non-tech meetups like single and early 20s.  Just get yourself out there and practice your elevator pitch.

6. Show off your work

Show it off anywhere. Even if you think that your work isn’t the best, having something is better than having nothing at all. You can volunteer or do some online classes on Coursera or edX. You can post your work on GitHub, YouTube, a blog, SoundCloud, anywhere. There are so many free resources out there.

7. Talk to someone that is actually in that field.

I had never been on a tech interview in my life until this summer. It is completely different from doing general mock interviews at your school’s career center. If you are a first-gen student like me who is going into a completely different field from the rest of her family, then I suggest you look at tip number 4. Or you can go to a forum called The Workplace; it’s like Stack Overflow but for work situations.

8. Tech recruiters don’t know anything.

That’s all I have to say about the subject. There is nothing wrong with being open to a conversation, and you might make a friend along the way. But a lot of them are just busy trying to get commission, and most recruiters are not tech-oriented people.

9. Don’t let rejection get you down.

Sometimes it just wasn’t meant to be, child. You have to remind yourself that you are a smart person. I suggest looking at this TED talk from Sean Stephenson. Rejection is a learning opportunity. During my time job hunting I’ve learned that working for a tech firm might not be the place for me. I want to get more into civic hacking, and I should look into government jobs.

10. Finding a job sometimes just boils down to luck

 

 

 


Getting started with Markdown

What is it? And who is it for?

Markdown is a markup language created by John Gruber. It was designed to make it easy to write readable plain text that can be converted to HTML. The only thing you need is a simple text editor such as Notepad for Windows or TextEdit for Mac. Plain text editors are easy to use because they are simple and to the point, eliminating most, if not all, distractions and making the user a force for productivity. Plus, plain text files can be read on any computer system without a glitch. Markdown is also perfect for writing a blog post without the hassle of learning HTML. People have also used Markdown for organizing their notes, creating to-do lists, creating presentations, and much more!

Here are a few basic rules in order to get started:

Headings

# This is a First-level heading

## This is a Second-level heading

### This is a Third-level heading

#### This is a Fourth-level heading

##### This is a Fifth-level heading

###### This is a Sixth-level heading

In HTML:

<h1>This is a First-level heading</h1>
<h2>This is a Second-level heading</h2>
<h3>This is a Third-level heading </h3>
<h4>This is a Fourth-level heading</h4>
<h5> This is a Fifth-level heading</h5>
<h6>This is a Sixth-level heading</h6>

Paragraphs

A paragraph is one or more consecutive lines of the text separated by one or more blank lines. Normal paragraphs should not be indented with spaces or tabs.

O serpent heart, hid with a flowering face!
Did ever dragon keep so fair a cave?
Beautiful tyrant! fiend angelical!

In HTML:

<p> O serpent heart, hid with a flowering face!
    Did ever dragon keep so fair a cave?
    Beautiful tyrant! fiend angelical! </p>

Unordered List

A bullet list (unordered list) can be created using asterisks (*), plus signs (+), or minus signs (-) interchangeably.

(Note: Put at least one space after the *, + or -. The examples here use three.)

Ex:

*   Goat

+   Milk

-   Banana

*   Eggs

In HTML:

<ul> 
     <li> Goat   </li>
     <li> Milk   </li>
     <li> Banana </li>
     <li> Eggs   </li>

</ul>

Ordered list

(Note: Similar to the unordered list, items need at least one space after their respective numbers in order to format correctly. The examples here use three.)

1.   Eggs

2.   Ham

3.   Milk

In HTML:

<ol>
      <li> Eggs </li>
      <li> Ham  </li>
      <li> Milk </li>
</ol>

Bold and Italics

*hello, world* italicized text

<em>hello, world</em>

**hello, world** boldface

<strong>hello, world</strong>

Markdown Resources:

Here is an online converter and more syntax rules from the creator of Markdown

The Official website for Markdown: http://daringfireball.net/projects/markdown/dingus

More on the basics of Markdown:

http://lifehacker.com/5943320/what-is-markdown-and-why-is-it-better-for-my-to+do-lists-and-notes

http://en.wikipedia.org/wiki/Markdown

https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

https://help.github.com/articles/github-flavored-markdown

http://net.tutsplus.com/tutorials/tools-and-tips/markdown-the-ins-and-outs/

Using Markdown with WordPress:

http://www.youtube.com/watch?v=7aEYoP5-duY

http://designshack.net/articles/html/mastering-markdown-30-resources-apps-and-tutorials-to-get-you-started/

Markdown Editors:

Here is a list of markdown converters. There are 75 of them listed here.

http://mashable.com/2013/06/24/markdown-tools/