Long Data Vs. Wide Data

Lately I have had my hands on some raw, unclean data for a school assignment. Originally I thought that messy data was only about cleaning up blank values and getting text, numbers, and strings into the right format. But when I started analyzing my data in R, I found that it could not be handled in the shape it was in. There was a key concept I was missing about setting up data the right way: wide and long data.

What is Wide Data?

In wide data (also known as unstacked data), each attribute of a subject is stored in its own column, so every subject takes up exactly one row.

Person     Age  Weight
Buttercup  24   110
Bubbles    24   105
Blossom    24   107

What is Long Data?

Long data (also known as narrow or stacked data) has one column containing all the values and another column listing the context of each value, so every subject takes up one row per measured variable.

Person     Variable  Value
Buttercup  Age       24
Buttercup  Weight    110
Bubbles    Age       24
Bubbles    Weight    105
Blossom    Age       24
Blossom    Weight    107
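To make the relationship between the two layouts concrete, here is a minimal base R sketch that builds the long table by hand from the wide one. The names wide and long are just ones I picked for illustration, and this is only meant to show what "stacking" means, not the recommended way to reshape.

```r
# Wide table from the example above (values copied from the post).
wide <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

# Stack each measured column under a single Value column and record
# which column it came from in Variable.
long <- rbind(
  data.frame(Person = wide$Person, Variable = "Age",
             Value = wide$Age, stringsAsFactors = FALSE),
  data.frame(Person = wide$Person, Variable = "Weight",
             Value = wide$Weight, stringsAsFactors = FALSE)
)

# Same six measurements as before, now one per row.
# Rows come out grouped by variable rather than by person.
long
```

Nothing is lost or added in the process: three people times two variables gives six rows.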

Many analyses in R are easier when the data is in long form. This concept might seem weird at first, because we are used to seeing and analyzing data in wide form, but it gets easier with practice. R has an awesome package called reshape2 for converting your data from wide to long.
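As a quick illustration of why the long form can be convenient, here is a small base R sketch (no extra packages needed). It assumes the long table from above has been typed in by hand as a data frame called long, and it uses aggregate to summarize every measured variable in one call. The names long and means are my own choices.

```r
# Long table from the example above, typed in by hand.
long <- data.frame(
  Person   = rep(c("Buttercup", "Bubbles", "Blossom"), each = 2),
  Variable = rep(c("Age", "Weight"), times = 3),
  Value    = c(24, 110, 24, 105, 24, 107)
)

# Because the variable name is itself a column, it can be used for
# grouping: one call gives the mean of every measured variable.
means <- aggregate(Value ~ Variable, data = long, FUN = mean)
means
```

With wide data you would have to name each column separately; here a new measured variable would just be more rows, and the same call would still work.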

First install the reshape2 package and load it.


install.packages("reshape2")
library(reshape2)

Using the wide table above, we will split our variables into two groups: identifiers and measured variables.

Identifier variable: Person
Measured variables: Age, Weight
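If the table has many columns, you may not want to type out every measured variable. One possible shortcut, sketched here in base R, is to name the identifiers and treat everything else as measured. The names wide, id_vars, and measured_vars are hypothetical ones used only for this example.

```r
# Wide table from the example above.
wide <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107)
)

# Name the identifier column(s) explicitly...
id_vars <- "Person"

# ...and treat every remaining column as a measured variable.
measured_vars <- setdiff(names(wide), id_vars)
measured_vars
```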

To transform this wide data into long data, we use the melt function. You “melt” data so that each row is a unique id-variable combination.

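# Build the wide table from above as a data frame
df <- data.frame(Person = c("Buttercup", "Bubbles", "Blossom"),
                 Age    = c(24, 24, 24),
                 Weight = c(110, 105, 107))
df
     Person Age Weight
1 Buttercup  24    110
2   Bubbles  24    105
3   Blossom  24    107

# id.vars names the identifier column; measure.vars names the columns to stack
ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))
ppg
     Person variable value
1 Buttercup      Age    24
2   Bubbles      Age    24
3   Blossom      Age    24
4 Buttercup   Weight   110
5   Bubbles   Weight   105
6   Blossom   Weight   107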
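Going the other way is also possible. As a hedged sketch, reshape2's dcast function can "cast" the melted data back into wide form using a formula of the shape id ~ variable. The name wide_again is just one I chose for the result.

```r
library(reshape2)

# Wide table from the example above.
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

# Wide to long, as shown above.
ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))

# Long back to wide: Person identifies the rows, and each distinct
# entry in the variable column becomes its own column again.
wide_again <- dcast(ppg, Person ~ variable, value.var = "value")

# Note: the rows may come back in a different order than in df.
wide_again
```

The values match the original table, so melting does not throw any information away.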

For the official introduction to the reshape package (the predecessor of reshape2) from its creator, Hadley Wickham, see: http://had.co.nz/reshape/introduction.pdf

For more about wide vs. long data, check out: http://www.theanalysisfactor.com/wide-and-long-data/

For more information about cleaning and shaping messy data into tidy data, check out Hadley Wickham’s paper Tidy Data: http://vita.had.co.nz/papers/tidy-data.pdf
