Pandas Basics

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Read Table Data Into Python

Read in the data.

In [2]:
data = pd.read_csv("people-example.csv")

Display the data from the csv file.

In [3]:
data
Out[3]:
   First Name  Last Name        Country  age
0         Bob      Smith  United States   24
1       Alice   Williams         Canada   23
2     Malcolm       Jone        England   22
3       Felix      Brown            USA   23
4        Alex     Cooper         Poland   23
5         Tod   Campbell  United States   22
6       Derek       Ward    Switzerland   25
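
If you don't have people-example.csv at hand, a rough stand-in can be built from the output shown above. This is only a sketch: the CSV text below is an assumption reconstructed from the displayed table (the real file may differ in delimiter or quoting), and `csv_text` is a name made up for illustration.

```python
import io

import pandas as pd

# Assumed contents of people-example.csv, reconstructed from the displayed output.
csv_text = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25
"""

# read_csv accepts any file-like object, so StringIO stands in for the file on disk.
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (7, 4)
```

With `data` built this way, the remaining cells should behave the same as with the original file.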

Print out the first three rows.

In [4]:
#print the first three rows
data.head(3)
Out[4]:
   First Name  Last Name        Country  age
0         Bob      Smith  United States   24
1       Alice   Williams         Canada   23
2     Malcolm       Jone        England   22
In [5]:
#print the last three rows
data.tail(3)
Out[5]:
   First Name  Last Name        Country  age
4        Alex     Cooper         Poland   23
5         Tod   Campbell  United States   22
6       Derek       Ward    Switzerland   25

Exploring the data

Display basic summary statistics for the numeric columns in the dataframe. In this case, the only numeric column is age.

In [6]:
data.describe()
Out[6]:
             age
count   7.000000
mean   23.142857
std     1.069045
min    22.000000
25%    22.500000
50%    23.000000
75%    23.500000
max    25.000000
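
By default, describe() skips the text columns, which is why only age appears above. As a small sketch (using a cut-down copy of the data, typed in by hand here as an assumption), passing include="all" asks pandas to summarize the text columns as well:

```python
import pandas as pd

# Hand-typed subset of the example data, used only for illustration.
data = pd.DataFrame({
    "Country": ["United States", "Canada", "England", "USA",
                "Poland", "United States", "Switzerland"],
    "age": [24, 23, 22, 23, 23, 22, 25],
})

# Default behaviour: only numeric columns are summarized.
numeric_summary = data.describe()

# include="all" adds rows such as unique, top and freq for text columns.
full_summary = data.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.index.tolist())
```

Rows that don't apply to a column (for example, mean for Country) show up as NaN in the combined summary.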

Display the first three observations of the First Name column.

In [9]:
data['First Name'].head(3)
Out[9]:
0        Bob
1      Alice
2    Malcolm
Name: First Name, dtype: object

Display the last three observations of the Country column.

In [10]:
data['Country'].tail(3)
Out[10]:
4           Poland
5    United States
6      Switzerland
Name: Country, dtype: object

Selecting

Select one column.

In [12]:
data['age']
Out[12]:
0    24
1    23
2    22
3    23
4    23
5    22
6    25
Name: age, dtype: int64

To get the column back as a data frame instead of a Series, use double brackets.

In [14]:
data[['age']]
Out[14]:
   age
0   24
1   23
2   22
3   23
4   23
5   22
6   25
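
The difference between single and double brackets is easiest to see by checking the type and shape of each result. A minimal sketch, assuming a one-column copy of the data typed in by hand:

```python
import pandas as pd

# Hand-typed copy of the age column, for illustration only.
data = pd.DataFrame({"age": [24, 23, 22, 23, 23, 22, 25]})

as_series = data["age"]    # single brackets: a one-dimensional Series
as_frame = data[["age"]]   # double brackets: a DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (7,)
print(as_frame.shape)            # (7, 1)
```

The inner brackets are just a Python list of column names, which is why the same syntax works for selecting several columns.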

Select multiple columns by passing a list of column names.

In [15]:
columns_i_want = ['First Name', 'age']
data[columns_i_want]  # returns a data frame with only these columns
Out[15]:
   First Name  age
0         Bob   24
1       Alice   23
2     Malcolm   22
3       Felix   23
4        Alex   23
5         Tod   22
6       Derek   25

Or pass the list of column names directly.

In [16]:
data[['First Name','Last Name','age']]
Out[16]:
   First Name  Last Name  age
0         Bob      Smith   24
1       Alice   Williams   23
2     Malcolm       Jone   22
3       Felix      Brown   23
4        Alex     Cooper   23
5         Tod   Campbell   22
6       Derek       Ward   25

Graphing

In [17]:
plt.hist(data['age'])
plt.title('Age Histogram')
plt.xlabel('Ages')
plt.ylabel('Frequency')
plt.show()
[Figure 1: Age Histogram (x-axis: Ages, y-axis: Frequency)]

Manipulating

In [30]:
data['age'].mean()
Out[30]:
23.142857142857142
In [31]:
data['age'].max()
Out[31]:
25
In [34]:
data['Full Name'] = data['First Name'] + ' ' + data['Last Name']
In [35]:
data
Out[35]:
   First Name  Last Name        Country  age       Full Name
0         Bob      Smith  United States   24       Bob Smith
1       Alice   Williams         Canada   23  Alice Williams
2     Malcolm       Jone        England   22    Malcolm Jone
3       Felix      Brown            USA   23     Felix Brown
4        Alex     Cooper         Poland   23     Alex Cooper
5         Tod   Campbell  United States   22    Tod Campbell
6       Derek       Ward    Switzerland   25      Derek Ward
In [36]:
data['age'] + data['age']
Out[36]:
0    48
1    46
2    44
3    46
4    46
5    44
6    50
Name: age, dtype: int64

Using the apply function to do advanced transformations of our data

The pandas package takes a cue from R. The apply function calls a function on each observation in a column, so you don’t have to loop through the dataframe yourself to invoke the function on every value.
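
To make the comparison with a loop concrete, here is a small sketch that produces the same result both ways. The `to_upper` helper and the three-value Series are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Small hand-typed Series, for illustration only.
countries = pd.Series(["United States", "Canada", "USA"])


def to_upper(value):
    """Hypothetical helper: upper-case a single country name."""
    return value.upper()


# Explicit loop: call the function on each value and collect the results.
looped = []
for value in countries:
    looped.append(to_upper(value))

# apply: pandas calls the function on each value and returns a new Series.
applied = countries.apply(to_upper)

print(looped)
print(applied.tolist())
```

Both approaches call the function once per value; apply mainly saves the bookkeeping and keeps the original index on the result.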

In [18]:
data['Country']
Out[18]:
0    United States
1           Canada
2          England
3              USA
4           Poland
5    United States
6      Switzerland
Name: Country, dtype: object
In [19]:
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country
In [20]:
transform_country('USA')
Out[20]:
'United States'
In [21]:
data['Country'].apply(transform_country)
Out[21]:
0    United States
1           Canada
2          England
3    United States
4           Poland
5    United States
6      Switzerland
Name: Country, dtype: object
In [22]:
data['Country'] = data['Country'].apply(transform_country)
In [48]:
data
Out[48]:
   First Name  Last Name        Country  age       Full Name
0         Bob      Smith  United States   24       Bob Smith
1       Alice   Williams         Canada   23  Alice Williams
2     Malcolm       Jone        England   22    Malcolm Jone
3       Felix      Brown  United States   23     Felix Brown
4        Alex     Cooper         Poland   23     Alex Cooper
5         Tod   Campbell  United States   22    Tod Campbell
6       Derek       Ward    Switzerland   25      Derek Ward
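
For a simple one-to-one substitution like this, Series.replace with a dictionary is one possible alternative to writing a function for apply. This is a sketch of that alternative, not the approach used in the cells above; the Series is a hand-typed copy of the original Country column.

```python
import pandas as pd

# Hand-typed copy of the original Country column, for illustration only.
country = pd.Series(["United States", "Canada", "England", "USA",
                     "Poland", "United States", "Switzerland"])

# Map each old value to its replacement; values not in the dict are left alone.
cleaned = country.replace({"USA": "United States"})

print(cleaned.tolist())
```

apply remains the more flexible choice when the transformation needs real logic rather than a lookup.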
For more resources about Pandas: