Graphing Networks for beginners with Python

I tried this about a year ago, when my understanding of networks wasn’t so good, so I decided to try again. Today I will be making a basic network graph of the Marvel Universe. I originally tried to graph this network using D3.js, but the data set is so massive that it was too much for the DOM to handle. Instead I used a Python library called NetworkX, and it graphed the data without any trouble. So the take-home message from this introduction is that Python is awesome.

python

Photo from: https://xkcd.com/353/

These are the topics that are going to be discussed in this post.

  1. Downloading the Data
  2. Processing Data in Python
  3. Graphing Data in Python

First I download the data from this website: http://exposedata.com/marvel/ . Click on the Hero Social Network Data (CSV) link to get the file.

When you open the file you will see two columns: Source (a person) and Target (the person this person interacts with). Instead of using names, I use id numbers for each superhero so that Python can make a network graph.

hero_network_1

 

Next I pre-process it with Python.

Things that will be useful to know: lists, list comprehensions, dictionaries and the enumerate function.
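
If any of those are new to you, here is a minimal sketch of all of them in one place. The hero names and the variable names (pairs, sources, numbered, name_to_id) are made up for illustration and are not taken from the data set.

```python
# A tiny, made-up list of (source, target) pairs, just for illustration.
pairs = [("HERO A", "HERO B"), ("HERO A", "HERO C"), ("HERO B", "HERO C")]

# List comprehension: pull the first item out of every pair.
sources = [source for source, target in pairs]

# enumerate: walk through a list while counting up from 0.
numbered = list(enumerate(["HERO A", "HERO B", "HERO C"]))

# Dictionary comprehension: build a lookup from name to number.
name_to_id = {name: i for i, name in numbered}

print(sources)
print(numbered)
print(name_to_id["HERO C"])
```

The same three tools do almost all of the work in the script that follows.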


'''
Creates a simple graph of the Marvel Universe
Hero CSV file can be found here: http://exposedata.com/marvel/
Documentation for NetworkX can be found here: http://networkx.github.io/documentation/latest/tutorial/tutorial.html
'''
import csv
import networkx as nx
import matplotlib.pyplot as plt

with open('hero-network.csv', 'rt') as heroIn:  # reads in the hero-network file
    heroIn = csv.reader(heroIn)
    headers = next(heroIn)  # skips the header row
    heroes = [row for row in heroIn]

# removes the repeated names, looking at both the source and the target column
uniqueHeroes = list(set(name for row in heroes for name in row[:2]))

# creates a list of (id, name) tuples, one per superhero (note: the name "id" shadows Python's built-in id())
id = list(enumerate(uniqueHeroes))

# creates a dictionary (hash map) that maps each superhero name to its id
keys = {name: i for i, name in enumerate(uniqueHeroes)}

links = []  # creates a blank list

for row in heroes:  # maps all of the names in the csv file to their id number
    try:
        links.append({keys[row[0]]: keys[row[1]]})
    except KeyError:  # falls back to the raw names if a name has no id
        links.append({row[0]: row[1]})

G = nx.Graph()  # creates a graph
heroNodeId = []  # collects the id number of every superhero
for row in id:
    heroNodeId.append(row[0])

G.add_nodes_from(heroNodeId)  # creates nodes for the graph

for node in links:  # loops through each link and changes each dictionary to a tuple so networkx can read in the information
    edges = list(node.items())  # items() is a view in Python 3, so turn it into a list before indexing
    G.add_edge(*edges[0])  # unpacks the (source, target) tuple

nx.draw(G)
plt.show()  # plt.show() does not take the graph as an argument



First import the libraries that you need.

import csv
import networkx as nx
import matplotlib.pyplot as plt

 

Next we open the CSV file, skip its header row, and read in the data row by row.


with open('hero-network.csv','rt') as heroIn: #reads in the hero-network file
    heroIn = csv.reader(heroIn)
    headers = next(heroIn)
    heroes = [row for row in heroIn]
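
To see what this step produces without downloading anything, here is a small sketch that reads a made-up sample in the same two-column layout from memory. The sample text and the names sample, reader and rows are assumptions for illustration only.

```python
import csv
import io

# Made-up sample in the same two-column layout as the hero network file.
sample = io.StringIO("Source,Target\nHERO A,HERO B\nHERO B,HERO C\n")

reader = csv.reader(sample)
headers = next(reader)          # the first row holds the column names
rows = [row for row in reader]  # every remaining row is a [source, target] list

print(headers)
print(rows)
```

Each row comes back as a plain list of strings, which is why the later steps use row[0] for the source and row[1] for the target.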

Next we have to create ids for the superheroes. First we build a list of superheroes in which each name occurs only once. Then we create a dictionary with each superhero’s name as the key and their id number as the value.


# removes the repeated names, looking at both the source and the target column
uniqueHeroes = list(set(name for row in heroes for name in row[:2]))

# creates a list of (id, name) tuples, one per superhero (note: the name "id" shadows Python's built-in id())
id = list(enumerate(uniqueHeroes))

# creates a dictionary (hash map) that maps each superhero name to its id
keys = {name: i for i, name in enumerate(uniqueHeroes)}
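
One thing to be aware of: a Python set has no guaranteed order, so the id numbers can come out differently from one run to the next. If you want the same ids every time, one option is to sort the names before numbering them. This is only a sketch with made-up rows, and rows, unique_names and name_to_id are hypothetical names, not part of the script above.

```python
# Made-up rows in the same [source, target] shape as the CSV data.
rows = [["HERO B", "HERO A"], ["HERO A", "HERO C"], ["HERO B", "HERO C"]]

# Collect names from both columns, then sort so the order never changes.
unique_names = sorted({name for row in rows for name in row})

# Name-to-id dictionary, now stable between runs.
name_to_id = {name: i for i, name in enumerate(unique_names)}

print(name_to_id)
```

Stable ids matter mostly if you save the numbers somewhere and want to match them back to names later.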

Next we create a list of links, one for each person (source) and the person they interacted with (target).


links = []  # creates a blank list

for row in heroes:  # maps all of the names in the csv file to their id number
    try:
        links.append({keys[row[0]]: keys[row[1]]})
    except KeyError:  # falls back to the raw names if a name has no id
        links.append({row[0]: row[1]})
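
A one-entry dictionary works for holding a link, but a plain tuple can carry the same pair. As a rough sketch of that alternative, assuming made-up rows and a hypothetical name_to_id dictionary in place of keys:

```python
# Made-up rows and a hypothetical name-to-id lookup, for illustration only.
rows = [["HERO A", "HERO B"], ["HERO B", "HERO C"]]
name_to_id = {"HERO A": 0, "HERO B": 1, "HERO C": 2}

# Each link is stored directly as a (source_id, target_id) tuple.
links = [(name_to_id[source], name_to_id[target]) for source, target in rows]

print(links)
```

The script in this post keeps the dictionary version, so this is only a different way to hold the same information.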

Next we can create a graph using NetworkX. I take all of the superhero id numbers and create nodes out of them.


G = nx.Graph()  # creates a graph
heroNodeId = []  # collects the id number of every superhero
for row in id:
    heroNodeId.append(row[0])
G.add_nodes_from(heroNodeId)  # creates nodes for the graph

Next come the links. NetworkX needs each link as a tuple, so I loop through the list of dictionaries and call the items() method on each one, which gives back the dictionary’s (key, value) pairs as tuples.


for node in links:  # loops through each link and changes each dictionary to a tuple so networkx can read in the information
    edges = list(node.items())  # items() is a view in Python 3, so turn it into a list before indexing
    G.add_edge(*edges[0])  # unpacks the (source, target) tuple

nx.draw(G)
plt.show()  # plt.show() does not take the graph as an argument
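
If the links are already (source, target) tuples, NetworkX can also take the whole list in one call. A minimal sketch, assuming a small made-up list of id pairs:

```python
import networkx as nx

# Made-up id pairs; the pair (0, 1) appears twice on purpose.
links = [(0, 1), (1, 2), (0, 1)]

G = nx.Graph()
G.add_edges_from(links)  # nodes 0, 1 and 2 are created automatically

print(G.number_of_nodes(), G.number_of_edges())
```

A plain nx.Graph stores each pair of nodes only once, so repeated interactions between the same two heroes collapse into a single edge.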


And there you go: a simple network graph. It is not pretty, but it is a start.

figure_1
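
One possible way to get a less crowded picture is to draw only the better-connected heroes. The sketch below is an assumption on my part, not something I did for the figure above: it uses a tiny made-up graph and a hypothetical cut-off called min_degree.

```python
import networkx as nx

# Tiny made-up graph standing in for the hero network.
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)])

min_degree = 2  # hypothetical cut-off: keep nodes with at least this many links

# G.degree() yields (node, number of links) pairs.
keep = [node for node, degree in G.degree() if degree >= min_degree]
core = G.subgraph(keep)  # a view of G restricted to the kept nodes

print(sorted(core.nodes()))
print(core.number_of_edges())
```

The smaller graph can then be passed to nx.draw in place of G.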

 

I also have code that puts the nodes and the links into a Python format. It is on my GitHub page.
