Talk Pytolab: Twitter statistics on the 2012 French presidential election

Presented by Laurent Luce in Scientific track 2012 on 2012/08/26 from 12:00 to 12:30

Tweets are a rich source of statistics on society's reaction to an event. Pytolab was created to provide Twitter statistics on the 2012 French presidential election. Over 6 million tweets related to the candidates and other politicians were processed by Pytolab during the campaign.

Pytolab is an open-source web service completely written in Python allowing:

  • Visualization of the number of French-language tweets referring to each candidate, per hour.
  • Graphical representation of the candidates mentioned in the same tweet.
  • Extraction of the most frequent words referring to each candidate.
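
The three views listed above can be sketched with the Python standard library alone. This is a minimal illustration and not Pytolab's actual code: the sample tweets, the placeholder candidate names, and the helper names `mentioned`, `hourly_counts`, `co_mentions` and `top_words` are all assumptions made for the example, and the naive substring matching stands in for whatever matching and NLTK-based text processing the project really uses.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations

# Hypothetical sample data, invented for illustration: (timestamp, text) pairs.
TWEETS = [
    (datetime(2012, 4, 22, 20, 5), "Dupont devant Martin ce soir"),
    (datetime(2012, 4, 22, 20, 40), "Martin parle de la crise"),
    (datetime(2012, 4, 22, 21, 10), "Dupont et Martin face a la crise"),
]
# Placeholder candidate names (assumption, not real candidates).
CANDIDATES = ["dupont", "martin"]


def mentioned(text):
    """Return the sorted candidates named in the text (naive substring match)."""
    lowered = text.lower()
    return sorted(c for c in CANDIDATES if c in lowered)


def hourly_counts(tweets):
    """Count tweets per (candidate, hour) bucket."""
    counts = Counter()
    for timestamp, text in tweets:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        for candidate in mentioned(text):
            counts[(candidate, hour)] += 1
    return counts


def co_mentions(tweets):
    """Count how often each pair of candidates appears in the same tweet."""
    pairs = Counter()
    for _, text in tweets:
        pairs.update(combinations(mentioned(text), 2))
    return pairs


def top_words(tweets, candidate, n=3):
    """Most frequent words in tweets mentioning the candidate (no stop-word filtering)."""
    words = Counter()
    for _, text in tweets:
        if candidate in mentioned(text):
            words.update(w for w in text.lower().split() if w not in CANDIDATES)
    return words.most_common(n)
```

The pair counts from `co_mentions` could serve as edge weights in a candidate graph, and a real word ranking would need stop-word removal so that words such as "la" do not dominate.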

This project is trying to answer the following questions:

  • Do patterns emerge when multiple candidates are referred to in the same post?
  • Is there a correlation between the number of tweets and the polls?
  • Is there a link between the most used words in the tweets related to a candidate and the popularity of that candidate?
  • How are the tweets distributed per author? Is it the same for each candidate?
  • What is the percentage of automatically generated versus manually entered tweets?
  • What types of events and authors generate the most tweets?
  • Is there a correlation between the tweet's location and its content?

Pytolab's architecture processes tweets in real time. It receives them through Tweepy, a Python helper library for the Twitter streaming API, and the RabbitMQ messaging system. Two databases store the data: the in-memory key/value store Redis, accessed through the redis-py library, holds recent data, and MySQL, accessed through the MySQLdb library, holds older data. Computation on the Twitter data is done in Python with networkx and pygraphviz for the graphs and NLTK for text processing. The Django web framework handles user requests.
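
The queue-plus-two-stores idea can be sketched with standard-library stand-ins. This is an illustration under stated assumptions, not the project's implementation: `queue.Queue` stands in for RabbitMQ, a `Counter` for Redis, and an in-memory SQLite database for MySQL. The abstract does not describe the message format, the table layout, or how data moves from the recent store to the older one, so the `(candidate, hour)` messages, the `hourly_counts` table, and the `consume`, `archive` and `make_db` helpers are hypothetical.

```python
import queue
import sqlite3
from collections import Counter


def make_db():
    """Create the stand-in for the 'older data' store (SQLite instead of MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hourly_counts (candidate TEXT, hour TEXT, n INTEGER)")
    return db


def consume(q, recent):
    """Drain the queue, incrementing one counter per (candidate, hour) message."""
    while True:
        try:
            candidate, hour = q.get_nowait()
        except queue.Empty:
            break
        recent[(candidate, hour)] += 1


def archive(recent, db, cutoff_hour):
    """Move buckets older than cutoff_hour from the recent store into SQL.

    Hours are ISO-style strings such as "2012-04-22T20", so plain string
    comparison orders them chronologically.
    """
    old = [(c, h, n) for (c, h), n in recent.items() if h < cutoff_hour]
    db.executemany("INSERT INTO hourly_counts VALUES (?, ?, ?)", old)
    db.commit()
    for candidate, hour, _ in old:
        del recent[(candidate, hour)]
```

A usage pass would put messages on the queue, call `consume(q, recent)` to update the in-memory counters, and periodically call `archive(recent, db, cutoff_hour)` so that only recent buckets stay in memory.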

In this talk, I intend to present the results for the questions listed above. I will also describe the challenges encountered and the architecture of Pytolab.

Author: Laurent Luce
