I received some good feedback on additional things people thought might be interesting to investigate regarding next week’s election. So today I’ve done two things: 1) taken a brief look at how media outlets perceive the candidates, based on published articles, and 2) re-examined the Irish Twitter stream with a new sentiment engine to see how the individual candidates stack up against each other.
NEWS SOURCE PERCEPTIONS
Once again using ScraperWiki, I picked two different news sites to scrape for their election coverage. RTÉ and The Irish Times both make it a bit awkward to find all of their Áras election coverage in one place. On RTÉ, the best source was http://www.rte.ie/news/presidential-special-reports.html, but I’m not positive it’s comprehensive, as it was a link I stumbled onto while digging around their site. For The Irish Times, I used the site’s search function to pull up 100 articles containing the word “Aras.”
The best site, which I ran out of time to include today but will add later, is unsurprisingly TheJournal.ie, as they have a nice tagging system. You can simply visit http://www.thejournal.ie/topic/race-for-the-aras/ for all of their great election coverage.
Back to The Irish Times and RTÉ: using scrapers to comb through their HTML, I pulled out article titles and descriptions to get a brief understanding of what tone comes through and who is talked about most. With more time, one could easily walk through all of the articles and grab and parse the full text as well, but this is a basic exercise. Another tough thing about RTÉ’s coverage, and one that may limit how deep I can dig there, is that so much of it is video, and I couldn’t find any transcripts of the video reports. Parsing audio into text from video reports is a whole other project!
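For anyone curious how the “who is talked about most” part might work once the titles are scraped, here is a minimal sketch in base R. It is not the scraper I actually ran: the headlines are made-up placeholders, the function name count_mentions is just an illustrative choice, and only surnames mentioned in this post are listed.

```r
# Placeholder headlines; real ones would come from the scraper output.
titles <- c(
  "Placeholder headline mentioning Norris",
  "Placeholder headline mentioning Higgins",
  "Placeholder headline mentioning Scallon and Norris",
  "Another placeholder headline about Higgins"
)

# Surnames to look for (an assumption; extend with the other candidates).
candidates <- c("Higgins", "Norris", "Scallon")

# For each surname, count how many titles mention it (case-insensitive).
count_mentions <- function(texts, names) {
  sapply(names, function(name) {
    sum(grepl(name, texts, ignore.case = TRUE))
  })
}

mentions <- count_mentions(titles, candidates)
print(sort(mentions, decreasing = TRUE))
```

Matching on surname alone is crude (it would miss “Dana” used on its own, for instance), so a real run would probably want a few aliases per candidate.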
The Irish Times:
The data sets and visualizations are all linked on my Many Eyes page here in case you’d like to do your own visualization of the data.
IMPROVED TWITTER SENTIMENT
In my last related post, I used twitrratr to do a very simple analysis of how people in Ireland were feeling about the election. However, it is a fairly basic application, and I wanted to expand on it using a better sentiment algorithm.
R is a language and environment for statistical computing and graphics, and it allows some very interesting and complex analysis of language and data. I used two R tools to help source and evaluate the tweets. First I used Jeff Gentry’s twitteR package, which has some very easy methods for searching Twitter timelines. A search for tweets related to David Norris, for example, might look something like this:
library(twitteR)
norris.tweets = searchTwitter('aras11 AND norris OR david OR SenDavidNorris', n=1500)
where the words in quotes are my search terms and the n=1500 refers to how many tweets it should return. So I built queries like these to search for tweets related to the individual candidates.
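To give a rough idea of how the per-candidate queries can be put together, here is a small sketch that only builds the query strings and does not call Twitter. The term lists and the helper name build_query are my own illustrative assumptions, not the exact terms I searched for.

```r
# Hypothetical search terms per candidate; adjust as needed.
candidate_terms <- list(
  norris  = c("norris", "david", "SenDavidNorris"),
  higgins = c("higgins", "michael")
)

# Combine the election hashtag term with a candidate's terms,
# following the same pattern as the Norris query above.
build_query <- function(terms, tag = "aras11") {
  paste(tag, "AND", paste(terms, collapse = " OR "))
}

queries <- sapply(candidate_terms, build_query)
print(queries)
```

Each string in queries could then be passed to searchTwitter in place of the hand-written one.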
The next tool I used was an “opinion lexicon” by Hu & Liu. If you’re not familiar with processing language, the easiest way to explain this is that it’s a big dictionary of almost 7,000 words which are categorized as positive or negative. Words like “love” or “amazing” would be categorized as positive, and words like “hate” or “sucks” would be considered negative. Of course this doesn’t allow for sarcasm, so we have to assume that most people mean what they say. In the future maybe we’ll have to also search for a “sarcasm” hashtag and then reverse the word values!
With the opinion lexicon, we can go through all of the tweets and score them depending on whether the words in the tweet are more positive or negative.
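As a rough illustration of the scoring idea, here is a minimal base R sketch. It assumes a tweet’s score is simply the number of positive words minus the number of negative words, which is one common way to do it and not necessarily exactly what my script does. The tiny word lists stand in for the real lexicon, and the tweets are invented.

```r
# Stand-ins for the real lexicon, which has thousands of entries.
pos_words <- c("love", "amazing")
neg_words <- c("hate", "sucks")

# Score one tweet: positive word count minus negative word count.
score_tweet <- function(text, pos, neg) {
  # Lower-case, strip punctuation, then split on whitespace.
  clean <- tolower(gsub("[[:punct:]]", "", text))
  words <- unlist(strsplit(clean, "\\s+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Invented example tweets.
tweets <- c(
  "I love this debate, amazing stuff",
  "I hate this, it sucks",
  "Voting opens in the morning"
)

scores <- sapply(tweets, score_tweet,
                 pos = pos_words, neg = neg_words,
                 USE.NAMES = FALSE)
print(scores)
```

A tweet with no lexicon words at all scores zero, which is why a lot of tweets end up in the neutral middle.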
Finally, we can plot the scores on a histogram as shown below. The chart shows, for each candidate, how many tweets were considered positive versus negative. We can see that Dana Scallon has relatively more negative tweets than the others, and that Michael Higgins has relatively more positive tweets than the others. Higgins also seems to have the widest spread, with tweets going up to a score of six and down to negative five.
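If you want to try the counting step behind a chart like this yourself, here is a small base R sketch. The scores and the “Candidate A/B/C” labels are invented placeholders and not my actual results. It buckets each score as negative, neutral or positive and tabulates the buckets per candidate.

```r
# Invented scores for placeholder candidates; not real results.
scored <- data.frame(
  candidate = c("Candidate A", "Candidate A", "Candidate A",
                "Candidate B", "Candidate B", "Candidate C"),
  score     = c(6, 1, -5, -2, -1, 0)
)

# Bucket each score by its sign: -1, 0 or 1.
scored$sentiment <- factor(
  sign(scored$score),
  levels = c(-1, 0, 1),
  labels = c("negative", "neutral", "positive")
)

# Rows are candidates, columns are sentiment buckets.
counts <- table(scored$candidate, scored$sentiment)
print(counts)
```

From there, something like barplot(t(counts), beside = TRUE, legend.text = TRUE) draws grouped bars per candidate, and hist() on the raw scores of a single candidate gives the spread.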
I’d like to go deeper into the actual published articles, which will take a bit more time but could provide some interesting results. I would also love to look at additional sources such as TheJournal.ie and The Irish Independent. As I have with the previous charts, I’ll continue to update these daily until the election and see if Twitter is able to make a good prediction about the final result.
After the election I’ll also do a blog post on how to create your own data visualizations from public sources with easy tools that you don’t have to be a programmer to use.