“The Making Of” That Infographic

This is not an op-ed, blog post, or editorial. I warn you now: this will be neither interesting nor an ‘easy read’. Its only purpose is to explain how I got the information for the infographic I recently shared.

The infographic: @elevatorgate Storify Stats.

I’ll try to make this as painless as possible, but no promises. There have already been claims that the data is skewed, so hopefully this will at least clear up any confusion.

Getting the Data

Storify, like just about every web service since Facebook and Twitter became huge, has an API (Application Programming Interface), which allows third parties to write applications that access its data. Like most web services, Storify requires some information before it will grant you an API key. I wasn’t creating an app, so I didn’t have the information it needed (app name, app URL, etc.). Fortunately, there is another way.

As I only needed to read the information, I could make use of the fact that Storify allows the downloading of .json files. A .json file contains essentially all the information you see when you open a Storify URL, but in a format that is easier to extract data from.

Anybody can make use of this feature. Simply go to a Storify page (either a user page or a story page) and add “.json” to the end of its URL. For example, http://storify.com/elevatorgate.json. The .json of a user page contains the stories by that user. The .json of a story page contains the elements (tweets, Flickr images, etc.) of that story.
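For anyone who would rather build those URLs in code than by hand, here is a minimal sketch in Python. The function name `to_json_url` is my own invention for illustration; it simply appends “.json” to the path of whatever page URL it is given.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(page_url):
    """Return the .json variant of a Storify user or story page URL."""
    parts = urlsplit(page_url)
    # Drop any trailing slash so we do not end up with "/.json".
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    # Keep any query string (such as ?page=2) after the new path.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(to_json_url("http://storify.com/elevatorgate"))
```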

Getting the Stories

The first step was to get a list of all the stories. Storify limits the number of stories a .json will include (min=1, max=50). Obviously, @elevatorgate has Storified considerably more than fifty stories. This can be dealt with by adding “?page=2”, “?page=3”, and so on, to the end of the URL. So, I wrote a script that would load each page in turn, extract all the story URLs, append “.json” to the end of each, and save them into a file for later use. Should you wish, you can view that file here.

For those who may wish to scrutinise the actual code, here it is. For the rest, you’ll probably want to skip the code.

The resultant file included every story from the date I ran the script back as far as Storify would let me go, which came to 7,103 stories.
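To illustrate the paging loop described in this section, here is a rough sketch in Python. It is not the script that produced the file, only an outline of the same idea. The layout of the JSON (a `content` object holding a `stories` list whose items carry a `permalink`) is an assumption on my part, as are all the function names; check a real response before relying on any of them.

```python
import json
from urllib.request import urlopen


def page_url(base_json_url, page):
    """Build the URL for one page of a user's story list."""
    return f"{base_json_url}?page={page}"


def story_json_urls(payload):
    """Extract story URLs from one page and append ".json" to each.

    ASSUMPTION: stories live at payload["content"]["stories"] and each
    has a "permalink" field. These field names are hypothetical.
    """
    stories = payload.get("content", {}).get("stories", [])
    return [s["permalink"].rstrip("/") + ".json" for s in stories if "permalink" in s]


def collect_story_urls(base_json_url, fetch, max_pages=1000):
    """Load page 1, 2, 3, ... until a page comes back with no stories."""
    urls = []
    for page in range(1, max_pages + 1):
        found = story_json_urls(fetch(page_url(base_json_url, page)))
        if not found:
            break
        urls.extend(found)
    return urls


def fetch_json(url):
    """Download one URL and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)
```

Passing the downloader in as `fetch` means the loop can be tried against canned data first. A real run would look something like `collect_story_urls("http://storify.com/elevatorgate.json", fetch_json)`, with the result written out one URL per line.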

Getting the Elements

Obviously, the issue surrounding @elevatorgate’s Storify usage was not how many stories he created, but who he Storified, and how many times they were Storified. To that end, I created a script that cycled through the list-of-stories file created above, opening each story and grabbing the pertinent data from each element within the story.

This took three hours to run!

It is the extraction of these tweets that I fear will be the biggest point of contention. The .json files for each story only provide the first twenty elements, and I couldn’t find a way to get the rest. Having looked through a number of @elevatorgate’s stories, I decided not to waste time trying to solve this issue, as his stories seemed to contain fewer than twenty elements more often than not. And, in any case, given the number of stories and tweets sampled, the missing tweets would all have to be from the same person in order to have a significant impact on the data.

As the complaints about @elevatorgate primarily revolve around Twitter, I chose to focus on tweets only. Here is the code.

This code creates a file that contains all the tweets (well, the first twenty) from each of the stories in the first file, a total of 38,517 tweets. I purposely did not add a facility to detect duplicate tweets. Seeing as the whole point of contention is the number of times a person is Storified, I figured every instance should be counted, rather than every unique tweet.
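As a rough picture of what that extraction step involves, here is a sketch in Python. Again, this is not the original script. The element layout it expects (a `source` object with `name` and `username`, plus `permalink` and `text` fields) is purely hypothetical, so treat every field name as a placeholder.

```python
def tweets_in_story(story):
    """Return (username, tweet_url, text) for each tweet element in one story.

    ASSUMPTION: elements live at story["content"]["elements"], and a tweet
    is any element whose source name is "twitter". All field names here
    are hypothetical placeholders.
    """
    elements = story.get("content", {}).get("elements", [])
    tweets = []
    for element in elements:
        source = element.get("source", {})
        if source.get("name") != "twitter":
            continue  # skip images, links, and anything else that is not a tweet
        tweets.append((
            source.get("username", ""),
            element.get("permalink", ""),
            element.get("text", ""),
        ))
    return tweets


def tweets_in_stories(stories):
    """Gather tweets from many stories, deliberately keeping duplicates."""
    all_tweets = []
    for story in stories:
        # No de-duplication: a tweet used in two stories counts twice.
        all_tweets.extend(tweets_in_story(story))
    return all_tweets
```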

There would have been nothing stopping me from scraping every piece of information Storify provided, but, as I only needed the tweet content and who tweeted it, that (and the link to the original tweet) was all I took. Again, if you wish to view the file containing all the scraped information for the 38,517 tweets that were used for the infographic, you can see it here. At first glance, the file will appear to be an unholy jumble of information, but, should you wish to break it out, the information is delimited using “~”, and is in order of Username, Tweet URL, Tweet Content.
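If you do want to break the file out, the “~” layout is easy to handle. Below is a small sketch in Python of writing and reading one record in the Username, Tweet URL, Tweet Content order. The function names and the example values are made up, and how records are separated from one another in the file is not covered here.

```python
DELIMITER = "~"


def format_record(username, tweet_url, text):
    """Join the three fields with "~", swapping any "~" in the tweet for "-"."""
    safe_text = text.replace(DELIMITER, "-")
    return DELIMITER.join((username, tweet_url, safe_text))


def parse_record(record):
    """Split one record back into (username, tweet_url, text)."""
    # Split at most twice: everything after the second "~" is tweet content.
    username, tweet_url, text = record.split(DELIMITER, 2)
    return username, tweet_url, text


# Made-up example values, not taken from the real data.
record = format_record("example_user", "http://twitter.com/example_user/status/1", "a ~ b")
print(record)
print(parse_record(record))
```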

Using the Data

Finally, with all that data, I had to do something with it. Most of it is pretty straightforward (x amount of tweets by one person is what % of total, and so on), but I will share one bit of code that was invaluable. It’s not mine, though.

By putting all the usernames into one string and running it through this code, I got a complete list of every individual who has been Storified (5,544), along with how many times each was Storified. By putting all the tweets into a string, I was able to use the same script to get the most-used words across all the Storified tweets.
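The counting itself can be sketched in a few lines of Python. This is not the borrowed code mentioned above, just the same idea using the standard library’s `collections.Counter`. The usernames in the example are invented, and lower-casing everything before counting is my own choice.

```python
from collections import Counter


def count_tokens(text):
    """Count how often each whitespace-separated token appears in a string."""
    return Counter(text.lower().split())


def percent_of_total(count, total):
    """Express one count as a percentage of the total."""
    return 100.0 * count / total if total else 0.0


# Invented usernames, purely for illustration.
usernames = "alice bob alice carol alice bob"
counts = count_tokens(usernames)
total = sum(counts.values())

for name, times in counts.most_common():
    print(f"{name}: {times} ({percent_of_total(times, total):.1f}%)")
```

The same `count_tokens` function works on a string of tweet text to find the most-used words, although splitting on whitespace alone leaves punctuation attached to words.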

Contentious Issues

Like I said, if the tweets that weren’t counted were all about one person, maybe there’d be an issue. As it stands, I think the sample is perfectly representative of @elevatorgate’s Storify usage. The same applies to anyone who may argue that @elevatorgate has Storified quite a few people since I ran the script; every story would have to be entirely of one person to seriously affect the data.

The replacing of “~” with “-” in the tweet content is the only change that was made to the data. I hope we can all agree that it doesn’t actually change anything of importance.

All Done

So, there you have it. That is how I got my data. I realise that @elevatorgate has more than one Storify account, but none of them has anywhere near the number of stories that the ElevatorGATE one does.

The infographic stands alone, free of my personal opinion. Make of it what you will.
