Tuesday, June 25, 2013

Trawling Social Media

Lately, I've been digging into using the APIs of various social media platforms as tools to help explore the spread of information, sentiment, and so forth, specifically focusing on Twitter and Instagram. It's been interesting and sometimes challenging, as I'm going over old tricks (PHP) and learning new ones (authorization and so forth). There have been a number of resources I've found helpful, and I thought I'd post something about my experiences here as a guide to others who are also just getting started with using social media APIs for research.


In order to access the APIs, it's necessary to authenticate with the API itself. Both Twitter and Instagram use the OAuth protocol to provide users with access to their data. This involves having an account, registering an application, and generating and providing access codes in the appropriate places. I found the following to be very helpful in understanding and interacting with OAuth:

  • 140 Dev Twitter OAuth Programming - a tutorial on using OAuth in the context of Twitter applications. You have to sign up as a member to get access to the text, but I highly recommend it.
  • tmhOAuth - An OAuth library used in the 140 Dev tutorial, which with minimal modification can be used to access Instagram data as well (specifically, by changing 'api.twitter.com' on line 40 to 'api.instagram.com').
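
To get a feel for what an OAuth library is doing on your behalf, here is a minimal sketch of the request-signing step in OAuth 1.0a (the flavor Twitter's REST 1.1 API uses), following RFC 5849. This is not tmhOAuth's code: the function name oauthSignature and every key, token, and URL passed to it are placeholders made up for illustration, and it assumes no parameter name appears twice.

```php
<?php
// Sketch of an OAuth 1.0a HMAC-SHA1 signature (RFC 5849).
// Hypothetical helper; all secrets and parameters are supplied by the caller.
function oauthSignature($method, $url, array $params, $consumerSecret, $tokenSecret)
{
    // Percent-encode every key and value, then sort by encoded key.
    $encoded = array();
    foreach ($params as $key => $value) {
        $encoded[rawurlencode((string) $key)] = rawurlencode((string) $value);
    }
    ksort($encoded, SORT_STRING);

    // Join the sorted pairs as key=value, separated by "&".
    $pairs = array();
    foreach ($encoded as $key => $value) {
        $pairs[] = $key . '=' . $value;
    }
    $paramString = implode('&', $pairs);

    // Signature base string: METHOD & encoded URL & encoded parameter string.
    $base = strtoupper($method) . '&' . rawurlencode($url) . '&' . rawurlencode($paramString);

    // Signing key: encoded consumer secret and encoded token secret joined by "&".
    $signingKey = rawurlencode($consumerSecret) . '&' . rawurlencode($tokenSecret);

    // The signature is the base64-encoded raw HMAC-SHA1 digest.
    return base64_encode(hash_hmac('sha1', $base, $signingKey, true));
}
```

In a real request the $params array would hold the query parameters plus the oauth_* fields (consumer key, nonce, timestamp, token, and so on), and the result would be sent as oauth_signature. In practice the library handles all of this, so treat the sketch as an aid to understanding rather than something to deploy.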

The App Itself

I've been using PHP to do my data extraction, but other sites discuss using JavaScript and other languages to do something similar. If you're comfortable with the command line, using PHP is incredibly simple and I highly recommend it. I've included a simple script which, once you edit it to include your target username, will pull the last 100 tweets from that user. To use it, drop the tmhOAuth.php and cacert.pem files into a directory along with a copy of your application tokens, put the PHP script in that same directory, and type

commandline> php myScript.php > outputfile.txt

and voila! You're done. Until you run into 429 Errors, aka rate limiting.
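
Once you have an output file, you will probably want to boil it down. The sketch below assumes that outputfile.txt holds the raw JSON body returned by "GET statuses/user_timeline" (an array of tweet objects with id_str, created_at, and text fields); if your script prints something else, adjust accordingly. The function name summarizeTimeline and the file name summarize.php are made up for this example.

```php
<?php
// Turn a saved statuses/user_timeline JSON response into one line per tweet.
// Assumes the input is the raw JSON array returned by the API.
function summarizeTimeline($json)
{
    $tweets = json_decode($json, true);
    if (!is_array($tweets)) {
        // Not valid JSON, so there is nothing to summarize.
        return array();
    }

    $lines = array();
    foreach ($tweets as $tweet) {
        // Skip anything that does not look like a tweet (e.g. an error payload).
        if (!isset($tweet['id_str'], $tweet['created_at'], $tweet['text'])) {
            continue;
        }
        // Collapse whitespace so each tweet stays on a single line.
        $text = preg_replace('/\s+/u', ' ', $tweet['text']);
        $lines[] = $tweet['id_str'] . "\t" . $tweet['created_at'] . "\t" . $text;
    }
    return $lines;
}

// Command-line use: php summarize.php outputfile.txt
if (PHP_SAPI === 'cli' && isset($argv[1]) && is_readable($argv[1])) {
    foreach (summarizeTimeline(file_get_contents($argv[1])) as $line) {
        echo $line, "\n";
    }
}
```

The tab-separated output opens cleanly in a spreadsheet or can be fed to other command-line tools.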

Rate Limiting

Rate limiting will probably make you want to tear out your hair. Twitter and Instagram each limit how many times you can query the API within a given window of time, and their limits differ. Twitter even breaks rate limiting down by the kind of query you send: for example, under the current REST 1.1 API policies, a user can submit only 15 "GET lists" queries but 180 "GET statuses/user_timeline" queries within a 15-minute period. So be aware of this, and factor it into your applications.
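
One way to factor this in is to pace your requests and, when a 429 does come back, wait for the window to reset instead of hammering the API. Here is a rough sketch of two helpers for that. The function names are made up, the 900-second default simply mirrors Twitter's 15-minute window, and the x-rate-limit-reset header is the one Twitter's 1.1 API sends; other services, Instagram included, use different headers, so the code falls back to a fixed wait when it is missing.

```php
<?php
// Pacing helpers for a rate-limited API (hypothetical names, caller supplies limits).

// Seconds to leave between requests so $limit calls spread evenly over the window.
function minSpacingSeconds($limit, $windowSeconds = 900)
{
    return (int) ceil($windowSeconds / $limit);
}

// After a 429: how long to sleep, given response headers (name => value)
// and the current Unix time. Falls back to a full window if no reset header.
function secondsToWait(array $headers, $now, $fallback = 900)
{
    // Header names are case-insensitive, so normalize them first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (isset($headers['x-rate-limit-reset'])) {
        // The header holds the epoch second at which the window resets;
        // add one second of slack and never return a negative wait.
        return max(0, (int) $headers['x-rate-limit-reset'] - $now) + 1;
    }
    return $fallback;
}
```

With the limits quoted above, minSpacingSeconds(15) gives 60 and minSpacingSeconds(180) gives 5, so a script that calls sleep() for that long between queries should stay under the limit. On a 429, something like sleep(secondsToWait($headers, time())) before retrying is a reasonable starting point.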

Anyway, hope this is helpful to someone!

Sample PHP files
