Idea for Implementation – TiPSi



This is my proposal for Hack the Government, taking place in Berlin on April 17 & 18, 2010, after the re:publica conference.

Implement a site that takes the current Twitter trends, searches for each trend through the Twitter API, extracts the links from the resulting tweets, longifies those links, and presents them to the user.

Rationale: Twitter offers very liberal access to the information exchanged daily and stored in Twitter's data centers. Based on the powerful search API, accessible at

http://search.twitter.com/
  and in conjunction with the trend API
http://search.twitter.com/trends.format

a huge amount of meta information can be extracted from the public tweets of roughly 25 million registered users (end of 2009, ref. Wikipedia http://en.wikipedia.org/wiki/Twitter) and counting. By using the trend tags as search criteria, plus an additional filter that returns only tweets linking to web sites, a comprehensive list of hot and interesting sites can easily be created. Before these links are actually useful, they have to be de-shortened, as the vast majority of links within tweets are short links. Some services expand short URLs for free, such as

http://www.longurlplease.com/
  or
http://longurl.org/
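
The link-extraction and de-shortening step could look roughly like the following Python sketch. It is an illustration under assumptions, not a reference implementation: the regular expression is a simple guess at what counts as a link in a tweet, the function names (extract_links, resolve_by_redirect, collect_long_links) are made up for this example, and the default resolver simply follows HTTP redirects with the standard library instead of calling one of the longurl services above. Fetching trends and search results is left out on purpose; the sketch starts from a list of tweet texts.

```python
import re
import urllib.request
from typing import Callable, Iterable, List

# Simple pattern for http(s) links in tweet text.
# This is an assumption for the sketch, not Twitter's own link detection.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(tweet_text: str) -> List[str]:
    """Return all http(s) links in one tweet, without trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(tweet_text)]


def resolve_by_redirect(short_url: str, timeout: float = 5.0) -> str:
    """Expand a short link by following HTTP redirects (needs network access)."""
    with urllib.request.urlopen(short_url, timeout=timeout) as response:
        return response.geturl()


def collect_long_links(
    tweets: Iterable[str],
    resolve: Callable[[str], str] = resolve_by_redirect,
) -> List[str]:
    """Extract links from tweets, expand them, and return each long link once."""
    seen = {}  # dict keeps first-seen order
    for text in tweets:
        for link in extract_links(text):
            try:
                long_link = resolve(link)
            except OSError:
                # URLError/HTTPError are OSError subclasses; keep the short link
                long_link = link
            seen.setdefault(long_link, None)
    return list(seen)
```

Passing the resolver in as a parameter keeps the sketch usable offline and makes it easy to swap in a call to a longurl service later.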

Request for implementation – TiPSi (Tipping sites)

Initial release: Implement a site that takes the current Twitter trends, searches for each trend through the Twitter API, extracts the links from the resulting tweets, longifies those links, and presents them to the user.

Additional characteristics

Soon after version 0.1

  • Rank the top 10 (?) pages according to heuristics such as the time density of the tweets returned by the tweet search, or the time for which the associated Twitter tag or hashtag remains trending, …
  • Extract geo-information from users pointing to a site and present those locations on a zoomable heat map (see http://code.google.com/p/gheat/)
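
As a rough illustration of the ranking idea, here is a small Python sketch of one possible "time density" heuristic: tweets per minute between the first and the last mention of a link. The exact formula, the one-minute floor and the names tweet_density and rank_links are assumptions made for this example. The second heuristic (how long a tag stays in the trends) is not covered.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def tweet_density(timestamps: List[datetime]) -> float:
    """Tweets per minute over the span between first and last mention."""
    if len(timestamps) < 2:
        return 0.0  # a single mention has no measurable density
    span_seconds = (max(timestamps) - min(timestamps)).total_seconds()
    # Floor the span at one minute so a burst within seconds does not explode.
    span_minutes = max(span_seconds / 60.0, 1.0)
    return len(timestamps) / span_minutes


def rank_links(
    mentions: Dict[str, List[datetime]], top_n: int = 10
) -> List[Tuple[str, float]]:
    """Return the top_n (link, density) pairs, densest first."""
    scored = [(link, tweet_density(times)) for link, times in mentions.items()]
    # Sort by descending density, then by link for a stable order on ties.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Here mentions maps each long link to the timestamps of the tweets that pointed to it; how those timestamps are collected depends on the search results actually available.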

Later, for version 0.2

  • Store the tweets assigned to aggregated links. This is valuable data that might be processed further using NLP techniques (http://www.nltk.org/) such as n-gram analysis, stemming, or ontology mapping à la OpenCyc (http://opencyc.org/)
  • Display trends on websites over time: a time machine
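
To give an idea of what n-gram analysis on the stored tweets could mean in practice, here is a minimal Python sketch that counts word n-grams using only the standard library. It is a stand-in for illustration, not NLTK: the tokenizer is a deliberately naive regular expression, and the names tokenize, ngrams and count_ngrams are chosen for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Words, optionally prefixed by '#' (hashtag) or '@' (mention).
# A naive assumption; a real tokenizer would treat URLs and punctuation better.
TOKEN_PATTERN = re.compile(r"[#@]?\w+")


def tokenize(text: str) -> List[str]:
    """Split a tweet into lowercase tokens."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tweets: Iterable[str], n: int = 2) -> Counter:
    """Count n-grams across all tweets; n-grams never span two tweets."""
    counts: Counter = Counter()
    for text in tweets:
        counts.update(ngrams(tokenize(text), n))
    return counts
```

Calling count_ngrams(stored_tweets, 2).most_common(10) would then list the ten most frequent word pairs for the tweets attached to one link.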

Technology

The proposal is technology-neutral, but a fair understanding of JSON, XML, and AJAX/Comet (for increased usability) is required. Django (http://www.djangoproject.com/) or TurboGears (http://turbogears.org/) from the Python camp come to mind, as they offer a fair level of abstraction yet do not restrict creativity. Google App Engine (http://code.google.com/appengine/) happily hosts such a framework!
A database backend is required for more advanced features such as the time-frame view. As I currently envision no deeply nested relations, a NoSQL backend such as MongoDB (http://www.mongodb.org) or Tokyo Tyrant (http://1978th.net/tokyotyrant/) would probably fit best.
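
To make the "no deeply nested relations" point concrete, here is a hypothetical sketch of the kind of flat record that could be stored per aggregated link. The field names are invented for this example and no particular database driver is used; the record is only turned into a plain, JSON-serializable dictionary, which is the shape a document store would accept.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LinkRecord:
    """One aggregated link; all field names are assumptions for this sketch."""
    long_url: str
    trend: str
    first_seen: str  # ISO 8601 timestamp, kept as text so it serializes as-is
    tweets: List[str] = field(default_factory=list)


def to_document(record: LinkRecord) -> dict:
    """Convert a record into a flat, JSON-serializable document."""
    return asdict(record)


def to_json(record: LinkRecord) -> str:
    """Serialize a record, e.g. for a key-value store that holds strings."""
    return json.dumps(to_document(record), sort_keys=True)
```

Keeping the timestamp with each record is what would later allow queries by time frame.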

Expected effort

Two seasoned programmers can implement the basic framework, i.e. retrieving trends, resubmitting them as searches to get links, longifying those links and presenting them in a pleasant manner, within a hackday. A hackday has more to offer than eight hours ;)

Attribution

Creative Commons License
This work by TiPSi is licensed under a Creative Commons Attribution 3.0 Austria License.
