This is my proposal for Hack the Government, taking place in Berlin on April 17 and 18, 2010, after the re:publica conference.
Implement a site that uses the Twitter API to fetch the current trends, searches for each trend to collect the links posted in matching tweets, longifies (expands) those links, and presents them to the user.
Rationale: Twitter offers very liberal access to the information exchanged daily and stored in Twitter's data centers. Based on the powerful search API, accessible at http://search.twitter.com/, in conjunction with the trends API, http://search.twitter.com/trends.format, a huge amount of meta information can be extracted from the public tweets of roughly 25 million registered users (end of 2009, ref. Wikipedia http://en.wikipedia.org/wiki/Twitter) and counting. By using the trend tags as search criteria, plus an additional filter that returns only tweets containing links to web sites, a comprehensive list of hot and interesting sites can easily be created. Before these links are actually useful, though, they have to be de-shortened, as the vast majority of links within tweets are short links. Some longurl services offer this for free, for example http://www.longurlplease.com/ or http://longurl.org/.
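The extraction step described above can be sketched as follows. This is a minimal, offline sketch that works on already-downloaded JSON payloads. The payload shapes (a `trends` list of objects with a `name`, a `results` list of tweets with a `text`), the `search.json` endpoint and the `filter:links` search operator are assumptions about the Twitter API, not verified details, and the helper names `trend_names`, `search_url` and `extract_links` are hypothetical.

```python
import json
import re
from urllib.parse import urlencode

# Assumed search endpoint and payload shapes; verify against the Twitter API docs.
SEARCH_URL = "http://search.twitter.com/search.json"

# Deliberately simple URL pattern: http(s):// followed by anything up to whitespace.
URL_RE = re.compile(r"https?://\S+")


def trend_names(trends_json: str) -> list[str]:
    """Return the trend names from a payload shaped like {"trends": [{"name": ...}]}."""
    data = json.loads(trends_json)
    return [trend["name"] for trend in data.get("trends", [])]


def search_url(trend: str) -> str:
    """Build a search request for one trend, restricted (assumed operator) to tweets with links."""
    return SEARCH_URL + "?" + urlencode({"q": f"{trend} filter:links"})


def extract_links(search_json: str) -> list[str]:
    """Collect the URLs found in tweet texts, in first-seen order and without duplicates."""
    data = json.loads(search_json)
    links: dict[str, None] = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for tweet in data.get("results", []):
        for url in URL_RE.findall(tweet.get("text", "")):
            # Strip trailing punctuation that belongs to the sentence, not to the link.
            links.setdefault(url.rstrip(".,;:!?)"), None)
    return list(links)
```

In the application, `search_url` would be requested once per name returned by `trend_names`, and each response fed to `extract_links`.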
Request for implementation – TiPSi (Tipping sites)
Initial release: implement a site that uses the Twitter API to fetch the current trends, searches for each trend to collect the links posted in matching tweets, longifies those links, and presents them to the user.
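Longifying a link can be sketched as following its HTTP redirects until a URL no longer redirects. Note that this swaps in plain redirect-following for the longurl services mentioned in the rationale, whose APIs are not shown here. The sketch assumes that short-link services answer a `HEAD` request with a 3xx status and a `Location` header; the names `http_next_location` and `longify` are hypothetical. The resolver takes the lookup function as a parameter, so it can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow, so each 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_next_location(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the redirect target of url, or None if it does not redirect (needs network)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # 2xx: this is already the final URL
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400 and err.headers.get("Location"):
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, err.headers["Location"])
        raise


def longify(url: str, next_location: Callable[[str], Optional[str]], max_hops: int = 10) -> str:
    """Follow redirects from url and return the last URL reached.

    Stops when there is no further redirect, when a URL repeats (redirect loop),
    or after max_hops steps.
    """
    seen = {url}
    current = url
    for _ in range(max_hops):
        target = next_location(current)
        if target is None or target in seen:
            break
        seen.add(target)
        current = target
    return current
```

In the application, `longify(link, http_next_location)` would run once per extracted link; caching the results avoids resolving the same short link repeatedly.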
Additional characteristics
Shortly after version 0.1
- Rank the top 10(?) pages according to heuristics such as the time density of the tweets returned by the search, or how long the associated Twitter tag or hashtag remains in the trends, …
- Extract geoinformation from the users linking to a site and present those locations on a zoomable heat map (see http://code.google.com/p/gheat/)
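Both items can be sketched in a few lines. The scoring formula below (tweets per hour plus a weighted number of hours in the trends) is purely hypothetical, one of many heuristics that would fit the description. The heat map part assumes each tweet has already been resolved to a latitude/longitude pair and only shows how points could be counted per grid cell, with a cell size that halves at each zoom level; it does not use gheat's own API. All names (`LinkStats`, `score`, `top_n`, `heat_cells`, `cell_size_for_zoom`) are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LinkStats:
    url: str
    tweet_times: list[datetime]  # when the link was tweeted (from the search results)
    trend_hours: float           # how long the associated tag stayed in the trends


def time_density(times: list[datetime]) -> float:
    """Tweets per hour over the time span the tweets cover (span floored at one minute)."""
    if not times:
        return 0.0
    span = max(max(times) - min(times), timedelta(minutes=1))
    return len(times) / (span.total_seconds() / 3600)


def score(stats: LinkStats, trend_weight: float = 0.5) -> float:
    """Hypothetical heuristic: tweet density plus a bonus for trend longevity."""
    return time_density(stats.tweet_times) + trend_weight * stats.trend_hours


def top_n(all_stats: list[LinkStats], n: int = 10) -> list[LinkStats]:
    """The n highest-scoring links, best first."""
    return sorted(all_stats, key=score, reverse=True)[:n]


def cell_size_for_zoom(zoom: int) -> float:
    """Grid cell size in degrees: 360 degrees at zoom 0, halved at every zoom level."""
    return 360.0 / (2 ** zoom)


def heat_cells(points: list[tuple[float, float]], cell_deg: float) -> Counter:
    """Count (lat, lon) points per grid cell; a cell is identified by its floored indices."""
    counts: Counter = Counter()
    for lat, lon in points:
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return counts
```

The per-cell counts are the raw intensities a heat map renderer would need; the weight in `score` would have to be tuned against real data.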
Later, for version 0.2
- Store the tweets assigned to each aggregated link. This is valuable data that might be processed further with NLP algorithms (http://www.nltk.org/) such as n-gram analysis, stemming, or ontology mapping à la OpenCyc (http://opencyc.org/)
- Display how trends on websites develop over time: a "time machine"
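Storing tweets per link and running a first n-gram analysis over them could look roughly like this. It is a minimal, in-memory sketch: the store class, the tokenizer and the choice of bigrams are assumptions for illustration, and the n-gram counting is done in plain Python rather than with NLTK, which provides its own tokenizers, n-gram helpers and stemmers.

```python
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, keeping the # of hashtags and the @ of mentions


def tokenize(text: str) -> list[str]:
    """Lower-case the tweet, drop URLs so link fragments do not become n-grams, split into tokens."""
    return TOKEN_RE.findall(URL_RE.sub(" ", text.lower()))


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows, e.g. bigrams for n=2."""
    return list(zip(*(tokens[i:] for i in range(n))))


class LinkTweetStore:
    """Keeps the tweet texts that pointed at each (longified) link."""

    def __init__(self) -> None:
        self._tweets: defaultdict[str, list[str]] = defaultdict(list)

    def add(self, link: str, tweet_text: str) -> None:
        self._tweets[link].append(tweet_text)

    def top_ngrams(self, link: str, n: int = 2, k: int = 3) -> list[tuple[tuple[str, ...], int]]:
        """The k most frequent n-grams across all tweets stored for link."""
        counts: Counter = Counter()
        for text in self._tweets.get(link, []):
            counts.update(ngrams(tokenize(text), n))
        return counts.most_common(k)
```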
Technology
The proposal is technology-neutral, but a fair understanding of JSON and XML, plus AJAX/Comet for increased usability, is required. Django (http://www.djangoproject.com/) or TurboGears (http://turbogears.org/) from the Python camp come to mind, as they offer a fair level of abstraction without narrowing down creativity. Google App Engine (http://code.google.com/appengine/) happily hosts such a framework!
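For the AJAX side, the front end could periodically poll a JSON endpoint and receive only the links it has not seen yet. The following framework-neutral sketch builds such a response. It uses plain polling with a `since` cursor rather than a real Comet (long-held connection) setup, and the field names (`url`, `score`, `seen_at` as Unix seconds) are hypothetical.

```python
import json
from typing import Any


def updates_since(links: list[dict[str, Any]], since: float) -> str:
    """JSON payload with the links first seen after `since`, plus the cursor for the next poll.

    Each link is a dict like {"url": ..., "score": ..., "seen_at": <Unix seconds>}.
    The returned cursor never moves backwards, so a client can keep echoing it.
    """
    fresh = [link for link in links if link["seen_at"] > since]
    fresh.sort(key=lambda link: link["score"], reverse=True)  # best links first
    cursor = max([since] + [link["seen_at"] for link in links])
    return json.dumps({"since": cursor, "links": fresh})
```

In Django or TurboGears, the returned string would simply become the body of a response with content type `application/json`.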
A database backend is required for the more advanced features, such as time-frame views. As I currently envision no deeply nested relations, a NoSQL backend such as MongoDB (http://www.mongodb.org) or Tokyo Tyrant (http://1978th.net/tokyotyrant/) would probably fit best.
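To illustrate why a document store fits, here is a hypothetical MongoDB-style schema: one document per link and snapshot, with the tweets embedded rather than kept in a separate relation, plus a time-frame query written in MongoDB's filter syntax (`$gte`/`$lt` range operators). To stay self-contained, the sketch evaluates that filter against an in-memory list instead of a real database connection; the field names and helper functions are assumptions, not a fixed design.

```python
from datetime import datetime
from typing import Any

Doc = dict[str, Any]


def snapshot_doc(url: str, trend: str, taken_at: datetime, score: float, tweets: list[str]) -> Doc:
    """One document per link per snapshot; the tweets are embedded, so no joins are needed."""
    return {"url": url, "trend": trend, "snapshot_at": taken_at, "score": score, "tweets": list(tweets)}


def time_frame_filter(url: str, start: datetime, end: datetime) -> Doc:
    """Filter for all snapshots of url in [start, end), written in MongoDB query syntax."""
    return {"url": url, "snapshot_at": {"$gte": start, "$lt": end}}


def matches(doc: Doc, flt: Doc) -> bool:
    """Minimal in-memory evaluator for exactly the filter shape built above."""
    window = flt["snapshot_at"]
    return doc["url"] == flt["url"] and window["$gte"] <= doc["snapshot_at"] < window["$lt"]


def history(docs: list[Doc], url: str, start: datetime, end: datetime) -> list[Doc]:
    """The 'time machine' view: a link's snapshots within a time frame, oldest first."""
    flt = time_frame_filter(url, start, end)
    return sorted((d for d in docs if matches(d, flt)), key=lambda d: d["snapshot_at"])
```

With pymongo, the same filter dictionary could be passed to a collection's `find` call, ideally backed by an index on `url` and `snapshot_at`.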
Expected effort
Two seasoned programmers can implement the basic framework, i.e. retrieving trends, resubmitting searches to get links, longifying those links and presenting them in a pleasant manner, within one hack day. A hack day has more to offer than eight hours ;)
Attribution
This work by TiPSi is licensed under a Creative Commons Attribution 3.0 Austria License.