These nights I’m playing with Cascalog for running MapReduce jobs on Hadoop. Good news for the future: “Cascading 3.0 will initially ship with support for: local in-memory, Apache MapReduce (support for both Hadoop 1 and 2 are provided), and Apache Tez. Soon thereafter, with community support, Apache Spark™, Apache Storm and others will be supported through its new pluggable and customizable planner[...]”. Read the full InfoQ article.


Posted in Me.

I have started collecting statistics about top (suggested) Twitter users at site


The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

The 2 main design themes for Tez are:

  • Empowering end users by:
    • Expressive dataflow definition APIs
    • Flexible Input-Processor-Output runtime model
    • Data type agnostic
    • Simplifying deployment
  • Execution Performance
    • Performance gains over Map Reduce
    • Optimal resource management
    • Plan reconfiguration at runtime
    • Dynamic physical data flow decisions
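
To make the "directed-acyclic-graph of tasks" idea a bit more concrete, here is a minimal sketch in base R. It is not the Tez API: the task names and the `deps` list are hypothetical, and the function only computes an order in which tasks could run so that every task comes after the tasks it depends on (a Kahn-style topological ordering).

```r
# Hypothetical tasks: each name maps to the tasks it depends on.
deps <- list(
  read_logs  = character(0),
  read_users = character(0),
  join       = c("read_logs", "read_users"),
  aggregate  = c("join"),
  write      = c("aggregate")
)

# Repeatedly pick every task whose dependencies are already done.
topo_order <- function(deps) {
  done <- character(0)
  pending <- names(deps)
  while (length(pending) > 0) {
    is_ready <- vapply(pending,
                       function(task) all(deps[[task]] %in% done),
                       logical(1))
    ready <- pending[is_ready]
    # If nothing is ready but tasks remain, the graph contains a cycle.
    if (length(ready) == 0) stop("cycle detected: not a DAG")
    done <- c(done, ready)
    pending <- setdiff(pending, ready)
  }
  done
}

run_order <- topo_order(deps)
print(run_order)
```

Tasks that become ready in the same pass (here the two hypothetical read tasks) do not depend on each other, which is the kind of independence a DAG-based engine can exploit to run work in parallel.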

Old way was:





dump_tweets.R is a tool for searching tweets and (recursively) crawling users from Twitter.

The data are then saved to a MySQL database and can finally be exported to .RData files.
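
The export step can be sketched in base R as follows. This is not the code from dump_tweets.R: the `tweets` data frame, its columns and the file name are made up for illustration, and the step that reads rows back from MySQL is left out.

```r
# Hypothetical stand-in for rows read back from the MySQL tweet table.
# IDs are kept as character: tweet IDs such as 425355265857187841 are
# larger than 2^53 and cannot be stored exactly in a double.
tweets <- data.frame(
  id   = c("1", "2"),
  text = c("first example tweet", "second example tweet"),
  stringsAsFactors = FALSE
)

# save() writes named R objects to an .RData file ...
path <- file.path(tempdir(), "tweets.RData")
save(tweets, file = path)

# ... and load() restores them under their original names,
# here into a separate environment so nothing gets overwritten.
restored <- new.env()
load(path, envir = restored)
print(identical(restored$tweets, tweets))
```

Loading into a separate environment is a design choice for the sketch; a plain `load(path)` would put `tweets` straight back into the workspace.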

dump_tweets.R is sponsored by Associazione Rospo





Rscript search.R -q "#opensource"

2014-01-20 20:53:43 INFO::Connecting to TWITTER…
2014-01-20 20:53:43 INFO::Connecting to database=twitter, host=localhost with user=root
2014-01-20 20:53:43 INFO::using UTF8 code
2014-01-20 20:53:43 INFO::Searching for q=#opensource, sinceID=0
2014-01-20 20:53:57 INFO::Found 191 tweets
2014-01-20 20:53:57 INFO::maxID=425355265857187841
2014-01-20 20:53:57 INFO::Saving data to tweet table…
2014-01-20 20:53:58 INFO::saving data to search_results table…


As suggested by Julianhi’s Blog post, I installed the CRAN Rfacebook package for the R statistical environment and created a sample igraph graph with 20 of my “Facebook friends”.

library(Rfacebook)
library(igraph)

## See examples for fbOAuth to know how token was created.

## Getting my network of friends
mat <- getNetwork(token=fb_oauth, format="adj.matrix")

## Building an undirected graph from the adjacency matrix
network <- graph.adjacency(mat, mode="undirected")
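
For readers who have not met adjacency matrices before, here is a small base R sketch of the kind of structure the graph is built from. The three friends are hypothetical and the matrix is filled in by hand, so it needs neither a Facebook token nor igraph; I use 0/1 values here for simplicity.

```r
# Hypothetical friends; 1 means the two people are friends with each other.
friends <- c("ann", "bob", "carla")
mat <- matrix(0, nrow = 3, ncol = 3, dimnames = list(friends, friends))
mat["ann", "bob"] <- mat["bob", "ann"] <- 1
mat["bob", "carla"] <- mat["carla", "bob"] <- 1

# An undirected friendship network has a symmetric adjacency matrix.
stopifnot(all(mat == t(mat)))

# Row sums give each person's number of friends (the node degree).
degree <- rowSums(mat)
print(degree)
```

A matrix like this is what `mode="undirected"` assumes: the entry for (ann, bob) and the entry for (bob, ann) describe the same edge.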


