These evenings I have been playing with Cascalog for running MapReduce jobs over Hadoop. Good news for the future: “Cascading 3.0 will initially ship with support for: local in-memory, Apache MapReduce (support for both Hadoop 1 and 2 are provided), and Apache Tez. Soon thereafter, with community support, Apache Spark™, Apache Storm and others will be supported through its new pluggable and customizable planner[...]”. Read the full InfoQ article.

 

Posted in Me.

I have started collecting statistics about the top (suggested) Twitter users at http://top-twitter-users.blogspot.it/

Posted in Me.

The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

The 2 main design themes for Tez are:

  • Empowering end users by:
    • Expressive dataflow definition APIs
    • Flexible Input-Processor-Output runtime model
    • Data type agnostic
    • Simplifying deployment
  • Execution Performance
    • Performance gains over Map Reduce
    • Optimal resource management
    • Plan reconfiguration at runtime
    • Dynamic physical data flow decisions
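To make the "directed-acyclic-graph of tasks" idea more concrete, here is a minimal sketch in plain R. It is not Tez API code: the task names and the run_order helper are hypothetical and only illustrate that a task can run once all of its upstream tasks have finished.

```r
# Hypothetical DAG: each task lists the tasks it depends on.
# This illustrates the concept only; it does not use the Tez API.
deps <- list(
  read_users  = character(0),
  read_tweets = character(0),
  join        = c("read_users", "read_tweets"),
  aggregate   = "join",
  write       = "aggregate"
)

# Return the tasks in an order where every task comes after its dependencies
# (a simple Kahn-style topological sort).
run_order <- function(deps) {
  done <- character(0)
  pending <- names(deps)
  while (length(pending) > 0) {
    # A task is ready when all of its dependencies are already done.
    is_ready <- vapply(pending, function(task) all(deps[[task]] %in% done), logical(1))
    ready <- pending[is_ready]
    if (length(ready) == 0) stop("cycle detected: the graph is not a DAG")
    done <- c(done, ready)
    pending <- setdiff(pending, ready)
  }
  done
}

task_order <- run_order(deps)
print(task_order)
```

The two read tasks have no dependencies, so they become ready in the same round; a real engine could run them in parallel.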

Old way was:


Posted in Me.

 

dump_tweets.R is a tool for searching tweets and (recursively) crawling users from Twitter.

The data are then saved to a MySQL database and can finally be exported to .RData files.
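The export step could look roughly like this, assuming the tweets have already been read from MySQL into a data frame. The column names, the values, and the file name are made up for illustration; the actual schema used by dump_tweets.R is not shown here.

```r
# Hypothetical stand-in for rows fetched from the MySQL tweet table.
# IDs are kept as character strings (an assumption of this sketch).
tweets <- data.frame(
  id   = c("1001", "1002"),
  text = c("Hello #opensource", "Another #opensource tweet"),
  stringsAsFactors = FALSE
)

# Write the data frame to an .RData file (a temporary file in this sketch).
out_file <- file.path(tempdir(), "tweets.RData")
save(tweets, file = out_file)

# Later, load() restores the object under its original name;
# loading into a separate environment avoids overwriting anything.
restored <- new.env()
load(out_file, envir = restored)
print(nrow(restored$tweets))
```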

dump_tweets.R is sponsored by Associazione Rospo

 

 

 

Example:

Rscript search.R -q "#opensource"

2014-01-20 20:53:43 INFO::Connecting to TWITTER…
2014-01-20 20:53:43 INFO::Connecting to database=twitter, host=localhost with user=root
2014-01-20 20:53:43 INFO::using UTF8 code
2014-01-20 20:53:43 INFO::Searching for q=#opensource, sinceID=0
2014-01-20 20:53:57 INFO::Found 191 tweets
2014-01-20 20:53:57 INFO::maxID=425355265857187841
2014-01-20 20:53:57 INFO::Saving data to tweet table…
2014-01-20 20:53:58 INFO::saving data to search_results table…
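
The sinceID and maxID values in the log suggest an incremental search, where the highest tweet ID seen in one run becomes the sinceID of the next. That is one plausible reading of the log, not a confirmed detail of the script. Below is a minimal sketch of the bookkeeping, assuming IDs are kept as strings, because values such as 425355265857187841 are larger than 2^53 and cannot all be represented exactly as R doubles. The max_id helper and the second and third IDs are hypothetical.

```r
# Return the largest ID from a character vector of non-negative decimal IDs
# (assumed to have no leading zeros).
# A longer string is a larger number; among equal lengths, text order
# matches numeric order.
max_id <- function(ids) {
  longest <- ids[nchar(ids) == max(nchar(ids))]
  # method = "radix" sorts character vectors in the C locale.
  tail(sort(longest, method = "radix"), 1)
}

# First ID taken from the log above; the other two are made up.
ids <- c("425355265857187841", "425355265857187712", "99999")
since_id <- max_id(ids)
print(since_id)
```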

Posted in Me.

As suggested by Julianhi’s Blog post, I installed the CRAN Rfacebook package for the R statistical environment and created a sample igraph plot with 20 of my “Facebook friends”.

## See the examples for fbOAuth to know how the token was created.
library(Rfacebook)
library(igraph)

## Get my network of friends as an adjacency matrix
load("fb_oauth")
mat <- getNetwork(token=fb_oauth, format="adj.matrix")

## Build an undirected graph and plot it to a PDF file
network <- graph.adjacency(mat, mode="undirected")
pdf("network_plot.pdf")
plot(network)
dev.off()
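
To restrict the plot to a sample, as with the 20 friends mentioned above, one option is to subset the adjacency matrix before building the graph. The sketch below uses a small random symmetric matrix as a stand-in for the getNetwork() result, so it runs without a Facebook token; the matrix, the friend names, and the sample size are all assumptions.

```r
set.seed(42)  # make the random sample reproducible

# Stand-in for the adjacency matrix returned by getNetwork():
# a random symmetric 0/1 matrix for 50 hypothetical friends.
n <- 50
labels <- paste0("friend", seq_len(n))
mat <- matrix(0, n, n, dimnames = list(labels, labels))
upper <- upper.tri(mat)
mat[upper] <- rbinom(sum(upper), 1, 0.1)
mat <- mat + t(mat)  # mirror the upper triangle to make it symmetric

# Keep 20 randomly chosen friends; rows and columns must use the same index.
keep <- sample(nrow(mat), 20)
sub_mat <- mat[keep, keep]

# Number of connections each sampled friend has within the sample.
friend_degree <- rowSums(sub_mat)
print(dim(sub_mat))
```

sub_mat can then be passed to graph.adjacency(sub_mat, mode="undirected") in place of the full matrix.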

Posted in Me.