These days I’m playing with Apache Pig to run data analysis on Apache Hadoop. Below is a sample word cloud generated from the most frequent nouns in the Italian translation of the Bible.


Word count (Pig script):

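-- Load each line of the book as a single field
a = load '/user/matteo/book.txt';
-- Replace punctuation with spaces, lowercase, and split each line into words
b = foreach a {
    line = LOWER(REPLACE((chararray)$0, '[!?\\.»«:;,\']', ' '));
    generate flatten(TOKENIZE(line)) as word;
};
-- Count the occurrences of each word
c = group b by word;
d = foreach c generate group, COUNT(b) as cnt;
-- Sort by frequency (most frequent first) and save the result
d_ordered = ORDER d BY cnt DESC;
store d_ordered into '/user/matteo/book-wordcount';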
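For a quick sanity check on a small sample without a cluster, the same clean, tokenize, count and sort pipeline can be approximated in plain Python. This is a sketch, not the author's code: the function name `word_count` and the sample verses are hypothetical, and the tokenizer (whitespace plus `"`, `(`, `)`, `*` and `,`) is only my approximation of the default delimiters of Pig’s `TOKENIZE`.

```python
import re
from collections import Counter

# Characters the Pig script replaces with a space before tokenizing.
PUNCTUATION = re.compile(r"[!?.»«:;,']")
# Assumed approximation of TOKENIZE's default delimiters: whitespace plus " ( ) * ,
DELIMITERS = re.compile(r'[\s"()*,]+')


def word_count(lines):
    """Return (word, count) pairs sorted by descending count (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Same steps as the Pig script: strip punctuation and lowercase the line
        cleaned = PUNCTUATION.sub(" ", line.lower())
        # Split into tokens, skipping the empty strings produced by leading delimiters
        counts.update(tok for tok in DELIMITERS.split(cleaned) if tok)
    # Ties are broken alphabetically here; Pig's ORDER ... DESC leaves their order unspecified.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ["In principio Dio creò il cielo e la terra.",
              "La terra era informe e deserta;"]
    for word, cnt in word_count(sample)[:3]:
        print(word, cnt)
```

The explicit tie-break keeps the local output deterministic, which makes it easier to compare against a few lines of the Hadoop result.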

Word cloud (R script):

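library(wordcloud)  # provides wordcloud()
p = read.table(file="book-wordcount-nouns.txt")  # V1 = word, V2 = count
png("/home/matteo/la-sacra-bibbia-frequenza-parole.png", width=900, height=900)
wordcloud(p$V1, p$V2, scale=c(8,.3), min.freq=2, max.words=200, random.order=TRUE, rot.per=.15)
dev.off()  # close the PNG device so the image is written to disk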

Posted in Me.

“The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.” Read the full article by James Lewis and Martin Fowler.


This is my first sample microservice, written in Clojure.
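To make the quoted definition a little more concrete, here is a minimal sketch of one small service running in its own process and exposing a single HTTP resource. It is not the Clojure sample mentioned above: it uses Python’s standard `http.server`, and the `/words/<word>` resource, the `WORD_COUNTS` data, `make_server` and port 8080 are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data owned by this service (one business capability).
WORD_COUNTS = {"dio": 3, "terra": 2}


class WordCountHandler(BaseHTTPRequestHandler):
    """Serves GET /words/<word> as a JSON resource; anything else is a 404."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "words" and parts[1] in WORD_COUNTS:
            body = json.dumps({"word": parts[1], "count": WORD_COUNTS[parts[1]]})
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"})
            self.send_response(404)
        data = body.encode("utf-8")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port=8080):
    """Build (but do not start) the hypothetical service on localhost."""
    return HTTPServer(("127.0.0.1", port), WordCountHandler)
```

To try it, call `make_server().serve_forever()` and request `http://127.0.0.1:8080/words/dio`. Each such service owns its data and can be deployed on its own, which is the property the quote emphasizes.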



These nights I’m playing with Cascalog to run MapReduce jobs on Hadoop. Good news for the future is that “Cascading 3.0 will initially ship with support for: local in-memory, Apache MapReduce (support for both Hadoop 1 and 2 are provided), and Apache Tez. Soon thereafter, with community support, Apache Spark™, Apache Storm and others will be supported through its new pluggable and customizable planner[...]”. Read the full InfoQ article.



I’ve started collecting statistics about the top (suggested) Twitter users at site
