These days I’m playing with Apache Pig to run data analysis on Apache Hadoop. Below is a sample word cloud generated from the most frequent nouns in the Italian translation of the Bible.

[Image: la-sacra-bibbia-frequenza-parole]

Word count (Pig script):

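-- Load the book as one record per line.
a = load '/user/matteo/book.txt';
-- Replace punctuation with spaces, lowercase, and split each line into words.
b = foreach a {
    line = LOWER(REPLACE((chararray)$0, '[!?\\.»«:;,\']', ' '));
    generate flatten(TOKENIZE(line)) as word;
}
-- Count the occurrences of each word.
c = group b by word;
d = foreach c generate group as word, COUNT(b) as cnt;
-- Sort by descending count and store the result.
d_ordered = ORDER d BY cnt DESC;
store d_ordered into '/user/matteo/book-wordcount';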
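
To sanity-check the script on a few lines without a Hadoop cluster, the same steps (strip punctuation, lowercase, tokenize, group, count, sort) can be sketched in plain Python. This is only a rough local equivalent, not part of the original job: the function name word_count and the sample lines are made up for illustration, the delimiter set is my approximation of what Pig's TOKENIZE splits on, and the alphabetical tie-break is added only to make the output deterministic.

```python
import re
from collections import Counter

# Same character class as the REPLACE call in the Pig script.
PUNCTUATION = re.compile(r"[!?.»«:;,']")
# Assumption: approximates Pig's TOKENIZE delimiters
# (whitespace, double quote, comma, parentheses, asterisk).
DELIMITERS = re.compile(r'[\s",()*]+')


def word_count(lines):
    """Return (word, count) pairs sorted by descending count."""
    counts = Counter()
    for line in lines:
        # Replace punctuation with spaces, then lowercase the line.
        cleaned = PUNCTUATION.sub(" ", line).lower()
        # Split into words and drop the empty strings left at the edges.
        counts.update(tok for tok in DELIMITERS.split(cleaned) if tok)
    # Ties are broken alphabetically (an addition; Pig gives no such guarantee).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample input, not taken from the book.
sample = ["Il cielo e la terra.", "La terra; «la luce!»"]
for word, cnt in word_count(sample):
    print(f"{word}\t{cnt}")
```

Comparing this output with the Pig result for the same small file is a quick way to spot a wrong regular expression before running the job on the full text.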

Word cloud (R script):

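library(wordcloud)
# Read the (word, count) pairs; V1 holds the words and V2 the counts.
p <- read.table(file = "book-wordcount-nouns.txt", stringsAsFactors = FALSE)
# Render the cloud to a 900x900 PNG.
png("/home/matteo/la-sacra-bibbia-frequenza-parole.png", width = 900, height = 900)
wordcloud(p$V1, p$V2, scale = c(8, .3), min.freq = 2, max.words = 200,
          random.order = TRUE, rot.per = .15)
dev.off()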

Posted in Me.

“The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.” Read the full article

 

This is my first sample microservice, written in Clojure.


These nights I’m playing with Cascalog for running MapReduce jobs on Hadoop. Good news for the future: “Cascading 3.0 will initially ship with support for: local in-memory, Apache MapReduce (support for both Hadoop 1 and 2 are provided), and Apache Tez. Soon thereafter, with community support, Apache Spark™, Apache Storm and others will be supported through its new pluggable and customizable planner [...]”. Read the full InfoQ article


I have started collecting statistics about the top (suggested) Twitter users at http://top-twitter-users.blogspot.it/
