The two top Hadoop distributions, Cloudera and Hortonworks, include Apache Solr as their Hadoop search tool (but remember that Hadoop is free software, and many companies use it without paying anything!)

See the apache-solr-hadoop-search article and the following two presentations from the two vendors:

See also the Natural Language Processing and Sentiment Analysis for Retailers using HDP and ITC Infotech Radar article

Posted in Me.

I opened a service request with Oracle, but they did not provide an official way to add the Google Analytics JavaScript code to Oracle OBIEE (release 11.1.1.7): I wanted to add it in only one place and have it appear on every OBIEE page.

The solution I found and tested is to append the JavaScript code (without the <script> and </script> tags) to the file

bi_server1/tmp/_WL_user/analytics_11.1.1/7dezjl/war/res/b_mozilla/common.js

Be aware that this file may be overwritten by any software upgrade.
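For reference, the code to paste is the standard Google Analytics tracking snippet, which looks like the following (UA-XXXXXX-Y is a placeholder: copy the exact code from your own Google Analytics property, leaving out the <script> tags):

```javascript
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXXX-Y', 'auto');  // replace with your tracking ID
ga('send', 'pageview');
```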

Posted in Me.

In this blog post Google confirms its adoption of the open-source statistical environment R (see my R introduction) by releasing a new R package.

“How can we measure the number of additional clicks or sales that an AdWords campaign generated? How can we estimate the impact of a new feature on app downloads? How do we compare the effectiveness of publicity across countries? In principle, all of these questions can be answered through causal inference […]

How the package works
The CausalImpact R package implements a Bayesian approach to estimating the causal effect of a designed intervention on a time series. Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets, clicks on other sites, or Google Trends data), the package constructs a Bayesian structural time-series model with a built-in spike-and-slab prior for automatic variable selection. This model is then used to predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had not occurred.” Read the full Google blog post

Posted in Me.

Some time ago I read that some components of IBM Watson were implemented in Prolog, so I decided to look at the language again after many years… I like Prolog: I studied it at the Computer Science department of the University of Milan, and for my thesis I wrote code in Prolog (and Lisp).

proloGraph is a simple example of how to expose a Prolog graph database to other applications by building a REST web service. I used SWI-Prolog and its HTTP library.

Install SWI-Prolog (I used the fantastic Debian GNU/Linux distribution) with

apt-get install swi-prolog

Clone my git repository with

git clone https://github.com/matteoredaelli/proloGraph
cd proloGraph

Run it with

swipl -s webserver.pl -g 'server(8765).'

Open the following URL in your browser

http://localhost:8765/vertex?name=user(matteo)

and you will get:

{
  "prev": [ {"from":"user(gabriele)", "to":"user(matteo)", "rel":"follow"} ],
  "next": [
    {"from":"user(matteo)", "to":"user(ele)", "rel":"follow"},
    {"from":"user(matteo)", "to":"user(gabriele)", "rel":"follow"},
    {"from":"user(matteo)", "to":"user(4)", "rel":"follow"},
    {"from":"user(matteo)", "to":"country(italy)", "rel":"lives"},
    {"from":"user(matteo)", "to":"hobby(running)", "rel":"likes"}
  ]
}
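The core of such a service can be sketched in SWI-Prolog roughly as follows (a simplified sketch with hard-coded edge/3 facts, for illustration only; the actual code is in the repository):

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/http_json)).

% toy graph: edge(From, To, Rel)
edge(user(gabriele), user(matteo), follow).
edge(user(matteo),   user(ele),    follow).

% map /vertex to its handler and start the server on the given port
:- http_handler(root(vertex), vertex_handler, []).

server(Port) :- http_server(http_dispatch, [port(Port)]).

vertex_handler(Request) :-
    http_parameters(Request, [name(Name, [atom])]),
    term_to_atom(V, Name),
    % incoming and outgoing edges of the requested vertex, as JSON terms
    findall(json([from=F, to=Name, rel=R]),
            (edge(FV, V, R), term_to_atom(FV, F)), Prev),
    findall(json([from=Name, to=T, rel=R]),
            (edge(V, TV, R), term_to_atom(TV, T)), Next),
    reply_json(json([prev=Prev, next=Next])).
```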
Posted in Me.

These days I'm playing with Apache Pig for running data analysis on Apache Hadoop. Below is a sample word cloud generated from the most frequent words of the Italian translation of the Bible.

(image: la-sacra-bibbia-frequenza-parole)

Copy the file book.txt to the Hadoop distributed file system (HDFS) with

hadoop-2.4.0/bin/hdfs dfs -copyFromLocal -f book.txt

Test the pig job locally with

pig-0.13.0/bin/pig -x local wordcount.pig

Run the pig job in hadoop with

pig-0.13.0/bin/pig -x mapreduce wordcount.pig

Look at results with

hadoop-2.4.0/bin/hdfs dfs -cat book-wordcount/part*|more

Copy the results to a local file with

hadoop-2.4.0/bin/hdfs dfs -cat book-wordcount/part* > frequenza-parole-bibbia.txt

Below the two scripts I used for this short tutorial:

Wordcount (pig script):

a = load '/user/matteo/book.txt';
b = foreach a {
    line = LOWER(REPLACE((chararray)$0, '[!?\\.»«:;,\']', ' '));
    generate flatten(TOKENIZE(line)) as word;
}
c = group b by word;
d = foreach c generate group, COUNT(b) as cnt;
d_ordered = ORDER d BY cnt DESC;
store d_ordered into '/user/matteo/book-wordcount';

Wordcloud (R script):

library(wordcloud)
# word/frequency pairs produced by the Pig job
p = read.table(file="frequenza-parole-bibbia.txt")
# render the word cloud to a PNG file
png("/home/matteo/la-sacra-bibbia-frequenza-parole.png", width=900, height=900)
wordcloud(p$V1, p$V2, scale=c(8,.3), min.freq=2, max.words=200, random.order=T, rot.per=.15)
dev.off()

Posted in Me.

“The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.” Read the full article

This is my first sample microservice, written in Clojure.
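Something along these lines, for instance (a minimal illustrative sketch using Ring and Compojure, assuming both are declared as project dependencies; this is not the actual sample):

```clojure
(ns hello.core
  (:require [compojure.core :refer [defroutes GET]]
            [ring.adapter.jetty :refer [run-jetty]]))

;; a single small service exposing one HTTP resource
(defroutes app
  (GET "/hello/:name" [name] (str "Hello, " name "!")))

(defn -main []
  ;; each microservice runs in its own process, on its own port
  (run-jetty app {:port 8080}))
```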

Posted in Me.