Author Archives: matteo

Prolog for theorem proving, expert systems, type inference systems, and automated planning…

In the name of the father of Prolog (Alain Colmerauer, who died a few days ago), I’ll show how to use Prolog to solve a common business problem: finding the paths between two nodes in a graph.

“Prolog is a declarative programming language: the program logic is expressed in terms of relations, represented as facts and rules. A computation is initiated by running a query over these relations.” [Wikipedia]

“Prolog is a general-purpose logic programming language associated with artificial intelligence and computational linguistics […] The language has been used for theorem proving, expert systems, type inference systems, and automated planning, as well as its original intended field of use, natural language processing.” [Wikipedia]

You tell Prolog the facts and rules of your game and it will find the solution 😉

In this tutorial my graph is the network of underground/train stations of Milan.

The facts are like

station('Affori centro', m3).
station('Affori FN', m3).
station('Affori', s2).
station('Affori', s4).
station('Airuno', s8).
station('Albairate - Vermezzo', s9).
station('Albate Camerlata', s11).
station('Albizzate', s5).
station('Amendola Fiera', m1).
station('Arcore', s8).
station('Assago Milanofiori Forum', m2).
station('Assago Milanofiori Nord', m2).


edge('Villapizzone', 'Lancetti', s5).
edge('Villapizzone', 'Lancetti', s6).
edge('Villa Pompea', 'Gorgonzola', m2).
edge('Villa Raverio', 'Carate-Calò', s7).
edge('Villasanta', 'Monza Sobborghi', s7).
edge('Villa S. Giovanni', 'Precotto', m1).
edge('Vimodrone', 'Cascina Burrona', m2).
edge('Vittuone', 'Pregnana Milanese', s6).
edge('Wagner', 'De Angeli', m1).
edge('Zara', 'Isola', m5).
edge('Zara', 'Sondrio', m3).

The rules are like

adiacent([X,L1], [Y,L1]) :- edge(X,Y, L1) ; edge(Y, X, L1).

change(L1,L2, X) :-
 station(X,L1),
 station(X,L2),
 not(L1 == L2).
 
same_line_path(Node, Node, _, [Node]). % rule 1
same_line_path(Start, Finish, Visited, [Start | Path]) :- % rule 2
 adiacent(Start, X),
 not(member(X, Visited)),
 same_line_path(X, Finish, [X | Visited], Path).

one_change_line_path([Start,L1], [End,L2], Visited, Path):-
 station(Start,L1),
 station(End,L2),
 change(L1,L2, X), 
 same_line_path([Start,L1], [X,L1], [[Start,L1]|Visited], Path1), 
 same_line_path([X,L2], [End,L2], [[X,L2]|Visited], Path2),
 append(Path1, Path2, Path).
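
For example, a simple query sketch over the sample facts above (which include edge('Zara', 'Sondrio', m3)) asks SWI-Prolog for a path along line m3 between the two stations; on backtracking Prolog enumerates the possible paths, one of them being the direct connection:

?- same_line_path(['Zara',m3], ['Sondrio',m3], [['Zara',m3]], Path).
Path = [['Zara', m3], ['Sondrio', m3]]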

You can find a sample test page at https://paroleonline.it/metropolitana-milano/ and the source code of the Prolog webservice at https://github.com/matteoredaelli/metropolitana-milano

Deploy tomcat applications in Docker containers

By deploying your applications in containers, you make sure they are easily portable and scalable…

Here is a sample of deploying a .war application using a Docker container.

Create a Dockerfile like

FROM tomcat:8-jre8

MAINTAINER "Matteo <matteo.redaelli@gmail.com>"

ADD server.xml /usr/local/tomcat/conf/
ADD tomcat-users.xml /usr/local/tomcat/conf/
ADD ojdbc6.jar /usr/local/tomcat/lib/
ADD bips.war /usr/local/tomcat/webapps/

Build a docker image

docker build . -t myapp

Run one or more containers of your application with

docker run --restart=unless-stopped --name myapp1 -p 8080:8080 -d myapp
docker run --restart=unless-stopped --name myapp2 -p 8081:8080 -d myapp

It is better to redirect Tomcat logs to stdout (one way to do this is sketched below), so that you can view them with

docker logs myapp1
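
One common way to do this (a sketch: myapp.log below is just a placeholder for whatever log file your application writes under /usr/local/tomcat/logs) is to symlink the file to stdout in the Dockerfile above:

# assumption: the application logs to /usr/local/tomcat/logs/myapp.log
RUN ln -sf /dev/stdout /usr/local/tomcat/logs/myapp.log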

Docker containers can be managed across several servers using tools like Kubernetes (an open-source system for automating deployment, scaling, and management of containerized applications), but that will be another post 😉

Continuous integration and continuous delivery with Jenkins


In this post I’ll show how to use the open-source tool #Jenkins, “the leading open source automation server, Jenkins provides hundreds of plugins to support building, deploying and automating any project”. I’ll create a simple pipeline that executes remote tasks via ssh. It could be used for continuous integration and continuous delivery of Oracle OBIEE systems.

Install (in a docker container)

docker run -p 8080:8080 -p 50000:50000 -v /home/oracle/docker_shares/jenkins:/var/jenkins_home -d jenkins

Configure credentials

Log in to Jenkins (http://jenkins.redaelli.org:8080)

Jenkins -> Manage Jenkins -> Credential -> System -> Add credential

Configure remote nodes

Jenkins -> Manage Jenkins -> Manage nodes ->  Add node

Configure Pipeline

Jenkins -> New Item -> Pipeline

See https://gist.github.com/matteoredaelli/8d306d79e547f3fdfd5d1c467373f8e0
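
The gist above contains the full pipeline definition. Just to give the idea, a minimal declarative pipeline running a remote task over ssh might look like the sketch below (the credential id, host and script path are placeholders, and the sshagent step assumes the SSH Agent plugin is installed):

pipeline {
    agent any
    stages {
        stage('Deploy') {
            steps {
                // placeholder credential id, host and script: adapt them to your environment
                sshagent(credentials: ['obiee-ssh-key']) {
                    sh 'ssh oracle@obiee.example.com /home/oracle/scripts/deploy.sh'
                }
            }
        }
    }
}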

Log analysis with ELK for Business Intelligence systems

In this post I’ll show how to collect logs from several applications (Oracle OBIEE, Oracle Essbase, QlikView, Apache logs, Linux system logs) with the ELK (Elasticsearch, Logstash and Kibana) stack. ELK is a powerful open-source alternative to Splunk. It can easily manage multiline logs.

Installing the ELK stack in Docker containers is really fast, easy and flexible.
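
For example, a quick single-host setup with the official images might look like this (a sketch, not a production setup: image tags are omitted and the logstash config path is a placeholder):

docker run -d --name elasticsearch -p 9200:9200 elasticsearch
docker run -d --name kibana --link elasticsearch:elasticsearch -p 5601:5601 kibana
docker run -d --name logstash --link elasticsearch:elasticsearch \
       -v /path/to/logstash.conf:/config/logstash.conf logstash -f /config/logstash.conf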


Building a Chat Bot…

[last update: June 5, 2017]

My next project will be a Chat Bot.

Starting points are:

There are also some useful software-as-a-service options like

And alternatives:

Useful ML libraries:

Managing Spark dataframes in Python

Below is a quick sample of using Apache Spark (2.0) dataframes for manipulating data. The sample data is a file of JSON lines like

{"description": "255/40 ZR17 94W", "ean": "EAN: 4981910401193", "season": "tires_season summer", "price": "203,98", "model": "Michelin Pilot Sport PS2 255/40 R17", "id": "MPN: 2351610"}
{"description": "225/55 R17 101V XL", "ean": "EAN: 5452000438744", "season": "tires_season summer", "price": "120,98", "model": "Pirelli P Zero 205/45 R17", "id": "MPN: 530155"}

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.functions import col
from pyspark.sql.functions import lit
from pyspark.sql.functions import *
import re, sys


# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'

spark = SparkSession \
    .builder \
    .appName("Python Spark  ") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

records_orig = spark.read.json("scraped_tyres_data.json")

## removing bad records 
records = records_orig \
  .filter(records_orig.id != '') \
  .filter(regexp_extract('description', '(rinnovati)', 1) == '')

## saving bad records  
records_orig.subtract(records).coalesce(1).write.csv("bad-records.csv", sep=";")

# extract new features
regexp_size = r"(\d+)/(\d+) R(\d+) (\d+)(\w+)\s*"

records = records \
  .withColumn("width",       regexp_extract("description", regexp_size, 1)) \
  .withColumn("ratio",       regexp_extract("description", regexp_size, 2)) \
  .withColumn("diameter",    regexp_extract("description", regexp_size, 3)) \
  .withColumn("load_index",  regexp_extract("description", regexp_size, 4)) \
  .withColumn("speed_index", regexp_extract("description", regexp_size, 5)) \
  .withColumn("brand",       regexp_extract("model", "^(\w+) ", 1)) \
  .withColumn("season",      trim(regexp_replace("season", "tires_season",""))) \
  .withColumn("id",          trim(regexp_replace("id", "MPN: ",""))) \
  .withColumn("ean",         trim(regexp_replace("ean", "EAN: ",""))) \
  .withColumn("runflat",     regexp_extract("description", "(runflat)", 1)) \
  .withColumn("mfs",         regexp_extract("description", "(MFS|FSL|bordo di protezione|bordino di protezione)", 1)) \
  .withColumn("xl",          regexp_extract("description", " (XL|RF)\s*", 1)) \
  .withColumn("chiodabile",  regexp_extract("description", "(chiodabile)\s*", 1))

## extracting and saving all season values
records.select("season").distinct().coalesce(1).write.csv("season_values", sep=";")

# misc
# records.columns   # show columns
# records.groupBy("brand").count().show()
# records.groupBy("brand").count().filter("count > 100").show(20,False)
#
# renaming all columns before joining dataframes with same column names
# records_renamed = records.select(*(col(x).alias(x + '_renamed') for x in records.columns))
# join two dataframe
# records.join(record_renamed, records.ean == records_renamed.ean_renamed)
#
#
# saving data to several formats
records.coalesce(1).write.csv("result.csv", sep=";")
records.write.json("result.json")
records.write.parquet("result.parquet")
records.write.format("com.databricks.spark.avro").save("result.avro")
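
To double-check the export, you can read the Parquet data back with the same session and run a quick aggregation (the columns are the ones created above):

# reading the saved parquet back and counting records by brand and season
records_check = spark.read.parquet("result.parquet")
records_check.groupBy("brand", "season").count().show(20, False)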

 

How to batch install OBIEE 12c (silent mode)

If you want to automatically install/deploy OBIEE systems in a datacenter or cloud, you can simply run a few commands like:

export TEMP=/home/oracle/tmp
export TEMPDIR=/home/oracle/tmp
export JAVA_HOME=/home/oracle/apps/jdk1.8.0

java -Djava.io.tmpdir=/home/oracle/tmp -jar fmw_12.2.1.2.0_infrastructure.jar \
     -silent -responseFile /home/oracle/KIT/response_01_fmw_infrastructure.rsp \
     -invPtrLoc /home/oracle/oraInst.loc

./bi_platform-12.2.1.2.0_linux64.bin -silent \
      -responseFile /home/oracle/KIT/response_02_bi_platform.rsp \
      -invPtrLoc /home/oracle/oraInst.loc \
      -ignoreSysPrereqs

export ORACLE_HOME=/home/oracle/Oracle/Middleware/Oracle_Home
export BI_PRODUCT_HOME=$ORACLE_HOME/bi
$BI_PRODUCT_HOME/bin/config.sh -silent \
    -responseFile /home/oracle/KIT/response_03_bi_platform_config.rsp \
    -invPtrLoc /home/oracle/oraInst.loc \
    -ignoreSysPrereqs
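
Once the configuration step completes, you can check that the BI services are up; a quick check, assuming the BI domain was created in the default location under $ORACLE_HOME:

# assumption: default domain location user_projects/domains/bi
$ORACLE_HOME/user_projects/domains/bi/bitools/bin/status.sh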

Any faster alternative to #Hadoop HDFS?

I’d like to find an alternative to Hadoop HDFS: a faster filesystem, not written in Java:

Which is better? Any suggestions?

References:

  • [1] https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems