Author Archives: matteo

Apache Spark: how to import data from a JDBC database using Python

Using Apache Spark 2.0 and Python, I’ll show how to import a table from a relational database (using its JDBC driver) into a Spark DataFrame and save it as a Parquet file. In this demo the database is Oracle 12.x.


from pyspark.sql import SparkSession

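spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# the connection URL and the remaining options are cut off in this listing
df = spark.read.format("jdbc").options(url="jdbc:oracle:thin:ro/", 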


And then run it with

spark-2.0.1-bin-hadoop2.7/bin/spark-submit --jars instantclient_12_1/ojdbc7.jar
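
The listing above is cut off after the connection URL. As a minimal sketch of what a complete job might look like, assuming a hypothetical host, service name, credentials, table and output path (none of them come from the original post), and using Spark's standard JDBC reader and Parquet writer:

```python
def oracle_jdbc_options(host, port, service, user, password, table):
    """Build the option dict for Spark's JDBC reader (all values are placeholders)."""
    return {
        "url": "jdbc:oracle:thin:@//{}:{}/{}".format(host, port, service),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


def import_table(spark, options, parquet_path):
    """Read one table over JDBC into a Spark DataFrame and save it as Parquet."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(parquet_path)
    return df


def main():
    # pyspark is imported here so the helpers above can be used without a Spark install
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-to-parquet").getOrCreate()
    # hypothetical connection details, replace with your own
    opts = oracle_jdbc_options("dbhost.example.org", 1521, "ORCL",
                               "ro_user", "ro_password", "MYSCHEMA.MYTABLE")
    import_table(spark, opts, "mytable.parquet")
    spark.stop()
```

To use it as a job, call main() at the bottom of the script and pass the script to spark-submit together with the --jars option pointing at the Oracle JDBC driver.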

Building websites for high traffic with REST APIs, AngularJS and Jekyll

If you have limited hardware resources and/or expect high traffic on your website, here are some quick suggestions (also taken from the article Meet the Obama campaign’s $250 million fundraising platform):

  • Expose your business logic with REST services
  • Use a JavaScript framework like AngularJS to call your REST APIs and build a dynamic site
  • Build a (not so) static website using Jekyll or a similar tool and put your static files on S3 (if you are using Amazon AWS)
  • Use a CDN

A sample website (not hosted on AWS but at home, on a modest Raspberry Pi 2) is

Adding an application (angularjs+rest api) inside a WordPress site

If you need to integrate an application written with AngularJS and REST API services into your WordPress website, just create an empty page and edit it in “text” mode with something like

<!-- the following two lines can be put in the header template --> 
<script src=""></script>
<script src=""></script>

<div ng-app="myApp" ng-controller="planetController">
       <div>
           <input ng-model="query" placeholder="enter a word" type="text">
           <p><button ng-click="searchV(query)">Split into syllables</button></p>
       </div>
</div>

A running example is currently (in the near future I’ll switch to a generated static website) at

Installing Nodejs oracledb module on Suse SLES 11

For a quick tutorial about installing the Oracle module for Node.js (oracledb) on SUSE SLES, follow the info at

Node-OracleDB Installation

but remember to use release 5 of the gcc compiler

export ORACLE_HOME=/home/oracle/instantclient_12_1
export OCI_INC_DIR=$ORACLE_HOME/sdk/include
CC=gcc-5 CXX=g++-5 npm install oracledb

ldiff2sql: How to import ldap data to a database

Export your data

ldapsearch -o ldif-wrap=no -E pr=1000/noprompt -x -h -D "CN=admin,OU=users,DC=redaelli,DC=org" -w mypwd -s sub -b "DC=redaelli,DC=org" "(objectclass=computer)" dNSHostName description operatingSystem operatingSystemVersion  -LLL > ad-computer-sa.ldiff
rm hosts.csv

Convert to SQL
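
The conversion step itself is not shown here. As a minimal sketch, assuming the unwrapped LDIF produced by the ldapsearch command above, a hypothetical table named computers, and an SQLite database standing in for the real target database:

```python
import base64
import sqlite3

# Attributes requested in the ldapsearch call above
COLUMNS = ["dNSHostName", "description", "operatingSystem", "operatingSystemVersion"]


def parse_ldif(text):
    """Split unwrapped LDIF text into one dict per entry (first value of each attribute)."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # a blank line closes the current entry
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        if value.startswith(":"):
            # "attr:: <base64>" marks a base64-encoded value
            value = base64.b64decode(value[1:].strip()).decode("utf-8", "replace")
        else:
            value = value.strip()
        current.setdefault(name, value)
    if current:
        records.append(current)
    return records


def load_records(conn, records, table="computers"):
    """Create the (hypothetical) table and insert one row per LDIF entry."""
    cols = ", ".join('"{}" TEXT'.format(c) for c in COLUMNS)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
    placeholders = ", ".join("?" for _ in COLUMNS)
    rows = [tuple(r.get(c) for c in COLUMNS) for r in records]
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows)
    conn.commit()
    return len(rows)
```

A typical call would read ad-computer-sa.ldiff, pass its content to parse_ldif, and hand the result to load_records with an open connection. Using bound parameters (the ? placeholders) avoids quoting problems with values such as descriptions containing apostrophes.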

Deploying microservices in a Docker container

I have already written about Docker containers (moving datacenter apps from virtual machines to containers).

This is a quick tutorial (my github sample code) about a new way of deploying (micro) services and applications, i.e. using Docker containers: a sample Python webservice and a simple web page (HTML + AngularJS code).

Creating a Docker container starts with defining a Dockerfile like

FROM python:3.5
#FROM python:3-onbuild

ENV DEBIAN_FRONTEND noninteractive

ENV http_proxy=""
ENV https_proxy=""

COPY requirements.txt /usr/src/app/
COPY /usr/src/app/

WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y nmap
# use the lowercase variable defined with ENV above (names are case-sensitive)
RUN pip install --proxy $http_proxy --no-cache-dir -r requirements.txt

VOLUME ["/usr/src/app"]

ENTRYPOINT ["python"]
CMD ["./"]

Put the additional Python packages you need in a requirements.txt file


And create your application in the file
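
The application file itself is not reproduced here. As a minimal sketch of a Python web service that such a container could run on port 5000, assuming only the standard library (wsgiref) rather than whatever framework the original sample code uses, and a hypothetical /status route:

```python
import json
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Tiny WSGI application: answers /status with a JSON document."""
    if environ.get("PATH_INFO") == "/status":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(port=5000):
    """Listen on all interfaces so a port published by Docker is reachable."""
    httpd = make_server("0.0.0.0", port, app)
    httpd.serve_forever()
```

Calling serve() at the end of the file starts the service. Binding to 0.0.0.0 instead of localhost matters inside a container, because the published port is forwarded to the container's network interface and not to its loopback address.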

In this way we are going to build a Docker image with Python 3 and some additional Python packages with the command

docker build -t python-infra-ws .

Finally we’ll start the container with the command

docker run -d -t --name python-infra-ws -p 5000:5000 python-infra-ws

Some other useful commands are:

docker stop python-infra-ws
docker start python-infra-ws
docker ps -a --filter name=python-infra-ws
docker rm python-infra-ws
docker rmi python-infra-ws

Analyzing huge sensor data in near realtime with Apache Spark Streaming

For this demo I downloaded and installed Apache Spark 1.5.1

Suppose you have a stream of data from several (industrial) machines like

1,2015-01-01 11:00:01,1.0,1.1,1.2,1.3,..
2,2015-01-01 11:00:01,2.2,2.1,2.6,2.8,.
3,2015-01-01 11:00:01,1.1,1.2,1.3,1.3,.
1,2015-01-01 11:00:02,1.0,1.1,1.2,1.4,.
1,2015-01-01 11:00:02,1.3,1.2,3.2,3.3,..

Below is a system, written in Python, that reads data from a stream (use the command “nc -lk 9999” to send data to the stream) and every 10 seconds collects alerts: an alert is raised when the same signal of the same machine shows at least 4 suspicious values within the batch.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

min_occurs = 4

def signals_from_1_row_to_many(row):
  "output is (machine, date, signal_number, signal_value)"
  # columns 2..20 hold the 19 signal values; signal numbers start at 1
  return [(row[0], row[1], f-1, row[f]) for f in range(2,21)]

def isAlert(signal, value):
  # reference value for each of the 19 signals
  defaults = [83.0, 57.0, 37.0, 57.0, 45.0, 19.0, -223.0, 20.50, 20.42, 20.48, 20.24, 20.22, 20.43, 20, 20.44, 20.39, 20.36, 20.25, 1675.0]
  soglia = 0.95  # threshold: allowed relative deviation from the reference
  # a missing value is always suspicious
  if value == '':
     return True
  value = float(value)
  ref = defaults[signal -1]
  # abs() keeps the tolerance band valid for negative reference values
  return abs(value - ref) > soglia*abs(ref)

def isException(machine, signal):
  # sample data. the sensor 19 of machine 11 is broken
  exceptions = [(11,19)]
  return (int(machine), signal) in exceptions 

# Create a local StreamingContext with two worker threads and a batch interval of 10 seconds
sc = SparkContext("local[2]", "SignalsAlerts")
ssc = StreamingContext(sc, 10)

# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

all_alerts = lines.map(lambda l: l.split(",")) \
                 .flatMap(signals_from_1_row_to_many) \
                 .filter(lambda s: isAlert(s[2], s[3])) \
                 .filter(lambda s: not isException(s[0], s[2])) \
                 .map(lambda s: (s[0]+'-'+str(s[2]), [(s[1], s[3])])) \
                 .reduceByKey(lambda x, y: x + y) 

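# keep only the keys with at least min_occurs suspicious values
alerts = all_alerts.filter(lambda s: len(s[1]) >= min_occurs)

# print the alerts of each batch (Spark Streaming needs at least one output operation)
alerts.pprint()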

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
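
To check the alert rules without starting Spark, the same per-batch logic can be sketched in plain Python. This is only an illustration: the sample rows are made up, each row carries two signals instead of nineteen, only the first two reference values are used, and the list of broken sensors is left out.

```python
from collections import defaultdict

MIN_OCCURS = 4
REFERENCES = [83.0, 57.0]   # reference value per signal (first two of the list above)
THRESHOLD = 0.95            # allowed relative deviation from the reference


def is_alert(signal, value):
    """A missing value, or one deviating more than THRESHOLD from the reference, is suspicious."""
    if value == "":
        return True
    ref = REFERENCES[signal - 1]
    # abs() keeps the tolerance band valid for negative reference values
    return abs(float(value) - ref) > THRESHOLD * abs(ref)


def alerts_for_batch(lines):
    """Group suspicious readings by 'machine-signal' and keep keys with enough of them."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split(",")
        machine, date = fields[0], fields[1]
        for signal, value in enumerate(fields[2:], start=1):
            if is_alert(signal, value):
                grouped["{}-{}".format(machine, signal)].append((date, value))
    return {key: hits for key, hits in grouped.items() if len(hits) >= MIN_OCCURS}


# made-up batch: signal 1 of machine 1 is far below its reference in four rows
batch = [
    "1,2015-01-01 11:00:01,1.0,57.0",
    "1,2015-01-01 11:00:02,1.1,56.0",
    "1,2015-01-01 11:00:03,1.2,58.0",
    "1,2015-01-01 11:00:04,1.3,57.5",
    "2,2015-01-01 11:00:01,83.0,57.0",
]
print(alerts_for_batch(batch))
```

In this batch only the key 1-1 (machine 1, signal 1) collects four suspicious values, so it is the only alert reported. The dictionary of lists plays the role of reduceByKey in the streaming version.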