Installing the Node.js oracledb module on SUSE SLES 11

Please follow the instructions at

https://github.com/oracle/node-oracledb/blob/master/INSTALL.md

but remember to use the GCC 5 compiler:

export ORACLE_HOME=/home/oracle/instantclient_12_1
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME
export TNS_ADMIN=$ORACLE_HOME
export OCI_LIBRARY_PATH=$ORACLE_HOME
export OCI_LIB_DIR=$ORACLE_HOME
export OCI_INC_DIR=$ORACLE_HOME/sdk/include
CC=gcc-5 CXX=g++-5 npm install oracledb

ldiff2sql: How to import LDAP data into a database

Export your data

ldapsearch -o ldif-wrap=no -E pr=1000/noprompt -x -h ldapserver.redaelli.org -D "CN=admin,OU=users,DC=redaelli,DC=org" -w mypwd -s sub -b "DC=redaelli,DC=org" "(objectclass=computer)" dNSHostName description operatingSystem operatingSystemVersion  -LLL > ad-computer-sa.ldiff
rm hosts.csv

Convert to SQL
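
Below is a minimal Python sketch of such a conversion (the target table hosts and its column names are placeholders, adapt them to your schema; base64-encoded LDIF values are not handled):

# ldiff2sql.py - convert the LDIF export above into SQL INSERT statements
ATTRS = ["dNSHostName", "description", "operatingSystem", "operatingSystemVersion"]

def entries(path):
    "Yield one dict per LDIF entry (entries are separated by blank lines)."
    entry = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if entry:
                    yield entry
                entry = {}
            elif ": " in line:
                name, value = line.split(": ", 1)
                entry[name] = value
        if entry:
            yield entry

def to_sql(entry):
    "Build one INSERT statement, escaping single quotes."
    values = ", ".join("'%s'" % entry.get(a, "").replace("'", "''") for a in ATTRS)
    return "INSERT INTO hosts (dns_hostname, description, os, os_version) VALUES (%s);" % values

if __name__ == "__main__":
    for e in entries("ad-computer-sa.ldiff"):
        if "dNSHostName" in e:  # skip entries without a hostname
            print(to_sql(e))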

Deploying microservices in a Docker container

I have already spoken about Docker containers (moving datacenter apps from virtual machines to containers).

This is a quick tutorial (my GitHub sample code) about a new way of deploying (micro)services and applications, i.e. using Docker containers: a sample Python webservice and a simple web page (HTML + AngularJS code).

Creating Docker containers means defining a Dockerfile like this:

FROM python:3.5
#FROM python:3-onbuild

ENV DEBIAN_FRONTEND noninteractive

ENV HTTP_PROXY="http://myproxy.redaelli.org:80"
ENV HTTPS_PROXY="http://myproxy.redaelli.org:80"
ENV http_proxy="http://myproxy.redaelli.org:80"
ENV https_proxy="http://myproxy.redaelli.org:80"
ENV PIP_OPTIONS="--proxy $HTTP_PROXY"

COPY requirements.txt /usr/src/app/
COPY app.py /usr/src/app/

WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y nmap
RUN pip install --proxy $HTTP_PROXY --no-cache-dir -r requirements.txt

VOLUME ["/usr/src/app"]
EXPOSE 5000

ENTRYPOINT ["python"]
CMD ["./app.py"]

Put the additional Python packages you need in a requirements.txt file:

Flask
python-nmap
dnspython3

And create your application in the file app.py
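
For example a minimal app.py could look like the sketch below (just an illustration, not the actual sample code on GitHub):

# app.py - a minimal Flask webservice listening on port 5000
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # trivial JSON reply, useful to check that the container is up
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # bind to 0.0.0.0 so the service is reachable from outside the container
    app.run(host="0.0.0.0", port=5000)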

In this way we are going to build a Docker image with Python 3 and some additional Python packages, using the command

docker build -t python-infra-ws .

Finally we’ll start the container with the command

docker run -d -t --name python-infra-ws -p 5000:5000 python-infra-ws

Some other useful commands are:

docker stop python-infra-ws
docker start python-infra-ws
docker ps --filter name=python-infra-ws
docker rm python-infra-ws
docker rmi python-infra-ws
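
Once the container is running you can quickly check from Python that the service answers on port 5000 (this assumes the minimal app.py sketched above, which replies with a JSON document on /):

# quick check of the running container (assumes the "/" route of the sketch above)
import urllib.request

with urllib.request.urlopen("http://localhost:5000/") as resp:
    print(resp.status, resp.read().decode("utf-8"))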

Analyzing huge amounts of sensor data in near real time with Apache Spark Streaming

For this demo I downloaded and installed Apache Spark 1.5.1

Suppose you have a stream of data from several (industrial) machines like

MACHINE,TIMESTAMP,SIGNAL1,SIGNAL2,SIGNAL3,...
1,2015-01-01 11:00:01,1.0,1.1,1.2,1.3,..
2,2015-01-01 11:00:01,2.2,2.1,2.6,2.8,.
3,2015-01-01 11:00:01,1.1,1.2,1.3,1.3,.
1,2015-01-01 11:00:02,1.0,1.1,1.2,1.4,.
1,2015-01-01 11:00:02,1.3,1.2,3.2,3.3,..
...

Below is a system, written in Python, that reads data from a stream (use the command “nc -lk 9999” to send data to the stream) and every 10 seconds collects alerts from the signals: at least 4 suspicious values of a specific signal from the same machine.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

min_occurs = 4

def signals_from_1_row_to_many(row):
  "output is (machine, date, signal_number, signal_value)"
  result = []
  for f in range(2,21):
    result = result + [(row[0], row[1], f-1, row[f])]
  return result

def isAlert(signal, value):
  defaults = [83.0, 57.0, 37.0, 57.0, 45.0, 19.0, -223.0, 20.50, 20.42, 20.48, 20.24, 20.22, 20.43, 20, 20.44, 20.39, 20.36, 20.25, 1675.0]
  soglia = 0.95
  if value == '':
     return True
  value = float(value)
  ref = defaults[signal -1]
  if value < ref - soglia*ref or value > ref + soglia*ref:
    return True
  else:
    return False
  
def isException(machine, signal):
  # sample data. the sensor 19 of machine 11 is broken
  exceptions = [(11,19)]
  return (int(machine), signal) in exceptions 

# Create a local StreamingContext with two working threads and a batch interval of 10 seconds
sc = SparkContext("local[2]", "SignalsAlerts")
ssc = StreamingContext(sc, 10)

# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

all_alerts = lines.map(lambda l: l.split(",")) \
                 .flatMap(signals_from_1_row_to_many) \
                 .filter(lambda s: isAlert(s[2], s[3])) \
                 .filter(lambda s: not isException(s[0], s[2])) \
                 .map(lambda s: (s[0]+'-'+str(s[2]), [(s[1], s[3])])) \
                 .reduceByKey(lambda x, y: x + y) 

alerts = all_alerts.filter(lambda s: len(s[1]) >= min_occurs)

alerts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
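
To test the job you can pipe the output of a small generator into nc, e.g. "python gen_signals.py | nc -lk 9999". The script below is only a sketch that produces random rows in the format shown above:

# gen_signals.py - emit random sensor rows in the format
# MACHINE,TIMESTAMP,SIGNAL1,...,SIGNAL19 (one row per line, forever)
import random
import sys
import time
from datetime import datetime

while True:
    for machine in range(1, 4):
        now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        signals = ["%.2f" % random.uniform(0.0, 100.0) for _ in range(19)]
        print(",".join([str(machine), now] + signals))
    sys.stdout.flush()  # make sure nc forwards the rows immediately
    time.sleep(1)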

TwitterPopularTags.scala example of Apache Spark Streaming in a standalone project

This is an easy tutorial on using Apache Spark Streaming with Scala, taking the official TwitterPopularTags.scala example and putting it into a standalone sbt project.

 

In a few minutes you will be able to receive streams of tweets and manipulate them in real time with Apache Spark Streaming:

  • Install Apache Spark (I used 1.5.1)
  • Install sbt
  • git clone https://github.com/matteoredaelli/TwitterPopularTags
  • cd TwitterPopularTags
  • cp twitter4j.properties.sample twitter4j.properties
  • edit twitter4j.properties
  • sbt package
  • spark-submit --master local --packages "org.apache.spark:spark-streaming-twitter_2.10:1.5.1" ./target/scala-2.10/twitterpopulartags_2.10-1.0.jar italy

How to collect Twitter data in 15 minutes

For this tutorial I assume you are using a Debian/Ubuntu Linux system, but it could be easily adapted to other operating systems.

Install the software

apt-get install openjdk-7-jdk  
wget http://apache.panu.it/karaf/4.0.2/apache-karaf-4.0.2.tar.gz
tar xvfz apache-karaf-4.0.2.tar.gz

Start the server

cd apache-karaf-4.0.2/
./bin/start

Install additional connectors

ssh -p 8101 karaf@localhost
feature:repo-add camel 2.16.0
feature:install camel camel-blueprint camel-twitter camel-jackson camel-dropbox
exit

Configure our routes

Create two new files:

twitter-to-file.xml

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:camel="http://camel.apache.org/schema/blueprint"
       xsi:schemaLocation="
       http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd
       http://camel.apache.org/schema/blueprint http://camel.apache.org/schema/blueprint/camel-blueprint.xsd">

  <camelContext id="twitter-to-file" streamCache="true" xmlns="http://camel.apache.org/schema/blueprint">

    <dataFormats>
      <json id="jack" library="Jackson" />
      <jaxb id="myJaxb" prettyPrint="true" contextPath="org.apache.camel.example"/>
    </dataFormats>

    <route id="twitter-tweets-to-file">
      <from uri="vm:twitter-tweets-to-file" />
      <setHeader headerName="CamelFileName">
         <simple>${in.header.twitter-id}</simple>
      </setHeader>
      <split>
        <simple>${body}</simple>
        <to uri="vm:twitter-tweet-to-file" />
      </split>
    </route>

    <route id="twitter-tweet-to-file">
      <from uri="vm:twitter-tweet-to-file" />
      <log message="Saving tweet id= ${body.id}" />
      <!-- transforming the body (a single tweet) to a json doc -->
      <marshal ref="jack" />
      <convertBodyTo type="java.lang.String" charset="UTF8" />
      <transform>
        <simple>${body}\n</simple>
      </transform>
      <setHeader headerName="CamelFileName">
        <simple>${in.header.CamelFileName}/${date:now:yyyy}/${date:now:MM}/${date:now:dd}</simple>
      </setHeader>
      <to uri="file:twitter-data?autoCreate=true&amp;fileExist=Append" />
    </route>
  </camelContext>
</blueprint>

twitter-streaming-sample.xml

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
  <camelContext id="twitter-search-sample" xmlns="http://camel.apache.org/schema/blueprint">
    <route id="twitter-search-sample">
      <from uri="twitter://streaming/sample?count=100&amp;type=polling&amp;consumerKey=XXX&amp;consumerSecret=XXX&amp;accessToken=XXX&amp;accessTokenSecret=XXX" />
      <setHeader headerName="twitter-id">
        <simple>sample</simple>
      </setHeader>
      <to uri="vm:twitter-tweets-to-file" />
    </route>

  </camelContext>
</blueprint>

and copy them into the “deploy” directory. Check the logs in data/log/karaf.log and see the results in the folder twitter-data/sample/yyyy/mm/dd.
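
Since every line of the generated files is one tweet serialized as JSON, post-processing them with Python is easy. In the sketch below the file path is just an example day, and the text field is an assumption about what camel-twitter/Jackson serializes (id is the field used in the route above):

# print id and text of the tweets saved by the Camel route above
import json

with open("twitter-data/sample/2015/11/20", encoding="utf-8") as f:  # example path
    for line in f:
        tweet = json.loads(line)
        print(tweet.get("id"), tweet.get("text"))  # "text" is assumed to exist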

 

Good luck!

Matteo

About Cayley, a scalable graph database

 

This is a fast tutorial on using the Cayley graph database (with MongoDB as backend). Cayley is “not a Google project, but created and maintained by a Googler, with permission from and assignment to Google, under the Apache License, version 2.0”.

Create a configuration file cayley.cfg like this:

{
  "database": "mongo",
  "db_path": "cayley.redaelli.org:27017",
  "read_only": false,
  "host": "0.0.0.0"
}
  • ./cayley init -config=cayley.cfg
  • ./cayley http -config=cayley.cfg -host="0.0.0.0" &
  • create a file demo.n3
"/user/matteo" "is_manager_of" "/user/ele" .
"/user/matteo" "has" "/workstation/wk0002" .
"/user/matteo" "lives_in" "/country/italy" .
  • upload data with: curl http://cayley.redaelli.org:64210/api/v1/write/file/nquad -F NQuadFile=@demo.n3
  • or: ./cayley load -config=cayley.cfg -quads=demo.n3
  • query data with: curl --data 'g.V("/user/matteo").Out(null,"predicate").All()' http://cayley.redaelli.org:64210/api/v1/query/gremlin
{
 "result": [
  {
   "id": "/workstation/wk0002",
   "predicate": "has"
  },
  {
   "id": "/country/italy",
   "predicate": "lives_in"
  },
  {
   "id": "/user/ele",
   "predicate": "is_manager_of"
  }
 ]
}
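
The same Gremlin query can also be sent from Python instead of curl, for example:

# send the same Gremlin query to the Cayley HTTP API from Python
import json
import urllib.request

query = 'g.V("/user/matteo").Out(null,"predicate").All()'
req = urllib.request.Request(
    "http://cayley.redaelli.org:64210/api/v1/query/gremlin",
    data=query.encode("utf-8"),
    method="POST")

with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read().decode("utf-8")), indent=1))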