
Deploy Tomcat applications in Docker containers

By deploying your applications in containers, you make sure that they are easily portable and scalable.

Here is a sample of deploying a .war application using a Docker container.

Create a Dockerfile like this:

FROM tomcat:8-jre8

MAINTAINER "Matteo <matteo.redaelli@gmail.com>"

ADD server.xml /usr/local/tomcat/conf/
ADD tomcat-users.xml /usr/local/tomcat/conf/
ADD ojdbc6.jar /usr/local/tomcat/lib/
ADD bips.war /usr/local/tomcat/webapps/

Build a docker image

docker build . -t myapp

Run one or more Docker containers of your application with

docker run --restart=unless-stopped --name myapp1 -p 8080:8080 -d myapp
docker run --restart=unless-stopped --name myapp2 -p 8081:8080 -d myapp

It is better to redirect Tomcat logs to stdout; this way you can see them with

docker logs myapp

Docker containers can be managed across several servers using tools like Kubernetes (an open-source system for automating deployment, scaling, and management of containerized applications), but that deserves another post 😉

Continuous integration and continuous delivery with Jenkins


In this post I’ll show how to use the open-source tool #Jenkins, “the leading open source automation server, Jenkins provides hundreds of plugins to support building, deploying and automating any project”. I’ll create a simple pipeline that executes remote tasks via SSH. It could be used for continuous integration and continuous delivery of Oracle OBIEE systems.

Install (in a docker container)

docker run -p 8080:8080 -p 50000:50000 -v /home/oracle/docker_shares/jenkins:/var/jenkins_home -d jenkins

Configure credentials

Log in to Jenkins (http://jenkins.redaelli.org:8080)

Jenkins -> Manage Jenkins -> Credentials -> System -> Add credentials

Configure remote nodes

Jenkins -> Manage Jenkins -> Manage nodes ->  Add node

Configure Pipeline

Jenkins -> New Item -> Pipeline

See https://gist.github.com/matteoredaelli/8d306d79e547f3fdfd5d1c467373f8e0

Log analysis with ELK for Business Intelligence systems

In this post I’ll show how to collect logs from several applications (Oracle OBIEE, Oracle Essbase, QlikView, Apache logs, Linux system logs) with the ELK (Elasticsearch, Logstash and Kibana) stack. ELK is a powerful open-source alternative to Splunk, and it can easily manage multiline logs.

Installing the ELK stack in Docker containers is really fast, easy and flexible.


Managing Spark dataframes in Python

Below is a quick sample of using Apache Spark (2.0) dataframes for manipulating data. The sample data is a file of JSON lines like

{"description": "255/40 ZR17 94W", "ean": "EAN: 4981910401193", "season": "tires_season summer", "price": "203,98", "model": "Michelin Pilot Sport PS2 255/40 R17", "id": "MPN: 2351610"}
{"description": "225/55 R17 101V XL", "ean": "EAN: 5452000438744", "season": "tires_season summer", "price": "120,98", "model": "Pirelli P Zero 205/45 R17", "id": "MPN: 530155"}
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.functions import col
from pyspark.sql.functions import lit
from pyspark.sql.functions import *
import re, sys


# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'

spark = SparkSession \
    .builder \
    .appName("Python Spark  ") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

records_orig = spark.read.json("scraped_tyres_data.json")

## removing bad records 
records = records_orig \
  .filter(records_orig.id != '') \
  .filter(regexp_extract('description', '(rinnovati)', 1) == '')

## saving bad records  
records_orig.subtract(records).coalesce(1).write.csv("bad-records.csv", sep=";")

# extract new features
regexp_size = "(\d+)/(\d+) R(\d+) (\d+)(\w+)\s*"

records = records \
  .withColumn("width",       regexp_extract("description", regexp_size, 1)) \
  .withColumn("ratio",       regexp_extract("description", regexp_size, 2)) \
  .withColumn("diameter",    regexp_extract("description", regexp_size, 3)) \
  .withColumn("load_index",  regexp_extract("description", regexp_size, 4)) \
  .withColumn("speed_index", regexp_extract("description", regexp_size, 5)) \
  .withColumn("brand",       regexp_extract("model", "^(\w+) ", 1)) \
  .withColumn("season",      trim(regexp_replace("season", "tires_season",""))) \
  .withColumn("id",          trim(regexp_replace("id", "MPN: ",""))) \
  .withColumn("ean",         trim(regexp_replace("ean", "EAN: ",""))) \
  .withColumn("runflat",     regexp_extract("description", "(runflat)", 1)) \
  .withColumn("mfs",         regexp_extract("description", "(MFS|FSL|bordo di protezione|bordino di protezione)", 1)) \
  .withColumn("xl",          regexp_extract("description", " (XL|RF)\s*", 1)) \
  .withColumn("chiodabile",  regexp_extract("description", "(chiodabile)\s*", 1))

## extracting and saving all season values
records.select("season").distinct().coalesce(1).write.csv("season_values", sep=";")

# misc
# records.columns   # show columns
# records.groupBy("brand").count().show()
# records.groupBy("brand").count().filter("count > 100").show(20,False)
#
# renaming all columns before joining dataframes with same column names
# records_renamed = records.select(*(col(x).alias(x + '_renamed') for x in records.columns))
# join two dataframe
# records.join(records_renamed, records.ean == records_renamed.ean_renamed)
#
#
# saving data to several formats
records.coalesce(1).write.csv("result.csv", sep=";")
records.write.json("result.json")
records.write.parquet("result.parquet")
records.write.format("com.databricks.spark.avro").save("result.avro")

 

How to batch install OBIEE 12c (silent mode)

If you want to automatically install and deploy OBIEE systems in a datacenter or cloud, you can simply run a few commands like:

export TEMP=/home/oracle/tmp
export TEMPDIR=/home/oracle/tmp
export JAVA_HOME=/home/oracle/apps/jdk1.8.0

java -Djava.io.tmpdir=/home/oracle/tmp -jar fmw_12.2.1.2.0_infrastructure.jar \
     -silent -responseFile /home/oracle/KIT/response_01_fmw_infrastructure.rsp \
     -invPtrLoc /home/oracle/oraInst.loc

./bi_platform-12.2.1.2.0_linux64.bin -silent \
      -responseFile /home/oracle/KIT/response_02_bi_platform.rsp \
      -invPtrLoc /home/oracle/oraInst.loc \
      -ignoreSysPrereqs

export ORACLE_HOME=/home/oracle/Oracle/Middleware/Oracle_Home
export BI_PRODUCT_HOME=$ORACLE_HOME/bi
$BI_PRODUCT_HOME/bin/config.sh -silent \
    -responseFile /home/oracle/KIT/response_03_bi_platform_config.rsp \
    -invPtrLoc /home/oracle/oraInst.loc \
    -ignoreSysPrereqs

Any faster alternative to #Hadoop HDFS?

I’d like to find an alternative to Hadoop HDFS: a faster filesystem that is not written in Java (see [1] for a comparison of distributed file systems).

Which is better? Any suggestions?

References:

  • [1] https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems

Apache Spark: how to import data from a JDBC database using Python

Using Apache Spark 2.0 and Python, I’ll show how to import a table from a relational database (using its JDBC driver) into a dataframe and save it to a Parquet file. In this demo the database is Oracle 12.x.

file jdbc-to-parquet.py:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()


df = spark.read.format("jdbc").options(url="jdbc:oracle:thin:ro/ro@mydboracle.redaelli.org:1521:MYSID", 
      dbtable="myuser.dim_country", 
      driver="oracle.jdbc.OracleDriver").load()

df.write.parquet("country.parquet")

And then run it with

spark-2.0.1-bin-hadoop2.7/bin/spark-submit --jars instantclient_12_1/ojdbc7.jar jdbc-to-parquet.py
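
To verify the export, a quick sketch (assuming a SparkSession built as above) that reads the Parquet file back:

# read the exported table back and inspect schema and a few rows
df = spark.read.parquet("country.parquet")
df.printSchema()
df.show(5)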