
N queens in Prolog

:- use_module(library(clpfd)).

n_queens(N, Queens) :-
        length(Queens, N),
        Queens ins 1..N,
        all_different(Queens), %% the queens must be in different columns
        different_diagonals(Queens).

different_diagonals([]).
different_diagonals([Q|Queens]) :- different_diagonals(Queens, Q, 1), different_diagonals(Queens).

different_diagonals([], _, _).
different_diagonals([Q|Queens], Q0, Distance) :-
        abs(Q0 - Q) #\= Distance,
        NewDistance #= Distance + 1,
        different_diagonals(Queens, Q0, NewDistance).

/* 
   Queens is the list of column positions of the queens: the row of each queen is given by its position (index) in the list Queens.

   Examples:

   ?- n_queens(8, Queens), labeling([ff], Queens).
   %@ Queens = [1, 5, 8, 6, 3, 7, 2, 4] ;
   %@ Queens = [1, 6, 8, 3, 7, 4, 2, 5] .

   For the first solution, the occupied [Row, Column] cells are [1,1], [2,5], [3,8], [4,6], [5,3], [6,7], [7,2], [8,4]

   Suggestion from https://www.swi-prolog.org/pldoc/man?section=clpfd-n-queens
 */

Prolog and Constraint Logic Programming over Finite Domains

I like Prolog, and lately I have been studying the CLP(FD) library.

For instance, it is easy to write a small program that solves “How many men and horses have 8 heads and 20 feet?”. You write the rules and constraints, and Prolog finds the solution for you:

:- use_module(library(clpfd)).

men_and_horses(Men, Horses):-
    Men in 0..10,
    Horses in 0..10,
    Men + Horses #= 8,          %% heads must be 8
    Men * 2 + Horses * 4 #= 20. %% feet must be 20

?- men_and_horses(Men, Horses).
 Men = 6,
 Horses = 2.
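
The solution is consistent: 6 men and 2 horses give 6 + 2 = 8 heads and 6*2 + 2*4 = 20 feet. In this example constraint propagation alone fixes the values; when several solutions remain, you add a call to label/1 or labeling/2 (as in the N queens example above) to enumerate them.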

“clp(fd) is useful for solving a wide variety of find values for these variables problems. Here are some broad categories of problems that clp(fd) can address:

  • Scheduling problems, like, when should we do what work in this factory to make these products?
  • Optimization problems, like which mix of products should we make in this factory to maximize profits?
  • Satisficing problems, like finding an arrangement of rooms in a hospital that meets criteria like having the operating theater near the recovery rooms, or finding a set of vacation plans the whole family can agree on.
  • Sequence problems, like finding a travel itinerary that gets us to our destination.
  • Labeling problems, like Sudoku or Cryptarithm puzzles
  • …. ” [See what_is_clp_fd_good_for]

Running Talend Remote Engine in a docker container

I was not able to find a Dockerfile for running the Talend Remote Engine in a container, so I tried to build a new one. It is a work in progress: do you have any suggestions?

TODO / Next steps:

  • registering the engine using Talend API
  • running the engine with a unix user “talend”

Here is the Dockerfile:

FROM centos:7
# centos is the recommended linux distribution
MAINTAINER  Matteo Redaelli <matteo.redaelli@gmail.com>

# Build and run this image with:
# - docker build -t talend/remote_engine:2.7.0 .
# - docker run -d --name talend_remote_engine talend/remote_engine:2.7.0 run

# Set environment variables.
ENV TALEND_HOME /opt/talend
ENV TALEND_ENGINE_HOME $TALEND_HOME/remote_engine
ENV HOME $TALEND_HOME
ENV JAVA_VERSION=java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/$JAVA_VERSION

# Installing java
RUN yum update -y && \
   yum install -y $JAVA_VERSION ${JAVA_VERSION}-devel && \
   rm -rf /var/cache/yum

# Define working directory.
WORKDIR $TALEND_HOME

## remember to update config files before creating the image
## - Talend-RemoteEngine-*/etc/preauthorized.key.cfg with the engine key, name and description
## - Talend-RemoteEngine-*/etc/system.properties with proxy settings (if needed)
COPY Talend-RemoteEngine-V2.7.0 $TALEND_ENGINE_HOME
 
#RUN mkdir $HOME/.m2
#COPY settings.xml $HOME/.m2/settings.xml

# Define default command.
# See trun source for options: you should use "run"
ENTRYPOINT ["/opt/talend/remote_engine/bin/trun"]

Using Apache Camel from Groovy

Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

Apache Groovy is a Java-syntax-compatible object-oriented programming language for the Java platform. It is both a static and dynamic language with features similar to those of Python, Ruby, and Smalltalk. It can be used as both a programming language and a scripting language for the Java Platform, is compiled to Java virtual machine (JVM) bytecode, and interoperates seamlessly with other Java code and libraries. Groovy uses a curly-bracket syntax similar to Java’s. Groovy supports closures, multiline strings, and expressions embedded in strings. Much of Groovy’s power lies in its AST transformations, triggered through annotations. [Wikipedia]

Create a file camel-test.groovy like the following

 @Grab('org.apache.camel:camel-core:2.21.5')
 @Grab('javax.xml.bind:jaxb-api:2.3.0')
 @Grab('org.slf4j:slf4j-simple:1.7.21')
 @Grab('javax.activation:activation:1.1.1')

 import org.apache.camel.*
 import org.apache.camel.impl.*
 import org.apache.camel.builder.*
 def camelContext = new DefaultCamelContext()
 camelContext.addRoutes(new RouteBuilder() {
     void configure() {
         from("timer://jdkTimer?period=3000")
             .to("log://camelLogger?level=INFO")
     }
 })
 camelContext.start()
 addShutdownHook{ camelContext.stop() }
 synchronized(this){ this.wait() }

Test it with

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64 groovy camel-test.groovy

Using Terraform for managing Amazon Web Services infrastructure

In the last few days I tested Terraform (“Use Infrastructure as Code to provision and manage any cloud, infrastructure, or service”) for managing some resources in an AWS cloud environment.

In this sample I’ll create and schedule a Lambda function.

Create a file "variables.tf" with the content:

variable "aws_region" {default = "eu-west-1"}
variable "aws_profile" {default = ""}
variable "project" {default = "my_project"}

variable "vpc" {default= "XXXXX"}
variable "subnets" {default= "XXXX"}
variable "aws_account" {default= "XXX"}
variable "security_groups" {default= "XXXX"}
#
variable "db_redshift_host" {default= ""}
variable "db_redshift_port" {default= ""}
variable "db_redshift_name" {default= ""}
variable "db_redshift_username" {default= ""}
variable "db_redshift_password" {default= ""}

Create a file "lambda.tf" as follows:

provider "aws" {
  region  = "${var.aws_region}"
  profile = "${var.aws_profile}"
}
# ############################################################################
# CLOUDWATCH
# ############################################################################
resource "aws_cloudwatch_log_group" "log_group" {
  name              = "/aws/lambda/${var.project}"
  retention_in_days = 14
}

# ############################################################################
# CLOUDWATCH rules
# ############################################################################
resource "aws_cloudwatch_event_rule" "rule" {
  name        = "${var.project}-rule"
  description = "scheduler for ${var.project}"
  schedule_expression = "cron(0 10 * * ? *)"
}
resource "aws_cloudwatch_event_target" "trigger_lambda" {
  rule  = "${aws_cloudwatch_event_rule.rule.name}"
  arn   = "${aws_lambda_function.lambda.arn}"
}

# ############################################################################
# iam
# ############################################################################
resource "aws_iam_role" "role" {
  name = "${var.project}_role"
  #assume_role_policy = "${file("assumerolepolicy.json")}"
  assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
  {
    "Action": "sts:AssumeRole",
    "Principal": {
      "Service": "lambda.amazonaws.com"
    },
    "Effect": "Allow",
    "Sid": ""
  }
]
}
EOF
}

resource "aws_iam_policy" "logging" {
  name = "${var.project}_logging"
  path = "/"
  description = "${var.project} IAM policy for logging"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "logging" {
  role = "${aws_iam_role.role.name}"
  policy_arn = "${aws_iam_policy.logging.arn}"
}

resource "aws_iam_role_policy_attachment" "policy_attachment_vpc" {
  #name       = "${var.project}_attachment_vpc"
  role       = "${aws_iam_role.role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonVPCFullAccess"
}

resource "aws_iam_role_policy_attachment" "policy_attachment_rds" {
  role       = "${aws_iam_role.role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonRDSReadOnlyAccess"
}

resource "aws_iam_role_policy_attachment" "policy_attachment_redshift" {
  role       = "${aws_iam_role.role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonRedshiftReadOnlyAccess"
}

# ###############################################
# lambda_action
# ###############################################

resource "aws_lambda_function" "lambda" {
  function_name = "${var.project}_lambda"
  depends_on    = ["aws_iam_role_policy_attachment.logging", "aws_cloudwatch_log_group.log_group"]
  filename      = "lambda.zip"
  role          = "${aws_iam_role.role.arn}"
  handler       = "lambda_function.lambda_handler"
  source_code_hash = "${filebase64sha256("lambda.zip")}"
  runtime = "python3.7"
  timeout          = "30"
  memory_size      = 256
  publish          = true
  vpc_config {
    subnet_ids = "${var.subnets}"
    security_group_ids = "${var.security_groups}"
  }

  environment {
    variables = {
      db_redshift_host     = "${var.db_redshift_host}"
      db_redshift_port     = "${var.db_redshift_port}"
      db_redshift_name     = "${var.db_redshift_name}"
      db_redshift_username = "${var.db_redshift_username}"
      db_redshift_password = "${var.db_redshift_password}"
    }
  }
}
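
The lambda.zip archive referenced above must contain a lambda_function.py file exposing lambda_handler (the handler configured in lambda.tf). The real function is not part of this post; what follows is only a minimal hypothetical sketch showing how the db_redshift_* environment variables injected by Terraform could be read:

# lambda_function.py -- hypothetical minimal handler matching
# handler = "lambda_function.lambda_handler" in lambda.tf
import os

def lambda_handler(event, context):
    # read the connection settings injected by the Terraform "environment" block
    cfg = {
        "host": os.environ.get("db_redshift_host"),
        "port": os.environ.get("db_redshift_port"),
        "dbname": os.environ.get("db_redshift_name"),
        "user": os.environ.get("db_redshift_username"),
    }
    # ... connect to Redshift and do the real work here ...
    return {"response": "ok", "config_keys": sorted(cfg)}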

Now you can run

terraform init
terraform plan -var aws_profile=myprofile
terraform apply -var aws_profile=myprofile
terraform destroy -var aws_profile=myprofile

Scheduling start/stop of Amazon RDS instances using CDK libraries

Instead of creating the necessary AWS resources using the AWS Console, I wanted to use the new AWS CDK libraries: in this way the AWS resources can be created and deleted using Python.

“The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to model and provision your cloud application resources using familiar programming languages.

Provisioning cloud applications can be a challenging process that requires you to perform manual actions, write custom scripts, maintain templates, or learn domain-specific languages. AWS CDK uses the familiarity and expressive power of programming languages for modeling your applications. ” [source aws-cdk]

As suggested by https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html I installed the required software and then I ran

mkdir rds-start-stop-cdk
cd rds-start-stop-cdk
cdk init --language python
python3 -m venv .env
source .env/bin/activate
# now I added the code you can see at
# https://gitlab.com/matteo.redaelli/rds-start-stop-cdk
pip install -r requirements.txt
cdk ls

You can see my sample code at https://gitlab.com/matteo.redaelli/rds-start-stop-cdk
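
The real stack is in the GitLab repository above; as a rough idea of its shape, here is a hypothetical sketch of a CDK (v1) Python stack that schedules a Lambda function with a CloudWatch Events rule. The construct names, the "lambda" code directory and the cron time are placeholders, not the code from the repository:

# app.py -- illustrative sketch, not the actual code of rds-start-stop-cdk
from aws_cdk import core, aws_lambda as _lambda, aws_events as events, aws_events_targets as targets

class RdsStartStopStack(core.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function that starts/stops the RDS instances (code in ./lambda)
        fn = _lambda.Function(self, "RdsStartStopFn",
                              runtime=_lambda.Runtime.PYTHON_3_7,
                              handler="lambda_function.lambda_handler",
                              code=_lambda.Code.from_asset("lambda"))

        # CloudWatch Events rule that triggers the function every evening at 20:00 UTC
        rule = events.Rule(self, "EveningRule",
                           schedule=events.Schedule.cron(minute="0", hour="20"))
        rule.add_target(targets.LambdaFunction(fn))

app = core.App()
RdsStartStopStack(app, "rds-start-stop-cdk")
app.synth()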

Scheduling AWS EMR clusters resize

Below is a sample of how to schedule an Amazon Elastic MapReduce (EMR) cluster resize. It is useful if you have a cluster that is less used during the night or on weekends.

I used a Lambda function triggered by a CloudWatch rule. Here is my Python Lambda function:

import boto3, json

MIN=1
MAX=10

def lambda_handler(event, context):
    region = event["region"]
    ClusterId = event["ClusterId"]
    InstanceGroupId = event["InstanceGroupId"]
    InstanceCount = int(event['InstanceCount'])
    
    if InstanceCount >= MIN and InstanceCount <= MAX:
        client = boto3.client('emr', region_name=region)
        response = client.modify_instance_groups(
            ClusterId=ClusterId,
            InstanceGroups= [{
                "InstanceGroupId": InstanceGroupId,
                "InstanceCount": InstanceCount
            }])
        return response
    else:
        msg = "EMR cluster id %s (%s): InstanceCount=%d is NOT allowed [%d,%d]" % (ClusterId, region, InstanceGroupId, InstanceCount, MIN,MAX)
        return {"response": "ko", "message": msg}

Below is the input of the CloudWatch rule: the event is a constant JSON object like

{"region": "eu-west-1","ClusterId": "j-dsds","InstanceGroupId": "ig-sdsd","InstanceCount": 8}



Exporting database tables to csv files with Apache Camel

Below is the relevant part of the code, using Spring XML:

     <bean id="ds-patriot-dw_ro" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
         <property name="driverClassName" value="oracle.jdbc.OracleDriver" />
          <property name="url" value="jdbc:oracle:thin:@//patriot.redaelli.org:1111/RED"/>
          <property name="Username" value="user"/>
          <property name="Password" value="pwd"/>
  </bean>


<camelContext id="MyCamel" streamCache="true" xmlns="http://camel.apache.org/schema/spring">

    <route id="scheduler">
      <from uri="timer:hello?repeatCount=1"/>
      <setHeader headerName="ndays">
        <constant>0</constant>
      </setHeader>
      <to uri="direct:start"/>
    </route>

    <route>
      <from uri="direct:start"/>
      <setBody>
        <constant>table1,table2,table3</constant>
      </setBody>
      <split streaming="true">
        <tokenize token="," />
        <setHeader headerName="tablename">
          <simple>${body}</simple>
        </setHeader>
        <to uri="direct:jdbc2csv"/>
      </split>
    </route>

    <route>
      <from uri="direct:jdbc2csv"/>
        <to uri="direct:get-jdbc-data" pattern="InOut" />
        <to uri="direct:export-csv" />
    </route>

    <route>
      <from uri="direct:get-jdbc-data"/>
      <log message="quering table ${headers.tablename}..."/>

      <setBody>
        <groovy><![CDATA[
          "SELECT * from " + request.headers.get('tablename')
    ]]>
        </groovy>
      </setBody>
      <log message="quering statement: ${body}..."/>
      <to uri="jdbc:ds-patriot-dw_ro?useHeadersAsParameters=true&outputType=StreamList"/>
    </route>

    <route>
      <from uri="direct:export-csv"/>
            <log message="saving table ${headers.tablename} to ${headers.CamelFileName}..."/>
      <setHeader headerName="CamelFileName">
        <groovy>
          request.headers.get('tablename').replace(".", "_") + "/" + request.headers.get('tablename') + ".csv"
        </groovy>
      </setHeader>
      
      <!-- <marshal><csv></marshal> does not include header. I have to export it manualy.. -->
      
      <multicast stopOnException="true">
        <pipeline>
          <log message="saving table ${headers.tablename} header to ${headers.CamelFileName}..."/>
          <setBody>     
    <groovy>request.headers.get('CamelJdbcColumnNames').join(";") + "\n"</groovy>
          </setBody>
          <to uri="file:output"/>
        </pipeline>

        <pipeline>
          <log message="saving table ${headers.tablename} rows to ${headers.CamelFileName}..."/>
          <marshal>
            <csv delimiter=";" headerDisabled="false" useMaps="true"/>
          </marshal>
          <to uri="file:output?fileExist=Append"/>
        </pipeline>
      </multicast>

      <log message="saved table ${headers.tablename} to ${headers.CamelFileName}..."/>
  </route>

  </camelContext>

Add AD users from a CSV file to a group using PowerShell

$GroupName = "Qliksense_SI_Techedge"
$Users =  "e:\scripts\users.csv"

Import-Module ActiveDirectory

$dc = Get-ADDomainController -DomainName mydomain.redaelli.org -Discover -NextClosestSite
$server = $dc.HostName[0]

Get-Content $Users | ForEach-Object {
    Get-ADUser -Server $server -LDAPFilter "(mail=$_)"
  } |
  Select-Object -ExpandProperty sAMAccountName |
  ForEach-Object { Add-ADGroupMember -Server $server -Identity $GroupName -Members $_ }