Monthly Archives: October 2015

TwitterPopularTags.scala example of Apache Spark Streaming in a standalone project

This is an easy tutorial on using Apache Spark Streaming with Scala: it takes the official TwitterPopularTags.scala example and puts it in a standalone sbt project.


In a few minutes you will be able to receive streams of tweets and manipulate them in real time with Apache Spark Streaming.

  • Install Apache Spark (I used 1.5.1)
  • Install sbt
  • git clone
  • cd TwitterPopularTags
  • cp
  • edit
  • sbt package
  • spark-submit --master local --packages "org.apache.spark:spark-streaming-twitter_2.10:1.5.1" ./target/scala-2.10/twitterpopulartags_2.10-1.0.jar italy
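
The official example counts the hashtags seen in the stream over a sliding window and prints the most popular ones. As a rough illustration of that counting step only, with no Spark, no streaming and no windowing, the same idea can be sketched as a shell pipeline. The function name top_hashtags and the sample lines are made up for this sketch, and hashtags are compared case-sensitively.

```shell
# Count hashtags read from stdin and print the most frequent ones first.
# top_hashtags is a hypothetical helper for illustration; it is not part of Spark.
top_hashtags() {
  limit="${1:-10}"                      # how many hashtags to print (default 10)
  grep -o '#[[:alnum:]_]\{1,\}' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | head -n "$limit" \
    | awk '{ print $2, $1 }'            # print "hashtag count"
}

# Made-up sample input standing in for tweet texts.
printf '%s\n' \
  'Ciao #italy #roma' \
  'Pizza in #italy' \
  'Weekend a #roma con vista su #italy' \
  | top_hashtags 2
```

With the sample input above, #italy (3 occurrences) is printed before #roma (2 occurrences). In the Spark version the counts are recomputed continuously as new tweets arrive.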

How to collect Twitter data in 15 minutes

For this tutorial I assume you are using a Debian/Ubuntu Linux system, but it can easily be adapted to other operating systems.

Install the software

apt-get install openjdk-7-jdk  
tar xvfz apache-karaf-4.0.2.tar.gz

Start the server

cd apache-karaf-4.0.2/

Install additional connectors

ssh -p 8101 karaf@localhost
feature:repo-add camel 2.16.0
feature:install camel camel-blueprint camel-twitter camel-jackson camel-dropbox

Configure our routes

Create two new files:


<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="">

  <camelContext id="twitter-to-file" streamCache="true" xmlns="">

    <dataFormats>
      <json id="jack" library="Jackson" />
      <jaxb id="myJaxb" prettyPrint="true" contextPath="org.apache.camel.example"/>
    </dataFormats>

    <route id="twitter-tweets-to-file">
      <from uri="vm:twitter-tweets-to-file" />
      <setHeader headerName="CamelFileName">
        <!-- file name expression missing from this listing -->
      </setHeader>
      <to uri="vm:twitter-tweet-to-file" />
    </route>

    <route id="twitter-tweet-to-file">
      <from uri="vm:twitter-tweet-to-file" />
      <log message="Saving tweet id= ${}" />
      <!-- transforming the body (a single tweet) to a json doc -->
      <marshal ref="jack" />
      <convertBodyTo type="java.lang.String" charset="UTF8" />
      <setHeader headerName="CamelFileName">
        <!-- file name expression missing from this listing -->
      </setHeader>
      <to uri="file:twitter-data?autoCreate=true&amp;fileExist=Append" />
    </route>

  </camelContext>

</blueprint>


<blueprint xmlns="">
  <camelContext id="twitter-search-sample" xmlns="">
    <route id="twitter-search-sample">
      <from uri="twitter://streaming/sample?count=100&amp;type=polling&amp;consumerKey=XXX&amp;consumerSecret=XXX&amp;accessToken=XXX&amp;accessTokenSecret=XXX" />
      <setHeader headerName="twitter-id">
        <!-- header expression missing from this listing -->
      </setHeader>
      <to uri="vm:twitter-tweets-to-file" />
    </route>
  </camelContext>
</blueprint>


Then copy both files into the “deploy” directory. Check the logs in data/log/karaf.log and look for the results in the folder twitter-data/sample/yyyy/mm/dd.
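
To look at one day's results without typing the date by hand, a small helper can build the folder path. This is only a sketch: it assumes GNU date (standard on Debian/Ubuntu), zero-padded month and day in the folder names, and that it is run from the Karaf directory. The name tweet_dir is hypothetical.

```shell
# Build the dated output folder for a given day (defaults to today).
# Relies on GNU date's -d option; tweet_dir is a hypothetical helper name.
tweet_dir() {
  date -d "${1:-today}" +'twitter-data/sample/%Y/%m/%d'
}

tweet_dir 2015-10-20      # prints twitter-data/sample/2015/10/20

# List today's files, or say so when nothing has been saved yet.
ls -l "$(tweet_dir)" 2>/dev/null || echo "no tweets saved yet in $(tweet_dir)"

# Show the last log lines only if the log file exists.
if [ -f data/log/karaf.log ]; then
  tail -n 20 data/log/karaf.log
fi
```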


Good luck!