Big data projects can be very expensive and can easily fail, so I suggest starting with a small project that is useful but not critical. Even better if it is about unstructured data collection and batch processing: in that case you have time to get some practice with the new technologies, and occasional downtime of the Apache Hadoop system is not critical.
At home I have the following system running on a small Raspberry Pi: it is certainly not fast 😉
At work I introduced Hadoop just a few months ago to collect web data and generate daily reports.
In the article Apache Camel: how to collect data from twitter, I’ll show how to save all tweets about Pirelli to files in JSON format.
In the article building system integrations with Apache Camel I’ll show how to create an integration between two databases in 10 minutes (without writing a single line of Java or C# code):
- looking for users in the MOODLE database (MySQL) with missing attributes
- for each of those users, retrieving the missing attributes from the UPMS database (Microsoft SQL Server) and then
- adding the missing attributes to the MOODLE database
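To give an idea of the data flow behind these three steps, here is a minimal Python sketch. It is not the Camel route from the article, which needs no hand-written code; it only illustrates the logic. Everything specific in it is an assumption made for illustration: two in-memory SQLite databases stand in for MySQL and SQL Server, and the table names (`users`, `employees`), the column names (`username`, `email`) and the function `sync_missing_attributes` are hypothetical.

```python
import sqlite3


def sync_missing_attributes(moodle, upms):
    """Copy missing e-mail addresses from UPMS into MOODLE.

    Returns the number of MOODLE users that were updated.
    Table and column names are hypothetical.
    """
    # Step 1: look for MOODLE users with a missing attribute.
    incomplete = moodle.execute(
        "SELECT username FROM users WHERE email IS NULL OR email = ''"
    ).fetchall()

    updated = 0
    for (username,) in incomplete:
        # Step 2: retrieve the missing attribute from UPMS.
        row = upms.execute(
            "SELECT email FROM employees WHERE username = ?", (username,)
        ).fetchone()
        if row is None or not row[0]:
            continue  # UPMS has no value either: leave the user untouched

        # Step 3: add the missing attribute to MOODLE.
        moodle.execute(
            "UPDATE users SET email = ? WHERE username = ?", (row[0], username)
        )
        updated += 1

    moodle.commit()
    return updated


def demo():
    """Build two tiny stand-in databases and run the sync once."""
    moodle = sqlite3.connect(":memory:")  # stand-in for MOODLE (MySQL)
    upms = sqlite3.connect(":memory:")    # stand-in for UPMS (SQL Server)

    moodle.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")
    moodle.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("anna", None), ("bruno", "bruno@example.com"), ("carla", "")],
    )
    upms.execute("CREATE TABLE employees (username TEXT PRIMARY KEY, email TEXT)")
    upms.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("anna", "anna@example.com"), ("bruno", "bruno@example.com")],
    )

    updated = sync_missing_attributes(moodle, upms)
    rows = moodle.execute(
        "SELECT username, email FROM users ORDER BY username"
    ).fetchall()
    return updated, rows


if __name__ == "__main__":
    print(demo())
```

In this sketch a user who is also missing from UPMS is simply skipped, so the job can be run again later without side effects.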
Any suggestions and comments are welcome!