Big data projects can be very expensive and can easily fail: I suggest starting with a small project that is useful but not critical, ideally one about unstructured data collection and batch processing. That way you have time to get practice with the new technologies, and occasional downtime of the Apache Hadoop system is not critical.
At home I have the following system running on a small Raspberry Pi: it is certainly not fast.
At work I introduced Hadoop just a few months ago for collecting web data and generating daily reports.
I’ve been playing with Erlang, ChicagoBoss and Amazon Web Services: see my simple anagram-generation example on the rapid.tips website!
I have updated my R presentation with the latest news (about Oracle, SAP, Tibco, Teradata, IBM, etc.). Have a look at the presentation "An opensource environment and language for statistics".
Strategico 3.0 has been released!
Strategico is an open-source tool for making forecasts and long-term predictions over a (huge) set of time series.
Strategico is written in R, the most famous and widely used statistical programming language.
In the article "building system integrations with Apache Camel" I’ll show how to create, in 10 minutes, an integration between two databases (without writing a single line of Java or C# code):
- looking for users in the MOODLE database (MySQL) with missing attributes,
- for each of those users, retrieving the missing attributes from the UPMS database (Microsoft SQL Server), and then
- adding the missing attributes to the MOODLE database.
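To make the three steps of the integration concrete, here is a minimal Python sketch of the same logic. It is not the Apache Camel route from the article: it uses plain SQL through Python's `sqlite3` module, with two in-memory SQLite databases standing in for MOODLE (MySQL) and UPMS (SQL Server). The table names `users` and `people`, the single `email` attribute, and the function `sync_missing_emails` are hypothetical placeholders, not the real MOODLE or UPMS schemas.

```python
import sqlite3


def sync_missing_emails(moodle: sqlite3.Connection, upms: sqlite3.Connection) -> int:
    """Fill in missing e-mail addresses in MOODLE using values from UPMS.

    Returns the number of MOODLE rows updated. Table and column names are
    hypothetical placeholders, not the real MOODLE or UPMS schemas.
    """
    # Step 1: find MOODLE users whose attribute is missing (NULL or empty).
    missing = moodle.execute(
        "SELECT username FROM users WHERE email IS NULL OR email = ''"
    ).fetchall()

    updated = 0
    for (username,) in missing:
        # Step 2: look up the missing attribute for this user in UPMS.
        row = upms.execute(
            "SELECT email FROM people WHERE username = ?", (username,)
        ).fetchone()
        if row is None or not row[0]:
            continue  # UPMS has no value either; leave the MOODLE row unchanged
        # Step 3: write the attribute back to MOODLE.
        moodle.execute(
            "UPDATE users SET email = ? WHERE username = ?", (row[0], username)
        )
        updated += 1
    moodle.commit()
    return updated


if __name__ == "__main__":
    # In-memory SQLite databases stand in for MOODLE (MySQL) and UPMS (SQL Server).
    moodle = sqlite3.connect(":memory:")
    upms = sqlite3.connect(":memory:")
    moodle.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")
    upms.execute("CREATE TABLE people (username TEXT PRIMARY KEY, email TEXT)")
    moodle.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", None), ("bob", "bob@example.org"), ("carol", "")],
    )
    upms.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("alice", "alice@example.org"), ("carol", "carol@example.org")],
    )
    print(sync_missing_emails(moodle, upms))  # 2
    print(moodle.execute("SELECT * FROM users ORDER BY username").fetchall())
```

Camel lets you declare this same flow (query, lookup, update) as a route instead of writing the loop by hand, which is why the article needs no Java or C# code; the sketch above only spells out what such a route has to do.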
Any suggestions and comments are welcome!
MongoDB Selected as the Core Content Management Component of SAP’s Platform-as-a-Service (PaaS) Offering
“MongoDB’s Flexibility and Scalability Will Enable SAP to Scale Its Content Management Service on Its PaaS to Meet Customer Demand While Managing Data From Different Applications” Read the full MarketWatch article.
Oracle’s comprehensive big data strategy includes NoSQL, Hadoop, and R analytics
“Oracle’s planned distribution of the open-source R statistical environment will be adapted for use on large-scale data within the Oracle database, rather than on desktops and laptops where analysts typically use the software. Oracle R Enterprise will run existing R applications and it will use the R client directly against data stored in Oracle Database 11g. This will vastly increase scalability, performance, and security, according to Oracle, along with the promise of software support. Oracle will ship the open-source distribution along with Linux. Separate R packages with database-specific extensions for Oracle 11g will be bundled with that database”. Taken from an InformationWeek article.