Erlang Bot (Ebot) is an open source web crawler built on top of Erlang, a NoSQL database (Apache CouchDB or Riak), RabbitMQ, Webmachine (Mochiweb) and RRDtool, among others. Because it uses a NoSQL database instead of a relational one, Ebot can grow easily and cheaply. Ebot is a solid, highly scalable, distributed and customizable web crawler.

The Ebot crawler project is hosted at http://github.com/matteoredaelli/ebot
Thanks to the Ebot crawler, I've been improving my knowledge of Erlang, the AMQP protocol (RabbitMQ) and NoSQL databases (Apache CouchDB and Riak) with their distributed map/reduce queries.
Below is an example of a URL document generated by the Ebot crawler (with the Apache CouchDB backend):

Below is a sample image of the statistics generated by the Ebot web crawler using RRDtool:

EBot Users
you are welcome
LICENSE
GNU GENERAL PUBLIC LICENSE v3





Does ebot need any special library or dependency?
Do you know of any site that uses ebot online now?
Are there any benchmarks made with ebot?
Bye, and thank you.
Greetings from Chile.
For dependencies, read the wiki page at http://wiki.github.com/matteoredaelli/ebot/installation
No benchmarks: because of its distributed and scalable design, ebot could be slower than its competitors when running on a single machine. But it is scalable and reliable.
Hi, mmmm… is there some example of using ebot?
Greetings!
I was just about to start developing a crawler using Erlang when I found your work,
and your project seems almost finished…
I will try to dive into your code. Do you need any volunteers ^_^ ?
Hello, you are welcome! Let me know if this project fits your needs, and if you want to contribute to it, I’ll give you write permissions to the github repository…
Regards
Matteo
Very interesting, checking it out now. I see on the database page of your wiki that you mention CouchDB is not distributed and can only be on one file system. Have you looked at BigCouch? It seems like a good fit for this, even running multiple BigCouch instances on the same box so each can use a different filesystem.
If you don’t mind me asking, what are you currently using this for and what plans do you have for this?
Hi Matteo,
We make heavy use of a home-grown Java-based crawler. It has always been my dream to use an Erlang-based crawler. I expect you already know the Nutch crawler from Apache. But as you know, Java needs more coding and more care. So can we have similar features in an Erlang-based crawler (such as limits per domain, limits per job, limited time schedules, using an outside analyzer like Lucene, etc.)? We need to set aside time for Erlang. I would like to know your personal opinion about Erlang and crawling. We have great tools in Java for indexing and crawling. Additionally, think about the millions of URLs and checking their status to decide whether they will be fetched or not. Currently we do this on a map-reduce cluster. But when we need scalability, reliability and clean code, I think Erlang could be the unique choice. If you have time to discuss, please contact me by email.
Is this web site/blog dead? Nobody replied to my post
((
No! I sent you an email…
Hey, really glad to be here!!
I just wanted to learn Erlang through some practice and found this site via Google. I see pictures which describe the main process. I just wonder if you can show us some more details. And I may keep up with you.
Thank you very much!!!
Hi Matteo – thanks for sharing this! Would you be able to email over some details for getting started running this on a Mac? (with Riak)
Thank you very much!
Been playing with it for a bit. Nice. Got a question: is there any dependency on the erl version? I upgraded to R14 recently and keep getting test:oss(). -> ** exception exit: {noproc,{gen_server,call,[ebot_crawler,{start_workers}]}}… slightly lost
Figured it out. If one of the sub-processes doesn’t start correctly, you end up with the main process terminated (propagated failure?). Adding some verbosity.
If you want to contribute and improve ebot, you are welcome. I could add you as a committer on the ebot github repository…