When is MongoDB the Right Tool for the Job?
08 April 2014
Recently, I’ve been mentoring a few full-stack developers through a process that bears some similarity to the Meerkat Method. Since I also intend to publish a series of articles about the whole journey, I have to be careful to keep things factual and honest.
This puts me in a quandary, because my recent stint on the job market has shown that just about everybody is using MongoDB, and I’ve never been in a situation where I needed to use it.
I also can’t foresee any situation where there is a solid technical reason for choosing MongoDB over its competitors, and the last thing I want to do is lead people astray or foist my preconceptions onto them.
Why MongoDB?
I’m looking for the cases where MongoDB was the right technical decision to make. I’m finding it really difficult to cut through the FUD to find any really impartial information about it.
This is something I have been asking other developers privately and, in some cases, publicly, such as in this AskHN and again a few days later. I am not trolling here. I genuinely want to know.
As some of my students are hoping to use the skills I share with them on the job market, I am either going to have to buckle down and learn it, or have a concrete reason not to.
I want to know what your data looks like. What are the data access patterns? How much data is there? How much trouble did you have building it, scaling it, and especially maintaining it?
Show me your algorithm and I will remain puzzled,
but show me your data structure and I will be enlightened.
I want to know the reasons you had for choosing Mongo. What databases were you using originally, what other databases did you evaluate, and why did you decide against them?
Would you use MongoDB again? If you had to do it all over again, would you make the same choice, or would you use one of the other options such as PostgreSQL, Redis, CouchDB, ElasticSearch, or any of the dozens of competitors?
I’m not looking for more MongoDB horror stories. There are plenty of those out there already. All that noise makes it really difficult to evaluate MongoDB on a technical level.
My current perception of MongoDB
I’m not saying that I am free of biases myself. Everybody has them, and I’m no different. I just don’t feel comfortable making any kind of blanket statement about a technology unless I’ve done my due diligence. These are the things about it that I hope to prove or disprove.
The only benefit I can see in MongoDB is that it’s popular. The only reason I can see for it being popular is that everybody uses it, and so on.
I worry about vendor lock-in. I spent years working with MySQL when I thought PostgreSQL was superior. The difference is that they were so closely related that the majority of database code I wrote was incredibly portable between them.
I have deep reservations about any database that is this prone to catastrophic failure. I’m not really sure that being able to configure it correctly makes up for the fact that it just doesn’t come with sane defaults.
A lot of people are saying that MongoDB is very easy, if you are only familiar with relational databases. I’m not convinced that not being relational is a good enough reason to choose it if you are really going for something familiar.
It really just seems too complex to me. MongoDB tries to do so much that it doesn’t surprise me that there are all of these nightmares with it. Redis I get, CouchDB I get, ElasticSearch I get. MongoDB? Not so much.
Why is one of its major pluses that it has a really popular ORM? I don’t like ORMs at the best of times, but using one for a non-relational database just feels really strange to me. It’s NoSQL, so the first thing you do is declaratively define your schema?
I don’t mind being proven wrong
I actually kind of enjoy it, because I find that I can learn so much from those experiences. I will update this post, or even write a new one if I manage to learn anything from this process.
For instance, here is the one use case I have found so far that actually makes a lot of sense:
Geolocation, specifically GeoJSON … I’m not doing trivial ‘find my 3 places near [y,x]’ queries, but am traversing a pseudo-network of routes to calculate directions. Neo4j also wouldn’t have worked in my case.
I believe it because PostGIS/spatialite were so incredibly frustrating to move GeoJSON around in, and GeoCouch wasn’t up to scratch either. I don’t know if it would still be the right tool now, but I am willing to admit it was the right decision back then.
How to reach me
I don’t have comments on this blog, but I will read and respond to anything posted on:
- The HackerNews discussion.
- The /r/programming thread.
- You could even leave an issue on GitHub for me.
Unless otherwise convinced
I will probably teach them the tools that I know the best. Those would be CouchDB as a data store, ElasticSearch as a search index and sometimes Redis to store sessions and other ephemeral data.
Using CouchDB as a separate “source-of-truth” and doing queries and reads from ElasticSearch makes a lot of sense to me because those things often scale differently. It is also in line with some of the data storage lessons that LinkedIn have shared recently.
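To make that split a little more concrete, here is a minimal sketch of the idea in Python. It is not CouchDB or ElasticSearch code: `SourceOfTruth`, `SearchIndex` and `sync` are hypothetical in-memory stand-ins that I made up for illustration, and the change log is only loosely modelled on the idea of a change feed. The point is just that writes go to one place, and a separate, rebuildable index is fed from it and serves the searches.

```python
class SourceOfTruth:
    """Hypothetical stand-in for the document store (CouchDB in this post)."""

    def __init__(self):
        self._docs = {}
        self._changes = []  # append-only log of changed document ids

    def put(self, doc_id, doc):
        # Every write lands here first and is recorded in the change log.
        self._docs[doc_id] = dict(doc)
        self._changes.append(doc_id)

    def get(self, doc_id):
        return self._docs[doc_id]

    def changes_since(self, seq):
        # Return the new sequence number and the ids changed after `seq`.
        return len(self._changes), self._changes[seq:]


class SearchIndex:
    """Hypothetical stand-in for the search index (ElasticSearch in this post)."""

    def __init__(self):
        self._terms = {}  # lowercased term -> set of document ids

    def index(self, doc_id, doc):
        # Drop stale postings for this document, then re-add its terms.
        for ids in self._terms.values():
            ids.discard(doc_id)
        for value in doc.values():
            for term in str(value).lower().split():
                self._terms.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return sorted(self._terms.get(term.lower(), set()))


def sync(store, index, seq=0):
    """Copy everything that changed after `seq` from the store into the index."""
    new_seq, changed = store.changes_since(seq)
    for doc_id in changed:
        index.index(doc_id, store.get(doc_id))
    return new_seq
```

In this sketch the index can always be thrown away and rebuilt by calling `sync` from sequence 0, which is the property that lets the two halves be scaled and replaced independently.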
Teaching CouchDB+ElasticSearch also means I can focus more on the principles of REST, and teach the concepts of node streams by proxying connections around.
This in turn would give me the opportunity to teach them about realtime systems such as WebSockets or WebRTC instead of mucking around in MongoDB specifics.
That’s probably the best course of action anyway, but I want to be able to take it with a clear conscience.