MongoDB vs Cassandra
Written by David

Over the 2 years we've been using MongoDB in production with our server monitoring tool, Server Density, we've built up significant experience and knowledge about how it works. Back in 2009, when I was looking for a replacement for MySQL, I looked at Cassandra but dismissed it because MongoDB had several advantages and Cassandra was still at an extremely early stage (even more so than MongoDB at the time). Having been invited to give a comparison at the Cassandra London Meetup, I thought I'd revisit it to see how it compares today.

Disclaimer: It's important to note that much of what I know about MongoDB has been learnt through using it in production. We don't use Cassandra, so any comparisons are going to be fairly superficial, but they will still be relevant because that's the stage most people will be in when they are considering which database to pick. As a result, I will try to avoid making technical comparisons about specific features because these would be biased by my extensive understanding of MongoDB versus a limited understanding of Cassandra.

As such, this comparison is split into 2 types of difference: usage and operations.

Usage: The actual usage as a developer implementing the application with the database.

Operations: Points which are not directly about the core database but its suitability for production and management on an operational level.

That said, I will start with several technical comparisons because these are important to understand.

Usage - Structure

MongoDB acts much like a relational database. Its data model consists of a database at the top level, then collections, which are like tables in MySQL (for example), and then documents, which are contained within the collection, like rows in MySQL. Each document is made up of fields and values, similar to columns and values in MySQL. Fields can be simple key/value pairs, e.g. { 'name': 'David Mytton' }, but they can also contain other documents, e.g. { 'name': { 'first': 'David', 'last': 'Mytton' } }.
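The nesting described above can be sketched with plain Python dictionaries. This is only a toy in-memory model, not the MongoDB driver: the `database` variable and the `get_field` helper are hypothetical names made up for illustration. The dotted path mirrors the dot notation MongoDB uses to reach into embedded documents.

```python
# Toy in-memory model of the hierarchy: database -> collection -> document.
# Plain dicts and lists stand in for the real server; no driver is involved.

database = {
    "users": [  # a collection, comparable to a MySQL table
        {"name": "David Mytton"},  # simple field/value document
        {"name": {"first": "David", "last": "Mytton"}},  # embedded document
    ]
}


def get_field(document, dotted_path):
    """Walk a dotted path such as 'name.first' through nested documents.

    Hypothetical helper: returns None when any step of the path is missing.
    """
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


flat_doc, nested_doc = database["users"]
print(get_field(flat_doc, "name"))
print(get_field(nested_doc, "name.first"))
```

Both documents live in the same collection even though their `name` fields have different shapes, which is the main practical difference from a MySQL table with fixed columns.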

In Cassandra, documents are known as columns, which are really just a single key and value, e.g. { 'key': 'name', 'value': 'David Mytton' }. There's also a timestamp field which is used internally for replication and consistency. The value can be a single value but can also contain another column. These columns then exist within column families which order data based on a specific value in the columns, referenced by a key. At the top level there is a keyspace, which is similar to the MongoDB database. A good set of data model diagrams for Cassandra can be found here.

Usage - Indexes

MongoDB indexes work very similarly to those in relational databases. You create single or compound indexes at the collection level and every document inserted into that collection has those fields indexed. Querying by index is extremely fast so long as all your indexes fit in memory.

Prior to Cassandra 0.7 it was essentially a key/value store, so if you wanted to query by the contents of a key (i.e. the value) you needed to create a separate column which referenced the other columns, i.e. you created your own indexes. This changed in Cassandra 0.7, which allowed secondary indexes on column values, but only through the column families mechanism. Cassandra requires a lot more metadata for indexes and requires secondary indexes if you want to do range queries. E.g. if we define a new column family with 1 index:
    $ bin/cassandra-cli --host localhost
    Connected to: "Test Cluster" on localhost/9160
    Welcome to cassandra CLI.
    Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    [default@unknown] create keyspace demo;
    [default@unknown] use demo;
    [default@demo] create column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];

then we cannot yet combine a filter on state with a range query on birth_date, because state is not indexed:


    [default@demo] get users where state = 'UT' and birth_date > 1970;
    No indexed columns present in index clause with operator EQ

We must add a secondary index on state:


    update column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS},
    ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

Then Cassandra can use state as the primary index and filter based on birth_date:
    get users where state = 'UT' and birth_date > 1970;
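As a rough illustration of that last query, here is a small Python sketch of the idea: look up candidates through an equality index on state, then filter them on the birth_date range. It is not Cassandra's implementation and is not taken from the cited blog post. The sample rows and the `build_index` and `query` helpers are all made up for this example.

```python
from collections import defaultdict

# Hypothetical rows for the users column family, keyed by row key.
users = {
    "alice": {"full_name": "Alice Example", "state": "UT", "birth_date": 1975},
    "bob": {"full_name": "Bob Example", "state": "UT", "birth_date": 1962},
    "carol": {"full_name": "Carol Example", "state": "WI", "birth_date": 1973},
}


def build_index(rows, column):
    """Map each value of `column` to the row keys holding it (an equality index)."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


def query(rows, index, equals_value, range_column, greater_than):
    """Use the index for the equality clause, then filter on the range clause."""
    candidates = index.get(equals_value, set())
    return sorted(
        key
        for key in candidates
        if rows[key].get(range_column, float("-inf")) > greater_than
    )


state_index = build_index(users, "state")
print(query(users, state_index, "UT", "birth_date", 1970))
```

Without the equality lookup there is no cheap way to narrow the candidates, which is one way to read the "No indexed columns present in index clause with operator EQ" error shown earlier.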

(The cassandra-cli samples above are taken from this blog post.)

Usage - Deployment

MongoDB is written in C++ and provided in binary form for Linux, OS X, Windows and several other platforms. It's extremely easy to install: download, extract and run mongod.

Cassandra is written in Java and has the overhead that brings, but also the easy ability to integrate into existing Java projects. It takes a little longer to get started, but there is a demonstration of setting up a 4 node cluster in less than 2 minutes, which you'd struggle to beat with MongoDB.

I know plenty of people running MongoDB on Windows but would be interested to hear if that's the same with Cassandra (I suspect it's more Linux).

Operations/Usage - Consistency/Replication


In MongoDB, replication is achieved through replica sets. This is an enhanced master/slave model where you have a set of nodes, one of which is the master. Data is replicated to all nodes so that if the master fails, another member will take over. There are configuration options to determine which nodes have priority, and you can set options like sync delay to have nodes lag behind (for disaster recovery, for example).

Writes in MongoDB are unsafe by default: data isn't written to disk right away, so it's possible that a write operation could return success but be lost if the server fails before the data is flushed to disk. This is how Mongo attains high performance. If you need increased durability then you can specify a safe write, which will guarantee the data is written to disk before returning. Further, you can require that the data also be successfully written to n replication slaves.

MongoDB drivers also support the ability to read from slaves. This can be done at the connection, database, collection or even query level, and the drivers handle sending the right queries to the right slaves, but there is no guarantee of consistency (unless you are using the option to write to all slaves before returning). In contrast, Cassandra queries go to every node and the most up-to-date column is returned (based on the timestamp value).
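The timestamp-based read in that last sentence can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Cassandra's actual code: the `Column` class and `reconcile` function are hypothetical names, and in a real cluster the number of nodes consulted depends on how it is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A single column as described earlier: key, value and a write timestamp."""

    key: str
    value: str
    timestamp: int  # larger means written more recently


def reconcile(replica_responses):
    """Return the newest copy of a column among the replicas that answered."""
    if not replica_responses:
        raise ValueError("no replica returned the column")
    return max(replica_responses, key=lambda column: column.timestamp)


# Three replicas hold different versions of the same column.
responses = [
    Column("name", "David", timestamp=100),
    Column("name", "David Mytton", timestamp=250),  # most recent write
    Column("name", "D. Mytton", timestamp=175),
]
newest = reconcile(responses)
print(newest.value)
```

The stale copies are simply ignored by the reader here; the sketch says nothing about how or when the lagging replicas are brought up to date.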

Cassandra has much more advanced support for replication by being aware of the network topology. The server can be set to use a specific consistency level to ensure that queries are replicated locally, or to remote data centres. This means you can let Cassandra handle redundancy across nodes where it is aware of which rack and data centre those nodes are on. Cassandra can also monitor nodes and route queries away from slow-responding nodes.

The only disadvantage with Cassandra is that these settings are done at the node level with configuration files, whereas MongoDB allows very granular ad-hoc control down to the query level through driver options which can be called in code at run time.

Operations - Who's behind it?

Both Cassandra (Apache 2.0 license) and MongoDB (AGPL) are open source. You can freely download the code, write patches and submit them upstream. However, Cassandra is purely an open source project whereas MongoDB is owned by a commercial company, 10gen. The original authors of MongoDB are core contributors to the code and work for 10gen (indeed, 10gen was founded specifically to support MongoDB and the CEO and CTO are the original creators).

In contrast, Cassandra was created by 2 engineers from Facebook and is incubated by the Apache Foundation. This is not a disadvantage (indeed, the Apache web server used by the majority of websites has similar roots and is part of the Apache Foundation) but is important to understand when it comes to support, ongoing development and the community (below).

Operations - Support

Although there are independent consultants for MongoDB, the best place to get support is from 10gen themselves: they wrote the database, so they know it best. They're able to provide support contracts with phone and e-mail SLAs.

In contrast, Cassandra has several companies offering commercial support, and whilst they do have committers to the core Cassandra code, I'd argue it's not the same as having access to the entire engineering team and original authors from a single contact point, as is the case with MongoDB.

Operations - Ongoing development

Interacting directly with the company that controls the main project, especially for support purposes, means you can have bug fixes and changes implemented in the code base. We've had numerous fixes committed as a result of problems discovered in our production usage of MongoDB. We pay 10gen for support now, but even before we did they were very responsive to bugs. We also get votes for features and improvements.

In theory this is the same in Cassandra: you'd want bugs to be fixed and features implemented, but that doesn't have to happen because of the nature of open source projects run by volunteers (it becomes more complex when companies are paying developers to work on the project, e.g. Eric Evans from Rackspace working on Cassandra full time).

Of course there is a risk that the company behind the project disappears and all the engineers move on somewhere else, but the project is still open source, and this is the same with any piece of software you might use. You could also argue there is more direction and focus from a commercial company working solely on the product (and more engineers dedicated to it), but I don't want to go any further with this point as this post isn't about open source vs commercial. This is just one point to be aware of.

Operations - Documentation

The official Cassandra documentation is poor. Researching for this, I had to visit several websites and watch videos even to get explanations of key concepts like indexes. There is better documentation from DataStax, but that is still lacking in explaining concepts in any depth.

The MongoDB documentation was good when I first looked at it but is even better nowadays. It's actually kept up to date and covers all the features, with examples. Nobody likes writing documentation, and it shows with many open source projects; this is another advantage of having a company behind the project, forcing developers to write the docs! Incidentally, one of the biggest advantages of the PHP language is the extensive documentation, examples and user-submitted notes.

When you're using a completely new data store, documentation is important, and it is one of the reasons why I chose MongoDB back in 2009.

Operations - Community


MongoDB has to be a case study in how to build a community around a product. There have been almost 40 MongoDB conferences in the last year, and there is a very active mailing list and user groups around the world. You know you're well known when a phrase like "web scale" is associated with your product (as a parody). Again, this is because there is a company behind the product actively promoting it and encouraging and managing these events.

Cassandra has had 1 conference in that time, and whilst there are user groups (I presented this talk at the London one), it's certainly not on the same scale as MongoDB.

Does that matter? None of that existed when we chose MongoDB, so we learnt everything ourselves. But for new users today, there's a huge forum of people who are using MongoDB and sharing their knowledge freely and accessibly.

Operations/Usage - Drivers


The other main reason I chose MongoDB was the driver support. All the key drivers for MongoDB were available and, most importantly, maintained by 10gen themselves. MongoDB has official drivers for C, C#, C++, Erlang, JavaScript, Java, Perl, PHP, Python, Ruby and Scala, all fully supported. The Python and PHP drivers were most important to us, but we also use the C# driver in our Windows monitoring agent, and having these as well maintained as the core server makes a massive difference.

Cassandra only has official Java and Python drivers, with a few others written by 3rd parties. I've found that Python is usually well catered for when it comes to libraries that work well. PHP is another story and we've had issues with RabbitMQ and ZeroMQ in the past (specifically not working well under heavy load; they all work fine for playing around). Good PHP libraries are hard to come by.

Conclusion

There is no conclusion. This post isn't about which is best; it's about comparing the two. Both have advantages and disadvantages, and to truly compare them you need to run them both in production under significant load for a long period of time.

MongoDB has worked well for us and has proven itself at scale. It has also been flexible enough to do things like building a queueing system as well as being the main data store for our server monitoring service.

For me, the operational considerations play a major part in making a decision because these types of databases are so new. I would suspect they're also important to companies looking to adopt this technology. We don't need a support contract for Apache, for example, because it's so well proven. Our support contract with 10gen has been well worth the money!

Other references

Mongodb vs. Cassandra on Stack Overflow
Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison

A little more about David Mytton

David Mytton is the founder of Server Density. He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan. Follow him on Twitter and Google+.
