Riak Compared to MongoDB

This is intended to be a brief, objective and technical comparison of Riak and MongoDB. (This comparison is current as of the 1.6 release of MongoDB.)

High Level Differences

Riak and MongoDB satisfy some of the same use cases:

  • Semi-structured data modeled as “documents”
  • Storage of non-document data in the database
  • High write-availability

However, they approach solving these problems quite differently. Fundamentally, Riak is a distributed system while MongoDB is a single-system database (with support for replication and sharding). MongoDB specifies the internal format of documents while Riak is content-agnostic. MongoDB uses an ancillary specification called “GridFS” for storing non-document data, while Riak stores non-document or binary data in the same manner as structured data. MongoDB achieves high write-availability through performant in-place writes and “upserts”, while Riak achieves availability through quorum writes, tolerance of node failure and hinted handoff.

http://www.mongodb.org/display/DOCS/Home
http://blog.mongodb.org/post/248614779/fast-updates-with-mongodb-update-in-place
http://www.mongodb.org/display/DOCS/Updating#Updating-Update

Replication and Scaling Out

Riak uses “consistent hashing” to replicate data. This functionality is deeply and tightly integrated with the Riak core, and enables Riak to automatically replicate and rebalance data according to cluster size. Riak has no privileged nodes (no concept of master), making the system resilient to failure. Thus, when you need to grow your cluster for greater throughput or fault-tolerance, you simply add nodes.
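To make the idea concrete, here is a minimal, generic sketch of a consistent hash ring. It is not Riak's implementation: the `HashRing` class, the `vnodes` parameter and the `preference_list` method are hypothetical names chosen for illustration, and the placement of ring points is simplified. It shows how keys map to several distinct nodes and why removing a node only moves the keys that node owned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to an integer position on the ring (SHA-1, 160 bits)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Illustrative consistent hash ring; names and layout are assumptions."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes   # ring points per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several points so load spreads evenly.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        # Only the departing node's points disappear; all others stay put.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def preference_list(self, key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key."""
        if not self._points:
            return []
        start = bisect.bisect_right(self._points, _hash(key))
        chosen = []
        for offset in range(len(self._points)):
            point = self._points[(start + offset) % len(self._points)]
            node = self._owner[point]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen
```

With a ring built as `HashRing(["a", "b", "c", "d"])`, calling `preference_list("user:42", 3)` returns three distinct node names. After `remove_node("d")`, every key whose first owner was not `d` still maps to the same first owner.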

As of the 1.6 release, MongoDB offers several options for replication:

1. Master/Slave

http://www.mongodb.org/display/DOCS/Master+Slave

2. Replica sets

From the Mongo Docs: “Replica Sets are MongoDB’s new method for replication. They are an elaboration on the existing master/slave replication, adding automatic failover and automatic recovery of member nodes.”

http://www.mongodb.org/display/DOCS/Replica+Sets

Replica Sets add automatic failover, so most users are expected to migrate to this configuration. However, traditional Master/Slave replication is more appropriate in certain use cases and will continue to be supported.

To enable horizontal scaling, Mongo uses a process known as “sharding,” which involves designating certain servers to hold certain chunks of the data as the data set grows.
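The idea of assigning chunks of data to particular servers can be sketched as a range lookup over a shard key. This is a toy model, not MongoDB's code or API: the `ChunkMap` class and its `split` and `shard_for` methods are hypothetical names, and real chunk splitting and migration involve much more than this.

```python
import bisect


class ChunkMap:
    """Toy map from contiguous shard-key ranges ("chunks") to shard names."""

    def __init__(self, first_shard):
        self._bounds = []             # sorted split points between chunks
        self._shards = [first_shard]  # _shards[i] owns the i-th chunk

    def split(self, split_key, new_shard):
        """Split the chunk holding split_key; keys from split_key up to the
        next boundary move to new_shard."""
        i = bisect.bisect_right(self._bounds, split_key)
        if i > 0 and self._bounds[i - 1] == split_key:
            raise ValueError("a chunk already starts at this key")
        self._bounds.insert(i, split_key)
        self._shards.insert(i + 1, new_shard)

    def shard_for(self, key):
        """Route a key to the shard whose chunk range contains it."""
        return self._shards[bisect.bisect_right(self._bounds, key)]
```

Starting from `ChunkMap("shard0")` and calling `split("m", "shard1")`, keys below `"m"` route to `shard0` and keys from `"m"` upward route to `shard1`.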

New in the 1.6 release is “auto-sharding.”

http://www.mongodb.org/display/DOCS/Sharding
http://www.mongodb.org/display/DOCS/Sharding+Introduction
http://en.wikipedia.org/wiki/Sharding

Scaling Back In

Riak allows you to remove a node (or nodes) from the cluster when needed. When you do this, Riak takes the partitions that the removed node was managing and hands them to the machines still in the cluster, giving each remaining node a few more partitions to manage. In other words, the load remains evenly distributed.

MongoDB has support for removing shards from your database.

http://www.mongodb.org/display/DOCS/Configuring+Sharding#ConfiguringSharding-Removingashard

Performance

Riak has pluggable storage engines, of which Bitcask is the recommended one, so you can tune levels of performance and durability based on your needs. (One thing that is said around the office: “Eventual consistency is no excuse for losing data.”) Durability and performance can also be tuned at the request level by specifying the number of nodes that need to agree on reads and writes.
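The request-level tuning described above can be illustrated with a small quorum model. This is a simplified sketch, not Riak's client API: `Replica`, `put`, `get` and the explicit `version` counter are assumptions made for the example. A write succeeds once `w` replicas acknowledge it, and a read succeeds once `r` replicas respond.

```python
class Replica:
    """One copy of the data; `up` simulates node availability."""

    def __init__(self):
        self.up = True
        self.data = {}  # key -> (version, value)


def put(replicas, key, value, version, w):
    """Write to every reachable replica; succeed if at least w acknowledge."""
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.data[key] = (version, value)
            acks += 1
    return acks >= w


def get(replicas, key, r):
    """Read from reachable replicas; require r replies, return newest value."""
    replies = [rep.data.get(key) for rep in replicas if rep.up]
    if len(replies) < r:
        raise RuntimeError("read quorum not met")
    found = [reply for reply in replies if reply is not None]
    if not found:
        return None
    return max(found, key=lambda reply: reply[0])[1]
```

With three replicas, choosing `r = 2` and `w = 2` means any read quorum overlaps any write quorum in at least one replica (because `r + w > n`), so a read sees the latest acknowledged write even if a different node is down at read time. Lower values trade that guarantee for latency and availability.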

Mongo gets much of its performance from memory-mapped files. The tradeoff is that it fsyncs (flushes in-memory data to disk) only every 60 seconds by default, so you run the risk of losing data if your MongoDB server goes down. The recommended way to increase durability in MongoDB is to replicate.

http://www.mongodb.org/display/DOCS/Durability+and+Repair
http://blog.mongodb.org/post/381927266/what-about-durability

Data Model

Riak is content-type agnostic, and allows you to store semi-structured documents or objects of varying sizes. Riak lets you specify relationships between objects via links.

Data Storage in Riak

Mongo’s data format is BSON (binary equivalent to JSON). Mongo stores this data as documents (self-contained records with no intrinsic relationships). Documents in MongoDB may store any of the defined BSON types, including raw binary data (upon which the GridFS feature is built).

http://www.mongodb.org/display/DOCS/BSON
http://www.mongodb.org/display/DOCS/Data+Types+and+Conventions
http://www.mongodb.org/display/DOCS/GridFS

Queries and Distributed Operations

Riak’s query interface is entirely key-value, link-walking, or MapReduce. Riak has no concept of secondary indexes because it does not know the internal structure of the stored data.

http://wiki.basho.com/MapReduce.html

MongoDB has a query interface that has some similarities to relational databases, including secondary indexes that can be derived from the stored documents. MongoDB also has a facility for performing MapReduce queries.

http://www.mongodb.org/display/DOCS/Indexes
http://www.mongodb.org/display/DOCS/Querying
http://www.mongodb.org/display/DOCS/MapReduce

Deflecting Conflicting Writes

Riak uses vector clocks to track the ancestry of each object, so your application can decide which representation to pick when two clients have updated an object at the same time.

Vector Clocks
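A minimal sketch of how vector clocks expose concurrent updates follows. It models a clock as a plain dictionary of per-actor counters; the function names (`increment`, `descends`, `compare`, `merge`) are illustrative assumptions and do not reflect Riak's internal representation.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two clocks as equal, ordered, or concurrent (conflict)."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "conflict"


def merge(a, b):
    """Combine two clocks after the application has resolved a conflict."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}
```

If two clients both start from the same clock and each call `increment` with their own actor name, `compare` reports `"conflict"` because neither clock descends from the other. That is the case in which the application is handed both versions and must choose or merge.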

Mongo uses a “last one wins” technique for conflict resolution.

http://www.mongodb.org/display/DOCS/Atomic+Operations

API

Riak offers two primary interfaces to non-Erlang clients:

1. HTTP
2. Protocol Buffers

MongoDB uses a custom protocol with BSON as the interchange format, and 10gen supports clients in the most popular programming languages.

http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol