Apache Cassandra is a distributed storage system for managing structured and unstructured data reliably at massive scale. Cassandra is a good choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it a strong platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is a particular strength: it provides lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Cassandra's Column Family data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching.
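To make the column family idea more concrete, here is a minimal toy sketch in Python. It is not Cassandra's implementation or API; the `ColumnFamily` class and its `insert` and `get_row` methods are hypothetical names invented for this illustration. It assumes a simplified model where each row key maps to a set of named columns, writes are appended to a log rather than updated in place, and the value with the highest timestamp wins on read.

```python
class ColumnFamily:
    """Toy column family: row key -> {column name -> value} (hypothetical, for illustration)."""

    def __init__(self):
        # Append-only list of (row_key, column, timestamp, value) entries.
        self.log = []

    def insert(self, row_key, column, value, timestamp):
        # A write is only an append; nothing is read or modified in place.
        self.log.append((row_key, column, timestamp, value))

    def get_row(self, row_key):
        # A read merges the log: for each column, the highest timestamp wins.
        latest = {}
        for key, column, timestamp, value in self.log:
            if key != row_key:
                continue
            if column not in latest or timestamp >= latest[column][0]:
                latest[column] = (timestamp, value)
        # Return the columns sorted by name.
        return {name: latest[name][1] for name in sorted(latest)}


users = ColumnFamily()
users.insert("alice", "email", "alice@example.com", timestamp=1)
users.insert("alice", "city", "Paris", timestamp=2)
users.insert("alice", "email", "alice@new.example.com", timestamp=3)
users.insert("bob", "email", "bob@example.com", timestamp=1)  # rows need not share columns

row = users.get_row("alice")
print(row)
```

Note that the second write to `email` does not overwrite anything; the older entry simply loses to the newer timestamp when the row is read.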
Cassandra is designed to scale to a very large size across many commodity servers, with no single point of failure. The philosophy behind the design of the storage portion of Cassandra is that it be able to satisfy the requirements of applications that demand storage of large amounts of structured data. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the context of these failures enables the reliability and scalability of the software systems relying on this service.
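One way to picture "no single point of failure" is a ring of nodes where every key is stored on several of them. The sketch below is a toy under stated assumptions, not Cassandra's actual placement logic: the `token` and `replicas` functions, the node names, the use of MD5, and the replication factor of 3 are all chosen for illustration. Real clusters use configurable partitioners and replication strategies.

```python
import hashlib
from bisect import bisect_right


def token(value):
    # Hash a string to an integer position on the ring (MD5 used only for illustration).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas(key, nodes, replication_factor=3):
    """Return the nodes holding `key`: the first node after the key's token, then the next ones clockwise."""
    ring = sorted(nodes, key=token)
    tokens = [token(n) for n in ring]
    start = bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas("user:42", nodes)

# Simulate the first replica failing: the key is still held by the other replicas.
failed = owners[0]
survivors = [n for n in owners if n != failed]
remaining_nodes = [n for n in nodes if n != failed]
new_owners = replicas("user:42", remaining_nodes)
print(owners, survivors, new_owners)
```

In this toy model, losing any one node leaves two copies of the key, and the surviving copies remain the first owners once the failed node is taken out of the ring.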
HBase vs Cassandra:
- HBase is based on BigTable (Google)
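- Cassandra is based on Dynamo (Amazon's internal key-value store, the predecessor of the later DynamoDB service), combined with a BigTable-style data model. It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multiple data centers. Rackspace is a big contributor to Cassandra because of this multi-data-center support.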
Who uses Cassandra:
- Cisco's WebEx uses Cassandra to store user feeds and activity in near real time.
- Facebook used Cassandra to power Inbox Search, with over 200 nodes deployed. This was abandoned in late 2010 when they built the Facebook Messaging platform on HBase.
- IBM has done research on building a scalable email system based on Cassandra.
- Netflix uses Cassandra as the back-end database for its streaming services.
- Formspring uses Cassandra to count responses, as well as to store social graph data.
- Twitter announced it was planning to use Cassandra because it can run on large server clusters and take in very large amounts of data at a time. Twitter continues to use it, but not for Tweets themselves.
- WalmartLabs (previously Kosmix) uses Cassandra with SSDs.
Cassandra NoSQL Database: Getting Started - http://msdn.microsoft.com/en-us/magazine/jj553519.aspx
Cassandra vs HBase - http://ria101.wordpress.com/
