Apache Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
Apache Cassandra is a distributed storage system for managing structured and unstructured data reliably at massive scale. The Cassandra database is a good choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it well suited to mission-critical data. Cassandra's support for replicating across multiple datacenters is a particular strength, providing lower latency for your users and the ability to survive regional outages.
Cassandra's Column Family data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching.
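To make "log-structured updates" concrete, here is a minimal Python sketch of the general idea: writes land in an in-memory table and are periodically flushed to immutable sorted runs, so a write never rewrites data already flushed. The class name ToyLogStructuredStore and the flush_threshold parameter are hypothetical names made up for this illustration. It is a toy under those assumptions, not Cassandra's actual storage engine, and it leaves out the commit log, compaction, and deletes.

```python
import bisect


class ToyLogStructuredStore:
    """Toy log-structured store (hypothetical, for illustration only)."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # assumed tuning knob
        self.memtable = {}   # most recent writes, mutable, in memory
        self.sstables = []   # immutable sorted runs, oldest first

    def put(self, key, value):
        # A write only touches the in-memory table; flushed runs are never edited.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the in-memory table as a new sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first: memory, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

An update to an existing key simply shadows the older value in an earlier run. This is why writes stay cheap, and why a real system needs a background compaction step to merge runs.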
Cassandra is designed to scale to a very large size across many commodity servers, with no single point of failure. The philosophy behind the design of the storage portion of Cassandra is that it be able to satisfy the requirements of applications that demand storage of large amounts of structured data. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the context of these failures enables the reliability and scalability of the software systems relying on this service.
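The "no single point of failure" property comes from Dynamo-style partitioning: keys are hashed onto a ring of nodes, and each key is stored on several consecutive nodes. The Python sketch below illustrates that idea only. ToyRing, token, replicas, read_from, and the node names are hypothetical, and MD5 is used purely as a convenient hash. Cassandra's real partitioners and replication strategies differ in detail.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Toy consistent-hashing ring with replication (hypothetical sketch)."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self.replication_factor = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """First node clockwise from the key's token, plus its successors."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def read_from(self, key, down=()):
        """Return the first replica that is not down, or None if all are down."""
        for node in self.replicas(key):
            if node not in down:
                return node
        return None
```

With four nodes and a replication factor of 3, a read for a given key can still be served when one or two of its replicas are down. Every node plays the same role, so there is no master whose loss stops the cluster.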
HBase vs Cassandra:
Use Cassandra to Run Hadoop MapReduce - http://architects.dzone.com/articles/use-cassandra-run-hadoop
Cassandra vs HBase - http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
- HBase is based on BigTable (Google)
- Cassandra is based on Dynamo (Amazon), combined with a BigTable-style data model. It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multiple datacenters. Rackspace is a big contributor to Cassandra because of its multi-datacenter support.
- Cisco's WebEx uses Cassandra to store user feed and activity in near real time.
- Facebook used Cassandra to power Inbox Search, with over 200 nodes deployed. This was abandoned in late 2010 when Facebook built its Messaging platform on HBase.
- IBM has done research in building a scalable email system based on Cassandra
- Netflix uses Cassandra as their back-end database for their streaming services
- Formspring uses Cassandra to count responses, as well as store Social Graph data
- Twitter announced plans to use Cassandra because it can run on large server clusters and can take in very large amounts of data at a time. Twitter continues to use it, but not for Tweets themselves.
- WalmartLabs (previously Kosmix) uses Cassandra with SSDs
ref:
Apache Cassandra - http://cassandra.apache.org/ , http://en.wikipedia.org/wiki/Apache_Cassandra
Cassandra - https://wiki.intuit.com/display/ARCH/Cassandra
Cassandra NoSQL Database: Getting Started - http://msdn.microsoft.com/en-us/magazine/jj553519.aspx
HBase vs Cassandra - http://bigdatanoob.blogspot.com/2012/11/hbase-vs-cassandra.html
Cassandra vs HBase - http://ria101.wordpress.com/
Running Hadoop MapReduce With Cassandra NoSQL - http://allthingshadoop.com/2010/04/24/running-hadoop-mapreduce-with-cassandra-nosql/