Tech Kaizen: 1/1/13

Apache Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed at Facebook, where it powered the Inbox Search feature until late 2010. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
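To make the "Dynamo-like infrastructure" more concrete, here is a minimal Python sketch of Dynamo-style partitioning on a token ring. Each node owns the range of hash values ending at its token, and a key's replicas are its owner plus the next nodes clockwise (roughly the idea behind Cassandra's SimpleStrategy). The `TokenRing` class, node names and token values are hypothetical, MD5 is used only for illustration, and Cassandra's real partitioners and virtual nodes are not modeled.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a row key to a 128-bit position on the ring (MD5 used here for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical Dynamo-style ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self._entries = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        """Return the owner of `key` followed by the next nodes clockwise."""
        if replication_factor > len(self._entries):
            raise ValueError("replication factor exceeds the number of nodes")
        # The owner is the first node whose token is >= the key's token;
        # keys past the largest token wrap around to the first node.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._entries)
        return [
            self._entries[(start + i) % len(self._entries)][1]
            for i in range(replication_factor)
        ]


ring = TokenRing({"A": 0, "B": 2**126, "C": 2**127, "D": 3 * 2**126})
print(ring.replicas("user:42", 3))  # the key's owner plus its two clockwise successors
```

Because placement depends only on the hash and the node tokens, any node can compute where a key lives without asking a master, which is what removes the single point of failure.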

Apache Cassandra is a distributed storage system for managing structured and unstructured data reliably at massive scale. It is a strong choice when you need scalability and high availability without compromising performance: it scales linearly as nodes are added and has proven fault tolerance on commodity hardware or cloud infrastructure, which makes it well suited to mission-critical data. Cassandra can also replicate data across multiple datacenters, which lowers latency for users close to each datacenter and lets a deployment survive a regional outage.
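In Cassandra, multi-datacenter replication is configured per keyspace, for example with NetworkTopologyStrategy and a replica count for each datacenter. The Python sketch below shows only the placement idea, under simplifying assumptions: walk the ring clockwise from the key and take nodes from each datacenter until that datacenter's replica count is met. Rack awareness is ignored, and `place_replicas`, the node names, datacenter names and tokens are all hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Ring position for a key (MD5 for illustration only)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def place_replicas(
    nodes: list[tuple[int, str, str]], key: str, rf_per_dc: dict[str, int]
) -> list[str]:
    """Pick replicas for `key` given per-datacenter replication factors.

    `nodes` holds (token, node_name, datacenter) tuples. Starting at the key's
    position, walk the ring clockwise and take each node whose datacenter
    still needs replicas. Rack awareness is deliberately ignored.
    """
    ring = sorted(nodes)
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    remaining = dict(rf_per_dc)
    chosen: list[str] = []
    for i in range(len(ring)):
        if not any(remaining.values()):
            break  # every datacenter already has its replicas
        _, name, dc = ring[(start + i) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(name)
            remaining[dc] -= 1
    if any(remaining.values()):
        raise ValueError("a datacenter has fewer nodes than its replication factor")
    return chosen


cluster = [
    (0, "dc1-a", "dc1"), (2**125, "dc2-a", "dc2"),
    (2**126, "dc1-b", "dc1"), (3 * 2**125, "dc2-b", "dc2"),
    (2**127, "dc1-c", "dc1"), (5 * 2**125, "dc2-c", "dc2"),
]
print(place_replicas(cluster, "user:42", {"dc1": 3, "dc2": 2}))
```

Since every key gets copies in both datacenters, losing all of `dc1` in this toy cluster still leaves two replicas of each key in `dc2`.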

Cassandra's Column Family data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching.
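"Log-structured updates" means writes are appended rather than applied in place. The following is a minimal Python sketch under a simplified model: a column family maps a row key to columns, each column value carries a timestamp, writes go to an in-memory memtable that is flushed to immutable sorted tables (SSTables), and reads merge all sources so the newest timestamp wins. `ColumnFamilyStore` and its methods are hypothetical; the commit log, compaction, deletions and caching are left out.

```python
class ColumnFamilyStore:
    """Toy log-structured column family: row key -> column name -> (timestamp, value).

    Nothing is updated in place: old versions stay in earlier SSTables and
    reads reconcile them by timestamp.
    """

    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold  # columns held in memory before flushing
        self.memtable: dict[str, dict[str, tuple[int, str]]] = {}
        self.sstables: list[dict[str, dict[str, tuple[int, str]]]] = []  # oldest first

    def insert(self, row_key: str, column: str, value: str, timestamp: int) -> None:
        row = self.memtable.setdefault(row_key, {})
        current = row.get(column)
        # Within the memtable, keep only the newest version of each column.
        if current is None or timestamp > current[0]:
            row[column] = (timestamp, value)
        if sum(len(cols) for cols in self.memtable.values()) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable as a table sorted by row key, then by column name.
        frozen = {rk: dict(sorted(cols.items())) for rk, cols in sorted(self.memtable.items())}
        self.sstables.append(frozen)
        self.memtable = {}

    def get_row(self, row_key: str) -> dict[str, str]:
        merged: dict[str, tuple[int, str]] = {}
        # Scan SSTables oldest to newest, then the memtable. The highest timestamp
        # wins; equal timestamps keep the version seen first (a simplification).
        for source in [*self.sstables, self.memtable]:
            for column, (ts, value) in source.get(row_key, {}).items():
                if column not in merged or ts > merged[column][0]:
                    merged[column] = (ts, value)
        return {column: value for column, (_, value) in sorted(merged.items())}


store = ColumnFamilyStore(flush_threshold=2)
store.insert("alice", "email", "alice@old.example", timestamp=1)
store.insert("alice", "city", "Paris", timestamp=1)              # triggers a flush
store.insert("alice", "email", "alice@new.example", timestamp=2)  # stays in the memtable
print(store.get_row("alice"))  # {'city': 'Paris', 'email': 'alice@new.example'}
print(store.sstables[0]["alice"]["email"])  # (1, 'alice@old.example')
```

The write path only touches memory and appends whole tables, which is why writes are cheap; the cost moves to reads, which must merge versions.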

Cassandra is designed to scale to a very large size across many commodity servers, with no single point of failure. Its storage layer is designed to meet the needs of applications that store large amounts of structured data. Cassandra aims to run on an infrastructure of hundreds of nodes, possibly spread across different datacenters. At this scale, small and large components fail continuously, so the way Cassandra manages persistent state in the face of these failures is what makes the systems built on it reliable and scalable.
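Surviving continuous failures relies on replication plus tunable consistency: Cassandra lets each request choose how many replicas must respond, through consistency levels such as ONE, QUORUM and ALL. The Python sketch below, built on hypothetical `Replica`, `write` and `read` helpers, illustrates the quorum rule: with N replicas, if writes reach W replicas and reads consult R replicas with R + W > N, every read overlaps the latest successful write even when some replicas are down. It deliberately models the worst case in which a write reaches only W replicas; real Cassandra attempts to send each write to every replica and has its own repair mechanisms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One node holding a copy of the data: key -> (timestamp, value)."""
    up: bool = True
    data: dict[str, tuple[int, str]] = field(default_factory=dict)


def quorum(replication_factor: int) -> int:
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def write(replicas: list[Replica], key: str, value: str, ts: int, w: int) -> None:
    """Succeed only if `w` replicas are live; worst case, only the first `w` get the write."""
    live = [r for r in replicas if r.up]
    if len(live) < w:
        raise RuntimeError("unavailable: too few live replicas for this write")
    for replica in live[:w]:
        current = replica.data.get(key)
        if current is None or ts > current[0]:
            replica.data[key] = (ts, value)


def read(replicas: list[Replica], key: str, r: int) -> Optional[str]:
    """Ask the last `r` live replicas (minimising overlap) and return the newest value."""
    live = [rep for rep in replicas if rep.up]
    if len(live) < r:
        raise RuntimeError("unavailable: too few live replicas for this read")
    answers = [rep.data[key] for rep in live[-r:] if key in rep.data]
    return max(answers)[1] if answers else None


nodes = [Replica() for _ in range(3)]  # replication factor N = 3
q = quorum(3)                          # 2, so R + W = 4 > 3
write(nodes, "inbox:alice", "v1", ts=1, w=q)
print(read(nodes, "inbox:alice", r=q))  # v1: the read set overlaps the write set
print(read(nodes, "inbox:alice", r=1))  # None: R + W = 3 is not > 3, so this read can miss

nodes[0].up = False                           # one replica fails...
write(nodes, "inbox:alice", "v2", ts=2, w=q)  # ...and a quorum write still succeeds
nodes[0].up, nodes[2].up = True, False        # a different replica fails
print(read(nodes, "inbox:alice", r=q))        # v2
```

Choosing lower levels such as ONE trades this overlap guarantee for lower latency and higher availability.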

HBase vs Cassandra:

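HBase is modeled on Google's BigTable.
Cassandra's distribution design is based on Amazon's Dynamo (described in Amazon's 2007 Dynamo paper, not the later DynamoDB service), while its data model borrows from BigTable. It was initially developed at Facebook by engineers including Avinash Lakshman, a co-author of the Dynamo paper. Dynamo's masterless, peer-to-peer replication is one reason Cassandra supports multiple datacenters well. Rackspace is a major contributor to Cassandra, partly because of this multi-datacenter support.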

Prominent users:

Cisco's WebEx uses Cassandra to store user feed and activity in near real time.
Facebook used Cassandra to power Inbox Search, with over 200 nodes deployed. This was abandoned in late 2010 when Facebook built its Messaging platform on HBase.
IBM has done research on building a scalable email system based on Cassandra.
Netflix uses Cassandra as the back-end database for its streaming services.
Formspring uses Cassandra to count responses and to store social graph data.
Twitter announced it was planning to use Cassandra because it can run on large server clusters and can take in very large amounts of data at a time. Twitter continues to use it, but not for storing Tweets themselves.
WalmartLabs (previously Kosmix) uses Cassandra with SSDs.

ref:

Apache Cassandra - http://cassandra.apache.org/, http://en.wikipedia.org/wiki/Apache_Cassandra

Cassandra - https://wiki.intuit.com/display/ARCH/Cassandra

Cassandra NoSQL Database: Getting Started - http://msdn.microsoft.com/en-us/magazine/jj553519.aspx

HBase vs Cassandra - http://bigdatanoob.blogspot.com/2012/11/hbase-vs-cassandra.html

Use Cassandra to Run Hadoop MapReduce - http://architects.dzone.com/articles/use-cassandra-run-hadoop

Cassandra vs HBase - http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/

Running Hadoop MapReduce With Cassandra NoSQL - http://allthingshadoop.com/2010/04/24/running-hadoop-mapreduce-with-cassandra-nosql/
