- Cassandra is a wide-column store database.
- It stores data in rows and columns, where each row represents a record, and columns can vary from row to row.
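- Cassandra offers low latency, and its masterless architecture lets it withstand an entire datacenter outage with no data loss when data is replicated across multiple datacenters.
- You can choose between synchronous and asynchronous replication for each update by setting that request's consistency level.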
- Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
- The audit logging feature lets operators track DML, DDL, and DCL activity with minimal impact on normal workload performance.
- The fqltool (full query log tool) lets you capture and replay production workloads for analysis.
- Cassandra uses CQL (Cassandra Query Language), which is tailored to Cassandra's column-family data model but deliberately resembles SQL so that it feels familiar to developers coming from relational databases.
- Cassandra uses the partition key to distribute data across nodes: the partitioner hashes each row's partition key into a token, and that token determines which nodes store and replicate the row.
- MongoDB's flexible document model and rich query language make it well suited to applications with evolving data requirements.
- Cassandra's distributed architecture and focus on scalability make it a good fit for applications with demanding availability and scalability requirements.
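The per-update choice between synchronous and asynchronous replication is made through the consistency level: the coordinator waits for that many replica acknowledgements, and the remaining replicas are updated in the background. The sketch below models a few levels for a single datacenter. The function names are hypothetical, and levels such as `LOCAL_QUORUM`, `EACH_QUORUM`, and `ANY` are not modelled. It also checks the usual overlap rule: when the read replicas R plus the write replicas W exceed the replication factor RF, every read touches at least one replica holding the latest acknowledged write.

```python
# Hypothetical helpers modelling a subset of Cassandra consistency levels
# for a single datacenter; not part of any Cassandra driver.


def replicas_required(level: str, rf: int) -> int:
    """Number of replica acknowledgements a request at `level` waits for."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,  # a strict majority of the replicas
        "ALL": rf,
    }.get(level)
    if required is None:
        raise ValueError(f"level not modelled here: {level}")
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF is {rf}")
    return required


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, read_sees_latest_write(read_cl, write_cl, rf=3))
```

With RF = 3, using QUORUM for both reads and writes (2 + 2 > 3) tolerates one unavailable replica and still returns the latest write. Using ONE for both is faster, but a read may return stale data.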
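Hinted Handoff can be pictured with a toy coordinator. When a replica is down, the coordinator stores the write as a hint, then replays it once the replica returns. This sketch uses hypothetical class and method names. Real Cassandra works differently in several ways, none of which are modelled here:

- it persists hints to disk,
- it stops collecting hints for a replica that has been down longer than the hint window (three hours by default), and
- it reconciles replayed writes by timestamp.

```python
from collections import defaultdict


class ToyCoordinator:
    """Hypothetical coordinator illustrating hinted handoff."""

    def __init__(self, replicas):
        self.stores = {r: {} for r in replicas}  # data held by each replica
        self.up = set(replicas)                  # replicas currently reachable
        self.hints = defaultdict(list)           # down replica -> missed writes

    def mark_down(self, replica):
        self.up.discard(replica)

    def write(self, key, value):
        for replica, store in self.stores.items():
            if replica in self.up:
                store[key] = value
            else:
                # Replica unreachable: keep a hint instead of failing the write.
                self.hints[replica].append((key, value))

    def mark_up(self, replica):
        """Replica is back: replay its hints in order, then discard them."""
        self.up.add(replica)
        for key, value in self.hints.pop(replica, []):
            self.stores[replica][key] = value


coord = ToyCoordinator(["node-a", "node-b", "node-c"])
coord.mark_down("node-c")
coord.write("user:42", "alice")
print(coord.stores["node-c"])  # node-c missed the write while it was down
coord.mark_up("node-c")
print(coord.stores["node-c"])  # the stored hint has now been replayed
```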
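The mapping from partition key to node is a form of consistent hashing on a token ring. In the sketch below, each node owns one token. A key belongs to the first node whose token is greater than or equal to the key's token, wrapping around at the end of the ring. The class and function names are hypothetical, and virtual nodes are ignored. MD5 stands in for the default Murmur3Partitioner, so the tokens will not match a real cluster.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (MD5 stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Hypothetical ring where each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict):
        # Sort nodes by the token each one owns on the ring.
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def owner(self, partition_key: str) -> str:
        """First node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token_for(partition_key))
        return self._nodes[i % len(self._nodes)]


ring = TokenRing({"node-a": -(2**62), "node-b": 0, "node-c": 2**62})
for user_id in ["alice", "bob", "carol"]:
    print(user_id, ring.owner(user_id))
```

When a node is added or removed, only the range next to its token changes owner. Only a fraction of the keys move, which is what lets a cluster grow by adding nodes.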
History:
- Cassandra was developed at Facebook for inbox search.
- It was open-sourced by Facebook in July 2008.
- Cassandra was accepted into the Apache Incubator in March 2009.
- It became an Apache top-level project in February 2010.
Cassandra makes the following guarantees:
High Scalability
- Cassandra is a highly scalable storage system in which nodes can be added or removed as needed.
- Each node keeps its own cluster membership list, and a gossip-based protocol brings these lists into agreement across all nodes.
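A simplified gossip simulation shows how membership lists converge. In each round, every node exchanges its view with one randomly chosen peer, and both keep the highest heartbeat version seen for each endpoint. The function names and the one-peer-per-round rule are assumptions for illustration. Real Cassandra gossips every second with a random live peer, sometimes also contacts a seed or an unreachable node, and tracks richer endpoint state.

```python
import random


def gossip_round(views, rng):
    """One push-pull round: each node merges views with one random peer.

    views maps node -> {endpoint: heartbeat_version}; needs at least 2 nodes.
    """
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(views[node])
        for endpoint, version in views[peer].items():
            # Keep the newest heartbeat version seen for each endpoint.
            merged[endpoint] = max(merged.get(endpoint, -1), version)
        views[node] = merged
        views[peer] = dict(merged)


def simulate(num_nodes, seed=0, max_rounds=100):
    """Each node starts knowing only itself; gossip until all views agree."""
    rng = random.Random(seed)
    views = {f"n{i}": {f"n{i}": 1} for i in range(num_nodes)}
    for rounds in range(1, max_rounds + 1):
        gossip_round(views, rng)
        first = next(iter(views.values()))
        if all(view == first for view in views.values()):
            return rounds, views
    raise RuntimeError("views did not converge")


rounds, views = simulate(16)
print(f"all 16 membership lists agree after {rounds} rounds")
```

Each view always contains its own node and views only grow. Once all views are equal, every node therefore knows the full membership.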
High Availability
- Cassandra guarantees high availability through a fault-tolerant, masterless storage design with no single point of failure.
- Node failures are detected through the gossip protocol, whose heartbeats feed a phi accrual failure detector.
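A phi accrual detector does not give a yes/no timeout. Instead it computes a suspicion level, phi, that rises the longer a heartbeat is overdue. A node is marked down once phi exceeds `phi_convict_threshold`, which is 8 by default. The sketch below assumes exponentially distributed heartbeat intervals, which gives phi = elapsed / (mean interval × ln 10). The class is hypothetical and omits details of Cassandra's actual implementation.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Hypothetical phi accrual failure detector (exponential model)."""

    def __init__(self, window_size: int = 1000, threshold: float = 8.0):
        self.intervals = deque(maxlen=window_size)  # recent heartbeat gaps
        self.last_heartbeat = None
        self.threshold = threshold

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0  # degenerate: heartbeats arrived at the same instant
        elapsed = now - self.last_heartbeat
        # P(gap > elapsed) = exp(-elapsed / mean), and phi = -log10 of that.
        return elapsed / (mean * math.log(10))

    def is_down(self, now: float) -> bool:
        return self.phi(now) > self.threshold


det = PhiAccrualDetector()
for t in range(11):  # heartbeats every second from t = 0 to t = 10
    det.heartbeat(float(t))
for now in (10.5, 28.0, 30.0):
    print(now, round(det.phi(now), 2), det.is_down(now))
```

With one-second heartbeats and the default threshold, a node is convicted after roughly 8 × ln 10 ≈ 18.4 seconds of silence.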
Durability
- Cassandra guarantees data durability by using replicas.
- Replicas are multiple copies of data stored on different nodes in a cluster.
- In a multi-datacenter environment, the replicas may be stored in different datacenters.
- If a replica is lost to an unrecoverable node or datacenter failure, the data survives on the remaining replicas.
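Replica placement across datacenters can be sketched as a rack-unaware simplification of NetworkTopologyStrategy. Start at the key's position on the token ring, walk clockwise, and take nodes from each datacenter until that datacenter's replication factor is met. The names, tokens, and MD5-based token function are assumptions for illustration. The real strategy also spreads replicas across racks and supports virtual nodes.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Signed 64-bit token; MD5 stands in for Cassandra's Murmur3Partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def place_replicas(partition_key, ring, rf_per_dc):
    """Pick replicas per datacenter by walking the ring clockwise.

    ring: list of (token, node, datacenter) tuples, one token per node.
    rf_per_dc: replication factor per datacenter, e.g. {"dc1": 2}.
    """
    ring = sorted(ring)
    tokens = [tok for tok, _, _ in ring]
    start = bisect.bisect_left(tokens, token_for(partition_key))
    still_needed = dict(rf_per_dc)
    replicas = []
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if still_needed.get(dc, 0) > 0:
            replicas.append(node)
            still_needed[dc] -= 1
    return replicas


def data_survives(replicas, failed_nodes):
    """Data is recoverable while at least one replica is still alive."""
    return any(node not in failed_nodes for node in replicas)


ring = [
    (-(2**62), "dc1-a", "dc1"), (-(2**61), "dc2-a", "dc2"),
    (0, "dc1-b", "dc1"), (2**61, "dc2-b", "dc2"),
    (2**62, "dc1-c", "dc1"), (2**62 + 2**61, "dc2-c", "dc2"),
]
replicas = place_replicas("user:42", ring, {"dc1": 2, "dc2": 2})
print("replicas:", replicas)
print("survives loss of dc1:", data_survives(replicas, {"dc1-a", "dc1-b", "dc1-c"}))
```

With two replicas in each datacenter, losing every node in dc1 still leaves two copies in dc2.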
Eventual Consistency