
Tuesday, January 7, 2014

Cassandra Notes : CQL3 Development With Datastax Java Driver [Session #2]

These are my takeaways from the DataStax Cassandra Virtual Training course. The course is divided into seven sessions comprising 135 modules.

Session #2:
DevCenter is a visual query tool for interacting with Cassandra. It lets you connect to a cluster, explore the schema, execute queries, and see the results. cqlsh commands such as DESCRIBE and COPY are not supported.

Schema Design in Cassandra:
A keyspace is the topmost entity in Cassandra, analogous to a database in an RDBMS. It groups tables and controls the replication strategy (whether to replicate within one data center or across multiple data centers, and how many copies to keep). Other settings such as compression and compaction are configured per table, and the partitioner is set for the whole cluster.
Tables (column families) store the data, organized in rows; each stored column value is identified by table name and row key and carries a timestamp. Rows are replicated across multiple nodes, but a single row must fit on one node (it will not fit if it has billions of columns).

CQL Reference:
CREATE KEYSPACE keyspace1 WITH REPLICATION = {
           'class':'SimpleStrategy',
           'replication_factor':3
};
The keyspace name is converted to lower case; to keep it case sensitive, enclose it in double quotes. SimpleStrategy is meant for a single data center; for replication across multiple data centers use NetworkTopologyStrategy. replication_factor is the number of replicas to create. With NetworkTopologyStrategy the replica count is set per data center, so the total number of replicas is the sum over all data centers (replication factor X number of data centers when every data center uses the same factor).
Murmur3Partitioner is the default partitioner and keeps data evenly distributed; ByteOrderedPartitioner preserves key order instead.
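
As a rough sketch of the multiple data center case, the keyspace below uses NetworkTopologyStrategy with a separate replica count per data center. The keyspace name "KeySpace2" and the data center names DC1 and DC2 are made up for illustration; real data center names have to match what the cluster's snitch reports. The double quotes keep the mixed-case name as written.

```
// Hypothetical keyspace: 3 replicas in DC1 and 2 in DC2, 5 in total
CREATE KEYSPACE "KeySpace2" WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 2
};
```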

USE keyspace1;
This command sets the default keyspace; otherwise table names can be prefixed with the keyspace name and '.'. An explicit keyspace prefix takes precedence over the default.
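
A small illustration of the two forms, assuming the keyspace1 and table1 names used in these notes:

```
// Relies on the default keyspace set by USE
USE keyspace1;
SELECT * FROM table1;

// Fully qualified name, works regardless of the current default keyspace
SELECT * FROM keyspace1.table1;
```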

To create table:
CREATE TABLE table1 (
   id int PRIMARY KEY,
   name text
);

Insert data:
INSERT INTO table1 (id, name) VALUES (1, 'Lalit Jha');
Enclose text values in single quotes; column names are not quoted.

For bulk loading from a CSV file (COPY is a cqlsh command):
COPY table1 (id, name) FROM 'filename.csv' WITH DELIMITER = '|' AND HEADER = true;
Replace FROM with TO to export data to a CSV file.
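
For completeness, the export direction might look like the following in cqlsh. The output file name export.csv is only an example.

```
// Write the id and name columns of table1 to a pipe-delimited file with a header row
COPY table1 (id, name) TO 'export.csv' WITH DELIMITER = '|' AND HEADER = true;
```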

Cassandra supports atomic writes through BATCH blocks, but a batch is not isolated: other queries may see some of its modifications before the rest have been applied. A batch can contain only write operations (INSERT/UPDATE/DELETE). Atomicity costs around 30% in performance. Syntax:
BEGIN BATCH
//statements
APPLY BATCH;
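
A filled-in batch could look roughly like this, reusing the table1 definition from above. The row values are invented for the example.

```
BEGIN BATCH
  // Only write statements are allowed inside a batch
  INSERT INTO table1 (id, name) VALUES (2, 'Second User');
  UPDATE table1 SET name = 'Lalit K. Jha' WHERE id = 1;
  DELETE FROM table1 WHERE id = 3;
APPLY BATCH;
```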

The DataStax Java Driver allows interacting with Cassandra programmatically through a JDBC-like API.


The most important topic in Cassandra is data modeling, which includes primary keys, clustering keys, secondary indexes, and partitioning. A separate post will cover them.
