Wednesday, September 26, 2012

Cassandra and Cassandra Data Model Step by Step Guides


This article is a quick and easy tour of the Cassandra database system and its data model, as preparation for installing Cassandra on your local Windows system.

What is Apache Cassandra?

Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web.




It does not use SQL and is optimized for handling very large volumes of data and transactions. Although Cassandra is implemented in Java, clients can be written in other languages (Ruby, Perl, Python, Scala, PHP, etc.).

It does not support complex relationships such as foreign keys. It provides a key and value relationship, much like a Java HashMap. Like other database systems, it is easy to install and use.

Understanding the Cassandra Data Model

The Cassandra data model is a schema-optional, column-oriented data model organized into keyspaces. Unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application.

Although it is natural to want to compare the Cassandra data model to a relational database, they are really quite different. In a relational database, data is stored in tables and the tables comprising an application are typically related to each other. Data is usually normalized to reduce redundant entries, and tables are joined on common keys to satisfy a given query.

Who Uses Cassandra:

To give you a fair idea of the power of Cassandra, here is a list of some of its users:

Facebook
Digg
Twitter
IBM
Reddit
AppScale

Let’s look at Cassandra data model

Data Model

KeySpace

Cassandra's data model is based on Google's Bigtable. It is often called a "column database", and it is quite different from a traditional RDBMS.

In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application.
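One way to picture this containment is as nested maps. The sketch below is only a mental model in plain Python, not a Cassandra client API. The keyspace name "MyApp" and the helper column_families_in are made up for illustration; UserProfile and UserList are the column families used in the examples later in this article.

```python
# Mental model only: a cluster holds keyspaces, and a keyspace groups column families.
# "MyApp" is a hypothetical keyspace name (typically one keyspace per application).
cluster = {
    "MyApp": {
        "UserProfile": {},  # column family (its rows would go here)
        "UserList": {},     # another column family in the same keyspace
    }
}


def column_families_in(cluster, keyspace):
    """Return the sorted names of the column families grouped under a keyspace."""
    return sorted(cluster[keyspace])


print(column_families_in(cluster, "MyApp"))
```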


Cassandra Data Model


Columns

The column is the basic unit of data in Cassandra. It is a name and value pair (stored together with a timestamp). Columns are grouped into rows, and each row has a KEY that uniquely identifies it, so a row is a key plus a set of name and value pairs.
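As a rough illustration, a column can be sketched as a small record and a row as a key plus a set of such records. This is plain Python, not Cassandra's API. The Column class is hypothetical, and the timestamp field reflects the fact that Cassandra stores a write timestamp with each column, which this article does not otherwise discuss; the microsecond unit is an assumption about what a client would supply.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical model of one column: a name, a value, and a write timestamp."""
    name: str
    value: str
    # Assumption: timestamp in microseconds since the epoch.
    timestamp: int = field(default_factory=lambda: int(time.time() * 1_000_000))


# A row is a unique key plus any number of columns, indexed here by column name.
row_key = "TerryCho"
columns = [
    Column("emailAddress", "terry.cho@apache.org"),
    Column("gender", "male"),
]
row = {c.name: c for c in columns}

print(row_key, row["gender"].value)
```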

Column Family

When comparing Cassandra to a relational database, the column family is similar to a table in that it is a container for columns and rows.

In a relational database, you define tables, which have defined columns. The table defines the column names and their data types, and the client application then supplies rows conforming to that schema: each row contains the same fixed set of columns.

In Cassandra, you define column families. Column families can (and should) define metadata about the columns, but the actual columns that make up a row are determined by the client application. Each row can have a different set of columns. Although column families are very flexible, they are not entirely schema-less.

ex: phpcms = { article:"Cassandra" , chapter:"understanding about cassandra" }

Here phpcms is the key of the row, and the row has two columns. The column names are "article" and "chapter", and their values are "Cassandra" and "understanding about cassandra".

Let’s look at Column Family which has a number of different rows.

UserProfile = {
  phpcms = { article:"Cassandra" , chapter:"understanding about cassandra" }
  TerryCho = { emailAddress:"terry.cho@apache.org" , gender:"male" }
  Cath = { emailAddress:"cath@apache.org" , age:"20" , gender:"female" , address:"Seoul" }
}
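The UserProfile column family above can be mirrored with nested Python dictionaries to show what "each row can have a different set of columns" means in practice. This is a sketch of the data shape only, not a Cassandra client; the helper get_column is a hypothetical name.

```python
# UserProfile as a map of row key -> {column name: column value}.
user_profile = {
    "phpcms": {"article": "Cassandra", "chapter": "understanding about cassandra"},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {
        "emailAddress": "cath@apache.org",
        "age": "20",
        "gender": "female",
        "address": "Seoul",
    },
}


def get_column(column_family, row_key, column_name, default=None):
    """Look up one column value; a missing row or column yields the default."""
    return column_family.get(row_key, {}).get(column_name, default)


print(get_column(user_profile, "Cath", "address"))      # Seoul
print(get_column(user_profile, "TerryCho", "address"))  # None: this row has no such column
```

Note that a missing column is simply absent from the row. There is no NULL placeholder as there would be in a relational table with a fixed set of columns.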

There are two typical column family design patterns in Cassandra: static and dynamic column families.
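The difference between the two patterns can be sketched as follows, assuming the usual meaning of the terms: a static column family uses a fairly fixed set of column names for every row, while a dynamic column family lets the application invent column names at write time, so the name itself carries data. The page_views example, its timestamp column names, and the helper functions are hypothetical and exist only for illustration.

```python
# Static pattern: every row uses the same small, known set of column names.
static_users = {
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {"emailAddress": "cath@apache.org", "gender": "female"},
}

# Dynamic pattern: column names are created at write time and carry data.
# Here (hypothetically) each column name is an ISO timestamp and the value is a URL.
page_views = {}


def record_view(column_family, row_key, viewed_at, url):
    """Add one column to the row, creating the row on first write."""
    column_family.setdefault(row_key, {})[viewed_at] = url


def views_in_order(column_family, row_key):
    """Return the row's values ordered by column name (that is, by timestamp)."""
    row = column_family.get(row_key, {})
    return [row[name] for name in sorted(row)]


record_view(page_views, "Cath", "2012-09-26T10:05:00", "/articles/cassandra")
record_view(page_views, "Cath", "2012-09-26T09:30:00", "/home")
print(views_in_order(page_views, "Cath"))  # ['/home', '/articles/cassandra']
```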

Super Column & Super Column Family

Here the value of a column can itself contain columns. Such a column is called a super column. (This is similar to a Java Hashtable whose values are ValueObject instances stored as type Object.)

{ name:"username" ,
  value:{
    firstname:{ name:"firstname" , value:"PHP" } ,
    lastname:{ name:"lastname" , value:"CMS" }
  }
}

In the same way, a column family can contain super columns, which makes it a super column family:

UserList = {
  KRISH:{
    username:{ firstname:"Jhom" , lastname:"Yoon" }
    address:{ city:"Seoul" , postcode:"1234" }
  }
  JAMES:{
    username:{ firstname:"Terry" , lastname:"Cho" }
    account:{ bank:"hana" , accounted:"1234" }
  }
}

The UserList column family has two rows, with keys "KRISH" and "JAMES". Each row has two super columns: the "KRISH" row has "username" and "address", and the "JAMES" row has "username" and "account".
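The extra level of nesting in UserList can be sketched with one more layer of Python dictionaries: row key, then super column name, then column name. As before, this only models the shape of the data and is not a Cassandra API; the helper names get_subcolumn and super_columns_of are hypothetical.

```python
# UserList as row key -> super column name -> {column name: value}.
user_list = {
    "KRISH": {
        "username": {"firstname": "Jhom", "lastname": "Yoon"},
        "address": {"city": "Seoul", "postcode": "1234"},
    },
    "JAMES": {
        "username": {"firstname": "Terry", "lastname": "Cho"},
        "account": {"bank": "hana", "accounted": "1234"},
    },
}


def super_columns_of(super_column_family, row_key):
    """Return the sorted super column names present in one row."""
    return sorted(super_column_family[row_key])


def get_subcolumn(super_column_family, row_key, super_column, column_name):
    """Read one value: three lookups instead of the two a plain column family needs."""
    return super_column_family[row_key][super_column][column_name]


print(super_columns_of(user_list, "KRISH"))                     # ['address', 'username']
print(get_subcolumn(user_list, "JAMES", "username", "lastname"))  # Cho
```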


You have now learned the basics of Cassandra and its data model. Next, let's begin with installation and then play with keyspaces, column families, and columns.

Related Posts
Setting JAVA_HOME Environment Variable
Cassandra Installation and Configuration on windows

