SQL developers learning Cassandra will find the concept of primary keys very familiar. Primary keys allow the database to quickly return a single row by it’s key, or a collection of rows by a key range. Most relational databases also support creating additional (non-clustered) indexes to cover arbitrary columns.
CREATE INDEX ix_user_familyname ON user (familyname); CREATE INDEX ix_user_fullname ON user (firstname, familyname);
Updates to defined indexes happen transparently in the background whenever a new row is added. Performance becomes worse with a growing number of indexes, but the developer doesn’t have to worry about when and how the update actually happens. Using indexes in your query is easy, too. Most of the time the query optimizer can figure out which index to use to speed up your query most efficiently. So far for relational databases. Let’s see what Cassandra has in store for us.
Starting with Cassandra 0.7 we get the option to define custom indexes complementary to the primary key, called secondary indexes. Although very attractive and flexible at first glance, secondary indexes should be used with care, as they come with a limitations that potentially harm performance badly if used in the wrong ways. Most of all, secondary indexes are local to each node. There’s no way for Cassandra to tell which node contains the queried value, just based on a defined secondary index. Without providing an additional primary key, Cassandra would have to query each node one by one, which is certainly not what we want when creating large scale applications. As hitting the sweat spot for secondary indexes is hard, they should be used with care.
Using global index tables
As we know, doing lookups based on a primary keys is working great in Cassandra. So let’s just accept and embrace the idea of denormalizing data over multiple tables and build our own index tables this way. We simply have to come up with all the variants of our table with the corresponding primary keys for the queries we plan to use.
CREATE TABLE user_by_userid .. PRIMARY KEY(userid) CREATE TABLE user_by_username .. PRIMARY KEY(username) CREATE TABLE user_by_fullname .. PRIMARY KEY((firstname, familyname)) CREATE TABLE user_by_email .. PRIMARY KEY(email)
People not familiar with Cassandra can be reluctant to denormalize the same data across multiple tables. And they are probably right, as it’s quite annoying and error prone to create and maintain all those tables. Even more important, how do we keep all those tables updated and in a consistent state?
Manipulating index tables using batches
Adding data to a table in a relational database can become very slow in case many indexes have to be maintained. Insert statements only successfully complete after all indexes have been updated, which prevents inconsistencies from stale reads. Again, there’s no need for the developer to explicitly take care of updating indexes in a relational database, as this happens automatically in the background.
Although such strict consistency guarantees are not available in Cassandra, we can still make use of “atomic batches”. Using batches, all update statements are combined into a single batch statement handled by Cassandra. You’ll be able to put all updates to the individual index tables into a single batch without having to worry about writing your own retry logic and having your data end up in an inconsistent state in case the client crashes in between.
Ok, now that we’ve figured out to stick with index tables and batches, wouldn’t it be great if we’d not have to repeat ourselves by copying table definitions and doing batch updates for all the index tables for in project? How about having Cassandra to deal with this, so we can focus on other problems. Well, looks Cassandra 3.0 has something waiting for us!
Materialized Views in Cassandra 3.0
Not surprisingly “Materialized Views” is SQL jargon and should be well known to users of major proprietary databases or PostgreSQL. Although the Cassandra implementation details are much different, it shares the same idea of having multiple pre-computed views based on a single table. In case data in the base table gets updated, materialized views are transparently updated as well. In contrast to simple SQL views, materialized views are persistent and don’t have to be re-computed for each query.
So far for the theory. Let’s try them in practice.
Let’s create a rankings table that will store the result of athletes during a championship.
CREATE TABLE ranking ( year INT, place INT, athlete TEXT, PRIMARY KEY(year, place) );
INSERT INTO ranking (year, place, athlete) VALUES (1986, 1, 'Frank'); INSERT INTO ranking (year, place, athlete) VALUES (1986, 2, 'Tom'); INSERT INTO ranking (year, place, athlete) VALUES (1986, 3, 'Mike'); INSERT INTO ranking (year, place, athlete) VALUES (1987, 1, 'John'); INSERT INTO ranking (year, place, athlete) VALUES (1987, 2, 'Frank'); INSERT INTO ranking (year, place, athlete) VALUES (1987, 3, 'Tom'); INSERT INTO ranking (year, place, athlete) VALUES (1988, 1, 'Martin'); INSERT INTO ranking (year, place, athlete) VALUES (1988, 2, 'Steve'); INSERT INTO ranking (year, place, athlete) VALUES (1988, 3, 'Mike'); INSERT INTO ranking (year, place, athlete) VALUES (1989, 1, 'John'); INSERT INTO ranking (year, place, athlete) VALUES (1989, 2, 'Tom'); INSERT INTO ranking (year, place, athlete) VALUES (1989, 3, 'Mike');
Now we can start querying all results for a particular year.
SELECT * FROM ranking WHERE year = 1987; year | place | athlete ------+-------+--------- 1987 | 1 | John 1987 | 2 | Frank 1987 | 3 | Tom
Let’s assume we need to retrieve all results for a certain athlete. In this case we would have to use
athlete as the partitioning key instead of year.
CREATE MATERIALIZED VIEW ranking_by_athlete AS SELECT year, place FROM ranking WHERE athlete IS NOT NULL AND year IS NOT NULL AND place IS NOT NULL PRIMARY KEY(athlete, year, place) WITH CLUSTERING ORDER BY (athlete ASC, year DESC);
SELECT * FROM ranking_by_athlete WHERE athlete = 'Frank'; athlete | year | place ---------+------+------- Frank | 1987 | 2 Frank | 1986 | 1
Pretty cool, uh?
Now with the superpowers of MVs, we can basically re-arrange tables in any way we like, right? Not so fast.
CREATE MATERIALIZED VIEW champion AS SELECT athlete FROM ranking WHERE place = 1 AND athlete IS NOT NULL PRIMARY KEY(athlete);
We like to return all champions from all years in this view. The idea is to filter out all first ranking athletes and use their name as PK so we get a unique result.
Unfortunately this time Cassandra has to object:
SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 3:14 mismatched input '=' expecting K_IS (...athlete FROM ranking WHERE place [=] 1...)">
The issue here is that filtering is not allowed for the base query. This is something that’s on the list of future improvements but isn’t supported yet. Looks like CASSANDRA-9664 will bring filtering support for columns in the base table PK, but is still work in progress.
For now, let’s try using
place as clustering key and filter ‘
where place = 1’ while querying the MV.
CREATE MATERIALIZED VIEW champion AS SELECT athlete, place FROM ranking WHERE place IS NOT NULL AND athlete IS NOT NULL PRIMARY KEY(athlete, place);
Again, this is not going to work.
InvalidRequest: code=2200 [Invalid query] message= "Cannot create Materialized View champion without primary key columns from base ranking (year)"
Cassandra tells us that it won’t create the MV without
year as part of the primary key. That brings us to the next restriction for MV, which is the fact that you can’t collapse rows by changing the PK.
In our example, transforming the PK of the base table (
PRIMARY KEY(year, place)) into the PK of the MV (
PRIMARY KEY(athlete, place)) will not work, as there could be multiple rows for a given place and athlete in the base table, which would have to be merged into a single row in the MV. Cassandra won’t be able to collapse multiple rows into a single row for MVs and therefore will abort with an error message.
A primary key for a materialized view must be at least as restrictive as the corresponding PK of the base table.
One possible solution for our example would be to use
place as partition key and filter criteria:
CREATE MATERIALIZED VIEW champion AS SELECT * FROM ranking WHERE place IS NOT NULL AND athlete IS NOT NULL AND year IS NOT NULL PRIMARY KEY(place, year, athlete);
SELECT athlete FROM champion WHERE place = 1; athlete --------- Frank John Martin John
Unfortunately the result isn’t exactly what we wanted and we’d have to filter all duplicate names in our application code. In this case adding the names of all winners to a separate table would be a cleaner solution.
Creating materialized views for existing tables
So far we’ve seen how to create materialized views and how data inserted into our base table has been reflected there. However, you don’t always have to start with an empty table. Creating MVs on top of existing tables is possible, too. This is done by building the MV in the background until it’s finished, so the view won’t be available immediately. Meanwhile you’re able to get the current status from the system keyspace through
SELECT * FROM system.materialized_views_builds_in_progress
and a list of all finished tables by executing
SELECT * FROM system.built_materialized_views
Remember that Cassandra 3.0 hasn’t seen a RC yet, so certain aspects may still change. But so far I think this is a feature many developers have been waiting for.
- Removes the burden from the developers to maintain separate index tables
- Robust propagation of updates using batches
- Views can be queried just as any regular table
- Need to include all primary key columns of the base table
- The MVs primary key must contain exactly one key column that is not part of the base table’s PK (CASSANDRA-9928)
- All primary key columns need to be filtered against
- Filtering against other values in WHERE clause not possible (CASSANDRA-9664)
- The must be at least on column defined for the MV that is not part of the PK
- MV cannot be created on top of counter tables, other MVs or tables in other keyspaces
- Tables with static columns are not supported
gcGraceSecondsvalue of base table must be at least 3 hours to allow propagation across all MVs