At some point after you’ve gained a recognized level of expertise on a certain topic, you will probably be asked to conduct some kind of workshop and share what you know with your peers. Don’t miss this chance to step up from being a competent developer to becoming a force multiplier! Set a date, book a room, invite everyone in your company and then … spend the next couple of months creating training material and content for your workshop!
If you go to the Apache Cassandra homepage, you’ll find a big “contribute” button in the top menu. This will take you to a tutorial on the wiki on how to get started contributing to Cassandra. However, some topics have been left uncovered there, and I’d like to share some personal experience with them, so I decided to create this little walk-through for developers who are just starting to work on Cassandra.
What are prepared statements? Using a text-based query language such as CQL makes it convenient to interact with our Cassandra cluster, as statements will work across tools and APIs. Learning CQL allows you to use cqlsh and the driver for your favourite programming language, troubleshoot queries in log files, and understand table definitions; in general, CQL should be easy for human beings to read and write. However, CQL needs to be parsed each time before it can be executed, which takes a significant amount of CPU resources.
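The parse-once, execute-many idea behind prepared statements can be sketched with a toy model. This is not Cassandra's implementation: the class `ToyStatementCache`, the string splitting that stands in for real CQL parsing, and the `users` table are all assumptions made for illustration.

```python
import hashlib


class ToyStatementCache:
    """Toy model of statement preparation (hypothetical, not Cassandra code)."""

    def __init__(self):
        self._prepared = {}   # statement id -> parsed template
        self.parse_count = 0  # how often the expensive parse step ran

    def _parse(self, query):
        # Stand-in for real CQL parsing: split the text on '?' placeholders.
        self.parse_count += 1
        return query.split("?")

    def prepare(self, query):
        # Parse the text once and return an id the client can reuse.
        statement_id = hashlib.md5(query.encode("utf-8")).hexdigest()
        if statement_id not in self._prepared:
            self._prepared[statement_id] = self._parse(query)
        return statement_id

    def execute(self, statement_id, values):
        # Bind values to the already parsed template; nothing is parsed here.
        parts = self._prepared[statement_id]
        if len(values) != len(parts) - 1:
            raise ValueError("wrong number of bound values")
        bound = parts[0]
        for value, tail in zip(values, parts[1:]):
            bound += repr(value) + tail
        return bound


cache = ToyStatementCache()
stmt = cache.prepare("SELECT * FROM users WHERE id = ?")
results = [cache.execute(stmt, [user_id]) for user_id in (1, 2, 3)]
```

After the loop, `cache.parse_count` is still 1: the query text was parsed a single time, and each execution only sent an id plus the bound values.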
Why even monitor Cassandra? When I started developing software, there wasn’t a lot of hype about monitoring. We used to set up some kind of process monitoring that would raise alerts in case something crashed, and we also had tools collecting basic system metrics such as CPU and disk usage. But it never occurred to me to gather performance data on MySQL or Tomcat internals. Monitoring disk usage and memory consumption for those processes was enough in most cases.
SQL developers learning Cassandra will find the concept of primary keys very familiar. Primary keys allow the database to quickly return a single row by its key, or a collection of rows by a key range. Most relational databases also support creating additional (non-clustered) indexes to cover arbitrary columns.
CREATE INDEX ix_user_familyname ON user (familyname);
CREATE INDEX ix_user_fullname ON user (firstname, familyname);
Updates to defined indexes happen transparently in the background whenever a row is added or changed.
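To see such indexes at work on the relational side, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in relational database. The `id` column, the column types, and the sample row are assumptions; only the two index definitions come from the text above.

```python
import sqlite3


def index_demo():
    """Create the two indexes and check that a lookup by familyname uses one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, firstname TEXT, familyname TEXT)"
    )
    conn.execute("CREATE INDEX ix_user_familyname ON user (familyname)")
    conn.execute("CREATE INDEX ix_user_fullname ON user (firstname, familyname)")

    # The insert does not mention the indexes; the database maintains them.
    conn.execute(
        "INSERT INTO user (firstname, familyname) VALUES (?, ?)",
        ("Ada", "Lovelace"),
    )

    query = "SELECT firstname FROM user WHERE familyname = ?"
    rows = conn.execute(query, ("Lovelace",)).fetchall()

    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Lovelace",)).fetchall()
    details = [row[-1] for row in plan]
    conn.close()
    return rows, details


rows, details = index_demo()
```

The returned `details` strings name the index SQLite picked for the lookup, which shows that the application code never has to reference an index explicitly.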
What does “repair” mean in the context of Cassandra? Repairing data can mean different things, so let’s first clarify what “repair” means in the context of Cassandra. Usually you will do repairs to fix corrupted data, e.g. in your file system or block-level device. There are many strategies to make this possible, such as write-ahead logs or checksums. In some cases corrupted data can be corrected automatically for you. Parity bit checks in RAID controllers will allow fixing corrupted blocks transparently.
In order to understand how batches work under the hood, it’s helpful to look at the individual stages of the batch execution.
The client
Batches are supported using CQL3 or modern Cassandra client APIs. In each case you’ll be able to specify a list of statements you want to execute as part of the batch, a consistency level to be used for all statements and an optional timestamp. You’ll be able to batch execute INSERT, DELETE and UPDATE statements.
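As a rough illustration of what a client assembles, the sketch below renders a CQL3 batch from a list of statements and an optional timestamp. The helper `build_batch` and the `users` and `sessions` tables are hypothetical. The consistency level is deliberately left out here, on the assumption that it is set through the driver or cqlsh rather than in the statement text.

```python
ALLOWED_PREFIXES = ("INSERT", "UPDATE", "DELETE")


def build_batch(statements, timestamp=None):
    """Render a CQL3 BATCH from individual statements (hypothetical helper)."""
    if not statements:
        raise ValueError("a batch needs at least one statement")
    for statement in statements:
        # Only data-modifying statements may be part of a batch.
        if not statement.lstrip().upper().startswith(ALLOWED_PREFIXES):
            raise ValueError("not allowed in a batch: " + statement)

    header = "BEGIN BATCH"
    if timestamp is not None:
        # Optional write timestamp applied to all statements in the batch.
        header += f" USING TIMESTAMP {int(timestamp)}"

    # Normalize each statement to end with exactly one semicolon.
    body = [f"  {s.strip().rstrip(';')};" for s in statements]
    return "\n".join([header, *body, "APPLY BATCH;"])


batch = build_batch(
    [
        "INSERT INTO users (id, name) VALUES (1, 'ada')",
        "DELETE FROM sessions WHERE user_id = 1;",
    ],
    timestamp=1234,
)
```

Passing a SELECT to `build_batch` raises a `ValueError`, mirroring the rule that only INSERT, UPDATE and DELETE can be batched.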