Airports at Scale
Assessing NoSQL Databases
In a previous blog post, I explained the assessment we were using at Lunatech, detailing a solution using a common tech stack, Java with Spring Boot and PostgreSQL. An idea mentioned was that it would’ve been possible to also develop a solution using a NoSQL database, and since we did receive such solutions, this got me thinking: could every NoSQL category easily solve each requirement? What would be the advantages of using them over SQL? More importantly, do they fit in the problem and domain?
In this series of posts I want to explore these questions, where each post focuses on a specific member of each NoSQL category.
You can take a look at the assessment in the previous post, but as a quick refresher, we had to meet four requirements:
Query Option will ask the user for the country name or code and print the airports & runways at each airport. Bonus point, allow fuzzy search.
Report: 10 countries with highest number of airports (with count) and countries with lowest number of airports.
Report: Type of runways (as indicated in "surface" column) per country
Bonus Report: Print the top 10 most common runway identifications (indicated in "le_ident" column)
NoSQL came out as an effort to solve difficulties imposed by traditional relational databases, namely the need to scale to a magnitude not traditionally achieved. Unlike RDMS, NoSQL is actually an umbrella term that groups different implementations such as: document, key-value, column families, graph and so on. Where each offers its own set of features and solve a specific set of problems, all with their own limitations.
As developers, coming in contact with each category creates the question: How do we model our data? (ie. How can I store and read data back?). This usually poses a problem, since we are so used to our normal forms, which don’t work on these databases, thus we need to learn new tricks to answer that question. Borrowing from NoSQL Distilled, we no longer deal with tables, rows and relationships, but with aggregates. An aggregate is a unit of data we desire to manipulate, while maintaining atomicity and consistency within the limits of that aggregate (this will become evident when we start developing this series). To be more concrete, an example of aggregate would be a single document in a database such as MongoDB. NoSQL databases put additional emphasis on how the data will be queried, whereas with normal forms we just try to make the data as DRY as possible.
The most common recommendation is to not treat each NoSQL implementation as a direct replacement for RDMS, but to first understand what problem that database is trying to solve, and then see if that solution fits your problem (and not the other way around). For this series I would like to disregard that recommendation in favor of feeling out when a NoSQL instance is not helping us achieve our goal. Only by trying to use a tool in a way it was never intended to be used can we understand where we shouldn’t use it. I’ll try to keep the use of every instance as pure to its original model as possible—over the years these databases have added features such as full text search, geospatial search and so on.
This series will mainly explore how to adapt and implement ideas that were originally planned to be done with SQL, with NoSQL technologies. Focusing on modelling data, and leaving scaling for last. The plan is to take gradual steps. You can check the progress by following the updates on this GitHub repository. We’ll take a look at each major NoSQL category, picking the following implementations:
Wide Column: Cassandra
See you in the next blog post where we’ll see how to use Redis!