Data storage is a major part of a typical cloud application, and the choice of database type and platform is usually something you need to settle early on. Let’s explore the database options available on Amazon Web Services (AWS).
There are two approaches to databases on AWS: using managed database services, or hosting your own database platform on your own Amazon EC2 instances. (If you’re interviewing for an AWS-related job, there’s a good chance your interviewer will ask about this; EC2 plays a huge part in most companies’ AWS-based infrastructure.)
In the world of relational databases (such as MySQL and Oracle), AWS offers managed services that handle database engine updates and backups for you. Really, all you do is fill out a wizard to create your databases, and AWS handles the rest. These managed relational databases are available under an umbrella service called Amazon Relational Database Service (RDS). RDS offers a choice of database engines: MySQL, MariaDB, PostgreSQL, Oracle, SQL Server, and AWS’s own engine, Aurora.
Aurora is an engine that provides compatibility with either MySQL or PostgreSQL; you choose which compatibility you want when you create the database. Oracle and SQL Server, as you can probably guess, come with an extra licensing fee.
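If you’d rather script database creation than click through the console wizard, the AWS SDK exposes the same choices. The sketch below, in Python with boto3, assembles the keyword arguments for the RDS `create_db_instance` call. The engine identifiers `mysql`, `mariadb`, and `postgres` are the ones RDS uses; the helper names, instance identifier, instance class, storage size, and admin username are illustrative assumptions. Aurora is left out because it’s created as a cluster (`create_db_cluster`) to which instances are then added, and Oracle and SQL Server use edition-specific engine identifiers.

```python
from typing import Any

# Engine identifiers as RDS expects them in the Engine parameter.
# Only the open-source engines are covered in this sketch.
OPEN_SOURCE_ENGINES = {"mysql", "mariadb", "postgres"}


def build_rds_request(
    identifier: str,
    engine: str,
    instance_class: str = "db.t3.micro",  # assumption: a small burstable class
    storage_gb: int = 20,                  # assumption: minimal allocated storage
) -> dict[str, Any]:
    """Assemble keyword arguments for the RDS create_db_instance call."""
    if engine not in OPEN_SOURCE_ENGINES:
        raise ValueError(f"engine not covered by this sketch: {engine}")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        # RDS always requires an instance class; the instance is what
        # you're billed for by the hour.
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,
        "MasterUsername": "dbadmin",  # hypothetical admin user name
        # Let RDS generate the password and store it in Secrets Manager.
        "ManageMasterUserPassword": True,
    }


def create_rds_instance(params: dict[str, Any]) -> dict[str, Any]:
    """Send the request to AWS (needs boto3 and configured credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds")
    return rds.create_db_instance(**params)
```

Calling `create_rds_instance(build_rds_request("demo-db", "mysql"))` would start a real, billable instance, so keeping the request builder separate makes it easy to review the parameters first.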
For non-relational databases, AWS offers a few different options, including:
- DocumentDB: This is AWS’s answer to a managed form of MongoDB. It’s proprietary code, but compatible with MongoDB clients.
- DynamoDB: This is AWS’s long-running (since 2012), fully managed NoSQL engine.
- Neptune: This is AWS’s proprietary graph database. It supports open graph query languages such as Apache TinkerPop Gremlin and SPARQL. Graph databases are useful in applications such as social networks.
- QLDB: This stands for Quantum Ledger Database; it’s an immutable ledger whose change history is cryptographically verifiable.
In every case mentioned above except DynamoDB and QLDB, when you allocate a database, you have to specify a database instance type. That brings me to an important point about the cost of managed hosting versus hosting it yourself. These database instances are allocated much the way an EC2 instance is, and they start from images supplied by AWS. That means that, although technically “managed,” there’s still a server running under your own account, and you’re charged for it by the hour.
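For contrast, DynamoDB (one of the exceptions above) doesn’t ask for an instance type at all. This hedged sketch builds the arguments for boto3’s DynamoDB `create_table` call; with `BillingMode="PAY_PER_REQUEST"` (on-demand capacity), you pay for requests and storage rather than server-hours. The table name and the single string partition key `pk` are assumptions for illustration.

```python
from typing import Any


def build_dynamodb_table_request(table_name: str) -> dict[str, Any]:
    """Assemble keyword arguments for the DynamoDB create_table call."""
    return {
        "TableName": table_name,
        # A single string partition key named "pk" (an illustrative choice).
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        # On-demand capacity: no instance type, no provisioned throughput.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_dynamodb_table(params: dict[str, Any]) -> dict[str, Any]:
    """Send the request to AWS (needs boto3 and configured credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("dynamodb").create_table(**params)
```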
If you already have a few EC2 servers running, you may well be able to install your database engine of choice on a couple of them, saving money in the process. The tradeoff is that you have to manage the databases yourself: you need to know how to handle replication and sharding, along with the usual chores such as security and backups.
For the managed services where you choose a database instance type, you’ll also want to carefully compare the prices of the database instances against regular EC2 instances.
Database instance prices tend to be a bit higher than those of a comparable EC2 instance, and higher still for Oracle and SQL Server. So even if you don’t have spare EC2 capacity, you might still consider allocating your own EC2 servers and hosting the database engines yourself, skipping the AWS management, to save a few bucks.
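To make that comparison concrete, multiply each hourly rate by the hours in a month. The sketch below uses made-up placeholder rates, not real AWS prices; look up current on-demand pricing for your region and instance class before drawing conclusions. It assumes the instances run around the clock (about 730 hours per month) and ignores storage, I/O, and data transfer.

```python
HOURS_PER_MONTH = 730  # 24 hours x ~30.4 days, a common monthly approximation


def monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Cost of running `instances` servers around the clock for a month."""
    return hourly_rate * HOURS_PER_MONTH * instances


def managed_premium(managed_rate: float, ec2_rate: float, instances: int = 1) -> float:
    """Extra monthly spend for managed DB instances over comparable EC2 ones."""
    return monthly_cost(managed_rate, instances) - monthly_cost(ec2_rate, instances)


# Hypothetical placeholder rates in USD per hour, NOT real AWS prices.
managed_rate = 0.20
ec2_rate = 0.16
print(f"managed:     ${monthly_cost(managed_rate):.2f}/month")
print(f"self-hosted: ${monthly_cost(ec2_rate):.2f}/month")
print(f"premium:     ${managed_premium(managed_rate, ec2_rate):.2f}/month")
```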
Tip: RDS does offer a free tier, which you can use if your account is still eligible for the AWS Free Tier, but the server isn’t very big.
Now a quick note on DynamoDB. AWS provides a tool called DynamoDB Local that lets you run a local version of DynamoDB. However, it’s meant for development and testing only; don’t use it in production. In other words, there’s no supported way to host DynamoDB yourself; you only get the hosted option.
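One way to keep DynamoDB Local confined to testing is to point the SDK at it only when explicitly asked. The hedged sketch below builds boto3 client arguments and adds `endpoint_url` (DynamoDB Local listens on port 8000 by default) only when a hypothetical `USE_DYNAMODB_LOCAL` environment variable is set to `1`; otherwise the client talks to the hosted service. The variable name and the `us-east-1` fallback region are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional

LOCAL_ENDPOINT = "http://localhost:8000"  # DynamoDB Local's default port


def dynamodb_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Choose between DynamoDB Local (tests only) and the hosted service."""
    env = os.environ if env is None else env
    # Assumed fallback region when AWS_REGION isn't set.
    kwargs: dict[str, Any] = {"region_name": env.get("AWS_REGION", "us-east-1")}
    # Hypothetical opt-in flag: production code never sets it.
    if env.get("USE_DYNAMODB_LOCAL") == "1":
        kwargs["endpoint_url"] = LOCAL_ENDPOINT
    return kwargs


def make_dynamodb_client(env: Optional[Mapping[str, str]] = None):
    """Create the client (needs boto3 installed)."""
    import boto3  # imported lazily so the config helper works without boto3

    return boto3.client("dynamodb", **dynamodb_client_kwargs(env))
```

Because the local endpoint is opt-in, forgetting to set the flag falls back to the real service rather than silently sending production traffic to a local test database.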
To Host or Not to Host?
Had I written this article a year ago, I would have said that I always had EC2 servers running, and I could always find space on them to install database software myself (usually MongoDB or MySQL, in my case). And since both are free and open source, I don’t incur any extra charges if my instances are running anyway… so for MySQL and MongoDB, I didn’t need a database-as-a-service (DBaaS) offering.
But times change! Like many other organizations, mine is moving toward “serverless”: we’re moving our code off EC2 instances and into AWS Lambda. Soon we might not have a spare EC2 instance to drop an open-source database package onto. Still, the question remains: isn’t it cheaper to allocate general-purpose (non-database) instances and install the software yourself? Probably.
Conclusion: AWS Offers Options... and Complications
Ultimately, a simple cost analysis is probably in order. If you have spare EC2 capacity, you’ll likely find it cheaper to install the database engines yourself. Even if you don’t, you’ll still want to compare the prices. Then factor in your own skills, and whether you have the ability and bandwidth to manage multiple instances of MySQL, PostgreSQL, MongoDB, and so on.
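One way to fold skills and bandwidth into the numbers is to put a price on the admin time that self-hosting takes. The sketch below is a back-of-the-envelope model, not a pricing tool: every input (hourly rates, admin hours, labor cost) is a hypothetical figure you’d replace with your own, and it treats spare capacity on an existing EC2 instance as costing nothing extra.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # ~24 hours x 30.4 days


@dataclass
class Option:
    name: str
    instance_rate: float  # USD per instance-hour (0.0 when reusing spare EC2 capacity)
    instances: int
    admin_hours: float    # monthly hours spent on patching, backups, replication
    labor_rate: float     # USD per admin hour

    def monthly_total(self) -> float:
        infra = self.instance_rate * HOURS_PER_MONTH * self.instances
        labor = self.admin_hours * self.labor_rate
        return infra + labor


def cheapest(options: list[Option]) -> Option:
    return min(options, key=lambda o: o.monthly_total())


# All numbers below are hypothetical placeholders, not real prices.
options = [
    Option("managed RDS", instance_rate=0.20, instances=2, admin_hours=2, labor_rate=60),
    Option("self-hosted on new EC2", instance_rate=0.16, instances=2, admin_hours=10, labor_rate=60),
    Option("self-hosted on spare EC2", instance_rate=0.0, instances=2, admin_hours=10, labor_rate=60),
]
for o in options:
    print(f"{o.name:26s} ${o.monthly_total():8.2f}/month")
print("cheapest:", cheapest(options).name)
```

With these made-up inputs the managed option comes out ahead because labor dominates; cut the self-hosted admin hours to five and the spare-capacity option wins instead. The point is the shape of the calculation, not the result.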
For me personally, I’ll probably still stick to self-hosting for the time being. A year from now, maybe not. For any technologist, flexibility is key.