Stupid Things - Vol 1

25th May 2020

Tags: Stupid Things, Azure, CosmosDb, Application Insights, Cost Saving

As engineers, we make a lot of mistakes. Some people feel ashamed or embarrassed by their mistakes, but when it comes to building things, I like to wear mine like a badge of honour. Mistakes are human; they humble us, and if we are wise enough, we can learn from them. This is why I'm proud of my mistakes and the bad code I've written. When I realise I have made a mistake, I know I have learned something. An old colleague once told me that if you don't look back on old code and think "what the hell was I thinking??", it might be a bad sign that you haven't learned anything since you wrote it!

In this series I'm going to cover some of the stupid mistakes I've made in the past while building things, what I learned and how I fixed the issues. In this first installment, I'm going to talk about a decision I made over two years ago that I only recently fixed. All told, it ended up costing me personally about €1000 over the last two years.


The TL;DR


I was using Azure Cosmos Db to store short-term analytics data for a website, with no TTL (time to live) set on the data. The website is one I run for a family member, and I absorb its running costs into the monthly bill for a few different things I manage. Cosmos Db ended up being a fairly large chunk of those monthly costs and has cost me about €1000 over the past two years. I replaced it with Azure Application Insights to solve the problem.


The Problem


The first iteration of the website was built on a Windows VM running in Azure. This was a standard Windows Server 2016 instance onto which I had installed IIS (as the site's web server), Microsoft SQL Server and MongoDb. I had manual deployment scripts that would build and deploy new versions of the website, and I would also manually maintain and back up the databases and install security patches. The overhead of these tasks was starting to annoy me, and in my constant quest to learn new things, I wanted to try out some new stuff.

The old website ran on .NET Framework 4.6. At this time I was starting to get interested in .NET Core and thought porting the website would be a fun learning experience. At the same time as moving to .NET Core, I decided to move everything onto managed Azure services: the website would run as an Azure App Service, file storage would move to Azure Blob Storage and the SQL database would become an Azure SQL instance. Now we come to the main topic of this article.

You may have noticed earlier that the old Windows VM also ran an instance of MongoDb. When I originally designed this website, there was a requirement for a dashboard that would render some custom business metrics. At this point in my career I hadn't really been exposed to any systems for gathering and querying business metrics. I didn't want to add metrics to the SQL database, I wanted something that could store arbitrary blobs of data, and I was also interested in non-relational databases. With this set of requirements and some ideas, I decided to use MongoDb as the data store for arbitrary analytics metrics.

Let's pause for a minute and look at some of the poor decisions already made. Ultimately, most of the data I ended up storing for the analytics dashboard was things like page counts, and there are well-established ways to store that kind of data in a SQL database, such as a table of per-page daily counters. I didn't really need much else in the end, so being able to store arbitrary blobs of data wasn't something I actually needed (but I wanted to anyway ¯\_(ツ)_/¯). Call it curiosity and wanting to build something new and cool: fun, but not really practical. A little knowledge is also a dangerous thing. I knew that I could use Mongo to store blobs of data, but I didn't know the first thing about doing it well. I also didn't know how to do efficient aggregations in Mongo (is that an oxymoron? I don't know, but I don't use Mongo anymore and don't really intend to in the future, so maybe I'll never know).
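For comparison, here is a minimal sketch of one common SQL approach to page counts: a counter table keyed by page and day, incremented with an upsert. It uses Python's built-in `sqlite3` so it runs anywhere; the table and function names (`page_views`, `record_view`, `views_since`) are hypothetical, and on Azure SQL you would express the upsert with `MERGE` (or an `UPDATE` followed by an `INSERT`) rather than SQLite's `ON CONFLICT` clause.

```python
import sqlite3
from datetime import date
from typing import Dict, Optional


def open_counter_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database holding one counter row per page per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS page_views (
            page  TEXT    NOT NULL,
            day   TEXT    NOT NULL,          -- ISO date, e.g. '2020-05-25'
            views INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (page, day)
        )
        """
    )
    return conn


def record_view(conn: sqlite3.Connection, page: str, day: Optional[date] = None) -> None:
    """Increment the counter for a page on the given day (defaults to today)."""
    day_key = (day or date.today()).isoformat()
    # Upsert: insert a new row with views = 1, or bump the existing row.
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.execute(
        """
        INSERT INTO page_views (page, day, views) VALUES (?, ?, 1)
        ON CONFLICT (page, day) DO UPDATE SET views = views + 1
        """,
        (page, day_key),
    )
    conn.commit()


def views_since(conn: sqlite3.Connection, since: date) -> Dict[str, int]:
    """Total views per page from `since` onwards (ISO dates compare correctly as text)."""
    rows = conn.execute(
        "SELECT page, SUM(views) FROM page_views WHERE day >= ? GROUP BY page",
        (since.isoformat(),),
    )
    return dict(rows.fetchall())
```

Because each row is a counter rather than a raw event, storage grows with pages × days instead of with traffic, and a scheduled `DELETE FROM page_views WHERE day < ?` keeps a fixed retention window.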

Back to the point at hand: I had replacements for almost all of the dependencies that previously ran on the old VM, but I still needed an Azure service to replace MongoDb. Enter Azure Cosmos Db. Cosmos is a managed Azure service that provides a non-relational data store and, wouldn't you know it, it has a MongoDb-compatible API! Just my luck: all of my dependencies were now covered, happy days. Armed with my plans to port the website, I started rebuilding things, spun up all of my new Azure services and cut over to my new world. Job done, everything worked like a treat... for the first few months... 😬

After a while, I noticed that my Azure bill kept increasing month on month. The site's traffic was fairly steady and resource consumption was flat, but the bill kept growing, and the line item driving the increase was Cosmos. Proving again that a little knowledge is a dangerous thing, I was storing heaps of analytics data in Cosmos without a TTL. The analytics data never needed to be more than about a month old, so I had loads of stale data sitting there driving up the cost. Cosmos bills for throughput (request units) and for total storage, so every extra month of data I kept made the bill bigger. At the time, I didn't even know that you could set a TTL on data in Cosmos, so I would occasionally clear down the Db manually with some scripts I had written (shout out to my good friend Eoin for pointing me to Cosmos TTL and allowing me to save a little money before implementing my fix).
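For reference, Azure's documentation for Cosmos Db's API for MongoDB describes enabling expiry by creating a TTL index on the `_ts` system field (the document's last-modified timestamp). Below is a minimal sketch using `pymongo`, assuming a 30-day retention window; the helper names are hypothetical, and the helper takes the collection as a parameter so the logic does not depend on a live connection.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days: int) -> int:
    """Convert a retention window in days to the seconds value a TTL index expects."""
    if days <= 0:
        raise ValueError("retention window must be at least one day")
    return days * SECONDS_PER_DAY


def ensure_ttl_index(collection, days: int = 30) -> str:
    """Create a TTL index on Cosmos Db's `_ts` field for a pymongo Collection.

    Cosmos then deletes documents in the background once their last write is
    more than `days` old. Returns the index name reported by pymongo.
    """
    return collection.create_index([("_ts", 1)], expireAfterSeconds=ttl_seconds(days))
```

Usage would look like `ensure_ttl_index(MongoClient(conn_str)["site"]["analytics"])`, where `conn_str`, `site` and `analytics` stand in for your own connection string, database and collection names.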

The nasties continue... My naive implementation of the dashboard and the design of my schema made the dashboard painfully slow to load. This isn't something I am proud of, but no one complained about how slow it was, and it worked, so I left it as it was. So now we have two issues: it's dog slow, and it costs a lot to run.


The Fix


Recently, I was moving some Azure resources around when I was reminded of my billing issues in Cosmos and the sad state of affairs with the performance of the dashboard. I cursed myself for literally paying for my mistakes for so long and decided to fix the problem.

In the two years since building the original system, I have learned a lot, and the systems I build now are a lot better than they used to be. I have spoken about it before, but one of my favourite tools these days is Azure Application Insights. As well as monitoring and logging, App Insights lets you log custom metrics from within your app. I realised that all of the old dashboard metrics could be logged using App Insights. Its default retention of 90 days would easily give me access to the month's worth of data the dashboard needs, and the App Insights query API would let me query the data much faster than the old aggregations in Cosmos. Finally, the cherry on top: at the data volumes running through it, the cost of App Insights is almost negligible!!
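As a rough sketch of the querying side: the Application Insights REST API, as I understand it, accepts a Kusto (KQL) query via `GET https://api.applicationinsights.io/v1/apps/{app-id}/query` with an `x-api-key` header and returns JSON. The metric name `PageView`, the 30-day window and the helper names below are hypothetical, and Microsoft has been changing the recommended authentication options for this API, so check the current docs before relying on API keys.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_ROOT = "https://api.applicationinsights.io/v1/apps"

# Daily totals for a hypothetical custom metric "PageView" over the last 30 days.
# customMetrics rows may be pre-aggregated by the SDK, so summing valueSum
# counts every tracked value rather than one value per row.
DAILY_PAGE_VIEWS_KQL = """customMetrics
| where timestamp > ago(30d)
| where name == "PageView"
| summarize views = sum(valueSum) by day = bin(timestamp, 1d)
| order by day asc"""


def build_query_request(app_id: str, api_key: str, kql: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the App Insights query endpoint."""
    url = (
        f"{API_ROOT}/{urllib.parse.quote(app_id)}/query?"
        + urllib.parse.urlencode({"query": kql})
    )
    return urllib.request.Request(url, headers={"x-api-key": api_key})


def run_query(app_id: str, api_key: str, kql: str = DAILY_PAGE_VIEWS_KQL) -> Dict[str, Any]:
    """Send the query and return the decoded JSON response."""
    request = build_query_request(app_id, api_key, kql)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Doing the aggregation in KQL means the dashboard only ever downloads one row per day, rather than pulling raw events and aggregating them itself.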

In what ended up being a pretty quick process, I added App Insights to the website and implemented the custom metrics. I removed the old code that logged the metrics to Cosmos and rewrote the dashboard to be powered directly from the App Insights API. I have now turned off Cosmos Db, and it no longer haunts my billing statements. Everyone loves a quick and easy solution to a problem!
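On the dashboard side, the query API's JSON response (as I understand its shape: a `tables` list, each table with `columns` and `rows`) needs a little massaging before it can be charted. Below is a hedged sketch with hypothetical helper names, where the caller passes in the time and value column names. It also fills days that have no data with zero, since a KQL `summarize ... by bin()` only returns rows for bins that actually contain data.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple


def table_to_records(payload: Dict[str, Any], table_index: int = 0) -> List[Dict[str, Any]]:
    """Turn one result table ({"columns": [...], "rows": [...]}) into a list of dicts."""
    table = payload["tables"][table_index]
    names = [column["name"] for column in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def daily_series(
    records: List[Dict[str, Any]], time_col: str, value_col: str, start: date, end: date
) -> List[Tuple[str, float]]:
    """One (ISO day, value) point per day from start to end inclusive; gaps become 0."""
    # Timestamps arrive as ISO 8601 strings such as "2020-05-24T00:00:00Z";
    # the first 10 characters are the calendar day.
    by_day = {str(record[time_col])[:10]: record[value_col] for record in records}
    points = []
    day = start
    while day <= end:
        key = day.isoformat()
        points.append((key, by_day.get(key, 0)))
        day += timedelta(days=1)
    return points
```

Filling the gaps on the server side is also possible with KQL's `make-series`, but doing it here keeps the query simple and the chart's x-axis predictable.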

I am proud to have made this mistake and to have fixed it. I am proud of how much I have learned over the last two years and how that enabled me to fix the problem. I don't plan to ever stop learning, and I don't think anyone should! The money this mistake cost me is spent and gone, so there is no point regretting it now. The experience I gained from it is priceless.


Keep on making mistakes!
- Ian