Stupid Things - Vol 2
28th Oct 2020
Tags: Stupid Things, Security, Bots, reCaptcha
In this second installment of stupid things, I'm going to talk about how a silly oversight made years ago led to Wodly's sign up flow being hijacked by a bot, effectively turning the system into a spam engine!
I had a sign up page without any bot protection. This wasn't generally a problem until someone started spamming the page, creating tonnes of fake users. The big issue was that I had no way of knowing whether the bot was using valid email addresses belonging to real people. The system sends an account confirmation email, so my system was potentially being used to spam real people on behalf of a website they never signed up for. I added reCAPTCHA to solve the problem.
While running Wodly over the past two years, I have noticed lots of users signing up but never really "activating" on the platform. The most basic activation metric tracked in Wodly is whether the user has confirmed their email address. The flow for this is pretty simple: the user fills out the sign up form, they see a message telling them to check their inbox for a confirmation email, they click the link in that email and they are in! Simple as it is, I always noticed lots of people signing up and never confirming their email.
I never put much thought into this. I figured that maybe their email provider had flagged the message and sent it to spam. Or maybe they saw the confirmation email and just couldn't be bothered to finish. Or maybe they were bots that weren't able to click a link. In any case, there was usually only a small trickle of these, building up slowly over time, and it was never enough of an issue for me to bother setting up any bot protection in the sign up flow.
That was until a bot found its merry way into the sign up flow for Wodly. I first noticed an unusual bump in traffic originating from France (maybe a French teenager bored during lockdown, maybe a server running in France with the bot on it, maybe a fake IP address ¯\_(ツ)_/¯). Then, over the next few days, the number of unconfirmed emails in my activation metrics skyrocketed.
Compared with the previous trend in my activation metrics, this looked strange, so I decided to randomly sample new sign ups from the past few days. Very quickly I noticed some odd things in the new user data: users whose first and last names were both just the letter "a", nonsense email addresses, and so on. It was fairly easy to see these were being generated by a program that was simply meeting the minimum requirements of the form validation in the UI. On its own this wasn't much of a problem: without confirming their email addresses, the fake users couldn't log in and do any spamming on the platform itself. However, in my analysis I noticed that some of the email addresses appeared to be valid and very possibly belonged to real people.
Now we definitely had a problem. Aside from making my activation metrics fairly meaningless, this bot had effectively turned Wodly into a spam engine. To be fair, the bot couldn't use Wodly to send people anything other than the email confirmation message. However, if enough real people marked that message as spam, it could mean a bad time for the platform in the future. A blacklisted email domain is likely to be blocked by most email providers, which would mean people could no longer sign up for Wodly because they would never receive the confirmation email.
I knew the bot had been actively creating users on the system for around a day or so. I also knew I would need a solid block of a few hours to implement, test and release some means of preventing the attack. With the time available to me, I decided the first step had to be mitigation: I needed to stop the attack in its tracks for long enough to get the fix out. With sign up volume on Wodly generally low, I took the drastic, but ultimately right, decision to disable the sign up flow. I don't know if this affected anyone other than the bot trying to sign up at the time (sorry if it did 😬), but it had to be done to stop the spam until the fix was out.
With the mitigation in place, I could breathe for a second and turn my attention to the best way to prevent this issue in the future. My first thought was a simple CAPTCHA (which apparently stands for "Completely Automated Public Turing test to tell Computers and Humans Apart" - TIL). This would hopefully stop bots from repeatedly spamming the sign up page: each attempt by a bot would fail validation, and the request would result in a NOOP in the system, with no user created and no email sent.
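In code, that NOOP mostly comes down to ordering: check the CAPTCHA before touching anything else. The sketch below is TypeScript rather than the dotnet core Wodly actually runs on, and `handleSignUp`, `SignUpDeps` and the injected functions are hypothetical names for illustration. The verifier, user store and mailer are passed in so the "reject early, no side effects" rule is easy to see and test.

```typescript
// Hypothetical sign up handler illustrating "fail the CAPTCHA, do nothing".
// All names here are illustrative; they are not Wodly's real code.

interface SignUpForm {
  firstName: string;
  lastName: string;
  email: string;
  captchaToken: string; // token produced by the CAPTCHA widget in the browser
}

interface SignUpDeps {
  verifyCaptcha: (token: string) => Promise<boolean>;
  createUser: (form: SignUpForm) => Promise<void>;
  sendConfirmationEmail: (email: string) => Promise<void>;
}

type SignUpResult = "created" | "rejected";

async function handleSignUp(form: SignUpForm, deps: SignUpDeps): Promise<SignUpResult> {
  // Verify first: a request that fails the check causes no writes and no emails.
  const passed = await deps.verifyCaptcha(form.captchaToken);
  if (!passed) {
    return "rejected"; // the NOOP path
  }

  // Only a request that passed the check creates a user and sends the confirmation email.
  await deps.createUser(form);
  await deps.sendConfirmationEmail(form.email);
  return "created";
}
```

Keeping the check as the very first step means a bot can hammer the endpoint as much as it likes without ever reaching the code that sends email to real people.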
I figured I could bash this out in a few hours. CAPTCHA is a pretty standard and well-understood mechanism for protecting a website, but I ran into a few bits of trouble. Firstly, let's talk about a silly assumption I made about what exactly I was going to implement. I thought I could find a simple, basic CAPTCHA that would work out of the box in dotnet core. I was definitely a little overoptimistic in thinking I could just install a package and it would all just work. I found some packages that appeared to do what I wanted, but when I started to implement them I ran into issues: they wouldn't work with my version of dotnet core, or they didn't fit easily with the architecture of the app... great. In desperation, I turned to some hand-rolled versions I found online and tried to re-implement them in dotnet core. That way lay madness, so I stopped.
I was starting to tie myself up in knots trying to do this on the cheap with a bare-minimum, "that will do the job" solution. Ultimately, I realised I was trying to do the wrong thing. Why was I trying to re-implement something in a basic way? There is an industry standard for this, so why wasn't I just using that and doing it properly? After all, I actually needed this thing to be good and to stop future bot attacks. With my sanity restored, I went and looked at reCAPTCHA, a great free tool from Google that lets you add a simple and highly effective CAPTCHA to your site.
To use reCAPTCHA, you register the domain you wish to protect and Google gives you a pair of keys: a site key used by the front end and a secret key used by your backend. The implementation is easy enough. On your site, you run a JS snippet provided by Google on the page you wish to secure. Any POST requests from this page to your backend will now contain a token generated by the JS API (some configuration required). In your backend, you then pull out the token passed from the front end and validate it by sending it, along with your secret key, to the reCAPTCHA API.
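For a feel of how little the backend half involves, here is a minimal sketch in TypeScript (Wodly's backend is dotnet core, so this illustrates the idea rather than showing the code I wrote). It assumes Node 18+ for the global `fetch`, and the reCAPTCHA v2 checkbox widget, which adds a `g-recaptcha-response` field to the form it sits in. The `siteverify` endpoint, its `secret`/`response`/`remoteip` parameters and the `success`/`hostname`/`error-codes` response fields come from Google's verification API; the function names are hypothetical, and the hostname comparison is an optional extra check that mostly matters if domain validation is switched off in the reCAPTCHA admin console.

```typescript
// Minimal sketch of server-side reCAPTCHA token verification.
// Assumes Node 18+ (global fetch) and reCAPTCHA v2; function names are hypothetical.

const SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// The subset of the siteverify JSON response this sketch uses.
interface SiteVerifyResponse {
  success: boolean;
  hostname?: string;
  "error-codes"?: string[];
}

// Build the form-encoded body that the siteverify endpoint expects.
function buildVerifyBody(secret: string, token: string, remoteIp?: string): URLSearchParams {
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) {
    body.set("remoteip", remoteIp); // optional: the user's IP address
  }
  return body;
}

// Decide whether a verification result should let the request through.
function isVerified(result: SiteVerifyResponse, expectedHostname: string): boolean {
  return result.success === true && result.hostname === expectedHostname;
}

// Send the token posted from the front end (e.g. the g-recaptcha-response field) to Google.
async function verifyRecaptcha(
  secret: string,
  token: string,
  expectedHostname: string,
  remoteIp?: string,
): Promise<boolean> {
  if (!token) return false; // no token means the widget was never completed
  const res = await fetch(SITEVERIFY_URL, {
    method: "POST",
    body: buildVerifyBody(secret, token, remoteIp), // sent as application/x-www-form-urlencoded
  });
  if (!res.ok) return false; // treat transport errors as a failed check
  const result = (await res.json()) as SiteVerifyResponse;
  return isVerified(result, expectedHostname);
}
```

Failing closed on a missing token or a non-OK response is a deliberate choice in this sketch: if the check can't be completed, the sign up is treated as unverified rather than let through.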
Once I actually dug into the setup steps provided by Google, this was really quick and easy to get working. Had I looked at it first, I would have saved myself the headaches of trying to get packages to work and hacking my way around the issue.
After testing and releasing the fix, I turned the sign up flow back on, and the spam bot seems to be gone for good. Interestingly, I have also noticed a dramatic improvement in my activation metrics. The slowly building trickle of unconfirmed emails I mentioned at the start of this post has more or less dropped to zero. I am now seeing sign ups and email confirmations at more or less the same rate as before, and non-activated users are simply not an issue anymore. I guess they were all bots in the end!
I learnt some good lessons from this. The funny thing is, I already knew them. I knew about reCAPTCHA from the start. I know how to find the best tool for the job and how not to waste time looking for the wrong thing. In my professional life, I wouldn't make these mistakes.
It's funny how working on your own can give you tunnel vision. No one was paying me to fix this issue, and for a minute I just forgot how to find real solutions to problems. This experience was a good lesson in working like a professional on a side project. The other lesson was DIRTFT (Do It Right The First Time). I should have set this up when I first built the platform; I didn't, and I paid the price. And let's not forget the stress! No one wants their side project taking years off their life.
With all that said, we are human. We make mistakes, and as long as we learn from them, mistakes are good!
Keep on making mistakes!