Internal Developer Platforms (IDPs) are Anti-DevOps

(Restored: original date: September 12, 2021)

Note: The clickbait title should make it obvious, but this article is very much an opinion piece. Comments are enabled via fediverse (e.g. Mastodon) federation – respond to the post at @martyn@musings.martyn.berlin and tag in @martyn@toot.martyn.berlin.

Context: I am currently leading the SRE department of a scale-up phase company. 20 years ago I was a Multimedia Developer who built a server so we could share files and the ISDN line – I've been doing this a LONG time. I keep getting asked if we should use Spotify's Backstage IDP or Humanitec or similar. Here's a long-form version of why I say “no”.

Final Note: “Azure DevOps” is an evil name, polluting the already confusing nature of DevOps with their marketing to try and maintain relevance.

Point 1 – DevOps isn't just about automating

A common misunderstanding on “what is DevOps?” is “well it's just automating stuff”. Sorry to burst that bubble but sysadmins were writing Perl many, many years ago and saying no to devs. That did not make them DevOps, nor does automating your CI to release to the VM you point-and-clicked to create.

Point 2 – DevOps is not just tooling

Again, M$ can sod off, but using Puppet, Chef, Ansible, even Terraform and Kubernetes (my top two go-to tools) does not make what you're doing DevOps. If your dev team asks your SRE team to create a queue and your SRE team creates the queue using Terraform, guess what: your SRE team is a SysAdmin (Ops) team.

A lot of people are forgetting why we as an industry made DevOps one thing – to collaborate. And let's follow with more whys (3 whys? 5 whys?) – Why do we want to collaborate? To make things better, more stable (ops) and faster (dev). To reduce friction. For devs to understand how our application is running in production so they can debug it.

Point 3 – Developers who understand how their application runs in production are better developers

This is surprisingly not really written down much on the internet but is a core part of the DevOps philosophy that is being eroded by IDPs. A cookie-cutter approach to development and deployment is the first part. I'm all for standardisation and speed (remember that from the point about tooling), but not at the expense of understanding. If you're deploying in Kubernetes for example, and a new developer can write a new service* and get production traffic to it without ever knowing what a Kubernetes deployment or service is, you're doing it wrong! How can that developer ever support their application in production? Sure, they might have DataDog and alerts set up for them, but what happens when an alert goes off? They need their SRE to come and help them, possibly at 3am. So why have devs on call, why not just have a sysadmin team again?
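For anyone wondering what the "Kubernetes deployment or service" knowledge above actually amounts to, here is a minimal sketch in Python that builds the two manifests as plain dictionaries. The application name, image and ports are made up for illustration, and the helper function names are mine, not part of any Kubernetes client library. The point it tries to show is the one thing a developer on call really needs to know: a Service only sends traffic to pods whose labels match its selector.

```python
import json


def deployment(name, image, port, replicas=2):
    """Build a minimal apps/v1 Deployment manifest as a dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The Deployment manages pods carrying these labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def service(name, port, target_port):
    """Build a minimal v1 Service manifest that selects pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def service_routes_to(svc, dep):
    """True if every selector label on the Service is present on the Deployment's pods."""
    selector = svc["spec"]["selector"]
    pod_labels = dep["spec"]["template"]["metadata"]["labels"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    # Hypothetical application name, image and ports, for illustration only.
    dep = deployment("orders", "registry.example.com/orders:1.0.0", 8080)
    svc = service("orders", 80, 8080)
    print(json.dumps([dep, svc], indent=2))
    print("service routes to deployment:", service_routes_to(svc, dep))
```

If the label check fails, the Service has no endpoints and the alert at 3am is a connection error rather than an application bug. That is exactly the kind of thing a developer cannot diagnose without having seen these two objects.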

Point 4 – you might already have an IDP!

Have your SRE teams made a lot of nice CI for terraforming resources (queues, databases, etc.) and do you already have a CI for “infrastructure components” inside Kubernetes (an ingress controller, cert-manager, monitoring stack etc.)? Maybe you already have helm charts for your “micro”services and they have a good CI with testing environments and automated testing before promotion to production? Maybe even a nice rollback mechanism? Well done, that sounds a lot like an IDP!

Does that mean you should go further and make the developers not even have to see the helm charts? Or replace that entire system with an off-the-shelf IDP? IMO: no. That is how you make your developers NOT understand how their application runs in production. Code monkey like Fritos?**

Point 5 – but Martyn, the IDP provides a nice centralised place for self-documenting APIs

Sure, that’s something that is nice. It’s also perfectly possible to use OpenAPI (formerly Swagger) and have that kind of documentation built and published in a central place without ripping out your entire infrastructure to do so! Add tags to your monitoring and you could even hotlink from a logline to your docs! Magic IDP from Wizards inc. isn’t going to replace ALL your documentation anyway, so you’re going to have documentation outside said IDP.
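As a rough sketch of how little is needed for that central place, the Python below builds an index from a set of OpenAPI documents and resolves a docs link from the tags on a log line. It assumes each service's spec has already been loaded into a dictionary, that docs are published under a hypothetical base URL with one directory per service, and that log lines carry a `service` tag. The function names and URL layout are my own invention, not any particular tool's API.

```python
from typing import Dict, List, Optional


def build_index(specs: Dict[str, dict], base_url: str) -> List[dict]:
    """Summarise each service's OpenAPI document into one central index entry."""
    index = []
    for service_name, spec in sorted(specs.items()):
        info = spec.get("info", {})
        index.append(
            {
                "service": service_name,
                "title": info.get("title", service_name),
                "version": info.get("version", "unknown"),
                "paths": sorted(spec.get("paths", {})),
                # Assumed layout: one docs directory per service.
                "docs_url": base_url.rstrip("/") + "/" + service_name + "/",
            }
        )
    return index


def docs_link_for_log(tags: Dict[str, str], index: List[dict]) -> Optional[str]:
    """Return the docs URL for the service named in a log line's tags, if known."""
    by_service = {entry["service"]: entry["docs_url"] for entry in index}
    return by_service.get(tags.get("service", ""))


if __name__ == "__main__":
    # Hypothetical specs, for illustration only.
    specs = {
        "orders": {
            "openapi": "3.0.3",
            "info": {"title": "Orders API", "version": "1.2.0"},
            "paths": {"/orders": {}, "/orders/{id}": {}},
        },
        "billing": {
            "openapi": "3.0.3",
            "info": {"title": "Billing API", "version": "0.9.1"},
            "paths": {"/invoices": {}},
        },
    }
    index = build_index(specs, "https://docs.example.com/apis")
    for entry in index:
        print(entry["service"], entry["version"], entry["docs_url"])
    print(docs_link_for_log({"service": "orders", "env": "prod"}, index))
```

Run something like this as a CI step after each service build and you have the centralised, self-documenting part without buying the rest of the platform.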

Point 6 – an IDP reduces developer onboarding, they can start coding straight away!

See my point 3 about running in production, but I’d actually refute this anyway. A new developer can either learn an abstraction layer that is industry standard (Kubernetes is this, don’t fight me) or an abstraction layer that is specific to their company. What are the chances that any new developer knows the IDP that your company picks vs them knowing a bit about the industry standard system?

I will concede that moving a developer from one team to another has less team-specific onboarding if they’re using an IDP because teams don’t organise themselves the same way unless you force them to. Perhaps a company-wide team documentation structure can help here, without replacing your entire infrastructure?

Point 7 – we're scaling, maybe we should have an IDP?

There's a big decision to be made here – do you want developers who understand how their application runs in production? “You build it, you run it” is a mantra for a good reason. If the environment where you want to scale your company is so strapped for good developers (that actually care how their application runs and performs in production) and you just want “any developers, please”, perhaps an off-the-shelf IDP is the way to go for your company. Of course, probably you want to hire a sysadmin team too because those developers cannot own their application in production if you want any kind of uptime guarantee.

Hopefully you can see that “this doesn't scale, we need a real IDP” only works if you scale to have the knowledge and responsibility in a different place, and if you do that, beware, here be dragons. You might find that the people who really believe in the DevOps ethos that you have been claiming is pervasive in the organisation decide that they are no longer needed and go work somewhere where it is treasured.

Coming soon

If you enjoyed this rant, look out for my next one: “Workflow engines promote bad practice”

*“Service” is a hugely overloaded term; here I'm talking about an application that services users

**Quote from a Jonathan Coulton song called Code Monkey – I am not intending to deprecate people who code.