Guardrails are meant to keep us from tripping over the edge. Engineering guardrails serve the same purpose: they stop small mistakes from turning into big ones.

Like most engineering decisions, adding guardrails is a trade-off: each one costs setup and maintenance effort. Guardrails can be added at several levels, and one has to decide which ones, and how many, are worth adding:

  1. Source code
  2. Production deployments
  3. Data
  4. Information Security

Source code

There are several guardrails one can add at the source-code level, the most basic being continuous integration (CI). A decade ago, setting up Jenkins took real effort. These days, hosted platforms like GitHub and GitLab come with built-in CI, so use it. At the very least, build the code on every relevant pull request. That way, merging a change that breaks the build can never be an accidental decision.

Other guardrails include running tests, linters, and code formatters (e.g. eslint, gofmt, black) in CI. These are especially useful when onboarding a new engineer, who can then be confident that their changes are unlikely to break anything.
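
Hosted CI services generally run a job's commands and mark the job failed when any command exits non-zero. A minimal sketch of such a gate in Python follows. It assumes a Python project that uses black, flake8, and pytest. The check list is hypothetical, so substitute your own toolchain's commands.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Check = Tuple[str, List[str]]

# Hypothetical checks for a Python project; replace with your own commands.
DEFAULT_CHECKS: List[Check] = [
    ("format", ["black", "--check", "."]),  # exits non-zero if files would be reformatted
    ("lint", ["flake8", "."]),
    ("test", ["pytest", "-q"]),
]


def run_checks(checks: Sequence[Check]) -> List[str]:
    """Run every check, even after a failure, and return the names that failed."""
    failed: List[str] = []
    for name, cmd in checks:
        print(f"--> {name}: {' '.join(cmd)}", flush=True)
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            # A missing tool must fail the build rather than silently pass it.
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


def main(checks: Sequence[Check] = DEFAULT_CHECKS) -> int:
    """Return a process exit code: 0 if all checks passed, 1 otherwise."""
    failed = run_checks(checks)
    if failed:
        print(f"Failed checks: {', '.join(failed)}", file=sys.stderr)
        return 1
    print("All checks passed.")
    return 0
```

In the CI job, call `sys.exit(main())` from the script's entry point. A non-zero exit then shows up as a failed check on the pull request.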

Add a high-quality .gitignore file. It prevents accidentally committing files such as .env or node_modules to version control. gibo is a handy tool for generating such files from ready-made templates.
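
A .gitignore only applies to untracked files, and `git add -f` will still commit an ignored file. A pre-commit check can therefore act as a second guardrail. The following is a minimal sketch that uses a hypothetical deny-list and a deliberately simplified matcher. It ignores git's negation, anchoring, and `**` rules; `git check-ignore` applies the real ones.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical deny-list mirroring a few typical .gitignore entries.
FORBIDDEN_PATTERNS = [".env", ".env.*", "node_modules", "*.pem"]


def forbidden_paths(paths: Iterable[str], patterns: Iterable[str] = FORBIDDEN_PATTERNS) -> List[str]:
    """Return the paths in which any component matches a forbidden pattern.

    Simplified on purpose: no negation (!), anchoring (/) or ** handling.
    """
    patterns = list(patterns)
    flagged = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(fnmatch(part, pattern) for part in parts for pattern in patterns):
            flagged.append(path)
    return flagged
```

A pre-commit hook could feed it the output of `git diff --cached --name-only` and abort the commit whenever the returned list is non-empty.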

Pick a build system and stick to it. My favorite is Make. Commands like make build, make lint, and make clean map to the right commands for Go, Android, Python, TypeScript, or Rust code, so I don’t have to memorize each toolchain’s esoteric syntax separately.
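
The idea behind make build, make lint, and make clean is a single, stable set of verbs mapped onto each toolchain's own commands. The sketch below shows the same mapping in Python and detects the toolchain from a marker file. The marker files and per-toolchain commands are illustrative assumptions, not a complete build setup.

```python
from pathlib import Path
from typing import Dict, List

# Marker file -> toolchain. Checked in insertion order, so the first match wins.
MARKERS: Dict[str, str] = {
    "go.mod": "go",
    "Cargo.toml": "rust",
    "pyproject.toml": "python",
    "package.json": "typescript",
}

# Hypothetical command table: the same three verbs for every toolchain.
TASKS: Dict[str, Dict[str, List[str]]] = {
    "go": {"build": ["go", "build", "./..."], "lint": ["go", "vet", "./..."], "clean": ["go", "clean"]},
    "rust": {"build": ["cargo", "build"], "lint": ["cargo", "clippy"], "clean": ["cargo", "clean"]},
    "python": {"build": ["python", "-m", "build"], "lint": ["ruff", "check", "."], "clean": ["rm", "-rf", "dist"]},
    "typescript": {"build": ["npm", "run", "build"], "lint": ["npx", "eslint", "."], "clean": ["rm", "-rf", "dist"]},
}


def detect_toolchain(project_dir: Path) -> str:
    """Return the toolchain whose marker file exists in project_dir."""
    for marker, toolchain in MARKERS.items():
        if (project_dir / marker).is_file():
            return toolchain
    raise ValueError(f"no known build marker found in {project_dir}")


def command_for(project_dir: Path, verb: str) -> List[str]:
    """Translate a shared verb such as 'build' into the toolchain-specific command."""
    toolchain = detect_toolchain(project_dir)
    try:
        return TASKS[toolchain][verb]
    except KeyError:
        raise ValueError(f"unknown task {verb!r} for {toolchain}") from None
```

Passing the returned list to `subprocess.run(cmd, check=True)` completes the dispatcher. A Makefile with one target per verb achieves the same result declaratively.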

Production Deployments

Guardrails around production deployments should start with external checks. Tools like UptimeRobot and Hyperping regularly ping your public endpoints and verify that they are reachable and responding correctly. Start with a small list, but over time every endpoint you have published publicly (e.g. in your mobile app) should be added to it.
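
Hosted monitors run these probes from outside your network, which is the point of using them. The following minimal, standard-library-only sketch shows what a single probe does. The endpoint URLs you would pass in are your own; any shown here are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, Iterable


def check_endpoint(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Return True if url answers with expected_status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the server is reachable,
        # so compare its status code like any other response.
        return err.code == expected_status
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and refused connections (all OSError subclasses)
        # as well as malformed HTTP responses count as "down".
        return False


def check_all(urls: Iterable[str]) -> Dict[str, bool]:
    """Probe every endpoint and map each URL to its health status."""
    return {url: check_endpoint(url) for url in urls}
```

Run it on a schedule and alert on any False, e.g. `check_all(["https://api.example.com/health"])` (hypothetical URL). A hosted service adds what this sketch lacks: probes from several regions and built-in alerting.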

Crash reporting tools like Crashlytics and Sentry capture application crashes along with their stack traces. Ideally, add them to the backend first and then to the frontends, to get full crash coverage.
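
Crash reporters generally work by hooking the runtime's handler for uncaught exceptions. The following is a standard-library-only sketch of that idea in Python. `send` is a hypothetical transport, and real SDKs do considerably more. The sketch only covers the main thread; other threads go through `threading.excepthook`.

```python
import sys
import traceback
from types import TracebackType
from typing import Callable, Dict, Optional, Type

Report = Dict[str, str]


def install_crash_reporter(send: Callable[[Report], None]) -> None:
    """Wrap sys.excepthook so uncaught exceptions are reported before the process exits."""
    previous_hook = sys.excepthook

    def hook(
        exc_type: Type[BaseException],
        exc: BaseException,
        tb: Optional[TracebackType],
    ) -> None:
        report = {
            "type": exc_type.__name__,
            "message": str(exc),
            "stacktrace": "".join(traceback.format_exception(exc_type, exc, tb)),
        }
        try:
            send(report)  # hypothetical transport, e.g. an HTTP POST to your crash backend
        finally:
            previous_hook(exc_type, exc, tb)  # keep the usual traceback on stderr

    sys.excepthook = hook
```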

Another useful guardrail is tracking how your application performs: for example, the number of incoming requests, the time taken to process them, and the number of malformed requests. Application Performance Monitoring (APM) systems like New Relic and Datadog are useful for tracking these patterns and spotting anomalies over time.
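
To make those three signals concrete, here is a minimal in-process sketch that counts requests, records their latencies, and counts malformed requests. It assumes, hypothetically, that handlers signal bad input by raising ValueError. An APM agent collects the same kind of numbers automatically and ships them off the host.

```python
import functools
import math
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class RequestMetrics:
    """In-process counters for request volume, latency and malformed requests."""

    count: int = 0
    malformed: int = 0
    latencies_ms: List[float] = field(default_factory=list)

    def record(self, latency_ms: float, malformed: bool = False) -> None:
        self.count += 1
        self.latencies_ms.append(latency_ms)
        if malformed:
            self.malformed += 1

    def percentile_ms(self, pct: int) -> float:
        """Nearest-rank percentile of the recorded latencies."""
        if not 0 < pct <= 100:
            raise ValueError("pct must be in (0, 100]")
        if not self.latencies_ms:
            raise ValueError("no requests recorded yet")
        ordered = sorted(self.latencies_ms)
        rank = math.ceil(pct * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]

    def timed(self, handler: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator that records latency, counting ValueError as a malformed request."""

        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            malformed = False
            try:
                return handler(*args, **kwargs)
            except ValueError:
                malformed = True  # assumption: handlers raise ValueError on bad input
                raise
            finally:
                self.record((time.perf_counter() - start) * 1000, malformed)

        return wrapper
```

Watching a high percentile such as p95 alongside the malformed count over time surfaces anomalies that an average would hide.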

Always roll out gradually (e.g. rolling updates or canary releases), so that a bad build degrades the service for some users instead of taking it down for everyone.
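
Orchestrators handle the instance-level version of this. A Kubernetes Deployment's RollingUpdate strategy, for example, replaces pods a few at a time, bounded by maxUnavailable and maxSurge. A request-level canary can be sketched as stable percentage bucketing. In the sketch below, the release name and percentages are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, release: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def serve_new_build(user_id: str, release: str, percent: int) -> bool:
    """True if this user should get the new build at the current rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(user_id, release) < percent
```

Because a user's bucket never changes, raising percent from 1 to 10 to 50 to 100 only ever adds users, and anyone who has seen the new build keeps seeing it. Setting percent back to 0 is the rollback.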

Ensure that production configs are recorded declaratively (e.g. a config.yaml manifest for Kubernetes) and committed to version control. This ensures that every change is tracked and can be reviewed.
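
Keeping declared config in version control also makes drift detectable: you can compare what the repository says with what is actually running. For Kubernetes, `kubectl diff -f config.yaml` does this against the cluster. The sketch below shows the underlying comparison on plain dicts standing in for parsed YAML. The keys and values are hypothetical.

```python
from typing import Any, Dict, List


def config_drift(declared: Dict[str, Any], live: Dict[str, Any], prefix: str = "") -> List[str]:
    """List every key whose live value differs from the declared (version-controlled) one."""
    drift: List[str] = []
    for key in sorted(set(declared) | set(live)):
        path = f"{prefix}{key}"
        if key not in live:
            drift.append(f"missing: {path}")
        elif key not in declared:
            drift.append(f"undeclared: {path}")  # changed by hand, never committed
        elif isinstance(declared[key], dict) and isinstance(live[key], dict):
            drift.extend(config_drift(declared[key], live[key], prefix=f"{path}."))
        elif declared[key] != live[key]:
            drift.append(f"changed: {path} ({declared[key]!r} -> {live[key]!r})")
    return drift
```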

Monitor user sessions to see how they unfold. Tools like Hotjar for the web and Embrace for mobile record and visualize each user’s overall interaction with the product. This guardrail is most useful for a product team rolling out a feature.

Data

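Declaratively defining data configuration is hard. At the very least, where possible, avoid the NoSQL hype and use a relational database like MySQL or Postgres, so that the database itself enforces a schema. Further, define those schemas in code via an ORM, so that they live in version control alongside the rest of the codebase.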
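
In the sketch below, SQLite stands in for MySQL or Postgres so that only the standard library is needed. The DDL is written by hand rather than generated by an ORM such as SQLAlchemy or the Django ORM, which would emit equivalent CREATE TABLE statements from model classes. The point is the guardrail: once the schema is declared, the database itself rejects bad writes. Table and column names are hypothetical.

```python
import sqlite3
from typing import Sequence

# Hypothetical schema; an ORM would generate equivalent DDL from model classes.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE attachments (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    filename TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: Sequence[object]) -> bool:
    """Return False instead of writing when the schema rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```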

Files stored in buckets (e.g. Amazon S3) should be versioned. For example, if you store user attachments, name the bucket attachmentV1 from the beginning (on S3, something like attachment-v1, since bucket names must be lowercase). Then, if you later switch to compressed or encrypted attachments, the codebase can cleanly switch to a bucket named attachmentV2.
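
The following is a minimal sketch of reading and writing attachments by version. It assumes v1 stored raw bytes and v2 stores zlib-compressed bytes. A dict stands in for the object store; a real implementation would call your S3 client instead. The bucket names are hypothetical and lowercase, as S3 requires.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical bucket per storage format; S3 requires lowercase bucket names.
BUCKETS: Dict[int, str] = {1: "attachment-v1", 2: "attachment-v2"}
CURRENT_VERSION = 2


def _identity(data: bytes) -> bytes:
    return data


# How each version's payload is encoded on write and decoded on read.
ENCODE: Dict[int, Callable[[bytes], bytes]] = {1: _identity, 2: zlib.compress}
DECODE: Dict[int, Callable[[bytes], bytes]] = {1: _identity, 2: zlib.decompress}

# Stand-in for an object store: (bucket, key) -> stored bytes.
Store = Dict[Tuple[str, str], bytes]


@dataclass(frozen=True)
class AttachmentRef:
    """What the application database keeps: the format version plus the object key."""

    version: int
    key: str

    @property
    def bucket(self) -> str:
        return BUCKETS[self.version]


def put_attachment(store: Store, key: str, data: bytes) -> AttachmentRef:
    """New uploads always use the current format and its bucket."""
    ref = AttachmentRef(CURRENT_VERSION, key)
    store[(ref.bucket, ref.key)] = ENCODE[ref.version](data)
    return ref


def get_attachment(store: Store, ref: AttachmentRef) -> bytes:
    """Old references keep working: each is decoded with its own version's rules."""
    return DECODE[ref.version](store[(ref.bucket, ref.key)])
```

Storing the version next to each key means old v1 attachments never need a big-bang migration. They can be rewritten into the v2 bucket gradually, or left as they are.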

Further, ensure that backups are enabled, whether your managed service turns them on by default or you have to configure them explicitly. This prevents permanent data loss from accidental deletes.
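
On managed services this is usually a setting, such as automated backups on Amazon RDS or object versioning on S3. For a database you run yourself, the sketch below uses SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`. SQLite stands in for your real database, and the paths are hypothetical. Restoring from a copy now and then confirms that the backups actually work.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_sqlite(db_path: str, backup_dir: str) -> Path:
    """Copy a live SQLite database to a timestamped file via the online backup API."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"{Path(db_path).stem}-{stamp}.sqlite3"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # consistent copy, even while other connections are writing
    finally:
        dst.close()
        src.close()
    return target
```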

Information Security

  1. Use SSO wherever possible.
  2. Make two-factor authentication mandatory for all access.
  3. Require signed commits.
  4. Keep the frontend (including mobile) and backend codebases separate, so that frontend engineers (especially offshore contractors) don’t need access to the backend codebase.
  5. Use the secrets manager provided by your cloud platform. Over time, first migrate all credentials into the secrets manager, then rotate every one of them, since the old values may already have been copied around. Most team members should be able to deploy applications without ever having direct access to the secrets.
  6. Enable remote wipe on employees’ devices.
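
With a secrets manager in place, application code never needs the values checked in or handed to people. Many platforms can inject secrets into the process at deploy time, for example as environment variables populated from AWS Secrets Manager in an ECS task definition, or from a Kubernetes Secret. The sketch below covers only the application side and assumes environment-variable injection. The secret names are hypothetical.

```python
import os
from typing import Dict, Iterable, Mapping


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret was not injected."""


class Secret:
    """Wraps a secret value so it is not printed by accident in logs or tracebacks."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


def load_secrets(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> Dict[str, Secret]:
    """Read secrets the platform injected as environment variables; fail fast if any is missing."""
    names = list(names)
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingSecretError(f"missing secrets: {', '.join(missing)}")
    return {name: Secret(environ[name]) for name in names}
```

Failing at startup turns a missing or mis-rotated credential into a failed deploy, which a rollout can catch, instead of an error deep inside a request.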