Since joining in early 2019, I've been building out Castor's ability to run services internally, forming the platform team that we have today. During this time I've switched back and forth between the management and IC tracks depending on the needs of the team and the organization.
Created in-house database tooling for a cloud migration of 45.000 on-prem MySQL databases ("schemas") into new MySQL instances on Azure with under 1 minute of unavailability per tenant (P99).
Brought into production a new Kubernetes-based deployment model to replace our Virtual Machine platform, providing greater ease-of-use for users at a lower cost of infrastructure spend.
Implemented Earthly to set up automated, reproducible builds of in-house Ubuntu software packages.
Fostered a team culture with a high degree of Psychological safety, placing a strong emphasis on transparency and quality documentation, a reduction of Toil through automation, and continuous learning through practices such as pre-mortems, After Action Reviews and Post-incident reviews.
Reduced MySQL crash recovery time from over 10 hours to under 10 minutes by working with Microsoft to identify pathological code paths in Azure MySQL Flexible Server.