Senior DevOps Engineer (Team Lead)
70% Hands-On | Cloud & On-Prem | AI Platform
Why Join Us?
Lead and grow our DevOps function in a hands-on leadership role combining deep technical ownership (70% hands-on) with people management. Manage a team of 4 DevOps engineers while remaining directly involved in architecture, infrastructure, automation, and customer-facing deployments. Play a critical role in designing and operating highly reliable, scalable, and secure infrastructure supporting our AI platform across AWS, Azure, GCP, and air-gapped on-prem environments.
Key Responsibilities
- Design, develop, and maintain cloud infrastructure across AWS, Azure, and GCP
- Own platform delivery and deployment in customer environments (public cloud and air-gapped on-prem using OpenShift and Docker)
- Lead and manage a team of 4 DevOps engineers, including technical guidance, prioritization, and mentoring
- Manage Infrastructure as Code using Terraform
- Build, scale, and operate Kubernetes clusters and containerized applications
- Implement automation for deployment, monitoring, alerting, and incident response
- Develop and maintain Python and Bash scripts for automation, integrations, and internal tooling
- Troubleshoot complex networking, connectivity, and security issues (TCP/IP, UDP, VPNs, DNS, routing, firewalls)
- Collaborate closely with engineering and product teams to optimize deployment strategies and system reliability
- Support customer onboarding, including technical setup, deployment sessions, and architecture discussions
- Continuously improve CI/CD pipelines, operational processes, and platform reliability
Requirements
- 5+ years of experience in DevOps/Infrastructure roles in a startup or technology-driven environment, OR leadership experience in DevOps roles during military service plus 2+ years of industry experience
- Proven hands-on experience with at least two major cloud providers (AWS, Azure, GCP); deep experience with Azure highly preferred
- Strong experience with on-prem environments; OpenShift experience is a significant plus
- Solid programming skills in Python and Bash for automation, tooling, and scripting
- Strong background in Linux systems administration
- Extensive experience with Kubernetes, Docker, and container orchestration
- Strong understanding of networking fundamentals: TCP/IP, UDP, DNS, VPNs, routing, firewalls
- Experience with Terraform or other Infrastructure as Code tools
- Excellent problem-solving and troubleshooting skills
- Strong communication skills, with the ability to work directly with customers, engineers, and stakeholders
Preferred Qualifications
- Experience with all three major cloud providers (AWS, Azure, GCP)
- Familiarity with CI/CD tools such as GitHub Actions
- Experience deploying AI/ML platforms or similar complex systems
Tech Stack: AWS, Azure, GCP, OpenShift, Kubernetes, Docker, Terraform, GitHub, GitHub Actions, Helm, Python, Bash, Linux, IaC