We believe that a great product can be made even greater by an excellent and caring team. If that's you, we want to hear from you.
Principal DevOps Engineer
Location: Remote / San Francisco
Type: Full-Time
This role is listed as remote, but we have a strong preference for candidates based in San Francisco or those open to relocating.
About the Role:
We are seeking a highly experienced Principal DevOps Engineer to join our growing engineering team. As a key leader, you will play a pivotal role in designing, building, and maintaining scalable, reliable infrastructure that supports the deployment and operation of our cutting-edge AI-driven applications. You will work closely with software engineers, machine learning engineers, and product teams to ensure our systems and services are robust, secure, and performant.
This is an opportunity to shape the DevOps practices and infrastructure at an early-stage startup, laying the groundwork for scalable growth and innovation.
Key Responsibilities:
Infrastructure Design & Management
- Architect, implement, and manage cloud-based infrastructure on AWS, GCP, or Azure.
- Design and maintain scalable, highly available systems using modern infrastructure-as-code (IaC) tools such as Chef, Terraform, or Pulumi.
CI/CD Pipelines
- Develop and optimize CI/CD pipelines to automate build, test, and deployment workflows for both application and machine learning model releases.
- Implement robust rollback mechanisms to ensure system reliability.
Monitoring & Incident Response
- Create and maintain monitoring, logging, and alerting systems using tools like Prometheus, Grafana, or Datadog.
- Establish and manage incident response protocols, ensuring uptime and rapid resolution of critical issues.
Containerization & Orchestration
- Build and manage containerized applications using Docker and Kubernetes, enabling scalable and efficient deployments.
- Optimize Kubernetes clusters for performance and cost efficiency.
Collaboration & Leadership
- Partner with engineering teams to embed DevOps best practices across the organization.
- Mentor and guide junior DevOps engineers, fostering a culture of ownership, automation, and continuous improvement.
Security & Compliance
- Implement and enforce security best practices, including secrets management, network policies, and compliance standards.
- Conduct regular infrastructure audits and vulnerability assessments.
Performance Tuning & Optimization
- Analyze system performance metrics to identify bottlenecks and optimize resource utilization.
- Scale infrastructure to handle increasing load, ensuring reliability and performance.
Qualifications:
Education & Experience:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in DevOps, Site Reliability Engineering (SRE), or a similar role, with experience in early-stage startups preferred.
Technical Skills:
- Cloud Platforms: Deep expertise in AWS, GCP, or Azure, including core compute, storage, database, and identity services (e.g., EC2, S3, RDS, and IAM on AWS).
- IaC Tools: Proficiency in Terraform, Pulumi, or similar tools.
- CI/CD: Strong experience with tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI/CD.
- Containerization: Advanced knowledge of Docker and Kubernetes (deployment, scaling, and monitoring).
- Monitoring Tools: Hands-on experience with monitoring and logging systems (e.g., Prometheus, Grafana, Datadog, ELK stack).
- Programming: Proficiency in scripting and automation using languages like Python, Bash, or Go.
- Networking: Strong understanding of networking principles, including DNS, load balancing, and VPC configurations.
- Version Control: Expertise with Git and GitOps workflows.
Preferred Skills:
- Experience implementing security standards such as SOC 2 or ISO 27001.
- Familiarity with MLOps practices and tools for deploying machine learning models.
- Knowledge of big data processing tools such as Apache Kafka or Spark.
- Exposure to serverless technologies like AWS Lambda or GCP Cloud Functions.
Soft Skills:
- Strong problem-solving and analytical abilities.
- Proven leadership skills with the ability to mentor junior team members and lead by example.
- Excellent communication skills, with the ability to collaborate effectively with cross-functional teams.
Why Join Us?
- Be a key player in shaping the infrastructure of a rapidly growing startup.
- Work on exciting AI-driven projects with real-world impact.
- Join a team of innovative and passionate engineers, product thinkers, and entrepreneurs.
- Enjoy a competitive salary, equity options, and benefits.
- Thrive in a flexible work environment that values ownership, creativity, and continuous learning.
Benefits include:
- Unlimited PTO
- Competitive Salary
- Equity Options
- Daily food stipend