Software Engineer (CloudOps Engineer)
Posted 2025-08-15
Remote, USA
Full Time
Immediate Start
About the Role:
We’re looking for a CloudOps Engineer to join our fast-growing CloudOps team focused on Developer Experience, SRE, and FinOps. In this role, you’ll be responsible for the reliability, performance, and observability of CloudZero’s infrastructure — empowering engineering teams to ship features that help customers understand and optimize their cloud spend.
CloudZero processes billions of events daily across AWS, Azure, and GCP. Our customers rely on real-time, accurate cost data to make business-critical decisions — and any instability in our system impacts their planning. Built entirely on a unique serverless architecture (no EC2s or containers), our platform demands infrastructure that scales gracefully, fails predictably, and recovers automatically.
The problems are interesting: handling massive data volumes efficiently, ensuring sub-second query performance across terabytes of data, and scaling systems to support customers spending millions monthly — all in a modern, event-driven environment.
You Will:
• Infrastructure as Code everything. Design and maintain Pulumi modules that provision reliable, cost-efficient cloud resources. No clicking through consoles.
• Build observability into everything. Instrument systems so that failures surface quickly and debugging happens with data, not guesswork. You'll know about problems before customers do.
• Automate the boring stuff. Deployments, scaling, backups, limit changes: if humans are doing it repeatedly, you'll build systems that do it instead.
• Partner with product engineering. Help teams design resilient services, review architectures for operational complexity, and build deployment pipelines that enable safe and fast shipping.
• Optimize for cost and performance. CloudZero's business is helping others optimize cloud costs. We should be exemplars of efficient cloud usage ourselves.
Requirements:
• 3–5+ years of experience building and operating distributed systems in AWS
• Strong skills in Python, Infrastructure as Code (e.g., Pulumi or Terraform), and Kubernetes
• Hands-on experience with monitoring tools such as Prometheus or Datadog
• Proven ability to debug production issues under pressure
• Values thoughtful, reliable system design over reactive “hero” efforts
• Balances automation intelligently — builds solutions to real problems, not automation for its own sake
• Able to clearly explain complex technical issues to non-technical stakeholders
• Strong documentation habits to support long-term team clarity and system stability
• Excited to take ownership of infrastructure and solve operational challenges at scale
Please note: CloudZero is unable to sponsor employment visas or provide immigration-related support now or in the future. All candidates must have current, unrestricted authorization to work in the United States permanently.
About CloudZero:
Cloud cost management is one of the biggest challenges organizations face today. As cloud adoption continues to accelerate, so do the complexities and costs associated with it — and macroeconomic conditions only increase pressure to prove cloud efficiency. That’s why we built CloudZero: a SaaS platform at the intersection of next-generation cloud cost management and FinOps. CloudZero ingests billing and usage data from all cloud, SaaS, and PaaS providers, organizes it in real time according to our customers’ business structures, lets customers view it at any level of time or resource granularity, and ultimately empowers them to make more informed business decisions.
Since our founding in 2016, our mission has been to make efficient innovation a reality for every cloud-driven organization. At CloudZero, we believe every engineering decision is a buying decision, yet the cost conversation often bypasses the engineers who drive those determinations. To solve this, we’ve built a dynamic, single-page application that answers the complex, data-heavy questions every cloud-based organization needs to ask if they want to grow their company profitably.
To date, we've raised over $119 million, including a $56 million Series C round backed by leading venture capital firms from across the country. We're tackling challenges of massive scale, strategic business importance, and technical complexity in a space that needs innovation more than ever. We're growing rapidly, and we'd love for you to join us.