Principal Site Reliability Engineer
Contract To Hire
"Principal Site Reliability Engineer"
with one of our direct clients in
for initial contract duration of
No third-party candidates will be considered for this position. US citizens and all those authorized to work in the US are encouraged to apply.
Our Client is currently looking for a Principal Site Reliability Engineer to join their team.
About the Team
We are establishing a Site Reliability Engineering (SRE) practice at Client and building SRE teams. This is a great opportunity to get in on the ground floor of a core initiative within our Client Center of Excellence. The goal of Site Reliability Engineering at Client is to create practices that ensure the reliability, availability, and performance of our products. Our SRE teams will create a culture of operational excellence through the reduction of toil, shared ownership with product development teams, and repeatable patterns for the software development lifecycle. More specifically, we will be growing our Site Reliability Engineering focus to work alongside engineering organizations to help deploy, manage, observe, troubleshoot, and enhance our complex cloud-based and on-prem services.
About the Position
You are a systems-thinking Site Reliability Engineer who wants to help teams scale through production insights, operational automation, developer guidance, real-time metrics, and automation, automation, automation. You are an ambitious, independent self-starter who finds joy in iteratively evolving software reliability implementations and practices with software engineering teams and leaders to meet customer needs. You are a quick study and a problem-solving engineer who enjoys tackling challenging business problems with software and/or hardware.
As a Principal Site Reliability Engineer at Client you will:
- Assist in establishing SRE practices for our organization, including creating and maintaining new Service Level Objectives and establishing requirements for and performing Production Readiness Reviews.
- Engage with engineering teams and leaders to mature our site reliability practices.
- Work with our Center of Excellence teams to build out a toolkit of solutions for our application teams to integrate into their processes.
- Evolve problem statements into actionable items that enable engineers to deliver business value as quickly as possible, in a safe and repeatable way.
- Lead projects/technical initiatives for multiple organizations.
- Demonstrate ownership of initiatives and drive them through to completion.
- Define and codify organizational standards relative to resiliency, configuration, and scalability.
Top 5 Must Haves:
- SRE Experience
- Software Engineering experience
- AWS or Cloud Experience
- Leadership experience
- Automation experience
- Bachelor’s degree in Computer Science or a related field and 10+ years of experience, OR a master’s degree with 8+ years of experience, OR 14 years of an equivalent combination of industry-related professional experience and education
- Ability to learn new software, methods, and practices and bring them to our developers
- Commitment to Infrastructure as Code
- Git or general version control experience
- Ability to work with engineering teams and leadership to engineer requirements
- Experience working in a continuous integration, testing, and delivery SDLC that employs automation
- Eager to dig into problems and bring proposed solutions to group discussion
- Open to feedback and able to creatively adapt multiple ideas into a solution
- 6-8 years of experience with DevOps engineering or SRE
- Experience with containers and serverless, including how to deploy and operate both
- 6-8 years of experience with monitoring and observability
- 6-8 years of experience with configuration management