Chan Zuckerberg Initiative - Palo Alto, CA
We believe we can help build a future for everyone.
- We aim to be daring, but humble: We look for bold ideas — regardless of structure and stage — and help them scale by pairing engineers with subject matter experts to build tools that accelerate the pace of social progress.
- We want to learn fast, but build for the long term: We want to iterate fast and help bring new solutions to the table, but we also realize that important breakthroughs often take decades, or even centuries.
- Stay close to the real problems: We engage directly in the communities we serve because no one understands our society’s challenges like those who live them every day.
Our success is dependent on building teams that include people from different backgrounds and experiences who can challenge each other's assumptions with fresh perspectives. To that end, we look for a diverse pool of applicants including those from historically marginalized groups — women, people with disabilities, people of color, formerly incarcerated people, people who are lesbian, gay, bisexual, transgender, and/or gender nonconforming, first and second generation immigrants, veterans, and people from different socioeconomic backgrounds.
By pairing engineers with leaders in our education, science, and justice and opportunity teams, we can bring technology to the table in new ways to help drive solutions. We are uniquely positioned to design, build, and scale software systems to help educators, scientists, and policy experts better address the myriad challenges they face. Our technology team is already helping schools bring personalized learning tools to teachers and schools across the country and supporting scientists around the world as they develop a comprehensive reference atlas of all cells in the human body.
Site Reliability Engineers work alongside Software Engineers to ensure that we're designing and operating our systems with scalability and reliability in mind. Members of the Infrastructure organization have an impact on all of our initiatives by tackling technology problems which impact multiple engineering teams.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews to share best practices across CZI
- Write automation code for provisioning and operating infrastructure in a reliable and repeatable fashion
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health across multiple cloud environments
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
- Roll up your sleeves to troubleshoot incidents, formulate theories, test your hypotheses, and narrow down possibilities to find the root cause
- Participate in a 24x7 on-call rotation together with your engineering team and serve as an escalation contact for service incidents
- 3+ years of relevant Site Reliability Engineering or Production Engineering experience
- Experience with Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
- Experience with configuration management and orchestration tools such as Ansible, Chef, Docker or Terraform
- Experience with a scripting language such as Python, PHP, Ruby, or Perl
- Experience with a systems language such as C, C++, C#, Go, Java, or Scala is preferred