Site Reliability Engineer | Louisville, KY | Papa John's

THIS IS A CORPORATE POSITION

Responsibilities

As a Site Reliability Engineer II, you'll be responsible for leading a team of engineers to ensure that Papa John's is available, resilient and fast for our customers. In addition to pairing with other engineers and architects, SREs field questions from other product teams and encourage cross-team collaboration. You will also have an active role engaging with third-party partners and the open-source software community. We work closely with the core technology services teams who manage system configuration, basic network services, container orchestration, metrics collection, load balancing, performance testing, chaos testing, monitoring and alerting. We're enablers who build tools that all of engineering uses to manage production, with the goal of providing a consistent and stable ecosystem. You'll be instrumental in making Papa John's the best pizza delivery experience for customers who love pizza!

As a Site Reliability Engineer II, you will be an extremely knowledgeable engineer on the SRE team and will be expected to build and grow the skill sets of the junior engineers you work with. You are action-oriented, taking on new opportunities and tough challenges with a sense of urgency, high energy and enthusiasm. You consistently achieve results, even under tough circumstances. The Site Reliability Engineer II is also expected to demonstrate expertise in resiliency patterns, architectures and reducing operations toil.

Papa John's Offers

70% - Delivery and Execution:
  • Technical lead for the SREs and related resiliency engineering projects
  • Good working knowledge of SRE concepts such as Availability, Observability/Monitoring, Scalability, SLA, SLO, MTTR, MTTF, etc.
  • Improve reliability, resiliency, performance and time-to-market for services, systems and products
  • Troubleshoot and triage production outages, incidents and issues relating to resilience, performance, availability and scalability
  • Complete Root Cause Analysis (RCA) investigations and lead the postmortem discussions
  • Extend technical support for on-call roles for eCommerce systems (omnichannel: web, mobile, tablet, aggregators, etc.)
  • Construct monitoring and alerting strategy for production environments and diagnose performance/availability/scalability issues with an emphasis on Root Cause Analysis & Resolution
  • Experience in Java performance, Linux system monitoring, Database SQL tuning and basic networking services
  • Strong knowledge in resiliency frameworks and patterns for applications (Hystrix, resilience4j, circuit breaker, bulkhead), as well as infrastructure
  • Identify single points of failure and failure modes, and design strategies for addressing them
  • Hands-on experience with tools like AppDynamics, Splunk (or Kibana/ELK), SolarWinds and cloud monitoring tools (Stackdriver, Google Cloud Monitoring, CloudWatch)
  • Good programming knowledge (Java/J2EE, Python) beyond simple scripts
  • Cloud environments (GCP, AWS, Azure) and Automation experience
  • Good understanding and experience of CI/CD via Jenkins
  • Knowledge of containerization (Docker) and container orchestration (Kubernetes, GKE, Docker Swarm)
  • Collaborate with Enterprise Architects and Chief Architecture Owners to create meaningful architecture diagrams and other documentation needed for security reviews or by other interested parties


20% - Support and Enablement:
  • Field questions from other product teams or support teams
  • Monitors tools and participates in conversations to encourage collaboration across product teams
  • Provides application support for software running in production
  • Proactively monitors service level objectives for products in production and lower life-cycle environments
  • Works with partners and open-source community to help identify and ensure reliability and resiliency in software products
  • Proactively reviews the performance and capacity of all aspects of production: code, infrastructure, data, and message processing
  • Triages high priority issues and outages as they arise
  • Conducts technical interviews of job applicants and contractors to evaluate their skills


10% - Learning:
  • Participates in and leads learning activities around resiliency patterns, architectures and reducing operations toil
  • Learns through reading, tutorials, videos and collaboration, and by studying new technologies and best practices used within other technology organizations
  • Attends conferences and shares learnings with others upon return


Qualifications

  • 6 to 8 years of relevant work experience
  • Proficient in a cloud computing platform and the associated automation patterns they provide (preferably Google Cloud and Kubernetes)
  • Proficient in defensive coding practices and patterns for high availability
  • Proficient in modern microservice-based architectures and methodologies
  • Proficient in production systems design including High Availability, Disaster Recovery, Performance, Efficiency, and Security
  • Demonstrates team leader abilities with a proven record of successful delivery of products
  • Linux and shell/bash scripting
  • Java/J2EE performance optimization
  • Performance Engineering/Testing knowledge (NFR, Troubleshooting, Tuning, Capacity Planning, etc.)
  • Public Cloud - GCP / AWS / Azure
  • APM tools like AppDynamics, Dynatrace
  • Infrastructure/network monitoring tools like SolarWinds, ThousandEyes, Wireshark, nmon
  • Log analytics tools like Splunk, Kibana/ELK
  • Automation via bash, Python, Groovy, Java, Perl
  • Good understanding of chaos engineering
  • Performance testing tools like JMeter, LoadUI/ReadyAPI, Performance Center, etc.
  • CI/CD using Jenkins
  • Docker, Kubernetes, GKE
  • Configuration Management & Infrastructure as Code using Ansible / Puppet
  • Oracle DB, MySQL / PostgreSQL (SQL tuning experience is a plus)

Education, Experience & Certifications
  • Bachelor's Degree - Computer Science & Engineering
  • Master's Degree - Computer Science & Engineering
  • Google Cloud Certifications
  • Amazon Web Services Certifications
  • Microsoft Cloud Certifications
  • Java Programming Language Certifications


Overview

Exciting things are happening at Papa John's corporate restaurants. Work where the best ingredient is YOU!

Great things are happening at Papa John's! If you are looking for a fulfilling career with an international company, flavored with challenging work, mixed with professional development opportunities, a competitive salary and a collaborative team environment, then look no further! Papa John's seeks people who share our philosophy for success, are looking for quality business practices and meaningful work. All these combine to produce not only the best pizza, but also the best team members!

Papa John's has over 5,000 locations in 44 countries and territories around the world. We offer a competitive benefits and compensation package. Driven to be the best. Better Ingredients. Better People. ®

It is the policy of Papa John’s to provide equal employment opportunities for all applicants and team members without regard to race, color, religion, sex, age, marital status or civil partnership, national or ethnic origin, pregnancy or maternity, veteran status, uniformed service (as defined by 10 U.S.C. § 101(a)(5)), protected disability status, genetic information, sexual orientation, gender identity, gender reassignment, gender expression, or any other characteristic protected by statute or law.