Job Description
Summary
Description
Understanding of core SRE concepts - Monitoring, Alerting, Incident management
Performance engineering (design concepts, profile-guided optimization)
Prepare alert-handling procedures and runbooks, and collaborate with other SRE team members
Service management across bare-metal and virtualized (EC2) platforms
Excellent communication and a high degree of customer focus when engaging with internal platform customers
Ability to work effectively with colleagues based in other locations is essential; prior experience doing so is a plus
Prior experience with development or maintenance of distributed databases and operating systems is recommended
Come join us at Apple Services Engineering and help us deliver services and applications that are fluid and responsive. You will collaborate with engineers from across Apple to define the metrics, set targets, uncover optimization opportunities, and ship a service that will delight our customers. This role is for engineers who enjoy deep technical engineering that spans large cross-organizational projects. Your openness to learning and implementing new technologies will contribute to the continuous evolution of our organization. Good ideas are valued and rewarded.
Minimum Qualifications
- At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role, with preference for distributed database management.
- Linux expertise
- Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
- Proficiency in implementing and coordinating telemetry using monitoring and observability tools such as Splunk, Grafana, Prometheus, or similar.
- Hands-on scripting with Python and shell
- Designing, building and maintaining infrastructure with a cloud provider such as AWS.
- Automation advocate - prior history of removing operational toil via software.
- A strong sense of ownership and team camaraderie, with clear and transparent communication.
- Self-motivated, inquisitive, and always looking to learn more.
Preferred Qualifications
- Demonstrated expertise in developing distributed systems or storage engines, or in performance engineering.
- Experience developing critical internet services and/or platform infrastructure.
- Proficient in one or more of the following programming languages: Java, Go (golang) or Python
- Experience managing services on Kubernetes
- Experience with Terraform
- JVM tuning