Lead Site Reliability Engineer
Company: Ford Motor Company
Location: Dearborn
Posted on: March 18, 2023
Job Description:
The specific responsibilities of an SRE managing a large,
distributed application built on microservices, Spring Boot, and
Google Cloud may include:
Strong background in software development and systems
administration, as well as excellent problem-solving and
communication skills.
Run the production environment by monitoring availability and
taking a holistic view of system health.
Developing, improving, and operating the deployment and
orchestration of a complex distributed system
Improve reliability, quality, and time-to-market of our suite of
software solutions
Measure and optimize system performance, with an eye toward pushing
our capabilities forward, getting ahead of customer needs, and
innovating to continually improve
Provide primary operational and engineering support for multiple
large, distributed software applications
Identify and reduce or eliminate toil via automation to maximize
the time spent on engineering and innovation
Collaborating with development teams to design, build, and operate
scalable and resilient software systems
Automating deployment, monitoring, and incident response
processes
Performing root cause analysis of production incidents and
implementing preventive measures
Conducting performance analysis and optimization of the system
Ensuring compliance with security and regulatory standards
Implementing and maintaining disaster recovery processes
Providing technical guidance and mentorship to other team
members
Participating in an on-call rotation for incident response and
support.
At Ford Motor Company, we believe freedom of movement drives human
progress. With our incredible plans for the future of mobility, we
have a wide variety of opportunities for you to accelerate your
career and help us define tomorrow's transportation.
Qualifications
Four-year college degree in Computer Science or Equivalent.
7 - 9 years' experience with Java, J2EE, NoSQL/SQL datastores,
Spring Boot, GCP/AWS/Azure & Docker/K8s in developing multi-tier
applications.
Programming skills (Perl, Python, Ruby, Java/Scala or C).
Experience with RESTful APIs and microservices platforms is a
must.
Working knowledge of the TCP/IP stack, internet routing and load
balancing
4 - 5 years of experience with APM and other monitoring
tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu,
Nagios, Kafka, DataDog, PagerDuty.
Experience working with product & development teams to establish
error budgets by identifying the right SLOs (service level
objectives), SLIs (service level indicators), and KPIs (key
performance indicators), and effectively driving the use of the
budget to ensure maximum domain availability/uptime.
Regularly review key site technical metrics such as transaction
errors, logging, response times, caching strategies,
conversion/bounce rates, capacity & resource utilization.
Debug production issues across services and levels of the
stack.
Proactively identify stability risks & work with engineering
leadership to establish appropriate mitigation plans.
Recognize, validate & evangelize emerging technologies &
architectures that align with business objectives
Solve complex architecture/design & business problems, work to
simplify, optimize, remove bottlenecks, etc.
Collaborate closely with architects & other cross functional teams
to create secure, reliable, and scalable software solutions.
Thorough understanding of the software development life cycle and
agile development environments.
Architect, design & develop automation to reduce toil, improve
recoverability, availability, latency & scalability of supported
applications.
Triage, analyze and provide solutions to critical & high priority
technical issues occurring in the ecosystem, and optimize incident
management processes.
Respond, react & communicate as per the ITSM incident management
process. This process involves detection of the incident, timely
communication to leadership during the life of the incident,
service restoration, followed by root cause analysis to prevent the
incident from occurring in the future.
Drive blameless postmortem culture.
Practice destructive testing for discovering vulnerabilities in
environments powered by distributed software systems.
Implement an effective observability strategy to improve MTTD (Mean
Time to Detection) & MTTR (Mean Time to Resolution).
Maintain a knowledge repository that includes standard operating
procedures, release checklists, and runbooks for incident recovery.
What you'll receive in return:
As part of the Ford family, you'll enjoy excellent compensation and
a comprehensive benefits package that includes generous PTO,
retirement, savings and stock investment plans, incentive
compensation and much more. You'll also experience exciting
opportunities for professional and personal growth and
recognition.
Candidates for positions with Ford Motor Company must be legally
authorized to work in the United States permanently. Verification
of employment eligibility will be required at the time of hire.
Visa sponsorship is available for this position.
We are an Equal Opportunity Employer committed to a culturally
diverse workforce. All qualified applicants will receive
consideration for employment without regard to race, religion,
color, age, sex, national origin, sexual orientation, gender
identity, disability status or protected veteran status.
For information on Ford's salary and benefits, please visit:
https://corporate.ford.com/content/dam/corporate/us/en-us/documents/careers/2022-benefits-and-comp-GSR-sal-plan-2.pdf
At Ford, the health and safety of our employees is our top
priority. Vaccination has been proven to play a critical role in
combating COVID-19. As a result, Ford has made the decision to
require U.S. salaried employees to be fully vaccinated against
COVID-19, unless employees require an accommodation for religious
or medical reasons. Being fully vaccinated means that an individual
is at least two weeks past their final dose of an authorized
COVID-19 vaccine regimen. As a condition of employment, newly hired
employees will be required to provide proof of their COVID-19
vaccination or an approved medical or religious exemption.
Requisition ID : 7993
Keywords: Ford Motor Company, Dearborn, Lead Site Reliability Engineer, Professions, Dearborn, Michigan