Site Reliability Engineer - Senior Level
Location: San Antonio
Posted on: February 15, 2019
Purpose of Job
We are currently seeking a talented Site Reliability Engineer - Senior Level for our San Antonio, TX; Plano, TX; or Phoenix, AZ facility.
The primary purpose of site reliability engineering at USAA is to improve and sustain the reliability of USAA's most critical IT systems. The role is essential in helping establish and measure service level objectives for critical systems. In addition, SREs will continuously identify engineering and automation opportunities to effectively manage production systems at scale. An SRE will model a blameless culture through effective post mortems and a focus on minimizing the impact of outages.
Site reliability engineers at USAA will hold the job title of Software Developer and Integrator (SDI) and are engaged in all phases of the software development lifecycle, which include gathering and analyzing user/business system requirements, responding to outages and creating application system models. The primary functions of SDIs are to design, develop, document, test and debug new and existing software systems and/or applications for internal use, and to perform defect corrections (analysis, design, code). In addition, SDIs participate in design meetings and consult with business clients to refine, test, and debug programs to meet business needs, and interact with, and sometimes direct, third-party partners in the achievement of business and technology initiatives. This is a solid, career-level role where functional and technical proficiency has been obtained, and incumbents display a depth of technical understanding within their respective areas of specialization that allows them to operate independently. Incumbents also display a proficiency that allows them to begin mentoring others (third-party and internal resources) on procedural matters.
Job Requirements
- Work with application and system SMEs to design highly scalable and resilient distributed systems
- Create Service Level Objectives to measure and manage core infrastructure and critical services
- Analyze, troubleshoot and fix core infrastructure or critical systems when they fail or degrade
- Write custom code or scripts to automate repetitive or manual system support tasks
- Lead technical post mortems to identify lessons learned and implement improvements
- Partner with technical teams and product owners to ensure resiliency work is developer-ready
- Design and execute failure injection tests to verify adequate system capacity and resiliency
- Champion Site Reliability Engineering practices across the IT organization
- Independently installs, customizes and integrates commercial software packages.
- Facilitates root cause analysis of system issues.
- Works with experienced team members to conduct root cause analysis of issues, review new and existing code and/or perform unit testing.
- Learns to create system documentation/playbooks and attends requirements, design and code reviews.
- Receives work packages from manager and/or delegates.
- Identifies ideas to improve system performance and availability.
- Resolves complex technical design issues.
- Creates system documentation/playbook(s) and participates as a reviewer and contributor in requirements, design and code reviews.
- May serve as the subject matter expert on development techniques.
- Partners with experienced team members to develop accurate work estimates on work packages.
- May serve as a mentor on procedural matters to less experienced internal and third-party team members.
- May assist experienced team members with the delegation of work packages.
- Bachelor's degree; 4 additional years of related experience beyond the minimum required may be substituted in lieu of a degree.
- 6 or more years of software development experience demonstrating depth of technical understanding within one or more specific I/T disciplines or technologies, to include relevant business support and/or general information technology support experience
- 6+ years of experience managing large scale production environments (1000+ servers) and experience with production support of applications in large scale environments
- Working knowledge of systems administration and/or systems programming skills
- Strong interest in monitoring, optimizing, scaling and troubleshooting large distributed systems
*Qualifications may warrant placement in a different job level*
When you apply for this position, you will be required to answer some initial questions. This will take approximately 5 minutes. Once you begin the questions, you will not be able to finish them at a later time, and you will not be able to change your responses.
- Demonstrated experience influencing and selling new ideas to peers, leadership, and senior management
- Experience in one or more of the following: C, C++, Java or Python
- Strong experience or working knowledge of end-to-end IT systems (compute, storage, network, security, application runtime, relational databases, REST services, asynchronous messaging, etc.)
- Strong troubleshooting skills and experience developing enterprise applications
- Demonstrated experience building SLO-based monitoring solutions
- Strong teaming and collaboration skills
The above description reflects the details considered necessary to describe the principal functions of the job and should not be construed as a detailed description of all the work requirements that may be performed in the job.
At USAA, our employees enjoy one of the best benefits packages in the business, including a flexible business casual or casual dress environment, comprehensive medical, dental and vision plans, along with wellness and wealth-building programs. Additionally, our career path planning and continuing education will assist you with your professional goals.
Relocation assistance is not available for this position.
For Internal Candidates:
Must complete 12 months in current position (from date of hire or date of placement), or must have manager's approval prior to posting.
Last day for internal candidates to apply to the opening is 2/05/19 by 11:59 p.m. CST.