Location: San Antonio
Posted on: November 22, 2021
UST Global is looking for a Site Reliability Engineering (SRE) Architect
to join our team. The SRE Architect will play the mission-critical role
of ensuring that critical systems are healthy, monitored,
automated, and designed to scale. This role requires a thoughtful
problem solver with excellent organizational skills. The Site
Reliability Engineering team is responsible for availability,
latency, performance, efficiency, change management, monitoring,
emergency response, and capacity planning. This role will be
responsible for responding to production problems, investigating
their causes, and engineering and advising on permanent solutions.
- Engage in and improve the whole life-cycle of services, from
inception and design, through deployment, operation and
- Design, develop, ship, and motivate the creation of software
and systems to increase product reliability and organizational
- Guide reliability practices through the entire software
development life-cycle through activities like architecture
reviews, code reviews, creating platforms and frameworks, capacity
- Work with senior engineering and testing team members to build
tools and testing strategies for problem prevention, detection, and
- Design and create centralized logging and monitoring
- Design and create robust logging, monitoring, and alerting
- Troubleshoot production incidents in real time.
- Lead root cause investigations.
- Improve service reliability through blameless post-incident reviews
and by using code to prevent or respond to problem recurrence.
- Proactively identify system anomalies.
- Recommend and execute testing strategies.
- Recognize
- Participate in the on-call rotation and be able to work weekends
during the on-call schedule.
- Perform code-level debugging of issues escalated to the team.
- Develop tools to automate routine jobs through knowledge
learned on the job.
- Plug into the software release cycle. Work closely with developers
to ensure software releases are well designed, planned,
implemented, released, and monitored.
- Automate time-consuming and manual processes.
- Assess current SRE solution and define the SRE approach for
- Work with applications development teams on designing,
implementing, and improving SRE practices
- Conduct SRE training sessions.
- Design and execute scalability strategies that ensure the
scalability and elasticity of the infrastructure.
Must have:
- Experience in defining the SRE Roadmap for organizations
- Experience with Cloud technologies and Solution. (GCP
- Experience with IAC tools (Terraform, CloudFormation)
- Experience with configuration management tools like
- Experience with container technology and orchestration
- Proficiency with tools like Git, Bitbucket
- Linux operating system, testing tools and database management
- Experience in one or more of the following: Java, JavaScript,
Duck Creek, Python, microservices
- Experience with monitoring tools like AppDynamics.
- Experience with log management and the ELK Stack (Elasticsearch,
- Experience with Apica, ZebraTester for synthetic
- Experience with PagerDuty for alerting.
- Understanding of the Application servers, Network and
- Excellent understanding of Scalability processes and
- Understanding of Jenkins or other build tools.
- Hands-on experience in administering high-availability and
high-performance environments, as well as managing large-scale
deployments of traffic-heavy applications.
- Ability to handle multiple complex systems without shying
away from the challenge of improving them.
- The willingness to try new technologies and make them harmonize
with existing systems to achieve better operations overall.
Added
- Experience of working in large financial services or retail
- Excellent communication and organizational skills
- Thriving as a member of a team excelling under pressure
- The ability to think fast; a natural problem-solver.
- Bachelor's degree or equivalent in Computer Science,
Engineering or a related field, or additional comparable
- Proven experience in IT, application development or DevOps,
including excellent knowledge of networking, computing and
- Background in Software Development, Software Validation, or
- Industry certification in cloud services / solutions