SanAntonioTXRecruiter Since 2001
the smart solution for San Antonio jobs

Product Reliability Engineer (Remote)

Company: Medecision
Location: San Antonio
Posted on: June 25, 2022

Job Description:


  • Linux
  • Python
  • BASH
  • APIs
  • GCP
    • Product Observability: PREs embed with development teams to build and refine metrics, logging, monitoring, and alerting for our products, enabling proactive (and increasingly automated) issue identification, prevention, and remediation. This improves the performance and uptime of our products and provides detailed telemetry that aids in debugging complex issues.
    • Product Reliability Systems and Process: PREs are among the most seasoned users of our deployment and stability infrastructure, which means they often identify opportunities for additional functionality that would improve operational efficiency. An important part of the PRE role is partnering with other infrastructure and operational teams across Medecision to provide this input and, where appropriate, directly deliver features that will benefit product operations (e.g., developing a system that automatically de-duplicates product alerts and enables teams to prioritize and document critical information).
    • Responsible for deploying, automating, maintaining, troubleshooting and improving the systems that keep the backend infrastructure running smoothly.
    • Responsible for the application maintenance, availability, and performance.
    • Dive deep to resolve problems at their root and troubleshoot services related to our platform
    • Develop automation tools for managing Aerial on-prem and cloud infrastructure.
    • Improve engineering standards, tooling, and processes
    • Develop a deep understanding of Aerial products and processes.
    • Collaborate with customer-facing, product, and infrastructure teams on the development and deployment of scalable, reliable software for our customers.
    • Diagnose, resolve, and prevent issues encountered in the field
    • Reduce the operational overhead of Medecision's products and leverage data to understand the largest sources of reliability risk.
    • Deliver end-to-end improvements to stability by proactively preventing issues via telemetry and automation and directly reducing the need for reactive support.
    • Make data-driven decisions about investments in stability and reliability.
    • Take part in a 24/7 on-call rotation responsible for coordinating Medecision's response to mission-critical incidents, ensuring efficient resolution with minimal customer impact.
    • Automating infrastructure and services in Aerial's Cloud
    • Working with client support to solve technical issues, perform upgrades, and migrate to better/faster/stronger gear
    • Supporting legacy and new architecture components to maximize uptime, availability, security
    • Working support tickets as part of a high output team in a fast-paced environment
    • Managing and optimizing build pipelines and CI/CD tooling to streamline new product launches/features
    • Ensuring continuous delivery of technical and support services to end users, and monitoring system and software performance
    • Providing technical input and leadership to formulate long-term objectives and standards of performance for staff, technical releases, production support and projects

    WHAT YOU'LL BRING:
      • The role requires you to have hands-on technical experience and a can-do approach towards software and environment automation/management and continuous improvement.
      • You should have 4-6 years of experience managing and troubleshooting large-scale distributed systems, along with a start-up mentality.
      • Excellent Linux and troubleshooting skills
      • You have a passion for solving problems using open source software.
      • You are an expert in Python/Bash and you are proficient in Linux.
      • Strong experience working in AWS/GCP environments and with other server virtualization technologies
      • Experience working with a monitoring stack such as Splunk
      • Bachelor's degree in computer science
      • Familiarity with Infrastructure provisioning tools like Kubernetes, Terraform
      • Familiarity with Release Orchestration tools such as Octopus Deploy
      • Knowledge of CI/CD tools such as GitLab
      • Comfortable building/proving new services for improvement/replacement of existing tech
      • Professional technical certifications from leading industry vendors such as AWS, GCP
      • Expertise in Python, SQL
      • Experience with distributed systems and technologies
      • Proven ability to deliver production software, including the use of GitLab, Jenkins and/or other CI/CD tools
      • Technical expertise in industry-leading products such as AWS, GCP, Terraform, Splunk, GitLab
      • Expertise in at least one programming language used to build/launch cloud-based infrastructure
      • Technical knowledge including cloud services virtualization, containerization, PostgreSQL/MySQL, and security
      • Knowledge of RESTful APIs.
      • Remove ambiguity through clear documentation, making teams more efficient and effective
      • Convert tacit knowledge to explicit knowledge
      • Experience in US Healthcare

Keywords: Medecision, San Antonio, Product Reliability Engineer (Remote), Engineering, San Antonio, Texas
