CarrolltonRecruiter Since 2001
the smart solution for Carrollton jobs

SRE Lead - Observability

Company: Citi
Location: Carrollton
Posted on: January 19, 2023

Job Description:

Summary: This role will drive the development of the SRE function for the Observability platform while ensuring alignment with Citi's Inventory, Developer Enablement and Automation Service (IDEAS), a core area within Citi Technology Services (CTI). The person in this role will work with Architects and Engineers to ensure the full-stack Observability Platform is built with strong operational stability and performance standards. They will take a fresh perspective on security and control requirements to ensure we have codified controls and real-time insights driven by industry standards.

Inventory, Developer Enablement and Automation Service is a core area within Citi Technology Services (CTI) that focuses on technology reference data, automation frameworks, developer enablement and observability globally across Citi. This is a relatively new organization that will serve as the framework that the rest of Citi technology can adopt. We are looking to transform to effective use of Agile development practices and a client-first focus.

This role is a Site Reliability Engineer reporting directly to the Observability Head to ensure all the pillars across Observability (logs, traces, metrics, and event management) meet the performance, reliability, and risk requirements. Site Reliability Engineers create a bridge between application development and production support operations by applying a software engineering mindset to deploy, manage and maintain the system, services, and risk to deliver the business outcomes within service contract requirements. While production operations are focused on immediate tactical recovery of service, the SRE teams look to ensure the issues do not recur by solving systemic infrastructure or application architecture gaps or issues and implementing automation and self-healing to deliver predictability.

One of the key areas of focus for this role is to transform technology risk from a non-technical reporting role to an active role automating risk out of the platforms across Observability. This role will have a dotted line to the Site Reliability Engineering Head of IDEAS to participate in forming the SRE role across Citi.

Responsibilities:

  • Serve as a technology subject matter expert for internal and external stakeholders and provide direction for all firm mandated controls and compliance initiatives, all projects within the group and in creating a technology domain roadmap
  • Design, create and manage the performance, availability and recovery requirements and standards across the Observability Platform
  • Contribute to designing and creating the SRE guidelines for the Observability Platform, and oversee and help develop its SRE functions
  • Design and create the balance between cost, availability and risk and be able to articulate this to our customers, security, risk and control partners
  • Advise or mentor junior team members
  • Impact the engineering function by influencing decisions through advice, counsel or facilitating services
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Essential Skills:

  • Consistently demonstrates clear and concise written and verbal communication
  • 5+ years scripting in Python/Shell/Bash/Ksh/JavaScript
  • 5+ years of software development experience (Java on Linux)
  • Extensive experience working with Ansible and Ansible Tower
  • Prior experience with DevOps CI/CD tools like Git and Jenkins
  • Competent with API, web services and microservices development
  • Hands-on knowledge and understanding of monitoring and best practices using Grafana, Prometheus, Splunk, Elasticsearch, Spark
  • Measure and optimize system performance by setting proper metrics and SLAs/SLOs
  • Hands-on knowledge of configuring and tuning the observability platforms we use to streamline alerting and proactive issue identification
  • Automate manual activities as new features are added to the platform
  • Manage risk and control activities across the platform and team
  • Strong knowledge of data pipelines and streaming data
  • Good understanding of Policy as Code with OPA (experience with Sentinel is a plus)
  • Deep knowledge of software development best practices and SDLC
  • Ability to work in a matrix environment and partner with virtual teams
  • Proactive attitude with an ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements

Desired Skills:

  • Experience in analyzing data to drive decisions
  • Ensure the platform functional and integration test suite remains complete and runs quickly
  • Expand performance and load testing capability to better simulate real production use for our platform and with customers
  • Perform root cause analysis and own actionable follow-ups
  • 5+ years in an SRE and operational role on a large enterprise platform
  • 5+ years working with Linux
  • Architecture and design
  • Strong analytical, algorithmic, and problem-solving skills
  • Experience with Docker, Kubernetes, OpenShift
  • Experience with SQL/NoSQL databases like Oracle, MongoDB
  • Java Spring Framework development experience
  • Experience writing automation tests
  • Ability to quickly learn new concepts and software

Education:

  • Bachelor's degree/University degree or equivalent experience
  • Master's degree preferred

Job Family Group: Technology
Job Family: Systems & Engineering
Time Type: Full time
Primary Location: Irving, Texas, United States
Primary Location Salary Range: $116,880.00 - $175,320.00

Citi is an equal opportunity and affirmative action employer.

Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View the "EEO is the Law" poster. View the EEO is the Law Supplement. View the EEO Policy Statement. View the Pay Transparency Posting.

Effective November 1, 2021, Citi requires that all successful applicants for positions located in the United States or Puerto Rico be fully vaccinated against COVID-19 as a condition of employment and provide proof of such vaccination prior to commencement of employment.

Keywords: Citi, Carrollton, SRE Lead - Observability, Other, Carrollton, Texas
