SRE Lead - Observability
Company: Citi
Location: Carrollton
Posted on: January 19, 2023
Job Description:
Summary: This role will drive the development of the SRE function
for the Observability platform while ensuring alignment with Citi's
Inventory, Developer Enablement and Automation Service (IDEAS), a
core area within Citi Technology Services (CTI). They will work
with Architects and Engineers to ensure the full-stack Observability
Platform is built with strong operational stability and performance
standards. They will take a fresh perspective on security and
control requirements to ensure we have codified controls and
real-time insights driven by industry standards.
Job Description
Inventory, Developer Enablement and Automation Service is a core
area within Citi Technology Services (CTI) that focuses on
technology reference data, automation frameworks, developer
enablement and observability globally across Citi. This is a
relatively new organization that will serve as the framework that
the rest of Citi technology can adopt. We are looking to transform
toward effective use of Agile development practices and a
client-first focus. This role is a Site Reliability Engineer
reporting directly to the Observability Head to ensure all the
pillars of Observability (logs, traces, metrics, and event
management) meet the performance, reliability, and risk
requirements. Site
Reliability Engineers create a bridge between the application
development and production support operations by applying a
software engineering mindset to deploy, manage and maintain the
system, services, and risk to deliver the business outcomes within
service contract requirements. While production operations are
focused on immediate tactical recovery of service, the SRE teams
aim to ensure that issues do not recur by solving systemic
infrastructure or application architecture gaps or issues and
implementing automation and self-healing to deliver predictability.
One of the key areas of focus for this role is to transform
technology risk from a non-technical reporting role to an active
role automating risk out of the platforms across Observability.
This role will have a dotted line to the Site Reliability
Engineering Head of IDEAS to participate in forming the SRE role
across Citi.
Responsibilities:
- Serve as a technology subject matter expert for internal and
external stakeholders and provide direction for all firm mandated
controls and compliance initiatives, all projects within the group
and in creating a technology domain roadmap
- Design, create and manage the performance, availability and
recovery requirements and standards across Observability
Platform
- Contribute to designing and creating the SRE guidelines for
Observability Platform and oversee and help to develop their SRE
functions
- Design and create the balance between cost, availability and
risk and be able to articulate this to our customers, security,
risk and control partners
- Advise or mentor junior team members
- Impact the engineering function by influencing decisions
through advice, counsel or facilitating services
- Appropriately assess risk when business decisions are made,
demonstrating particular consideration for the firm's reputation
and safeguarding Citigroup, its clients and assets, by driving
compliance with applicable laws, rules and regulations, adhering to
Policy, applying sound ethical judgment regarding personal
behavior, conduct and business practices, and escalating, managing
and reporting control issues with transparency
Essential Skills
- Consistently demonstrates clear and concise written and verbal
communication
- 5+ years scripting in Python/Shell/Bash/Ksh/JavaScript
- 5+ years of software development experience (Java on
Linux)
- Extensive experience working with Ansible and Ansible
Tower
- Prior experience with DevOps CI/CD tools like Git and
Jenkins
- Competent with API, web services and microservices
development
- Hands-on knowledge and understanding of monitoring and best
practices using Grafana, Prometheus, Splunk, Elasticsearch,
Spark
- Measure and optimize system performance by setting proper metrics
and SLAs/SLOs
- Hands-on knowledge of configuring and tuning the observability
platforms we use to streamline alerting and proactive issue
identification
- Automate manual activities as new features are added to the
platform
- Manage risk and control activities across the platform and
team
- Strong knowledge of data pipelines and streaming data
- Good understanding of Policy as Code with OPA (experience in
Sentinel will be a plus)
- Have a deep knowledge of software development best practices
and SDLC
- Ability to work in a matrix environment and partner with
virtual teams
- Proactive attitude with an ability to work independently,
multi-task, and take ownership of various parts of a project or
initiative
- Ability to work under pressure and manage to tight deadlines or
unexpected changes in expectations or requirements
Desired Skills
- Experience in analyzing data to drive decisions
- Ensure the platform functional and integration test suite
remains complete and runs quickly.
- Expand performance and load testing capability to better
simulate real production use for our platform and with
customers.
- Perform root cause analysis owning actionable follow-ups.
- 5+ years in an SRE and operational role on a large enterprise platform
- 5+ years working with Linux
- Architecture and design
- Strong analytical, algorithmic, and problem-solving skills
- Experience with Docker, Kubernetes, OpenShift
- Experience with SQL/NoSQL databases like Oracle, MongoDB
- Java Spring Framework development experience
- Experience writing automation tests
- Ability to quickly learn new concepts and software
Education:
- Bachelor's degree/University degree or equivalent
experience
- Master's degree preferred
Job Family Group: Technology
Job Family: Systems & Engineering
Time Type: Full time
Primary Location: Irving, Texas, United States
Primary Location Salary Range: $116,880.00 - $175,320.00
Citi is an equal opportunity and affirmative action employer.
Qualified applicants will receive consideration without regard to
their race, color, religion, sex, sexual orientation, gender
identity, national origin, disability, or status as a protected
veteran.
Citigroup Inc. and its subsidiaries ("Citi") invite all qualified
interested applicants to apply for career opportunities. If you are
a person with a disability and need a reasonable accommodation to
use our search tools and/or apply for a career opportunity, review
Accessibility at Citi.
View the "EEO is the Law" poster. View the EEO is the Law
Supplement. View the EEO Policy Statement. View the Pay
Transparency Posting.
Effective November 1, 2021, Citi requires that all successful
applicants for positions located in the United States or Puerto
Rico be fully vaccinated against COVID-19 as a condition of
employment and provide proof of such vaccination prior to
commencement of employment.
Keywords: Citi, Carrollton, SRE Lead - Observability, Other, Carrollton, Texas