Lead Site Reliability Engineer
Company: Glean
Location: Palo Alto
Posted on: April 1, 2026
Job Description:
About Glean: Glean is the Work AI platform that helps everyone
work smarter with AI. What began as the industry’s most advanced
enterprise search has evolved into a full-scale Work AI ecosystem,
powering intelligent Search, an AI Assistant, and scalable AI
agents on one secure, open platform. With over 100 enterprise SaaS
connectors, flexible LLM choice, and robust APIs, Glean gives
organizations the infrastructure to govern, scale, and customize AI
across their entire business - without vendor lock-in or costly
implementation cycles. At its core, Glean is redefining how
enterprises find, use, and act on knowledge. Its Enterprise Graph
and Personal Knowledge Graph map the relationships between people,
content, and activity, delivering deeply personalized,
context-aware responses for every employee. This foundation powers
Glean’s agentic capabilities - AI agents that automate real work
across teams by accessing the industry’s broadest range of data:
enterprise and world, structured and unstructured, historical and
real-time. The result: measurable business impact through faster
onboarding, hours of productivity gained each week, and smarter,
safer decisions at every level. Recognized by Fast Company as one
of the World’s Most Innovative Companies (Top 10, 2025), by CNBC’s
Disruptor 50, Bloomberg’s AI Startups to Watch (2026), Forbes AI
50, and Gartner’s Tech Innovators in Agentic AI, Glean continues to
accelerate its global impact. With customers across 50 industries
and 1,000 employees in more than 25 countries, we’re helping the
world’s largest organizations make every employee AI-fluent, and
turning the superintelligent enterprise from concept into reality.
If you’re excited to shape how the world works, you’ll help build
systems used daily across Microsoft Teams, Zoom, ServiceNow,
Zendesk, GitHub, and many more - deeply embedded where people get
things done. You’ll ship agentic capabilities on an open,
extensible stack, with the craft and care required for enterprise
trust, as we bring Work AI to every employee, in every company.
About the Role: Glean is seeking a Site Reliability Engineering
Lead to foster a culture of engineering excellence, drive technical
strategy, and develop a high-performing, collaborative team. Your
role is pivotal in ensuring our services meet stringent Service
Level Objectives (SLOs) and in building resilient, automated
production environments in the cloud. You'll lead a team and be
responsible for products globally, providing technical leadership
to key projects and empowering your team to do the same. Much of
our software development focuses on building infrastructure to
scale our operations in a hybrid cloud environment and eliminating
work through automation. On the SRE team, you’ll have the
opportunity to manage the complex challenges of scale and fast
growth which are unique to Glean, while using your expertise in
coding, algorithms, problem-solving, and SRE practices. We keep
Glean applications up and running, ensuring our customers have the
best and most reliable experience possible. You are: Technical
Leadership and Mentorship: Play a key role in driving technical
excellence and fostering a culture of reliability across
engineering teams. You will lead by example, setting best practices
for incident management, performance optimization, and automation.
Influence best practices, drive cross-team collaborations, and
contribute to the execution of key objectives in alignment with
engineering leadership and cross-functional partners. Establish
strong technical credibility, shaping architectural decisions and
ensuring the delivery of high-quality, reliable systems. Ensure
High Availability: Implement and maintain resilient cloud
architectures, monitor system performance, and proactively identify
and resolve potential bottlenecks or points of failure. Incident
Management: Participate in the primary on-call rotation; cultivate
technical curiosity, a growth mindset, and a blameless postmortem
culture within the team. Continuously optimize the on-call process
for sustainability and efficiency. Automation and Tooling: Develop
and maintain automation scripts, tools, and processes to streamline
system deployment, monitoring, and management tasks. Your
contributions will be vital in efficiently scaling cloud
operations. Performance Optimization: Optimize cloud infrastructure
and applications for performance, scalability, and
cost-effectiveness. Security and Compliance: Collaborate with
security engineers to implement best practices and ensure
compliance with security standards and policies. Monitoring and
Alerting: Design and configure advanced monitoring systems to gain
insights into system behavior, set up alerts, and respond
proactively to potential issues. Create and maintain comprehensive
dashboards and playbooks for production on-call. Software
Development Consultation: Engage actively in the entire software
development lifecycle. Participate in system design reviews and
provide valuable SRE insights during launch reviews, influencing
and enhancing system architecture. About you: Bachelor’s degree in
Computer Science, a related field, or equivalent practical
experience. 8 years of experience in a senior-level role within
Site Reliability Engineering or similar role, particularly in
managing cloud-based services and infrastructure. 5 years of
experience with software development in one or more programming
languages. 3 years of experience managing people or teams, leading
projects, and designing, analyzing, and troubleshooting distributed
systems running in the cloud. Strong knowledge of cloud platforms such
as Google Cloud Platform, AWS, or Azure. Practical experience with
containerization technologies, including Docker and Kubernetes.
Familiarity with infrastructure as code tools like Terraform is
essential. Solid understanding of networking, security principles,
and best SRE and security practices. Proficiency in using
monitoring and alerting tools to detect and respond to potential
issues effectively. Location: This role is hybrid (4 days a week in
our Palo Alto office). Compensation & Benefits: The standard
base salary range for this position is $200,000 - $260,000
annually. Compensation offered will be determined by factors such
as location, level, job-related knowledge, skills, and experience.
Certain roles may be eligible for variable compensation, equity,
and benefits. We offer a comprehensive benefits package including
competitive compensation, medical, vision, and dental coverage, a
generous time-off policy, and the opportunity to contribute to your
401(k) plan to support your long-term goals. When you join, you'll
receive a home office improvement stipend, as well as annual
education and wellness stipends to support your growth and
wellbeing. We foster a vibrant company culture through regular
events, and provide healthy lunches daily to keep you fueled and
focused. We are a diverse bunch of people and we want to continue
to attract and retain a diverse range of people into our
organization. We're committed to an inclusive and diverse company.
We do not discriminate based on gender, ethnicity, sexual
orientation, religion, civil or family status, age, disability, or
race. AI-First Mindset at Glean: At Glean, AI fluency is
core to how we work and we're committed to ensuring every new hire
feels confident integrating AI into their everyday work. As part of
the interview process, you'll complete a brief AI-focused exercise
or discussion so we can understand how you think about, design, and
use AI to drive impact in your role. Feel free to reference any
tools, platforms, or workflows you use today; prior Glean
experience isn't required.
Keywords: Glean, Mountain View, Lead Site Reliability Engineer, IT / Software / Systems, Palo Alto, California