We are seeking an experienced HPC Systems Administrator to join our Houston Systems team. You will support a hybrid High-Performance Computing (HPC) environment that spans both on-premises and cloud platforms (GCP, Azure). Your focus will be on managing large-scale Linux systems, storage, and containerized environments, and on supporting internal production and development teams across Geosolutions, DMS, and GTC.
Support and maintain a hybrid HPC infrastructure (thousands of compute nodes, high-speed storage systems, robotic tape libraries).
Install, configure, and manage Linux-based operating systems (RHEL, CentOS, Rocky Linux).
Manage infrastructure using IBM xCAT, Ansible, and Terraform.
Perform hardware setup and diagnostics for servers, GPU nodes, SSDs, and tape systems.
Maintain and troubleshoot storage systems (HPE ClusterStor, NetApp, Isilon, Pure).
Configure networks involving Ethernet, InfiniBand, and SAN technologies.
Write scripts in Bash, C Shell, Perl, Python, and Ruby for automation and monitoring.
Administer PostgreSQL databases and integrate them with HPC workflows.
Provision and manage cloud infrastructure on GCP and Azure.
Manage backup/recovery tools (IBM Spectrum, Dell NetWorker).
Apply Linux security best practices and manage endpoint protection tools.
Debug system-level issues and contribute to HPC performance optimization.
Participate in an on-call rotation and support global datacenter operations.
Minimum of 5 years of experience in large-scale HPC environments.
Deep Linux administration experience (preferably RHEL-based).
Hands-on experience with xCAT, containerization (Docker/Singularity), and cloud platforms (GCP, Azure).
Strong scripting skills in Bash, Python, Perl, etc.
Working knowledge of data center network architecture.
Experience with configuration management and infrastructure-as-code tools (Ansible, Terraform).
Strong communication and documentation skills.
Willingness to work flexible hours, including weekend outages and on-call shifts.
Experience with GPU workloads and tuning.
Familiarity with MRTG, InTouch support systems, or similar monitoring tools.
Background in Geoscience applications or scientific computing environments.
Self-driven and proactive.
Strong collaboration skills in team-based environments.
Comfortable communicating with cross-functional stakeholders and vendors.
Passion for continuous learning and mentoring others.