Linux Admin
The Judge Group
Irving, TX, USA
Job Description: L3 Linux Administrator
Location: Irving, TX (Onsite 3 Days/Week)
Employment: Full-time
Experience: 8–12 Years
Overview
We are seeking a highly skilled L3 Linux Administrator to support and enhance large-scale enterprise Linux environments. This role requires deep systems expertise, strong experience with Veritas Clustering (VCS), SAN/NAS storage, and hands-on collaboration with data center teams. The ideal candidate is a proactive problem-solver who can independently drive incident resolution, ensure platform stability, and improve BAU operations through automation and best practices.
Key Responsibilities
Linux Administration (L3)
- Administer and troubleshoot RHEL, Oracle Linux, CentOS, and SUSE in production environments.
- Diagnose complex OS issues: kernel panics, boot/GRUB failures, filesystem corruption, resource contention, SELinux/AppArmor denials.
- Perform OS patching and upgrades at scale; manage package repositories and kernel updates with rollback strategies.
- Implement and audit security hardening (firewalld/iptables, CIS benchmarks, PAM, SSH, sudo, auditd).
- Manage system services (systemd), cron/timers, user/group administration, sudoers, and system-wide configurations.
Veritas Cluster Server (VCS / InfoScale)
- Install, configure, and administer VCS for HA/DR across multi-node clusters.
- Build and maintain service groups, resource dependencies, LLT/GAB, I/O fencing, and quorum mechanisms.
- Integrate VxVM/VxFS for disk groups, volumes, filesystems, and application failover.
- Conduct DR drills, cluster failover tests, and analyze cluster-related incidents.
Storage Administration: SAN & NAS
- Collaborate with storage teams on LUN provisioning, zoning, and masking; validate multipathing (DM Multipath / PowerPath).
- Build and maintain filesystems (ext4/xfs/VxFS), fstab configurations, and autofs.
- Administer NFS/CIFS/SMB mounts, permissions, and quotas; troubleshoot locking issues.
- Diagnose latency, path failures, and I/O bottlenecks using OS-level and HBA/array telemetry.
Data Center & Hardware Coordination
- Work with data center teams on racking/stacking, cabling, console access, and onsite hardware triage.
- Troubleshoot hardware faults: CPU, memory, NIC/HBA, disks/RAID/SSD, backplane, PSU, fans.
- Manage OEM support tickets (Dell/HP/IBM/Cisco), RMAs, and post-replacement validation.
BAU Operations & Incident Management
- Act as L3 escalation for P1/P2 incidents; lead technical recovery and bridge calls.
- Perform comprehensive log analysis: journald, syslog, dmesg, audit logs, application logs.
- Develop and maintain SOPs/runbooks and knowledge base articles; drive RCA and corrective actions.
- Participate in on-call rotation and scheduled maintenance windows (change management, CAB, MOPs).
Networking (Host-Level)
- Troubleshoot TCP/IP, routing, bonding/teaming, MTU issues, host firewalls, DNS/DHCP, NTP/chrony.
- Coordinate with network teams on L2/L3 connectivity, load balancers, and firewall rule updates.
Required Skills & Experience
- 8–12+ years of enterprise Linux administration with proven L3-level ownership.
- Strong hands-on expertise with Veritas Cluster Server (VCS), VxVM, VxFS, and HA/DR architectures.
- Solid SAN/NAS experience: LUNs, zoning, multipathing, NFS/SMB.
- Advanced troubleshooting skills in OS, storage, performance, and clustering.
- Scripting skills in Bash (Python preferred) and familiarity with Ansible.
- Knowledge of VMware/KVM and basic cloud concepts (AWS/Azure).
- Strong documentation habits for SOPs, MOPs, RCAs, and ITIL-aligned processes.
- Ability to work independently, lead incidents, and collaborate with cross-functional teams, including data center, network, and storage teams.