MLOps Platform Engineer (SageMaker)
Software Engineering
Plano, TX, USA
Job Title: MLOps Platform Engineer (SageMaker)
Location: Plano, TX
Contract
Must Haves:
- 10-15 years of software engineering experience focused on cloud infrastructure or ML platform operations
- 5+ years hands-on with AWS, including deep expertise in Amazon SageMaker (Studio Classic, Pipelines, Model Registry, Endpoints, Feature Store)
- 3+ years building and operating production MLOps pipelines — training, versioning, deployment, monitoring, rollback
- Experience with SageMaker Unified Studio or Studio Classic — domain/project setup, blueprints, multi-tenant configuration
- MLflow or equivalent experiment tracking
- SageMaker Pipelines or similar workflow orchestration (Airflow, Step Functions)
- SageMaker Unified Studio experience is preferred; Studio Classic experience is required
What you’ll be doing:
- Set up SageMaker Unified Studio platform — domain configuration, project provisioning, persona-based roles, and multi-environment (Dev, Prod-UAT, Prod) promotion workflows
- Build MLOps pipelines using SageMaker Pipelines — data extraction from Snowflake, preprocessing, training, evaluation, and model registration
- Manage SageMaker Model Registry — cross-account model promotion, versioning, immutability, and lineage tracking
- Configure MLflow experiment tracking — auto-logging of parameters, metrics, and artifacts
- Set up identity and access management — Okta SSO, SailPoint entitlements, persona-based execution roles, service roles for pipelines
- Build model serving — real-time SageMaker endpoints and batch prediction workflows
- Set up model monitoring — data drift, model drift, performance degradation detection
- Configure data catalog — searchable datasets, access-level visibility, access-request workflows, lineage
- Own platform operations — observability (CloudWatch, Datadog), logging, custom images, instance availability
Requirements:
Qualifications / What you bring (Must Haves)
- 10-15 years of software engineering experience focused on cloud infrastructure or ML platform operations
- 5+ years hands-on with AWS, including deep expertise in Amazon SageMaker (Studio, Pipelines, Model Registry, Endpoints, Feature Store)
- 3+ years building and operating production MLOps pipelines — training, versioning, deployment, monitoring, rollback
- Experience with SageMaker Unified Studio or Studio Classic — domain/project setup, blueprints, multi-tenant configuration
- Infrastructure-as-Code with Terraform, CDK, or CloudFormation
- IAM design for ML platforms — execution roles, service roles, cross-account access, Lake Formation, SSO/SAML
- MLflow or equivalent experiment tracking
- SageMaker Pipelines or similar workflow orchestration (Airflow, Step Functions)
- Model serving — real-time endpoints, batch transform, auto-scaling, endpoint monitoring
- Snowflake as a data source for ML pipelines
- Kubernetes (EKS) and container orchestration
- Networking and security — VPC, security groups, private endpoints, cross-account connectivity
Bonus if you have (Preferred):
- SageMaker Unified Studio domain provisioning, custom blueprints, project standardization
- SageMaker Feature Store for online/offline feature management
- SageMaker Model Monitor — data quality checks, bias detection, drift detection
- AWS Certified Machine Learning - Specialty certification