Applied Methods

Infrastructure & Platform Engineer

Engineers in this role architect and operate the systems that power AI research and product development at scale. They design distributed infrastructure for training, serving, and orchestrating AI workloads across GPU clusters, build internal platforms that accelerate developer velocity, and optimize the critical path from code to production. The role pairs deep systems engineering expertise (Kubernetes, build systems, data pipelines, performance tuning) with the particular demands of AI workloads. It combines hands-on infrastructure work with close collaboration with researchers and product teams to remove the bottlenecks that slow innovation.

$ titles --canonical
Senior Software Engineer, Infrastructure
Software Engineer, Platform
Software Engineer, AI Platform
Open Jobs: 531
Companies Hiring: 81
$02

Skills

What companies are looking for in this role.

$ skills --core

Designing and deploying cloud-based machine learning training and inference clusters at scale

95%

Designing and operating Kubernetes clusters, including schedulers, control planes, and custom controllers for specialized workloads

92%

Implementing Infrastructure as Code for reproducible resource provisioning and configuration management

90%

Building and maintaining CI/CD pipelines for machine learning workflows and distributed systems

88%

Optimizing system performance including GPU utilization, latency, and throughput at scale

88%

Diagnosing and resolving distributed systems issues including performance bottlenecks and hardware failures

87%

Managing and optimizing network-based distributed file systems and blob storage solutions for machine learning workloads

85%

Designing and building tools for monitoring, observability, and operational visibility across infrastructure

82%

Provisioning bare metal servers and managing hardware lifecycle across data centers and edge environments

82%

Developing custom autoscaling solutions for machine learning and compute-intensive workloads

80%

Implementing security best practices across infrastructure stacks without impeding research velocity

75%
$ skills --emerging

Building abstractions and developer-friendly tools that accelerate research iteration and reduce infrastructure friction

78%

Architecting multi-region and multi-cloud infrastructure for distributed training and inference

70%

Designing systems for measuring and evaluating large-scale machine learning workloads to determine production readiness

68%

Integrating artificial intelligence capabilities into developer workflows and productivity tools

62%
$ skills --soft

Collaborating with research and product teams to translate workload requirements into infrastructure solutions

85%

Owning technical strategy, roadmaps, and long-term architectural decisions for infrastructure systems

82%

Taking ownership of production systems and participating in incident diagnosis and resolution

80%

Communicating complex technical concepts across teams with different expertise and priorities

78%

Mentoring engineers and establishing best practices for building and operating large-scale systems

72%
$03

Technology

The tools and technologies that define this role.

$ tech --language
Python: high
Go: moderate
$ tech --framework
CUDA: moderate
$ tech --platform
Kubernetes: very high
Linux: very high
AWS: high
Slurm: high
GCP: moderate
NVIDIA: moderate
Azure: low
GitLab: low
$ tech --tool
Docker: high
BMC: moderate
etcd: moderate
Git: moderate
Grafana: moderate
Helm: moderate
IPMI: moderate
Prometheus: moderate
S3: moderate
Terraform: moderate
Ansible: low
Datadog: low
Jenkins: low
MAAS: low
$ tech --concept
Distributed systems: very high
Infrastructure as Code: very high
Cloud-native: high
Data center: high
High-performance computing: high
Networking: high
Scheduler: high
Storage systems: high
API server: moderate
Edge computing: moderate
Load balancing: moderate
Multi-cloud: moderate
PXE: moderate
Service discovery: moderate
$04

Open Jobs

531 open Infrastructure & Platform Engineer jobs across 81 companies.

Crusoe · 21h
Senior Backend Tooling Software Engineer
San Francisco, CA - US · Engineering
Astronomer · 1d
Software Engineer - Data Plane Management
New York City · Engineering
Black Forest Labs · 1d
Member of Technical Staff - ML Infrastructure Engineer
Freiburg (Germany), San Francisco (USA) · Engineering
Anthropic · 2d
Staff Software Engineer, Kubernetes Platform
San Francisco, CA | New York City, NY | Seattle, WA · Engineering
Anthropic · 3d
Staff+ Software Engineer, Developer Productivity
San Francisco, CA | New York City, NY | Seattle, WA · Engineering
Crusoe · 3d
Staff Infrastructure Engineer
San Francisco, CA - US · Engineering
Graphcore · 6d
Senior Systems Engineer – Performance & Reliability
Bristol, UK · Engineering
OpenAI · 6d
Performance & Systems Engineer, Codex
San Francisco · Engineering
Harvey · 6d
Staff Software Engineer, Developer Experience (DevEx)
San Francisco · Engineering
Baseten · 6d
OS / K8s Systems Engineer
San Francisco · Engineering
MongoDB · 6d
Senior Software Engineer, Atlas Enablement
United States · Engineering
Graphcore · 6d
Observability, Infrastructure Engineer
Gdańsk, Pomeranian Voivodeship, Poland · Engineering
World Labs · 6d
Research Platform Engineer
San Francisco · Engineering
Graphcore · 1w
Observability, Telemetry Engineer
Gdańsk, Pomeranian Voivodeship, Poland · Engineering
Graphcore · 1w
Senior Cloud Network Engineer
Bristol, UK · Engineering
Graphcore · 1w
Senior Cloud Engineer (K8S)
Bristol, UK · Engineering
Palantir · 1w
Edge Infrastructure Engineer
Warsaw, Poland · Engineering
Palantir · 1w
Edge Infrastructure Engineer
Paris, France · Engineering
Fireworks AI · 1w
Member of Technical Staff, Cloud Infrastructure
New York, NY; San Mateo, CA · Engineering
Databricks · 1w
Staff Software Engineer - AI Research Infrastructure
New York City, New York; San Francisco, California · Engineering