About the role
You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.
Key responsibilities
- Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, and more
- Produce forward-looking designs for platform evolution as we scale to 100x current traffic and need to provide low latency worldwide
- Use AI extensively to automate the mundane parts of building complex but reliable systems
- Profile and tune low-level CPU and memory performance
Requirements
- 5+ years of experience building distributed compute and orchestration platforms in Python or Rust
- Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
- Deep understanding of computational complexity and memory allocation
- Track record of designing systems that scale under real production load
- Experience building and using observability to drive performance and reliability decisions
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
- Experience with AI/ML inference or training infrastructure
- Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
- Background in building multi-tenant compute platforms
- Understanding of networking fundamentals and performance characteristics
- Familiarity with GPU workload characteristics and scheduling constraints
Location
- Turkey
What we offer at fal
- Interesting and challenging work
- Plenty of learning and growth opportunities
- Regular team events and offsites
Find similar jobs
Explore opportunities with similar job descriptions at other companies.
10h
Black Forest Labs
Member of Technical Staff - ML Infrastructure Engineer
Freiburg (Germany), San Francisco (USA)
15h
Anthropic
Staff Software Engineer, Kubernetes Platform
San Francisco, CA | New York City, NY | Seattle, WA
1d
Anthropic
Staff+ Software Engineer, Developer Productivity
San Francisco, CA | New York City, NY | Seattle, WA
2d
Crusoe
Staff Infrastructure Engineer
San Francisco, CA - US
4d
Graphcore
Senior Systems Engineer – Performance & Reliability
Bristol, UK