Job Description
CoreWeave is a cloud platform that empowers AI innovation. Founded in 2017, it provides infrastructure, tools, and expertise to improve performance.
What You’ll Do
- Develop, optimize, and maintain network observability platforms. Use Python and Golang to create collectors, exporters, and dashboards that provide deep visibility into network health and performance.
- Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from various platforms (Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, etc.) into a single observability pipeline.
- Design and implement scalable telemetry solutions using protocols such as gNMI and SNMP, together with streaming analytics. Provide advanced alerting and anomaly detection with Prometheus, Grafana, and Alertmanager.
- Work closely with network developers, site reliability engineers, and security teams to integrate observability solutions across the broader infrastructu...