Lead Machine Learning Operations Engineer

45937

New York, NY, US, 10036

Technology
New York
Full-Time
Fully Remote

#WeAreParamount on a mission to unleash the power of content… you in?
We’ve got the brands, we’ve got the stars, we’ve got the power to achieve our mission to entertain the planet – now all we’re missing is… YOU! Becoming a part of Paramount means joining a team of passionate people who not only recognize the power of content but also enjoy a touch of fun and uniqueness. Together, we co-create moments that matter – both for our audiences and our employees – and aim to leave a positive mark on culture.

Personalization & Recommendation Systems


Overview
We’re hiring a Lead Machine Learning Operations Engineer to own the operational excellence, observability, reliability, and governance layer around our personalization and recommendation ML systems. 

Our recommendation models retrain and deploy frequently. You will define how we detect model behavior changes, diagnose issues quickly, and prevent bad deployments from reaching customers.


This is a lead-level IC role: you’ll set technical direction and drive adoption across ML Engineering, DevOps, Platform Engineering, Data Engineering, and Product.


Sitting within ML Platform and Infrastructure, you’ll partner closely with ML engineers who own model development. You’re not expected to build infrastructure from scratch, but you’ll define what good looks like, evaluate tooling, and own the day-to-day operational layer.


What You’ll Do

Own ML production reliability strategy

  • Define and lead the operational strategy for production ML systems, including monitoring, traceability, deployment safety, incident response, and post-deployment validation.

  • Set the standards ML teams use to assess model health, performance, and trustworthiness in production.

Own model traceability and governance

  • Ensure every production model has clear lineage (data, features, code, artifacts, validation, deployment history) and drive adoption of model registry and metadata tooling across ML teams.

Build end-to-end ML observability

  • Design and implement monitoring across the full ML signal path: data arrival, feature freshness, distribution stability, candidate generation, ranking behavior, model metrics, serving latency, and SLA performance.

Define production health metrics

  • Partner with ML, data, product, and business stakeholders to define post-deployment metrics covering model quality, system reliability, business guardrails, and degradation indicators.

Detect drift and degradation proactively

  • Use thresholding, alerting, anomaly detection, and release-over-release monitoring to detect data drift, feature drift, model behavior changes, and silent failures before they impact customers.

Lead diagnostic tooling and root-cause analysis

  • Build dashboards, logs, and diagnostic workflows that let teams move quickly from “recommendations look off” to root cause, with context captured across candidates, features, scores, ranking decisions, and downstream outcomes.

Own ML deployment safety

  • Define and operate automated gates that prevent bad models or bad data from being promoted to production. Partner with MLEs to establish validation checks, rollback criteria, canary strategies, shadow testing, and release health reviews.

Lead ML incident response

  • Own incident response practices for ML systems, including rollback playbooks, hotfix strategies, severity definitions, tradeoff frameworks, communications, and post-mortems. Drive closure of systemic gaps after incidents rather than only resolving the immediate issue.

Partner across Platform, Data, and ML Engineering

  • Partner with DevOps/Platform on infrastructure and observability needs; with Data Engineering on data quality, drift, and freshness; and with ML Engineering to embed operational requirements into development and deployment workflows.

Set standards and mentor others

  • Act as the technical lead for ML operations: establish reusable patterns, playbooks, and standards, and mentor engineers on reliability, observability, and operational rigor.


Basic Qualifications

  • 5+ years of experience in machine learning engineering, ML platform, applied ML, MLOps, data platform, reliability engineering, or a related technical role.

  • Demonstrated experience operating production ML systems, including monitoring, deployment, incident response, model validation, data quality, or reliability ownership.

  • Experience leading technical initiatives across multiple engineering teams, especially where success required influencing architecture, tooling, standards, or adoption.

  • Hands-on experience with model registries, feature stores, ML metadata systems, production monitoring, model deployment pipelines, or ML observability platforms.

  • Solid knowledge of end-to-end ML systems, including training data, features, model artifacts, offline validation, online serving, post-deployment metrics, and business outcome measurement.

  • Ability to reason about ML operational failure modes: stale features, distribution shift, training-serving skew, delayed labels, and offline-online metric gaps.

  • Solid SQL skills and comfort investigating data quality, feature distributions, model outputs, pipeline behavior, and production anomalies.

  • Track record of cross-functional collaboration with Platform, Data, and ML Engineering to deliver production-grade operational capabilities.

  • Solid written and verbal communication skills, including the ability to explain ML system health, risks, incidents, and tradeoffs to both technical and non-technical stakeholders.


Additional Qualifications

  • Experience operating recommendation, personalization, ranking, search, ads, content discovery, or marketplace ML systems at scale.

  • Experience with real-time or near-real-time model serving systems.

  • Experience with feature stores, model registries, metadata stores, experiment tracking, data quality tools, lineage systems, or observability platforms.

  • Experience designing automated validation gates, canary deployments, rollback strategies, shadow deployments, or progressive delivery workflows for ML systems.

  • Experience with A/B testing, experiment guardrails, counterfactual evaluation, or offline-to-online metric alignment.

  • Experience with cloud-native production environments, distributed data pipelines, orchestration frameworks, or streaming systems.

  • Experience leading incident reviews, post-mortems, reliability programs, or operational excellence initiatives.

  • Experience defining standards, playbooks, or governance frameworks for production ML systems.


What Success Looks Like

  • Within the first 90 days, you will have assessed the current production ML operational landscape, identified the highest-risk gaps, and established a prioritized roadmap for observability, traceability, deployment safety, and incident response.

  • Within six months, our production recommendation systems will have clearer model lineage, stronger monitoring coverage, better diagnostic workflows, and more reliable deployment gates.

  • Within one year, ML operations will be a repeatable, trusted function: teams will know what is running, why it was promoted, how it is performing, when something is wrong, and how to recover quickly.

 

Leveling Signal
The right candidate defines strategy, leads cross-functional execution, and sets engineering standards while staying hands-on enough to investigate production issues directly.

 

Paramount Streaming, a division within Paramount Global, is home to the company's direct-to-consumer services, both free and paid: Pluto TV and Paramount+. Pluto TV is the global leader in free ad-supported TV, delivering more than 1,400 global channels and an extensive library of streaming content, including live and original channels. Paramount+, a digital subscription video-on-demand and live streaming service, combines live sports, breaking news, and A Mountain of Entertainment™. Paramount+ features an expansive library of original series, hit shows, and popular movies across every genre from world-renowned brands and production studios, including SHOWTIME®.

 

 

ADDITIONAL INFORMATION

 

Hiring Salary Range: $157,000.00 - $235,000.00.

 

The hiring salary range for this position applies to New York, California, Colorado, Washington state, and most other geographies. Starting pay for the successful applicant depends on a variety of job-related factors, including but not limited to geographic location, market demands, experience, training, and education. The benefits available for this position include medical, dental, vision, 401(k) plan, life insurance coverage, disability benefits, tuition assistance program, and PTO or, if applicable, as otherwise dictated by the appropriate Collective Bargaining Agreement. This position is bonus eligible.

 

What We Offer:
  • Attractive compensation and comprehensive benefits packages. Check out our full list of benefits here: https://www.paramount.com/careers/benefits
  • Generous paid time off.
  • An exciting and fulfilling opportunity to be part of one of Paramount’s most dynamic teams.
  • Opportunities for both on-site and virtual engagement events.
  • Unique opportunities to make meaningful connections and build a vibrant community, both inside and outside the workplace.
  • Explore life at Paramount: https://www.paramount.com/careers/life-at-paramount

 

Paramount is an equal opportunity employer (EOE) including disability/vet.

 

At Paramount, the spirit of inclusion feeds into everything that we do, on-screen and off. From the programming and movies we create to employee benefits/programs and social impact outreach initiatives, we believe that opportunity, access, resources and rewards should be available to and for the benefit of all. Paramount is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ethnicity, ancestry, religion, creed, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, and Veteran status.

 

If you are a qualified individual with a disability or a disabled veteran, you may request a reasonable accommodation if you are unable or limited in your ability to use or access https://www.paramount.com/careers as a result of your disability. You can request reasonable accommodations by calling 212.846.5500 or by sending an email to paramountaccommodations@paramount.com. Only messages left for this purpose will be returned.

 

