Evaluation Scenario Writer - Paris

Reserved for registered members · Paris, France

4 days ago

Job summary

Please submit your CV in English and indicate your level of English proficiency.


Mindrift connects specialists with project-based AI opportunities for leading tech companies,
focused on testing, evaluating and improving AI systems.
This opportunity involves:
  • Creating structured test cases that simulate complex human workflows.
  • Defining gold-standard behavior and scoring logic to evaluate agent actions.
  • Analyzing agent logs, failure modes, and decision paths.
  • Working with code repositories and test frameworks to validate scenarios.
  • Iterating on prompts, instructions, and test cases to improve clarity and difficulty.
  • Ensuring that scenarios are production-ready, easy to run, and reusable.




Similar jobs

  • Reserved for registered members · Paris · Part-time

    This opportunity allows you to create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work on designing realistic and structured evaluation scenarios for LLM-based agents. · Create structured test cases tha ...

  • Reserved for registered members · Paris

    This opportunity involves creating structured test cases for AI systems, defining gold-standard behavior, analyzing agent logs, and working with code repositories. · Paid contributions up to $50/hour*, fixed project rate or individual rates depending on project needs, ...

  • Reserved for registered members · Paris

    We are looking for an Evaluation Scenario Writer to create structured test cases and define gold-standard behavior for AI systems. · 3+ years of software development experience with a strong Python focus · Experience with Git and code repositories · Comfortable with structured formats ...

  • Reserved for registered members · Paris · Part-time

    Mindrift connects specialists with project-based AI opportunities for leading tech companies. · Create structured test cases that simulate complex human workflows · Define gold-standard behavior and scoring logic to evaluate agent actions · ...

  • Reserved for registered members · Paris

    We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. · At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. · Create structured test c ...

  • Reserved for registered members · Paris

    We're on the hunt for QAs for autonomous AI agents for a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. · Reviewing evaluation tasks and scenarios for logic, completeness, and realism · Identifying inconsistenci ...

  • Reserved for registered members · Paris

    We are looking for experts to develop MCP-compatible evaluation servers and internal tools for running and evaluating agent behavior. Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. · ...

  • Reserved for registered members · Paris

    We are looking for a Senior Product Manager to be responsible for designing & building the product, driving its adoption and its success. · ...

  • Reserved for registered members · Paris

    Mirakl has launched a unique, innovative solution on the market that aims to bring together all the players in the marketplace ecosystem: Mirakl Connect. As part of Mirakl Connect, we want to simplify the lives of Sellers in their daily activities by offering them ...

  • Reserved for registered members · Courbevoie, Île-de-

    The Maintenance Workstream Leader defines strategy, translates it into execution, and delivers all tender-related outputs on time while ensuring cross-workstream alignment. · Minimum of 10 years' proven experience in rolling stock and/or infrastructure maintenance management, ...

  • Reserved for registered members · Courbevoie

    The Senior Bid Workstream Leader defines strategy, translates it into execution, and delivers all tender-related outputs on time while ensuring cross-workstream alignment. The role combines strategic design, operational coordination, and data-driven decision-making. · Set the Mainten ...

  • Economic Advisory Internship

    1 week ago

    Reserved for registered members · Courbevoie

    Economic analysis of the various energy sectors. · Development ...

  • Reserved for registered members · Fresnes

    Serma Safety & Security is the SERMA group's entity specializing in cybersecurity. We are looking for an auditor/pentester specializing in penetration testing of web and mobile applications. · Perform targeted penetration tests on web applications (front/back), API ...

  • Reserved for registered members · Fresnes

    Serma Safety & Security is looking for an auditor/pentester to perform penetration tests on web and mobile applications. The mission is to identify, exploit, and document vulnerabilities in applications, networks, and workstations. · Perform targeted penetration tests on applicati ...

  • Reserved for registered members · Fresnes

    We are looking for a Product Manager to be responsible for designing & building the product, driving its adoption and its success. This job is based in France. · Simplify the lives of Sellers in their daily activities · ...