Magi AI Content Quality Evaluator – En-US (Freelance, Temporary)
Posted 2025-08-15
Remote, USA
Full Time
Immediate Start
Job Description:
We are hiring 20 Freelance Evaluators to support a high-priority Magi AI Evaluation Pilot Project for Uber’s AI team. This role involves reviewing and rating AI-generated responses for quality, clarity, accuracy, and factual correctness across varying levels of complexity.
All candidates must meet Expertise Level 3 (L3) requirements—the highest evaluator standard—to qualify for this project. Your evaluations will directly shape how AI systems learn, improve, and interact with users.
What is Expertise Level 3 (L3)?
This role requires Expertise Level 3, meaning the evaluator must have:
Advanced analytical and research skills
The ability to handle highly complex content, such as technical or domain-specific materials (e.g., science, legal, medical, or data-heavy text)
Experience evaluating multi-modal data (charts, PDFs, screenshots, etc.)
Capability to conduct line-by-line fact-checking and justify responses with critical reasoning
Comfort working with structured rubrics and independently verifying factual accuracy
Compensation:
Pay Rate: $25 – $31 per hour (USD)
The assessment is paid if the evaluator passes and completes at least 4 hours of work on the project
Key Responsibilities:
Evaluate AI-generated responses across varying task types, using structured guidelines
Identify tone, style, factual, and product-specific issues in output
Perform detailed comparisons, fact-checks, and accuracy reviews
Submit ratings using tools such as dropdowns, screencasts, and feedback forms
Meet tight deadlines (all assigned work must be completed within 4 business days)
Task Complexity & Time Commitment:
Tasks will range from simple to complex, but all evaluators must qualify at the L3 expertise level
Average Handling Time: 75–145 minutes per task
Minimum expectation: 3+ hours per task cycle, with the option to take on more work
Work is asynchronous, though task assignment may align with the IST time zone
Requirements:
Professional and expert-level English (US) speaker (can reside in or outside of the U.S.)
5+ years of experience in linguistics, research, writing, content evaluation, or technical review
Master’s degree or PhD strongly preferred
High attention to detail, accuracy, and critical thinking
Secure internet connection and workspace
Must complete and pass a qualifying assessment at the Expertise Level 3 standard
Assessment Details:
Approx. 30 minutes to complete
Paid if passed and evaluator completes at least 4 hours of project work
Project Details:
Project Name: Magi AI Evaluation (Uber Special Project)
Work Type: Freelance, Temporary
Start Date: Within 48 hours of onboarding completion (target: April 28, 2025)
Schedule: Flexible, asynchronous
Location: Global (must be a native En-US speaker)
Duration: Initial pilot cycle, with possible future work based on performance and need
Job Types: Part-time, Temporary
Pay: $25.00 - $31.00 per hour