Research Scientist - Model Evaluation Job at Lumicity, Santa Rosa, CA

  • Lumicity
  • Santa Rosa, CA

Job Description

AI Benchmarking & Evaluation Engineer

Join a team at the forefront of AI model evaluation, setting the standard for how large language models are tested and validated. In this role, you'll assess the latest AI models, design new benchmarks, and develop advanced evaluation methodologies. You'll work closely with engineers, AI researchers, and enterprise clients to ensure cutting-edge AI systems meet the highest standards. The role bridges research and practical implementation and will suit someone who enjoys turning academic papers into working models.

Key Responsibilities:

  • Analyze and benchmark newly released AI models (DeepSeek, Gemini, etc.)
  • Develop and implement novel evaluation frameworks
  • Build datasets, manage labeling processes, and publish findings
  • Enhance automated evaluation techniques for AI-generated content
  • Collaborate with top AI labs and enterprise partners to refine best practices

Who You Are:

  • MSc or PhD in Computer Science or Machine Learning from a leading university
  • At least 3 years of experience in applied AI, with a focus on benchmarking or model evaluation
  • Strong background in designing evaluation methodologies
  • Passion for advancing AI assessment standards
  • Solid skills in Python, PyTorch/TensorFlow, and Django

Make a real impact in AI research and development. Apply today!

Similar Jobs

Moore

Mechanical Maintenance (Printing) Job at Moore

Moore is a data-driven constituent experience management (CXM) company achieving accelerated growth for clients through integrated supporter experiences across all platforms, channels and devices. We are an innovation-led company that is the largest data, media, and marketing...

Elevait Solutions

Image Processing Engineer Job at Elevait Solutions

 ...Job Title: Image Processing Engineer/ Measurement Engineer I Location: Painted Post, NY - open for hybrid Duration: 12+ Months with possible extensions Top Required Skills: Programming in C# and Python Image processing algorithm development (OpenCV/Halcon... 

Sundance Construction Company

Traveling Construction Superintendent - Commercial Construction Job at Sundance Construction Company

 ...Construction Superintendent Houston, TX (traveling) Company Overview Sundance Construction Company, established in 1982, is a family-owned, generational full-service general contracting and construction management firm serving Texas and surrounding states. We specialize... 

Excelsia Injury Care

Licensed Clinical Social Worker Job at Excelsia Injury Care

 ...providers are leaders in personal injury and workers compensation care, with a proven track...  ...those injured in motor vehicle or work-related accidents. We take an interdisciplinary...  ...citizens, we integrate environmental, social, and governance (ESG) considerations into... 

AP Rochester

Registered Nurse Job at AP Rochester

 ...AP Professionals is in search of a compassionate and dedicated Registered Nurse (RN) for a non-hospital setting in Rochester. The RN will provide high-quality nursing care to pediatric patients by assessing, planning, implementing, and evaluating patient needs. This role...