Lavita AI’s Medical Evaluation Sphere is the go-to platform for real-time evaluation of foundation models on any medical or clinical task, across any modality, at scale. We’re building a global network of medical professionals alongside a growing community of users to create the most trusted and comprehensive ecosystem for medical AI evaluation.

Lavita AI’s Medical Evaluation Sphere

How It Works


On the Medical Evaluation Sphere, users chat with two foundation models side by side and compare their performance. Everything starts with a question. Users can hold either a single-turn or a multi-turn conversation, then compare the responses from both models and vote for one of the following:

  1. Model A is better

  2. Model B is better

  3. It’s a tie (both models performed well)

  4. Neither (neither model provided a good response)

Comparing model outputs and voting for the preferred response

There are two modes of conversation on the Evaluation Sphere: anonymous and non-anonymous battles. By default, the models remain anonymous to ensure an objective comparison, and their names are revealed only after the user submits a vote. Alternatively, users can uncheck the “Anonymous Battle” option and manually select which models to compare; this is called a non-anonymous battle.

When reporting results on our leaderboard, we only consider votes from anonymous battles. Additionally, while users can ask any type of question, we filter out votes on non-medical or non-clinical conversations when aggregating results. Therefore, we encourage users to focus on medical questions, as votes on non-medical topics will not be counted.
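The filtering rule above can be illustrated with a minimal sketch. It is not the platform's actual implementation: the `Battle` record, its field names, and the `is_medical` flag are all hypothetical, and how conversations are classified as medical or non-medical is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    """The four voting options offered after a comparison."""
    MODEL_A = "Model A is better"
    MODEL_B = "Model B is better"
    TIE = "It's a tie"
    NEITHER = "Neither"


@dataclass(frozen=True)
class Battle:
    """Hypothetical record of one voted comparison."""
    model_a: str
    model_b: str
    vote: Vote
    anonymous: bool   # True when "Anonymous Battle" was left checked
    is_medical: bool  # assumed label; the classification method is not specified


def countable(battles):
    """Keep only votes from anonymous battles on medical or clinical conversations."""
    return [b for b in battles if b.anonymous and b.is_medical]


def tally(battles):
    """Count vote outcomes per model pair among the countable battles."""
    counts = Counter()
    for b in countable(battles):
        counts[(b.model_a, b.model_b, b.vote)] += 1
    return counts
```

In this sketch, a vote cast in a non-anonymous battle or on a non-medical conversation is simply dropped before counting, which mirrors the two conditions stated above.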

Revealing model names after submitting the vote