Method

insert the datasets we are using

Setting Things Up

Step 1

  • Creating a Clean Workspace:
    We created a Python virtual environment (venv), which is like a separate room for our project. It keeps the project's packages isolated so they don't conflict with other Python software on the computer.

  • Installing Tools:
    We installed the tools we needed: lm-eval (to evaluate language models), openai (to call OpenAI models through their API), and python-dotenv (to load secret API keys from a .env file).
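
As a rough illustration of Step 1, the sketch below creates a virtual environment with Python's standard venv module and builds the pip command that would install the three tools. The function names (make_env, pip_install_command) and the folder name are hypothetical, and no package versions are pinned because the text does not give any.

```python
import os
import venv
from pathlib import Path

# Packages named in the text; versions are not stated there, so none are pinned.
REQUIRED_PACKAGES = ["lm-eval", "openai", "python-dotenv"]


def make_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment at env_dir and return its path."""
    # venv.create is the standard-library equivalent of `python -m venv <dir>`.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def pip_install_command(env_dir: Path) -> list[str]:
    """Build (but do not run) the command that installs the tools into env_dir."""
    # The environment's interpreter lives in Scripts/ on Windows and bin/ elsewhere.
    if os.name == "nt":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    return [str(python), "-m", "pip", "install", *REQUIRED_PACKAGES]
```

Calling make_env(Path(".venv")) and then passing the result of pip_install_command(Path(".venv")) to subprocess.run would reproduce the setup described above. Building the command separately from running it makes it easy to inspect before anything is installed.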

Organizing the Project

Step 2

  • Building the Foundation:
    We arranged everything neatly in folders:
    • A script (evaluate_islamic_model.py) to run our tests.
    • A special task file (islamic_knowledge_task.py) to define how we test the AI.
    • A dataset (islamic_knowledge.jsonl) with our questions and answers.
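
A .jsonl file stores one JSON object per line. The sketch below shows one way such a dataset could be read; load_jsonl is a hypothetical helper, and the fields inside each record are not described in the text, so the code makes no assumption about them.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the file and line so a broken record is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err.msg})") from err
    return records
```

For example, load_jsonl(Path("islamic_knowledge.jsonl")) would return the questions and answers as a list of dictionaries, assuming the file sits next to the script.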

Creating the Testing Framework

Step 3

  • Designing the Test:
    We built a task that:
    • Reads our list of questions and correct answers.
    • Formats them into multiple-choice questions.
    • Scores how well the AI answers.
  • Integrating It All:
    We plugged this task into our testing system (lm-eval) so it runs as part of the same evaluation as everything else.
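
The formatting and scoring steps above can be sketched in plain Python. This is not the lm-eval task API and not necessarily how islamic_knowledge_task.py works; the function names, the lettered layout, and the rule for reading a model's reply are all assumptions made for illustration.

```python
import re
import string


def format_question(question: str, choices: list[str]) -> str:
    """Render a question and its options as a lettered multiple-choice prompt."""
    lines = [f"Question: {question}"]
    for letter, choice in zip(string.ascii_uppercase, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def is_correct(reply: str, answer_index: int) -> bool:
    """Return True when the reply starts with the letter of the correct option."""
    # Accept forms such as "B", "b. text" or "(B)", but not a word like "Because".
    match = re.match(r"\s*\(?([A-Za-z])\b", reply)
    if match is None:
        return False
    return match.group(1).upper() == string.ascii_uppercase[answer_index]
```

Here answer_index is the zero-based position of the correct option, so is_correct("B", 1) is true. Reading only the leading letter is a simple choice; replies that bury the answer inside a sentence would be counted as wrong under this rule.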

Testing the Models

Step 4

  • Connecting to AI Models:
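    We set up API keys so our script could talk to different AI models, like GPT-4 and Gemini.
  • Running the Questions:
    We asked each model the same set of questions and collected its answers.
  • Helping the AI Learn:
    We included a few example questions in each prompt (few-shot prompting) to show the model how it should respond. This does not retrain the model; it only demonstrates the expected answer format.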
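
A minimal sketch of two pieces of this step: reading an API key from the environment and placing a few worked examples in front of a new question. Both helper names are hypothetical, the environment variable name is whatever the caller passes in, and the "Answer:" layout is an assumed prompt format rather than the one actually used.

```python
import os


def require_key(name: str) -> str:
    """Return the API key stored in environment variable `name`, or fail loudly."""
    # python-dotenv's load_dotenv() can copy values from a .env file into
    # os.environ before this is called; it is left out to keep the sketch stdlib-only.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing API key: set the {name} environment variable")
    return value


def build_few_shot_prompt(examples: list[tuple[str, str]], new_question: str) -> str:
    """Place solved (question, answer) pairs before a new, unanswered question."""
    blocks = [f"{question}\nAnswer: {answer}" for question, answer in examples]
    # The final block ends at "Answer:" so the model fills in the missing part.
    blocks.append(f"{new_question}\nAnswer:")
    return "\n\n".join(blocks)
```

Sending every model the same prompt built this way keeps the comparison fair, since each one sees identical examples and identical wording.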

Collecting the Results

Step 5

  • Scoring the AI:
    We calculated the share of questions each model answered correctly (accuracy).
  • Finding the Best:
    We noted which models did well (like Gemini 1.5 Pro, with 96% accuracy) and which struggled (like Gemini 2.0 Beta, with 28%).
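
Accuracy here is simply the number of correct answers divided by the number of questions. A small sketch, assuming each model's answers and the correct answers are stored as lists of option letters in the same order (an assumption about the data layout, not something the text states):

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that match the correct answers, position by position."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty question set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

A model that answers three of four questions correctly gets accuracy 0.75, which would be reported as 75%.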

Creating the Leaderboard

Step 6

  • Ranking the Models:
    We organized the results into a leaderboard to show how the models compare.
  • Sharing It:
    We put the leaderboard online using Hugging Face Spaces so others can see it too.
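
Ranking comes down to sorting the models by accuracy. The sketch below uses only the two scores quoted earlier; the function names and the table layout are illustrative, and the code that publishes the table on Hugging Face Spaces is not shown because the text does not describe it.

```python
# The two results quoted in the text, stored as fractions.
SCORES = {
    "Gemini 2.0 Beta": 0.28,
    "Gemini 1.5 Pro": 0.96,
}


def build_leaderboard(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Sort models from highest to lowest accuracy and attach a rank to each."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, score) for rank, (model, score) in enumerate(ranked, start=1)]


def render_leaderboard(rows: list[tuple[int, str, float]]) -> str:
    """Format ranked rows as a fixed-width text table."""
    lines = [f"{'Rank':<6}{'Model':<20}{'Accuracy':>8}"]
    for rank, model, score in rows:
        lines.append(f"{rank:<6}{model:<20}{score:>8.0%}")
    return "\n".join(lines)
```

Calling print(render_leaderboard(build_leaderboard(SCORES))) lists Gemini 1.5 Pro first and Gemini 2.0 Beta second. Adding another model only requires a new entry in SCORES.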