Methodology – halalai.org

Setting Things Up

Step 1

Creating a Clean Workspace:
We created a Python virtual environment (venv), which is like a separate room for our project. It isolates the project's packages from the rest of the system, so installing or upgrading something for the evaluation cannot break other projects on the same computer.

Installing Tools:
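We installed the packages we needed: lm-eval (the evaluation harness used to test models), openai (the client library for calling OpenAI models), and python-dotenv (to load secret API keys from a .env file instead of writing them into the code).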
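As a rough sketch of this setup step, the snippet below uses Python's standard venv module to create the environment and builds the pip command for the three packages named above. The directory name .venv and the helper names (venv_python, create_workspace, install_command) are assumptions made for illustration, not the project's actual layout.

```python
import os
import venv
from pathlib import Path

# Packages named in Step 1.
PACKAGES = ["lm-eval", "openai", "python-dotenv"]


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    # Windows places executables in Scripts/, other platforms in bin/.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_workspace(env_dir: Path) -> None:
    """Create the isolated environment, with pip available inside it."""
    venv.create(env_dir, with_pip=True)


def install_command(env_dir: Path) -> list:
    """Build the pip command that installs the tools into the environment."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", *PACKAGES]


# ".venv" is an assumed directory name for this sketch.
command = install_command(Path(".venv"))
```

After calling create_workspace(Path(".venv")), the command list can be passed to subprocess.run(command, check=True) to perform the installation.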

Organizing the Project

Step 2

Building the Foundation:
We organized the project around three files:
- A script (evaluate_islamic_model.py) to run our tests.
- A special task file (islamic_knowledge_task.py) to define how we test the AI.
- A dataset (islamic_knowledge.jsonl) with our questions and answers.
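A JSONL dataset stores one JSON object per line. The sketch below shows one way such a file could be read. The field names (question, choices, answer) are assumptions for illustration, since the actual schema of islamic_knowledge.jsonl is not described here.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blanks."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_questions(path):
    """Read a JSONL file from disk, one question record per line."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


# Hypothetical record shape: "answer" is the index of the correct choice.
sample_lines = [
    '{"question": "Q1", "choices": ["a", "b"], "answer": 0}',
    "",
    '{"question": "Q2", "choices": ["a", "b"], "answer": 1}',
]
sample_records = parse_jsonl(sample_lines)
```

With the real file, load_questions("islamic_knowledge.jsonl") would return the full list of records.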

Creating the Testing Framework

Step 3

Designing the Test:
We built a task that:
- Reads our list of questions and correct answers.
- Formats them into multiple-choice questions.
- Scores how well the AI answers.

Integrating It All:
We plugged this task into our testing system so it works seamlessly.
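To make the three responsibilities of the task concrete, here is a minimal plain-Python sketch of formatting a record as a multiple-choice prompt and checking a reply. It does not use the lm-eval task API, and the record fields, the A-D lettering, the prompt layout, and the sample question are all assumptions for illustration rather than the contents of islamic_knowledge_task.py.

```python
LETTERS = "ABCD"  # assumed: at most four choices per question


def format_question(record):
    """Render one record as a lettered multiple-choice prompt."""
    lines = [f"Question: {record['question']}"]
    for letter, choice in zip(LETTERS, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def is_correct(record, model_reply):
    """Compare the first character of the reply with the expected letter."""
    expected = LETTERS[record["answer"]]
    reply = model_reply.strip().upper()
    return reply[:1] == expected


# Hypothetical record, not taken from the real dataset.
example = {
    "question": "How many daily obligatory prayers are there in Islam?",
    "choices": ["Three", "Four", "Five", "Six"],
    "answer": 2,
}
prompt = format_question(example)
```

Checking only the first character of the reply is a deliberately simple scoring rule. A real task may instead compare the model's likelihood for each choice.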

Testing the Models

Step 4

Connecting to AI Models:
We set up API keys to talk to different AI models, like GPT-4 and Gemini.

Running the Questions:
Each model was asked the same set of questions, and its answers were collected.

Guiding the AI with Examples:
We gave each model a few example questions with their answers inside the prompt (few-shot prompting) to show how it should respond. This guides the format of the reply; it does not retrain the model.
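Few-shot prompting of this kind can be sketched as follows: a handful of solved examples are placed ahead of the real question, which is left unanswered for the model to complete. The prompt layout, field names, and placeholder records are assumptions for illustration, not the exact prompts used in the evaluation.

```python
LETTERS = "ABCD"  # assumed: at most four choices per question


def render(record, with_answer):
    """Render one record, optionally including the correct letter."""
    lines = [f"Question: {record['question']}"]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, record["choices"])]
    answer = f" {LETTERS[record['answer']]}" if with_answer else ""
    lines.append(f"Answer:{answer}")
    return "\n".join(lines)


def build_few_shot_prompt(examples, target):
    """Place solved examples first, then the unanswered target question."""
    blocks = [render(example, with_answer=True) for example in examples]
    blocks.append(render(target, with_answer=False))
    return "\n\n".join(blocks)


# Hypothetical placeholder records.
shots = [{"question": "Example question 1", "choices": ["w", "x", "y", "z"], "answer": 2}]
target = {"question": "Real question", "choices": ["w", "x", "y", "z"], "answer": 0}
few_shot_prompt = build_few_shot_prompt(shots, target)
```

Using the same examples for every model keeps the comparison fair, since each model sees an identical prompt.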

Collecting the Results

Step 5

Scoring the AI:
We calculated each model's accuracy: the share of questions it answered correctly.

Finding the Best:
We noted which models did well (like Gemini 1.5 Pro with 96% accuracy) and which struggled (like Gemini 2.0 Beta with 28%).
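Accuracy here is simply the number of correct answers divided by the number of questions. A minimal sketch, assuming predictions and correct answers are stored as lists of choice letters (a representation chosen for illustration):

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that match the reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0  # avoid dividing by zero on an empty question set
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Made-up answers for illustration: three of four match.
score = accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"])
```

Multiplying the result by 100 gives the percentage form used on the leaderboard.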

Creating the Leaderboard

Step 6

Ranking the Models:
We organized the results into a leaderboard that ranks the models by accuracy, so they are easy to compare.

Sharing It:
We published the leaderboard online using Hugging Face Spaces so others can see it too.
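The ranking step can be sketched as a sort by accuracy, highest first. The example uses only the two scores quoted in Step 5. The table layout and function names are assumptions for illustration, and publishing to Hugging Face Spaces is not shown.

```python
def build_leaderboard(scores):
    """Sort models by accuracy (highest first) and attach a rank."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(ranked, start=1)]


def to_table(rows):
    """Render leaderboard rows as a simple pipe-delimited text table."""
    lines = ["| Rank | Model | Accuracy |"]
    for rank, name, acc in rows:
        lines.append(f"| {rank} | {name} | {acc:.0%} |")
    return "\n".join(lines)


# Only the two results quoted in Step 5 are used here.
scores = {"Gemini 2.0 Beta": 0.28, "Gemini 1.5 Pro": 0.96}
leaderboard = build_leaderboard(scores)
table = to_table(leaderboard)
```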
