
Evaluation

Evaluation using the Python Library

The following guide demonstrates how to use the Python Library to run evaluations and send them to the database via the Backend API. This example uses the Deepeval framework to show that GuardOps is not an isolated solution in itself, but is built as a single point of contact that consolidates the LLM lifecycle in one application.

Get config data

Go to your projects tab in the frontend and click on the 🗲 symbol to open an overlay with the necessary IDs.


Prepare the config

from guardOps.config.config import Config
 
#Set up the parameters in the config with the values from step 1
Config.set_user_id("<your_user_id>")
Config.set_project_id("<your_project_id>")
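If you prefer not to hard-code the IDs, you can also read them from environment variables before passing them to the config. The snippet below is a minimal sketch that assumes the IDs are exported as GUARDOPS_USER_ID and GUARDOPS_PROJECT_ID; these variable names are only a convention chosen for this example, not something GuardOps prescribes.

import os
from guardOps.config.config import Config
 
# Minimal sketch: read the IDs from environment variables instead of hard-coding them.
# The variable names are an assumption made for this example.
Config.set_user_id(os.environ["GUARDOPS_USER_ID"])
Config.set_project_id(os.environ["GUARDOPS_PROJECT_ID"])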

Use Deepeval as usual

GuardOps provides a wrapper for Deepeval's evaluate function. This means that, up to the point where you would call evaluate, you can use Deepeval exactly as you would without GuardOps.

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.models import GPTModel
 
# Set up deepeval as usual (example from docs used here)
answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model=GPTModel(
        model="gpt-4-turbo-preview",
        base_url="https://lm3.hs-ansbach.de/worker2/v1/",
        api_key="test"
    )
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)
test_case2 = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)

Instantiate the GuardOps Wrapper Framework

Instead of calling Deepeval's evaluate function directly, we now use the Deepeval wrapper that GuardOps provides. To do so, import the wrapper framework and create a new instance of it. Refer to the Docs for all available frameworks.

from guardOps.evaluation.DeepevalFramework import DeepevalFramework
 
# Use the GuardOps wrapper for Deepeval
eval_framework = DeepevalFramework()

Perform Deepeval Evaluation within the Framework

Every framework that GuardOps provides has an evaluate function with varying parameters. This function always returns an evaluation instance that can then be exported to the backend. Below, we pass the parameter required by GuardOps (eval_name) together with all the usual parameters you would use with Deepeval's evaluate function. These Deepeval parameters simply get passed through via the **kwargs of DeepevalFramework.evaluate().

# eval_name is required by GuardOps; the remaining parameters are the Deepeval parameters that get passed through
evaluation = eval_framework.evaluate(
    eval_name="Deepeval",
    test_cases=[test_case, test_case2],
    metrics=[answer_relevancy_metric]
)
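To make the pass-through mechanism concrete, here is a simplified, hypothetical sketch of what such a wrapper could look like internally. This is not the actual GuardOps implementation; it only illustrates how eval_name is consumed by the wrapper while everything else is forwarded to Deepeval via **kwargs.

from deepeval import evaluate
 
class DeepevalFrameworkSketch:
    # Hypothetical stand-in for the real DeepevalFramework, for illustration only
    def evaluate(self, eval_name, **kwargs):
        # All Deepeval parameters (test_cases, metrics, ...) are forwarded unchanged
        results = evaluate(**kwargs)
        # Placeholder return value; the real wrapper returns a GuardOps evaluation object
        return {"name": eval_name, "results": results}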

Export the data using the GuardOps Exporter

GuardOps provides an exporter that handles all Backend API interaction. In short: adding evaluations with exporter.add_eval(eval=evaluation) stores the evaluation object in the exporter. Only explicitly calling exporter.export() sends data to the backend, either all stored evaluations or, if an evaluation is passed as the optional parameter, just that one.

from exporter.EvalExporter import EvalExporter
 
exporter = EvalExporter(title="Deepeval Wrapper")
exporter.add_eval(eval=evaluation)
exporter.export()
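Since the exporter only collects evaluations until export() is called, you can queue several evaluations and send them in one go, or pass a single evaluation to export() to send just that one, as described above. A short sketch follows; the second evaluation object and the way the optional parameter is passed are assumptions for illustration.

# Queue several evaluations and send them together
exporter.add_eval(eval=evaluation)
exporter.add_eval(eval=second_evaluation)  # hypothetical second evaluation object
exporter.export()  # sends everything currently stored in the exporter
 
# Or send only one specific evaluation via the optional parameter
# (how the parameter is passed is an assumption for this sketch)
exporter.export(evaluation)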

Full Code

The complete code should look something like this.

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.models import GPTModel
from guardOps.evaluation.DeepevalFramework import DeepevalFramework
from guardOps.config.config import Config
from exporter.EvalExporter import EvalExporter
 
# Set up the parameters in the config
Config.set_user_id("<your_user_id>")
Config.set_project_id("<your_project_id>")
 
 
# Set up deepeval as usual (example from docs used here)
answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model=GPTModel(
        model="gpt-4-turbo-preview",
        base_url="https://lm3.hs-ansbach.de/worker2/v1/",
        api_key="test"
    )
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)
test_case2 = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)
 
# Use the GuardOps wrapper for Deepeval
eval_framework = DeepevalFramework()
evaluation = eval_framework.evaluate(
    eval_name="Deepeval",
    test_cases=[test_case, test_case2],
    metrics=[answer_relevancy_metric]
)
 
# Export the evaluation
exporter = EvalExporter(title="Deepeval Wrapper")
exporter.add_eval(eval=evaluation)
exporter.export()

Evaluation using Flows