Mirror of https://github.com/EvolutionAPI/adk-python.git, synced 2025-12-23 21:57:44 -06:00
Copybara import of the project:

-- 16994cb2d5d646341f5285ca71d72697d81d18fe by Nilanjan De <nilanjan.de@gmail.com>:

    chore: fix typos

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/adk-python/pull/272 from n1lanjan:fix-typos a1ab655b08ec08c5dd2da71aab9a2386e3610e84
PiperOrigin-RevId: 749690489
Committed by: Copybara-Service
Parent: 23f0383284
Commit: 1664b45562
@@ -55,7 +55,7 @@ def load_json(file_path: str) -> Union[Dict, List]:
 
 
 class AgentEvaluator:
-  """An evaluator for Agents, mainly intented for helping with test cases."""
+  """An evaluator for Agents, mainly intended for helping with test cases."""
 
   @staticmethod
   def find_config_for_test_file(test_file: str):
@@ -91,7 +91,7 @@ class AgentEvaluator:
         look for 'root_agent' in the loaded module.
       eval_dataset: The eval data set. This can be either a string representing
         full path to the file containing eval dataset, or a directory that is
-        recusively explored for all files that have a `.test.json` suffix.
+        recursively explored for all files that have a `.test.json` suffix.
       num_runs: Number of times all entries in the eval dataset should be
         assessed.
       agent_name: The name of the agent.
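For orientation, a minimal usage sketch of the arguments documented in this hunk. The `AgentEvaluator.evaluate` entry point, the import path, the `agent_module` parameter name, and the module/dataset values are assumptions for illustration, not taken from this diff:

# Hypothetical sketch: drive AgentEvaluator with the documented arguments.
# `AgentEvaluator.evaluate`, its import path, and `agent_module` are assumed;
# the module and dataset paths below are made-up placeholders.
from google.adk.evaluation.agent_evaluator import AgentEvaluator

AgentEvaluator.evaluate(
    agent_module="my_package.my_agent",  # module expected to expose `root_agent`
    eval_dataset="tests/eval_data",      # dir scanned recursively for *.test.json
    num_runs=2,                          # assess every eval entry twice
    agent_name="root_agent",
)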
@@ -35,7 +35,7 @@ class ResponseEvaluator:
     Args:
       raw_eval_dataset: The dataset that will be evaluated.
       evaluation_criteria: The evaluation criteria to be used. This method
-        support two criterias, `response_evaluation_score` and
+        support two criteria, `response_evaluation_score` and
         `response_match_score`.
       print_detailed_results: Prints detailed results on the console. This is
         usually helpful during debugging.
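Read as an API, the Args above suggest a call like the following. The `evaluate` method name and import path are assumptions based on the class shown in this hunk, and `raw_eval_dataset` stands for data shaped as described in the note further down:

# Illustrative call only: ResponseEvaluator.evaluate and its import path
# are assumed from the Args documented in this hunk.
from google.adk.evaluation.response_evaluator import ResponseEvaluator

metrics = ResponseEvaluator.evaluate(
    raw_eval_dataset,  # list of sessions; see the dataset note below
    ["response_evaluation_score", "response_match_score"],  # the two criteria
    print_detailed_results=True,  # print per-interaction scores while debugging
)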
@@ -56,7 +56,7 @@ class ResponseEvaluator:
         Value range: [0, 5], where 0 means that the agent's response is not
         coherent, while 5 means it is . High values are good.
     A note on raw_eval_dataset:
-      The dataset should be a list session, where each sesssion is represented
+      The dataset should be a list session, where each session is represented
       as a list of interaction that need evaluation. Each evaluation is
       represented as a dictionary that is expected to have values for the
       following keys:
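To make the nesting concrete, here is a hypothetical literal with the shape the note describes. The key names are placeholders, since the actual keys are enumerated after "following keys:" and fall outside this excerpt:

# Shape sketch: a list of sessions, each session a list of interaction
# dicts. "<key>" / "<other_key>" are placeholders, not the real key names.
raw_eval_dataset = [
    [  # session 1: two interactions to evaluate
        {"<key>": "...", "<other_key>": "..."},
        {"<key>": "...", "<other_key>": "..."},
    ],
    [  # session 2: one interaction
        {"<key>": "...", "<other_key>": "..."},
    ],
]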
@@ -31,10 +31,9 @@ class TrajectoryEvaluator:
   ):
     r"""Returns the mean tool use accuracy of the eval dataset.
 
-    Tool use accuracy is calculated by comparing the expected and actuall tool
-    use trajectories. An exact match scores a 1, 0 otherwise. The final number
-    is an
-    average of these individual scores.
+    Tool use accuracy is calculated by comparing the expected and the actual
+    tool use trajectories. An exact match scores a 1, 0 otherwise. The final
+    number is an average of these individual scores.
 
     Value range: [0, 1], where 0 is means none of the too use entries aligned,
     and 1 would mean all of them aligned. Higher value is good.
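The rewritten paragraph states the metric precisely, and it reads directly as a few lines of Python. This is an illustrative reimplementation of the stated rule, not the evaluator's actual code:

# Mean tool-use accuracy as described above: an interaction scores 1 when
# the actual tool-use trajectory exactly matches the expected one, else 0,
# and the final metric is the average of those scores.
def mean_tool_use_accuracy(expected_trajectories, actual_trajectories):
    scores = [
        1.0 if expected == actual else 0.0
        for expected, actual in zip(expected_trajectories, actual_trajectories)
    ]
    return sum(scores) / len(scores) if scores else 0.0

# e.g. one exact match out of two trajectories -> 0.5
mean_tool_use_accuracy(
    [["get_weather"], ["get_weather", "send_email"]],
    [["get_weather"], ["send_email"]],
)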
@@ -45,7 +44,7 @@ class TrajectoryEvaluator:
         usually helpful during debugging.
 
     A note on eval_dataset:
-      The dataset should be a list session, where each sesssion is represented
+      The dataset should be a list session, where each session is represented
       as a list of interaction that need evaluation. Each evaluation is
       represented as a dictionary that is expected to have values for the
       following keys: