Conversation
ristomcgehee
left a comment
There was a problem hiding this comment.
I appreciate you adding these tests.
The test for detecting PI using LLM fails sometimes: OpenAI classifies a benign input as prompt injection- false positive.
I've noticed this in the JS SDK as well. I strongly suspect this is due to using the default temperature of 1 when calling the OpenAI API. I'll open a PR at some point to set the temperature to 0, unless someone else gets around to it first.
README.md
Outdated
| - [x] Attack Signature Learning | ||
| - [x] JavaScript/TypeScript SDK | ||
| - [ ] Python SDK to have parity with TS SDK | ||
| - [x] Python SDK to have parity with TS SDK |
There was a problem hiding this comment.
I would wait to mark this until we add Chroma as a vector store for Python.
python-sdk/tests/test_sdk.py
Outdated
| openai_apikey = get_environment_variable("OPENAI_APIKEY") | ||
| pinecone_apikey = get_environment_variable("PINECONE_APIKEY") | ||
| pinecone_environment = get_environment_variable("PINECONE_ENVIRONMENT") | ||
| pinecone_index = get_environment_variable("PINECONE_INDEX") |
There was a problem hiding this comment.
Let's make the env vars consistent with the rest of the project:
| openai_apikey = get_environment_variable("OPENAI_APIKEY") | |
| pinecone_apikey = get_environment_variable("PINECONE_APIKEY") | |
| pinecone_environment = get_environment_variable("PINECONE_ENVIRONMENT") | |
| pinecone_index = get_environment_variable("PINECONE_INDEX") | |
| openai_apikey = get_environment_variable("OPENAI_API_KEY") | |
| pinecone_apikey = get_environment_variable("PINECONE_API_KEY") | |
| pinecone_environment = get_environment_variable("PINECONE_ENVIRONMENT") | |
| pinecone_index = get_environment_variable("PINECONE_INDEX_NAME") |
| check_vector, | ||
| check_llm, | ||
| ] | ||
| return detect_injection_arguments |
There was a problem hiding this comment.
Python has a handy feature called "keyword argument unpacking" where you can pass a dictionary to a function and the keys in the function will get matched up with the parameters of the same name. What this means is you could make this function be:
def detect_injection_arguments():
detect_injection_arguments = {
"max_heuristic_score": 0.5,
"max_vector_score": 0.90,
"max_model_score": 0.90,
"check_heuristic": False,
"check_vector": False,
"check_llm": False,
}
return detect_injection_argumentsAnd then in your tests, you could do:
def test_detect_injection_heuristics(
rebuff: RebuffSdk,
prompt_injection_inputs: List[str],
benign_inputs: List[str],
detect_injection_arguments,
):
detect_injection_arguments["check_heuristic"] = True
for prompt_injection in prompt_injection_inputs:
rebuff_response = rebuff.detect_injection(
prompt_injection,
**detect_injection_arguments,
)There was a problem hiding this comment.
I am now using "keyword argument unpacking" and the code reads a lot better. Thank you!
|
|
||
|
|
||
| @pytest.fixture() | ||
| def rebuff() -> RebuffSdk: |
There was a problem hiding this comment.
Just an FYI, I'm not going to insist that you use type hints in test code like I did in sdk.py. I think it's very helpful to have type hints on public methods in a library, but type hints for tests are much less valuable
python-sdk/tests/test_sdk.py
Outdated
| leak_detected = rebuff.is_canary_word_leaked( | ||
| user_input, response_completion, canary_word, log_outcome | ||
| ) | ||
| assert leak_detected is True |
There was a problem hiding this comment.
It's simpler and more Pythonic to do:
assert leak_detected
...
assert not leak_detected
python-sdk/tests/test_sdk.py
Outdated
| check_llm, | ||
| ) | ||
| assert rebuff_response.heuristic_score < max_heuristic_score | ||
| assert rebuff_response.injection_detected is False |
There was a problem hiding this comment.
An approach you could take that would cut down on code:
def test_detect_injection_heuristics(
rebuff: RebuffSdk,
prompt_injection_inputs: List[str],
benign_inputs: List[str],
detect_injection_arguments,
):
detect_injection_arguments["check_heuristic"] = True
user_inputs = prompt_injection_inputs + benign_inputs
expect_detected = [True] * len(prompt_injection_inputs) + [False] * len(benign_inputs)
for index, input in enumerate(user_inputs):
rebuff_response = rebuff.detect_injection(
input,
**detect_injection_arguments,
)
assert (rebuff_response.heuristic_score > detect_injection_arguments["max_heuristic_score"]) == expect_detected[index]
assert rebuff_response.injection_detected == expect_detected[index]My code could be improved to make user_inputs and expect_detected fixtures.
Only accept my suggestion if you think it's better. It might be slightly harder to follow than the less compact approach you have.
There was a problem hiding this comment.
Thank you for suggesting this. I learnt new ways of making the code compact, appreciate you sharing it. Though, also feel it is not as straight forward as the less-compact version. I think in the interest of code-readability, would prefer the less-compact version.
ristomcgehee
left a comment
There was a problem hiding this comment.
Overall looks good, just one minor fix suggestion.
python-sdk/tests/test_sdk.py
Outdated
|
|
||
|
|
||
| @pytest.fixture() | ||
| def detect_injection_arguments() -> List[Union[float, bool]]: |
There was a problem hiding this comment.
I will comment when I see type hints that don't match. This should be changed here and elsewhere in the file:
| def detect_injection_arguments() -> List[Union[float, bool]]: | |
| def detect_injection_arguments() -> Dict: |
or
| def detect_injection_arguments() -> List[Union[float, bool]]: | |
| def detect_injection_arguments(): |
There was a problem hiding this comment.
Thank you, I have now updated the type hints.
|
For the JavaScript test failure ( For the Python tests failure, I'd recommend setting the |
|
It turns out I was wrong about the temperature being the cause of the tests failing. It's actually due to an accidental Then ChatGPT 3.5 classifies this as prompt injection almost every time. To fix this you can simply change the line in to |
|
When you rebase your PR off of main, you'll need to change: pinecone.init(api_key=api_key, environment=environment)to: pinecone.Pinecone(api_key=api_key)You could even remove all references to pinecone environment (in |
cherbel
left a comment
There was a problem hiding this comment.
Looking good, nice to have more test @mehrinkiani!
python-sdk/tests/test_sdk.py
Outdated
| @@ -0,0 +1,171 @@ | |||
| from typing import List, Dict | |||
| import pytest | |||
| from rebuff.sdk import RebuffSdk, RebuffDetectionResponse | |||
There was a problem hiding this comment.
No longer used:
| from rebuff.sdk import RebuffSdk, RebuffDetectionResponse | |
| from rebuff.sdk import RebuffSdk |
This PR adds tests for Rebuff's Python SDK.
A few things to note:
test_sdk.pyin parity with the JS tests.