Catch what your traces can't see

AI agents fail long before they throw errors.

    import requests

    # Label every turn inline. One forward pass, zero decode.
    res = requests.post(
        "https://api.morphllm.com/v1/reflex/predict",
        headers={"Authorization": f"Bearer {MORPH_API_KEY}"},
        json={"model": "user-frustration", "text": user_message},
    ).json()

    # {
    #   "model": "user-frustration",
    #   "mode": "single_label",
    #   "classes": [
    #     {"label": "frustrated", "score": 0.97, "selected": true},
    #     {"label": "neutral", "score": 0.03, "selected": false}
    #   ],
    #   "inference_time_ms": 41
    # }

    labels = [c["label"] for c in res["classes"] if c["selected"]]
    if "frustrated" in labels:
        route_to_human(conversation)

The feedback loop for self-improving agents
Detect what error messages miss
A request returns 200 while the conversation falls apart. Reflexes read every turn and flag the failures that never throw: frustration, looping, jailbreaks, leaked reasoning.
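A minimal sketch of what checking every turn could look like, assuming responses have the shape shown in the example on this page (a `classes` list with `label`, `score`, and `selected`). The names `flag_turns`, `selected_labels`, and `fake_classify` are hypothetical and not part of any published client; `fake_classify` is an offline stand-in for the HTTP call so the sketch runs without a key.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def selected_labels(response: Dict) -> List[str]:
    """Pull the labels marked as selected out of one response."""
    return [c["label"] for c in response.get("classes", []) if c.get("selected")]


def flag_turns(
    turns: List[str],
    classify: Callable[[str], Dict],
    watch: FrozenSet[str] = frozenset({"frustrated"}),
) -> List[Tuple[int, List[str]]]:
    """Classify every turn and return (turn_index, watched_labels) for hits."""
    flagged = []
    for index, text in enumerate(turns):
        hits = [label for label in selected_labels(classify(text)) if label in watch]
        if hits:
            flagged.append((index, hits))
    return flagged


def fake_classify(text: str) -> Dict:
    """Offline stand-in (assumption): treats '!' as a sign of frustration."""
    frustrated = "!" in text
    return {
        "classes": [
            {"label": "frustrated", "score": 0.9, "selected": frustrated},
            {"label": "neutral", "score": 0.1, "selected": not frustrated},
        ]
    }
```

In a real integration, `classify` would wrap the `requests.post` call from the example above, and `watch` would list whichever labels your reflexes emit.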
Every turn, not a sample
LLM-as-judge is too slow and costly to run on everything, so teams sample, and sampling misses the turn that mattered. Reflexes run on all of it: ~15 ms a check, with pricing from $0.00025 realtime and $0.00001 batch.
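To get a feel for what full coverage costs, here is a rough estimate built on the starting prices quoted above. It assumes those prices apply per check and that each turn gets one check by default; both are assumptions, and actual pricing may differ. The function name and the traffic figure are illustrative only.

```python
from decimal import Decimal

# "From" prices quoted above; assumed here to be charged per check.
REALTIME_PRICE = Decimal("0.00025")
BATCH_PRICE = Decimal("0.00001")


def monthly_cost(
    turns_per_day: int,
    checks_per_turn: int = 1,
    days: int = 30,
    price: Decimal = REALTIME_PRICE,
) -> Decimal:
    """Estimated spend in dollars for checking every turn over `days` days."""
    return price * turns_per_day * checks_per_turn * days


# Illustrative traffic: 100,000 turns a day, one check per turn.
realtime = monthly_cost(100_000)                   # 750 dollars
batch = monthly_cost(100_000, price=BATCH_PRICE)   # 30 dollars
```

Under these assumptions, checking all 3,000,000 monthly turns comes to $750 in realtime or $30 in batch.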
Bring your own signal
Describe the signal in a prompt and label a few edge cases. A custom reflex trains in under 30 minutes, no dataset required.
Close the loop: self-improving agents
Every label is a training signal. Flagged conversations flow back into your evals, prompts, and fine-tuning runs, so the turn that frustrated a user yesterday becomes the test your agent passes tomorrow.
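One way the loop might be closed, sketched under assumptions: flagged turns are stored as dictionaries with `conversation_id`, `turn_index`, `text`, and `labels`, and your eval harness reads JSONL. Those field names, the `must_not_trigger` key, and both helper functions are hypothetical and would need adapting to your own eval format.

```python
import json
from typing import Dict, Iterable, List


def to_eval_cases(flagged: Iterable[Dict]) -> List[Dict]:
    """Turn flagged turns into regression cases, one per unique turn."""
    cases: Dict[str, Dict] = {}
    for record in flagged:
        case_id = f'{record["conversation_id"]}:{record["turn_index"]}'
        # A later record for the same turn overwrites the earlier one.
        cases[case_id] = {
            "id": case_id,
            "input": record["text"],
            # Labels this turn should no longer trigger once the agent improves.
            "must_not_trigger": sorted(record["labels"]),
        }
    return list(cases.values())


def to_jsonl(cases: Iterable[Dict]) -> str:
    """Serialize cases as one JSON object per line."""
    return "\n".join(json.dumps(case, sort_keys=True) for case in cases)
```

Keying cases by conversation and turn keeps reruns of the same export from duplicating tests.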
Built to run on every turn
Track behavior over time, not one trace at a time
Request a Reflexes demo
Tell us what you're building. We'll tailor the demo to your stack.
Turn the failures you can't see into an agent that improves.
Book demo
Reflexes is in private beta. The fine-tuning API is live in the docs and may change before general availability.