Testing & Debugging — Docs | Nucleus by Kasi Labs

Overview

Plugin testing checks three things:

The plugin can be installed and configured.
The endpoint receives and verifies tool calls.
The agent uses the tool at the right time and explains the result correctly.

Test with realistic customer conversations, not only direct API calls. The agent's decision to use a tool depends on the customer's wording, the WhatsApp number instance prompt, the tool description, and which integrations are granted to that WhatsApp number.

Example: if the support plugin is granted to the support WhatsApp number but you test from the sales WhatsApp number, the sales agent will not see the support plugin tools.

Sandbox mode

Sandbox mode lets your organization test a plugin before publishing it.

Area	Sandbox behavior
Visibility	Only your organization can see it
Review	No public review required
Secret	Sandbox secret, shown once
Tool calls	Real calls from your granted instances
Customer testing	Works with real WhatsApp conversations

Use sandbox mode for private internal plugins, prototypes, and pre-release testing.

Install and grant checklist

Before sending WhatsApp test messages, confirm:

Manifest URL is public and reachable over HTTPS.
Manifest validates successfully.
Sandbox secret is saved.
Required configuration fields are filled.
Requested plugin permissions are understandable and not broader than needed.
Plugin is installed for the organization.
Plugin is granted to the target WhatsApp number instance.
Any required platform toolkit is installed and granted to the same WhatsApp number instance.
The WhatsApp number instance prompt explains when to use the plugin.
Your endpoint is deployed and logging requests.
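
The first checklist item can be checked from a script before you send any WhatsApp message. This is a minimal sketch, assuming only that the manifest is served as JSON. The function names and the example URL are hypothetical and are not part of the platform.

```python
import json
import urllib.request
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """Return True when the URL uses HTTPS and names a host."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)


def fetch_manifest(url: str, timeout: float = 10.0):
    """Download the manifest and parse it as JSON.

    Raises ValueError for non-HTTPS URLs. Network and JSON errors propagate
    so a failed check is visible to the person running it.
    """
    if not is_https_url(url):
        raise ValueError(f"manifest URL must be HTTPS: {url}")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage:
# manifest = fetch_manifest("https://plugin.example.com/manifest.json")
```

This only shows that the URL is reachable and returns JSON. Whether the manifest validates is still decided by the platform.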

Conversation test script

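Use a predictable test script: a fixed customer message paired with the behavior you expect, so the same test can be repeated after every change.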

Example for a support plugin:

Customer: My order arrived damaged and I need help.
Expected: Agent creates a support ticket and gives the ticket ID.

Example for a delivery plugin:

Customer: Where is order MAT-2026-0510-0041?
Expected: Agent calls delivery lookup and summarizes delivery status.

Example for a booking plugin:

Customer: I want to book a repair consultation tomorrow afternoon.
Expected: Agent checks availability before confirming.
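
The three scripts above can be kept as data so they are run the same way after every change. This is a sketch of one possible layout. `ConversationTest` and `format_checklist` are hypothetical helpers that print a manual checklist and do not send any messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationTest:
    plugin: str
    customer_message: str
    expected: str


TEST_SCRIPT = [
    ConversationTest(
        "support",
        "My order arrived damaged and I need help.",
        "Agent creates a support ticket and gives the ticket ID.",
    ),
    ConversationTest(
        "delivery",
        "Where is order MAT-2026-0510-0041?",
        "Agent calls delivery lookup and summarizes delivery status.",
    ),
    ConversationTest(
        "booking",
        "I want to book a repair consultation tomorrow afternoon.",
        "Agent checks availability before confirming.",
    ),
]


def format_checklist(tests) -> str:
    """Render the script as a numbered checklist for a manual test run."""
    lines = []
    for number, test in enumerate(tests, start=1):
        lines.append(f"{number}. [{test.plugin}] Send: {test.customer_message}")
        lines.append(f"   Expect: {test.expected}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_checklist(TEST_SCRIPT))
```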

What to inspect

After each test, inspect:

Item	What to look for
Agent behavior	Did the agent call the right tool?
Tool input	Did the agent send complete and valid fields?
Endpoint logs	Did your endpoint verify and process the request?
Tool response	Was the JSON compact and customer-safe?
WhatsApp reply	Did the agent explain the result clearly?
Dashboard logs	Does the execution show success or failure?

If the agent does not call the tool

Common causes:

Plugin is installed but not granted to the WhatsApp number instance you are testing.
Tool description is too vague.
Instance prompt does not mention the workflow.
Customer wording does not match the tool's purpose.
Input schema asks for data the agent does not have.

Fixes:

Grant the plugin to the correct WhatsApp number instance.
Rewrite the tool description with trigger phrases.
Add prompt guidance to the WhatsApp number instance.
Add optional fields where the agent may not always know a value.
Split one broad tool into smaller specific tools.

If the agent calls the tool too often

Common causes:

Tool description is too broad.
Tool name sounds like a general-purpose answer.
The prompt tells the agent to use the plugin for too many cases.

Fixes:

Add "Use this only when..." language.
Add "Do not use this for..." language.
Separate read-only tools from side-effect tools.
Require a stable reference for sensitive operations.
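
The two wording fixes above can be checked mechanically before a manifest is deployed. This is a rough sketch, not a platform feature. `lint_tool_description` is a hypothetical helper, and the eight-word threshold is an arbitrary assumption.

```python
def lint_tool_description(description: str) -> list[str]:
    """Return warnings for a tool description that lacks scoping language."""
    text = description.lower()
    warnings = []
    if "use this only when" not in text:
        warnings.append('missing "Use this only when..." guidance')
    if "do not use this for" not in text:
        warnings.append('missing "Do not use this for..." guidance')
    # Arbitrary threshold: very short descriptions tend to be too vague.
    if len(description.split()) < 8:
        warnings.append("description is very short and may be too vague")
    return warnings


if __name__ == "__main__":
    example = (
        "Look up delivery status for an order. "
        "Use this only when the customer gives an order number. "
        "Do not use this for refunds or complaints."
    )
    print(lint_tool_description(example))
```

A clean lint result does not prove the agent will behave well. Conversation tests remain the real check.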

If the endpoint rejects the request

Check:

The platform token is present.
You are using the correct sandbox or live secret.
The token has not expired.
Your endpoint expects the correct tool path.
Your endpoint reads `tool`, `input`, and `context` from the request body.
Your endpoint handles configuration values correctly.

Return safe error messages, without secrets or internal details, so the agent can recover.
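
The checks above can be sketched as a framework-independent handler. Everything here is an assumption made for illustration: how the token is transported and verified is covered in Authentication, so `verify_token` is a hypothetical callback, and the HTTP status codes and error shape are one possible choice rather than a platform requirement.

```python
import json

SAFE_AUTH_ERROR = {"status": "error", "message": "Request could not be authenticated."}
SAFE_INPUT_ERROR = {"status": "error", "message": "Request body was not valid."}
SAFE_TOOL_ERROR = {"status": "error", "message": "Unknown tool."}
SAFE_SERVER_ERROR = {
    "status": "error",
    "message": "The request could not be completed. Please try again later.",
}


def handle_tool_call(raw_body, token, verify_token, handlers):
    """Return (http_status, response_body) for one incoming tool call."""
    # The token must be present and pass verification (sandbox or live secret).
    if not token or not verify_token(token):
        return 401, SAFE_AUTH_ERROR

    # The body must be a JSON object carrying tool, input, and context.
    try:
        body = json.loads(raw_body)
    except ValueError:
        return 400, SAFE_INPUT_ERROR
    if not isinstance(body, dict):
        return 400, SAFE_INPUT_ERROR

    tool = body.get("tool")
    tool_input = body.get("input")
    context = body.get("context")
    if (
        not isinstance(tool, str)
        or not isinstance(tool_input, dict)
        or not isinstance(context, dict)
    ):
        return 400, SAFE_INPUT_ERROR

    handler = handlers.get(tool)
    if handler is None:
        return 404, SAFE_TOOL_ERROR

    try:
        return 200, handler(tool_input, context)
    except Exception:
        # Log the real exception on the server; never echo it to the agent.
        return 500, SAFE_SERVER_ERROR
```

Keeping the handler separate from the web framework makes each rejection path easy to unit test before you test through WhatsApp.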

If the tool returns poor answers

The agent can only use what your plugin returns.

Improve responses by returning:

A `status` value
Customer-safe message
External reference IDs
Next action
Relevant dates or amounts

Avoid returning large raw objects. If the agent receives too much unrelated data, customer replies become less reliable.
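
One way to keep responses compact is to build them from an explicit list of fields instead of forwarding a provider record. This is a sketch under assumptions: the field names (`reference_id`, `next_action`, `expected_date`) and the shape of the raw record are hypothetical, not a required response schema.

```python
def build_tool_response(status, message, reference_id=None, next_action=None, **details):
    """Build a compact, customer-safe response and skip empty fields."""
    response = {"status": status, "message": message}
    if reference_id is not None:
        response["reference_id"] = reference_id
    if next_action is not None:
        response["next_action"] = next_action
    # Extra details are for relevant dates or amounts only.
    response.update({key: value for key, value in details.items() if value is not None})
    return response


def summarize_delivery(raw: dict) -> dict:
    """Pick only the customer-safe fields from a larger provider record."""
    readable_state = raw["state"].replace("_", " ")
    return build_tool_response(
        status=raw["state"],
        message=f"Your order is {readable_state}.",
        reference_id=raw["tracking_id"],
        expected_date=raw.get("eta"),
    )
```

Because the fields are chosen explicitly, internal values such as driver phone numbers or staff notes never reach the agent, even if the provider adds them later.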

Testing sensitive workflows

For tools that affect money, fulfillment, bookings, or customer records:

Test with draft or sandbox accounts first.
Use small amounts.
Require explicit references.
Validate the business state inside your endpoint.
Confirm the platform toolkit that owns the prerequisite action is granted.
Confirm the action appears in the correct dashboard.
Confirm unauthorized cases are refused.

Examples of refusal cases:

Try to create delivery for an unpaid order.
Try to refund without an approved refund request.
Try to book an unavailable time.
Try to update a customer record without a valid customer reference.
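
The first refusal case can be sketched as a state check that runs inside the endpoint before any side effect. The order store, the `payment_status` field, and the response shape are hypothetical stand-ins for your own system.

```python
def create_delivery(order_id: str, orders: dict) -> dict:
    """Create a delivery only when the order exists and is paid."""
    order = orders.get(order_id)
    if order is None:
        return {
            "status": "refused",
            "message": "Order not found. Ask the customer for the order number.",
        }
    if order.get("payment_status") != "paid":
        return {
            "status": "refused",
            "message": "Delivery cannot be created until the order is paid.",
        }
    # The real side effect (calling the courier, writing a record) goes here.
    return {"status": "created", "delivery_id": f"DEL-{order_id}"}


if __name__ == "__main__":
    orders = {
        "ORD-1": {"payment_status": "paid"},
        "ORD-2": {"payment_status": "pending"},
    }
    for order_id in ("ORD-1", "ORD-2", "ORD-9"):
        print(order_id, create_delivery(order_id, orders))
```

The check lives in the endpoint rather than in the tool description, so a refusal does not depend on the agent choosing correctly.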

Testing toolkit-dependent plugins

If your plugin depends on Payments, Scheduling, E-Commerce, or another toolkit, test the full chain.

Payment-dependent plugin

Example: accounting plugin creates an invoice after payment.

Expected test:

Agent creates or receives a payment request through Payments.
Payment status becomes successful.
Agent calls your plugin with the payment reference.
Plugin creates the invoice and returns an invoice ID.
Agent sends the customer or team the invoice reference.

Failure cases to test:

Payment is pending.
Payment failed.
Payment reference is missing.
Payments is not granted to the WhatsApp number instance.

Scheduling-dependent plugin

Example: booking plugin creates an appointment and asks Scheduling to remind the customer.

Expected test:

Agent calls your plugin to book the appointment.
Plugin returns the appointment time and booking ID.
Agent uses Scheduling to create the reminder.
Agent confirms both booking and reminder to the customer.

Failure cases to test:

Appointment time unavailable.
Scheduling is not granted to the WhatsApp number instance.
Customer has not confirmed reminder consent.

E-Commerce-dependent plugin

Example: warehouse plugin receives paid orders.

Expected test:

E-Commerce order is created.
Payment succeeds.
Order status is paid.
Agent calls your plugin with the order number.
Plugin creates the warehouse record.

Failure cases to test:

Order is unpaid.
Order was cancelled.
Required delivery address is missing.
E-Commerce is not granted to the WhatsApp number instance.

Testing platform bridge permissions

If your plugin uses a platform bridge, test each permission separately.

For payment and commerce bridges:

Test	Expected result
Missing `plugin:payments:initiate:current_chat`	Current-customer payment request is rejected
Missing `plugin:payments:status:own`	Plugin cannot read status for payments it created
Missing `plugin:payments:refund:execute:own`	Plugin cannot refund payments it created
Missing `plugin:ecommerce:orders:create:current_chat`	Current-customer order creation is rejected
Commerce checkout without `plugin:ecommerce:checkout:initiate`	Order may be created, but checkout request is rejected
Commerce checkout without `plugin:payments:initiate:current_chat`	Checkout payment request is rejected
Duplicate idempotency key	Bridge returns the stored result instead of duplicating the side effect
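
For the duplicate idempotency key row, a local stand-in can help you test your own retry logic before you call the real bridge. `FakeBridge` below is a hypothetical in-memory model of the expected behavior in the table. It is not the bridge API, and its method name and fields are assumptions.

```python
class FakeBridge:
    """In-memory stand-in for a platform bridge, for local tests only."""

    def __init__(self):
        self._results = {}
        self.side_effects = 0

    def initiate_payment(self, idempotency_key: str, amount: int) -> dict:
        # A repeated key returns the stored result and performs no new work.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.side_effects += 1
        result = {
            "status": "pending",
            "payment_ref": f"PAY-{self.side_effects}",
            "amount": amount,
        }
        self._results[idempotency_key] = result
        return result
```

In a test, call the method twice with the same key and confirm that the results match and that only one side effect was recorded.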

For messaging, scheduling, escalation, and obligation bridges:

Test	Expected result
Missing `plugin:obligations:request`	Request is rejected before any message is sent
Has request permission but missing `plugin:messages:send:current_chat`	Current-customer reminder request is rejected
Escalation without `plugin:messages:escalate:current_chat`	Current-customer escalation request is rejected
Future `runAt` without `plugin:messages:schedule:current_chat`	Scheduled current-customer request is rejected
External recipient without `plugin:messages:send:external_recipient`	External message request is blocked

This is how you confirm the plugin cannot use the bridge beyond the permissions granted to it.
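
Testing each permission separately is easier to track as a generated plan: start from the full grant, remove one permission at a time, and record the expected rejection. The sketch below copies the permission names and expected results from the messaging table above. The helper names are hypothetical, and the plan only lists what to test; it does not call the platform.

```python
CASES = [
    ("plugin:obligations:request", "Request is rejected before any message is sent"),
    ("plugin:messages:send:current_chat", "Current-customer reminder request is rejected"),
    ("plugin:messages:escalate:current_chat", "Current-customer escalation request is rejected"),
    ("plugin:messages:schedule:current_chat", "Scheduled current-customer request is rejected"),
    ("plugin:messages:send:external_recipient", "External message request is blocked"),
]

ALL_PERMISSIONS = frozenset(permission for permission, _ in CASES)


def grants_without(permission: str, full: frozenset = ALL_PERMISSIONS) -> frozenset:
    """Return the full grant set with exactly one permission removed."""
    if permission not in full:
        raise ValueError(f"unknown permission: {permission}")
    return full - {permission}


def build_test_plan() -> list[dict]:
    """One entry per permission: what to remove, what stays, what to expect."""
    return [
        {
            "remove": permission,
            "grant": sorted(grants_without(permission)),
            "expect": expected,
        }
        for permission, expected in CASES
    ]


if __name__ == "__main__":
    for entry in build_test_plan():
        print(f"Remove {entry['remove']}: expect '{entry['expect']}'")
```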

Also test polled obligations if your manifest uses `metadata.capability: "obligations.list_due"`.

Test	Expected result
Tool is not allowed in the WhatsApp number instance grant	Kasi Labs does not poll it
Tool returns a malformed obligation	Obligation is ignored
Tool returns reminder but lacks the matching scoped `plugin:messages:send:*` permission	Reminder is blocked
Tool returns escalation but lacks the matching scoped `plugin:messages:escalate:*` permission	Escalation is blocked
Tool returns external recipient without the matching `external_recipient` permission	Recipient is blocked
`obligations.record_event` exists and is allowed	Plugin receives event results

Preparing for production

Before publishing or using a plugin in a live workflow:

Verify every request.
Remove debug-only tools.
Remove broad admin actions.
Confirm all side-effect tools validate business state.
Confirm customer identifiers are handled privately.
Confirm errors do not reveal secrets.
Confirm only the right instances have the plugin grant.
Confirm the organization knows who can configure and grant the plugin.

Updating a plugin

When you update a manifest:

Deploy the new manifest.
Re-fetch or reinstall the sandbox plugin.
Review tool changes.
Re-test conversations.
Grant or remove WhatsApp number instance access as needed.

When you update endpoint behavior without changing tools, deploy your endpoint and run the same conversation tests again.
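
The "Review tool changes" step can be supported by a small diff between the old and new manifest. This sketch assumes the manifest has a top-level `tools` list whose entries each carry a `name`. That shape is an assumption for illustration, so check the Manifest Reference for the real structure.

```python
def diff_tools(old_manifest: dict, new_manifest: dict) -> dict:
    """Compare tools by name and report added, removed, and changed entries."""
    old = {tool["name"]: tool for tool in old_manifest.get("tools", [])}
    new = {tool["name"]: tool for tool in new_manifest.get("tools", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

Any tool listed under `added` or `changed` is a candidate for a new or repeated conversation test.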

Submitting for review

If the plugin should be available beyond your organization, submit it for review from the developer area.

Review focuses on:

Manifest quality.
Tool safety.
Authentication.
Endpoint reliability.
Customer privacy.
Whether descriptions match actual behavior.

Next steps

Developer Overview - Understand plugin lifecycle and permissions
Manifest Reference - Improve tool contracts
Authentication - Secure requests