Testing & Debugging

Test sandbox plugins, inspect tool calls, refine agent behavior, and prepare plugins for production.

Overview

Plugin testing checks three things:

  1. The plugin can be installed and configured.
  2. The endpoint receives and verifies tool calls.
  3. The agent uses the tool at the right time and explains the result correctly.

Test with realistic customer conversations, not only direct API calls. The agent's decision to use a tool depends on the customer's wording, the WhatsApp number instance prompt, the tool description, and which integrations are granted to that WhatsApp number.

Example: if the support plugin is granted to the support WhatsApp number but you test from the sales WhatsApp number, the sales agent will not see the support plugin tools.

Sandbox mode

Sandbox mode lets your organization test a plugin before publishing it.

| Area | Sandbox behavior |
| --- | --- |
| Visibility | Only your organization can see it |
| Review | No public review required |
| Secret | Sandbox secret, shown once |
| Tool calls | Real calls from your granted instances |
| Customer testing | Works with real WhatsApp conversations |

Use sandbox mode for private internal plugins, prototypes, and pre-release testing.

Install and grant checklist

Before sending WhatsApp test messages, confirm:

  • Manifest URL is public and reachable over HTTPS.
  • Manifest validates successfully.
  • Sandbox secret is saved.
  • Required configuration fields are filled.
  • Requested plugin permissions are understandable and not broader than needed.
  • Plugin is installed for the organization.
  • Plugin is granted to the target WhatsApp number instance.
  • Any required platform toolkit is installed and granted to the same WhatsApp number instance.
  • The WhatsApp number instance prompt explains when to use the plugin.
  • Your endpoint is deployed and logging requests.

Conversation test script

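Use a predictable test script: a fixed customer message paired with the behavior you expect, so results can be compared after each change.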

Example for a support plugin:

```text
Customer: My order arrived damaged and I need help.
Expected: Agent creates a support ticket and gives the ticket ID.
```

Example for a delivery plugin:

```text
Customer: Where is order MAT-2026-0510-0041?
Expected: Agent calls delivery lookup and summarizes delivery status.
```

Example for a booking plugin:

```text
Customer: I want to book a repair consultation tomorrow afternoon.
Expected: Agent checks availability before confirming.
```
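
The scripts above can also be kept as data, so the same cases are replayed after every change. The sketch below is one hedged way to do that. `ConversationCase`, `check_case`, and the tool names such as `create_support_ticket` are hypothetical, and the observed tool calls are assumed to be copied from your own endpoint or dashboard logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationCase:
    """One scripted customer message and the tool the agent should call."""
    customer_message: str
    expected_tool: str  # hypothetical tool name from your own manifest


def check_case(case: ConversationCase, observed_tools: list[str]) -> str:
    """Compare the tools seen in your logs with the expectation for one case."""
    if case.expected_tool not in observed_tools:
        seen = observed_tools or "no tool calls"
        return f"FAIL: expected {case.expected_tool}, saw {seen}"
    if len(observed_tools) > 1:
        return f"WARN: {case.expected_tool} was called, plus extra calls: {observed_tools}"
    return "PASS"


# The three example conversations from this section, with assumed tool names.
CASES = [
    ConversationCase("My order arrived damaged and I need help.", "create_support_ticket"),
    ConversationCase("Where is order MAT-2026-0510-0041?", "lookup_delivery"),
    ConversationCase("I want to book a repair consultation tomorrow afternoon.", "check_availability"),
]
```

Keeping the expectation next to the message makes it easier to spot when a prompt or description change alters which tool the agent picks.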

What to inspect

After each test, inspect:

| Item | What to look for |
| --- | --- |
| Agent behavior | Did the agent call the right tool? |
| Tool input | Did the agent send complete and valid fields? |
| Endpoint logs | Did your endpoint verify and process the request? |
| Tool response | Was the JSON compact and customer-safe? |
| WhatsApp reply | Did the agent explain the result clearly? |
| Dashboard logs | Does the execution show success or failure? |

If the agent does not call the tool

Common causes:

  • Plugin is installed but not granted to the WhatsApp number instance you are testing.
  • Tool description is too vague.
  • Instance prompt does not mention the workflow.
  • Customer wording does not match the tool's purpose.
  • Input schema asks for data the agent does not have.

Fixes:

  • Grant the plugin to the correct WhatsApp number instance.
  • Rewrite the tool description with trigger phrases.
  • Add prompt guidance to the WhatsApp number instance.
  • Add optional fields where the agent may not always know a value.
  • Split one broad tool into smaller specific tools.

If the agent calls the tool too often

Common causes:

  • Tool description is too broad.
  • Tool name sounds like a general-purpose answer.
  • The prompt tells the agent to use the plugin for too many cases.

Fixes:

  • Add "Use this only when..." language.
  • Add "Do not use this for..." language.
  • Separate read-only tools from side-effect tools.
  • Require a stable reference for sensitive operations.
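
As a rough illustration of the scoping language above, the sketch below flags tool descriptions that contain neither phrase. The `name` and `description` keys and both example tools are hypothetical. The real manifest field names may differ, and this check is only a reviewing aid, not a platform validation rule.

```python
SCOPE_PHRASES = ("use this only when", "do not use this for")


def missing_scope_language(tools: list[dict]) -> list[str]:
    """Return names of tools whose description has neither scoping phrase."""
    flagged = []
    for tool in tools:
        description = tool.get("description", "").lower()
        if not any(phrase in description for phrase in SCOPE_PHRASES):
            flagged.append(tool["name"])
    return flagged


# Hypothetical tool entries; the real manifest field names may differ.
TOOLS = [
    {
        "name": "lookup_delivery",
        "description": (
            "Look up delivery status for one order. "
            "Use this only when the customer gives an order number. "
            "Do not use this for refunds or order changes."
        ),
    },
    {"name": "help_customer", "description": "Helps the customer with their request."},
]

print(missing_scope_language(TOOLS))  # prints ['help_customer']
```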

If the endpoint rejects the request

Check:

  • The platform token is present.
  • You are using the correct sandbox or live secret.
  • The token has not expired.
  • Your endpoint expects the correct tool path.
  • Your endpoint reads tool, input, and context from the request body.
  • Your endpoint handles configuration values correctly.

Return safe error messages that say what went wrong without exposing secrets or internal details, so the agent can recover.
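
The checks above can be sketched as a small, framework-independent handler. This is a minimal sketch under stated assumptions. `handle_tool_call`, the `verify_token` callable, the HTTP status codes, and the error payload shape are all hypothetical, and the real token verification must follow the platform's own documentation. Only the `tool`, `input`, and `context` body keys come from this page.

```python
from typing import Callable, Optional

REQUIRED_KEYS = ("tool", "input", "context")


def handle_tool_call(
    body: dict,
    token: Optional[str],
    verify_token: Callable[[str], bool],  # assumption: wraps the platform's real check
    handlers: dict[str, Callable[[dict, dict], dict]],
) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) without leaking internal details."""
    if not token or not verify_token(token):
        # Log the precise reason server-side; keep the response generic.
        return 401, {"status": "error", "message": "Request could not be verified."}

    missing = [key for key in REQUIRED_KEYS if key not in body]
    if missing:
        return 400, {"status": "error", "message": "Missing fields: " + ", ".join(missing) + "."}

    handler = handlers.get(body["tool"])
    if handler is None:
        return 404, {"status": "error", "message": "Unknown tool."}

    try:
        return 200, handler(body["input"], body["context"])
    except Exception:
        # The real exception belongs in your logs, not in the agent-facing reply.
        return 500, {"status": "error", "message": "The request failed. Please try again later."}
```

Passing the verifier in as a callable keeps the sandbox and live secrets out of the handler itself, so the same code path can be exercised with either.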

If the tool returns poor answers

The agent can only use what your plugin returns.

Improve responses by returning:

  • status
  • Customer-safe message
  • External reference IDs
  • Next action
  • Relevant dates or amounts

Avoid returning large raw objects. If the agent receives too much unrelated data, customer replies become less reliable.
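
A compact response might be built like the sketch below. It reduces a large backend record to the fields listed above. The record shape, the field names (`number`, `next_action`, `expected_delivery`), and the sample values are hypothetical and only illustrate the idea.

```python
def compact_response(raw_order: dict) -> dict:
    """Keep only the fields the agent needs to answer the customer."""
    return {
        "status": raw_order["status"],
        "message": f"Order {raw_order['number']} is {raw_order['status']}.",
        "reference": raw_order["number"],
        "next_action": raw_order.get("next_action", "none"),
        "expected_delivery": raw_order.get("expected_delivery"),
    }


# Hypothetical backend record with fields the customer should never see.
RAW_ORDER = {
    "number": "MAT-2026-0510-0041",
    "status": "shipped",
    "expected_delivery": "2026-05-12",
    "internal_notes": "courier account on hold",
    "line_items": [{"sku": "A1", "cost_price": 12.5}],
}

print(compact_response(RAW_ORDER))
```

Building the reply from an explicit list of keys, rather than deleting unwanted ones, means new internal fields added later are not exposed by default.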

Testing sensitive workflows

For tools that affect money, fulfillment, bookings, or customer records:

  • Test with draft or sandbox accounts first.
  • Use small amounts when a test must move real money.
  • Require explicit references.
  • Validate the business state inside your endpoint.
  • Confirm the platform toolkit that owns the prerequisite action is granted.
  • Confirm the action appears in the correct dashboard.
  • Confirm unauthorized cases are refused.

Examples of refusal cases:

  • Try to create delivery for an unpaid order.
  • Try to refund without an approved refund request.
  • Try to book an unavailable time.
  • Try to update a customer record without a valid customer reference.
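
The first refusal case can be sketched as a state check inside the endpoint. This is an illustration only. `create_delivery` and the `payment_status`, `delivery_address`, and `number` fields are hypothetical names, and a real endpoint would load the order from its own system of record rather than trust the agent's input.

```python
from typing import Optional


def create_delivery(order: Optional[dict]) -> dict:
    """Refuse delivery creation unless the order exists, is paid, and has an address."""
    if order is None:
        return {"status": "refused", "message": "No order matches that reference."}
    if order.get("payment_status") != "paid":
        return {"status": "refused", "message": "Delivery can only be created for paid orders."}
    if not order.get("delivery_address"):
        return {"status": "refused", "message": "A delivery address is required first."}
    # Placeholder for the real side effect; the ID format here is made up.
    return {"status": "created", "delivery_id": f"DEL-{order['number']}"}
```

Each refusal returns a customer-safe message, so the agent can explain the next step instead of retrying the same call.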

Testing toolkit-dependent plugins

If your plugin depends on Payments, Scheduling, E-Commerce, or another toolkit, test the full chain.

Payment-dependent plugin

Example: accounting plugin creates an invoice after payment.

Expected test:

  1. Agent creates or receives a payment request through Payments.
  2. Payment status becomes successful.
  3. Agent calls your plugin with the payment reference.
  4. Plugin creates the invoice and returns an invoice ID.
  5. Agent sends the customer or team the invoice reference.

Failure cases to test:

  • Payment is pending.
  • Payment failed.
  • Payment reference is missing.
  • Payments is not granted to the WhatsApp number instance.

Scheduling-dependent plugin

Example: booking plugin creates an appointment and asks Scheduling to remind the customer.

Expected test:

  1. Agent calls your plugin to book the appointment.
  2. Plugin returns the appointment time and booking ID.
  3. Agent uses Scheduling to create the reminder.
  4. Agent confirms both booking and reminder to the customer.

Failure cases to test:

  • Appointment time unavailable.
  • Scheduling is not granted to the WhatsApp number instance.
  • Customer has not confirmed reminder consent.

E-Commerce-dependent plugin

Example: warehouse plugin receives paid orders.

Expected test:

  1. E-Commerce order is created.
  2. Payment succeeds.
  3. Order status is paid.
  4. Agent calls your plugin with the order number.
  5. Plugin creates the warehouse record.

Failure cases to test:

  • Order is unpaid.
  • Order was cancelled.
  • Required delivery address is missing.
  • E-Commerce is not granted to the WhatsApp number instance.

Testing platform bridge permissions

If your plugin uses a platform bridge, test each permission separately.

For payment and commerce bridges:

| Test | Expected result |
| --- | --- |
| Missing plugin:payments:initiate:current_chat | Current-customer payment request is rejected |
| Missing plugin:payments:status:own | Plugin cannot read status for payments it created |
| Missing plugin:payments:refund:execute:own | Plugin cannot refund payments it created |
| Missing plugin:ecommerce:orders:create:current_chat | Current-customer order creation is rejected |
| Commerce checkout without plugin:ecommerce:checkout:initiate | Order may be created, but checkout request is rejected |
| Commerce checkout without plugin:payments:initiate:current_chat | Checkout payment request is rejected |
| Duplicate idempotency key | Bridge returns the stored result instead of duplicating the side effect |
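
The duplicate idempotency key behavior in the last row can be pictured with a toy in-memory model. This is not the bridge's implementation. `IdempotentRunner` is a hypothetical name, and the sketch only shows what a test should expect: the second call with the same key returns the first result and does not repeat the side effect.

```python
from typing import Callable


class IdempotentRunner:
    """Run a side effect once per key and replay the stored result afterwards."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}

    def run(self, key: str, side_effect: Callable[[], dict]) -> dict:
        if key in self._results:
            return self._results[key]  # replay; the side effect is not repeated
        result = side_effect()
        self._results[key] = result
        return result


# Usage: count how many times the (fake) payment request really runs.
calls = []


def request_payment() -> dict:
    calls.append(1)
    return {"status": "pending", "attempt": len(calls)}


runner = IdempotentRunner()
first = runner.run("order-41-payment", request_payment)
second = runner.run("order-41-payment", request_payment)
print(first == second, len(calls))  # prints True 1
```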

For messaging, scheduling, escalation, and obligation bridges:

| Test | Expected result |
| --- | --- |
| Missing plugin:obligations:request | Request is rejected before any message is sent |
| Has request permission but missing plugin:messages:send:current_chat | Current-customer reminder request is rejected |
| Escalation without plugin:messages:escalate:current_chat | Current-customer escalation request is rejected |
| Future runAt without plugin:messages:schedule:current_chat | Scheduled current-customer request is rejected |
| External recipient without plugin:messages:send:external_recipient | External message request is blocked |

This is how you confirm the plugin cannot use the bridge beyond the permissions granted to it.
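
To plan these tests, it can help to write down which permissions each request is expected to need. The sketch below is a test-side model of one reading of the messaging table, not the platform's logic. The function names and the `kind`, `scope`, and `scheduled` parameters are hypothetical, and the rule for combining permissions is an assumption to check against the platform documentation.

```python
def required_permissions(kind: str, scope: str, scheduled: bool) -> set[str]:
    """Permissions a bridge request would need under this reading of the table."""
    action = "escalate" if kind == "escalation" else "send"
    needed = {"plugin:obligations:request", f"plugin:messages:{action}:{scope}"}
    if scheduled:
        # Assumption: a future runAt adds a schedule permission for the same scope.
        needed.add(f"plugin:messages:schedule:{scope}")
    return needed


def missing_permissions(granted: set[str], kind: str, scope: str, scheduled: bool = False) -> set[str]:
    """Return permissions that are needed but not granted; empty means no rejection is expected."""
    return required_permissions(kind, scope, scheduled) - granted


GRANTED = {"plugin:obligations:request", "plugin:messages:send:current_chat"}

print(missing_permissions(GRANTED, "reminder", "current_chat"))
print(missing_permissions(GRANTED, "reminder", "current_chat", scheduled=True))
```

Any request with a non-empty result is one where a rejection is the expected outcome.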

Also test polled obligations if your manifest uses metadata.capability: "obligations.list_due".

| Test | Expected result |
| --- | --- |
| Tool is not allowed in the WhatsApp number instance grant | KasiLabs does not poll it |
| Tool returns a malformed obligation | Obligation is ignored |
| Tool returns reminder but lacks the matching scoped plugin:messages:send:* permission | Reminder is blocked |
| Tool returns escalation but lacks the matching scoped plugin:messages:escalate:* permission | Escalation is blocked |
| Tool returns external recipient without the matching external_recipient permission | Recipient is blocked |
| obligations.record_event exists and is allowed | Plugin receives event results |

Preparing for production

Before publishing or using a plugin in a live workflow:

  • Verify every request.
  • Remove debug-only tools.
  • Remove broad admin actions.
  • Confirm all side-effect tools validate business state.
  • Confirm customer identifiers are handled privately.
  • Confirm errors do not reveal secrets.
  • Confirm only the right instances have the plugin grant.
  • Confirm the organization knows who can configure and grant the plugin.

Updating a plugin

When you update a manifest:

  1. Deploy the new manifest.
  2. Re-fetch or reinstall the sandbox plugin.
  3. Review tool changes.
  4. Re-test conversations.
  5. Grant or remove WhatsApp number instance access as needed.

When you update endpoint behavior without changing tools, deploy your endpoint and run the same conversation tests again.
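
For the "review tool changes" step, a small diff between the old and new manifest can show which conversations need re-testing. The sketch assumes a manifest with a top-level `tools` list whose entries have a `name` key. That shape and the `diff_tools` helper are hypothetical, so adapt them to the actual manifest format.

```python
def diff_tools(old_manifest: dict, new_manifest: dict) -> dict:
    """Summarize added, removed, and changed tools between two manifest versions."""
    old = {tool["name"]: tool for tool in old_manifest.get("tools", [])}
    new = {tool["name"]: tool for tool in new_manifest.get("tools", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
    }


# Hypothetical manifests before and after an update.
OLD = {"tools": [{"name": "lookup_delivery", "description": "Look up an order."},
                 {"name": "debug_dump", "description": "Dump state."}]}
NEW = {"tools": [{"name": "lookup_delivery", "description": "Look up delivery status for one order."},
                 {"name": "create_ticket", "description": "Create a support ticket."}]}

print(diff_tools(OLD, NEW))
```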

Submitting for review

If the plugin should be available beyond your organization, submit it for review from the developer area.

Review focuses on:

  • Manifest quality.
  • Tool safety.
  • Authentication.
  • Endpoint reliability.
  • Customer privacy.
  • Whether descriptions match actual behavior.
