Proxy API Documentation

This documentation describes the API contract required to integrate a custom backend with the AI Workbench UI using the Proxy GPT provider.

Overview

The Proxy GPT provider supports two communication methods:

  1. HTTP Streaming (Preferred): Standard Server-Sent Events (SSE) for chat completions.
  2. Socket.IO (Fallback): Real-time bidirectional communication.

Your backend should ideally support both, but HTTP Streaming is prioritized.


1. Health Check

The UI checks if your proxy server is reachable before attempting to connect.

  • Endpoint: GET /health or HEAD /
  • Expected Response: Status 200 OK
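A client-side probe against this contract might look like the sketch below. The `isProxyHealthy` helper and its injectable `fetchImpl` parameter are illustrative only, not part of the UI's actual implementation; in practice you would pass the global `fetch`.

```typescript
// Sketch: probing the proxy before connecting. `fetchImpl` is injectable so
// the logic can be exercised without a live server.
type Fetcher = (url: string, init?: { method?: string }) => Promise<{ status: number }>;

async function isProxyHealthy(baseUrl: string, fetchImpl: Fetcher): Promise<boolean> {
  try {
    // The UI accepts either GET /health or HEAD / returning 200 OK.
    const res = await fetchImpl(`${baseUrl}/health`, { method: "GET" });
    return res.status === 200;
  } catch {
    return false; // network error => server unreachable
  }
}
```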

2. Models (Optional)

To support dynamic model selection and expose specific capabilities (e.g., multimodal I/O, thinking), implement the models endpoint.

  • Endpoint: GET /v1/models
  • Expected Response:
```json
{
  "data": [
    {
      "id": "my-custom-model",
      "object": "model",
      "created": 1677610602,
      "owned_by": "custom",
      "capabilities": {
        "imageInput": true,
        "imageOutput": false,
        "thinking": true,
        "textInput": true,
        "textOutput": true,
        "internetBrowsing": false,
        "fileOutput": false,
        "videoInput": false,
        "videoOutput": false
      }
    },
    {
      "id": "my-vision-model",
      "object": "model",
      "capabilities": {
        "imageInput": true,
        "imageOutput": true,
        "textInput": true,
        "textOutput": true
      }
    }
  ]
}
```

3. Chat Completions (HTTP Streaming)

This is the primary method for generating chat responses.

  • Endpoint: POST /api/chat
  • Content-Type: application/json

Request Body

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    },
    {
      "role": "assistant",
      "content": "I am doing well, thank you!"
    }
  ]
}
```
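For reference, the request shape can be written as a TypeScript type. The type names here are illustrative, not part of the API, and roles other than user and assistant are not shown in this contract.

```typescript
// Illustrative types for the /api/chat request body (names are not part of the API).
type ChatMessage = {
  role: "user" | "assistant";
  content: string;
};

type ChatRequest = {
  messages: ChatMessage[];
};

// Example payload matching the documented shape:
const request: ChatRequest = {
  messages: [{ role: "user", content: "Hello, how are you?" }],
};
```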

Response Format (Server-Sent Events)

The server must stream the response using the SSE format: each JSON chunk is sent on its own line prefixed with `data: ` and followed by a blank line.

Standard Text Response

```text
data: {"content": "Hello"}

data: {"content": " world"}

data: {"content": "!"}

data: [DONE]
```

  • Chunk Format: JSON object with a content field containing the text fragment.
  • Termination: Send data: [DONE] to signal the end of the stream.
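Frames in this shape can be produced with a tiny helper on the server side. `sseChunk` and `SSE_DONE` are hypothetical names, a minimal sketch of the framing rather than a required API.

```typescript
// Sketch: emit one SSE frame per JSON chunk; frames are separated by a blank line.
function sseChunk(content: string): string {
  return `data: ${JSON.stringify({ content })}\n\n`;
}

const SSE_DONE = "data: [DONE]\n\n"; // stream terminator

// A full "Hello world" stream is then:
const stream = sseChunk("Hello") + sseChunk(" world") + SSE_DONE;
```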

Task Planning & Progress (New)

You can stream task updates to visualize the agent's plan and progress in the UI.

Plan Update Event: Defines the initial plan or updates the list of steps.

```text
event: plan_update
data: {"current_task_id": "task-1", "steps": [{"id": "step-1", "status": "running", "title": "Analyze request"}, {"id": "step-2", "status": "pending", "title": "Execute code"}]}
```

Todo Update Event: Updates the status of existing tasks.

```text
event: todo_update
data: {"items": [{"id": "step-1", "status": "completed", "title": "Analyze request"}, {"id": "step-2", "status": "running", "title": "Execute code"}]}
```

Thinking/Reasoning Event: Displays a collapsible "Reasoning" block for internal thought processes.

```text
event: thinking
data: {"content": "I need to check the database first..."}
```
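Named events like the ones above share the same framing. The `sseEvent` helper below is an illustrative sketch, assuming the payload is serialized onto a single `data:` line.

```typescript
// Sketch: an SSE frame with an explicit event name, e.g. plan_update or thinking.
function sseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const frame = sseEvent("thinking", { content: "I need to check the database first..." });
```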

4. Socket.IO (Fallback)

If HTTP streaming fails, the UI attempts to connect via Socket.IO.

  • Event Name: chat
  • Payload: Same as the HTTP request body.

5. CORS Configuration

Ensure your server allows Cross-Origin Resource Sharing (CORS) from the domain where the AI Workbench UI is hosted.

```javascript
// Express/Node.js example
app.use(cors({
  origin: "*", // Or a specific domain
  methods: ["GET", "POST", "OPTIONS"]
}));
```


Supported Models in Example

The example implementation below supports the following models:

  • mock: Echoes back the input.
  • text-in-image-out: Returns a hardcoded image URL.
  • text-img-in-text-img-out: Handles text/image input and returns text/image output.
  • reflection-and-thinking: Simulates a thinking process before responding.
  • text-in-html-out: Returns a hardcoded HTML document.

Capabilities Object

The capabilities object declares what your model can do. All fields are optional and default to false, except textInput and textOutput, which default to true.

  • imageInput (boolean): Can accept images in the prompt
  • imageOutput (boolean): Can generate images
  • thinking (boolean): Supports chain-of-thought or reasoning steps
  • textInput (boolean): Can accept text input
  • textOutput (boolean): Can generate text output
  • internetBrowsing (boolean): Can access the internet
  • fileOutput (boolean): Can generate/return files
  • videoInput (boolean): Can accept video input
  • videoOutput (boolean): Can generate video
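The defaulting rule above can be captured in a small helper. `ModelCapabilities` and `withCapabilityDefaults` are illustrative names, a sketch assuming the defaults described in this section.

```typescript
// Illustrative: all capability flags default to false, except textInput/textOutput.
type ModelCapabilities = {
  imageInput?: boolean;
  imageOutput?: boolean;
  thinking?: boolean;
  textInput?: boolean;
  textOutput?: boolean;
  internetBrowsing?: boolean;
  fileOutput?: boolean;
  videoInput?: boolean;
  videoOutput?: boolean;
};

function withCapabilityDefaults(caps: ModelCapabilities): Required<ModelCapabilities> {
  return {
    imageInput: false,
    imageOutput: false,
    thinking: false,
    internetBrowsing: false,
    fileOutput: false,
    videoInput: false,
    videoOutput: false,
    textInput: true,
    textOutput: true,
    ...caps, // explicit values override the defaults
  };
}
```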



Socket.IO Server Implementation Example (Node.js)

```javascript
io.on("connection", (socket) => {
  socket.on("chat", async ({ messages }) => {
    try {
      // Process chat request...

      // Emit chunks
      socket.emit("chunk", { content: "Hello" });
      socket.emit("chunk", { content: " world" });

      // Signal completion
      socket.emit("complete");
    } catch (error) {
      socket.emit("error", { message: "Something went wrong" });
    }
  });
});
```
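On the client side, the chunk and complete events can be folded into a final message. The accumulator below is a sketch of that logic only, independent of any actual socket.io client code in the UI.

```typescript
// Sketch: assemble `chunk` payloads into the final assistant message.
function createChunkAccumulator() {
  let text = "";
  return {
    onChunk(payload: { content: string }) {
      text += payload.content;
    },
    result() {
      return text;
    },
  };
}

// Wire-up against a socket would look like:
//   socket.on("chunk", acc.onChunk);
//   socket.on("complete", () => render(acc.result()));
const acc = createChunkAccumulator();
acc.onChunk({ content: "Hello" });
acc.onChunk({ content: " world" });
```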


6. Complete Example Implementation

You can run this example using `npx ts-node example-proxy.ts`. It depends on the express, cors, and body-parser packages.

````typescript
import express from 'express';
import cors from 'cors';
import bodyParser from 'body-parser';

const app = express();
const port = 3001;

app.use(cors());
app.use(bodyParser.json());

// Mock Models Definition
const MODELS = [
  {
    id: "mock",
    object: "model",
    created: Date.now(),
    owned_by: "custom",
    capabilities: { textInput: true, textOutput: true }
  },
  {
    id: "text-in-image-out",
    object: "model",
    created: Date.now(),
    owned_by: "custom",
    capabilities: { textInput: true, imageOutput: true }
  },
  {
    id: "text-img-in-text-img-out",
    object: "model",
    created: Date.now(),
    owned_by: "custom",
    capabilities: { textInput: true, imageInput: true, textOutput: true, imageOutput: true }
  },
  {
    id: "reflection-and-thinking",
    object: "model",
    created: Date.now(),
    owned_by: "custom",
    capabilities: { textInput: true, textOutput: true, thinking: true }
  },
  {
    id: "text-in-html-out",
    object: "model",
    created: Date.now(),
    owned_by: "custom",
    capabilities: { textInput: true, textOutput: true }
  }
];

// Health Check
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// List Models
app.get('/v1/models', (req, res) => {
  res.json({ data: MODELS });
});

// Chat Completions
app.post('/v1/chat/completions', (req, res) => {
  const { model, messages, stream } = req.body;
  const lastMessage = messages[messages.length - 1].content;

  if (!stream) {
    return res.status(400).json({ error: "Only streaming is supported in this example" });
  }

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const sendChunk = (content: string) => {
    res.write(`data: ${JSON.stringify({
      choices: [{ delta: { content } }]
    })}\n\n`);
  };

  const sendThinkingChunk = (content: string) => {
    res.write(`data: ${JSON.stringify({
      choices: [{ delta: { content, type: "thinking" } }] // Custom type for thinking
    })}\n\n`);
  };

  const endStream = () => {
    res.write('data: [DONE]\n\n');
    res.end();
  };

  // Model Logic
  if (model === "mock") {
    sendChunk("This is a response from the mock model. I received your message: ");
    sendChunk(typeof lastMessage === 'string' ? lastMessage : "Multimedia content");
    endStream();
  } else if (model === "text-in-image-out") {
    sendChunk("Here is your generated image:\n");
    sendChunk("![Generated Image](https://canto.com/cdn/2023/04/07185018/url-for-a-picture-feature-1.jpg)");
    endStream();
  } else if (model === "text-img-in-text-img-out") {
    if (typeof lastMessage === 'string' && lastMessage.toLowerCase().includes("get me an image")) {
      sendChunk("Here is the image you requested:\n");
      sendChunk("![Result](https://canto.com/cdn/2023/04/07185018/url-for-a-picture-feature-1.jpg)");
    } else {
      sendChunk("I can process images and text. Try asking 'get me an image' or sending an image.");
      // Simulate returning a generated image url for image input
      sendChunk("\n![Processed](https://cascadeur.com/images/category/2023/05/25/9c3a16a53b77c84bac4c0cf70b3f26f3.png)");
    }
    endStream();
  } else if (model === "reflection-and-thinking") {
    // Simulate thinking process
    sendThinkingChunk("Analyzing the request...\n");
    setTimeout(() => {
      sendThinkingChunk("Identifying key concepts...\n");
      setTimeout(() => {
        sendChunk("After deep reflection, I have concluded that ");
        sendChunk(typeof lastMessage === 'string' ? lastMessage : "this input");
        sendChunk(" is indeed interesting.");
        endStream();
      }, 1000);
    }, 1000);
  } else if (model === "text-in-html-out") {
    sendChunk("Here is the HTML document you requested:\n");
    sendChunk("```html\n");
    sendChunk("<!DOCTYPE html>\n<html>\n<body>\n<h1>Hello World</h1>\n<p>This is a hardcoded HTML document.</p>\n</body>\n</html>\n");
    sendChunk("```");
    endStream();
  } else {
    sendChunk(`Unknown model: ${model}`);
    endStream();
  }
});

app.listen(port, () => {
  console.log(`Example proxy API listening at http://localhost:${port}`);
});
````