
AI Extract Operation

What It Does

The ai_extract operation calls an LLM via OpenRouter and returns structured JSON extracted from unstructured text. The result is stored in the DataChain under the operation's key as:

{
  "object": { /* extracted fields */ },
  "usage": {
    "promptTokens": 120,
    "completionTokens": 45,
    "totalTokens": 165
  }
}

Source: src/modules/visual_flows/operations/ai-extract.ts


Options Schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | No | google/gemini-2.0-flash-exp:free | OpenRouter model ID |
| input | string | Yes | | Text to extract from. Supports {{ variable }} interpolation |
| system_prompt | string | No | | Instructions telling the LLM how to extract |
| schema_fields | SchemaField[] | Yes | | Fields to extract (see below) |
| fallback_on_error | boolean | No | false | Return an empty result ({ object: {}, usage: {} }) instead of failing the flow on an LLM or parsing error |

SchemaField

interface SchemaField {
  name: string;
  type: "string" | "number" | "boolean" | "enum" | "array" | "object";
  description?: string;
  enumValues?: string[]; // Required when type = "enum"
  required?: boolean;
}
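
The options table and the SchemaField interface can be mirrored in a small typed helper. The following is a minimal sketch, not code from ai-extract.ts: the names AiExtractOptions, applyDefaults, and validateSchemaFields are hypothetical, and only the default values and the "enumValues is required for enum fields" rule come from the documentation above.

```typescript
// Hypothetical helper types and functions; not taken from ai-extract.ts.
interface SchemaField {
  name: string;
  type: "string" | "number" | "boolean" | "enum" | "array" | "object";
  description?: string;
  enumValues?: string[]; // Required when type = "enum"
  required?: boolean;
}

interface AiExtractOptions {
  model?: string;
  input: string;
  system_prompt?: string;
  schema_fields: SchemaField[];
  fallback_on_error?: boolean;
}

// Defaults as listed in the options table.
const DEFAULT_MODEL = "google/gemini-2.0-flash-exp:free";

function applyDefaults(options: AiExtractOptions) {
  return {
    ...options,
    model: options.model ?? DEFAULT_MODEL,
    fallback_on_error: options.fallback_on_error ?? false,
  };
}

// Returns a list of problems; an empty list means the documented rule holds.
function validateSchemaFields(fields: SchemaField[]): string[] {
  const problems: string[] = [];
  for (const field of fields) {
    const hasValues = field.enumValues !== undefined && field.enumValues.length > 0;
    if (field.type === "enum" && !hasValues) {
      problems.push(`enum field "${field.name}" needs enumValues`);
    }
  }
  return problems;
}
```

Running a check like this before the LLM call would surface a misconfigured enum field early, rather than as a confusing extraction result.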

Implementation Notes

  • Uses generateText() (not generateObject()) for maximum compatibility across all OpenRouter models.
  • Schema fields are appended as a JSON shape hint at the end of the system prompt, instructing the LLM to return valid JSON.
  • The response is parsed by extracting the first JSON block found in the LLM output.
  • If fallback_on_error is true, any parsing or API error returns { object: {}, usage: {} } and continues the flow.
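
The prompt-building and parsing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the function names, the exact wording of the shape hint, and the brace-matching strategy are all hypothetical. It treats "the first JSON block" as the first balanced { ... } span in the output.

```typescript
// Hypothetical sketch; the real ai-extract.ts may build and parse differently.
interface SchemaField {
  name: string;
  type: "string" | "number" | "boolean" | "enum" | "array" | "object";
  description?: string;
  enumValues?: string[];
  required?: boolean;
}

// One line of the shape hint, e.g.:  "total": number (required)
function describeField(field: SchemaField): string {
  const kind =
    field.type === "enum"
      ? (field.enumValues ?? []).map((v) => JSON.stringify(v)).join(" | ")
      : field.type;
  const presence = field.required ? " (required)" : " (optional)";
  const note = field.description ? ` - ${field.description}` : "";
  return `  ${JSON.stringify(field.name)}: ${kind}${presence}${note}`;
}

// Appends the shape hint to the end of the (optional) system prompt.
function buildSystemPrompt(systemPrompt: string | undefined, fields: SchemaField[]): string {
  const shape = ["{", fields.map(describeField).join(",\n"), "}"].join("\n");
  const hint = `Respond with valid JSON only, matching this shape:\n${shape}`;
  return systemPrompt ? `${systemPrompt}\n\n${hint}` : hint;
}

// Finds the first balanced {...} span, ignoring braces inside JSON strings,
// and parses it. Throws if nothing parseable is found.
function extractFirstJsonObject(text: string): Record<string, unknown> {
  const start = text.indexOf("{");
  if (start === -1) throw new Error("No JSON object found in LLM output");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        return JSON.parse(text.slice(start, i + 1)) as Record<string, unknown>;
      }
    }
  }
  throw new Error("Unterminated JSON object in LLM output");
}
```

One limitation of this sketch: if the model writes a literal brace in prose before the JSON, the first balanced span is not valid JSON and JSON.parse throws, which corresponds to the "Invalid JSON response" scenario.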

Example Configuration

Extract structured order information from a vendor email:

{
  "model": "google/gemini-2.0-flash-exp:free",
  "input": "Subject: {{ $trigger.subject }}\n\n{{ $trigger.html_body }}",
  "system_prompt": "Extract order info from this vendor email. Be precise.",
  "schema_fields": [
    {
      "name": "email_type",
      "type": "enum",
      "enumValues": ["order_received", "confirmation", "shipped", "delivered", "other"],
      "required": true
    },
    { "name": "order_number", "type": "string" },
    { "name": "vendor", "type": "string" },
    { "name": "items", "type": "array" },
    { "name": "total", "type": "number" }
  ],
  "fallback_on_error": true
}

With this operation saved under the key extracted, downstream operations can reference:

| Variable | Value |
| --- | --- |
| {{ extracted.object.email_type }} | "order_received" |
| {{ extracted.object.order_number }} | "PO-2024-001" |
| {{ extracted.object.vendor }} | "Acme Textiles" |
| {{ extracted.object.items }} | Array of line items |
| {{ extracted.object.total }} | 1250.00 |
| {{ extracted.usage.totalTokens }} | 165 |
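
To illustrate how such references might resolve against the DataChain, here is a minimal sketch of a {{ path }} resolver. It is an assumption about the mechanism, not the engine's actual interpolation code: the names getPath and interpolate are hypothetical, and how the real engine renders missing values or objects is not documented here.

```typescript
// Hypothetical resolver; the flow engine's real interpolation may differ.
type DataChain = Record<string, unknown>;

// Walks a dotted path such as "extracted.object.vendor".
function getPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replaces each {{ path }} with its value from the data chain.
// Assumption: missing values render as "", objects and arrays as JSON.
function interpolate(template: string, dataChain: DataChain): string {
  return template.replace(/\{\{\s*([\w$.]+)\s*\}\}/g, (_match: string, path: string) => {
    const value = getPath(dataChain, path);
    if (value === undefined || value === null) return "";
    return typeof value === "object" ? JSON.stringify(value) : String(value);
  });
}

const dataChain: DataChain = {
  extracted: {
    object: { order_number: "PO-2024-001", vendor: "Acme Textiles", total: 1250.0 },
    usage: { promptTokens: 120, completionTokens: 45, totalTokens: 165 },
  },
};

const message = interpolate(
  "Order {{ extracted.object.order_number }} from {{ extracted.object.vendor }}",
  dataChain,
);
// message is "Order PO-2024-001 from Acme Textiles"
```

The character class in the pattern includes $ so that references such as {{ $trigger.subject }} are matched as well.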

DataChain Output Shape

// dataChain["extracted"]
{
  object: {
    email_type: "order_received",
    order_number: "PO-2024-001",
    vendor: "Acme Textiles",
    items: [...],
    total: 1250.00,
  },
  usage: {
    promptTokens: 120,
    completionTokens: 45,
    totalTokens: 165,
  }
}

Error Handling

| Scenario | With fallback_on_error: false | With fallback_on_error: true |
| --- | --- | --- |
| LLM API error | Flow fails, logs error | Flow continues, object: {} |
| Invalid JSON response | Flow fails, logs error | Flow continues, object: {} |
| Model not available | Flow fails, logs error | Flow continues, object: {} |
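
The behavior in this table can be sketched as a small wrapper around the extraction call. This is a hedged illustration, not the code in ai-extract.ts: withFallback, AiExtractResult, and the log parameter are hypothetical names, and logging the error in the fallback case as well is an assumption.

```typescript
// Hypothetical wrapper; names and logging behavior are assumptions.
interface Usage {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
}

interface AiExtractResult {
  object: Record<string, unknown>;
  usage: Usage;
}

async function withFallback(
  run: () => Promise<AiExtractResult>,
  fallbackOnError: boolean,
  log: (message: string) => void = console.error,
): Promise<AiExtractResult> {
  try {
    return await run();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    log(`ai_extract failed: ${message}`);
    // fallback_on_error: false -> rethrow so the flow fails.
    if (!fallbackOnError) throw err;
    // fallback_on_error: true -> empty result, the flow continues.
    return { object: {}, usage: {} };
  }
}
```

Because the fallback result has an empty object, a downstream operation that needs to distinguish "nothing extracted" from "extraction failed" would have to check for missing fields itself.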