Overview
How the Extract API request and response flow works.
The Extract API accepts a document plus a schema definition and returns structured JSON that matches your schema.
Request Shape
| Field | Type | Description |
|---|---|---|
| file | File | PDF or image upload sent as multipart/form-data. |
| schema | String | Inline JSON schema for the extraction result. |
| schema_id | UUID | Saved schema ID from your Struct PDF account. |
| Authorization | Header | Bearer token header: Bearer <API_KEY>. |
| X-API-Key | Header | Alternative API key header if you do not use Authorization. |
You can provide either schema or schema_id. Supported uploads include PDFs and common image formats such as PNG and JPEG.
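The following is a minimal sketch of how the fields above fit together in one request. It assumes Node 18+ or a browser, where FormData and Blob are globals. The helper name buildExtractRequest and its options are hypothetical. It treats schema and schema_id as mutually exclusive, which is our reading of "either ... or".

```javascript
// Hypothetical helper: builds the multipart body and auth headers for the Extract API.
// Assumes Node 18+ or a browser, where FormData and Blob are globals.
function buildExtractRequest({ file, filename, schema, schemaId, apiKey, useXApiKey = false }) {
  // Our reading of the docs: send exactly one of schema or schema_id.
  if ((schema === undefined) === (schemaId === undefined)) {
    throw new Error('Provide exactly one of schema or schemaId');
  }

  const body = new FormData();
  body.append('file', file, filename);
  if (schema !== undefined) {
    // Inline schemas travel as a JSON string in the "schema" form field.
    body.append('schema', typeof schema === 'string' ? schema : JSON.stringify(schema));
  } else {
    body.append('schema_id', schemaId);
  }

  // Authenticate with either the Authorization header or X-API-Key, not both.
  const headers = useXApiKey
    ? { 'X-API-Key': apiKey }
    : { Authorization: `Bearer ${apiKey}` };

  // Content-Type is left unset so fetch can add the multipart boundary itself.
  return { headers, body };
}
```

You would then spread the result into a call such as `fetch('https://api.structpdf.com/v1/extract', { method: 'POST', ...buildExtractRequest({ ... }) })`.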
Response Shape
| Field | Type | Description |
|---|---|---|
| generationId | UUID | Extraction request identifier. |
| success | String | Overall extraction status: Complete, Partial, or Fail. |
| result | Object | Structured JSON matching your schema. |
| metadata.findings | Array | Evidence and snippets for extracted fields. |
| metadata.errors | Array | Field-level extraction issues, if any. |
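If you want a quick runtime check before trusting a response body, a sketch like the one below tests only the top-level fields from the table. The function name isExtractResponse is hypothetical, the status list comes from the Error Handling section, and the check assumes every listed field is present except metadata.errors, which it allows to be missing when there are no issues. Entry shapes inside findings and errors are not validated.

```javascript
// Status values documented under Error Handling.
const EXTRACTION_STATUSES = ['Complete', 'Partial', 'Fail'];

// Hypothetical shape check for the documented top-level response fields.
function isExtractResponse(body) {
  if (body === null || typeof body !== 'object') return false;
  const { metadata } = body;
  return (
    typeof body.generationId === 'string' &&
    EXTRACTION_STATUSES.includes(body.success) &&
    typeof body.result === 'object' && // note: typeof null is also 'object'
    metadata !== null &&
    typeof metadata === 'object' &&
    Array.isArray(metadata.findings) &&
    // Assumption: errors may be omitted entirely when there are no issues.
    (metadata.errors === undefined || Array.isArray(metadata.errors))
  );
}
```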
Findings
The metadata.findings array follows the schema you provide. Each finding points back to a field in your requested output shape, so you can map extracted evidence directly to the same structure you expect in result.
Each finding's schema_key uses dot notation to describe where the evidence belongs:
| Pattern | Meaning | Example |
|---|---|---|
| flat field | top-level field in your result | total |
| nested object | field inside an object | address.city |
| array item | field inside a specific array item | items.0.name |
That means the evidence format stays predictable even when your schema contains nested objects or arrays.
| Requested shape | Example finding keys |
|---|---|
| customer.email | customer.email |
| items[].price | items.0.price, items.1.price |
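To work with these keys in code, you can expand a requested shape into the concrete keys it produces and resolve any key against result. This is a sketch: getBySchemaKey and expandSchemaKeys are hypothetical helpers, they assume individual property names never contain a dot, and support for nested arrays (for example orders[].items[].price) is our own extension of the "[]" notation in the table above.

```javascript
// Resolve a dot-notation schema_key such as "items.0.name" against result.
// Numeric segments index into arrays; missing paths yield undefined.
function getBySchemaKey(result, schemaKey) {
  return schemaKey.split('.').reduce(
    (value, segment) => (value == null ? undefined : value[segment]),
    result,
  );
}

// Expand a requested path like "items[].price" into the concrete keys
// ("items.0.price", "items.1.price", ...) that exist in a given result.
function expandSchemaKeys(requestedPath, result) {
  const segments = requestedPath.split('.');
  const keys = [];

  function walk(value, index, prefix) {
    if (index === segments.length) {
      keys.push(prefix.join('.'));
      return;
    }
    const segment = segments[index];
    if (segment.endsWith('[]')) {
      // Array segment: branch once per element, using its index as the key part.
      const name = segment.slice(0, -2);
      const items = value == null ? undefined : value[name];
      if (!Array.isArray(items)) return;
      items.forEach((item, i) => walk(item, index + 1, [...prefix, name, String(i)]));
    } else {
      walk(value == null ? undefined : value[segment], index + 1, [...prefix, segment]);
    }
  }

  walk(result, 0, []);
  return keys;
}
```

For example, expandSchemaKeys('items[].price', result) followed by getBySchemaKey(result, key) for each key gives you every extracted price along with the key its findings will use.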
This makes it easier to:
- connect extracted values back to UI fields
- show evidence next to the exact field a user cares about
- troubleshoot ambiguous extractions without guessing where a finding belongs
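For the UI case, indexing findings by schema_key lets each form field look up its own evidence. A minimal sketch, assuming each finding object carries a schema_key string. The name groupFindingsByKey is hypothetical, and any other properties on a finding are passed through untouched because their exact shape is not documented here.

```javascript
// Group findings by schema_key so a UI can fetch evidence for one field.
// Assumes each finding has a schema_key; other properties are kept as-is.
function groupFindingsByKey(findings) {
  const byKey = new Map();
  for (const finding of findings) {
    const list = byKey.get(finding.schema_key) ?? [];
    list.push(finding);
    byKey.set(finding.schema_key, list);
  }
  return byKey;
}
```

A field component rendering items.0.name would then read `byKey.get('items.0.name') ?? []`.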
Error Handling
The success field reports the overall extraction outcome:
| Status | Meaning |
|---|---|
| Complete | The requested fields were extracted without field-level issues. |
| Partial | The API returned a usable result, but one or more fields also produced issues in metadata.errors. |
| Fail | The extraction could not produce a usable result for the request. |
Partial is the most common outcome short of Complete. For example, if a document contains several plausible values for the same field, Struct PDF may still return a result while recording an error for that schema key in metadata.errors. You keep the successful parts of the extraction and can still surface what needs review.
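One way to branch on the three statuses is sketched below. The name handleExtraction and its return shape are hypothetical. It keeps result for Complete and Partial, passes metadata.errors through for review on Partial, and treats Fail as an exception.

```javascript
// Hypothetical handler for the three documented status values.
function handleExtraction(body) {
  switch (body.success) {
    case 'Complete':
      return { result: body.result, needsReview: [] };
    case 'Partial':
      // Keep the usable result, but surface field-level issues for review.
      return { result: body.result, needsReview: body.metadata?.errors ?? [] };
    case 'Fail':
      throw new Error(`Extraction ${body.generationId} failed`);
    default:
      throw new Error(`Unexpected status: ${body.success}`);
  }
}
```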
Use metadata.errors together with metadata.findings when you need to:
- detect fields that need manual review
- explain why a value was not returned cleanly
- handle ambiguous documents where several matches appear on the page
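To combine the two arrays, you can build a per-field review queue that pairs each errored schema key with the findings recorded for it. For an ambiguous field, that shows every competing match next to the error. This sketch assumes error entries expose a schema_key like findings do, which the description of Partial suggests but does not spell out. The name buildReviewQueue is hypothetical.

```javascript
// Pair every schema_key that has an error with the findings for that key.
// Assumes both error and finding entries carry a schema_key property.
function buildReviewQueue({ findings = [], errors = [] }) {
  const reviewKeys = new Set(errors.map((error) => error.schema_key));
  return [...reviewKeys].map((schemaKey) => ({
    schemaKey,
    errors: errors.filter((error) => error.schema_key === schemaKey),
    findings: findings.filter((finding) => finding.schema_key === schemaKey),
  }));
}
```

Call it with the response metadata, for example `buildReviewQueue(body.metadata)`.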
OpenAPI Schema and Tools
Struct PDF publishes a standard OpenAPI schema, which means you can plug the API into tools that understand Swagger and OpenAPI without hand-writing the full contract yourself. That includes API explorers, code generators, typed clients, and internal developer tooling.
The live API Reference includes the machine-readable OpenAPI document. For example, swagger-client can build a client directly from it and call the Extract endpoint:
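```javascript
import SwaggerClient from 'swagger-client';

// Build a client from the published OpenAPI document and attach the API key
// to every outgoing request.
const client = await SwaggerClient({
  url: 'https://api.structpdf.com/openapi.json',
  requestInterceptor: (request) => {
    request.headers.Authorization = `Bearer ${process.env.STRUCTPDF_API_KEY}`;
    return request;
  },
});

// Inline JSON schema for the fields to extract.
const schema = JSON.stringify({
  type: 'object',
  properties: {
    total: { type: 'number' },
    tax: { type: 'number' },
  },
});

// `file` is a File or Blob holding the receipt, e.g. from an <input type="file">.
// For OpenAPI 3 documents, swagger-client takes body fields through the
// `requestBody` option and encodes them as multipart/form-data per the spec.
// The tag (`default`) and operation ID (`extract`) come from the OpenAPI document.
const response = await client.apis.default.extract({}, { requestBody: { file, schema } });

console.log(response.body);
```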
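If your generated client exposes different names than client.apis.default.extract, you can list what the OpenAPI document actually declares. The sketch below walks the paths object and prints each operation as tag.operationId. Grouping untagged operations under "default" mirrors swagger-client's tags interface as we understand it, and swagger-client may normalize or generate operation IDs, so treat the output as a guide. The name listOperations is hypothetical.

```javascript
const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];

// List every operation in an OpenAPI document as "tag.operationId: METHOD /path".
// Operations without tags are reported under "default".
function listOperations(spec) {
  const operations = [];
  for (const [path, pathItem] of Object.entries(spec.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      const op = pathItem[method];
      if (!op) continue;
      const tags = op.tags?.length ? op.tags : ['default'];
      for (const tag of tags) {
        const id = op.operationId ?? '(no operationId)';
        operations.push(`${tag}.${id}: ${method.toUpperCase()} ${path}`);
      }
    }
  }
  return operations;
}
```

For example: `listOperations(await (await fetch('https://api.structpdf.com/openapi.json')).json())`.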
Zod-based Schema Format
Struct PDF works well with schemas that originate from Zod. That makes it easier to keep your extraction shape close to the validation rules you already use in your app, then convert that shape into the JSON schema sent to the API or managed through the Schema Builder.
You can define a schema in Zod, convert it to JSON schema, and send the result directly to the Extract API:
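```javascript
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Describe the receipt fields once in Zod.
const ReceiptSchema = z.object({
  guest_count: z.number(),
  tax: z.number(),
  total: z.number(),
  tip: z.number(),
  subtotal: z.number(),
});

// Convert to JSON schema. Omitting the name argument keeps the object schema
// at the top level instead of wrapping it in a $ref to "definitions".
const extractionSchema = zodToJsonSchema(ReceiptSchema);

// `file` is a File or Blob holding the receipt, e.g. from an <input type="file">
// element or fs.openAsBlob() in Node.
const formData = new FormData();
formData.append('file', file, 'receipt.pdf');
formData.append('schema', JSON.stringify(extractionSchema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    // fetch sets the multipart Content-Type (with boundary) automatically.
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

if (!response.ok) {
  throw new Error(`Extract API returned HTTP ${response.status}`);
}
console.log(await response.json());
```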