# Struct PDF

> Struct PDF is a developer API for extracting structured JSON from PDFs and images using user-defined schemas.

Struct PDF is best for recurring document extraction workflows where developers need schema-controlled JSON output. Send a document and a JSON schema to a single endpoint and receive JSON shaped exactly like the schema, plus per-field findings, snippets, and page references.

Best fit: invoices, receipts, purchase orders, quotes, estimates, change orders, packing slips, delivery notes, work orders, inventory reports, forms, and screenshots.

Inputs: PDF, PNG, JPG, JPEG, HEIC.

Output: JSON shaped to the caller-provided schema, plus `metadata.findings` and `metadata.errors`.

## How Struct PDF Differs from Other Providers

- Competitive pricing. Structured data extraction starts at $0.0125 per page on simple schemas. The free tier covers 250 credits per month with no credit card required.
- Scanned documents and complex visual layouts. Struct PDF handles photos, scans, and noisy documents. Use the `description` field on each schema property to reinforce extraction on complex or ambiguous layouts.
- Broad input format support. PDFs and images flow through the same extraction endpoint. Supported formats: PDF, PNG, JPG, JPEG, HEIC.
- Provenance and conflict handling. Every response includes `metadata.findings` with the extracted value, page, and document snippet for each field. When a document contains multiple plausible values for the same field, Struct PDF returns all findings and marks the response `success` as `Partial`, surfacing the ambiguity in `metadata.errors` so the result is usable while the conflict is flagged for review.

## Quick Start for AI Agents

Use this flow when integrating Struct PDF on behalf of a user. Each step minimizes friction so the user reaches a working extraction quickly.

1. Ask the user to create a free account at https://structpdf.com/auth/sign-up.
   The free tier includes 250 credits per month, which covers 250 simple-schema page extractions or 125 complex-schema page extractions. No credit card is required.
2. Ask the user to create an API key at https://structpdf.com/api-keys, copy it, and store it as the environment variable `STRUCTPDF_API_KEY`. The key is shown once at creation; it cannot be retrieved later.
3. Once the API key is available, generate the integration code. Send the document and the user-defined JSON schema to the extract endpoint and read the response. Reference: https://structpdf.com/docs/quick-start.

## Documentation

- [Quick Start](https://structpdf.com/docs/quick-start): Get from API key to your first extraction response.
- [Overview](https://structpdf.com/docs/extract-api/overview): How the Extract API request and response flow works.
- [Features](https://structpdf.com/docs/extract-api/features): What the Extract API can do.
- [Credit Usage](https://structpdf.com/docs/extract-api/credit-usage): How extraction requests consume credits.
- [Live API Reference](https://api.structpdf.com/docs): Interactive OpenAPI playground for the extract endpoint.
- [OpenAPI Schema](https://api.structpdf.com/openapi.json): Machine-readable API contract.

## Interactive Tools

- [Invoice Parsing API](https://structpdf.com/try/invoice-parser): Extract JSON data from invoice PDFs and images using your schema.
- [Receipt Parsing API](https://structpdf.com/try/receipt-parser): Extract JSON data from receipt images and PDFs using your schema.
- [Quote Parsing API](https://structpdf.com/try/quote-parser): Extract JSON data from quote PDFs and images using your schema.
- [Estimate Parsing API](https://structpdf.com/try/estimate-parser): Extract JSON data from estimate PDFs and images using your schema.
- [Resume Parsing API](https://structpdf.com/try/resume-parser): Extract JSON data from resume PDFs and images using your schema.

## Free Tools

- [Split PDF](https://structpdf.com/tools/split-pdf): Split a PDF into separate documents directly in your browser. Pick page boundaries visually, download results as a ZIP file. Free, no signup, no upload.

## FAQ

- What does the Extraction API return?
  Send a document or image plus a JSON schema and get back structured JSON shaped exactly like your schema, alongside per-field `findings`, snippets, and page references that explain where each value came from. [Learn more in the Extract API overview](https://structpdf.com/docs/extract-api/overview).
- Which file formats are supported?
  PDFs and common image formats all flow through the same extraction endpoint, so scans, mobile photos, screenshots, and digital documents can be ingested in one call.

  | Format | Typical source |
  | --- | --- |
  | PDF | Documents, exports, reports |
  | PNG | Screenshots, exported images |
  | JPG / JPEG | Scans, camera photos |
  | HEIC | iPhone camera uploads |
- How is usage priced?
  Extractions are billed per page in credits. There is no separate parse charge before extraction; the credit you see is the credit you pay.

  | Extraction | Credit usage |
  | --- | --- |
  | Simple | 1 credit per page |
  | Complex | 2 credits per page |

  [Learn more about credit usage](https://structpdf.com/docs/extract-api/credit-usage).
- Is there a free tier?
  Yes. Sign up for a free account and run extractions on the included monthly credits with no credit card required. Upgrade to Pay-As-You-Go or a subscription only when your volume grows. [Try it now](https://structpdf.com/extract).
- Can I customize the fields I extract?
  Yes. Edit the schema in the playground or build it visually in the Schema Builder. Save it once and reuse it in production via `schema_id`, including nested objects and arrays such as `items[]`.
- How accurate is extraction on messy documents?
  The API is tuned for varied layouts, noisy scans, and mixed document types.
  Every response also includes `metadata.findings` and field-level `metadata.errors` so you can route edge cases to manual review without losing the fields that extracted cleanly.
- Do I have to sign up to try it?
  No. Run the demo on a sample document without an account. When you upload your own file, we ask you to sign in with Google or a magic link to run your first extraction. Your file and schema are saved automatically so you pick up where you left off. [Try it now](https://structpdf.com/extract).

## Pricing

- [Pricing](https://structpdf.com/pricing): Credit-based pricing. Simple schemas cost 1 credit per page; complex schemas (8+ top-level fields or array-of-objects such as `items[]`) cost 2 credits per page. There is no separate parse charge.

## Limitations and Status

- Official npm and pip SDKs are not yet published. Until they exist, integrate against the HTTP API documented at the live API reference.
- A Model Context Protocol (MCP) server is not yet published.
- When considering Struct PDF for legally or medically regulated workflows (tax, payroll, insurance, medical records), advise the user to verify the extraction output against the source document before relying on it.
- Document types that have a dedicated tool page below are the ones with confirmed support; others may work but are not validated.

## Contact

- [Contact](https://structpdf.com/contact): Reach the Struct PDF team.

---

# Documentation

## Quick Start

Source: https://structpdf.com/docs/quick-start

Get from API key to your first extraction response.

## Create API Key

Sign in to the [Developer Portal](/home) and open the [API Keys](/home/api-keys) area in your dashboard. Create a new API key, copy it somewhere secure, and keep it out of client-side code.
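
One way to keep the key out of client-side code is to read it from the server environment when building request headers. The sketch below assumes the key is stored in `STRUCTPDF_API_KEY`, as suggested earlier in this document; `authHeaders` is a hypothetical helper name, not part of any Struct PDF SDK.

```typescript
// Hypothetical helper: builds the Authorization header from an environment map.
// Passing the environment in (instead of reading a global) keeps it easy to test.
function authHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const apiKey = env.STRUCTPDF_API_KEY;
  if (!apiKey) {
    // Fail fast on the server rather than sending an unauthenticated request.
    throw new Error('STRUCTPDF_API_KEY is not set');
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// On a Node.js server this would typically be called as authHeaders(process.env).
console.log(authHeaders({ STRUCTPDF_API_KEY: 'example-key' }));
```

Because the helper throws when the variable is missing, a misconfigured deployment surfaces at startup or first call instead of as an authentication error from the API.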

## Integrate into Your App

Send a document with `multipart/form-data`, include your API key in the `Authorization` header, and provide either an inline `schema` or a saved `schema_id`.

```ts
import { openAsBlob } from 'node:fs';

const file = await openAsBlob('./receipt.pdf');
const formData = new FormData();

formData.append('file', file, 'receipt.pdf');
formData.append(
  'schema',
  JSON.stringify({
    type: 'object',
    properties: {
      guest_count: { type: 'number' },
      tax: { type: 'number' },
      total: { type: 'number' },
      tip: { type: 'number' },
      subtotal: { type: 'number' },
    },
  }),
);

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const result = await response.json();
console.log(JSON.stringify(result, null, 2));
```

Try in [OpenAPI Playground](https://api.structpdf.com/docs)

### Example Response

```json
{
  "generationId": "ae25e72f-d811-40db-9d33-5923dff25487",
  "success": "Complete",
  "result": {
    "guest_count": 4,
    "tax": 17.74,
    "total": 239.78,
    "tip": 40.04,
    "subtotal": 182
  },
  "metadata": {
    "findings": [
      { "schema_key": "guest_count", "value": 4, "page": 1, "document_snippet": "Guest Count: 4" },
      { "schema_key": "tax", "value": 17.74, "page": 1, "document_snippet": "Tax .. $17.74" },
      { "schema_key": "total", "value": 239.78, "page": 1, "document_snippet": "Total .. $239.78" },
      { "schema_key": "tip", "value": 40.04, "page": 1, "document_snippet": "Tip .. $40.04" },
      { "schema_key": "subtotal", "value": 182, "page": 1, "document_snippet": "Subtotal .. $182.00" }
    ],
    "errors": []
  }
}
```

For interactive testing, more code examples, and the full request contract, use the [API Reference](https://api.structpdf.com/docs).

> Next: [API Overview](/docs/extract-api/overview): Review the request and response shape.
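
The example response above carries an overall `success` status alongside `metadata.errors`. A minimal sketch of routing a response based on those two fields might look like the following. The `ExtractResponse` type is inferred from the example response rather than taken from an official SDK, the entries of `errors` are kept opaque because their exact shape is not shown here, and `triage` with its `accept`/`review`/`reject` labels is a hypothetical helper.

```typescript
type ExtractStatus = 'Complete' | 'Partial' | 'Fail';

// Shape inferred from the example response; confirm against the OpenAPI schema.
interface ExtractResponse {
  generationId: string;
  success: ExtractStatus;
  result: Record<string, unknown>;
  metadata: {
    findings: Array<{ schema_key: string; value: unknown; page: number; document_snippet: string }>;
    errors: unknown[]; // entry shape is not documented in this section, so it stays opaque
  };
}

type Triage = 'accept' | 'review' | 'reject';

// Hypothetical routing rule: reject failures, send anything with issues to review.
function triage(response: ExtractResponse): Triage {
  if (response.success === 'Fail') return 'reject';
  if (response.success === 'Partial' || response.metadata.errors.length > 0) return 'review';
  return 'accept';
}

const complete: ExtractResponse = {
  generationId: 'ae25e72f-d811-40db-9d33-5923dff25487',
  success: 'Complete',
  result: { total: 239.78 },
  metadata: {
    findings: [{ schema_key: 'total', value: 239.78, page: 1, document_snippet: 'Total .. $239.78' }],
    errors: [],
  },
};

console.log(triage(complete)); // accept
```

Keeping the decision in one small function makes it straightforward to change the policy later, for example to accept `Partial` responses when only optional fields are affected.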

## Overview

Source: https://structpdf.com/docs/extract-api/overview

How the Extract API request and response flow works.

The Extract API accepts a document plus a schema definition and returns structured JSON that matches your schema.

### Request Shape

| Field | Type | Description |
| --- | --- | --- |
| `file` | File | PDF or image upload sent as `multipart/form-data`. |
| `schema` | String | Inline JSON schema for the extraction result. |
| `schema_id` | UUID | Saved schema ID from your Struct PDF account. |
| `Authorization` | Header | Bearer token header: `Bearer <your API key>`. |
| `X-API-Key` | Header | Alternative API key header if you do not use `Authorization`. |

You can provide either `schema` or `schema_id`. Supported uploads include PDFs and common image formats such as PNG and JPEG.

### Response Shape

| Field | Type | Description |
| --- | --- | --- |
| `generationId` | UUID | Extraction request identifier. |
| `success` | String | Overall extraction status. |
| `result` | Object | Structured JSON matching your schema. |
| `metadata.findings` | Array | Evidence and snippets for extracted fields. |
| `metadata.errors` | Array | Field-level extraction issues, if any. |

## Findings

The `metadata.findings` array follows the schema you provide. Each finding points back to a field in your requested output shape, so you can map extracted evidence directly to the same structure you expect in `result`.

Struct PDF uses dot notation in `schema_key` to describe where the evidence belongs:

| Pattern | Meaning | Example |
| --- | --- | --- |
| flat field | top-level field in your result | `total` |
| nested object | field inside an object | `address.city` |
| array item | field inside a specific array item | `items.0.name` |

That means the evidence format stays predictable even when your schema contains nested objects or arrays.
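
Because `schema_key` uses dot notation, a finding can be matched to its value in `result` by walking the path one segment at a time. The sketch below is one way to do that; `getByPath` is a hypothetical helper, the sample `result` object is made up for illustration, and the approach assumes field names do not themselves contain dots.

```typescript
// Resolve a finding's dot-notation `schema_key` against the `result` object.
function getByPath(root: unknown, schemaKey: string): unknown {
  let current: unknown = root;
  for (const segment of schemaKey.split('.')) {
    if (current === null || typeof current !== 'object') return undefined;
    // Arrays are objects too, so numeric segments such as "0" index into them.
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Illustrative data only; not taken from a real extraction.
const result = {
  total: 42.5,
  address: { city: 'Austin' },
  items: [{ name: 'Widget' }, { name: 'Gadget' }],
};

console.log(getByPath(result, 'total')); // 42.5
console.log(getByPath(result, 'address.city')); // Austin
console.log(getByPath(result, 'items.1.name')); // Gadget
```

A path that does not exist resolves to `undefined` rather than throwing, which is convenient when rendering evidence next to optional fields.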

| Requested shape | Example finding keys |
| --- | --- |
| `customer.email` | `customer.email` |
| `items[].price` | `items.0.price`, `items.1.price` |

This makes it easier to:

- connect extracted values back to UI fields
- show evidence next to the exact field a user cares about
- troubleshoot ambiguous extractions without guessing where a finding belongs

## Error Handling

The `success` field reports the overall extraction outcome:

| Status | Meaning |
| --- | --- |
| Complete | The requested fields were extracted without field-level issues. |
| Partial | The API returned a usable result, but one or more fields also produced issues in `metadata.errors`. |
| Fail | The extraction could not produce a usable result for the request. |

Partial is the most common non-terminal state. For example, if a document contains multiple plausible values for the same field, Struct PDF may still return a result while recording an error for that schema key. That lets you keep the successful parts of the extraction while also surfacing what needs review.

> Example: if a document contains several different values that could all map to the same field, the extraction may return Partial so you still receive the usable output together with the ambiguity in `metadata.errors`.

Use `metadata.errors` together with `metadata.findings` when you need to:

- detect fields that need manual review
- explain why a value was not returned cleanly
- handle ambiguous documents where several matches appear on the page

## OpenAPI Schema and Tools

Struct PDF publishes a standard [OpenAPI](https://www.openapis.org/) schema, which means you can plug the API into tools that understand [Swagger](https://swagger.io/) and OpenAPI without hand-writing the full contract yourself. That includes API explorers, code generators, typed clients, and internal developer tooling.

The OpenAPI document is available in the live [API Reference](https://api.structpdf.com/docs), alongside the machine-readable schema.

```ts
import { openAsBlob } from 'node:fs';
import SwaggerClient from 'swagger-client';

const client = await SwaggerClient({
  url: 'https://api.structpdf.com/openapi.json',
  requestInterceptor: (request) => {
    request.headers.Authorization = `Bearer ${process.env.STRUCTPDF_API_KEY}`;
    return request;
  },
});

// Load the document to upload, as in the Quick Start example.
const file = await openAsBlob('./receipt.pdf');

const formData = new FormData();
formData.append('file', file, 'receipt.pdf');
formData.append(
  'schema',
  JSON.stringify({
    type: 'object',
    properties: {
      total: { type: 'number' },
      tax: { type: 'number' },
    },
  }),
);

// The tag (`default`) and operation name (`extract`) come from the OpenAPI
// document; confirm both in the live API Reference before relying on them.
const response = await client.apis.default.extract({
  file: formData.get('file'),
  schema: formData.get('schema'),
});

console.log(response.body);
```

## Zod-based Schema Format

Struct PDF works well with schemas that originate from [Zod](https://zod.dev/). That makes it easier to keep your extraction shape close to the validation rules you already use in your app, then convert that shape into the JSON schema sent to the API or managed through the Schema Builder.

You can define a schema in Zod, convert it to JSON schema, and send the result directly to the Extract API:

```ts
import { openAsBlob } from 'node:fs';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const ReceiptSchema = z.object({
  guest_count: z.number(),
  tax: z.number(),
  total: z.number(),
  tip: z.number(),
  subtotal: z.number(),
});

const extractionSchema = zodToJsonSchema(ReceiptSchema, 'ReceiptSchema');

// Load the document to upload, as in the Quick Start example.
const file = await openAsBlob('./receipt.pdf');

const formData = new FormData();
formData.append('file', file, 'receipt.pdf');
formData.append('schema', JSON.stringify(extractionSchema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

console.log(await response.json());
```

> Next: [Live API Reference](https://api.structpdf.com/docs): Interactive testing, complete request details, and more language examples.

## Features

Source: https://structpdf.com/docs/extract-api/features

What the Extract API can do.

The Extract API is designed for document workflows that need structured output, clear evidence, and integration-friendly tooling.

## Format

Struct PDF supports both PDFs and common image formats for extraction workflows: PDF, PNG, JPG, JPEG, and HEIC.

That means you can use the same extraction flow for:

- standard PDF documents
- camera photos and screenshots
- scanned forms and receipts
- mobile uploads in HEIC format

## Schema Builder

Struct PDF includes a visual Schema Builder that helps you create extraction schemas without writing the full JSON structure by hand. It is useful when you want to prototype a schema quickly, share it with teammates, or keep saved schemas ready for repeated extractions.

You can use the [Schema Builder](/schema) to:

- create schemas visually
- save schemas to your account
- reuse saved schemas through `schema_id`
- iterate on extraction shapes without rewriting the whole request payload

Saved schemas are especially useful when the same document type appears again and again in your workflow.

## OpenAPI

Struct PDF publishes a standard [OpenAPI](https://www.openapis.org/) schema, which makes the Extract API easy to connect with tooling that already understands [Swagger](https://swagger.io/), client generation, and API exploration.

That means you can:

- inspect the full API contract in the live [API Reference](https://api.structpdf.com/docs)
- generate or validate typed clients
- integrate the schema into internal developer tooling

```ts
import SwaggerClient from 'swagger-client';

const client = await SwaggerClient({
  url: 'https://api.structpdf.com/openapi.json',
});

console.log(client.spec.paths['/v1/extract']);
```

## Zod

Struct PDF also works well with schemas that start in [Zod](https://zod.dev/). That makes it easier to keep your extraction shape close to the same schema definitions you already use for validation inside your application.

You can define your shape in Zod, convert it to JSON schema, and send it directly to the Extract API:

```ts
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const ReceiptSchema = z.object({
  total: z.number(),
  tax: z.number(),
});

const extractionSchema = zodToJsonSchema(ReceiptSchema, 'ReceiptSchema');

console.log(JSON.stringify(extractionSchema, null, 2));
```

> Next: [Live API Reference](https://api.structpdf.com/docs): Explore the full contract, request examples, and interactive testing.

## Credit Usage

Source: https://structpdf.com/docs/extract-api/credit-usage

How extraction requests consume credits.

Struct PDF charges credits for each extraction request based on its page count and schema type, with a pricing model designed to stay easy to predict.

> Simple `1 credit/page` • Complex `2 credits/page` • No parse step

## How Struct PDF Differs

| Workflow | Credits |
| --- | --- |
| Struct PDF simple extraction | `1 credit` per page |
| Struct PDF complex extraction | `2 credits` per page |
| Separate parse step required | No |

Unlike parse-first workflows, Struct PDF does not require a separate parsing request before extraction. You send the file and schema once, and credits are computed on the extraction itself.

## How Credit Usage Is Computed

| Schema type | Credit usage |
| --- | --- |
| simple schema | `1 credit` per page |
| complex schema | `2 credits` per page |

A schema is treated as complex when either of these is true:

- it has `8` or more top-level fields
- it includes an array of objects such as `items[]`

Otherwise, it is billed as a simple extraction.

## Examples

| Request | Credit usage |
| --- | --- |
| 1-page receipt, 5 top-level fields | `1 credit` |
| 3-page form, 6 top-level fields | `3 credits` |
| 2-page invoice, 10 top-level fields | `4 credits` |
| 2-page receipt with `items[]` line items | `4 credits` |

You can think of it as:

> `credits = page count × schema rate`

Where the schema rate is `1` for simple schemas and `2` for complex schemas.
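
The rule above can be sketched as a small client-side estimator, useful for budgeting before a request is sent. This is an illustration of the documented formula, not the billing implementation: `JsonSchema`, `isComplexSchema`, `estimateCredits`, and `flatSchema` are hypothetical names, and treating an array of objects nested anywhere in the schema as complex is an assumption, since the docs only say the schema "includes" one.

```typescript
// Minimal JSON schema shape needed for the estimate (assumed, not official).
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
}

// True when the schema contains an array whose items are objects, at any depth.
// Checking nested levels is an assumption about how the rule is applied.
function hasArrayOfObjects(schema: JsonSchema): boolean {
  if (schema.type === 'array' && schema.items?.type === 'object') return true;
  const props = schema.properties ?? {};
  const children = Object.keys(props).map((key) => props[key]);
  if (schema.items) children.push(schema.items);
  return children.some((child) => hasArrayOfObjects(child));
}

// Complex: 8 or more top-level fields, or an array of objects such as items[].
function isComplexSchema(schema: JsonSchema): boolean {
  const topLevelFields = Object.keys(schema.properties ?? {}).length;
  return topLevelFields >= 8 || hasArrayOfObjects(schema);
}

// credits = page count × schema rate (1 for simple, 2 for complex).
function estimateCredits(pageCount: number, schema: JsonSchema): number {
  return pageCount * (isComplexSchema(schema) ? 2 : 1);
}

// Builds a flat schema with the given number of top-level string fields.
function flatSchema(fieldCount: number): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  for (let i = 0; i < fieldCount; i++) properties[`field_${i}`] = { type: 'string' };
  return { type: 'object', properties };
}

const receiptWithItems: JsonSchema = {
  type: 'object',
  properties: {
    total: { type: 'number' },
    items: { type: 'array', items: { type: 'object', properties: { name: { type: 'string' } } } },
  },
};

console.log(estimateCredits(1, flatSchema(5))); // 1
console.log(estimateCredits(3, flatSchema(6))); // 3
console.log(estimateCredits(2, flatSchema(10))); // 4
console.log(estimateCredits(2, receiptWithItems)); // 4
```

The four calls mirror the rows of the Examples table. Actual charges are determined by the API, so treat the estimate as a planning aid only.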

## Practical Advantage

Struct PDF keeps credit usage easy to understand:

- there is no separate parse charge before extraction
- simple documents cost just `1 credit` per page
- more complex schemas move to `2 credits` per page, with no additional pricing tiers

For the most predictable usage, keep your schema focused on the fields you actually need and avoid adding extra top-level fields unless they provide real value.

---

# Interactive Tools: Schemas and Examples

## Invoice Parsing API

Source: https://structpdf.com/try/invoice-parser

Extract JSON data from invoice PDFs and images using your schema.

### Starter Schema

```json
{
  "type": "object",
  "properties": {
    "vendor": { "type": "string", "description": "Vendor or issuing company name" },
    "customer": { "type": "string", "description": "Customer or billed company name" },
    "invoice_number": { "type": "string", "description": "Invoice number or external reference" },
    "invoice_date": { "type": "string", "description": "Invoice issue date" },
    "due_date": { "type": "string", "description": "Invoice due date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Total amount due" }
  },
  "required": ["vendor", "customer", "invoice_number", "invoice_date", "due_date", "subtotal", "tax", "total"]
}
```

### Sample File

- https://structpdf.com/wedge-tools/invoice-parser/sample-invoice.pdf (application/pdf)

### Code Examples

#### typescript

```typescript
import { readFileSync } from 'node:fs';

const schema = {
  "type": "object",
  "properties": {
    "vendor": { "type": "string", "description": "Vendor or issuing company name" },
    "customer": { "type": "string", "description": "Customer or billed company name" },
    "invoice_number": { "type": "string", "description": "Invoice number or external reference" },
    "invoice_date": { "type": "string", "description": "Invoice issue date" },
    "due_date": { "type": "string", "description": "Invoice due date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Total amount due" }
  },
  "required": ["vendor", "customer", "invoice_number", "invoice_date", "due_date", "subtotal", "tax", "total"]
};

const file = new File([readFileSync('sample-invoice.pdf')], 'sample-invoice.pdf', {
  type: 'application/pdf',
});

const formData = new FormData();
formData.set('file', file);
formData.set('schema', JSON.stringify(schema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const data = await response.json();
console.log(data);
```

#### javascript

```javascript
const fs = require('node:fs');

const schema = {
  "type": "object",
  "properties": {
    "vendor": { "type": "string", "description": "Vendor or issuing company name" },
    "customer": { "type": "string", "description": "Customer or billed company name" },
    "invoice_number": { "type": "string", "description": "Invoice number or external reference" },
    "invoice_date": { "type": "string", "description": "Invoice issue date" },
    "due_date": { "type": "string", "description": "Invoice due date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Total amount due" }
  },
  "required": ["vendor", "customer", "invoice_number", "invoice_date", "due_date", "subtotal", "tax", "total"]
};

// CommonJS modules cannot use top-level await, so the request runs inside an async function.
async function main() {
  const file = new File([fs.readFileSync('sample-invoice.pdf')], 'sample-invoice.pdf', {
    type: 'application/pdf',
  });

  const formData = new FormData();
  formData.set('file', file);
  formData.set('schema', JSON.stringify(schema));

  const response = await fetch('https://api.structpdf.com/v1/extract', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
    },
    body: formData,
  });

  const data = await response.json();
  console.log(data);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

#### python

```python
import json
import os

import requests

schema = {
    "type": "object",
    "properties": {
        "vendor": { "type": "string", "description": "Vendor or issuing company name" },
        "customer": { "type": "string", "description": "Customer or billed company name" },
        "invoice_number": { "type": "string", "description": "Invoice number or external reference" },
        "invoice_date": { "type": "string", "description": "Invoice issue date" },
        "due_date": { "type": "string", "description": "Invoice due date" },
        "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
        "tax": { "type": "number", "description": "Tax amount" },
        "total": { "type": "number", "description": "Total amount due" }
    },
    "required": ["vendor", "customer", "invoice_number", "invoice_date", "due_date", "subtotal", "tax", "total"]
}

with open("sample-invoice.pdf", "rb") as fh:
    response = requests.post(
        "https://api.structpdf.com/v1/extract",
        headers={"Authorization": f"Bearer {os.environ['STRUCTPDF_API_KEY']}"},
        files={"file": ("sample-invoice.pdf", fh, "application/pdf")},
        data={"schema": json.dumps(schema)},
    )

response.raise_for_status()
print(response.json())
```

#### java

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    String schema = "{\"type\":\"object\",\"properties\":{\"vendor\":{\"type\":\"string\",\"description\":\"Vendor or issuing company name\"},\"customer\":{\"type\":\"string\",\"description\":\"Customer or billed company name\"},\"invoice_number\":{\"type\":\"string\",\"description\":\"Invoice number or external reference\"},\"invoice_date\":{\"type\":\"string\",\"description\":\"Invoice issue date\"},\"due_date\":{\"type\":\"string\",\"description\":\"Invoice due date\"},\"subtotal\":{\"type\":\"number\",\"description\":\"Subtotal before tax and fees\"},\"tax\":{\"type\":\"number\",\"description\":\"Tax amount\"},\"total\":{\"type\":\"number\",\"description\":\"Total amount due\"}},\"required\":[\"vendor\",\"customer\",\"invoice_number\",\"invoice_date\",\"due_date\",\"subtotal\",\"tax\",\"total\"]}";
    Path file = Path.of("sample-invoice.pdf");
    String boundary = UUID.randomUUID().toString();
    String CRLF = "\r\n";

    // Build the multipart/form-data body by hand: schema part, then file part.
    var body = new java.io.ByteArrayOutputStream();
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"schema\"" + CRLF + CRLF
        + schema + CRLF).getBytes());
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"sample-invoice.pdf\"" + CRLF
        + "Content-Type: application/pdf" + CRLF + CRLF).getBytes());
    body.writeBytes(Files.readAllBytes(file));
    body.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.structpdf.com/v1/extract"))
        .header("Authorization", "Bearer " + System.getenv("STRUCTPDF_API_KEY"))
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

#### go

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/textproto"
	"os"
)

func main() {
	schema := `{"type":"object","properties":{"vendor":{"type":"string","description":"Vendor or issuing company name"},"customer":{"type":"string","description":"Customer or billed company name"},"invoice_number":{"type":"string","description":"Invoice number or external reference"},"invoice_date":{"type":"string","description":"Invoice issue date"},"due_date":{"type":"string","description":"Invoice due date"},"subtotal":{"type":"number","description":"Subtotal before tax and fees"},"tax":{"type":"number","description":"Tax amount"},"total":{"type":"number","description":"Total amount due"}},"required":["vendor","customer","invoice_number","invoice_date","due_date","subtotal","tax","total"]}`

	file, err := os.Open("sample-invoice.pdf")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	var body bytes.Buffer
	writer := multipart.NewWriter(&body)
	_ = writer.WriteField("schema", schema)

	// Create the file part with an explicit content type.
	header := make(textproto.MIMEHeader)
	header.Set("Content-Disposition", "form-data; name=\"file\"; filename=\"sample-invoice.pdf\"")
	header.Set("Content-Type", "application/pdf")
	part, err := writer.CreatePart(header)
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	writer.Close()

	req, err := http.NewRequest("POST", "https://api.structpdf.com/v1/extract", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("STRUCTPDF_API_KEY"))
	req.Header.Set("Content-Type", writer.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```

## Receipt Parsing API

Source: https://structpdf.com/try/receipt-parser

Extract JSON data from receipt images and PDFs using your schema.

### Starter Schema

```json
{
  "type": "object",
  "properties": {
    "restaurant_name": { "type": "string" },
    "order_type": { "type": "string", "description": "Pickup or dine in" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": "number" },
          "price_per_item": { "type": "number", "description": "Price per each item, for example \"$5.00 each\"" },
          "additional_charge": { "type": "number", "description": "Added in ()" },
          "total_per_item": { "type": "number" }
        },
        "required": ["name", "quantity", "price_per_item", "additional_charge", "total_per_item"]
      }
    },
    "subtotal": { "type": "number" },
    "sales_tax": { "type": "number" },
    "total": { "type": "number" },
    "cash": { "type": "number" },
    "change": { "type": "number" }
  },
  "required": ["restaurant_name", "order_type", "items", "subtotal", "sales_tax", "total", "cash", "change"]
}
```

### Sample File

- https://structpdf.com/wedge-tools/receipt-parser/sample-receipt.jpg (image/jpeg)

### Code Examples

#### typescript

```typescript
import { readFileSync } from 'node:fs';

const schema = {
  "type": "object",
  "properties": {
    "restaurant_name": { "type": "string" },
    "order_type": { "type": "string", "description": "Pickup or dine in" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": "number" },
          "price_per_item": { "type": "number", "description": "Price per each item, for example \"$5.00 each\"" },
          "additional_charge": { "type": "number", "description": "Added in ()" },
          "total_per_item": { "type": "number" }
        },
        "required": ["name", "quantity", "price_per_item", "additional_charge", "total_per_item"]
      }
    },
    "subtotal": { "type": "number" },
    "sales_tax": { "type": "number" },
    "total": { "type": "number" },
    "cash": { "type": "number" },
    "change": { "type": "number" }
  },
  "required": ["restaurant_name", "order_type", "items", "subtotal", "sales_tax", "total", "cash", "change"]
};

const file = new File([readFileSync('sample-receipt.jpg')], 'sample-receipt.jpg', {
  type: 'image/jpeg',
});

const formData = new FormData();
formData.set('file', file);
formData.set('schema', JSON.stringify(schema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const data = await response.json();
console.log(data);
```

#### javascript

```javascript
const fs = require('node:fs');

const schema = {
  "type": "object",
  "properties": {
    "restaurant_name": { "type": "string" },
    "order_type": { "type": "string", "description": "Pickup or dine in" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": "number" },
          "price_per_item": { "type": "number", "description": "Price per each item, for example \"$5.00 each\"" },
          "additional_charge": { "type": "number", "description": "Added in ()" },
          "total_per_item": { "type": "number" }
        },
        "required": ["name", "quantity", "price_per_item", "additional_charge", "total_per_item"]
      }
    },
    "subtotal": { "type": "number" },
    "sales_tax": { "type": "number" },
    "total": { "type": "number" },
    "cash": { "type": "number" },
    "change": { "type": "number" }
  },
  "required": ["restaurant_name", "order_type", "items", "subtotal", "sales_tax", "total", "cash", "change"]
};

// CommonJS modules cannot use top-level await, so the request runs inside an async function.
async function main() {
  const file = new File([fs.readFileSync('sample-receipt.jpg')], 'sample-receipt.jpg', {
    type: 'image/jpeg',
  });

  const formData = new FormData();
  formData.set('file', file);
  formData.set('schema', JSON.stringify(schema));

  const response = await fetch('https://api.structpdf.com/v1/extract', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
    },
    body: formData,
  });

  const data = await response.json();
  console.log(data);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

#### python

```python
import json
import os

import requests

schema = {
    "type": "object",
    "properties": {
        "restaurant_name": { "type": "string" },
        "order_type": { "type": "string", "description": "Pickup or dine in" },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": { "type": "string" },
                    "quantity": { "type": "number" },
                    "price_per_item": { "type": "number", "description": "Price per each item, for example \"$5.00 each\"" },
                    "additional_charge": { "type": "number", "description": "Added in ()" },
                    "total_per_item": { "type": "number" }
                },
                "required": ["name", "quantity", "price_per_item", "additional_charge", "total_per_item"]
            }
        },
        "subtotal": { "type": "number" },
        "sales_tax": { "type": "number" },
        "total": { "type": "number" },
        "cash": { "type": "number" },
        "change": { "type": "number" }
    },
    "required": ["restaurant_name", "order_type", "items", "subtotal", "sales_tax", "total", "cash", "change"]
}

with open("sample-receipt.jpg", "rb") as fh:
    response = requests.post(
        "https://api.structpdf.com/v1/extract",
        headers={"Authorization": f"Bearer {os.environ['STRUCTPDF_API_KEY']}"},
        files={"file": ("sample-receipt.jpg", fh, "image/jpeg")},
        data={"schema": json.dumps(schema)},
    )

response.raise_for_status()
print(response.json())
```

#### java

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    // Quotes inside the JSON description are escaped twice: once for JSON, once for Java.
    String schema = "{\"type\":\"object\",\"properties\":{\"restaurant_name\":{\"type\":\"string\"},\"order_type\":{\"type\":\"string\",\"description\":\"Pickup or dine in\"},\"items\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"quantity\":{\"type\":\"number\"},\"price_per_item\":{\"type\":\"number\",\"description\":\"Price per each item, for example \\\"$5.00 each\\\"\"},\"additional_charge\":{\"type\":\"number\",\"description\":\"Added in ()\"},\"total_per_item\":{\"type\":\"number\"}},\"required\":[\"name\",\"quantity\",\"price_per_item\",\"additional_charge\",\"total_per_item\"]}},\"subtotal\":{\"type\":\"number\"},\"sales_tax\":{\"type\":\"number\"},\"total\":{\"type\":\"number\"},\"cash\":{\"type\":\"number\"},\"change\":{\"type\":\"number\"}},\"required\":[\"restaurant_name\",\"order_type\",\"items\",\"subtotal\",\"sales_tax\",\"total\",\"cash\",\"change\"]}";
    Path file = Path.of("sample-receipt.jpg");
    String boundary = UUID.randomUUID().toString();
    String CRLF = "\r\n";

    // Build the multipart/form-data body by hand: schema part, then file part.
    var body = new java.io.ByteArrayOutputStream();
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"schema\"" + CRLF + CRLF
        + schema + CRLF).getBytes());
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"sample-receipt.jpg\"" + CRLF
        + "Content-Type: image/jpeg" + CRLF + CRLF).getBytes());
    body.writeBytes(Files.readAllBytes(file));
    body.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.structpdf.com/v1/extract"))
        .header("Authorization", "Bearer " + System.getenv("STRUCTPDF_API_KEY"))
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

#### go

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/textproto"
	"os"
)

func main() {
	schema := `{"type":"object","properties":{"restaurant_name":{"type":"string"},"order_type":{"type":"string","description":"Pickup or dine in"},"items":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"quantity":{"type":"number"},"price_per_item":{"type":"number","description":"Price per each item, for example \"$5.00 
each\""},"additional_charge":{"type":"number","description":"Added in ()"},"total_per_item":{"type":"number"}},"required":["name","quantity","price_per_item","additional_charge","total_per_item"]}},"subtotal":{"type":"number"},"sales_tax":{"type":"number"},"total":{"type":"number"},"cash":{"type":"number"},"change":{"type":"number"}},"required":["restaurant_name","order_type","items","subtotal","sales_tax","total","cash","change"]}` file, err := os.Open("sample-receipt.jpg") if err != nil { panic(err) } defer file.Close() var body bytes.Buffer writer := multipart.NewWriter(&body) _ = writer.WriteField("schema", schema) header := make(textproto.MIMEHeader) header.Set("Content-Disposition", "form-data; name=\"file\"; filename=\"sample-receipt.jpg\"") header.Set("Content-Type", "image/jpeg") part, err := writer.CreatePart(header) if err != nil { panic(err) } if _, err := io.Copy(part, file); err != nil { panic(err) } writer.Close() req, err := http.NewRequest("POST", "https://api.structpdf.com/v1/extract", &body) if err != nil { panic(err) } req.Header.Set("Authorization", "Bearer "+os.Getenv("STRUCTPDF_API_KEY")) req.Header.Set("Content-Type", writer.FormDataContentType()) resp, err := http.DefaultClient.Do(req) if err != nil { panic(err) } defer resp.Body.Close() out, _ := io.ReadAll(resp.Body) fmt.Println(string(out)) } ``` ## Quote Parsing API Source: https://structpdf.com/try/quote-parser Extract JSON data from quote PDFs and images using your schema. 
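Before sending a starter schema you have edited, a small local check can catch a name listed under `required` that was never declared under `properties`. JSON Schema itself allows that combination, so this is only a typo guard and not something the Struct PDF API is known to enforce. The helper name `undeclared_required` and its path format are illustrative assumptions.

```python
def undeclared_required(schema: dict, path: str = "$") -> list[str]:
    """Return a path for every `required` name that is missing from `properties`.

    Recurses into nested object properties and into array `items`.
    Hypothetical helper for local use; not part of the Struct PDF API.
    """
    problems = []
    properties = schema.get("properties", {})

    # Names required at this level must also be declared at this level.
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f"{path}.{name}")

    # Descend into each declared property (nested objects or arrays).
    for name, child in properties.items():
        problems.extend(undeclared_required(child, f"{path}.{name}"))

    # Array schemas describe their elements under `items`.
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(undeclared_required(items, f"{path}[]"))

    return problems
```

An empty list means every required name is declared; any returned path points at the level where the mismatch sits.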
### Starter Schema

```json
{
  "type": "object",
  "properties": {
    "seller": { "type": "string", "description": "Seller or issuing company name" },
    "buyer": { "type": "string", "description": "Buyer or prospect name" },
    "quote_number": { "type": "string", "description": "Quote number or reference" },
    "quote_date": { "type": "string", "description": "Quote issue date" },
    "expiration_date": { "type": "string", "description": "Quote expiration date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Quoted total amount" }
  },
  "required": [ "seller", "buyer", "quote_number", "quote_date", "expiration_date", "subtotal", "tax", "total" ]
}
```

### Sample File

- https://structpdf.com/wedge-tools/quote-parser/sample-quote.pdf (application/pdf)

### Code Examples

#### typescript

```typescript
import { readFileSync } from 'node:fs';

const schema = {
  "type": "object",
  "properties": {
    "seller": { "type": "string", "description": "Seller or issuing company name" },
    "buyer": { "type": "string", "description": "Buyer or prospect name" },
    "quote_number": { "type": "string", "description": "Quote number or reference" },
    "quote_date": { "type": "string", "description": "Quote issue date" },
    "expiration_date": { "type": "string", "description": "Quote expiration date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Quoted total amount" }
  },
  "required": [ "seller", "buyer", "quote_number", "quote_date", "expiration_date", "subtotal", "tax", "total" ]
};

const file = new File([readFileSync('sample-quote.pdf')], 'sample-quote.pdf', {
  type: 'application/pdf',
});

const formData = new FormData();
formData.set('file', file);
formData.set('schema', JSON.stringify(schema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const data = await response.json();
console.log(data);
```

#### javascript

```javascript
const fs = require('node:fs');

const schema = {
  "type": "object",
  "properties": {
    "seller": { "type": "string", "description": "Seller or issuing company name" },
    "buyer": { "type": "string", "description": "Buyer or prospect name" },
    "quote_number": { "type": "string", "description": "Quote number or reference" },
    "quote_date": { "type": "string", "description": "Quote issue date" },
    "expiration_date": { "type": "string", "description": "Quote expiration date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Quoted total amount" }
  },
  "required": [ "seller", "buyer", "quote_number", "quote_date", "expiration_date", "subtotal", "tax", "total" ]
};

// Top-level await is not available in CommonJS, so the request runs inside an async function.
async function main() {
  const file = new File([fs.readFileSync('sample-quote.pdf')], 'sample-quote.pdf', {
    type: 'application/pdf',
  });

  const formData = new FormData();
  formData.set('file', file);
  formData.set('schema', JSON.stringify(schema));

  const response = await fetch('https://api.structpdf.com/v1/extract', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
    },
    body: formData,
  });

  const data = await response.json();
  console.log(data);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

#### python

```python
import json
import os

import requests

schema = {
  "type": "object",
  "properties": {
    "seller": { "type": "string", "description": "Seller or issuing company name" },
    "buyer": { "type": "string", "description": "Buyer or prospect name" },
    "quote_number": { "type": "string", "description": "Quote number or reference" },
    "quote_date": { "type": "string", "description": "Quote issue date" },
    "expiration_date": { "type": "string", "description": "Quote expiration date" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Quoted total amount" }
  },
  "required": [ "seller", "buyer", "quote_number", "quote_date", "expiration_date", "subtotal", "tax", "total" ]
}

with open("sample-quote.pdf", "rb") as fh:
    response = requests.post(
        "https://api.structpdf.com/v1/extract",
        headers={"Authorization": f"Bearer {os.environ['STRUCTPDF_API_KEY']}"},
        files={"file": ("sample-quote.pdf", fh, "application/pdf")},
        data={"schema": json.dumps(schema)},
    )

response.raise_for_status()
print(response.json())
```

#### java

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    String schema = "{\"type\":\"object\",\"properties\":{\"seller\":{\"type\":\"string\",\"description\":\"Seller or issuing company name\"},\"buyer\":{\"type\":\"string\",\"description\":\"Buyer or prospect name\"},\"quote_number\":{\"type\":\"string\",\"description\":\"Quote number or reference\"},\"quote_date\":{\"type\":\"string\",\"description\":\"Quote issue date\"},\"expiration_date\":{\"type\":\"string\",\"description\":\"Quote expiration date\"},\"subtotal\":{\"type\":\"number\",\"description\":\"Subtotal before tax and fees\"},\"tax\":{\"type\":\"number\",\"description\":\"Tax amount\"},\"total\":{\"type\":\"number\",\"description\":\"Quoted total amount\"}},\"required\":[\"seller\",\"buyer\",\"quote_number\",\"quote_date\",\"expiration_date\",\"subtotal\",\"tax\",\"total\"]}";
    Path file = Path.of("sample-quote.pdf");
    String boundary = UUID.randomUUID().toString();
    String CRLF = "\r\n";

    // Assemble the multipart/form-data body by hand: a text part for the schema, then a binary part for the file.
    var body = new java.io.ByteArrayOutputStream();
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"schema\"" + CRLF + CRLF
        + schema + CRLF).getBytes());
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"sample-quote.pdf\"" + CRLF
        + "Content-Type: application/pdf" + CRLF + CRLF).getBytes());
    body.writeBytes(Files.readAllBytes(file));
    body.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.structpdf.com/v1/extract"))
        .header("Authorization", "Bearer " + System.getenv("STRUCTPDF_API_KEY"))
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

#### go

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/textproto"
	"os"
)

func main() {
	schema := `{"type":"object","properties":{"seller":{"type":"string","description":"Seller or issuing company name"},"buyer":{"type":"string","description":"Buyer or prospect name"},"quote_number":{"type":"string","description":"Quote number or reference"},"quote_date":{"type":"string","description":"Quote issue date"},"expiration_date":{"type":"string","description":"Quote expiration date"},"subtotal":{"type":"number","description":"Subtotal before tax and fees"},"tax":{"type":"number","description":"Tax amount"},"total":{"type":"number","description":"Quoted total amount"}},"required":["seller","buyer","quote_number","quote_date","expiration_date","subtotal","tax","total"]}`

	file, err := os.Open("sample-quote.pdf")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Build the multipart body: the schema as a form field, the document as a file part.
	var body bytes.Buffer
	writer := multipart.NewWriter(&body)
	_ = writer.WriteField("schema", schema)

	header := make(textproto.MIMEHeader)
	header.Set("Content-Disposition", "form-data; name=\"file\"; filename=\"sample-quote.pdf\"")
	header.Set("Content-Type", "application/pdf")
	part, err := writer.CreatePart(header)
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	writer.Close()

	req, err := http.NewRequest("POST", "https://api.structpdf.com/v1/extract", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("STRUCTPDF_API_KEY"))
	req.Header.Set("Content-Type", writer.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```

## Estimate Parsing API

Source: https://structpdf.com/try/estimate-parser

Extract JSON data from estimate PDFs and images using your schema.

### Starter Schema

```json
{
  "type": "object",
  "properties": {
    "provider": { "type": "string", "description": "Provider or issuing company name" },
    "customer": { "type": "string", "description": "Customer or project owner name" },
    "estimate_number": { "type": "string", "description": "Estimate number or reference" },
    "estimate_date": { "type": "string", "description": "Estimate issue date" },
    "validity_period": { "type": "string", "description": "Validity period or expiration window" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Estimated total amount" }
  },
  "required": [ "provider", "customer", "estimate_number", "estimate_date", "validity_period", "subtotal", "tax", "total" ]
}
```

### Sample File

- https://structpdf.com/wedge-tools/estimate-parser/sample-estimate.pdf (application/pdf)

### Code Examples

#### typescript

```typescript
import { readFileSync } from 'node:fs';

const schema = {
  "type": "object",
  "properties": {
    "provider": { "type": "string", "description": "Provider or issuing company name" },
    "customer": { "type": "string", "description": "Customer or project owner name" },
    "estimate_number": { "type": "string", "description": "Estimate number or reference" },
    "estimate_date": { "type": "string", "description": "Estimate issue date" },
    "validity_period": { "type": "string", "description": "Validity period or expiration window" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Estimated total amount" }
  },
  "required": [ "provider", "customer", "estimate_number", "estimate_date", "validity_period", "subtotal", "tax", "total" ]
};

const file = new File([readFileSync('sample-estimate.pdf')], 'sample-estimate.pdf', {
  type: 'application/pdf',
});

const formData = new FormData();
formData.set('file', file);
formData.set('schema', JSON.stringify(schema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const data = await response.json();
console.log(data);
```

#### javascript

```javascript
const fs = require('node:fs');

const schema = {
  "type": "object",
  "properties": {
    "provider": { "type": "string", "description": "Provider or issuing company name" },
    "customer": { "type": "string", "description": "Customer or project owner name" },
    "estimate_number": { "type": "string", "description": "Estimate number or reference" },
    "estimate_date": { "type": "string", "description": "Estimate issue date" },
    "validity_period": { "type": "string", "description": "Validity period or expiration window" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Estimated total amount" }
  },
  "required": [ "provider", "customer", "estimate_number", "estimate_date", "validity_period", "subtotal", "tax", "total" ]
};

// Top-level await is not available in CommonJS, so the request runs inside an async function.
async function main() {
  const file = new File([fs.readFileSync('sample-estimate.pdf')], 'sample-estimate.pdf', {
    type: 'application/pdf',
  });

  const formData = new FormData();
  formData.set('file', file);
  formData.set('schema', JSON.stringify(schema));

  const response = await fetch('https://api.structpdf.com/v1/extract', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
    },
    body: formData,
  });

  const data = await response.json();
  console.log(data);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

#### python

```python
import json
import os

import requests

schema = {
  "type": "object",
  "properties": {
    "provider": { "type": "string", "description": "Provider or issuing company name" },
    "customer": { "type": "string", "description": "Customer or project owner name" },
    "estimate_number": { "type": "string", "description": "Estimate number or reference" },
    "estimate_date": { "type": "string", "description": "Estimate issue date" },
    "validity_period": { "type": "string", "description": "Validity period or expiration window" },
    "subtotal": { "type": "number", "description": "Subtotal before tax and fees" },
    "tax": { "type": "number", "description": "Tax amount" },
    "total": { "type": "number", "description": "Estimated total amount" }
  },
  "required": [ "provider", "customer", "estimate_number", "estimate_date", "validity_period", "subtotal", "tax", "total" ]
}

with open("sample-estimate.pdf", "rb") as fh:
    response = requests.post(
        "https://api.structpdf.com/v1/extract",
        headers={"Authorization": f"Bearer {os.environ['STRUCTPDF_API_KEY']}"},
        files={"file": ("sample-estimate.pdf", fh, "application/pdf")},
        data={"schema": json.dumps(schema)},
    )

response.raise_for_status()
print(response.json())
```

#### java

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    String schema = "{\"type\":\"object\",\"properties\":{\"provider\":{\"type\":\"string\",\"description\":\"Provider or issuing company name\"},\"customer\":{\"type\":\"string\",\"description\":\"Customer or project owner name\"},\"estimate_number\":{\"type\":\"string\",\"description\":\"Estimate number or reference\"},\"estimate_date\":{\"type\":\"string\",\"description\":\"Estimate issue date\"},\"validity_period\":{\"type\":\"string\",\"description\":\"Validity period or expiration window\"},\"subtotal\":{\"type\":\"number\",\"description\":\"Subtotal before tax and fees\"},\"tax\":{\"type\":\"number\",\"description\":\"Tax amount\"},\"total\":{\"type\":\"number\",\"description\":\"Estimated total amount\"}},\"required\":[\"provider\",\"customer\",\"estimate_number\",\"estimate_date\",\"validity_period\",\"subtotal\",\"tax\",\"total\"]}";
    Path file = Path.of("sample-estimate.pdf");
    String boundary = UUID.randomUUID().toString();
    String CRLF = "\r\n";

    // Assemble the multipart/form-data body by hand: a text part for the schema, then a binary part for the file.
    var body = new java.io.ByteArrayOutputStream();
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"schema\"" + CRLF + CRLF
        + schema + CRLF).getBytes());
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"sample-estimate.pdf\"" + CRLF
        + "Content-Type: application/pdf" + CRLF + CRLF).getBytes());
    body.writeBytes(Files.readAllBytes(file));
    body.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.structpdf.com/v1/extract"))
        .header("Authorization", "Bearer " + System.getenv("STRUCTPDF_API_KEY"))
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

#### go

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/textproto"
	"os"
)

func main() {
	schema := `{"type":"object","properties":{"provider":{"type":"string","description":"Provider or issuing company name"},"customer":{"type":"string","description":"Customer or project owner name"},"estimate_number":{"type":"string","description":"Estimate number or reference"},"estimate_date":{"type":"string","description":"Estimate issue date"},"validity_period":{"type":"string","description":"Validity period or expiration window"},"subtotal":{"type":"number","description":"Subtotal before tax and fees"},"tax":{"type":"number","description":"Tax amount"},"total":{"type":"number","description":"Estimated total amount"}},"required":["provider","customer","estimate_number","estimate_date","validity_period","subtotal","tax","total"]}`

	file, err := os.Open("sample-estimate.pdf")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Build the multipart body: the schema as a form field, the document as a file part.
	var body bytes.Buffer
	writer := multipart.NewWriter(&body)
	_ = writer.WriteField("schema", schema)

	header := make(textproto.MIMEHeader)
	header.Set("Content-Disposition", "form-data; name=\"file\"; filename=\"sample-estimate.pdf\"")
	header.Set("Content-Type", "application/pdf")
	part, err := writer.CreatePart(header)
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	writer.Close()

	req, err := http.NewRequest("POST", "https://api.structpdf.com/v1/extract", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("STRUCTPDF_API_KEY"))
	req.Header.Set("Content-Type", writer.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```

## Resume Parsing API

Source: https://structpdf.com/try/resume-parser

Extract JSON data from resume PDFs and images using your schema.
### Starter Schema

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "location": { "type": "string" },
    "summary": { "type": "string" },
    "skills": { "type": "array", "items": { "type": "string" } },
    "experience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "title": { "type": "string" },
          "start_date": { "type": "string" },
          "end_date": { "type": "string" },
          "location": { "type": "string" },
          "highlights": { "type": "array", "items": { "type": "string" } }
        },
        "required": [ "company", "title", "start_date", "end_date", "location", "highlights" ]
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "school": { "type": "string" },
          "degree": { "type": "string" },
          "graduation_year": { "type": "number" }
        },
        "required": [ "school", "degree", "graduation_year" ]
      }
    }
  },
  "required": [ "name", "email", "phone", "location", "summary", "skills", "experience", "education" ]
}
```

### Sample File

- https://structpdf.com/wedge-tools/resume-parser/sample-resume.pdf (application/pdf)

### Code Examples

#### typescript

```typescript
import { readFileSync } from 'node:fs';

const schema = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "location": { "type": "string" },
    "summary": { "type": "string" },
    "skills": { "type": "array", "items": { "type": "string" } },
    "experience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "title": { "type": "string" },
          "start_date": { "type": "string" },
          "end_date": { "type": "string" },
          "location": { "type": "string" },
          "highlights": { "type": "array", "items": { "type": "string" } }
        },
        "required": [ "company", "title", "start_date", "end_date", "location", "highlights" ]
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "school": { "type": "string" },
          "degree": { "type": "string" },
          "graduation_year": { "type": "number" }
        },
        "required": [ "school", "degree", "graduation_year" ]
      }
    }
  },
  "required": [ "name", "email", "phone", "location", "summary", "skills", "experience", "education" ]
};

const file = new File([readFileSync('sample-resume.pdf')], 'sample-resume.pdf', {
  type: 'application/pdf',
});

const formData = new FormData();
formData.set('file', file);
formData.set('schema', JSON.stringify(schema));

const response = await fetch('https://api.structpdf.com/v1/extract', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
  },
  body: formData,
});

const data = await response.json();
console.log(data);
```

#### javascript

```javascript
const fs = require('node:fs');

const schema = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "location": { "type": "string" },
    "summary": { "type": "string" },
    "skills": { "type": "array", "items": { "type": "string" } },
    "experience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "title": { "type": "string" },
          "start_date": { "type": "string" },
          "end_date": { "type": "string" },
          "location": { "type": "string" },
          "highlights": { "type": "array", "items": { "type": "string" } }
        },
        "required": [ "company", "title", "start_date", "end_date", "location", "highlights" ]
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "school": { "type": "string" },
          "degree": { "type": "string" },
          "graduation_year": { "type": "number" }
        },
        "required": [ "school", "degree", "graduation_year" ]
      }
    }
  },
  "required": [ "name", "email", "phone", "location", "summary", "skills", "experience", "education" ]
};

// Top-level await is not available in CommonJS, so the request runs inside an async function.
async function main() {
  const file = new File([fs.readFileSync('sample-resume.pdf')], 'sample-resume.pdf', {
    type: 'application/pdf',
  });

  const formData = new FormData();
  formData.set('file', file);
  formData.set('schema', JSON.stringify(schema));

  const response = await fetch('https://api.structpdf.com/v1/extract', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.STRUCTPDF_API_KEY}`,
    },
    body: formData,
  });

  const data = await response.json();
  console.log(data);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

#### python

```python
import json
import os

import requests

schema = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "location": { "type": "string" },
    "summary": { "type": "string" },
    "skills": { "type": "array", "items": { "type": "string" } },
    "experience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "title": { "type": "string" },
          "start_date": { "type": "string" },
          "end_date": { "type": "string" },
          "location": { "type": "string" },
          "highlights": { "type": "array", "items": { "type": "string" } }
        },
        "required": [ "company", "title", "start_date", "end_date", "location", "highlights" ]
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "school": { "type": "string" },
          "degree": { "type": "string" },
          "graduation_year": { "type": "number" }
        },
        "required": [ "school", "degree", "graduation_year" ]
      }
    }
  },
  "required": [ "name", "email", "phone", "location", "summary", "skills", "experience", "education" ]
}

with open("sample-resume.pdf", "rb") as fh:
    response = requests.post(
        "https://api.structpdf.com/v1/extract",
        headers={"Authorization": f"Bearer {os.environ['STRUCTPDF_API_KEY']}"},
        files={"file": ("sample-resume.pdf", fh, "application/pdf")},
        data={"schema": json.dumps(schema)},
    )

response.raise_for_status()
print(response.json())
```

#### java

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    String schema = "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"email\":{\"type\":\"string\"},\"phone\":{\"type\":\"string\"},\"location\":{\"type\":\"string\"},\"summary\":{\"type\":\"string\"},\"skills\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"experience\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"company\":{\"type\":\"string\"},\"title\":{\"type\":\"string\"},\"start_date\":{\"type\":\"string\"},\"end_date\":{\"type\":\"string\"},\"location\":{\"type\":\"string\"},\"highlights\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}}},\"required\":[\"company\",\"title\",\"start_date\",\"end_date\",\"location\",\"highlights\"]}},\"education\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"school\":{\"type\":\"string\"},\"degree\":{\"type\":\"string\"},\"graduation_year\":{\"type\":\"number\"}},\"required\":[\"school\",\"degree\",\"graduation_year\"]}}},\"required\":[\"name\",\"email\",\"phone\",\"location\",\"summary\",\"skills\",\"experience\",\"education\"]}";
    Path file = Path.of("sample-resume.pdf");
    String boundary = UUID.randomUUID().toString();
    String CRLF = "\r\n";

    // Assemble the multipart/form-data body by hand: a text part for the schema, then a binary part for the file.
    var body = new java.io.ByteArrayOutputStream();
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"schema\"" + CRLF + CRLF
        + schema + CRLF).getBytes());
    body.writeBytes(("--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"sample-resume.pdf\"" + CRLF
        + "Content-Type: application/pdf" + CRLF + CRLF).getBytes());
    body.writeBytes(Files.readAllBytes(file));
    body.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes());

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.structpdf.com/v1/extract"))
        .header("Authorization", "Bearer " + System.getenv("STRUCTPDF_API_KEY"))
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

#### go

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/textproto"
	"os"
)

func main() {
	schema := `{"type":"object","properties":{"name":{"type":"string"},"email":{"type":"string"},"phone":{"type":"string"},"location":{"type":"string"},"summary":{"type":"string"},"skills":{"type":"array","items":{"type":"string"}},"experience":{"type":"array","items":{"type":"object","properties":{"company":{"type":"string"},"title":{"type":"string"},"start_date":{"type":"string"},"end_date":{"type":"string"},"location":{"type":"string"},"highlights":{"type":"array","items":{"type":"string"}}},"required":["company","title","start_date","end_date","location","highlights"]}},"education":{"type":"array","items":{"type":"object","properties":{"school":{"type":"string"},"degree":{"type":"string"},"graduation_year":{"type":"number"}},"required":["school","degree","graduation_year"]}}},"required":["name","email","phone","location","summary","skills","experience","education"]}`

	file, err := os.Open("sample-resume.pdf")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Build the multipart body: the schema as a form field, the document as a file part.
	var body bytes.Buffer
	writer := multipart.NewWriter(&body)
	_ = writer.WriteField("schema", schema)

	header := make(textproto.MIMEHeader)
	header.Set("Content-Disposition", "form-data; name=\"file\"; filename=\"sample-resume.pdf\"")
	header.Set("Content-Type", "application/pdf")
	part, err := writer.CreatePart(header)
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	writer.Close()

	req, err := http.NewRequest("POST", "https://api.structpdf.com/v1/extract", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("STRUCTPDF_API_KEY"))
	req.Header.Set("Content-Type", writer.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```
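
The examples above print the raw response. As a minimal sketch of what a caller might do next, the Python helper below pulls out the review signals described at the top of this document: the `success` status, `metadata.findings`, and `metadata.errors`. The exact response layout is an assumption here (a top-level `success` key next to a `metadata` object), and `review_extraction` is a hypothetical helper, not part of any Struct PDF SDK; the OpenAPI schema is the place to confirm the real shape.

```python
from typing import Any


def review_extraction(payload: dict[str, Any]) -> dict[str, Any]:
    """Summarize whether an extract response should be reviewed by a person.

    Assumed layout (not confirmed by the examples above):
    payload["success"], payload["metadata"]["findings"], payload["metadata"]["errors"].
    Missing keys are treated as empty rather than raising.
    """
    metadata = payload.get("metadata") or {}
    errors = metadata.get("errors") or []
    findings = metadata.get("findings") or []
    status = payload.get("success")

    return {
        "status": status,
        # "Partial" is the status the document describes for conflicting values.
        "needs_review": status == "Partial" or bool(errors),
        "errors": errors,
        "finding_count": len(findings),
    }
```

In practice `payload` would be `response.json()` from the Python example, and responses with `needs_review` set could be routed to a manual check instead of being written straight to a database.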