Production API · AI-Powered Document Engine

FICA Document Processing API

AI classification · Data extraction · Onboarding automation

A production-grade backend API built with Node.js and PHP that uses OpenAI to classify and extract structured data from South African FICA documents — ID documents, bank statements, and payslips — in PDF or image form. The API accepts multiple files per request, identifies the document type, extracts every relevant field, and returns clean structured JSON ready to plug into onboarding, CRM, or credit workflows.

Node.js · PHP · OpenAI · REST API · PDF Processing · OCR · Multi-File Upload · JSON


Public Demo — Sensitive Data Masked

The API itself returns the full extracted data set. The demo frontend shown in these screenshots intentionally masks sensitive fields (ID numbers, account numbers, salary figures, addresses, etc.) on the client side because this portfolio is publicly accessible. In production, the API response is consumed directly by internal onboarding systems.

// Case Study

Problem, Solution & Outcome

Built to remove the bottleneck of manual FICA document handling during customer onboarding

The Problem

Onboarding teams were manually opening every FICA document, identifying whether it was an ID, payslip, or bank statement, then typing the same fields into the CRM by hand. This was slow, error-prone, hard to audit, and impossible to scale during high-volume application periods.

The Solution

A backend API built in Node.js + PHP that accepts a batch of files (PDF or image), automatically classifies each document using OpenAI, runs a tailored extraction prompt per document type, and returns a clean, structured JSON response containing every detected field. Encrypted PDFs are handled through per-file passwords.

The Outcome

Manual capture time per customer dropped dramatically, data quality improved through consistent structured output, and onboarding teams could focus on verification instead of typing. The API now plugs directly into downstream CRM and credit workflows for fully automated FICA intake.

// How It Works

API Pipeline

From raw upload to structured JSON in one round trip

Receive & Validate

API accepts a batch of files (PDF / JPG / PNG / GIF / BMP / WebP / TIFF, up to 10 MB each) with optional per-file passwords for encrypted PDFs.
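
As an illustration only, the validation step could look roughly like the sketch below. The function names, the shape of the file object (`name`, `size`), and the error messages are assumptions made for this example; the format list and the 10 MB limit come from the feature list on this page.

```javascript
// Hypothetical validation sketch. Not the production code.
const MAX_BYTES = 10 * 1024 * 1024; // 10 MB per file
const ALLOWED_EXTENSIONS = new Set([
  "pdf", "jpg", "jpeg", "png", "gif", "bmp", "webp", "tif", "tiff",
]);

// Checks one uploaded file and returns its name, a pass/fail flag, and any errors.
function validateFile(file) {
  const errors = [];
  const dot = file.name.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    errors.push(`unsupported file type: ${ext || "(none)"}`);
  }
  if (file.size > MAX_BYTES) {
    errors.push("file exceeds 10 MB");
  }
  return { name: file.name, ok: errors.length === 0, errors };
}

// Validates every file in the batch independently, so one bad file
// does not hide problems in the others.
function validateBatch(files) {
  return files.map(validateFile);
}
```

Reporting errors per file rather than rejecting the whole request lets a caller resubmit only the files that failed.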

Classify Document

OpenAI is used to determine whether each document is an ID, bank statement, or payslip — no manual tagging required.
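
A minimal sketch of how such a classification step might be structured is shown below. The label names, the prompt wording, and the injected `askModel` function (a stand-in for whatever call is made to OpenAI) are all assumptions for illustration; the actual prompts and model calls are not published here.

```javascript
// Hypothetical label set for the three supported document types.
const DOCUMENT_TYPES = ["id_document", "bank_statement", "payslip"];

// Builds a prompt that constrains the model to a fixed set of answers.
function buildClassificationPrompt(documentText) {
  return [
    "Classify the following South African FICA document.",
    `Answer with exactly one of: ${DOCUMENT_TYPES.join(", ")}, or unknown.`,
    "",
    documentText,
  ].join("\n");
}

// Normalizes the model's reply; anything outside the known set becomes "unknown".
function parseLabel(reply) {
  const label = String(reply).trim().toLowerCase();
  return DOCUMENT_TYPES.includes(label) ? label : "unknown";
}

// askModel is an assumed async function: (prompt) => Promise<string>.
async function classifyDocument(documentText, askModel) {
  const reply = await askModel(buildClassificationPrompt(documentText));
  return parseLabel(reply);
}
```

Mapping unexpected replies to `unknown` keeps a free-text model answer from leaking into downstream systems as a document type.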

Extract Fields

Document-specific prompts pull every relevant field (names, ID numbers, balances, salary breakdown, etc.) into structured data.
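
One way to organize document-specific prompts is a lookup table keyed by document type, sketched below. The key names and prompt wording are hypothetical; the field lists mirror the ones shown under Supported Document Types on this page.

```javascript
// Hypothetical field keys per document type, based on the fields listed on this page.
const FIELDS_BY_TYPE = {
  id_document: ["full_names", "id_number", "date_of_birth", "gender", "nationality", "id_type"],
  bank_statement: ["account_holder", "bank", "account_number", "branch_code", "address", "period", "balances"],
  payslip: ["employee", "employer", "job_title", "department", "gross", "net", "tax", "uif", "account_number"],
};

// Builds the extraction prompt for one document type.
// Throws for a type with no configured field list, so a classification
// error surfaces early instead of producing an empty extraction.
function buildExtractionPrompt(type) {
  const fields = FIELDS_BY_TYPE[type];
  if (!fields) {
    throw new Error(`no extraction prompt for type: ${type}`);
  }
  return (
    "Extract these fields from the document and reply with a JSON object " +
    `using exactly these keys: ${fields.join(", ")}. ` +
    'Use "N/A" for any field that is not present.'
  );
}
```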

Return JSON

API responds with one structured JSON object per document, grouped by classification, ready to ingest into the CRM.
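
The grouping described above could be done with a small helper like the following sketch. The result shape (`type`, `file`, `fields`) is an assumption for this example and not the API's documented response format.

```javascript
// Groups per-document results by their classification.
// Each input item is assumed to look like { type, file, fields }.
function groupByClassification(results) {
  const grouped = {};
  for (const { type, ...document } of results) {
    if (!grouped[type]) {
      grouped[type] = [];
    }
    grouped[type].push(document);
  }
  return grouped;
}

// Example usage with made-up data.
const response = groupByClassification([
  { type: "payslip", file: "march.pdf", fields: { employer: "Example (Pty) Ltd" } },
  { type: "id_document", file: "id.jpg", fields: { id_type: "Smart Card" } },
  { type: "payslip", file: "april.pdf", fields: { employer: "Example (Pty) Ltd" } },
]);
console.log(JSON.stringify(response, null, 2));
```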

// What It Reads

Supported Document Types

Tailored extraction logic per document type for maximum accuracy

South African ID

Smart Card or Green Book — supports both formats and full identity extraction.

Full Names · ID Number · Date of Birth · Gender · Nationality · ID Type

Bank Statement

Detects all major SA banks and pulls account, balance, and statement metadata.

Account Holder · Bank · Account No. · Branch Code · Address · Period · Balances

Payslip

Full salary breakdown including deductions, employer details, and banking info.

Employee · Employer · Job Title · Department · Gross / Net · Tax / UIF · Account No.

// What It Does

Key Features

Built for real onboarding workflows — not just a tech demo

AI Classification

Uses OpenAI to automatically identify document type — no manual tagging.

Field-Level Extraction

Pulls every relevant data point per document type into a structured object.

Multi-File Batches

Process several documents in one API call with mixed types and formats.

PDF + Image Support

Handles PDF, JPG, PNG, GIF, BMP, WebP, and TIFF inputs up to 10 MB each.

Encrypted PDF Handling

Per-file password support for password-protected bank statements and payslips.
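
How passwords are paired with files is not specified on this page. One plausible approach, sketched here under the assumption that the client sends a map of file name to password, is to attach each password to its file and report any password that matches no uploaded file.

```javascript
// Pairs each uploaded file name with its password, if one was supplied.
// The { fileName: password } map is an assumed request format.
function attachPasswords(fileNames, passwords = {}) {
  const files = fileNames.map((name) => ({
    name,
    password: Object.prototype.hasOwnProperty.call(passwords, name)
      ? passwords[name]
      : null,
  }));
  // Passwords that refer to files not present in the batch, e.g. a typo in the name.
  const unmatched = Object.keys(passwords).filter(
    (name) => !fileNames.includes(name)
  );
  return { files, unmatched };
}
```

Surfacing unmatched entries helps a caller notice a misspelled file name before an encrypted PDF fails to open.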

Structured JSON Output

Clean, predictable JSON shape per document type — easy to consume downstream.

Public Demo Masking

Demo frontend masks sensitive fields client-side; raw API returns full data.
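
Client-side masking of this kind can be as simple as the sketch below. Which fields are treated as sensitive, and how many trailing characters stay visible, are assumptions chosen for illustration rather than the demo's actual rules.

```javascript
// Replaces all but the last `visible` characters with asterisks.
// Values no longer than `visible` are masked entirely.
function maskValue(value, visible = 4) {
  const text = String(value);
  if (text.length <= visible) {
    return "*".repeat(text.length);
  }
  return "*".repeat(text.length - visible) + text.slice(-visible);
}

// Returns a copy of `fields` with the listed keys masked.
// "N/A" placeholders are left alone so missing data stays recognizable.
function maskFields(fields, sensitiveKeys) {
  const masked = { ...fields };
  for (const key of sensitiveKeys) {
    if (key in masked && masked[key] !== "N/A") {
      masked[key] = maskValue(masked[key]);
    }
  }
  return masked;
}
```

Because the function returns a copy, the unmasked API response stays untouched in memory for any code that legitimately needs it.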

Plug-In Integration

Designed to slot directly into onboarding, CRM, or credit workflows.

Confidence-Aware

Returns N/A for missing fields rather than guessing — safer for compliance.
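
The N/A behavior can be enforced after extraction with a small normalization pass, sketched below. The helper name and the rule for what counts as missing (undefined, null, or a blank string) are assumptions for this example.

```javascript
// Builds an object containing exactly `expectedKeys`.
// Missing or blank values become "N/A"; keys that were not expected are dropped.
function fillMissing(expectedKeys, extracted) {
  const result = {};
  for (const key of expectedKeys) {
    const value = extracted[key];
    const isMissing =
      value === undefined || value === null || String(value).trim() === "";
    result[key] = isMissing ? "N/A" : value;
  }
  return result;
}
```

A legitimate zero, such as a zero balance, is kept as-is because only absent or blank values are treated as missing.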

Fast Turnaround

Processes multi-document batches in seconds rather than minutes of manual work.

SA-Specific Logic

Tailored for South African ID formats, local banks, and SARS payslip layouts.

Production-Ready

Currently in active use — not a prototype or proof-of-concept demo.

// Demo Frontend

API In Action

The screenshots below show a thin demo frontend wrapping the API. The frontend masks sensitive output for public viewing; the actual API returns complete data.

Input

Document Upload

Drag-and-drop or browse to queue multiple files for the API. Supports PDF and common image formats.

Input

Per-File Passwords

Encrypted PDFs (like password-protected payslips) can be unlocked individually before processing.

Process

AI Analysis

Documents are classified and parsed using OpenAI — each file is handled with type-specific logic.

Output

Payslip Extraction

Employer, employee, salary breakdown, deductions, and banking details — all extracted automatically.