POST /api/v1/analyze

Analyze a PDF document from a publicly accessible URL to detect modifications and tampering.

Endpoint

POST https://api.htpbe.tech/v1/analyze

Authentication

This endpoint requires API key authentication via the Authorization header.

Authorization: Bearer YOUR_API_KEY

Overage Billing

Requests beyond your monthly quota are charged at your plan's overage rate and billed automatically at the end of your billing cycle. There is no hard block when your quota is reached — requests continue to succeed and overage charges appear on your next invoice.

| Plan | Overage rate |
| --- | --- |
| Starter | $0.60/req |
| Growth | $0.50/req |
| Pro | $0.40/req |
| Enterprise | Included |
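
As a rough illustration of the rates above, the overage charge for a cycle is the number of requests beyond quota multiplied by the plan's rate. The sketch below assumes you track your own usage; `estimate_overage` and its `monthly_quota` parameter are hypothetical names, and quota sizes are not stated on this page.

```python
from decimal import Decimal

# Per-request overage rates in USD, copied from the table above.
# Enterprise overage is listed as "Included", modeled here as zero.
OVERAGE_RATES = {
    "starter": Decimal("0.60"),
    "growth": Decimal("0.50"),
    "pro": Decimal("0.40"),
    "enterprise": Decimal("0"),
}


def estimate_overage(plan: str, requests_made: int, monthly_quota: int) -> Decimal:
    """Estimate the overage charge for one billing cycle (hypothetical helper)."""
    extra_requests = max(0, requests_made - monthly_quota)
    return OVERAGE_RATES[plan.lower()] * extra_requests
```

This is only an estimate for budgeting; the invoice remains the authoritative figure.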

Request

Headers

| Header | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer token with your API key (Bearer htpbe_live_... or Bearer htpbe_test_...). The Bearer prefix is recommended but optional: sending the raw key directly is also accepted, but only if the key starts with htpbe_ (e.g., Authorization: htpbe_live_sk_...). |
| Content-Type | string | Yes | Must be application/json |

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Publicly accessible HTTP/HTTPS URL pointing to a PDF file. The file must be downloadable without authentication and must not exceed 10 MB in size. |
| original_filename | string | No | Original filename of the document (before any storage renaming). When provided, this name is stored and returned in results instead of the filename extracted from the URL. |

url Parameter Details

Format: Must be a valid HTTP or HTTPS URL

File Size Limit: Maximum 10 MB (10,485,760 bytes)

Accessibility: The URL must be publicly accessible. Files behind authentication, firewalls, or VPNs will fail to download.

Valid Examples:

  • https://example.com/documents/contract.pdf
  • https://cdn.yoursite.com/uploads/2024/invoice-12345.pdf
  • https://storage.googleapis.com/bucket-name/file.pdf

Invalid Examples:

  • http://localhost/document.pdf (not publicly accessible)
  • https://example.com/file.docx (not a PDF)
  • ftp://example.com/file.pdf (FTP not supported)
  • file:///local/path/document.pdf (local file paths not supported)
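
Some of the invalid cases above can be caught on the client before spending a request. The following is a minimal sketch, not the server's actual validation logic: `precheck_url` is a hypothetical helper, and its list of local hosts is an assumption. It deliberately does not inspect the file extension, since a URL without `.pdf` may still serve a PDF.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts that can never be reached from the public internet (assumed list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def precheck_url(url: str) -> Optional[str]:
    """Return a reason string if the URL is obviously unusable, else None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "scheme must be http or https"
    if not parsed.hostname:
        return "missing host"
    if parsed.hostname in LOCAL_HOSTS:
        return "host is not publicly accessible"
    return None
```

A `None` result only means the URL passed these local checks; the API may still reject it, for example when the download fails.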

Example Request

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer htpbe_live_sk_1234567890abcdef" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/documents/contract.pdf"
  }'

// Node.js / TypeScript
const response = await fetch('https://api.htpbe.tech/v1/analyze', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.HTPBE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/documents/contract.pdf',
  }),
});

const result = await response.json();

# Python
import requests
import os

response = requests.post(
    'https://api.htpbe.tech/v1/analyze',
    headers={
        'Authorization': f"Bearer {os.getenv('HTPBE_API_KEY')}",
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com/documents/contract.pdf'
    }
)

result = response.json()

Response

Success Response (201 Created)

Analysis is performed synchronously. The response contains only the check ID. A Location response header points to the full result URL. Call GET /api/v1/result/{id} immediately after to retrieve the full analysis results.

Response Headers

| Header | Description |
| --- | --- |
| Location | Full URL of the result: https://api.htpbe.tech/v1/result/{id} (e.g. https://api.htpbe.tech/v1/result/3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21) |

Response Structure

{
  id: string;
}

Response Fields

id
  • Type: string (UUID v4)
  • Always Present: Yes
  • Description: Unique identifier for this analysis check
  • Format: xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx (UUID version 4)
  • Usage: Pass this ID to GET /api/v1/result/{id} to retrieve the full analysis
  • Example: "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
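
If you prefer to follow the Location header rather than build the result URL yourself, the ID can be recovered from it and checked against the documented UUID v4 format. This is a sketch under the assumption that the header always has the form shown above; `check_id_from_location` is a hypothetical helper name.

```python
import uuid

RESULT_PREFIX = "https://api.htpbe.tech/v1/result/"


def check_id_from_location(location: str) -> str:
    """Extract the check ID from a Location header and verify it is a UUID v4."""
    if not location.startswith(RESULT_PREFIX):
        raise ValueError(f"unexpected Location value: {location!r}")
    candidate = location[len(RESULT_PREFIX):]
    parsed = uuid.UUID(candidate)  # raises ValueError on malformed input
    if parsed.version != 4:
        raise ValueError(f"not a UUID v4: {candidate!r}")
    return str(parsed)
```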

Example Response

{
  "id": "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
}

Two-Step Usage Pattern

// Step 1: Submit PDF for analysis
const analyzeRes = await fetch('https://api.htpbe.tech/v1/analyze', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.HTPBE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://example.com/documents/contract.pdf' }),
});

const { id } = await analyzeRes.json();

// Step 2: Retrieve full results
const resultRes = await fetch(`https://api.htpbe.tech/v1/result/${id}`, {
  headers: { Authorization: `Bearer ${process.env.HTPBE_API_KEY}` },
});

const result = await resultRes.json();
// result.status: "intact" | "modified" | "inconclusive"
// ... (see /result endpoint docs for full schema)

See GET /api/v1/result/{id} for the full result schema, including verdict fields (status, modification_confidence, modification_markers), metadata, structure, signatures, threats, and findings.


Understanding the verdict

The verdict fields (status, modification_confidence, modification_markers) are returned by the result endpoint, not by this one. See GET /api/v1/result/{id} for details.

How status: "modified" is determined:

The verdict is set to "modified" when one or more forensic markers fire. Markers are returned in modification_markers[] as stable machine-readable ids (e.g. HTPBE_SIGNATURE_REMOVED, HTPBE_DATES_DISAGREE, HTPBE_MULTIPLE_REVISION_LAYERS, HTPBE_POST_SIGNATURE_EDIT). The full id → outcome-label dictionary is published on htpbe.tech/how. Branch your integration logic on the id, and render the user-facing label from the dictionary.
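
Branching on marker ids might look like the sketch below. It assumes the result payload exposes `status` as a string and `modification_markers` as a list of id strings (see the result endpoint docs for the actual schema). The queue names returned by `route_result` are hypothetical and stand in for whatever your workflow does.

```python
# Marker ids taken from the examples above; the grouping is an illustrative choice.
SIGNATURE_MARKERS = {"HTPBE_SIGNATURE_REMOVED", "HTPBE_POST_SIGNATURE_EDIT"}


def route_result(result: dict) -> str:
    """Pick a (hypothetical) workflow queue from a result payload."""
    status = result.get("status")
    if status == "intact":
        return "accept"
    if status != "modified":
        # "inconclusive" or anything unexpected goes to a human.
        return "manual_review"
    markers = set(result.get("modification_markers", []))
    if markers & SIGNATURE_MARKERS:
        return "reject_signature_tampering"
    # Unknown or less severe marker ids fall back to review rather than failing.
    return "manual_review"
```

Falling back to review for unrecognized ids keeps the integration working if new markers are added to the dictionary.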

What we detect:

  • Structural modifications (incremental updates, xref table additions)
  • Metadata date discrepancies
  • Digital signature removal or post-sign modifications
  • Suspicious tool patterns

What we don't detect:

  • Document fabrication (entire content created in Word/Excel and exported to PDF)
  • Visual/content differences without structural changes
  • Password-protected content
  • Cryptographic signature validity

Error Responses

All errors follow this format:

{
  "error": "Human-readable error message",
  "code": "machine_readable_error_code",
  "details": "Optional additional context (present for some errors)"
}
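
Because every error shares this envelope, a client can decode it in one place. The sketch below is one possible approach; `ApiError` and `parse_error` are hypothetical names, and the fallback for non-JSON bodies (for example, an error page from an intermediate proxy) is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    code: str
    message: str
    details: Optional[str] = None


def parse_error(status: int, body: str) -> ApiError:
    """Decode the error envelope, tolerating bodies that are not JSON objects."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(
        status=status,
        code=payload.get("code", "unknown"),
        message=payload.get("error", body),
        details=payload.get("details"),  # optional field, absent for some errors
    )
```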

400 Bad Request

Returned when the request is malformed or contains invalid parameters.

Invalid JSON Body

{
  "error": "Invalid JSON body",
  "code": "invalid_request"
}

Cause: The request body is not valid JSON (missing braces, trailing commas, unquoted keys, etc.).

Solution: Ensure the request body is valid JSON and the Content-Type: application/json header is set.


Missing or Invalid url

{
  "error": "Missing or invalid url parameter",
  "code": "invalid_request"
}

Cause: The url field is not present in the request body, or it's not a string.

Solution: Ensure you're sending a valid JSON body with a url string field.


Invalid URL Format

{
  "error": "Invalid url format",
  "code": "invalid_url_format"
}

Cause: The url value is not a valid HTTP or HTTPS URL.

Examples of Invalid URLs:

  • not-a-url
  • example.com/file.pdf (missing protocol)
  • file:///local/path.pdf (local path)

Note: ftp:// URLs are rejected with this error — they fail the HTTP/HTTPS protocol check before any download is attempted.

Solution: Use a complete HTTP or HTTPS URL like https://example.com/file.pdf


Failed to Download File

{
  "error": "Failed to download file from URL",
  "code": "download_failed",
  "details": "Network error" // or specific error message
}

Common Causes:

  • URL returns 404 Not Found
  • URL returns 403 Forbidden
  • URL requires authentication
  • Network timeout (30 second limit)
  • DNS resolution failure
  • Server is unreachable

HTTP Status Details Examples:

{
  "error": "Failed to download file from URL",
  "code": "download_failed",
  "details": "HTTP 404: Not Found"
}

{
  "error": "Failed to download file from URL",
  "code": "download_failed",
  "details": "HTTP 403: Forbidden"
}

Solution: Ensure the URL is publicly accessible and returns the file successfully.

Why 400 and not 422? This is intentional. HTTP 422 (Unprocessable Entity) means the request syntax is valid but the server cannot process the content. Here the problem is the client's input — the URL they supplied cannot be fetched. This is treated as a bad-input error (400), consistent with how invalid URL format is handled. Do not add 422 handling for this error code.
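
For logging or alerting, the upstream HTTP status can be pulled out of the details string of a download_failed error. This sketch assumes the "HTTP <status>: <reason>" shape seen in the examples above; that format is not documented as stable, so `upstream_status` (a hypothetical helper) returns None for anything else.

```python
import re
from typing import Optional

# Matches details such as "HTTP 404: Not Found" (format assumed from the examples).
_HTTP_DETAIL = re.compile(r"^HTTP (\d{3}):")


def upstream_status(details: Optional[str]) -> Optional[int]:
    """Return the upstream HTTP status from a download_failed details string."""
    if not details:
        return None
    match = _HTTP_DETAIL.match(details)
    return int(match.group(1)) if match else None
```

Keep branching on the `code` field; the parsed status is best treated as diagnostic information only.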


401 Unauthorized

Authentication failed.

Missing API Key

{
  "error": "Missing API key. Please provide an API key in the Authorization header.",
  "code": "missing_api_key"
}

Cause: No Authorization header provided.

Solution: Add header: Authorization: Bearer YOUR_API_KEY


Invalid API Key (live key)

{
  "error": "Invalid API key. Please check your credentials.",
  "code": "invalid_api_key"
}

Cause: Live key (htpbe_live_*) not found in database or has been revoked.

Solution:

  • Verify your API key is correct
  • Generate a new API key from the dashboard if needed

Invalid API Key (test key)

{
  "error": "Invalid test API key. Please check your credentials.",
  "code": "invalid_api_key"
}

Cause: Test key (htpbe_test_*) not found. Test keys are stored separately from live keys and cannot be used interchangeably.

Solution:

  • Copy your test key from the Test API Key section on the dashboard
  • Verify you are not using a live key where a test key is expected (or vice versa)

402 Payment Required

No active subscription found for this API key.

{
  "error": "No active subscription. Please subscribe to a plan to use the API.",
  "code": "payment_required"
}

Cause: Your account does not have an active paid plan.

Solution:

  1. Log in to your dashboard at https://htpbe.tech/dashboard
  2. Subscribe to a plan (Starter, Growth, or Pro)
  3. Your API access will resume immediately

403 Forbidden

Access forbidden due to account status.

Inactive Client

{
  "error": "This API key has been deactivated. Please contact support.",
  "code": "inactive_client"
}

Cause: Your API client account has been disabled (usually due to payment issues or terms violation).

Solution: Contact support to reactivate your account.

Test URL Required (test keys only)

{
  "error": "Test API keys can only be used with test URLs. See documentation for available test URLs.",
  "code": "test_url_required",
  "details": "Use URLs like: https://api.htpbe.tech/v1/test/clean.pdf, https://api.htpbe.tech/v1/test/modified-high.pdf, etc."
}

Cause: A test API key (htpbe_test_...) was used with a real URL. Test keys are restricted to the predefined test URLs — they cannot download or analyze real PDFs.

Solution: Use one of the mock test URLs — see testing.md for the full list — or switch to a live API key (htpbe_live_...) to analyze real files.
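
A client-side guard can surface this mistake before the request is sent. The sketch below assumes that all test URLs share the `https://api.htpbe.tech/v1/test/` prefix seen in the error details above; testing.md holds the authoritative list. Both function names are hypothetical.

```python
# Prefix assumed from the example URLs in the test_url_required error details.
TEST_URL_PREFIX = "https://api.htpbe.tech/v1/test/"


def key_environment(api_key: str) -> str:
    """Classify a key as "live" or "test" from its documented prefix."""
    if api_key.startswith("htpbe_live_"):
        return "live"
    if api_key.startswith("htpbe_test_"):
        return "test"
    raise ValueError("API key must start with htpbe_live_ or htpbe_test_")


def check_pairing(api_key: str, url: str) -> None:
    """Raise ValueError if a test key is paired with a non-test URL."""
    if key_environment(api_key) == "test" and not url.startswith(TEST_URL_PREFIX):
        raise ValueError("test keys only work with the predefined test URLs")
```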


413 Payload Too Large

File exceeds size limit.

{
  "error": "File size exceeds limit",
  "code": "file_too_large",
  "details": "Maximum file size is 10 MB, received 15 MB"
}

Cause: PDF file is larger than 10 MB (10,485,760 bytes).

Solution:

  • Compress the PDF using tools like Adobe Acrobat or online compressors
  • Split large PDFs into smaller files
  • Remove high-resolution images
  • Contact support about an Enterprise plan with higher limits
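
If you control the file host, an oversized file can sometimes be detected up front from the Content-Length header of a HEAD response. This is a sketch under that assumption; `exceeds_size_limit` is a hypothetical helper that takes the response headers as a plain mapping, and it returns None when the header is missing or unusable, since not every server reports a length.

```python
from typing import Mapping, Optional

MAX_PDF_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes, the documented limit


def exceeds_size_limit(headers: Mapping[str, str]) -> Optional[bool]:
    """True or False when Content-Length is usable, None when it is unknown."""
    raw = next(
        (value for name, value in headers.items() if name.lower() == "content-length"),
        None,
    )
    if raw is None or not raw.strip().isdigit():
        return None
    return int(raw) > MAX_PDF_BYTES
```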

422 Unprocessable Entity

Request is valid but the content cannot be processed.

Invalid PDF File

{
  "error": "Invalid PDF file",
  "code": "invalid_pdf",
  "details": "PDF header not found or file is corrupted"
}

Common Causes:

  • File is not actually a PDF (wrong file type)
  • PDF file is corrupted or truncated
  • File is encrypted with a password
  • PDF uses unsupported features

Solution:

  • Verify the file opens correctly in a PDF reader
  • Try re-saving the file
  • Remove encryption/password protection
  • If the file is valid, contact support with the file URL for investigation

Analysis Interrupted

{
  "error": "PDF could not be analyzed",
  "code": "invalid_pdf",
  "details": "Analysis was interrupted. The file may be malformed or too complex."
}

Returned when the analysis worker is killed by the runtime — typically because the PDF is structurally pathological (e.g. enormous object graph) and would otherwise exhaust memory. The error code is the same as for malformed PDFs (invalid_pdf), so you do not need to handle it separately.

A second variant signals the analysis time limit:

{
  "error": "PDF could not be analyzed",
  "code": "invalid_pdf",
  "details": "Analysis exceeded the time limit. The file may be too complex."
}

Solution: Retrying the same file is not expected to help. If you believe the file is valid, contact support with the URL.


429 Too Many Requests

The server is processing the maximum number of concurrent PDF analyses and cannot accept another one right now.

Server At Capacity

{
  "error": "Server is at analysis capacity, retry shortly",
  "code": "server_at_capacity",
  "details": "Concurrent analyses in flight: 2 of 2."
}

The response includes a Retry-After header (in seconds) indicating how long to wait before retrying.

Common Causes:

  • A burst of parallel requests against the same API instance
  • A long-running analysis on a particularly complex PDF holding a slot

Solution:

  • Respect the Retry-After header — wait the indicated number of seconds, then retry the same request
  • Use exponential backoff if you receive multiple 429s in a row
  • If you need higher sustained throughput, contact support — Enterprise plans can be deployed with elevated concurrency limits

Note: This is server-wide capacity, not per-API-key rate limiting. The same retry strategy applies for both live and test keys.
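
The retry guidance above can be sketched as a small loop that honors Retry-After and falls back to exponential backoff. The transport is left abstract: `send` is a hypothetical zero-argument callable returning `(status, headers, body)`, so the loop works with any HTTP client. The default attempt count and base delay are assumptions, not recommended values from this API.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def post_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        response = send()
        status, headers, _body = response
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return response
        # Header names are case-insensitive, so normalize before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        hint = lowered.get("retry-after", "")
        hinted = float(hint) if hint.isdigit() else 0.0
        # Wait at least as long as the server asked, growing on repeated 429s.
        sleep(max(hinted, base_delay * 2 ** (attempt - 1)))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.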


500 Internal Server Error

Server-side error during processing.

{
  "error": "Failed to analyze PDF",
  "code": "internal_error"
}

Cause: Unexpected server error. This is rare and usually indicates a bug.

Solution:

  • Retry the request (may be a transient error)
  • If the error persists, contact support with the URL for investigation
  • Check our status page for any ongoing incidents
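
Taken together, the error sections suggest a simple retry policy: retry on capacity and internal errors, and treat everything else as a problem with the request or the file. The sketch below encodes that reading. `should_retry` is a hypothetical helper, and classing download_failed as non-retryable is a judgment call, since a network timeout on the source host could be transient.

```python
# (status, code) pairs that this page describes as worth retrying.
RETRYABLE = {
    (429, "server_at_capacity"),  # wait for Retry-After, then retry
    (500, "internal_error"),      # may be a transient error
}


def should_retry(status: int, code: str) -> bool:
    """Return True if the same request is worth sending again."""
    return (status, code) in RETRYABLE
```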

Testing

Use test API keys (htpbe_test_...) to integrate without consuming quota or analyzing real PDFs. Test keys return deterministic UUID v4 check IDs (e.g., 00000000-0000-4000-8000-000000000001 for clean.pdf, 00000000-0000-4000-8000-000000000005 for modified-high.pdf) for the predefined mock URLs plus an error-trigger URL — no real file is downloaded and no quota is consumed.

See testing.md for the full list of test URLs, synthetic ID table, code examples, and checklist.


Notes

Processing Time

  • Typical: 2-5 seconds for average PDFs (1-20 pages)
  • Larger files: 5-15 seconds for complex PDFs (50-100 pages)
  • Download timeout: 30 seconds. Requests where downloading the source URL exceeds this fail with download_failed (400).
  • Analysis timeout: 20 seconds. Files whose structure takes longer to parse fail with invalid_pdf (422). This is independent of the download timeout.
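
Since analysis is synchronous, the client's own HTTP timeout should be longer than the server's worst case. Assuming the download and analysis phases run one after the other, that worst case is roughly 30 + 20 = 50 seconds. The helper name and the 10-second margin below are illustrative assumptions.

```python
DOWNLOAD_TIMEOUT_S = 30  # documented download timeout
ANALYSIS_TIMEOUT_S = 20  # documented analysis timeout


def client_timeout(margin_s: float = 10.0) -> float:
    """Suggested client-side timeout: both server limits plus a safety margin."""
    return DOWNLOAD_TIMEOUT_S + ANALYSIS_TIMEOUT_S + margin_s
```

With the default margin this yields 60 seconds, which could be passed as the `timeout` argument of your HTTP client.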

Supported PDF Features

  • ✅ PDF versions 1.0 through 2.0
  • ✅ Linearized (Fast Web View) PDFs
  • ✅ PDFs with forms, annotations, and multimedia
  • ✅ Signed PDFs (detects but doesn't validate signatures)

Limitations

  • ❌ Cannot validate cryptographic signatures (only detects presence/removal)
  • ❌ Password-protected PDFs — the password must be removed before submitting for analysis
  • ❌ Cannot analyze files behind authentication
  • ❌ Does not perform OCR or content analysis
  • ❌ Does not detect visual changes (only structural/metadata changes)

Related Endpoints: