Implementation reference for the SpamBlock pixel

SpamBlock Pixel Documentation

This guide walks you through installing the SpamBlock pixel, understanding the scoring pipeline, and tuning the available configuration options.

Quick start

  1. Add the script. Place the pixel on every page that contains a form you want protected.
  2. Add data-block-spam to the forms you want protected. If no form on the page has the attribute, SpamBlock protects every form on the page.
  3. Submit the form. SpamBlock intercepts the submission, runs 13+ detection signals, and only forwards the request when the score is below the configured threshold.
<script
  src="https://api.spamblock.io/sdk/pixel/v1.js"
  defer
></script>

<form data-block-spam>   ... your existing fields ... </form>
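
The opt-in rule from step 2 can be sketched as a small selection function. This is an illustration of the documented behavior, not the pixel's actual source; the function name `selectProtectedForms` is hypothetical.

```javascript
// Sketch of the opt-in rule: forms carrying data-block-spam are protected;
// if none carry it, every form is protected.
// `forms` is any array of elements exposing hasAttribute(), for example
// Array.from(document.forms) in a browser.
function selectProtectedForms(forms) {
  const optedIn = forms.filter((form) => form.hasAttribute("data-block-spam"));
  // No form opted in: fall back to protecting all forms on the page.
  return optedIn.length > 0 ? optedIn : forms;
}
```

In practice this means that adding the attribute to one form leaves every other form on the page unprotected.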

Detection signals

Core signals (always enabled):

  • Honeypot field detection (+90 points, instant block)
  • Disposable email domains (+40 points)
  • IP reputation checks (+100 if denied)
  • Tor exit nodes (+35 points)

Advanced signals (enabled by default):

  • Language detection & mismatch analysis (+30 max)
  • Entropy & repetition detection (+30 max)
  • Header analysis (User-Agent, Referer, ASN) (+25 max)
  • Intent classification (spam keywords) (+15 per keyword)
  • Timing & behavioral analysis (+20 max)
  • Script detection & homoglyph analysis (+30 max)
  • Profanity detection (+30 per word)

The default threshold is 60; submissions with a score ≥ the threshold are blocked. Advanced signals can be toggled individually via configuration; core signals are always enabled.

Configuration reference

Set configuration via data-* attributes directly on the script tag. All attributes are optional with sensible defaults.

Core Configuration

Attribute | Default | Description
--- | --- | ---
data-max-score | 60 | Block threshold. Submissions with a score ≥ this value are blocked.
data-debug | false | Logs detailed scoring output to the browser console; useful during testing.

Feature Toggles

Attribute | Default | Description
--- | --- | ---
data-assess-profanity | true | Enable/disable profanity detection (+30 per word).
data-assess-language | true | Enable/disable language detection (page vs. text mismatch, +30 max).
data-language-weight | 10 | Points added for a language mismatch.
data-expected-languages | "" | Comma-separated language codes (e.g., "en,de") for multilingual sites.
data-assess-entropy | true | Enable/disable entropy analysis (randomness detection, +30 max).
data-assess-email-address | false | Send full email addresses (not just domains) for username entropy analysis. Shows a console warning for DPA transparency.
data-assess-timing | true | Enable/disable timing and behavioral analysis (fast submission, interaction tracking, +20 max).
data-assess-scripts | true | Enable/disable script detection and homoglyph analysis (Unicode analysis, +30 max).

Geo Configuration

Attribute | Default | Description
--- | --- | ---
data-block-geo | "" | Comma-separated ISO country codes (e.g., "RU,CN") to block instantly (+100, hard block).
data-allow-geo | "" | Comma-separated ISO country codes (e.g., "US,CA,GB") to allowlist. Countries outside the allowlist are blocked instantly (+100, hard block). Takes precedence over data-block-geo.
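
The interaction between the two geo attributes can be sketched as follows. This is one reading of "takes precedence", assuming the block list is ignored whenever an allowlist is set and that an unknown country fails the allowlist; the function name `geoPenalty` and its argument shape are hypothetical.

```javascript
// Sketch of geo precedence. Returns the points added by the geo check:
// 100 (hard block) or 0.
function geoPenalty(country, { allowGeo = "", blockGeo = "" } = {}) {
  const parse = (csv) =>
    csv.split(",").map((code) => code.trim().toUpperCase()).filter(Boolean);
  const allow = parse(allowGeo);
  const block = parse(blockGeo);
  const code = String(country || "").toUpperCase();

  if (allow.length > 0) {
    // Assumption: with an allowlist set, the block list is not consulted.
    return allow.includes(code) ? 0 : 100;
  }
  return block.includes(code) ? 100 : 0;
}
```

Under these assumptions, `geoPenalty("RU", { allowGeo: "US,CA", blockGeo: "RU" })` and `geoPenalty("RU", { blockGeo: "RU,CN" })` both yield a hard block, while `geoPenalty("US", { allowGeo: "US", blockGeo: "US" })` yields 0.
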
<script
  src="https://api.spamblock.io/sdk/pixel/v1.js"
  data-max-score="60"
  data-assess-profanity="true"
  data-assess-language="true"
  data-assess-entropy="true"
  data-assess-email-address="false"
  data-assess-timing="true"
  data-assess-scripts="true"
  data-expected-languages="en,de"
  data-language-weight="10"
  data-debug="false"
  data-block-geo=""
  data-allow-geo=""
  defer
></script>

<form data-block-spam>   ... your existing fields ... </form>
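
To make the defaults in the tables above concrete, here is a minimal sketch of how data-* attributes could be turned into a typed configuration object. It assumes a `dataset`-style object with camelCase keys (as `document.currentScript.dataset` provides in browsers); `DEFAULTS` and `readConfig` are hypothetical names, and the pixel's real parser may differ.

```javascript
// Defaults copied from the configuration tables.
const DEFAULTS = {
  maxScore: 60,
  debug: false,
  assessProfanity: true,
  assessLanguage: true,
  languageWeight: 10,
  expectedLanguages: [],
  assessEntropy: true,
  assessEmailAddress: false,
  assessTiming: true,
  assessScripts: true,
  blockGeo: [],
  allowGeo: [],
};

// Sketch: coerce string attribute values to the type of their default.
function readConfig(dataset = {}) {
  const config = {};
  for (const [key, fallback] of Object.entries(DEFAULTS)) {
    const raw = dataset[key];
    if (raw === undefined || raw === "") {
      // Missing or empty attribute: use the default (copy arrays).
      config[key] = Array.isArray(fallback) ? [...fallback] : fallback;
    } else if (typeof fallback === "boolean") {
      config[key] = raw === "true";
    } else if (typeof fallback === "number") {
      const parsed = Number(raw);
      config[key] = Number.isFinite(parsed) ? parsed : fallback;
    } else {
      // Comma-separated lists such as "en,de" or "RU,CN".
      config[key] = raw.split(",").map((item) => item.trim()).filter(Boolean);
    }
  }
  return config;
}
```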

What happens on submit?

  1. SpamBlock intercepts: The script listens for form submission events and prevents default behavior.
  2. Data collection: Collects form data (email domain, text fields), timing metrics (time to submit, per-field dwell times), interaction patterns (keyboard/mouse events), and page metadata (language, headers).
  3. API request: Sends collected data to /v1/check for scoring.
  4. Scoring: Worker evaluates 13+ signals including language detection, entropy analysis, behavioral tracking, script detection, and more. Each signal contributes points to a total score.
  5. Decision: Worker responds with { allow: true/false, score: 0-100, reasons: [...], latencyMs: 123 }.
  6. Action: If allow is true, the form submits normally. If allow is false, the submission is blocked and a custom event is emitted for UI handling.
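
Steps 3 to 6 can be sketched as a single decision function. The sketch assumes a caller-supplied async `check` function that sends the payload to /v1/check and resolves with the decision object; the request format is not documented here, so it is left out. The name `decideAction` and the `"submit"`/`"block"` labels are hypothetical. The fail-open branch follows the behavior described under Marker Behavior.

```javascript
// Sketch of the submit flow: ask the scoring endpoint, then submit or block.
// `check` is injected so the network call can be swapped out or mocked.
async function decideAction(payload, check) {
  try {
    const decision = await check(payload);
    return { action: decision.allow ? "submit" : "block", decision };
  } catch (error) {
    // Fail open: on a network or API error, let the form through and
    // record why, so downstream handlers can see it was not scored.
    return {
      action: "submit",
      decision: { allow: true, score: 0, reasons: ["error_fail_open"] },
    };
  }
}
```

Injecting `check` keeps the decision logic independent of the transport, which also makes the fail-open path easy to exercise in tests.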

Scoring Categories

SpamBlock uses category caps to prevent single signals from dominating:

  • Content-based: +30 max (profanity, spam keywords)
  • Language & Script: +30 max (language detection, script analysis)
  • Entropy & Structure: +30 max (entropy, repetition)
  • Headers & IP: +25 max (User-Agent, ASN, reputation)
  • Timing & Behavioral: +20 max (timing, interactions)
  • Geo: +100 (hard block if violated)
  • Honeypot: +90 (highest priority after geo)
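
The effect of category caps can be illustrated with a short sketch. The category keys, the `totalScore` name, and the clamp to 100 (inferred from the documented 0-100 score range) are assumptions; geo and honeypot are treated as uncapped because they are hard-block signals.

```javascript
// Caps taken from the list above; keys are hypothetical identifiers.
const CATEGORY_CAPS = {
  content: 30,
  languageScript: 30,
  entropy: 30,
  headersIp: 25,
  timing: 20,
};

// Sketch: sum points per category, apply each cap, then compare to threshold.
function totalScore(signals, threshold = 60) {
  const perCategory = {};
  for (const { category, points } of signals) {
    perCategory[category] = (perCategory[category] || 0) + points;
  }
  let score = 0;
  for (const [category, points] of Object.entries(perCategory)) {
    const cap = CATEGORY_CAPS[category];
    // Categories without a cap (geo, honeypot) count in full.
    score += cap === undefined ? points : Math.min(points, cap);
  }
  score = Math.min(score, 100); // assumed clamp to the 0-100 range
  return { score, allow: score < threshold };
}
```

For example, two profanity hits (+30 each) and one spam keyword (+15) total 75 raw points but contribute only 30 after the content cap; combined with a maxed timing signal (+20) the score is 50, which stays below the default threshold of 60.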

Server-side integration

No backend changes are required. SpamBlock replays the submission when its score is below the threshold. To inspect results, enable debug logging or forward the console output into your monitoring pipeline.

Downstream Filtering & Marker Fields

When a form submission is allowed (or fails open), SpamBlock automatically injects hidden marker fields into the form before submission. These markers act as authenticity indicators for downstream processing in email systems, CRMs, Zapier/Make automations, and other form handlers.

Marker Fields

The following hidden fields are injected into forms before submission:

Field Name | Description | Example
--- | --- | ---
_sb_v | Marker schema version | 1
_sb_allow | Final allow/block decision | true
_sb_score | Final numeric score (0-100) | 37
_sb_reasons | Comma-separated reason codes | hp_filled,profanity
_sb_ts | UTC timestamp of evaluation (ISO 8601) | 2025-02-14T10:03:11Z

Marker Behavior

  • Markers are only injected when the form is actually submitted. If SpamBlock blocks the submission, the form does not submit and no markers are added.
  • Fail-open cases: If SpamBlock encounters an error (network failure, API error, etc.), markers are still injected with _sb_allow="true", _sb_score="0", and _sb_reasons including error_fail_open plus any issues discovered during processing.
  • Missing markers: If a form submission arrives without SpamBlock marker fields, it indicates the form was submitted before the pixel could evaluate it (e.g., bot bypass, script blocker, no JavaScript). This should be treated as higher risk.
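
As an illustration of the marker format, the sketch below maps a decision object to the five marker fields. The function name `buildMarkers` and the constant `FAIL_OPEN_DECISION` are hypothetical, and dropping milliseconds from the timestamp is an assumption made to match the example value in the table.

```javascript
// Decision used when scoring fails (values from the fail-open rule above).
const FAIL_OPEN_DECISION = { allow: true, score: 0, reasons: ["error_fail_open"] };

// Sketch: turn a decision into the hidden-field name/value pairs.
// Form field values are strings, so everything is stringified.
function buildMarkers(decision, now = new Date()) {
  return {
    _sb_v: "1",
    _sb_allow: String(decision.allow),
    _sb_score: String(decision.score),
    _sb_reasons: (decision.reasons || []).join(","),
    // toISOString() includes milliseconds; strip them (assumed format).
    _sb_ts: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```

In a browser, each pair would become an `<input type="hidden">` appended to the form just before it is submitted.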

Server-Side Validation

On your server, check for marker presence:

  • Marker present with _sb_allow="true": Submission was evaluated and allowed by SpamBlock.
  • Marker present with _sb_allow="true" and error_fail_open in reasons: Submission was allowed due to fail-open (error occurred, but form was allowed to proceed).
  • Marker missing: Form was submitted without SpamBlock evaluation (bot bypass, script blocker, etc.). Treat as suspicious (+30 risk score recommended).

Note: The absence of markers is a strong indicator of bot/bypass activity, as legitimate users with JavaScript enabled will always have markers injected before submission.
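
A server-side handler could classify incoming submissions along these lines. This is a minimal sketch assuming the form body has already been parsed into a plain object of string fields; the function name, the status labels, and the handling of unexpected marker values are assumptions, while the +30 adjustment for missing markers comes from the recommendation above.

```javascript
// Sketch: classify a parsed form body by its SpamBlock markers.
function classifySubmission(fields) {
  if (!fields || fields._sb_allow === undefined) {
    // No markers: the pixel never evaluated this submission.
    return { status: "missing", riskDelta: 30 };
  }
  const reasons = String(fields._sb_reasons || "")
    .split(",")
    .map((reason) => reason.trim())
    .filter(Boolean);
  if (fields._sb_allow === "true" && reasons.includes("error_fail_open")) {
    return { status: "fail_open", riskDelta: 0 };
  }
  if (fields._sb_allow === "true") {
    return { status: "allowed", riskDelta: 0 };
  }
  // Assumption: any other marker value is treated like a missing marker,
  // since blocked submissions should never reach the server.
  return { status: "unexpected", riskDelta: 30 };
}
```

The returned `riskDelta` can be added to whatever risk score your own pipeline maintains.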

Detection Signals Explained

Content & Structure Signals

  • Language Detection: Detects mismatches between page language, browser language, and detected text language. Useful for catching Russian spam on English sites.
  • Entropy Analysis: Detects random strings (high entropy) and repetitive junk (low entropy). Flags suspicious patterns like x8q2m6k9p4r7t1 or !!!!!!.
  • Header Analysis: Flags suspicious User-Agents (curl, python-requests), missing Referers, and hosting ASNs (AWS, DigitalOcean, etc.).
  • Intent Classification: Matches text against known spam/scam keywords (viagra, free money, etc.).
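
To show what "high entropy" and "low entropy" mean for the two example strings above, here is a sketch of Shannon entropy measured in bits per character. SpamBlock's actual entropy measure and cut-off values are not documented here, so this only illustrates the general technique.

```javascript
// Sketch: Shannon entropy of a string, in bits per character.
function shannonEntropy(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

For `"!!!!!!"` the entropy is 0 because a single character repeats, while `"x8q2m6k9p4r7t1"` has 14 distinct characters and reaches log2(14), roughly 3.81 bits per character, the maximum for a string of that length.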

Behavioral Signals

  • Timing Analysis: Detects fast submissions (<1.2s) and rapid field filling (<100ms per field), indicating bot automation.
  • Interaction Tracking: Flags missing keyboard/mouse events when the form took >500ms to complete, indicating script-based filling.
  • Focus Patterns: Detects suspicious focus patterns (all fields filled without focus events in <1s).
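
The thresholds above translate into simple checks on the collected metrics. The sketch below uses the documented limits (1.2 s, 100 ms, 500 ms); the input shape, the flag names, and the choice to require every field to be under 100 ms are assumptions.

```javascript
// Sketch: derive behavioral flags from timing metrics (all in milliseconds).
function timingFlags({ timeToSubmitMs, fieldDwellMs = [], interactionEvents = 0 }) {
  const flags = [];
  // Whole form completed in under 1.2 seconds.
  if (timeToSubmitMs < 1200) flags.push("fast_submit");
  // Assumption: "rapid fill" means every field took under 100 ms.
  if (fieldDwellMs.length > 0 && fieldDwellMs.every((ms) => ms < 100)) {
    flags.push("rapid_fill");
  }
  // Form took more than 500 ms yet no keyboard/mouse events were seen.
  if (timeToSubmitMs > 500 && interactionEvents === 0) {
    flags.push("no_interaction");
  }
  return flags;
}
```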

Advanced Unicode Analysis

  • Script Detection: Detects unexpected Unicode scripts (e.g., Cyrillic on English sites) and mixed-script confusables (e.g., frеe саѕh with Cyrillic characters).
  • Homoglyph Detection: Flags homoglyph substitutions in short fields (e.g., Jоhn with Cyrillic 'о' instead of Latin 'o').
  • RTL Control Characters: Detects bidirectional control characters used for obfuscation.
  • Emoji Density: Flags high emoji/symbol density (>10%) in fields where such characters are uncommon.
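
Several of these checks can be approximated with Unicode property escapes in regular expressions. The sketch below covers mixed Latin/Cyrillic words, bidirectional control characters, and emoji density; it does not cover the "unexpected script for this site" check, counts only pictographic characters rather than all symbols, and uses hypothetical flag names.

```javascript
// Bidirectional marks, embeddings, overrides, and isolates.
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/;

// Sketch: flag suspicious Unicode usage in a text field.
function unicodeFlags(text) {
  const flags = [];
  // A single word mixing Latin and Cyrillic letters suggests homoglyphs.
  const mixed = text
    .split(/\s+/)
    .some((word) => /\p{Script=Latin}/u.test(word) && /\p{Script=Cyrillic}/u.test(word));
  if (mixed) flags.push("mixed_script");

  if (BIDI_CONTROLS.test(text)) flags.push("bidi_control");

  const chars = Array.from(text);
  const pictographs = chars.filter((ch) => /\p{Extended_Pictographic}/u.test(ch)).length;
  // Density threshold of 10% taken from the list above.
  if (chars.length > 0 && pictographs / chars.length > 0.1) {
    flags.push("emoji_density");
  }
  return flags;
}
```

With this sketch, `unicodeFlags("J\u043Ehn")` (Cyrillic "о" in place of Latin "o") reports `mixed_script`, while plain `"John"` reports nothing.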

Tips & Best Practices

  • Testing: Enable data-debug="true" and try an address on a disposable email domain, spam phrases, or fast submissions.
  • Multilingual Sites: Set data-expected-languages="en,de" to help language and script detection.
  • Tuning: Adjust data-max-score based on your false positive rate. Start at 60; raise it if legitimate submissions are being blocked, and lower it if spam is getting through.
  • Analytics: Capture the log output to monitor performance and tune thresholds.
  • Multiple forms: Add data-block-spam only where needed if you have forms that should remain untouched.
  • Performance: Disable timing/behavioral analysis (data-assess-timing="false") if you need to reduce payload size.