SpamBlock Pixel Documentation

This guide walks you through installing the SpamBlock pixel, understanding the scoring pipeline, and tuning the available configuration options.

Quick start

Add the script. Place the pixel on every page that contains a form you want protected.
Add data-block-spam to the forms you want enforced. If no forms have the attribute, SpamBlock will protect all forms on the page.
Submit the form. SpamBlock intercepts the submission, runs 13+ detection signals, and only forwards the request when the score is below the configured threshold.

<script
  src="https://api.spamblock.io/sdk/pixel/v1.js"
  defer
></script>
<form data-block-spam>
  ... your existing fields ...
</form>

Detection signals

Core signals (always enabled):

Honeypot field detection (+90 points, instant block)
Disposable email domains (+40 points)
IP reputation checks (+100 if denied)
Tor exit nodes (+35 points)

Advanced signals (enabled by default):

Language detection & mismatch analysis (+30 max)
Entropy & repetition detection (+30 max)
Header analysis (User-Agent, Referer, ASN) (+25 max)
Intent classification (spam keywords) (+15 per keyword)
Timing & behavioral analysis (+20 max)
Script detection & homoglyph analysis (+30 max)
Profanity detection (+30 per word)

The default threshold is 60. Submissions with a score ≥ the threshold are blocked. Core signals are always enabled; advanced signals can be toggled via the feature toggles in the configuration reference.

Configuration reference

Set configuration via data-* attributes directly on the script tag. All attributes are optional with sensible defaults.

Core Configuration

Attribute	Default	Description
`data-max-score`	60	Block threshold. Requests with a score ≥ this value are blocked.
`data-debug`	false	Logs detailed scoring output to the browser console; useful during testing.

Feature Toggles

Attribute	Default	Description
`data-assess-profanity`	true	Enable/disable profanity detection (+30 per word).
`data-assess-language`	true	Enable/disable language detection (page vs text mismatch, +30 max).
`data-language-weight`	10	Points added when a language mismatch is detected.
`data-expected-languages`	""	Comma-separated language codes (e.g., `"en,de"`) for multilingual sites.
`data-assess-entropy`	true	Enable/disable entropy analysis (randomness detection, +30 max).
`data-assess-email-address`	false	Send full email addresses (not just domains) for username entropy analysis. Shows a console warning for DPA transparency.
`data-assess-timing`	true	Enable/disable timing and behavioral analysis (fast submission, interaction tracking, +20 max).
`data-assess-scripts`	true	Enable/disable script detection and homoglyph analysis (Unicode analysis, +30 max).

Geo Configuration

Attribute	Default	Description
`data-block-geo`	""	Comma-separated ISO country codes (e.g., `"RU,CN"`) to block instantly (+100, hard block).
`data-allow-geo`	""	Comma-separated ISO country codes (e.g., `"US,CA,GB"`) to allowlist. Takes precedence over `data-block-geo` (+100 if not in allowlist, hard block).

<script
  src="https://api.spamblock.io/sdk/pixel/v1.js"
  data-max-score="60"
  data-assess-profanity="true"
  data-assess-language="true"
  data-assess-entropy="true"
  data-assess-email-address="false"
  data-assess-timing="true"
  data-assess-scripts="true"
  data-expected-languages="en,de"
  data-language-weight="10"
  data-debug="false"
  data-block-geo=""
  data-allow-geo=""
  defer
></script>
<form data-block-spam>
  ... your existing fields ...
</form>
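
The defaults in the tables above behave like a fallback layer under whatever the script tag sets. The following is a minimal sketch of that idea, not the pixel's actual source: `DEFAULTS`, `parseList`, `resolveConfig`, and `geoPenalty` are hypothetical names, and the geo logic assumes that a non-empty allowlist is applied on its own.

```javascript
// Hypothetical sketch: merge data-* attributes with the documented defaults.
// None of these names are part of the SpamBlock SDK.
const DEFAULTS = {
  maxScore: 60,
  debug: false,
  assessProfanity: true,
  assessLanguage: true,
  languageWeight: 10,
  expectedLanguages: [],
  assessEntropy: true,
  assessEmailAddress: false,
  assessTiming: true,
  assessScripts: true,
  blockGeo: [],
  allowGeo: [],
};

// Split "en, de" into ["en", "de"]; an empty string yields [].
function parseList(value) {
  return value.split(",").map((item) => item.trim()).filter(Boolean);
}

// `dataset` mirrors HTMLScriptElement.dataset: data-max-score becomes
// dataset.maxScore, and every value arrives as a string.
function resolveConfig(dataset) {
  const config = { ...DEFAULTS };
  for (const [key, raw] of Object.entries(dataset)) {
    const known = Object.prototype.hasOwnProperty.call(DEFAULTS, key);
    if (!known || raw.trim() === "") continue; // keep the default
    const fallback = DEFAULTS[key];
    if (typeof fallback === "boolean") {
      config[key] = raw === "true";
    } else if (typeof fallback === "number") {
      const parsed = Number(raw);
      config[key] = Number.isFinite(parsed) ? parsed : fallback;
    } else {
      config[key] = parseList(raw);
    }
  }
  return config;
}

// Assumption: when an allowlist is configured it is used on its own, and the
// blocklist is only consulted when no allowlist is set.
function geoPenalty(country, config) {
  const code = country.toUpperCase();
  const listed = (list) => list.some((item) => item.toUpperCase() === code);
  if (config.allowGeo.length > 0) return listed(config.allowGeo) ? 0 : 100;
  return listed(config.blockGeo) ? 100 : 0;
}
```

In a browser, the input would typically be `document.currentScript.dataset`; passing a plain object such as `{ maxScore: "75" }` works the same way for experimentation.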

What happens on submit?

SpamBlock intercepts: The script listens for form submission events and prevents default behavior.
Data collection: Collects form data (email domain, text fields), timing metrics (time to submit, per-field dwell times), interaction patterns (keyboard/mouse events), and page metadata (language, headers).
API request: Sends collected data to /v1/check for scoring.
Scoring: The Worker evaluates 13+ signals, including language detection, entropy analysis, behavioral tracking, and script detection. Each signal contributes points to a total score.
Decision: The Worker responds with { allow: true/false, score: 0-100, reasons: [...], latencyMs: 123 }.
Action: If allow: true, the form submits normally. If allow: false, the submission is blocked and a custom event is emitted for UI handling.
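
The Decision and Action steps can be sketched as a small function that turns the response into an action. This is an illustration only: the response fields come from the shape shown above, while `interpretCheckResponse` and the returned `action` values are hypothetical. A malformed or missing response falls back to the fail-open values described under Marker Behavior.

```javascript
// Hypothetical sketch of the Decision and Action steps.
// `response` is the parsed JSON body from /v1/check, or null if the
// request failed.
function interpretCheckResponse(response) {
  const wellFormed =
    response !== null &&
    typeof response === "object" &&
    typeof response.allow === "boolean" &&
    typeof response.score === "number" &&
    Array.isArray(response.reasons);

  if (!wellFormed) {
    // Fail open: let the form through with a score of 0 and record why.
    return { action: "submit", score: 0, reasons: ["error_fail_open"] };
  }

  return {
    action: response.allow ? "submit" : "block",
    score: response.score,
    reasons: response.reasons,
  };
}
```

For example, `interpretCheckResponse({ allow: false, score: 90, reasons: ["hp_filled"], latencyMs: 12 })` yields an `action` of `"block"`, while `interpretCheckResponse(null)` yields `"submit"` with the `error_fail_open` reason.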

Scoring Categories

SpamBlock uses category caps to prevent single signals from dominating:

Content-based: +30 max (profanity, spam keywords)
Language & Script: +30 max (language detection, script analysis)
Entropy & Structure: +30 max (entropy, repetition)
Headers & IP: +25 max (User-Agent, ASN, reputation)
Timing & Behavioral: +20 max (timing, interactions)
Geo: +100 (hard block if violated)
Honeypot: +90 (highest priority after geo)
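
To make the effect of the caps concrete, here is a minimal sketch of capped scoring. It assumes that points are summed per category, capped, and then added together, and that the total is clamped to the 0-100 range reported in the response; the actual aggregation is not documented here. `CATEGORY_CAPS`, `totalScore`, and the category keys are hypothetical names.

```javascript
// Caps taken from the list above. Geo and honeypot are fixed amounts rather
// than caps, so any category without an entry here is added as-is.
const CATEGORY_CAPS = {
  content: 30,
  languageScript: 30,
  entropyStructure: 30,
  headersIp: 25,
  timingBehavioral: 20,
};

// `signals` is a list of { category, points } entries.
function totalScore(signals, maxScore = 60) {
  // Sum raw points per category.
  const perCategory = {};
  for (const { category, points } of signals) {
    perCategory[category] = (perCategory[category] || 0) + points;
  }

  // Apply each category's cap, then add the categories together.
  let score = 0;
  for (const [category, points] of Object.entries(perCategory)) {
    const cap = CATEGORY_CAPS[category];
    score += cap === undefined ? points : Math.min(points, cap);
  }

  score = Math.min(score, 100); // assumption: totals are clamped to 100
  return { score, allow: score < maxScore };
}
```

Under these assumptions, two profane words plus a spam keyword (30 + 30 + 15 raw points) contribute only 30 in the content category, so content alone cannot reach the default threshold of 60. A second category, such as an entropy hit of 30, is needed to block the submission.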

Server-side integration

No backend changes are required. SpamBlock replays the submission when it passes the score threshold. To inspect results, enable debug logging or forward the console output into your monitoring pipeline.

Downstream Filtering & Marker Fields

When a form submission is allowed (or fails open), SpamBlock automatically injects hidden marker fields into the form before submission. These markers act as authenticity indicators for downstream processing in email systems, CRMs, Zapier/Make automations, and other form handlers.

Marker Fields

The following hidden fields are injected into forms before submission:

Field Name	Description	Example
`_sb_v`	Marker schema version	1
`_sb_allow`	Final allow/block decision	true
`_sb_score`	Final numeric score (0-100)	37
`_sb_reasons`	Comma-separated reason codes	hp_filled,profanity
`_sb_ts`	UTC timestamp of evaluation (ISO 8601)	2025-02-14T10:03:11Z

Marker Behavior

Markers are injected only when the form is actually submitted. If SpamBlock blocks the submission, no markers are added because the form never submits.
Fail-open cases: If SpamBlock encounters an error (network failure, API error, etc.), markers are still injected with _sb_allow="true", _sb_score="0", and _sb_reasons including error_fail_open plus any issues discovered during processing.
Missing markers: If a form submission arrives without SpamBlock marker fields, it indicates the form was submitted before the pixel could evaluate it (e.g., bot bypass, script blocker, no JavaScript). This should be treated as higher risk.

Server-Side Validation

On your server, check for marker presence:

Marker present with _sb_allow="true": Submission was evaluated and allowed by SpamBlock.
Marker present with _sb_allow="true" and error_fail_open in reasons: Submission was allowed due to fail-open (error occurred, but form was allowed to proceed).
Marker missing: Form was submitted without SpamBlock evaluation (bot bypass, script blocker, etc.). Treat as suspicious (+30 risk score recommended).

Note: The absence of markers is a strong indicator of bot/bypass activity, as legitimate users whose browsers load and run the pixel will always have markers injected before submission.
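
The three cases above can be sketched as a small server-side helper. The field names come from the Marker Fields table; `classifyMarkers`, the `status` labels, and the `extraRisk` field are hypothetical, and the sketch assumes the handler receives a URL-encoded form body.

```javascript
// Hypothetical server-side helper: classify a URL-encoded form body by its
// SpamBlock marker fields.
function classifyMarkers(body) {
  const params = new URLSearchParams(body);

  // No markers: the form was submitted without SpamBlock evaluation.
  if (!params.has("_sb_allow")) {
    return { status: "missing", score: null, reasons: [], extraRisk: 30 };
  }

  const reasons = (params.get("_sb_reasons") || "")
    .split(",")
    .map((reason) => reason.trim())
    .filter(Boolean);
  const score = Number(params.get("_sb_score"));
  const allowed = params.get("_sb_allow") === "true";

  if (allowed && reasons.includes("error_fail_open")) {
    return { status: "fail_open", score, reasons, extraRisk: 0 };
  }
  if (allowed) {
    return { status: "allowed", score, reasons, extraRisk: 0 };
  }
  // Blocked forms never submit, so any other value is unexpected.
  return { status: "unexpected", score, reasons, extraRisk: 30 };
}
```

A handler could then route `missing` and `unexpected` submissions to a review queue, while passing `allowed` ones straight through. The +30 for the `unexpected` case is an assumption that mirrors the recommendation for missing markers.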

Detection Signals Explained

Content & Structure Signals

Language Detection: Detects mismatches between page language, browser language, and detected text language. Useful for catching Russian spam on English sites.
Entropy Analysis: Detects random strings (high entropy) and repetitive junk (low entropy). Flags suspicious patterns like x8q2m6k9p4r7t1 or !!!!!!.
Header Analysis: Flags suspicious User-Agents (curl, python-requests), missing Referers, and hosting ASNs (AWS, DigitalOcean, etc.).
Intent Classification: Matches text against known spam/scam keywords (viagra, free money, etc.).
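
As an illustration of the entropy idea, the sketch below computes Shannon entropy in bits per character. It is a generic textbook measure, not SpamBlock's documented algorithm, and `shannonEntropy` is a hypothetical name. No thresholds are shown because the ones SpamBlock uses are not stated here.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)),
// where p is the relative frequency of each distinct character.
function shannonEntropy(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;

  const counts = new Map();
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }

  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

With this measure, `!!!!!!` scores 0 because it contains a single repeated character, while `x8q2m6k9p4r7t1` scores log2(14), about 3.81 bits, because all 14 characters are distinct. Ordinary prose can also score fairly high per character, so a practical detector would likely combine this with length, repetition, or dictionary checks.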

Behavioral Signals

Timing Analysis: Detects fast submissions (<1.2s) and rapid field filling (<100ms per field), indicating bot automation.
Interaction Tracking: Flags missing keyboard/mouse events when the form took >500ms to complete, indicating script-based filling.
Focus Patterns: Detects suspicious focus patterns (all fields filled without focus events in <1s).
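
The thresholds quoted above can be combined into a simple rule set. This is a sketch under stated assumptions: the input shape, the flag names, and the choice to require every field to be under 100ms are hypothetical, and only the numeric thresholds come from the list above.

```javascript
// Hypothetical sketch of the behavioral checks.
//   totalMs:      time from first interaction to submit
//   fieldDwellMs: time spent in each field, in milliseconds
//   inputEvents:  number of keyboard and mouse events observed
//   focusEvents:  number of focus events observed
function timingFlags({ totalMs, fieldDwellMs, inputEvents, focusEvents }) {
  const flags = [];
  const hasFields = fieldDwellMs.length > 0;

  // Fast submission: under 1.2 seconds overall.
  if (totalMs < 1200) flags.push("fast_submit");

  // Rapid field filling: assumed to mean every field took under 100ms.
  if (hasFields && fieldDwellMs.every((ms) => ms < 100)) {
    flags.push("rapid_fields");
  }

  // No keyboard or mouse events although the form took over 500ms.
  if (totalMs > 500 && inputEvents === 0) flags.push("no_interaction");

  // Fields filled without any focus events in under 1 second.
  if (hasFields && totalMs < 1000 && focusEvents === 0) flags.push("no_focus");

  return flags;
}
```

A scripted submission that fills two fields in 800ms without any events would raise all four flags, while a form completed in 15 seconds with normal typing would raise none.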

Advanced Unicode Analysis

Script Detection: Detects unexpected Unicode scripts (e.g., Cyrillic on English sites) and mixed-script confusables (e.g., frеe саѕh with Cyrillic characters).
Homoglyph Detection: Flags homoglyph substitutions in short fields (e.g., Jоhn with Cyrillic 'о' instead of Latin 'o').
RTL Control Characters: Detects bidirectional control characters used for obfuscation.
Emoji Density: Flags high emoji/symbol density (>10%) in fields where uncommon.
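
Mixed-script and bidirectional-control checks can be approximated with Unicode property escapes in regular expressions. The sketch below only covers Latin mixed with Cyrillic, which is the example given above; the function names are hypothetical and SpamBlock's own checks are likely broader.

```javascript
// Unicode property escapes require the `u` flag.
const LATIN = /\p{Script=Latin}/u;
const CYRILLIC = /\p{Script=Cyrillic}/u;

// Bidirectional control characters: LRM, RLM, ALM, the embedding and
// override controls (U+202A to U+202E), and the isolates (U+2066 to U+2069).
const BIDI_CONTROLS = /[\u200E\u200F\u061C\u202A-\u202E\u2066-\u2069]/u;

// True if any single word mixes Latin and Cyrillic letters, which is the
// typical shape of a homoglyph substitution.
function hasMixedScriptWord(text) {
  return text
    .split(/\s+/)
    .some((word) => LATIN.test(word) && CYRILLIC.test(word));
}

// True if the text contains any bidirectional control character.
function hasBidiControls(text) {
  return BIDI_CONTROLS.test(text);
}
```

For instance, `hasMixedScriptWord("J\u043Ehn")` returns `true` because U+043E is the Cyrillic letter that resembles Latin "o", whereas a word written entirely in Cyrillic or entirely in Latin returns `false`.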

Tips & Best Practices

Testing: Enable data-debug="true" and try an email address at a disposable domain, spam phrases, or fast submissions.
Multilingual Sites: Set data-expected-languages="en,de" to help language and script detection.
Tuning: Adjust data-max-score based on your false positive rate. Start with 60; raise it if you see too many false positives, and lower it if spam is getting through.
Analytics: Capture the log output to monitor performance and tune thresholds.
Multiple forms: Add data-block-spam only where needed if you have forms that should remain untouched.
Performance: Disable timing/behavioral analysis (data-assess-timing="false") if you need to reduce payload size.