Question 1

What is AI data redaction and why do I need it?

Accepted Answer

AI data redaction is the process of detecting and masking sensitive fields (SSNs, payment data, PHI, secrets, internal identifiers) in any payload that is about to be sent to an LLM. Without it, any prompt that contains business data exports that data, unmasked, to OpenAI, Anthropic, Google, or Microsoft. PortEden's redaction engine runs in your perimeter, replaces sensitive values with reversible placeholders, and only the sanitized payload reaches the model. The AI gets the structure it needs to be useful; the original values stay inside your perimeter.

Question 2

Which identifiers does PortEden's redaction engine cover by default?

Accepted Answer

Out of the box: all 18 HIPAA Safe Harbor identifiers (names, geographic subdivisions smaller than a state, dates, phone numbers, fax numbers, email addresses, SSNs, MRNs, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle IDs, device IDs, URLs, IP addresses, biometric IDs, full-face photos, and any other unique identifying number or code), PCI-DSS payment data (PAN, expiry, CVV, IBAN, routing numbers), GDPR Article 4 personal data (names, government IDs, online identifiers, location, biometrics), and common secrets (API keys, JWTs, OAuth tokens, AWS access keys, GitHub PATs, private keys). You can extend the rule set with regex patterns, allow-lists, and custom NER labels for proprietary identifiers like matter numbers, claim IDs, or internal SKUs.

Question 3

How does redaction work technically — regex, ML, or both?

Accepted Answer

Both, layered. Regex patterns catch deterministic formats (SSN, credit card with Luhn check, IBAN, JWT, AWS keys). A transformer-based NER model catches contextual entities (names, addresses, organizations, medical terms) where format alone isn't enough. A custom rule layer lets you add domain-specific patterns (matter numbers, claim IDs, MRNs in your system's format). Detections are merged, deduplicated, and confidence-scored before replacement. You can tune thresholds per integration or per AI client.
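
To make the layering concrete, here is a minimal Python sketch of the general technique, not PortEden's actual implementation. The names Detection, regex_detect, ner_detect, and merge are hypothetical, the two regex patterns are simplified, and ner_detect is a stand-in for a real transformer NER model (it only looks up one hard-coded name and assigns an assumed score).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int
    end: int
    label: str
    score: float


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Simplified patterns for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def regex_detect(text):
    """Deterministic layer: format matches get full confidence."""
    hits = [Detection(m.start(), m.end(), "SSN", 1.0) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"\D", "", m.group())):
            hits.append(Detection(m.start(), m.end(), "CARD", 1.0))
    return hits


def ner_detect(text):
    """Stand-in for the NER layer; a real model would return scored spans."""
    return [Detection(m.start(), m.end(), "PERSON", 0.93)
            for m in re.finditer(re.escape("Jane Doe"), text)]


def merge(detections, threshold=0.5):
    """Deduplicate, drop low-confidence hits, and resolve overlaps by score."""
    kept = []
    ranked = sorted(set(detections),
                    key=lambda d: (-d.score, -(d.end - d.start), d.start))
    for d in ranked:
        if d.score < threshold:
            continue
        if all(d.end <= k.start or d.start >= k.end for k in kept):
            kept.append(d)
    return sorted(kept, key=lambda d: d.start)


text = "Jane Doe, SSN 123-45-6789, card 4111 1111 1111 1111."
for d in merge(regex_detect(text) + ner_detect(text)):
    print(d.label, repr(text[d.start:d.end]), d.score)
```

Resolving overlaps by score first and span length second is one reasonable policy; a production engine may weigh the layers differently.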

Question 4

Will redaction break the AI's ability to be useful?

Accepted Answer

No — and that's the design point. PortEden uses structure-preserving placeholders: a name becomes [PERSON_1] (consistent across the prompt so the model can still reason about who's who), a date becomes [DATE_1985-03-15] preserving format, an account number becomes [ACCOUNT_8821] keeping last-four for reference. The model gets a coherent, redacted payload — it can still summarize, classify, draft, and reason. On the response path, PortEden re-hydrates placeholders client-side so the human user sees the real values without them ever leaving your perimeter.
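
The consistent-placeholder idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's code: redact and rehydrate are hypothetical names, detections are assumed to arrive as non-overlapping (start, end, label) tuples, and only numbered placeholders such as [PERSON_1] are handled (format-preserving variants for dates or last-four digits are left out).

```python
import re
from collections import defaultdict


def redact(text, detections):
    """Replace each detected span with a placeholder.

    The same (label, value) pair always maps to the same placeholder,
    so the model can still tell who is who.
    """
    counters = defaultdict(int)
    by_value = {}  # (label, original) -> placeholder
    vault = {}     # placeholder -> original, used for re-hydration
    out, cursor = [], 0
    for start, end, label in sorted(detections):
        original = text[start:end]
        key = (label, original)
        if key not in by_value:
            counters[label] += 1
            placeholder = f"[{label}_{counters[label]}]"
            by_value[key] = placeholder
            vault[placeholder] = original
        out.append(text[cursor:start])
        out.append(by_value[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), vault


PLACEHOLDER_RE = re.compile(r"\[[A-Z]+_\d+\]")


def rehydrate(text, vault):
    """Swap known placeholders back; unknown ones are left untouched."""
    return PLACEHOLDER_RE.sub(lambda m: vault.get(m.group(), m.group()), text)


text = "Alice emailed Bob. Alice then called Bob."
spans = [(m.start(), m.end(), "PERSON") for m in re.finditer(r"Alice|Bob", text)]
redacted, vault = redact(text, spans)
print(redacted)
print(rehydrate("Summary: [PERSON_1] contacted [PERSON_2] twice.", vault))
```

Only the redacted string would be sent to the model; the vault mapping stays on your side and is applied to the response.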

Question 5

What evidence does this produce for HIPAA, GDPR, PCI-DSS, and SOC 2 auditors?

Accepted Answer

PortEden's redaction engine produces a per-request log of every identifier masked at egress, with rule, category, integration, AI client, user, and a hash of the original payload. Auditors evaluating HIPAA §164.514(b) Safe Harbor de-identification, GDPR Article 32 pseudonymization (Recital 28), PCI-DSS Requirement 3.5 truncation/masking, and SOC 2 CC6.7 transmission of confidential information typically request exactly this kind of evidence. Logs export to SIEM or to a signed CSV. Compliance with these frameworks remains your responsibility — PortEden provides the technical control, you operate the program around it.

Question 6

Can I see exactly what was redacted from each prompt?

Accepted Answer

Yes. Every redaction event is recorded in the audit trail with: the integration source (Gmail, Outlook, Drive, Calendar, Slack, etc.), the AI client that received the request (Claude, ChatGPT, Copilot, Gemini), the rules that fired, the count and category of replacements, the user, the timestamp, and a hash of the original payload. Logs export to SIEM (Splunk, Datadog, Elastic) or to a signed CSV for compliance review.
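
As a rough illustration of what such a record could look like, the Python sketch below builds one audit entry from the fields listed above. The function name, field names, and rule names are assumptions for this example and do not reflect PortEden's actual log schema. The point it shows is that the record carries a SHA-256 hash of the original payload rather than the payload itself.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(payload, detections, *, integration, ai_client, user):
    """Build one audit entry.

    detections: list of (rule_name, category) pairs, one per replacement.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "integration": integration,
        "ai_client": ai_client,
        "user": user,
        "rules": sorted({rule for rule, _ in detections}),
        "replacements": dict(Counter(category for _, category in detections)),
        # Hash only: the original text is never written to the log.
        "payload_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


record = audit_record(
    "Patient Jane Doe, SSN 123-45-6789",
    [("ner_person", "PHI"), ("ssn_regex", "PII")],
    integration="Gmail",
    ai_client="Claude",
    user="analyst@example.com",
)
print(json.dumps(record, indent=2))
```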

Question 7

Does redaction work with Claude, ChatGPT, Copilot, and Gemini?

Accepted Answer

Yes. PortEden is model-agnostic — redaction happens at the integration layer (between your data sources and any AI client). Claude desktop, Claude API, ChatGPT, ChatGPT Enterprise, GitHub Copilot, Microsoft 365 Copilot, Gemini, Perplexity, Cursor, and any MCP-compatible client are supported with the same policy. Switch models without rewriting a single rule.

Question 8

How fast is the redaction engine? Will it slow down my AI workflows?

Accepted Answer

Median redaction latency is under 40 ms for a typical email-sized payload (~4 KB) and under 120 ms for a long document (~50 KB). For streaming workflows, the engine processes deltas in-flight, so the increase in user-perceived response latency is negligible. Heavy NER inference runs on dedicated workers and is cached per-tenant.

Question 9

Can I customize redaction rules for my own internal identifiers?

Accepted Answer

Yes. Add custom regex rules with named placeholders, upload allow-lists for terms that should never be redacted (your product names, public officers, etc.), or train a lightweight custom NER model on a sample of your data. Rules can be scoped to specific integrations, specific users, specific AI clients, or specific times of day — useful for separating production from sandbox traffic.
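
For a sense of how a scoped custom rule and an allow-list interact, here is a small Python sketch. It is not PortEden's rule syntax: the Rule class, the apply_rules function, the MAT-###### matter-number format, and the allow-listed value are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    placeholder: str
    integrations: frozenset = frozenset()  # empty set means "all integrations"


# Terms that must never be redacted (hypothetical example value).
ALLOW_LIST = {"MAT-000000"}


def apply_rules(text, rules, integration):
    """Apply every rule whose scope includes the given integration."""
    for rule in rules:
        if rule.integrations and integration not in rule.integrations:
            continue

        def repl(match, rule=rule):
            value = match.group()
            return value if value in ALLOW_LIST else f"[{rule.placeholder}]"

        text = rule.pattern.sub(repl, text)
    return text


rules = [
    Rule("matter_number", re.compile(r"\bMAT-\d{6}\b"), "MATTER_ID",
         frozenset({"Gmail"})),
]
print(apply_rules("See MAT-123456 and template MAT-000000.", rules, "Gmail"))
print(apply_rules("See MAT-123456.", rules, "Slack"))
```

The first call redacts the real matter number and keeps the allow-listed one; the second leaves the text unchanged because the rule is scoped to a different integration.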

Question 10

What happens to data we've already sent to OpenAI or Anthropic before installing PortEden?

Accepted Answer

PortEden can't undo prior exfiltration, but it can help you contain the blast radius. The product includes a one-time discovery scan that identifies which integrations have been connected to which AI clients, what scopes were granted, and how much data has flowed. From there, you set redaction policy going forward, and the audit trail starts the day you turn it on. As a precaution, we recommend rotating any secrets that have been pasted into AI tools.

Question 11

Is the redaction reversible? How do I see the original values?

Accepted Answer

Yes — for authorized users on the same prompt round-trip. PortEden keeps a short-lived, encrypted token vault (default 5-minute TTL, configurable) so when the AI's response comes back referencing [PERSON_1] or [ACCOUNT_8821], the placeholders are swapped back to real values in the user's browser before display. The model never sees the original; the user sees the original; the audit log records both.
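
The short-lived vault can be pictured as a mapping whose entries expire. The Python sketch below shows only the TTL behaviour, under stated assumptions: TokenVault is a hypothetical class, storage is a plain in-memory dict, and encryption and access control are deliberately omitted. The 300-second default mirrors the 5-minute TTL mentioned above.

```python
import time


class TokenVault:
    """In-memory placeholder store with per-entry expiry (no encryption here)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # placeholder -> (original, expires_at)

    def put(self, placeholder, original):
        self._entries[placeholder] = (original, self._clock() + self._ttl)

    def get(self, placeholder):
        """Return the original value, or None if unknown or expired."""
        entry = self._entries.get(placeholder)
        if entry is None:
            return None
        original, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[placeholder]  # expired entries are purged on access
            return None
        return original


vault = TokenVault(ttl_seconds=300)
vault.put("[PERSON_1]", "Jane Doe")
print(vault.get("[PERSON_1]"))
```

Once an entry has expired, a response that still mentions the placeholder can no longer be re-hydrated, which is the intended trade-off of a short TTL.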

Question 12

What pricing tier includes data redaction?

Accepted Answer

Basic regex-based redaction (SSN, credit card, common PII) is included on the Pro tier. Full NER, HIPAA Safe Harbor, custom rules, SSO/SAML, SCIM, and SIEM export are on the Enterprise tier. See pricing for the full breakdown.

Strip every SSN, PHI, secret, and identifier before any prompt reaches the model.

Every prompt is an unmonitored data export.

Your team pastes customer emails into ChatGPT

Claude summarizes a contract and ingests every clause

A developer pastes a stack trace with API keys into Copilot

Sensitive data, redacted before it reaches the model.

200+ patterns, four pillars of protection.

PHI · 18 HIPAA Safe Harbor identifiers

PCI · Payment Card Industry data

PII · GDPR Article 4 personal data

Secrets · keys, tokens, credentials

Inspect. Detect. Redact. Re-hydrate.

1. Inspect

2. Detect

3. Redact

4. Re-hydrate

Same email, two very different exposures.

The same workflow, two very different audit trails.

Citations, not vague reassurances.

Every source your AI tries to read from.

One redaction engine, six regulated workflows.

AI scribes without HIPAA exposure

Privileged communications stay privileged

Account & SSN redaction at the inbox

Progress notes, redacted

Secrets-aware code assistants

PII-safe macro generation

Egress redaction, not endpoint trust.

Integration-side, not endpoint-side

Tokens stay in your tenant

Every event is an audit record

Data redaction questions

Stop your data from leaking into someone else's AI.