BCL easyPDF SDK
easyPDF SDK User Manual

RedactWords Method

Securely and permanently removes words from all pages of a PDF document without leaving any traces behind.

Sub RedactWords(InputFileName As String, OutputFileName As String, WordsToRemove As Variant, FillColor As OLE_COLOR)

void RedactWords(string InputFileName, string OutputFileName, string[] WordsToRemove, System.UInt32 FillColor)

void RedactWords(String InputFileName, String OutputFileName, String[] WordsToRemove, int FillColor) throws PDFProcessorException

Parameters

Return Values

N/A.

Description

The main purpose of this function is to securely and permanently remove individual words from a PDF file, without leaving any traces behind.

Once the specified words have been removed, they can no longer be found in the PDF. However, a gap is left where each word used to be, and the size of that gap may make it possible to guess how long the removed word was.

This function always optimizes the PDF, even when OptimizeAfterEachProcess is False. Optimization removes the document's full history, and with it all traces of the original words.

If there is an image underneath a redacted word, the pixels directly under that word are wiped out as well.

In addition, the area of each wiped-out word can optionally be covered with a filled rectangle (usually black or gray, but any color may be chosen). The special color value of hexadecimal FFFFFFFF means that no fill rectangle is drawn at all.
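The following is a minimal Java sketch of how a FillColor value might be prepared. It is not part of the SDK: the class name FillColors and the method fromRgb are hypothetical helpers. It assumes that FillColor follows the usual OLE_COLOR byte layout of 0x00BBGGRR, which the VB signature suggests but which this page does not confirm for the Java binding. The only value taken directly from the description above is the "no fill" marker, hexadecimal FFFFFFFF, which in Java is the int value -1.

```java
// Hypothetical helper, not part of the easyPDF SDK.
final class FillColors {

    // From the description above: hexadecimal FFFFFFFF means "no fill rectangle".
    // As a signed 32-bit Java int, this literal equals -1.
    static final int NO_FILL = 0xFFFFFFFF;

    private FillColors() {
    }

    // Packs 8-bit components as 0x00BBGGRR.
    // ASSUMPTION: FillColor uses the standard OLE_COLOR layout (red in the lowest byte).
    static int fromRgb(int red, int green, int blue) {
        checkComponent("red", red);
        checkComponent("green", green);
        checkComponent("blue", blue);
        return (blue << 16) | (green << 8) | red;
    }

    // Rejects values that do not fit in a single byte.
    private static void checkComponent(String name, int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException(name + " must be in 0..255, got " + value);
        }
    }
}
```

Under these assumptions, FillColors.fromRgb(0, 0, 0) would give a black rectangle and FillColors.fromRgb(128, 128, 128) a mid-gray one, while FillColors.NO_FILL would leave the gap uncovered. Whichever value is chosen is passed as the FillColor argument of RedactWords.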

Use Cases

This function was designed to remove sensitive information from PDF documents, such as names, social security numbers, driver's license numbers, and insurance policy numbers.

However, the function may also be used to simply delete words from all pages, even when security is of no concern.

Restrictions

This function only removes page content that is actual text. It will not remove annotations, form fields, metadata, or graphics that look like text to a human but are not text to a computer. Text can only be removed if it can be copied to the clipboard and is legible there. If the text looks fine to a human reader but turns into complete garbage when copied to the clipboard, it will not be redacted.

Furthermore, this function does not consider reading order, paragraphs, tables, or lists, nor text that flows over to the next line, column, or page. Hyphenated or misspelled words will not be redacted. There is no stemming or inflection support: for example, run and running are two completely separate words. Likewise, color and colour are different words, as are aluminium and aluminum, or pediatrician and paediatrician.
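Because every inflection and spelling variant counts as a separate word, the caller has to list each variant explicitly in WordsToRemove. The Java sketch below shows one possible way to assemble such a list. The class RedactionWordList and its build method are hypothetical helpers, not part of the SDK. They merge groups of variants into a single array, skip null and empty entries, and drop duplicates while keeping the original order. This page does not say whether matching is case-sensitive, so the sketch leaves the case of each word untouched.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the easyPDF SDK.
final class RedactionWordList {

    private RedactionWordList() {
    }

    // Merges groups of spelling or inflection variants into one array
    // suitable for the WordsToRemove parameter.
    // Null and empty entries are skipped; duplicates are removed while
    // the first-seen order is preserved.
    static String[] build(String[]... variantGroups) {
        Set<String> words = new LinkedHashSet<>();
        for (String[] group : variantGroups) {
            if (group == null) {
                continue;
            }
            for (String word : group) {
                if (word != null && !word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words.toArray(new String[0]);
    }
}
```

For example, build(new String[] {"run", "running"}, new String[] {"color", "colour"}) yields an array with the four words in that order, which could then be passed as WordsToRemove.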

When a word is removed, some graphical entities may still be left behind, such as its background color, underline and strikethrough marks.

Hidden or invisible text, on the other hand, will be found and removed if all other conditions are met. For example, white ink on a white background is no problem for the engine. An image combined with hidden text is handled too, as is text under images or rectangles. As long as the text is there and computer-readable, it will be removed.

Example Usage (VB, ASP)

Set oProcessor = CreateObject("easyPDF.PDFProcessor.8")
 
' Redact words
oProcessor.RedactWords "C:\test\input.pdf", "C:\test\redacted.pdf", Array("John", "Smith"), CLng(&Hffffffff)

Example Usage (C#)

PDFProcessor oProcessor = new PDFProcessor();

// Redact words
oProcessor.RedactWords(@"C:\test\input.pdf", @"C:\test\redacted.pdf", new string[] { "John", "Smith" }, 0xFFFFFFFF);