Project Summary
US Internal Revenue Service (IRS), PII Redaction Tool
Enhancing customer service without compromising data privacy.
Client
United States Internal Revenue Service (IRS)
Industries
Analytics
Sector
Government
Practice Area
Analytics & Data Management
Overview

The IRS wanted to analyze customer comments submitted through its website. This posed a complex problem: the IRS needed to ensure that clients who submit comments are never at risk of divulging Personally Identifiable Information (PII). Using a custom grammar, Quoin built an application that parses and redacts all PII, making the comments available for content analysis and metrics.

The Challenge

To improve customer service, the Internal Revenue Service wanted to analyze customer feedback submitted through its website. The content would be aggregated as part of a content analysis process implemented by PublicRelay. This data includes inquiries, complaints, and general feedback such as:

  • “I am an accountant and I need to know how long to keep my client's Form 8879.”
  • “There is no listing on the website for form 5500SF”
  • “I tried to file Form 4868 electronically. This appears impossible.”

Because these comments are often submitted with information that identifies the client, the IRS needed a tool to remove any Personally Identifiable Information (PII) before the content is imported into the PublicRelay platform, so that clients who submit comments are never at risk of divulging it.

Our Solution

After analyzing the situation, Quoin designed and implemented a Windows desktop application to process and “sanitize” the data, preserving all vital contextual data while redacting the sensitive PII. Using a custom grammar, the application ingests CSV-formatted data files, then parses and redacts all PII, including Social Security numbers, phone numbers, email addresses, and other sensitive information. The output CSV file displays “X” characters in place of the PII, indicating that redaction is complete and the “clean” data is available for use elsewhere in the IRS enterprise.
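
The ingest, redact, and write flow described above can be illustrated with a minimal Python sketch. This is not Quoin's implementation: the application used a custom grammar whose rules are not given here, so the sketch substitutes a few simple regular expressions. The patterns, the function names redact_text and redact_csv, and the choice to emit one “X” per redacted character are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical patterns standing in for the custom grammar (an assumption;
# the real tool's rules are not documented in this summary).
PII_PATTERNS = [
    # Social Security number written as 123-45-6789
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # US phone number such as (202) 555-0147 or 202-555-0147
    re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    # Email address
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
]


def redact_text(text):
    """Replace every PII match with a run of 'X' of the same length."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(lambda m: "X" * len(m.group()), text)
    return text


def redact_csv(source, sink):
    """Copy CSV rows from source to sink, redacting every cell."""
    writer = csv.writer(sink)
    for row in csv.reader(source):
        writer.writerow([redact_text(cell) for cell in row])


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    sample = io.StringIO(
        "id,comment\n"
        '1,"I tried to file Form 4868. Call me at (202) 555-0147."\n'
    )
    cleaned = io.StringIO()
    redact_csv(sample, cleaned)
    print(cleaned.getvalue())
```

Form numbers such as “Form 4868” pass through unchanged because they match none of the patterns, which is the kind of contextual data the tool was meant to preserve. When working with real files rather than in-memory strings, open them with newline="" as the Python csv module recommends.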

The application is cross-platform capable: it was developed under Windows 7 and Debian Linux using free, open-source software development tools, and the code can be recompiled for many other operating systems as required in the future.