Top 5 Reasons to Classify First
REASON 1 | DATA SECURITY IS A BUSINESS PROBLEM THAT TECHNOLOGY ALONE CANNOT SOLVE
There is a widely held belief (or perhaps simply a hope?) that data security can be solved by implementing a new piece of technology. Stopping data from being downloaded, encrypting data, verifying access credentials: all of these protections can be programmed into a security net designed to prevent breaches. True security, however, is a constant process that involves everyone in an organization. Exclusive reliance on automated systems will doom your project to failure.
Many DLP implementations hit their first snag with the initial setup. Often, the IT department is given a list of criteria that defines sensitive information and the security policies for dealing with it. Beyond defining what the DLP system must look for, the data and business process owners are not involved in enforcement. Even though users are a large part of the problem (whether through accident or malicious intent), they are not required to identify the data they are handling. The task of protection is left in the hands of IT administrators.
Handed their instructions, IT staff program the search algorithms that catch data breaches. Assuming they have accurately interpreted the instructions from the business process owners (which are often simply lobbed over the wall), IT creates rules for detecting and then managing data leaks. To ensure that nothing is leaked, these algorithms are set to be stringent at first, meaning that many potential breaches are caught. But many of these “catches” are not security breaches at all. The tighter the security, the more “false positives” are caught and the more calls workers place to the IT department asking for data to be released. False positives are a big problem because they:
- Require manual handling (review or release) by the IT team
- Stop business workflow, frustrating the users
The IT department is ill-suited to the task of determining what constitutes a breach and what does not. The work overloads them and, in some cases, the review by IT may itself constitute a security breach. Without user involvement, DLP systems are guessing at the sensitivity of the data. If users had a means to tell the DLP system how to handle the data, IT would not be put in the position of reviewing excessive data breach reports or responding to constant requests from information owners to let their data go.
Business user frustration is another negative side effect of making data security an IT issue. Workers want and need the power to perform the tasks they were trained and hired to do. While it is important to prepare for the small fraction of individuals who may steal data, you should not treat your entire workforce as though you distrust them all. If, the day your DLP system is turned on, workers find that activities they used to perform as part of usual business practice are blocked or significantly hindered, there could be tremendous resistance and push-back. Even when they know the changes are coming, if the DLP system catches too many false positives the whole project could be at risk, as angry employees harass IT to release their data or search for ways to circumvent security. The result? DLP security measures are weakened. Companies would rather tolerate minimal data loss just to keep workers happy and the business rolling.
Users should be empowered to take responsibility for the security of the data they use and create. User-driven classification identifies data much more accurately and thus helps ensure the DLP system handles it correctly. Greater accuracy also frees the IT team from excessive manual monitoring. User classification has the added benefit of fostering a culture of security in the user community. Rather than being subject to “Big Brother,” users become a respected part of the security solution that is in place to help protect their company and, by extension, their jobs.
REASON 2 | CLASSIFICATION FOSTERS A SECURITY CULTURE
Security systems have done an excellent job of preventing prying eyes from gaining access to sensitive information on the corporate network. What they are not as good at is preventing accidental disclosure by careless users with legitimate access. While a DLP’s failure to catch a particular breach can be classified as an “error,” the real problem is the user who accessed and distributed the information. Asking (or forcing) users to classify each file, while guiding them to correct decisions based on approved policy, helps address the source of the problem: users who lack awareness of proper security procedures.
Common data breach accidents include:
- Incorrectly addressed email
- Sensitive data in an email or email attachment
- Accessing data from unsecure, public sources
- Lost devices and storage media
- Accidental inclusion in e-discovery packages
- Inappropriate sharing to personal email and devices
These breaches are predominantly caused by user ignorance or error. While a DLP system is vital for providing a second look when these mistakes occur, without classification some breaches may not be caught.
Despite all the time, money, and effort your organization may (or may not!) put into training staff on security policy and the proper handling of sensitive information, employees are not likely to retain this information to the degree necessary, because they are not usually motivated by security. As work pressures ebb and flow, users tend to put security concerns aside to expedite business. Deadlines, commissions, and being seen as efficient and hardworking are the motivations that drive most employees. They quickly forget why they need to protect information (“it won’t hurt the company’s profits”) or they intentionally try to bypass security (“if I can’t email this document I will just print it and take it with me”) in their rush to finish a task.
Even if a DLP system does catch the breach, there is usually no informative response to help the user remediate or learn from their error. Depending on how the DLP system is configured, an email that violates the organization’s security policies may be:
- Returned immediately to the user
- Put in quarantine pending manual review
- Encrypted and sent anyway (in the hope that the recipient can decrypt it)
- Automatically deleted
The user responsible for the email may not know for hours, days, or ever that the email was blocked. Even if the email is sent back to the sender, the policy breach notification (normally just the policy rule name) may not contain enough detail for the sender to know how to fix the email or avoid the same problem in the future.
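To make the scenario concrete, here is a minimal sketch of how a gateway might dispatch a policy-violating email into one of the four dispositions above. The `Disposition` names and `handle_violation()` helper are hypothetical, not any vendor’s actual API; note how little a rule-name-only notice tells the sender.

```python
# Hypothetical sketch of the four DLP dispositions described above.
# Names are illustrative only; they do not reflect a specific product's API.
from enum import Enum, auto

class Disposition(Enum):
    RETURN_TO_SENDER = auto()
    QUARANTINE = auto()
    ENCRYPT_AND_SEND = auto()
    DELETE = auto()

def handle_violation(email_id: str, rule_name: str, disposition: Disposition) -> str:
    """Apply the configured disposition and build the notice (if any) the sender sees."""
    if disposition is Disposition.RETURN_TO_SENDER:
        return f"{email_id} returned to sender: blocked by rule '{rule_name}'."
    if disposition is Disposition.QUARANTINE:
        return f"{email_id} quarantined pending manual review (rule '{rule_name}')."
    if disposition is Disposition.ENCRYPT_AND_SEND:
        return f"{email_id} encrypted and delivered (rule '{rule_name}')."
    return f"{email_id} deleted (rule '{rule_name}'); no notice sent."

# The sender learns only the rule name, not what to change or why it matters.
print(handle_violation("MSG-1042", "PII-Detector-07", Disposition.RETURN_TO_SENDER))
```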
This “solution” not only fails to prevent users from repeating the same error, it also creates frustration among the user community. Although the DLP system is there to help protect users and the data they share, it comes to be viewed as an impediment to business. In many cases, user push-back has even forced administrators to turn off data protection policies and simply rely on data monitoring. In monitoring mode, harmful data transfers are not blocked; they are only recorded in logs. Pointing fingers after a data breach does nothing to mitigate the damage a breach can cause.
A classification tool, however, consistently reminds users of data security policies each time they save a document or send an email. By reminding (or forcing) users to identify the sensitivity of the information, data security remains constantly top of mind. TITUS classification solutions provide policy information to the user, guiding them through their decisions so they apply the proper classification designation. And, by checking the selected classification against the email content and attachments, classification tools can immediately identify possible breaches before the email ever leaves the user’s control.
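As an illustration (not TITUS’s actual implementation), the sketch below checks a user-selected classification against a simple content scan before the email leaves the user’s control; the label names and the SSN pattern are assumptions for the example.

```python
# Minimal sketch: warn before sending when content looks more sensitive than the label.
# The label names and SSN pattern are illustrative assumptions, not product behavior.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
LOW_SENSITIVITY_LABELS = {"Public", "Internal Use Only"}

def check_before_send(selected_label: str, body: str) -> list[str]:
    """Return warnings shown to the user while the email is still under their control."""
    warnings = []
    if SSN_PATTERN.search(body) and selected_label in LOW_SENSITIVITY_LABELS:
        warnings.append(
            "Possible Social Security Number found, but the email is labeled "
            f"'{selected_label}'. Reclassify the email or remove the number."
        )
    return warnings

print(check_before_send("Public", "Applicant SSN: 123-45-6789"))
```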
With classification, the user works with DLP and other security systems to ensure data protection policies are followed and enforced.
REASON 3 | DLP SYSTEMS HAVE TO KNOW THE DATA TO KNOW HOW TO MANAGE IT
To prevent data loss, your DLP technology must know what to block. DLP systems use powerful search algorithms to examine the data residing in, traveling through, and leaving your network. Based on what they find, DLP systems have several options, from preventing access, to denying copy actions, to encrypting data. But all these useful data governance actions depend on how the DLP system identifies the data. Failure of the search algorithm means either failure to enforce the proper security policy or freezing the data until it is manually reviewed.
DLP searches look for key strings of text in the data or in its properties. In some cases, this data can be very specific, such as a Social Security Number (SSN). In other cases, the sensitive data indicators might be a specific string of text unique to your organization. In both cases, the DLP system is still making a guess; configuration of the DLP search algorithms determines how much is caught.
Some PII, such as credit card numbers, follows a precise mathematical formula that DLP systems can use for detection. But for other items, like an SSN, no validation algorithm exists, and as a result DLP search rules must be set fairly broad.
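The credit card case refers to the Luhn checksum, which every valid card number satisfies; the short sketch below shows how a scan can validate a candidate number rather than guess. The test numbers are well-known dummy values, not real cards.

```python
# The Luhn checksum that credit card numbers satisfy; SSNs have no comparable check.
def passes_luhn(number: str) -> bool:
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    # Working from the right, double every second digit; subtract 9 if the result exceeds 9.
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0

print(passes_luhn("4111 1111 1111 1111"))  # True: a standard test card number
print(passes_luhn("4111 1111 1111 1112"))  # False: fails the checksum
```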
Take the following example of the nine-digit Social Security Number (SSN): 000-00-0000
A DLP system could be set to capture this specific sequence of numbers and symbols: a group of three digits, a group of two, and then a group of four, each separated by a hyphen. Using this specific search criterion, the DLP would identify only data that contains this exact number/character sequence and would miss any other Social Security Numbers where the sequence is broken. If there were no hyphens, for example, the scan would miss the number, resulting in a false negative. If the search rule is relaxed to include any sequence of nine digits (with no more than one character or space between digits), then any and all nine-digit sequences, such as telephone numbers, would be identified as a security breach. The broader the search rule, however, the more false positives are found, resulting in increased user frustration and review work.
Since false positives are such a burden, many organizations running a DLP system resort to loosening the reins. For instance, a policy may state that no email or document containing a Social Security Number can be sent outside the company. Because of the number of false positives, however, the policy may be amended to stop only emails or documents that contain five or more Social Security Numbers. While this increases the likelihood that the flagged data actually contains SSNs, it also means that small breaches are permitted.
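The sketch below illustrates the trade-off, using illustrative patterns and an assumed five-match threshold; a nine-digit bank routing number stands in for the benign nine-digit values (phone numbers, order IDs) that broad rules sweep up.

```python
# Strict vs. broad SSN detection, and the effect of a "five or more matches" threshold.
# The patterns and the threshold are illustrative, not any product's defaults.
import re

STRICT_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # exact 3-2-4 hyphenated form
BROAD_NINE = re.compile(r"\b\d(?:[ -]?\d){8}\b")   # any nine digits, loosely separated

text = "Applicant SSN 123 45 6789; ACH routing number 021000021."

print(STRICT_SSN.findall(text))  # [] -> the space-separated SSN is missed (false negative)
print(BROAD_NINE.findall(text))  # both numbers match -> the routing number is a false positive

# Loosened policy: block only when five or more candidate SSNs appear in one message.
blocked = len(BROAD_NINE.findall(text)) >= 5
print(f"Blocked: {blocked}")     # False -> a message containing one real SSN slips through
```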
Regardless of the content or the formatting, explicit classification metadata allows DLP systems to manage data with certainty. It doesn’t matter if the DLP scan confuses a telephone number with a Social Security Number. Classification provides precise governance instructions in either case. Of note, the DLP system should still be configured to record when its scan conflicts with the classification. By using both tools, any irregularities in worker behavior can be tracked to locate careless or possibly malevolent employees.
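A minimal sketch, with assumed label names and a hypothetical policy, of the approach described above: the explicit classification drives the governance action, and a disagreement between the label and the content scan is logged for follow-up rather than left to guesswork.

```python
# Classification-first handling: the explicit label decides, the scan only cross-checks.
# Label names, the external-send policy, and the logging choice are assumptions.
import logging

logging.basicConfig(level=logging.INFO)

def decide_action(label: str, scan_found_pii: bool) -> str:
    """Return the gateway action for an outbound message based on its classification."""
    action = "block_external" if label in {"Confidential", "Highly Confidential"} else "allow"
    if scan_found_pii and label in {"Public", "Internal Use Only"}:
        # The scan disagrees with the user's label: record it so careless or
        # malicious downgrades can be investigated later.
        logging.warning("Scan found possible PII in a message labeled '%s'.", label)
    return action

print(decide_action("Confidential", scan_found_pii=False))  # block_external
print(decide_action("Public", scan_found_pii=True))         # allow, but the conflict is logged
```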
Context is another area where DLP search algorithms cannot be relied on to correctly filter data. What might be sensitive information in one context may be innocuous in another. For example, sales data might be a closely guarded secret for a publicly traded company until the official earnings report is shared. In another case, access rights to the information may have changed based on the user’s role or even their physical location. Accounting for these context changes programmatically is difficult. Yet users know this information and should be given the ability to communicate context to the DLP gateway. In most cases, however, DLP administrators are left to decide the fate of quarantined data without knowing who it belongs to, its exact importance, or the intent of the sender. Once again, classification can solve this issue. Clear classification labels, provided by a user who understands the current data context, give unambiguous instructions the DLP can interpret for proper policy enforcement.
REASON 4 | DLP WORKS BEST ON KNOWN THREATS
DLP systems are designed to check for specific patterns in text. But if the identifying data is difficult to isolate as risky (common phrases, shared terms) or is not text-based, DLP systems can miss this information altogether.
Intellectual property (IP) often falls into the category of data that is difficult to recognize. Unlike a credit card number or a patient ID, intellectual property varies widely in format and is constantly being created faster than search terms can be updated. For instance, DLP administrators may need to create and test new rules for each new project based on its expected content. Without the new rules, the DLP system may fail to protect data about the new project.
IP can also take almost any form or media format. Chemical formulas, manufacturing processes, customer lists, and product development documents are all examples of data that either contain terms so specific that a DLP cannot realistically be updated to detect them, or terms so common that filtering for them would bring up far too many false positives. Media files, such as videos, audio recordings, and images, may contain private data or IP as well, but scanning their contents is difficult. Unless multimedia files are given an explicit classification using metadata the DLP can read, the DLP’s search capabilities are nearly powerless.
Potentially the most valuable asset to an organization, intellectual property must be protected. Studies have shown that 50% of all staff who leave your organization will take IP with them, and 80% of those will knowingly use that IP at their new job.2 The primary reason behind these high numbers is a poor understanding among employees of IP and its importance. Since intellectual property is generated by your users, it follows that they should be tasked with identifying files that contain IP and indicating their sensitivity. These actions will not only dramatically help your DLP systems protect IP from illicit access or sharing, but will also remind users that this information has real value and belongs to the organization.
REASON 5 | ADDITIONAL BENEFITS OF CLASSIFICATION
Beyond enhancing DLP, classification provides several other benefits that should not be overlooked.
Interoperability with the Entire Security Ecosystem
Persistent classification metadata makes it possible to trigger other protection systems based on classification, such as the automatic application of encryption with Ionic file protection, Microsoft AD Rights Management Services® (RMS), or S/MIME protection for email.
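A hedged sketch of this idea: a simple lookup from classification label to the protection to request downstream. The label names and the `protection_for()` helper are illustrative; a real integration would call the RMS, S/MIME, or file-encryption product’s own interfaces.

```python
# Map classification labels to downstream protections. Names are illustrative only;
# a real deployment would invoke the actual RMS / S/MIME / file-encryption APIs.
PROTECTION_BY_LABEL = {
    "Public": None,
    "Internal Use Only": None,
    "Confidential": "rights_management",     # e.g. apply a rights-management template
    "Highly Confidential": "smime_encrypt",  # e.g. force S/MIME encryption for email
}

def protection_for(label: str) -> str:
    return PROTECTION_BY_LABEL.get(label) or "no additional protection"

print(protection_for("Confidential"))
print(protection_for("Public"))
```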
Data Retention Management
Classification simplifies data retention because it gives a content archiving system, and individual users, more information to work with when deciding on the appropriate retention period. Classifications can include date or status fields that, when filled in or edited, instantly update the retention and disposition status.
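For example, here is a minimal sketch, with assumed labels and retention periods, of how a status date in the classification metadata could drive a disposition decision.

```python
# Illustrative only: retention periods keyed off classification metadata fields.
from datetime import date, timedelta

RETENTION_YEARS = {"Financial and Tax Records": 7, "Internal Use Only": 1}  # assumed values

def disposition_due(label: str, closed_on: date) -> date:
    """Return the date a record becomes eligible for disposition."""
    years = RETENTION_YEARS.get(label, 3)  # assumed three-year default
    return closed_on + timedelta(days=365 * years)

print(disposition_due("Financial and Tax Records", date(2020, 12, 31)))  # about seven years later
```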
Email Redactions
Email text can often contain sensitive information. By checking the email’s classification level against the email content it is possible to alert users when they are about to send information that conflicts with policy. Users can be given the option to redact the sensitive data, replacing it with a black mark.
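As a rough sketch (reusing the illustrative SSN pattern from earlier, not a full set of data types), redaction can be as simple as replacing each match with a block character before the message is released.

```python
# Illustrative redaction of SSN-shaped matches; real products cover many more data types.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(body: str) -> str:
    """Replace each SSN-shaped match with a same-length black bar."""
    return SSN_PATTERN.sub("\u2588" * 11, body)  # U+2588 FULL BLOCK, 11 chars like 123-45-6789

print(redact("Customer SSN is 123-45-6789; please verify."))
```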
Flexible Email and Document Visual Markings
Classification can enable the application of customizable headers and footers, watermarks, email subject line markings, email message body labels, dynamic disclaimers, and portion markings. These markings remind users of the information’s sensitivity, which promotes responsible handling.
eDiscovery
Classification helps organizations avoid accidentally including too much information, or even the wrong information, in the eDiscovery process. Classification labels can be used to sort and qualify only the data required.
Insider Threat Detection
The effectiveness of insider threat detection improves significantly when it becomes possible to monitor how users interact with sensitive information. By giving data an identity, there is no guesswork when analyzing exactly which files users are accessing, copying, and uploading. In addition, applying policy based on classification forces a malicious user into activities that can quickly be flagged as suspicious, such as downgrading the classification of a file in order to bypass security protocols.
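Here is a minimal sketch, with hypothetical event fields and an assumed label ordering, of flagging the downgrade-then-exfiltrate pattern mentioned above.

```python
# Flag a classification downgrade followed by a risky action. Field names, the label
# ordering, and the action list are assumptions for illustration.
LABEL_ORDER = ["Public", "Internal Use Only", "Confidential", "Highly Confidential"]
RISKY_ACTIONS = {"copy_to_usb", "upload_external", "send_external"}

def is_suspicious(old_label: str, new_label: str, next_action: str) -> bool:
    """True when a file's label was lowered and the file then left normal channels."""
    downgraded = LABEL_ORDER.index(new_label) < LABEL_ORDER.index(old_label)
    return downgraded and next_action in RISKY_ACTIONS

print(is_suspicious("Confidential", "Public", "upload_external"))  # True: flag for review
print(is_suspicious("Public", "Public", "send_external"))          # False: no downgrade
```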
Sample Classifications
| # | Classification | Code | Description | Examples |
|---|----------------|------|-------------|----------|
| 1 | Public | PUBL | Documents intended for wide or external distribution. | Press releases, marketing materials, published reports, and product brochures. |
| 2 | Internal Use Only | INTER | Documents that are not sensitive but meant only for internal employees or within the organization. | Internal announcements, employee handbooks, non-sensitive meeting minutes, and newsletters. |
| 3 | Confidential | CONF | Sensitive information that should only be accessible to certain groups or individuals within the organization. | Financial reports, customer lists, product roadmaps, contracts, and internal strategic plans. |
| 4 | Highly Confidential (Restricted) | SECRT | Highly sensitive documents that could cause significant harm if leaked outside the organization or to unauthorized personnel. | Trade secrets, intellectual property (patents, design specs), executive communications, legal matters, and merger/acquisition documents. |
| 5 | Personally Identifiable Information (PII) | PII | Documents containing personal information of individuals, such as employees, customers, or business partners. | Social Security Numbers, employee records, customer addresses, health data, and financial data. |
| 6 | Payment Card Information (PCI) | PCI | Documents containing payment-related information. | Credit card numbers, billing details, and transaction receipts. |
| 7 | Health Information (HIPAA or PHI) | HEALT | Documents containing protected health information. | Medical records, health insurance information, lab results, and patient-related data. |
| 8 | Regulated Information | REGUL | Documents subject to specific legal, regulatory, or industry compliance requirements. | Compliance reports (GDPR, CCPA, SOX), environmental reports, and audit documents. |
| 9 | Legal and Contractual Documents | LEGAL | Documents that contain legal agreements, contracts, or binding commitments. | NDAs, service-level agreements (SLAs), partnership agreements, and legal correspondence. |
| 10 | Intellectual Property (IP) | IP | Documents related to proprietary technology, inventions, or business processes. | Patents, copyrights, product designs, source code, and proprietary algorithms. |
| 11 | Financial and Tax Records | FINT | Documents containing financial performance, tax filings, and accounting data. | Income statements, balance sheets, tax returns, audit results, and budget forecasts. |
| 12 | Operational Documents | OPSD | Documents related to the day-to-day operations of the business. | Project management documents, operation manuals, maintenance logs, and supply chain documentation. |
| 13 | Human Resources (HR) Documents | HRDD | Documents that pertain to employee information, hiring processes, and employment records. | Employee contracts, performance reviews, training materials, and payroll information. |
| 14 | Incident Reports and Risk Assessments | RISK | Documents that report security incidents, risk evaluations, or safety-related issues. | Cybersecurity breach reports, safety inspection records, and disaster recovery plans. |
| 15 | Strategic Documents | SD | Documents that outline the organization’s long-term plans or business strategy. | Strategic goals, market analysis reports, leadership presentations, and competitive analysis. |
| 16 | Audit Logs | LOGS | Documents tracking the use of systems, data, or operational workflows for compliance purposes. | System access logs, application usage logs, security event logs, and audit trail reports. |
| 17 | Marketing and Sales Documents | MARC | Documents related to marketing strategies, campaigns, or customer outreach. | Campaign plans, customer journey maps, promotional materials, and sales forecasts. |
| 18 | Research and Development (R&D) | RND | Documents related to product research, testing, or development projects. | Research papers, development roadmaps, technical specifications, and test results. |
| 19 | Communication Archives | CA | Documents or records that capture internal or external communication for business purposes. | Emails, chat logs, meeting transcripts, and customer support conversations. |