Data Classification & Cataloging
This topic provides information about data classification and data cataloging.
What Is Data Classification in Information Security?
Data classification is a process in which an organization associates data objects with one or more business contexts to support governance, security, and other aspects of the business. The act of associating a data object with a business context is often referred to as tagging. A wide variety of business contexts can apply, from the source of the data to its business value and whether it represents information about an individual.
In the context of information security, data is often classified by its sensitivity level and its relevance to different information security and data protection frameworks. For example, a customer information table could be generically classified as sensitive personally identifiable information (PII), in addition to carrying specific data labels (e.g., names, addresses, and phone numbers).
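The combination of specific labels and a generic PII classification can be sketched in Python. This is a minimal illustration under stated assumptions: the detector patterns, the 80% match threshold, and the table contents are all hypothetical, and real classifiers use far more robust detection.

```python
import re

# Hypothetical detectors: simple illustrative patterns, not production-grade.
DETECTORS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone_number": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def tag_column(values, threshold=0.8):
    """Return tags whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in DETECTORS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


def classify_table(table):
    """Tag each column, adding a generic 'PII' tag wherever a specific label applies."""
    result = {}
    for column, values in table.items():
        tags = tag_column(values)
        if tags:
            tags.add("PII")
        result[column] = tags
    return result


# Made-up customer table for illustration.
customers = {
    "email": ["ann@example.com", "bob@example.org"],
    "phone": ["+1 555-0100", "555 0101"],
    "plan": ["free", "pro"],
}
labels = classify_table(customers)
```

Here `labels` maps each column to its set of tags, so the `plan` column ends up with no tags while the other two carry both a specific label and the generic `PII` tag.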
Why Is Data Classification Important?
Data classification provides an interface for organizations to implement controls and procedures across data formats, structures, and storage technologies. Classified data allows an organization to define and implement a single policy for handling sensitive data across multiple systems and data objects. Defining a separate policy for each type of data object is not realistic in today’s data-abundant environments.
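As a rough sketch of what a single policy across multiple systems might look like, the Python below keys one rule on a classification tag and evaluates it against data objects stored in different places. The policy shape, role names, system names, and object names are hypothetical and chosen only for illustration.

```python
# Hypothetical policy: one rule per classification tag, independent of storage system.
POLICY = {
    "PII": {"mask": True, "allowed_roles": {"privacy_officer", "support"}},
}

# Hypothetical catalog: data objects in different systems, each with its tags.
CATALOG = [
    {"system": "warehouse", "object": "crm.customers.email", "tags": {"PII"}},
    {"system": "data_lake", "object": "exports/users.csv", "tags": {"PII"}},
    {"system": "warehouse", "object": "sales.products.sku", "tags": set()},
]


def controls_for(entry):
    """Collect the policy rules that apply to a data object based on its tags."""
    return [POLICY[tag] for tag in sorted(entry["tags"]) if tag in POLICY]


def can_read(entry, role):
    """Allow access when every applicable rule lists the role.

    Untagged objects are treated as unrestricted in this sketch.
    """
    return all(role in rule["allowed_roles"] for rule in controls_for(entry))
```

Because the rule is attached to the tag rather than to the table or file, adding a new tagged object to the catalog brings it under the same policy without writing a new rule.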
There are several reasons why data classification is important:
- Context: data classification adds business context to applications and processes. For example, based on data classification, an organization can identify applications that handle sensitive data and define stricter security requirements for those applications.
- Compliance: data classification makes it easier to comply with, and to demonstrate compliance with, regulatory frameworks such as GDPR, CCPA, HIPAA, and PCI DSS.
- Security: data classification makes the business aware of data sensitivity, both across its existing data and each time new data is introduced, and allows the business to use that context to apply the right level of security control.
- Governance: data classification makes it easier to map, track, and control data.
What Are the Four Data Classification Levels?
There are typically four data classification levels in information security:
- Public: data that is in, or can be in, the public domain and can be openly shared with anyone outside of the organization. For example, a data sheet about the company’s products and services.
- Internal: company-wide data that is kept within the organization and, while not sensitive, should not be shared externally. For example, a guide about how to get help from the IT helpdesk.
- Confidential: domain-specific data that contains sensitive company information and can be shared only with specific people or teams. For example, a price list for one of the company’s products.
- Restricted: highly sensitive information that should only be available on a need-to-know basis. For example, employee agreements.
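The four levels above can be modeled as an ordered enumeration. The sketch below assumes the levels form a strict order from least to most sensitive, and that a collection takes the level of its most sensitive item; both are common conventions rather than something every organization follows, and the function names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Classification levels, ordered from least to most sensitive (an assumption)."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def overall_level(levels):
    """Return the highest level present; an empty collection is treated as PUBLIC."""
    return max(levels, default=Level.PUBLIC)


def may_share_externally(level):
    """Only PUBLIC data may leave the organization in this sketch."""
    return level == Level.PUBLIC
```

For example, a document that mixes public and confidential material would be handled as confidential under this highest-wins convention.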
What Are the Different Types of Classification of Data?
While data is classified based on each individual business’s needs, there are a few types of data classification that are more common:
- Data-based classification: a classification that describes the nature of the data. For example, a credit card number or an email address.
- Context-based classification: a classification that describes the data’s business context. For example, sensitive data, healthcare information, or earnings data.
- Source-based classification: a classification that describes the source of the data. For example, customer data collected from the webinar registration form.
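These three types are not mutually exclusive: a single data object can carry all of them at once. The sketch below shows one possible way to record that, using a hypothetical `Classification` dataclass and made-up tag names.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Tags for one data object, grouped by classification type."""
    data_tags: set = field(default_factory=set)     # nature of the data
    context_tags: set = field(default_factory=set)  # business context
    source_tags: set = field(default_factory=set)   # where the data came from

    def all_tags(self):
        """Return every tag, prefixed with its classification type."""
        return (
            {f"data:{t}" for t in self.data_tags}
            | {f"context:{t}" for t in self.context_tags}
            | {f"source:{t}" for t in self.source_tags}
        )


# Hypothetical example: an email column populated from a webinar sign-up form.
webinar_email = Classification(
    data_tags={"email_address"},
    context_tags={"sensitive"},
    source_tags={"webinar_registration_form"},
)
```

Keeping the types separate lets a policy target any one of them, for instance everything collected from a given source regardless of what kind of data it is.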
What Is a Data Classification Policy/Standard?
Organizations use data classification standards to define the various aspects of data classification and how they will come into effect within their organization. A standard typically covers the following:
- Goals: defines the business objectives for classification.
- Scope: defines which data will be classified. For example, the scope for classification can include customer data but exclude employee data.
- Classification levels: defines the types and levels of classification. For example, public/internal/confidential/restricted.
- Process: defines the classification method.
- Roles and responsibilities: defines the roles and responsibilities for data classification.
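A standard with these sections can also be captured as structured data, which makes it possible to check that nothing is missing. The sketch below is only one way to do this; the section keys mirror the list above, while every value (goal text, dataset names, role names) is a hypothetical placeholder.

```python
REQUIRED_SECTIONS = (
    "goals",
    "scope",
    "classification_levels",
    "process",
    "roles_and_responsibilities",
)

# Hypothetical standard; the values are placeholders for illustration.
standard = {
    "goals": ["Support compliance reporting"],
    "scope": {"include": ["customer_data"], "exclude": ["employee_data"]},
    "classification_levels": ["public", "internal", "confidential", "restricted"],
    "process": "automated scan followed by manual review",
    "roles_and_responsibilities": {"data_owner": "approves classifications"},
}


def missing_sections(doc):
    """List the required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not doc.get(name)]


def in_scope(doc, dataset):
    """A dataset is in scope when it is included and not explicitly excluded."""
    scope = doc["scope"]
    return dataset in scope["include"] and dataset not in scope["exclude"]
```

With the example above, `in_scope(standard, "customer_data")` is true and `in_scope(standard, "employee_data")` is false, matching the scope example in the list.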
Challenges of Data Classification
While data classification is essential for carrying out various functions, information security is mainly concerned with sensitive data. In most organizations, sensitive data is classified into various sensitivity levels and then mapped to different categories of sensitive data (e.g., personal information).
The challenges organizations usually face when classifying data are:
- False positives: the same data could appear in different formats and different contexts. Classification algorithms that do not take into account the data’s format and context are more likely to generate false classifications. Because classification projects usually involve huge amounts of data, even a very low false-positive rate can produce enough false matches to prevent an organization from classifying its data effectively.
- False negatives: under various regulatory standards, data might be considered sensitive in a specific context but not in another. For example, a name might be considered non-sensitive by itself but sensitive when alongside a medical record. Classifying data outside of the usage context can and often does result in incorrect classification.
- Big data: data lakes and data warehouses represent ever-growing, dynamic repositories of data, creating a huge challenge for non-continuous classification tools.
- Cost: for most classification tools, the cost of implementing and operating a data classification policy depends on the amount of data and the number of controls established. This cost model hinders organizations that want to classify large data sets with strict access requirements.
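Two of the challenges above lend themselves to a small worked example: how a low false-positive rate scales with volume, and how the same field can be sensitive in one context but not in another. The Python sketch below uses made-up figures and hypothetical column names, and treats the false-positive rate as the share of scanned values that are wrongly flagged.

```python
def expected_false_positives(values_scanned, false_positive_rate):
    """Expected number of wrongly flagged values for a given volume and rate."""
    return values_scanned * false_positive_rate


# Hypothetical column names that indicate a medical context.
MEDICAL_COLUMNS = {"diagnosis", "medical_record"}


def is_name_sensitive(columns_used_together):
    """Treat a name as sensitive only when it is used alongside medical data."""
    columns = set(columns_used_together)
    return "name" in columns and bool(columns & MEDICAL_COLUMNS)


# Illustrative figures: one billion values scanned at a 0.1% false-positive rate.
wrongly_flagged = round(expected_false_positives(1_000_000_000, 0.001))
```

With these illustrative numbers, roughly one million values would be flagged incorrectly, which is far more than a team could review by hand. The second function shows why a classifier that inspects the `name` column in isolation would miss the medical context described above.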