In an earlier post (Starting the Azure Information Protection Conversation), I mentioned that organizations without a data classification standard and an associated policy will have a difficult time implementing many information security controls, such as DLP and rights management. Data classification can also help optimize disaster recovery and justify technology spend, and it may be a requirement for cyber-insurance. Because many security projects rely on properly classified data, we have seen an uptick in requests from clients for help with these initiatives.
What does a Data Classification Project look like?
A Data Classification Project assesses an organization’s digital assets to determine their criticality, sensitivity, and privacy requirements, and to define a naming taxonomy for categorizing the data. A typical project consists of a series of interviews, research, and tools-based discovery to establish the following about the data:
- Criticality to the organization
- Regulations relevant to data
Location of Data and its Risk
Interviews with business unit managers and technical staff are coupled with a tools-based discovery effort to identify all of the repositories containing data. The inventory starts with storage locations such as hard drives, SANs (Storage Area Networks), NAS (Network Attached Storage) devices, cloud providers, etc. Ultimately, this undertaking will identify the databases, spreadsheets, and applications that hold the data. Once the data locations are mapped, we start to group the data by its risk to the organization.
The following table illustrates an example of the risk to data and how it might be calculated:
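A calculation of this kind can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple likelihood-times-impact score on 1 to 5 scales, and the repository names, scores, and tier cut-offs are hypothetical rather than taken from any real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain) chance of compromise
    impact: int      # assumed scale: 1 (negligible) to 5 (severe) impact on the organization


def risk_score(repo: Repository) -> int:
    """Return likelihood multiplied by impact (an assumed, simplified formula)."""
    return repo.likelihood * repo.impact


def risk_tier(score: int) -> str:
    """Map a score to a tier; the cut-offs are illustrative assumptions."""
    if score >= 15:
        return "High"
    if score >= 8:
        return "Medium"
    return "Low"


# Hypothetical repositories, for illustration only.
repositories = [
    Repository("Marketing SharePoint site", likelihood=4, impact=2),
    Repository("HR file share", likelihood=3, impact=5),
    Repository("Public website content", likelihood=2, impact=1),
]

# Rank repositories from highest to lowest risk.
ranked = sorted(repositories, key=risk_score, reverse=True)
for repo in ranked:
    score = risk_score(repo)
    print(f"{repo.name}: score={score}, tier={risk_tier(score)}")
```

In practice the scales and cut-offs would come out of the interviews and discovery work, so treat the numbers here as placeholders.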
Data Labels and Classification
Once the data is identified and ranked, a naming taxonomy will need to be decided upon. This is when “data labels” are used to mark data so the appropriate controls can be applied, whether automatic or manual, technical or otherwise. Most people have heard of at least one of the federal government’s data labels – “Top Secret” – which marks a very high level of sensitivity and may only be viewed by individuals with a “Top Secret”, or higher, clearance. The “Top Secret” designation is the “data label”, which is applied to documents, emails, etc. The “classification” is the understanding of what information falls into this category. While regulated industries, such as healthcare and banking, already have data protections defined, all organizations will still need to come up with a labeling scheme. Carnegie Mellon University has a great example of how its data is labeled and classified:
Restricted Data – Data should be classified as Restricted when the unauthorized disclosure, alteration or destruction of that data could cause a significant level of risk to the University or its affiliates. Examples of Restricted data include data protected by state or federal privacy regulations and data protected by confidentiality agreements. The highest level of security controls should be applied to Restricted data.
Private Data – Data should be classified as Private when the unauthorized disclosure, alteration or destruction of that data could result in a moderate level of risk to the University or its affiliates. By default, all Institutional Data that is not explicitly classified as Restricted or Public data should be treated as Private data. A reasonable level of security controls should be applied to Private data.
Public Data – Data should be classified as Public when the unauthorized disclosure, alteration or destruction of that data would result in little or no risk to the University and its affiliates. Examples of Public data include press releases, course information and research publications. While little or no controls are required to protect the confidentiality of Public data, some level of control is required to prevent unauthorized modification or destruction of Public data.
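To show how a labeling scheme like this might be turned into something a tool can act on, here is a minimal Python sketch. The three labels and the default-to-Private rule come from the definitions above; the `effective_label` helper, the control descriptions, and the file names are hypothetical and only for illustration.

```python
from enum import Enum
from typing import Optional


class DataLabel(Enum):
    RESTRICTED = "Restricted"
    PRIVATE = "Private"
    PUBLIC = "Public"


# Control levels paraphrased from the definitions above.
CONTROL_LEVEL = {
    DataLabel.RESTRICTED: "highest level of security controls",
    DataLabel.PRIVATE: "reasonable level of security controls",
    DataLabel.PUBLIC: "controls against unauthorized modification or destruction",
}


def effective_label(explicit: Optional[DataLabel]) -> DataLabel:
    """Treat anything not explicitly labeled as Private, per the default rule."""
    return explicit if explicit is not None else DataLabel.PRIVATE


# Hypothetical inventory: None means nobody has labeled the item yet.
inventory = {
    "payroll.xlsx": DataLabel.RESTRICTED,
    "press-release.docx": DataLabel.PUBLIC,
    "meeting-notes.docx": None,
}

for name, explicit in inventory.items():
    label = effective_label(explicit)
    print(f"{name}: {label.value} -> {CONTROL_LEVEL[label]}")
```

Defaulting unlabeled items to the middle tier means a missed label still receives a reasonable level of protection rather than being treated as Public.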
Once a project like this is completed, it becomes much easier to implement effective protective controls on the appropriate data. I usually find the exercise also sheds light on the sheer expanse of data an organization maintains. When management is given visibility into the amount of data, backed up by numbers showing the risk that data poses to the organization, it can make informed decisions and set budgets so that data is protected appropriately.
Data classification can be a big undertaking. If you find yourself needing any assistance or if you have questions on the process, you can email us at firstname.lastname@example.org. We’re happy to help.