O365 Custom Sensitive Information Types – A Business-Person’s Guide

May 9, 2018 | Security

Microsoft’s Office 365 Security and Compliance Center provides a broad toolset for identifying and protecting business content. One of its most powerful capabilities is Sensitive Information Types, which allow organizations to identify specific content by matching patterns of words, characters, and spacing within documents, emails, and conversations.

To speed adoption, Microsoft provides a large number of pre-built Sensitive Information Types. These pre-built types primarily target identification numbers, such as United States Social Security numbers, bank routing numbers, and passport numbers.

Sensitive Information Types can be thought of as reusable search queries. By adding a Sensitive Information Type to a Security and Compliance label or a Data Loss Prevention policy, organizations can identify content stored in O365 for compliance and retention purposes.

Organizations can also create their own Sensitive Information Types.   Why would this be valuable?

  • To identify whether usernames and passwords for internal systems are being shared
  • To protect formulas
  • To locate contracts and other business records
  • To identify confidential information

The technical process for creating a Sensitive Information Type involves writing an XML file (a rule package) and uploading it to your O365 tenant via PowerShell. The design of the rules that define a Sensitive Information Type, however, can be done by business personnel once they understand the capabilities.

How a Sensitive Information Type Is Put Together

Sensitive Information Types consist of one or more “Patterns”. As the name implies, each Pattern is a set of rules (Elements) that identify the information you are looking for. Each Pattern is assigned a “level of confidence”, which indicates how reliably the Pattern will find documents containing the specific information without also returning results that don’t match.

Pattern Elements are created using the following concepts:

Regex (Regular Expression) – This is an expression that describes what the information should look like in the document. Regular Expressions use a specific language (https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference) to define the pattern of numbers, letters and spaces that the information you’re looking for should match.

Keywords – Specific words or phrases associated with the information.  Keywords can be case sensitive or not.

Built-in functions – Prebuilt regexes that perform tasks such as locating a specific date or a credit card number.

Proximity – Defines how far, in characters, keywords can be from the text that the regex matches.

Match element – Allows Boolean AND or OR operators to combine Pattern Elements.

minCount – Specifies how many matching instances of the data are required.
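To make these concepts concrete, here is a minimal Python sketch of how a regex, a keyword list, a proximity window, and a minimum count might work together. It is only an illustration of the ideas, not how the O365 engine is implemented. The six-digit "badge number" format, the keyword list, and all names and values below are hypothetical.

```python
import re

# Hypothetical example values, chosen only to illustrate the concepts.
ID_REGEX = re.compile(r"\b\d{6}\b")      # Regex: a six-digit badge number
KEYWORDS = ["badge", "employee id"]      # Keywords: compared case-insensitively
PROXIMITY = 30                           # Proximity: characters on either side of a match
MIN_COUNT = 1                            # minCount: supported matches required


def count_supported_matches(text: str) -> int:
    """Count regex matches that have at least one keyword within PROXIMITY characters."""
    lowered = text.lower()
    supported = 0
    for match in ID_REGEX.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        end = min(len(text), match.end() + PROXIMITY)
        window = lowered[start:end]
        # "OR" across the keyword list: any single keyword is enough.
        if any(keyword in window for keyword in KEYWORDS):
            supported += 1
    return supported


def pattern_matches(text: str) -> bool:
    """The pattern fires when enough keyword-supported regex matches are found."""
    return count_supported_matches(text) >= MIN_COUNT
```

With these assumed values, a sentence such as "Your badge number is 482913." satisfies the pattern, while the same number with no keyword nearby does not.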

Custom Sensitive Information Type Example – Passwords

Let’s say, for example, that I’m trying to determine if company usernames and passwords are being sent out via email from our company.  In order to help my developer create a custom sensitive information type that can find passwords in the email or attachments, I need to provide rules that define what the password should look like.

In this example, let’s assume our company appropriately enforces strong password generation:

  • Password must be at least 8 characters.
  • Must include at least one uppercase letter, one number and one special character.
  • Cannot contain the username text
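The rules above can be written down as a small check that a developer could use to test sample passwords. This is a sketch under assumptions: the function name is hypothetical, "special character" is assumed to mean ASCII punctuation, and the username comparison is assumed to ignore case.

```python
import string

# Assumption: "special character" means any ASCII punctuation character.
SPECIAL_CHARACTERS = set(string.punctuation)


def meets_password_policy(candidate: str, username: str) -> bool:
    """Return True if the candidate satisfies the example company password rules."""
    if len(candidate) < 8:
        return False
    if not any(ch.isupper() for ch in candidate):
        return False
    if not any(ch.isdigit() for ch in candidate):
        return False
    if not any(ch in SPECIAL_CHARACTERS for ch in candidate):
        return False
    # Assumption: the username check is case-insensitive.
    if username and username.lower() in candidate.lower():
        return False
    return True
```

Note that the "cannot contain the username" rule depends on who the user is, so it is hard to express in a single regex. In a sensitive information type, only the first two rules would typically shape the pattern.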

Usernames are generated by IT and consist of either the user’s email address or the user’s first initial followed by the first 7 characters of the user’s last name.
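As a rough sketch, the username rule could be turned into a helper that builds the short username, plus a search for email addresses and known usernames in a piece of text. The function names, the lowercase convention, and the simplified email regex are assumptions made for illustration.

```python
import re

# Simplified email pattern for illustration; real address rules are looser.
EMAIL_REGEX = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def build_short_username(first_name: str, last_name: str) -> str:
    """First initial plus the first 7 characters of the last name (lowercase assumed)."""
    return (first_name[:1] + last_name[:7]).lower()


def find_usernames(text: str, known_usernames: list) -> set:
    """Return email addresses and known short usernames that appear in the text."""
    found = {match.group(0).lower() for match in EMAIL_REGEX.finditer(text)}
    for username in known_usernames:
        # Whole-word, case-insensitive match against the known list.
        if re.search(rf"\b{re.escape(username)}\b", text, re.IGNORECASE):
            found.add(username)
    return found
```

For example, `build_short_username("Jane", "Smithson")` gives `jsmithso`. Because a short username looks like any other short word, matching against a known list is likely to be more reliable than a pattern alone.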

I would expect that somewhere in the email, the sender would have text that references the concept of access to our network.  Keywords could be:

  • Password
  • User
  • Username
  • User name
  • name
  • Pass
  • U:
  • U-
  • P:
  • P-
  • Login
  • Logon
  • Access
  • Network
  • System
  • Systems
  • A list of our business applications

I would expect these keywords to appear fairly close in the text to the username and/or password.

I wouldn’t expect the username or password to show up more than once. It’s also possible that the username and password are sent in separate emails.

The person sending the email might surround the username or password with double quotation marks (“ ”) or single quotation marks (‘ ’).
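One way to account for surrounding spaces and quotation marks is to split the text on whitespace, strip any quotation marks from each word, and then test what remains against a password-shaped regex. The sketch below uses Python's regex dialect and hypothetical names. The regex that goes into the actual rule package may need adjusting for the dialect the service supports.

```python
import re

# Straight and curly quotation marks that may wrap a password.
QUOTE_CHARS = "\"'“”‘’"

# At least one uppercase letter, one digit, one special character, 8+ characters, no spaces.
PASSWORD_REGEX = re.compile(r"(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9\s])\S{8,}")


def find_password_candidates(text: str) -> list:
    """Return whitespace-delimited words that look like passwords once quotes are removed."""
    candidates = []
    for token in text.split():
        stripped = token.strip(QUOTE_CHARS)
        if PASSWORD_REGEX.fullmatch(stripped):
            candidates.append(stripped)
    return candidates
```

A check like this will also flag ordinary text such as a capitalized word followed by a year and a period, which is why the keyword and proximity rules matter for raising confidence.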

To create my rules, I’ll define several expressions and patterns:

Regex for Password – Matches our password rules, with either spaces, double quotation marks (“ ”), or single quotation marks (‘ ’) on either side of the password

Regex for Username – Matches email address rules or a list of our usernames

Pattern 1 – Match Regex for Password or Regex for Username. Confidence – Medium

Pattern 2 – Match (Regex for Password or Regex for Username) and any of the keyword list. Confidence – Medium

Pattern 3 – Match any of the keyword list.  Confidence – Low

Pattern 4 – Match (Regex for Password or Regex for Username) and any of the keyword list, where the keyword is within 20 characters (including spaces) of the regex match. Confidence – High
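The four patterns can be sketched as a function that returns the highest confidence level a piece of text reaches. This is a simplified illustration, not the way O365 evaluates a rule package: the two regexes are rough stand-ins, the keyword list is shortened, and the names are hypothetical. Because Patterns 1 and 2 both carry Medium confidence, the sketch treats them as one case.

```python
import re
from typing import Optional

PROXIMITY = 20  # characters, as in Pattern 4
KEYWORDS = ["password", "username", "user name", "login", "logon", "p:", "u:"]

# Rough stand-ins for "Regex for Password" and "Regex for Username".
PASSWORD_REGEX = re.compile(r"(?<!\S)(?=\S*[A-Z])(?=\S*\d)(?=\S*[^\w\s])\S{8,}")
USERNAME_REGEX = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def highest_confidence(text: str) -> Optional[str]:
    """Return "High", "Medium", "Low", or None for the given text."""
    id_spans = [m.span() for rx in (PASSWORD_REGEX, USERNAME_REGEX) for m in rx.finditer(text)]
    lowered = text.lower()
    kw_spans = [m.span() for kw in KEYWORDS for m in re.finditer(re.escape(kw), lowered)]

    # Pattern 4: some keyword overlaps the window of PROXIMITY characters around a match.
    keyword_nearby = any(
        kw_start < id_end + PROXIMITY and kw_end > id_start - PROXIMITY
        for id_start, id_end in id_spans
        for kw_start, kw_end in kw_spans
    )
    if id_spans and keyword_nearby:
        return "High"
    if id_spans:
        return "Medium"  # Patterns 1 and 2
    if kw_spans:
        return "Low"     # Pattern 3
    return None
```

With these assumptions, "Your password is Winter#2018" reaches High, the same password with no keyword present reaches Medium, and a message that only mentions the word "password" reaches Low.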

Now my developer has a structure for building a sensitive information type. We can test it by uploading the sensitive information type to the Security and Compliance Center and then building a search that uses it via Content Search.

As you may have read in our recent blogs, we have been assisting many organizations in meeting the requirements of the General Data Protection Regulation (GDPR). Microsoft has provided a very helpful resource that discusses how you can leverage this feature for sensitive information types specific to GDPR.

Have questions? Would you like to learn more? Simply send us an email at info@peters.com or call 630.832.0075 to start the conversation.