Validating File Integrity with Get-FileHash

by | Apr 22, 2021 | Collaboration

Downloading a file and verifying that it hasn’t been inadvertently or maliciously changed is something admins have done for a long time. Ideally, to make sure the file you downloaded is exactly the same as the source, you would do a byte-by-byte comparison. But that’s rarely practical or even possible for downloaded files, because it requires a trusted copy of the original to compare against. To accomplish this validation, we can use something known as a hash.

A hash is a string of characters generated by running the bytes of the file through a specific algorithm. The hash has a fixed length regardless of the file’s size (SHA256 always produces 64 hexadecimal characters), so it is much smaller than the file itself. The publisher posts the hash alongside the file, which allows you to run the same hash algorithm against your downloaded file and verify that the hashes match. There are different algorithms and utilities for generating hashes. Each algorithm produces a different hash for the same file, but any utility using the same algorithm on the same file will always produce the same value.
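To see those properties without downloading anything, here is a minimal sketch that hashes the same 11 bytes (“Hello World”) twice with SHA256 and once with SHA1. It uses the -InputStream parameter of Get-FileHash so no file is needed; the helper name Get-StringHash is hypothetical, made up for this illustration.

```powershell
# Get-StringHash is a hypothetical helper for this illustration, not a built-in cmdlet.
# It hashes the ASCII bytes of a string by handing Get-FileHash an in-memory stream.
function Get-StringHash {
    param(
        [Parameter(Mandatory)][string]$Text,
        [string]$Algorithm = 'SHA256'
    )
    $bytes  = [System.Text.Encoding]::ASCII.GetBytes($Text)
    $stream = [System.IO.MemoryStream]::new($bytes)
    try {
        (Get-FileHash -InputStream $stream -Algorithm $Algorithm).Hash
    }
    finally {
        $stream.Dispose()
    }
}

$first  = Get-StringHash -Text 'Hello World'
$second = Get-StringHash -Text 'Hello World'
$sha1   = Get-StringHash -Text 'Hello World' -Algorithm SHA1

"Same bytes, same algorithm, same hash: $($first -eq $second)"    # True
"Same bytes, SHA256 vs. SHA1, same hash: $($first -eq $sha1)"     # False
"SHA256 hash length: $($first.Length) characters"                 # 64
"SHA1 hash length: $($sha1.Length) characters"                    # 40
```

The two SHA256 results are identical, the SHA1 result differs, and both lengths stay fixed no matter how much data you hash.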

Previously you needed a third-party tool to do this, but PowerShell (version 4.0 and later) provides a handy cmdlet that performs the computation for you. Get-FileHash is the built-in PowerShell cmdlet that generates a hash value, allowing you to verify it against the reference hash. Details of the cmdlet and its options are in the documentation at https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/get-filehash?view=powershell-7.1

Some vendors publish this information consistently. HPE, for example, tends to include the hash values in the notes alongside the download files. Take a look at the download page for the iLO 5 firmware update.

On the “Installation Instruction” tab, you’ll see the hash/checksum values listed.

After downloading the files, run Get-FileHash from PowerShell and specify the path to the file you want to check.

PS C:\Down\Blog> Get-FileHash -Path .\cp045967.exe

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          82F776A89483EB2192FC41637708A820938B9091D5964530A089092CB64AEBFB       C:\Down\Blog\cp045967.exe

You’ll see it generated a hash value of 82F776A89483EB2192FC41637708A820938B9091D5964530A089092CB64AEBFB and you can compare that to the value on the web page and verify it matches. 
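Comparing 64 characters by eye is error-prone, so you may prefer to let PowerShell do the comparison. The sketch below assumes a hypothetical helper, Test-PublishedHash, that hashes the file and compares the result with the value you paste in from the vendor’s page.

```powershell
# Test-PublishedHash is a hypothetical helper, not a built-in cmdlet.
function Test-PublishedHash {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$ExpectedHash,
        [string]$Algorithm = 'SHA256'
    )
    # Drop any spaces or line breaks picked up when copying the value from a web page.
    $expected = $ExpectedHash -replace '\s', ''
    $actual   = (Get-FileHash -LiteralPath $Path -Algorithm $Algorithm -ErrorAction Stop).Hash
    # PowerShell's -eq is case-insensitive for strings, so lowercase hex still matches.
    $actual -eq $expected
}

# Usage with the file and hash from the example above; returns True when they match:
# Test-PublishedHash -Path .\cp045967.exe -ExpectedHash '82F776A89483EB2192FC41637708A820938B9091D5964530A089092CB64AEBFB'
```

It returns True or False; because of -ErrorAction Stop, a missing file stops with an error instead of quietly returning False.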

You can check multiple files too. If you downloaded all three files on that page, you can use a wildcard in the path to hash them all at once.

PS C:\Down\Blog> Get-FileHash -Path .\*.*

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          82F776A89483EB2192FC41637708A820938B9091D5964530A089092CB64AEBFB       C:\Down\Blog\cp045967.exe
SHA256          71EF16D38226ED06E72B1A87493AA90521D62D18DCF68BB15014C3C1028FBF4C       C:\Down\Blog\cp045967_part1.compsig
SHA256          8B6A297F69E570D72111C076505BFC074AB84B618B9142129CC8746525DE49F6       C:\Down\Blog\cp045967_part2.compsig

Then compare each of those hashes against the value published for that file.
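With several files, the same idea extends to a lookup table. In this sketch, Test-PublishedHashList (a hypothetical helper) takes a hashtable mapping each file name to its published hash and reports whether each file matches. The placeholder values in the usage comment are for illustration only.

```powershell
# Test-PublishedHashList is a hypothetical helper, not a built-in cmdlet.
# $Expected maps each file name to the hash published for it.
function Test-PublishedHashList {
    param(
        [Parameter(Mandatory)][System.Collections.IDictionary]$Expected,
        [string]$Folder = '.',
        [string]$Algorithm = 'SHA256'
    )
    foreach ($name in $Expected.Keys) {
        $path   = Join-Path $Folder $name
        # A missing file writes an error and yields no hash, so it reports Match = False.
        $actual = (Get-FileHash -LiteralPath $path -Algorithm $Algorithm).Hash
        [pscustomobject]@{
            File  = $name
            Match = $actual -eq ($Expected[$name] -replace '\s', '')
        }
    }
}

# Usage (replace the placeholders with the values from the vendor's page):
# Test-PublishedHashList -Folder C:\Down\Blog -Expected @{
#     'cp045967.exe'           = '<published SHA256 for cp045967.exe>'
#     'cp045967_part1.compsig' = '<published SHA256 for cp045967_part1.compsig>'
#     'cp045967_part2.compsig' = '<published SHA256 for cp045967_part2.compsig>'
# }
```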

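Not all sites use SHA256, the default algorithm, for their published hashes. Some provide SHA1 or MD5, which produce shorter hashes and are generally quicker to compute. Those still catch accidental corruption, but both algorithms are considered broken for security purposes because collisions (two different files with the same hash) can be deliberately constructed. Others provide SHA384 or SHA512, which produce longer hashes and make an unintended match even less likely.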

When a site uses a non-default algorithm, run the cmdlet with the -Algorithm parameter, as shown below.

PS C:\Down\Blog> Get-FileHash -Path .\*.* -Algorithm SHA1

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA1            589038C7ED6F0271F16CDCE148534AAAE387BA0B                               C:\Down\Blog\cp045967.exe
SHA1            BD54FC0333A123F4558ACC5BBF7B6825DC3A45A6                               C:\Down\Blog\cp045967_part1.compsig
SHA1            B92AD34D54FFD5C0BE0289EFAAEB0B77F30AB7F1                               C:\Down\Blog\cp045967_part2.compsig

This lets you match whichever algorithm the site used for its published hashes.
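If a site lists a hash without naming the algorithm, its length is a reasonable hint: Get-FileHash returns 32 hex characters for MD5, 40 for SHA1, 64 for SHA256, 96 for SHA384, and 128 for SHA512. The hypothetical Get-HashAlgorithmFromLength helper below encodes that mapping. Treat it as a heuristic only, since other algorithms share some of these lengths (RIPEMD160, which Windows PowerShell 5.1 also supports, is 40 characters like SHA1).

```powershell
# Get-HashAlgorithmFromLength is a hypothetical helper, not a built-in cmdlet.
# Length is only a hint: several algorithms share a length, so the vendor's label wins.
function Get-HashAlgorithmFromLength {
    param([Parameter(Mandatory)][string]$PublishedHash)
    $hexLength = ($PublishedHash -replace '\s', '').Length
    switch ($hexLength) {
        32      { 'MD5' }
        40      { 'SHA1' }
        64      { 'SHA256' }
        96      { 'SHA384' }
        128     { 'SHA512' }
        default { throw "Unrecognized hash length: $hexLength characters." }
    }
}

# Example: the SHA1 value from the output above is 40 characters long.
$published = '589038C7ED6F0271F16CDCE148534AAAE387BA0B'
$algorithm = Get-HashAlgorithmFromLength -PublishedHash $published
$algorithm   # SHA1
# Get-FileHash -Path .\cp045967.exe -Algorithm $algorithm
```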

A few items to be aware of:

  • The larger the file, the longer it takes to compute the hash, because every byte of the file has to be read. I’ve run checks against 500GB images that took a couple of hours.
  • The hash is based on the contents, not the date/time stamps of the file, etc.
    • For example, if I create a file called “HelloWorld.txt” and put “Hello World” in it, the hash is:
      A591A6D40BF420404A011733CFB7B190D62C65BF0BCDA32B57B277D9AD9F146E
    • If I rename the file to “GoodbyeWorld.txt”, the hash remains the same. 
    • If I change the text inside the file to “Goodbye World” and save it, the hash is now:
      C96724127AF2D6F56BBC3898632B101167242F02519A99E5AB3F1CAB9FF995E7 
    • And if I change the text inside the file back to “Hello World”, the hash is back to the original value.

You can create a file with the same text and get the same hash values noted above, as long as the file contains exactly those characters: no trailing newline, saved as plain ASCII or UTF-8 without a byte-order mark. Because hashes are computed from the contents, even a one-character change matters. If you change “Hello World” to “Hello world” (lowercase w on world), you’ll get a completely different hash.
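The experiment from the list above can be reproduced in a scratch folder. This sketch writes “Hello World” with Set-Content -NoNewline -Encoding ASCII (-NoNewline needs PowerShell 5.0 or later) so the file holds exactly those 11 bytes, then renames it, backdates its timestamp, changes one letter, and changes it back, hashing after each step. The temporary folder is an assumption for the demo and is deleted at the end.

```powershell
# Work in a throwaway folder under the temp directory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null

# -NoNewline and ASCII keep the file at exactly the 11 bytes "Hello World".
$hello = Join-Path $dir 'HelloWorld.txt'
Set-Content -LiteralPath $hello -Value 'Hello World' -NoNewline -Encoding ASCII
$original = (Get-FileHash -LiteralPath $hello).Hash

# Renaming the file and changing its timestamp leave the contents untouched.
$goodbye = Join-Path $dir 'GoodbyeWorld.txt'
Rename-Item -LiteralPath $hello -NewName 'GoodbyeWorld.txt'
(Get-Item -LiteralPath $goodbye).LastWriteTime = (Get-Date).AddDays(-30)
$afterRename = (Get-FileHash -LiteralPath $goodbye).Hash

# Changing a single character (W -> w) changes the contents, and so the hash.
Set-Content -LiteralPath $goodbye -Value 'Hello world' -NoNewline -Encoding ASCII
$lowercase = (Get-FileHash -LiteralPath $goodbye).Hash

# Putting the original text back restores the original hash.
Set-Content -LiteralPath $goodbye -Value 'Hello World' -NoNewline -Encoding ASCII
$restored = (Get-FileHash -LiteralPath $goodbye).Hash

"Original:     $original"
"After rename: $afterRename"
"Lowercase w:  $lowercase"
"Restored:     $restored"

Remove-Item -LiteralPath $dir -Recurse -Force
```

The original, renamed, and restored hashes are identical; only the lowercase-w version differs. Without -NoNewline, Set-Content appends a line break, which changes the bytes and therefore the hash.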

Aside from basic sanity checks to make sure your downloads match what was published, hashes are handy for validating large copies.

For example, I had a fairly large file transfer where a network hiccup occurred mid-transfer. The transfer auto-resumed, but I wasn’t confident the file had come through without corruption. I computed the hash on both the source and the destination, confirmed they matched, and saved the hour I would otherwise have spent re-downloading a file that was fine.
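For a copy like that, hashing both ends and comparing the results can be wrapped in a small function. Test-CopiedFile is a hypothetical helper, and the paths in the usage comment are placeholders.

```powershell
# Test-CopiedFile is a hypothetical helper, not a built-in cmdlet.
# It hashes the source and destination with the same algorithm and compares them.
function Test-CopiedFile {
    param(
        [Parameter(Mandatory)][string]$Source,
        [Parameter(Mandatory)][string]$Destination,
        [string]$Algorithm = 'SHA256'
    )
    # -ErrorAction Stop: a missing file should fail loudly, not look like a mismatch.
    $src = Get-FileHash -LiteralPath $Source -Algorithm $Algorithm -ErrorAction Stop
    $dst = Get-FileHash -LiteralPath $Destination -Algorithm $Algorithm -ErrorAction Stop
    [pscustomobject]@{
        SourceHash      = $src.Hash
        DestinationHash = $dst.Hash
        Match           = $src.Hash -eq $dst.Hash
    }
}

# Usage (placeholder paths):
# Test-CopiedFile -Source '\\fileserver\share\image.vhdx' -Destination 'D:\Images\image.vhdx'
```

Keep in mind that hashing a file on a network share reads the entire file across the network. For very large files it can be quicker to run Get-FileHash on the machine that holds each copy (for example with Invoke-Command) and compare only the resulting hash strings.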

Hashes can also help you detect tampering. While you may have good security in place, there’s always some way someone can get to files they shouldn’t. If you want to be sure files haven’t been modified at all, compute hashes for all of them, save the results to a text file, and store that file in a separate location or on immutable storage. Whenever you need to validate, run the computation again and verify that nothing has changed. This is perfect for the extremely paranoid person in your organization.
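A baseline like that can be sketched with two functions: one writes the path and SHA256 hash of every file in a folder to a CSV file (which is plain text), and one re-hashes the folder later and reports differences. The function names Export-HashBaseline and Compare-HashBaseline are hypothetical, and CSV is simply one convenient text format.

```powershell
# Export-HashBaseline and Compare-HashBaseline are hypothetical helpers, not built-in cmdlets.

# Record the SHA256 hash and full path of every file under $Folder.
function Export-HashBaseline {
    param(
        [Parameter(Mandatory)][string]$Folder,
        # Keep this file somewhere separate from $Folder, ideally on immutable storage.
        [Parameter(Mandatory)][string]$BaselinePath
    )
    Get-ChildItem -LiteralPath $Folder -File -Recurse |
        Get-FileHash -Algorithm SHA256 |
        Select-Object -Property Algorithm, Hash, Path |
        Export-Csv -LiteralPath $BaselinePath -NoTypeInformation
}

# Re-hash $Folder and list every path/hash pair that differs from the baseline.
function Compare-HashBaseline {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$BaselinePath
    )
    $baseline = Import-Csv -LiteralPath $BaselinePath
    $current  = Get-ChildItem -LiteralPath $Folder -File -Recurse |
        Get-FileHash -Algorithm SHA256
    # <= marks entries only in the baseline; => marks entries only in the current scan.
    Compare-Object -ReferenceObject $baseline -DifferenceObject $current -Property Path, Hash
}

# Usage (placeholder paths):
# Export-HashBaseline  -Folder D:\Important -BaselinePath \\archive\baselines\important.csv
# Compare-HashBaseline -Folder D:\Important -BaselinePath \\archive\baselines\important.csv
```

No output means nothing changed. A modified file appears twice (old hash marked <=, new hash marked =>), while an added or deleted file appears once. Because the comparison uses full paths, run the check against the folder at the same location where you created the baseline.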

Leveraging the PowerShell cmdlet Get-FileHash can bring reassurance that your files were transferred properly and match the published source. 

If there’s anything we can assist with, let us know at info@peters.com; we are happy to help!