Hashbytes – VARCHAR vs. VARBINARY Performance

A recent post on sqlservercentral.com got me thinking about Hashbytes recently. I was ignoring the issue of getting a false positive using a hashbyte solution to check for errors, and focused on the performance of the routine.

My initial pass on my testing was simply this.

SELECT
REPLICATE('x', 8000)
, HASHBYTES( 'sha1', REPLICATE('x', 8000) )

After reading into the BoL a little more, I noticed that HASHBYTES can accept varchar, nvarchar and varbinary. It was at this point that I wondered about how the performance would or wouldn’t change based on the input, here are my methods and my result.

--Variables
DECLARE
@TimeStart DATETIME
, @TimeEnd DATETIME
, @Counter INT
, @HashHolder VARBINARY  -- Simply a container, overwritten many times.
, @StringToHash VARCHAR(MAX) -- string to test speed of hashing
, @VarbinaryToHash VARBINARY(MAX) -- varbinary representation of @StringToHash to test speed of hashing

--Variable Setup
SELECT @TimeStart = GETDATE()
, @Counter = 0
, @StringToHash = REPLICATE('x', 8000 )

SELECT @VarbinaryToHash = CONVERT( VARBINARY, @StringToHash )

--While loop set to 10k
WHILE @Counter < 10000
BEGIN
--swap these statements as needed to test both situations
--SET @HashHolder = HASHBYTES( 'sha1', @StringToHash )
SET @HashHolder = HASHBYTES( 'sha1', @VarbinaryToHash )

SET @Counter = @Counter + 1
END

SELECT @TimeEnd = GETDATE()

--Results
SELECT @TimeStart
, @TimeEnd
, @TimeEnd - @TimeStart
, @Counter

Here’s what I found:

Type – Records – Time

String – 100,000 – 00:00:04.030
String – 1,000,000 – 00:00:40.077
String – 10,000,000 – N/A
Varbinary – 100,000 – 00:00:00.737
Varbinary – 1,000,000 – 00:00:07.017
Varbinary – 10,000,000 – 00:01:04.720

As you can see – at least in this limited test – the Varbinary hashing far out performs the string method. The reason for this test was I’m looking at using a combination of methods to detect change in large data sets, and I was testing if it would be worth my time to convert everything to varbinary and then hash, or convert to varchar(max) and then hash.

I still need to investigate the overhead of converting everything to varbinary of course!

This entry was posted in SQL Server Performance and tagged , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *