A recent post on sqlservercentral.com got me thinking about Hashbytes recently. I was ignoring the issue of getting a false positive using a hashbyte solution to check for errors, and focused on the performance of the routine.
My initial pass on my testing was simply this.
, HASHBYTES( 'sha1', REPLICATE('x', 8000) )
After reading into the BoL a little more, I noticed that HASHBYTES can accept varchar, nvarchar and varbinary. It was at this point that I wondered about how the performance would or wouldn’t change based on the input, here are my methods and my result.
, @TimeEnd DATETIME
, @Counter INT
, @HashHolder VARBINARY -- Simply a container, overwritten many times.
, @StringToHash VARCHAR(MAX) -- string to test speed of hashing
, @VarbinaryToHash VARBINARY(MAX) -- varbinary representation of @StringToHash to test speed of hashing
SELECT @TimeStart = GETDATE()
, @Counter = 0
, @StringToHash = REPLICATE('x', 8000 )
SELECT @VarbinaryToHash = CONVERT( VARBINARY, @StringToHash )
--While loop set to 10k
WHILE @Counter < 10000
--swap these statements as needed to test both situations
--SET @HashHolder = HASHBYTES( 'sha1', @StringToHash )
SET @HashHolder = HASHBYTES( 'sha1', @VarbinaryToHash )
SET @Counter = @Counter + 1
SELECT @TimeEnd = GETDATE()
, @TimeEnd - @TimeStart
Here’s what I found:
Type – Records – Time
String – 100,000 – 00:00:04.030
String – 1,000,000 – 00:00:40.077
String – 10,000,000 – N/A
Varbinary – 100,000 – 00:00:00.737
Varbinary – 1,000,000 – 00:00:07.017
Varbinary – 10,000,000 – 00:01:04.720
As you can see – at least in this limited test – the Varbinary hashing far out performs the string method. The reason for this test was I’m looking at using a combination of methods to detect change in large data sets, and I was testing if it would be worth my time to convert everything to varbinary and then hash, or convert to varchar(max) and then hash.
I still need to investigate the overhead of converting everything to varbinary of course!