Philip Hendry's Blog

Randomising data in a table using T-sql

Mar 8, 2011 • SQL Server, T-SQL • 1 min read

I needed to obfuscate some data for a demo server but rather than re-invent all the data I simply wanted to randomly swap data so associations could not be made between a store and pay rates for example. After a bit of head scratching I came to the conclusion that I simply needed to be able to correlate one set of data with a random set. The simplest way I thought of was to use a row_number() to correlate one list of data with another where one set of row numbers were ordered by a primary key whilst the other used a little trick of ordering by newid() which is recalculated for each row and therefore randomly distributes the data. Here’s the code :

select * 
--update s1 set s1.StoreName = s2.StoreName, s1.StoreAddress1 = s2.StoreAddress1, s1.County = s2.County, s1.Town = s2.Town, s1.PostCode = s2.PostCode, s1.Telephone = s2.Telephone 
from (
	select row_number() over(order by StoreID) as RowNumber, StoreId, StoreName, StoreAddress1, County, Town, PostCode, Telephone from Stores s1) as s1  
	join (select row_number() over(order by newid()) as RowNumber, StoreId, StoreName, StoreAddress1, County, Town, PostCode, Telephone from Stores s2) as s2 
	on s1.RowNumber = s2.RowNumber 
)
Post by: Philip Hendry