Wednesday, April 7, 2010

Delete duplicate records from a SQL Server table

Today one of my colleague asked me how to delete duplicate records from a SQL Server Table. So to answer his question, let’s check the following example.

Note: The below example works on SQL Server 2005 and later versions.

--Create Table
CREATE TABLE dbo.TestDuplicate (EmpNo INT, EmpName VARCHAR(10)); 

--Insert Test Data
INSERT
INTO dbo.TestDuplicate VALUES (1,'Emp 1');
INSERT
INTO dbo.TestDuplicate VALUES (1,'Emp 1');
INSERT
INTO dbo.TestDuplicate VALUES (2,'Emp 2');
INSERT
INTO dbo.TestDuplicate VALUES (2,'Emp 2');
INSERT INTO dbo.TestDuplicate VALUES (3,'Emp 3'); 

--Remove Duplicates
WITH
CTE
AS (SELECT
ROW_NUMBER () OVER (ORDER BY EmpNo) SrNo , EmpNo FROM dbo.TestDuplicate)
DELETE FROM CTE WHERE SrNo NOT IN (SELECT MAX(SrNo) FROM CTE GROUP BY EmpNo);

--Check the table
SELECT * FROM dbo.TestDuplicate;

-- Data before deleting the duplicate records.

image

-- Data after deleting the duplicate records.

image

Points to Note:

  1. This was possible because of the ROW_NUMBER() function available in SQL 2005 onwards.
  2. ROW_NUMBER() function Returns the sequential number of a row within a partition of a result set. This means for each row in a result set it will return one unique sequential number for that row.
  3. The OVER clause determines the partitioning and ordering of the row set.
  4. As we have used CTE (Common Table Expression) it allows us to reuse the result set like a temporary table.

Hope this article may help you in deleting the duplicate records.

Happy Learning !!!

No comments:

Post a Comment