Highlight Duplicate Records in an Excel List

With Excel’s conditional formatting, you can highlight cells based on specific rules. There are some built-in rules available, and you can use formulas to create your own formatting rules.

Highlight Duplicates

In this example, we want highlight duplicate records in a table. There is a built-in rule for highlighting duplicate values in a single column, but nothing that will check an entire row.

So, we’ll create our own rule, and it will require a new column on the worksheet, before we add the conditional formatting.

Concatenate the Data

In the sample data, there are two identical rows, and these should be highlighted after we apply our conditional formatting.

The first step is to use the CONCATENATE function to combine all the data into one cell in each row.

Add a new heading in cell G1 – AllData – and in cell G2, enter this formula, to combine the data from all the cells in that row.

=CONCATENATE(A2,B2,C2,D2,E2,F2)

Next, copy the formula down to the last row of data.

Apply the Conditional Formatting

Then, a conditional formatting rule is set, to color the rows that are duplicate records. We’ll use the COUNTIF function to check for duplicates in the AllData column.

=COUNTIF($G$2:$G$8,$G2)>1

If there is more than one instance of a data combination, that indicates a duplicate row, and the cells in columns A:F will be coloured. The two rows with duplicate records are highlighted, so our conditional formatting formula worked!

Download the Sample File

For detailed instructions, and to download the sample file, please visit my Contextures website: Highlight Duplicate Records in a List

Watch the Video

To see the steps for setting up the conditional formatting, watch this short video.

_________________________

0 thoughts on “Highlight Duplicate Records in an Excel List”

Jeff Weir says:

April 11, 2013 at 3:28 AM

Cool!
What’s also cool about tables is that you can filter by color. So using a bit of VBA, you can then filter to show only duplicate rows, and then do something with them.

Reply
1. eddy says:
  
  June 17, 2014 at 5:55 PM
  
  i need your help.i want to edit numbers within an excel cell.how can i do it?.when i hightligght’em and click outside the color’s gone.thanks a lot.
  
  Reply
John says:

April 11, 2013 at 5:13 AM

One very rare bug that may happen with this is if the data you have in two columns, when concatenated, looks the same. For example:
Col A Col B Concatenated 123 456 123456 1234 56 123456
This will show the rows above as duplicate when they are not.

Reply
1. Debra Dalgleish says:
  
  April 11, 2013 at 10:00 AM
  
  @John, yes, if that’s a possibility you could put separator characters into the CONCATENATE formula.
  
  Reply
Jeff Weir says:

April 11, 2013 at 6:23 PM

On the subject of delimiters, you can’t beat the pipe character | (Shift + \)

Reply
Ankit Mac says:

April 14, 2013 at 1:12 AM

Great Tutorial. But is there any way to highlight only a single instance of duplicate records? I mean if I could do so then it would be easier to just sort out the highlighted records and later delete them.

Reply
1. Debra Dalgleish says:
  
  April 14, 2013 at 10:09 AM
  
  @Ankit Mac, if you only want to highlight the duplicate records, and not the first instance of a duplicated record, use this formula: =COUNTIF($G$2:$G2,$G2)>1
  That checks the rows down to the current row for duplicates, so for the first instance, the count would be 1, and that row wouldn’t be highlighted.
  
  Reply
Rick says:

April 14, 2013 at 10:57 AM

I’m working on a project where I need to find records where certain fields are close matches. For example in the Company field one entry may be “SC Johnson Wax” and another “S.C. Johnson Wax” or “S C Johnson Wax”. Another would be where someone has misspelled a word so maybe they typed “Johnston” instead of “Johnson”. These are all things that stick out to the human eye but how to look for them programmatically?

Reply
Jeff Weir says:

April 14, 2013 at 4:39 PM

This probably requires VBA. This thread will probably help, Rick:
http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html

Reply
Jason M says:

April 15, 2013 at 9:51 AM

@Rick,
Trying looking at using multiple wildcards, such as:
=COUNTIF(A1,”*s*c*john*wax*”)
This will get you a hit on most entries, but of course if the user leaves out the word “wax”, it won’t work.

Reply
Steve says:

April 17, 2013 at 6:53 PM

@Rick
I had to do something similar, and used VBA. Depending on your application, you might try something like this:
http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
It has sample code written in C#, but I was able to convert this to VBA without too much trouble and use it very effectively.
My application was to select the correct employee (from an employee roster) even if the input name was spelled incorrectly.

Reply
1. Steve says:
  
  April 17, 2013 at 7:00 PM
  
  Here is the code I came up with:
  Public Function iDLD_EditDistance(ByVal sSource As Variant, ByVal sTarget As Variant) As Integer
  ””””””””””””””””””””””””””””””””
  ‘ Comments: Determines the Damerau–Levenshtein distance
  ‘ (Edit Distance) of two strings. More information
  ‘ can be found at:
  ‘ http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
  ‘
  ‘ Arguments: sSource – Source String
  ‘ sTarget – Target String
  ‘
  ‘ Returns: Integer – This should be an integer value used
  ‘ by calling routine to determine a matching
  ‘ string based on lowest edit distance
  ‘
  ‘ ————————————————————-
  ‘
  ‘
  Dim m As Integer, n As Integer
  Dim i As Integer, j As Integer
  Dim i1 As Integer, j1 As Integer
  Dim xc As Integer
  Dim INF As Integer
  Dim DB As Integer
  Dim H()
  Dim sd()
  Dim sdlen As Long
  Dim tsd, tchr, tchrS, tchrT As String
  ‘ Quick check to resolve 0 lenthg stings
  If Len(sSource) = 0 Then
  iDLD_EditDistance = Len(sTarget)
  Exit Function
  Else
  If Len(sTarget) = 0 Then
  iDLD_EditDistance = Len(sSource)
  Exit Function
  End If
  End If
  ‘ Prevent inappropriately high Edit Distance values with short source strings.
  ‘ Otherwise, the Edit Distance for a string like “Weinb” would actually be smaller with “Way”
  ‘ rather than “Weinbrenner”, given a false ‘match’. It will also speed things up.
  sTarget = Left(sTarget, Len(sSource) + 1) ‘ note that if Len(sSource) + 1 > Len(sTarget) then it will return the whole string
  ‘ Remove Capital letters
  sSource = LCase(sSource)
  sTarget = LCase(sTarget)
  ‘ Initialize Variables
  m = Len(sSource)
  n = Len(sTarget)
  ReDim H(m + 2, n + 2)
  INF = m + n
  ‘Initialize Matrix
  H(0, 0) = INF
  For i = 0 To m
  H(i + 1, 1) = i
  H(i + 1, 0) = INF
  Next i
  For i = 0 To n
  H(1, i + 1) = i
  H(0, i + 1) = INF
  Next i
  ‘Initialize sd array
  sdlen = 1 ‘ Initialize sd with first char using tsd
  tsd = Mid(sSource, 1, 1) ‘ Enter first char of sSource into tsd
  For i = 2 To m ‘ pick up iteration with 2nd char of sSource
  tchr = Mid(sSource, i, 1)
  If InStr(1, tsd, tchr) = 0 Then ‘ If the current Char is not in tsd then add it
  sdlen = sdlen + 1
  tsd = tsd & tchr
  End If
  Next i
  For j = 1 To n ‘ Now do same for chars from sTarget
  tchr = Mid(sTarget, j, 1)
  If InStr(1, tsd, tchr) = 0 Then
  sdlen = sdlen + 1
  tsd = tsd & tchr
  End If
  Next j
  ReDim sd(1 To Len(tsd), 1 To 2) ‘ ReDim sd() to actual deminsions
  For i = 1 To sdlen ‘ Move tsd into sd()
  sd(i, 1) = Mid(tsd, i, 1)
  sd(i, 2) = 0
  Next i
  ‘Begin processing strings
  ‘ Not really sure how it all works, but it does! 😀
  ‘ It uses a matrix, H(), in conjuction with sd()
  ‘ By flood filling the matrix, it will determine Edit Distance
  For i = 1 To m
  DB = 0
  For j = 1 To n
  For xc = 1 To sdlen
  tsd = sd(xc, 1)
  tchr = Mid(sTarget, j, 1)
  If tsd = tchr Then
  i1 = sd(xc, 2)
  Exit For
  End If
  Next xc
  j1 = DB
  tchrT = Mid(sTarget, j, 1)
  tchrS = Mid(sSource, i, 1)
  If tchrS = tchrT Then ‘ Filling matrix with new value
  H(i + 1, j + 1) = H(i, j) ‘ If = then previous low value
  DB = j
  Else
  H(i + 1, j + 1) = H(i, j) + 1 ‘ if then Min of adjacent matrix cells
  If H(i + 1, j + 1) > H(i + 1, j) + 1 Then H(i + 1, j + 1) = H(i + 1, j) + 1
  If H(i + 1, j + 1) > H(i, j + 1) + 1 Then H(i + 1, j + 1) = H(i, j + 1) + 1
  End If
  If H(i + 1, j + 1) > H(i1, j1) + (i – i1 – 1) + 1 + (j – j1 – 1) Then ‘ Magic formula that is used for the matrix
  H(i + 1, j + 1) = H(i1, j1) + (i – i1 – 1) + 1 + (j – j1 – 1) ‘ in the case of transposition
  End If
  Next j
  For xc = 1 To sdlen ‘ More magic that helps with transpostion
  tsd = sd(xc, 1)
  tchr = Mid(sSource, i, 1)
  If tsd = tchr Then
  sd(xc, 2) = i
  Exit For
  End If
  Next xc
  Next i ‘ Matrix is now complete
  iDLD_EditDistance = H(m + 1, n + 1)
  End Function
  
  Reply
Hugo says:

April 23, 2013 at 8:37 AM

Thanks, Steve.
Great to see your implementation.
But how this could be used to color-mark cells having a short DLD?

Reply
1. Steve says:
  
  April 23, 2013 at 11:16 PM
  
  Hi Hugo,
  I will say upfront that my suggestion was beyond the original post on using the Conditional Formatting feature. Because of that, the implementation in more complicated, and required the use of VBA. However, the question poised in the comment I replied to was a more complicated problem: “I need to find records where certain fields are close matches” How close is close enough? 🙂
  First, here is the very basics of how I used it.
  1) I started with a roster of names to check against. This is kept up to date with all persons possible correct names (employees in my case).
  2) When a name is entered into a sheet (in designated name cells), the name is checked using a VBA call from a change event.
  2.1) If there is an match with the roster, then do nothing
  2.2) If no match, find the name with lowest edit distance by iterating through the roster. Replace entered text with the name found.
  So, if I were to color mark cells with potential name matches, here is how I would do it.
  Assume I’m starting with a known good name.
  1) Parse the good name so there are no spaces or punctuation, just letters.
  2) Iterate through the cells in question, parsing the cell contents to remove everything but the letters. At this point, things can be done to skip the cell of it has certain features. i.e. empty, only numbers, etc.
  2.1) If there is an exact match, change the color of the cell
  2.2) If not, find the edit distance. Change the color of any name that is within a certain of edit distance. This is where things can get fuzzy. There could be a weighted set of criteria. Edit Distance, number of letters, first and last letters the same, etc. Each of these criteria can be used to help produce more accurate results.
  The nature of the problem is such that there will likely to be some false positives (cells with names that are highlighted that really aren’t matches) and some missed matches. I would think in this case the false positives would be preferred over missing a name that should be highlighted, but wasn’t. In the end, the person looking at the results will have to verify that any name highlighted is a correct ‘close match.’
  
  Reply
Hugo says:

April 24, 2013 at 5:41 PM

Thanx, Steve.
When operating only one name at a time, then your approach will work perfectly.
Would be fine to have a solution showing the minimum of DLD distances of a name in line to all the others.
When having this, you can mark all the lines showing a very low DLD value, so indicating a possible doubled name.
No idea how to implement this.

Reply
Brian B says:

June 29, 2013 at 5:26 AM

I am curious as to how you could assign different colors to different sets of duplicates.

Reply
David says:

June 18, 2014 at 8:30 PM

You can avoid the need for concatenation if you use the COUNTIFS function instead of the COUNTIF function:
=COUNTIFS($A$2:$A$8,A2,$B$2:$B$8,B2,$C$2:$C$8,C2,$D$2:$D$8,D2,$E$2:$E$8,E2,$F$2:$F$8,F2,$G$2:$G$8,G2)
This will look at all of the values in the row and count how many other rows have the same set of values.
You could then use a modified version of this to create your conditional formatting rule without the need to create a separate column in the spreadsheet to count the duplicates:
=COUNTIFS($A$2:$A$8,A2,$B$2:$B$8,B2,$C$2:$C$8,C2,$D$2:$D$8,D2,$E$2:$E$8,E2,$F$2:$F$8,F2,$G$2:$G$8,G2)>1
You can read more about this approach here: http://fiveminutelessons.com/learn-microsoft-excel/find-duplicate-rows-excel-across-multiple-columns
David

Reply
Maggie says:

January 4, 2015 at 4:49 PM

I created a mock set of columns (Last name, First name) to test why Conditional formatting is not working for me on my real data. I used the MATCH formula and found an error (extra space after a , ) It resulted in one False and I could see the extra space. But it did not find another that had a , after “Last name”. My columns had one that needed be concatenated. Conditional formatting did yield duplicates.Also is there a way to request the duplicates to be put into a new column?
thanks in advance,
Maggie

Reply
Bruno says:

June 17, 2015 at 6:20 AM

I have a 2-column list with “telephone-number” in one column and “comment” in the other (the “comment” column may be empty).
How can I identify and remove duplicate telephone-numbers even if the comments are different?
The “Remove duplicates” function only removes duplicate telephone-numbers if the respective comments are identical.
Example:
999999999;Home
999999999;Ann
999999999;Home
Excel only identifies the first and the third row as duplicates.
If I select the first column only, duplicate telephone numbers get removed, however, the comments no longer match as none got removed.
Thanks for helping
Bruno

Reply
1. Debra Dalgleish says:
  
  June 17, 2015 at 10:48 AM
  
  Select both columns, and in the Remove Duplicates dialog box, remove the check mark from the Comment heading.
  There is a video here that shows the steps:
  http://blog.contextures.com/archives/2013/11/26/remove-duplicates-in-excel-2013-list/
  
  Reply
shameer chagani says:

February 5, 2016 at 6:20 AM

Hi,
i have a list of employee names which has a the full name(s) of the employees now the requirement is we need a separate fields for First name and last name what formula i can use to do this?

Reply
anas kambali says:

February 18, 2016 at 1:52 AM

Can i delete back the “Concenated” column and leaving the duplicates data only???

Reply
Ainsley says:

February 21, 2016 at 10:41 PM

I have a data set which requires an exact formula due to case sensitivity. What would I use instead of =COUNTIF($G$2:$G$8,$G2)>1

Reply
MarkT says:

March 7, 2016 at 2:05 PM

Thank you all for sharing your knowledge, I appreciate it!

Reply