How to use Power Query to display a list of duplicate values or records in Excel

3 years ago 312

Highlighting or even deleting duplicate records in Excel is simple but listing them isn't as easy. Fortunately, you can use Power Query to easily display a list of duplicate values or records.

excel.jpg

Image: PixieMe/Shutterstock

Whether duplicate records are good or bad depends on specific conventions that you determine, not Excel. For the most part, duplicate data is common because many records repeat the same values within the same column. 

SEE: 83 Excel tips every user should master (TechRepublic)

On the other hand, a duplicate record—where all of the values are repeated—can spell trouble when reporting and analyzing the data set at large. It's easy to find duplicates; you can filter them out of a data set or format them using a conditional format rule. What you can't easily do in Excel is display only the duplicate records. The good news is that doing so is easy using Power Query, so in this article, I'll show you how to use Power Query to display a list of duplicates.

I'm using Microsoft 365, but Power Query is available through 2010, as an add-in. Excel Online doesn't fully support Power Query, but you can run queries. You can download the demonstration .xlsx file or work with your own data. This article assumes you have basic Excel skills, but even a beginner should be able to follow the instructions to success.

What is Power Query?

Power Query lets you connect to foreign and local data and then lets you transform that data that so you can use it in Excel without changing the source data. It's easy to use but unfortunately, most users are unfamiliar with it. Although we're using a simple feature within Power Query, this article isn't a basic introduction to Power Query. Now, let's actually use Power Query to display duplicates.

Define duplicate

To work efficiently with duplicates, you need to define what a duplicate is within the context of your data and how you use it. Any value that occurs more than once within the same column is a duplicate. For instance, many records in a tracking data set might have the same delivery date or customer. These are duplicate values, and they are common. 

SEE: Windows 10: Lists of vocal commands for speech recognition and dictation (free PDF) (TechRepublic)

The term can also define a record where every value in the record is repeated in another record. In other words, the entire record is a duplicate. For instance, two records that contain the same delivery date, customer and invoice number might cause a problem; you wouldn't want to invoice a client twice for the same order. These are duplicate records, and you'll usually want to delete one of them. 

For our purposes in this article, we'll use Power Query to display a list of both duplicate types: values repeated in the same column and values repeated across all columns.

As you can see in Figure A, the demonstration sheet contains duplicate values. In such a small data set, duplicates aren't difficult to spot. Working with them is a different matter, especially if the data set is large. We also have one duplicate record.

Figure A

pqfindduplicates-a.jpg

  This simple data set contains duplicate values and at least one duplicate record. 

How to list duplicate values with Power Query

Let's use Power Query to see values repeated in the columns. To do so, click anywhere inside the data set, click the Data tab, and then do the following:

  1. In the Get & Transform Data group, click From Sheet. The resulting window shows the data in Power Query (Figure B).
  2. Select the column that you want to check for duplicates. In this case, the date column is already selected, so let's use it.
  3. On the Home tab (in Power Query, not Excel), click the Keep Rows dropdown in the Reduce Rows group.
  4. In the resulting dropdown list, choose Keep Duplicates.

Figure B

pqfindduplicates-b.jpg

  Now the data is in Power Query.

As you can see in Figure C, the data set repeats two dates at least once. To see duplicates in the other columns, select a column and repeat step 3. For example, Figure D shows the duplicate value in the personnel column.

Figure C

pqfindduplicates-c.jpg

  Two dates have duplicates. 

Figure D

pqfindduplicates-d.jpg

  One employee is duplicated. 

Now you know that at least two columns repeat at least one value. If you like, check for duplicates in each column; you'll find that every column repeats a value at least once.

What if you want to see if there's a duplicate record? Let's tackle that next.

How to list duplicate records with Power Query in Excel

To quickly recap, a duplicate record repeats values across all columns. To check the data set for duplicate records, select all of the columns in Power Query. To do so, hold down the Shift key while you click each column. Then, choose Keep Duplicates from the Keep Rows dropdown. Figure E shows the result. As you might have guessed already, the result is the same as the personnel query in the last section.

Figure E

pqfindduplicates-e.jpg

  One record has a duplicate. 

Granted, this is a simple example, and the results were easy to predict. That won't always be true, especially in a large data set.

How to use the results of your Power Query search

Seeing the records might not be enough. Fortunately, you can return the results to Excel as a sheet. Simply click Close & Load in the Close group. Doing so will create a new sheet and save the resulting data set, as shown in Figure F. Once the data is in Excel, you can use it as you would any other data set.

Figure F

pqfindduplicates-f.jpg

  Save the results to Excel.

This is a simple use for Power Query. Take some time to become familiar with the different options so you can apply it to more complex tasks. 

Microsoft Weekly Newsletter

Be your company's Microsoft insider by reading these Windows and Office tips, tricks, and cheat sheets. Delivered Mondays and Wednesdays

Sign up today

Also see

Read Entire Article