This post was inspired by a client's query during a recent Excel course. Before looking at the potential solutions, it's worth mentioning that the first step in solving the problem had nothing to do with Excel itself. The problem as stated was that similar information was being held in two separate tables within Excel. Items might be included in both tables but, if this was the case, then for one of the rows, a particular value would be zero. This seemed to indicate that we would need to combine the tables, check for whether the ID was duplicated, perhaps using =COUNTIFS(), and then check to see whether the value column was zero. In fact, once we had looked at the problem more closely, we realised that it was much simpler. We just needed to remove anything from table 1 where the ID also existed in table 2. Ensuring that you have properly defined your solution and simplified it as much as possible, can often save a lot of time implementing an unnecessary and over-complicated spreadsheet solution.
There are several ways to identify duplicates in Excel. One of the Highlight Cells Rules in Conditional Formatting identifies duplicate or unique values in a range of cells. As we mentioned above, for more complicated situations, Excel functions can be used to return a value that identifies whether a row contains duplicated data.
Moving from identification to removal, The Data Ribbon tab, Data Tools group includes a Remove Duplicates command that allows you to select which columns in your data range need to be the same for a row to be removed:
Note that, if you don't select all the columns in the range, it might be important which row is deleted. Remove Duplicates retains the first instance found and deletes any others. It is a good idea to ensure you have a copy of the data to check back to before you allow Excel to remove duplicates.
Where your data is in an Excel Table, the Table Tools, Design Ribbon tab, Tools group also includes the Remove Duplicates command.
In our specific example, to use the Remove Duplicates command, we would have needed to add the rows in our first table beneath our second table to create a single range of data and then used the command based on our ID column. This is a manual process that would need to be repeated whenever the data changed.
For a more automatic alternative, we could use Excel's revolutionary automation feature, Power Query. Although it might require a little more knowledge and time to set up, once the process has been created, it can be repeated with a couple of clicks or set up to run automatically. In Power Query, the process will be as follows:
Use Data Ribbon tab, Get & Transform Data group, From Table/Range to load each of our tables into the Power Query editor and use the editor Home Ribbon tab, Close and Load dropdown, to Close & Load to…, Only Create Connection. Our two tables are now both available in Power Query and we can right-click on TableA in the Queries & Connections pane and choose Edit to re-open it in the editor. We need to remove any rows from TableA where a row with the same ID exists in Table B. We do this using Merge Queries to merge our two queries using the ID field as the join. We need to choose a particular type of Join Kind from the dropdown. We want our merge to only return rows where the ID only exists in the first table:
Once we have performed the merge we can just remove the new Table column that it creates and use Append Queries as New to append our Table B query to create a new, single table omitting duplicates. We then use Close & Load to load this as a new table in our workbook. When the data in either of our two tables changes, we can just right-click on our new table and choose refresh:
Over at the Excel Community there are dozens of posts that look at different ways to use Power Query to automate practical Excel solutions. Just use the search box to search for Power Query. Details of how to join the Excel Community are provided below. IT Faculty members have free access.
This blog is brought to you by the Excel Community where you can find additional blogs, extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Excel with Business. There is also an online forum where you can ask questions and share ideas with other community members.