Excel Tip of the Week #305 - Revisiting counting distinct items

Hello all and welcome back to the Excel Tip of the Week!  This week we have a Creator level post - with a Developer-level kicker - that's all about counting unique items.  This is an update to TOTW #142.

The basic case - counting unique items in the whole range

We'll start with the simple case - where we have a range of data and want to know how many unique items are in it.  Let's take this customer list:

There's no direct function for this, so we have to use a combination of other functions.  The most common and straightforward way is to use this function:

=SUMPRODUCT(1/COUNTIF(range, range))

This is a weird looking function - so let's break down how it works.

  1. SUMPRODUCT is used to indicate that we're dealing with an array formula - i.e. we're expecting to replace what's normally a single input with a whole range of inputs.  In this case, we're using a range of criteria for our COUNTIF instead of the usual 1.
  2. COUNTIF(range, range) - This asks, for each item in the range, how many times that item appears.  Because it's an array, our output will be a list for each item - in this case {2;3;3;2;2;1;1;1;2;3} - i.e. each item in the range is replaced with a number for how many times that item appears.
  3. 1/COUNTIF - We then take the inverse of our range.  This replaces our 1s, 2s, and 3s with 1s, 1/2s, and 1/3s
  4. Finally, the SUMPRODUCT adds up the total of the range.  The more often an item appears, the smaller a contribution each item makes - so we are adding 1*1 for items that appear once, 2*1/2 for items that appear twice, 3*1/3 for items that appear 3 times - i.e. we just add 1 for each unique entry - done!

There's an Excel file showing this method with real data attached at the bottom of this post.

Incidentally, you might be thinking about our recent Google Sheets tip on UNIQUE (TOTW #299) and whether that could be used here.  And if so, good news!  This is much simpler to do in Sheets, with just:

=COUNTA(UNIQUE(range))

Check out a demonstration here.

The more complex case - distinct items within a category

There is a more complex case that we might wonder about - how many distinct items are there in some subcategory of our data?  Continuing the above example, let's say we care about how many distinct order dates are made per customer.

This is a more complicated task - but it's not impossible!  The formula is essentially an evolution of our original solution:

Here's the generic form of the formula:

=SUMPRODUCT((label range = condition) / COUNTIFS(label range, label range, value range, value range))

Essentially the way this works is to set up the numerator of the division with 1s and 0s depending on whether the label matches the condition, and the denominator with a number for the number of combinations of the label and value.  This means that we get the same inversion trick as before for anything that does match the condition, and 0 for everything else.  We can add additional conditions too:

=SUMPRODUCT((label range 1 = condition 1) * (label range 2 = condition 2) / COUNTIFS(label range 1, label range 1, label range 2, label range 2, value range, value range))

Alternatively, you could import the data into the Excel Data Model, and make a PowerPivot of it - PP has 'Distinct Count' as one of the ways of summarising a value field.

And yes, once again Google Sheets has an easier answer:

=COUNT(UNIQUE(FILTER(label and value rangelabel range=condition)))

You can see these approaches in action in the Google Sheets demonstration here or the Excel demonstration attached at the bottom of this post.

Previous post - Google Sheets: FILTER
TOTW index 
Next post - VBA case study: Creating filtered reports

Anonymous
Parents
No Data
Comment
  • Perhaps worth mentioning that the most recent version of Excel now has its own UNIQUE() and FILTER() dynamic array functions as described by Liam Bastick in an Excel Community article: ion.icaew.com/.../modelling-101-part-41-getting-arrays-spilling-the-beans-on-seven-new-functions

Children
No Data