Excel Tip of the Week #353 - Audit sampling templates: Dynamic arrays

This blog is brought to you by the Excel Community where you can find additional blogs, extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Filtered. There is also an online forum where you can ask questions and share ideas with other community members.

Hello all and welcome back to the Excel Tip of the Week!  This time we have a Creator-level post in which we are going to kick off a new miniseries.  In this miniseries we will be showing how you can build a template to do audit sampling in three different ways - with traditional Excel formulas, with VBA macros, and today's topic - how to build the template with the new dynamic arrays available in Excel for Microsoft 365.

In each of these posts, we'll be examining how the template works and explaining the formulas used in the main post.  And in each, the finished product will be available to download.  This first instalment is available for free on Tech News; the other two will be posted in the coming weeks in the Excel Community.

You can also view a version of this blog on YouTube:

A look at the layout of our template

Here's the basic layout that we will be using for all three of our templates:

The options at the top allow the user to specify whether they want item sampling (in which all items have an equal chance of being chosen), or monetary unit sampling (in which the sample is weighted towards the larger items).  They can of course also specify the size of the sample they want.

The lower part of the template is an Excel Table (see TOTW #163 and #164).  By using a Table, we can have the formulas that refer to this range automatically adjust based on the size of the population that the user pastes into the table.  In this example, we have pre-filled the template with a 100-item population.

There is also a cumulative total column that is calculated automatically by formulas, that we will be using for the monetary unit sampling later on.  This shows the total of the items prior to the current line.  The formula is:

=SUM(INDEX([Value],1):[@Value])-[@Value]

This uses structured references to refer to the Value column.  We have two different versions of that reference: [Value] refers to the entire column, whereas [@Value] refers only to the Value entry for the current row.  We are also taking advantage of the fact that INDEX functions (see TOTW #201) can be used as one or both sides of a range (indicated with a colon).  So we have an INDEX with a value of 1 that indicates the start of the table as one end of the SUM range, and this then continues down to the current row to give us a running total.  We then subtract the current row's value to show the total prior to the current row.

The other items in the template are calculated automatically.  "Sampling unit" shows the size of monetary units that we will use, and is calculated with:

=10^IF(D3<1000000,0,ROUNDDOWN(LOG10(D3)-5,0))

Essentially this will use a unit of £1 if the total items have a value of £1,000,000 or less, and otherwise will choose a power of 10 based on the size of the sample desired.  This will prevent the dynamic arrays that we will create from going over the Excel maximum of 1,048,576 items.

Building our formulas

We'll be using dynamic arrays for our template - if you're not familiar with those, check out TOTW #327.

First off, we'll create a numbering for our sample.  This is just a SEQUENCE function that looks at the size of the sample selected:

=SEQUENCE(B3)

This isn't just neat labelling - we will also use this SEQUENCE function to control the number of items that we return from our random sampling process later on.

There are two formulas that we will use based on whether the user selects Item or Monetary Unit sampling; the final version will have an IF function that just chooses between the two.  For this post we'll build up each separately as an example.

Item sampling

Here what we need to do is:

  1. Bring in our list of (in this case) 100 items
  2. Sort the list into a random order
  3. Pick a number of items from the top of our random list that matches the user's desired sample size

Let's look at each of those stages and build up our dynamic array formula step-by-step.  First, listing all off the items:

This is just a matter of referring to the identifier column of the Excel Table - the list will automatically spill into the following cells.  Next, we need to randomise this list.  To do this, we are going to make an array of random numbers the same size as the list in the table, and then sort the list of identifiers according to that random list.  To do this we use RANDARRAY to make the random numbers, and SORTBY to use that as the sorting key for the item identifiers:

This gives a shuffled-up list of all 100 items.  Finally, we need to pick just the first few items in this list, to match the chosen sample size.  To do this, we are going to use an INDEX.  Normally INDEX will pull the nth item from a list - so e.g. =INDEX(array, 1) would pull the 1st item.  However, in our case, instead of using 1, we'll point at our SEQUENCE function using the new #-style referencing.  This will create a spilled function that will take the 1st, 2nd, 3rd and so on items from our shuffled list, and return each in turn.

Our final function is:

=INDEX(SORTBY(Sampling[Item identifier],RANDARRAY(D2)),H3#)

We also want to pull the associated value for each item - we can do that using an array-enabled INDEX MATCH to look up each identifier in turn:

Once again we have used a #-reference to make a spilled function that will resize as the sample size chosen is changed.

Monetary unit sampling

If you're not familiar with monetary unit sampling, the idea is that, instead of considering each item individually, we consider each "pound" (or other unit) of the total value represented - so £1 is the first pound of the first item, and £7,849 is the first pound of the second item (in the above example), and so on.  Instead of picking items, we pick random pounds within the range (from £1 to £535,329), which will mean that larger items are more likely to be chosen.  However, we ignore any duplicates picked so that the total sample is still the desired size.

This is trickier to convert into a function, but it's still possible!  Here's the process we will use:

  1. List out all the "pounds" in the sample in order
  2. Sort the list into a random order
  3. Associate each pound in the list with the item it represents - larger items are statistically more likely to occur earlier in the list
  4. Remove all duplicates to create a 100-item list
  5. Pick the number of items needed to create the final sample

Steps 1, 2, and 5 are essentially identical to the first process - except we will be using the total value of the sample as the list length instead of the number of items.  Here's a look at where we are after steps 1 and 2:

The array spills down a full 500,000 rows!  Our current work-in-progress formula is:

=SORTBY(SEQUENCE(D3/D4),RANDARRAY(D3/D4))*D4

Note that we divide the sizes of the sequences by the sampling unit, then multiply the resulting sequence by the unit again.  This makes no difference in the case where the sampling unit is £1, but for populations where the total exceeds £1,000,000 it will prevent the array becoming too large and causing errors.

Next we need to convert these "pounds" into the items that contain them in our sorted list.  We do this with an INDEX MATCH that matches on the cumulative total, but using an inexact match to find the largest item that is less than or equal to the search value.  Now our formula looks like this:

The formula is now (previous formula in bold):

=INDEX(Sampling[Item identifier],MATCH(SORTBY(SEQUENCE(D3/D4),RANDARRAY(D3/D4))*D4,Sampling[Cumulative value],1))

This list of course contains many duplicates.  To remove them, we will use UNIQUE - and then finally we will pass our list to and INDEX as before.  Here's the finished article:

And the formula in its entirety:

=INDEX(UNIQUE(INDEX(Sampling[Item identifier],MATCH(SORTBY(SEQUENCE(D3/D4),RANDARRAY(D3/D4))*D4,Sampling[Cumulative value],1))),H3#)

This is obviously much more complex than the previous example, but it works in a very similar way.  Now we have a sample that's weighted towards the larger items in the list.

Download the template yourself from the link below and have a go at using it - and try stepping through the calculations with Formulas => Evaluate Formula to see how the dynamic arrays work!

Previous post - Keyboard shortcuts for entering formulas
TOTW index 
Next post - REPLACE

Anonymous