Excel Tip of the Week #309 - Randomisation redux

This blog is brought to you by the Excel Community where you can find additional blogs, extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Filtered. There is also an online forum where you can ask questions and share ideas with other community members.

Hello all and welcome back to the Excel Tip of the Week!  This week, we have a Creator-level post in which we're making a definitive review of how to handle randomisation in Excel - including generating random numbers, randomising the order of data, and even generating random data in a statistical distribution or doing some Monte Carlo simulation.  The most recent previous tip on the randomising functions was TOTW #146, and there's fuller coverage of generating random normalised data in TOTW #162 as well as a full coverage on Monte Carlo in TOTW #278.

The basics - RAND and RANDBETWEEN

Excel, like any computer, cannot generate truly random numbers, because computing a program it is a deterministic process.  Instead, it uses a complicated algorithm to generate a pseudorandom number - which is evenly distributed and unpredictable in any practical sense.  For any practical purpose other than security, these results are "random enough".

There are two randomisation functions with slightly different purposes.  These are:

=RANDBETWEEN(lower bound, upper bound)

RAND has no inputs (although like for all functions the parentheses are required); it generates a 15-decimal random number between 0 and 1.

RANDBETWEEN requires two integer inputs which are the lower and upper bounds; it generates an integer number between these two bounds (inclusively).

In both cases, each RAND / RANDBETWEEN function works independently - i.e. if you have two identical formulas with one of these functions, each will generate its own random number separately.  Furthermore, these are both volatile functions - that is, they are always recalculated whenever anything in your workbook changes.  If you want a one-time pick of a random number, remember to copy and paste values over your randomising function afterwards.

Because it is limited in the outputs it can return, multiple RANDBETWEEN functions can and will sometimes return duplicates.  In fact, thanks to what's called the Birthday problem, such collisions will likely occur more frequently than you might expect.  For example, if you're picking random numbers between 1 and 99, you only need 12 RANDBETWEEN functions before it's more likely than not that two of them will select the same number.  This is technically also true for RAND, but with 1015 options to choose from this is extremely unlikely - it would take over 37,000,000 RAND functions before even a 50% chance of a collision.  However, because of the possibility of a clash, RANDBETWEEN is not ideal when choosing a random sample of items - which brings us to our next subtopic.

Randomising a list of items

To select a random sample of items from a list, the best way is not to use RANDBETWEEN for the reason explained above.  Instead, RAND can be used to effectively "shuffle" the sample and then "deal" the top N items to obtain a sample of size N, analogously to shuffling a deck of cards.

The way to do this is to assign a RAND function to each item in the list, and then sort the list according to the RAND values.  Because each RAND value is random and independent, the resulting order will be completely random - and hence selecting the top N items from the resulting list will yield a random sample.  Here's a demonstration of generating a random five-card poker hand from a full deck:

An unexciting pair of tens!  Note that the RAND values have changed from the left to the right - the Ten of Diamonds for example has increased its value.  This is because sorting or otherwise changing the data causes the RAND to recalculate.

Generating random data according to a distribution

The methods shown above only work for generating random data that occurs uniformly between two bounds.  If you want to generate dummy data for a model or similar, most likely this isn't realistic - real numbers tend to instead follow some kind of statistical distribution.  However, it's possible to use some of Excel's statistical functions to generate random data that follows a distribution.  For example, to generate data that fits the normal distribution (see TOTW #162 for more details), use:

=NORM.INV(RAND(), desired mean, desired standard deviation)

Or for a binomial distribution, use:

=BINOM.INV(number of trials, probability of success, RAND())

There are demonstrations of both these approaches in the attachment at the bottom of this post.

Monte Carlo

Finally, if you want to carry out a process that involves a lot of randomisation many times, and record the results from many trials - perhaps to see how randomised sales data affects a model, or to generate average results for a die-rolling game - you can use the Data Tables feature to automate the process of re-randomising and recalculating the results.  This is a more detailed approach so check out the full guide in TOTW #278 for more.

Previous post - Google Sheets: QUERY
TOTW index 
Next post - Revisiting reviewing formulas