This month’s T-SQL Tuesday is being hosted by Jes Borland (blog | twitter) and is about Aggregate Functions. Good topic, especially with all the new features that are available in SQL Server 2008 and 2008 R2.
But we’re still on SQL Server 2005 where I work, and I don’t do too much development anymore, so what do I know about Aggregate Functions? Not a lot, I must admit. I can SUM, AVG, or take a MAX and MIN, grouping by a set of fields and tallying the others. But, so can everyone else. And there are a lot of people that can probably do it a heck of a lot better than I can. So, what to write about.
Many of you know, my father passed away a couple months ago. He had in is house a treasure trove of math books. I was flipping through one of these books the other day and came across the Binomial distribution, determining the probability distribution of a series of events. It brought back memories (or nightmares, depending) of my actuarial exams from long ago. Thinking about these I realized that SQL Server does not have a factorial function. That’s what I decided to write about, taking an example out of one of the Probably and Statistics books in his inventory.
Problem: Sampling without Replacement. Suppose you have a bag of 30 marbles. Ten of them are black, and 20 are white. Taking 5 marbles out of the bag at random, without putting them back, what is the probability that three of them are black?
Now, I know SQL Server isn’t the best venue for solving this (Excel), but I thought what the heck. My first task was to create a factorial function:
I know, I know, I haven’t checked for negative numbers, and I didn’t set up the fact that 0! = 1. But, I made assumptions of all positive integers in my example, knowing that to really do it right I’d have to put in some error checking.
Next, I had to create the formula that calculates the probability. The formula on paper is, and forgive me, I tried to create this in Word, but I know nothing about Word:
I then created a stored procedure that basically took four input parameters: The number of desired black marbles, the total number of marbles picked, and the total number of marbles in the bag. The stored procedure wrote itself. Again, I do not have any error checking, and since factorials grow very large very quickly, this doesn’t work after reaching a certain number. I tried to figure out the odds of various poker hands, but a decimal (38,0) is too small for 52 factorial (52!). Here’s the sproc:
Running the stored procedure with the 3 black marbles, 5 total marbles picked, and 30 marbles in the bag leads to this desired result of approximately 16% probability that out of five picks from a bag of 10 black marbles and 20 white marbles that exactly three of them are black:
There it is. Again, not the best way to calculate a probability, but a way to have a little fun with T-SQL and reliving your Probability and Statistics class. Thanks, Jes, for hosting!