This site will work and look better in a browser that supports web standards, but it is accessible to any browser or Internet device.
N U T S H E L L | |
Module | Histogram |
---|---|
Description | Generates dynamic histogram result sets |
Environment | SQL Server 2000/2005 |
Author | Michael Sorens |
Since | CleanCode 0.9.22 |
A histogram, according to Wikipedia, is "...a mapping... that counts the number of observations that fall into various disjoint categories."
The Histogram
stored procedure generates a histogram
as either a single-row or multiple-row result set in T-SQL.
Each column of the result set is a bucket that tabulates values for a given range of the data under analysis.
For a multiple-row result (a detail histogram) each row provides in effect a mini-histogram for an enumerated key field.
Besides the description here, I have also published an article discussing the practical uses of this tool -- A Generic Histogram Generator for SQL Server -- available on the DevX online magazine.
Histogram @SchemaName, @TableName, @BucketExpr, @BucketFilter, @IdExpr, @IdFilter, @IdAlias, @DateTemplate, @Mode, @LowerBoundDate, @UpperBoundDate, @Buckets, @LowerBoundInt, @UpperBoundInt, @Smooth, @Verbose, @Process where: @SchemaName ::= schema or owner name @TableName ::= table name @BucketExpr ::= field name or expression @BucketFilter ::= filter predicate for @BucketExpr @IdExpr ::= field name or expression @IdFilter ::= filter predicate for @IdExpr @IdAlias ::= column name alias for @IdExpr @Aggregate ::= aggregate expression (count, min, max) @DateTemplate ::= template calling out year, month, and/or day specifiers @Mode ::= histogram type: 'DateRange', 'DateMonths', or 'NumberRange' @LowerBoundDate ::= smallest date to include for date histograms @UpperBoundDate ::= largest date to include for date histograms @Buckets ::= number of columns for NumberRange mode @LowerBoundInt ::= smallest value to include for numerical histograms @UpperBoundInt ::= largest value to include for numerical histograms @Smooth ::= for NumberRange mode, adjust bounds to palatable values @Verbose ::= "0"-"4" for no diagnostic output or progressively more @Process ::= "1" to execute; "0" to only show generated statements
OrderTotal
field.
On the other hand, to check on whether users have created
strong passwords by examining the lengths of passwords,
you could use an expression such as len(Password)
.
'regionID like ''%America'''
.
(Note that to embed quotes you must double them.)
group by
clause of the query and the field
itself will be the first column in the result set.
If @IdExpr is a single field name and it has
a date datatype it is formatted according to @DateTemplate.
The format that you specify implicitly determines the row groups.
That is, if you specify 'yyyy-mm' then all dates
in a given month are grouped into a single row.
@IdExpr is not limited to being a field name but, like @BucketExpr,
may be an expression. If for instance you wish to iterate over
full names you might use something like
'LastName+'',''+FirstName'
.
'DATEADD(year, 1, customerCreatedDate) > getdate()'
.
Note that to embed quotes (as in a string constant) you must double them.
FullName
for the @IdAlias.
Certain aggregate functions (count, min, max) may be applied
to obtain a second-order result, best illustrated by example.
Say you have a table of ClientAccounts with a primary key of IdAccount.
Each record also has a foreign key IdClient to match one or more
accounts to each client.
If you want to create a histogram displaying the distribution
of accounts per client, you could use this:
exec histogram @TableName='ClientAccounts', @BucketExpr='count(IdAccount)',
@Aggregate='IdClient'
The @Aggregate
parameter specifies the grouping field,
just as you would use in a GROUP BY
clause.
When you use the @Aggregate parameter, the @Mode defaults to NumberRange
.
As such, you may supply any other parameters valid for that mode
except for @IdExpr
, as this always generates a summary histogram.
(See the parameter table under Parameter Requirements.)
This aggregation is currently not supported in SQL Server 2000; you must use SS2005 or later.
DateRange
--indicating the columns
will be ranges of dates--the column dates are formatted according to @DateTemplate.
Use 'yyyy' to represent the year, 'mm' the month, and 'dd' the day, separated
by either hyphens (-) or virgules (/), as in 'yyyy-mm-dd' or 'mm/yyyy'.
No other characters are allowed in @DateTemplate.
datetime
or smalldatetime
,DateRange, DateMonths, and NumberRange
@Mode is set to DateRange
.
If, on the other hand, it is some type of integer or money datatype,
@Mode is set to NumberRange
.
If @BucketExpr is not a simple field name, the system will not
be able to check the datatype so it next looks at minimum/maximum values
that you specify.
If @LowerBoundDate
or @UpperBoundDate
is non-null, @Mode is set to DateRange
,
whereas if @LowerBoundInt
or @UpperBoundInt
is non-null, @Mode is set to NumberRange
,
DateRange
this parameter
defines the lower bound of the histogram.
When unspecified, the specified table (@TableName)
will be scanned for the minimum value.
DateRange
this parameter
defines the upper bound of the histogram.
When unspecified, the specified table (@TableName)
will be scanned for the maximum value.
When @Mode is set to NumberRange
this parameter specifies
the preferred number of buckets (columns) of the histogram.
Note, however, that the actual number of buckets may be smaller due to integral rounding.
Say, for example, that you specify a range of 0-45 with 15 buckets.
The bucket size will be three, so the first bucket will hold 0-2, the second 3-5, etc.,
and this yields exactly 15 buckets.
If you change the upper bound to 50, then the bucket size is determined to be four,
yielding 0-3, 4-7, 8-11, etc., with a total of just 13 buckets (columns).
The buckets will all be the same size except for the final bucket whose size
may be equal to, greater than, or less than, the other bucket sizes.
As an example, start with a range of 1-22 with 4 buckets.
This range of 22 values yields 3 buckets holding 6 values each and one holding 4 values.
Similarly, a range of 0-25 with 5 buckets--a range of 26 values, not 25--
yields 4 buckets with 5 values each and the final bucket with 6 values.
When the final bucket size differs, the column name is postfixed
with either a minus sign or a plus sign
to indicate that it is smaller or larger, respectively.
When @Mode is set to DateRange
the number of buckets
is specified implicitly by the @DateTemplate.
In either case, the current implementation supports up to approximately
36 buckets.
NumberRange
this parameter
defines the lower bound of the histogram.
When left at null
, the specified table (@TableName)
will be scanned for the minimum value
subject to smoothing (by enabling @Smooth).
NumberRange
this parameter
defines the upper bound of the histogram.
When left at null
, the specified table (@TableName)
will be scanned for the maximum value
subject to smoothing (by enabling @Smooth).
NumberRange
,
then the upper and lower bounds are "smoothed" to nearby round lots.
For example, if the range is 105 through 279, then this is adjusted
to become 100 through 300.
The algorithm for smoothing these bounds takes into account how close
they are together
(i.e. 1013 to 1017 does not change at all since it is such a small range),
how large the upper bound is relatively to the lower's proximity to zero,
and other factors.
Parameter | DateRange | DateMonths | NumberRange |
---|---|---|---|
@SchemaName | optional | optional | optional |
@TableName | required | required | required |
@BucketExpr | required | required | required |
@BucketFilter | optional | optional | optional |
@IdExpr | optional | optional | optional (NA if @Aggregate used) |
@IdFilter | optional (and only when @IdExpr is also used) | ||
@IdAlias | optional (and only when @IdExpr is also used) | ||
@Aggregate | NA | NA | optional |
@DateTemplate | required | required if: @IdExpr is a date field | |
@Mode | required if: bounds are omitted and @BucketExpr is not a date field | required | required if: bounds are omitted and @BucketExpr is not an integer or money field |
@LowerBoundDate | optional | optional | NA |
@UpperBoundDate | optional | optional | NA |
@Buckets | optional | NA | NA |
@LowerBoundInt | NA | NA | optional |
@UpperBoundInt | NA | NA | optional |
@Smooth | NA | NA | optional |
@Verbose | optional | optional | optional |
@Process | optional | optional | optional |