Spaces: cfahlgren1/sql-snippets

cfahlgren1 (HF Staff) committed on Sep 26, 2024

Commit 318fad3 (1 parent: 6032e5b)

improve histogram

Files changed (1)

src/snippets/histogram.md +15 -2

@@ -7,7 +7,6 @@ code: |
     from histogram(
         table_name,
         column_name,
-        bin_count := 10
     )
 ---
@@ -27,7 +26,21 @@ from histogram(
 - `table_name`: The name of the table or a subquery result.
 - `column_name`: The name of the column for which to create the histogram, you can use different expressions to summarize the data such as length of a string.
-- `bin_count`: The number of bins to use in the histogram.
+- `bin_count`: The number of bins to use in the histogram. (_**Optional**_)
+- `technique`: The binning technique to use. (_**Optional**_)
+## Binning Techniques
+| Technique         | Description                                                                                                                                                                                                 |
+|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `auto`            | Automatically selects the best binning technique based on the data type. If the data type is not numeric or timestamp, it defaults to `sample`. For numeric or timestamp data, it defaults to `equi-width-nice`. |
+| `sample`          | Uses distinct values in the column as bins. This technique is useful when the column has a small number of distinct values.                                                                                   |
+| `equi-height`     | Creates bins such that each bin has approximately the same number of data points. This technique is useful for ensuring that each bin has a similar number of entries. This can be helpful for skewed distributions. |
+| `equi-width`      | Creates bins of equal width. This technique is useful for numeric data. You want each bin to cover the same range of values.                                                                                   |
+| `equi-width-nice` | Creates bins of equal width with "nice" boundaries. This technique is similar to `equi-width`. It adjusts the bin boundaries to be more human-readable (e.g., rounding to the nearest whole number).            |
+You can find more information in the [PR](https://github.com/duckdb/duckdb/pull/12590) that added this feature.
 ## Histogram of the length of the input persona from the `PersonaHub` dataset
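
Note on the `equi-width` and `equi-height` rows added above: the difference is easiest to see on a skewed column. The following is a minimal Python sketch of the two ideas as the table describes them. It is not DuckDB's implementation, and the function names (`equi_width_edges`, `equi_height_edges`, `count_per_bin`) and the sample data are hypothetical, chosen only for illustration.

```python
def equi_width_edges(values, bin_count):
    """Upper edges of bin_count bins that each cover the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    edges = [lo + width * (i + 1) for i in range(bin_count)]
    edges[-1] = hi  # pin the last edge to the maximum to avoid float drift
    return edges


def equi_height_edges(values, bin_count):
    """Upper edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Use the value at each i/bin_count position of the sorted data as an edge.
    return [
        ordered[max(0, min(n - 1, (n * (i + 1)) // bin_count - 1))]
        for i in range(bin_count)
    ]


def count_per_bin(values, edges):
    """Count how many values fall into each bin, given the bins' upper edges."""
    counts = [0] * len(edges)
    for v in values:
        for i, edge in enumerate(edges):
            if v <= edge:
                counts[i] += 1
                break
    return counts


# Hypothetical skewed sample, e.g. string lengths with two long outliers.
lengths = [1, 2, 3, 4, 5, 6, 7, 8, 40, 100]

width_counts = count_per_bin(lengths, equi_width_edges(lengths, 2))
height_counts = count_per_bin(lengths, equi_height_edges(lengths, 2))
print(width_counts)   # [9, 1]: equal ranges, very uneven counts
print(height_counts)  # [5, 5]: uneven ranges, equal counts
```

With equal-width bins the two outliers stretch the range, so almost every value lands in the first bin. Equal-height bins adapt their boundaries to the data, which is why the table suggests them for skewed distributions.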