Commit 318fad3 · 1 Parent(s): 6032e5b
improve histogram
src/snippets/histogram.md CHANGED +15 -2
@@ -7,7 +7,6 @@ code: |
 from histogram(
 table_name,
 column_name,
-bin_count := 10
 )
 ---
 
@@ -27,7 +26,21 @@ from histogram(
 
 - `table_name`: The name of the table or a subquery result.
 - `column_name`: The name of the column for which to create the histogram, you can use different expressions to summarize the data such as length of a string.
-- `bin_count`: The number of bins to use in the histogram.
+- `bin_count`: The number of bins to use in the histogram. (_**Optional**_)
+- `technique`: The binning technique to use. (_**Optional**_)
+
+
+## Binning Techniques
+
+| Technique | Description |
+|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `auto` | Automatically selects the best binning technique based on the data type. If the data type is not numeric or timestamp, it defaults to `sample`. For numeric or timestamp data, it defaults to `equi-width-nice`. |
+| `sample` | Uses distinct values in the column as bins. This technique is useful when the column has a small number of distinct values. |
+| `equi-height` | Creates bins such that each bin has approximately the same number of data points. This technique is useful for ensuring that each bin has a similar number of entries. This can be helpful for skewed distributions. |
+| `equi-width` | Creates bins of equal width. This technique is useful for numeric data. You want each bin to cover the same range of values. |
+| `equi-width-nice` | Creates bins of equal width with "nice" boundaries. This technique is similar to `equi-width`. It adjusts the bin boundaries to be more human-readable (e.g., rounding to the nearest whole number). |
+
+You can find more information in the [PR](https://github.com/duckdb/duckdb/pull/12590) that added this feature.
 
 
 ## Histogram of the length of the input persona from the `PersonaHub` dataset
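To make the difference between the `equi-width` and `equi-height` techniques described in the added table more concrete, here is a minimal plain-Python sketch of the two ideas. It is an illustration only and is not how DuckDB implements its `histogram` function; the helper names `equi_width_edges`, `equi_height_edges`, and `count_per_bin` are hypothetical, and the sample data is made up to show a skewed distribution.

```python
def equi_width_edges(values, bin_count):
    """Split the range [min, max] into bin_count intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    return [lo + i * width for i in range(bin_count + 1)]


def equi_height_edges(values, bin_count):
    """Pick edges so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[0]]
    for i in range(1, bin_count + 1):
        # Index of the last value in the first i/bin_count share of the data,
        # i.e. ceil(i * n / bin_count) - 1.
        last = (i * n + bin_count - 1) // bin_count - 1
        edges.append(ordered[last])
    return edges


def count_per_bin(values, edges):
    """Count values per bin; a value goes into the first bin whose upper edge is >= it."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if v <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical skewed sample: seven small values and one outlier.
    values = [1, 2, 3, 4, 5, 6, 7, 100]

    width_edges = equi_width_edges(values, 4)
    print(width_edges)                          # [1.0, 25.75, 50.5, 75.25, 100.0]
    print(count_per_bin(values, width_edges))   # [7, 0, 0, 1]

    height_edges = equi_height_edges(values, 4)
    print(height_edges)                         # [1, 2, 4, 6, 100]
    print(count_per_bin(values, height_edges))  # [2, 2, 2, 2]
```

With equal-width bins the single outlier stretches the range, so almost every value falls into the first bin. With equal-height bins the edges follow the data, which is why the table suggests that technique for skewed distributions.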