cfahlgren1 HF staff committed on
Commit
6032e5b
·
1 Parent(s): bfe985b

add histogram

Files changed (1)
  1. src/snippets/histogram.md +48 -0
src/snippets/histogram.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ id: "duckdb-summarize"
+ title: "Histogram"
+ slug: "duckdb-histogram-query"
+ description: "Create a histogram for a specific column to visualize the distribution of values."
+ code: |
+   from histogram(
+     table_name,
+     column_name,
+     bin_count := 10
+   )
+ ---
+
+ # DuckDB Histogram
+
+ This snippet shows how to use DuckDB's `histogram` function to compute a histogram over a column of a dataset. It works for columns of any type and supports several binning strategies and a custom number of bins.
+
+ ```sql
+ from histogram(
+   table_name,
+   column_name,
+   bin_count := 10
+ )
+ ```
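+
+ As a rough sketch of choosing a binning strategy: the example below assumes DuckDB 1.1 or later, and that the strategy is selected through a named `technique` parameter that accepts `'equi-width'`. Neither detail is stated in this snippet, so check the DuckDB documentation for your version. The `numbers` table and its column `n` are hypothetical.
+
+ ```sql
+ -- Hypothetical demo table: the integers 0 through 999
+ create table numbers as select range as n from range(1000);
+
+ -- Assumption: `technique` is a named parameter and 'equi-width' is an accepted value
+ from histogram(
+   numbers,
+   n,
+   bin_count := 4,
+   technique := 'equi-width'
+ );
+ ```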
+
+ ## Parameters
+
+ - `table_name`: The name of the table or a subquery result.
+ - `column_name`: The column to build the histogram for. This can also be an expression that summarizes the data, such as the length of a string.
+ - `bin_count`: The number of bins to use in the histogram.
+
+
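+ To try the parameters above without a dataset at hand, here is a minimal self-contained sketch. The `personas` table and its `persona` column are made up for illustration. The second argument is an expression (`len(persona)`) rather than a bare column name.
+
+ ```sql
+ -- Hypothetical demo table with a single text column
+ create table personas as
+ select * from (
+   values
+     ('a nurse'),
+     ('a chef'),
+     ('a retired physics teacher'),
+     ('a marine biologist studying coral reefs')
+ ) t(persona);
+
+ -- Histogram of string lengths rather than of the raw values
+ from histogram(
+   personas,
+   len(persona),
+   bin_count := 3
+ );
+ ```
+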
+ ## Histogram of the length of the input persona from the `PersonaHub` dataset
+
+ ```sql
+ from histogram(
+   instruction,
+   len("input persona"),
+   bin_count := 5
+ )
+ ```
+
+ <iframe
+   src="https://huggingface.co/datasets/proj-persona/PersonaHub/embed/viewer/instruction/train?sql_console=true&sql=from+histogram%28%0A++instruction%2C%0A++len%28%22input+persona%22%29%2C%0A++bin_count+%3A%3D+5%0A%29"
+   frameborder="0"
+   width="100%"
+   height="560px"
+ ></iframe>