
Data::Summarizers zef:antononcube last updated on 2023-09-09
# Raku Data::Summarizers

License: Artistic-2.0

This Raku package has data summarizing functions for different data structures that are 
coercible to full arrays.

The supported data structures (so far) are:
  - 1D Arrays
  - 1D Lists  
  - Positional-of-hashes
  - Positional-of-arrays


## Usage examples

### Setup

Here we load the Raku modules `Data::Generators`, `Data::Reshapers`, `Text::Plot`,
and this module, `Data::Summarizers`:

use Data::Generators;
use Data::Reshapers;
use Text::Plot;
use Data::Summarizers;

### Summarize vectors

Here we generate a numerical vector and place some `NaN`, `Whatever`, and `Nil` values in it:

my @vec = [^1001].roll(12);
@vec = @vec.append( [NaN, Whatever, Nil]);
@vec .= pick(@vec.elems);

Here we summarize the vector generated above:
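
As a rough illustration of what summarizing such a vector involves, here is a minimal sketch in plain Raku. The helper `numeric-summary` is hypothetical and is not a function of this package. It assumes that `NaN`, `Whatever`, and undefined elements should be counted as missing rather than included in the statistics.

```raku
# Hypothetical helper for illustration; not part of Data::Summarizers.
sub numeric-summary(@values) {
    # Keep defined, non-NaN numbers; everything else counts as missing.
    my @nums = @values.grep({ $_ ~~ Numeric:D and not ($_ === NaN) }).sort;
    my $n = @nums.elems;
    return %( count => 0, missing => @values.elems ) unless $n;

    # Median: middle element, or the mean of the two middle elements.
    my $median = $n %% 2
        ?? (@nums[$n div 2 - 1] + @nums[$n div 2]) / 2
        !! @nums[$n div 2];

    return %(
        min     => @nums.head,
        max     => @nums.tail,
        mean    => @nums.sum / $n,
        median  => $median,
        count   => $n,
        missing => @values.elems - $n,
    );
}

my @sample = 1, 2, 3, 4, NaN, Whatever, Nil;
my %summary = numeric-summary(@sample);
say "$_.key(): $_.value()" for %summary.sort;
```

Separating the missing-value count from the numeric statistics keeps a single `NaN` from turning the mean into `NaN`.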


### Summarize tabular datasets

Here we generate a random tabular dataset with 16 rows and 3 columns and display it:

my $tbl = random-tabular-dataset(16, 
                                 <Pet Ref Code>,
                                 generators=>[random-pet-name(4), -> $n { ((^20).rand xx $n).List }, random-string(6)]);

**Remark:** The values of the column "Pet" are sampled from a set of four pet names, and the values of the column
"Code" are sampled from a set of six strings.

Here we summarize the tabular dataset generated above:
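
For categorical columns such as "Pet" and "Code", a summary largely comes down to counting how often each value occurs. The sketch below shows that idea in plain Raku for a positional-of-hashes. The helper `column-tallies` and the sample rows are hypothetical, and the package's own summary output may be organized differently.

```raku
# Hypothetical helper for illustration; not part of Data::Summarizers.
sub column-tallies(@rows) {
    my %tallies;
    for @rows -> %row {
        for %row.kv -> $column, $value {
            # One counter per (column, value) pair.
            %tallies{$column}{$value}++;
        }
    }
    return %tallies;
}

# Made-up sample rows standing in for a small tabular dataset.
my @rows =
    { Pet => 'Rex',  Code => 'a1' },
    { Pet => 'Milo', Code => 'b2' },
    { Pet => 'Rex',  Code => 'c3' };

my %tallies = column-tallies(@rows);
say %tallies<Pet>;
```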


### Summarize collections of tabular datasets 

Here is a hash of tabular datasets:

my %group = group-by($tbl, 'Pet');
%group.pairs.map({ say("{$_.key} =>"); say to-pretty-table($_.value) });

Here is the summary of that collection of datasets:


### Pareto principle statistic

Here is a vector of 200 random (normally distributed) numbers:

my @vec = random-variate(, 20), 200);

Here we compute the Pareto principle statistic and plot it:
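
To make the idea concrete, here is a minimal sketch in plain Raku. It assumes the statistic is the cumulative share of the total carried by the values when they are sorted from largest to smallest. The helper `pareto-fractions` is hypothetical and is not the package's implementation. It also assumes the values sum to a non-zero total.

```raku
# Hypothetical helper for illustration; not part of Data::Summarizers.
sub pareto-fractions(@values) {
    my @sorted = @values.sort.reverse;   # largest first
    my $total  = @sorted.sum;            # assumed to be non-zero
    # The triangular reduction [\+] yields running totals;
    # dividing by the grand total turns them into cumulative fractions.
    return ([\+] @sorted).map(* / $total).List;
}

# Made-up sample data.
my @sales = 50, 5, 30, 10, 5;
say pareto-fractions(@sales);
```

In this sample the two largest of the five entries account for 80% of the total, which is the kind of concentration the Pareto principle describes.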


### Skim




- [ ] User specified `NA` marker
- [ ] Tabular dataset summarization tests
- [ ] Skimmer
- [ ] Peek-er


## References

### Functions, repositories

[AAf1] Anton Antonov,
Wolfram Function Repository.