Help language development. Donate to The Perl Foundation

## Data::Summarizers cpan:ANTONOV last updated on 2022-05-30

Data-Summarizers-0.2.0/

# Raku Data::Summarizers

This Raku package has data summarizing functions for different data structures that are coercible to full arrays.

The supported data structures (so far) are: - 1D Arrays - 1D Lists
- Positional-of-hashes - Positional-of-arrays

## Usage examples

### Setup

Here we load the Raku modules `Data::Generators`, `Data::Reshapers` and this module, `Data::Summarizers`:

```use Data::Generators;
use Data::Reshapers;
use Data::Summarizers;
```
```# (Any)
```

### Summarize vectors

Here we generate a numerical vector, place some NaN's or Whatever's in it:

```my @vec = .roll(12);
@vec = @vec.append( [NaN, Whatever, Nil]);
@vec .= pick(@vec.elems);
@vec
```
```# [740 311 434 300 (Whatever) 192 705 202 576 561 544 NaN (Any) 744 133]
```

Here we summarize the vector generated above:

```records-summary(@vec)
```
```# O────────────────────────────────────O
# │ numerical                          │
# O────────────────────────────────────O
# │ 1st-Qu                    => 251   │
# │ Max                       => 744   │
# │ Median                    => 489   │
# │ (Any-Nan-Nil-or-Whatever) => 3     │
# │ Mean                      => 453.5 │
# │ Min                       => 133   │
# │ 3rd-Qu                    => 640.5 │
# O────────────────────────────────────O
```

### Summarize tabular datasets

Here we generate a random tabular dataset with 16 rows and 3 columns and display it:

```srand(32);
my \$tbl = random-tabular-dataset(16,
<Pet Ref Code>,
generators=>[random-pet-name(4), -> \$n { ((^20).rand xx \$n).List }, random-string(6)]);
to-pretty-table(\$tbl)
```
```# O────────────────O───────────O──────────O
# │      Code      │    Ref    │   Pet    │
# O────────────────O───────────O──────────O
# │ A2Ue69EWAMtJCi │  0.050176 │ Guinness │
# │ KNwmt0QmoqABwR │  0.731900 │ Truffle  │
# │ A2Ue69EWAMtJCi │  0.739763 │  Jumba   │
# │       aY       │  7.342107 │ Guinness │
# │ xgZjtSP6VrKbH  │ 19.868591 │  Jumba   │
# │    20CO9FGD    │ 12.956172 │  Jumba   │
# │    20CO9FGD    │ 15.854088 │ Guinness │
# │ A2Ue69EWAMtJCi │  4.774780 │ Guinness │
# │ A2Ue69EWAMtJCi │ 18.729798 │ Guinness │
# │ xgZjtSP6VrKbH  │ 13.383997 │ Guinness │
# │       aY       │  9.837488 │  Jumba   │
# │    20CO9FGD    │  2.912506 │ Truffle  │
# │ xgZjtSP6VrKbH  │ 11.782221 │ Truffle  │
# │ KNwmt0QmoqABwR │  9.825102 │ Truffle  │
# │ xgZjtSP6VrKbH  │ 16.277717 │  Jumba   │
# │ CQmrQcQ4YkXvaD │  1.740695 │ Guinness │
# O────────────────O───────────O──────────O
```

Remark: The values of the column "Pet" is sampled from a set of four pet names, and the values of the column and "Code" is sampled from a set of 6 strings.

Here we summarize the tabular dataset generated above:

```records-summary(\$tbl)
```
```# O───────────────O──────────────────────────────O─────────────────────O
# │ Pet           │ Ref                          │ Code                │
# O───────────────O──────────────────────────────O─────────────────────O
# │ Guinness => 7 │ Min    => 0.0501758995572299 │ xgZjtSP6VrKbH  => 4 │
# │ Jumba    => 5 │ 1st-Qu => 2.3266005718178704 │ A2Ue69EWAMtJCi => 4 │
# │ Truffle  => 4 │ Mean   => 9.175443804770861  │ 20CO9FGD       => 3 │
# │               │ Median => 9.831294839627123  │ KNwmt0QmoqABwR => 2 │
# │               │ 3rd-Qu => 14.619042446877677 │ aY             => 2 │
# │               │ Max    => 19.868590809216744 │ CQmrQcQ4YkXvaD => 1 │
# O───────────────O──────────────────────────────O─────────────────────O
```

### Summarize collections of tabular datasets

Here is a hash of tabular datasets:

```my %group = group-by(\$tbl, 'Pet');

%group.pairs.map({ say("{\$_.key} =>"); say to-pretty-table(\$_.value) });
```
```# Guinness =>
# O────────────────O───────────O──────────O
# │      Code      │    Ref    │   Pet    │
# O────────────────O───────────O──────────O
# │ A2Ue69EWAMtJCi │  0.050176 │ Guinness │
# │       aY       │  7.342107 │ Guinness │
# │    20CO9FGD    │ 15.854088 │ Guinness │
# │ A2Ue69EWAMtJCi │  4.774780 │ Guinness │
# │ A2Ue69EWAMtJCi │ 18.729798 │ Guinness │
# │ xgZjtSP6VrKbH  │ 13.383997 │ Guinness │
# │ CQmrQcQ4YkXvaD │  1.740695 │ Guinness │
# O────────────────O───────────O──────────O
# Truffle =>
# O─────────O───────────O────────────────O
# │   Pet   │    Ref    │      Code      │
# O─────────O───────────O────────────────O
# │ Truffle │  0.731900 │ KNwmt0QmoqABwR │
# │ Truffle │  2.912506 │    20CO9FGD    │
# │ Truffle │ 11.782221 │ xgZjtSP6VrKbH  │
# │ Truffle │  9.825102 │ KNwmt0QmoqABwR │
# O─────────O───────────O────────────────O
# Jumba =>
# O───────────O────────────────O───────O
# │    Ref    │      Code      │  Pet  │
# O───────────O────────────────O───────O
# │  0.739763 │ A2Ue69EWAMtJCi │ Jumba │
# │ 19.868591 │ xgZjtSP6VrKbH  │ Jumba │
# │ 12.956172 │    20CO9FGD    │ Jumba │
# │  9.837488 │       aY       │ Jumba │
# │ 16.277717 │ xgZjtSP6VrKbH  │ Jumba │
# O───────────O────────────────O───────O
```

Here is the summary of that collection of datasets:

```records-summary(%group)
```
```# summary of Guinness =>
# O──────────────────────────────O─────────────────────O───────────────O
# │ Ref                          │ Code                │ Pet           │
# O──────────────────────────────O─────────────────────O───────────────O
# │ Min    => 0.0501758995572299 │ A2Ue69EWAMtJCi => 3 │ Guinness => 7 │
# │ 1st-Qu => 1.7406953436440742 │ CQmrQcQ4YkXvaD => 1 │               │
# │ Mean   => 8.839377375678543  │ 20CO9FGD       => 1 │               │
# │ Median => 7.34210706081909   │ xgZjtSP6VrKbH  => 1 │               │
# │ 3rd-Qu => 15.854088005472917 │ aY             => 1 │               │
# │ Max    => 18.72979803423013  │                     │               │
# O──────────────────────────────O─────────────────────O───────────────O
# summary of Truffle =>
# O──────────────O──────────────────────────────O─────────────────────O
# │ Pet          │ Ref                          │ Code                │
# O──────────────O──────────────────────────────O─────────────────────O
# │ Truffle => 4 │ Min    => 0.7318998724597869 │ KNwmt0QmoqABwR => 2 │
# │              │ 1st-Qu => 1.822202836225727  │ 20CO9FGD       => 1 │
# │              │ Mean   => 6.312932174017679  │ xgZjtSP6VrKbH  => 1 │
# │              │ Median => 6.368803873269801  │                     │
# │              │ 3rd-Qu => 10.803661511809633 │                     │
# │              │ Max    => 11.782221077071329 │                     │
# O──────────────O──────────────────────────────O─────────────────────O
# summary of Jumba =>
# O──────────────────────────────O────────────O─────────────────────O
# │ Ref                          │ Pet        │ Code                │
# O──────────────────────────────O────────────O─────────────────────O
# │ Min    => 0.7397628145038704 │ Jumba => 5 │ xgZjtSP6VrKbH  => 2 │
# │ 1st-Qu => 5.28862527360509   │            │ 20CO9FGD       => 1 │
# │ Mean   => 11.935946110102654 │            │ A2Ue69EWAMtJCi => 1 │
# │ Median => 12.956171789492936 │            │ aY             => 1 │
# │ 3rd-Qu => 18.073154106905072 │            │                     │
# │ Max    => 19.868590809216744 │            │                     │
# O──────────────────────────────O────────────O─────────────────────O
```

TBD...

## TODO

• [ ] User specified `NA` marker

• [ ] Tabular dataset summarization tests

• [ ] Skimmer

• [ ] Peek-er

## References

### Functions, repositories

[AAf1] Anton Antonov, RecordsSummary, (2019), Wolfram Function Repository.