This Raku package has data reshaping functions for different data structures that are coercible to full arrays.
The supported data structures are:
- Positional-of-hashes
- Positional-of-arrays
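As a minimal sketch of the two shapes (the sample records and column names are made up for illustration and do not come from the package), the same small table can be written in either form using plain Raku:

```raku
# Made-up sample data; not taken from the package.

# Positional-of-hashes: each record is a hash keyed by column name.
my @records-as-hashes =
    { id => 1, sex => 'female', class => '1st' },
    { id => 2, sex => 'male',   class => '3rd' };

# Positional-of-arrays: each record is an array; columns are identified by position.
my @records-as-arrays =
    [1, 'female', '1st'],
    [2, 'male',   '3rd'];

# Converting arrays to hashes, given (hypothetical) column names.
my @column-names = <id sex class>;
my @converted = @records-as-arrays.map(-> @row { (@column-names Z=> @row).Hash });

say @converted[0]<sex>;   # female
```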
The five data reshaping operations provided by the package over those data structures are:
- cross-tabulate
- to-long-format
- to-wide-format
- join-across (SQL JOIN)
- transpose
The first four operations are fundamental in data wrangling and data analysis; see [AA1, Wk1, Wk2, AAv1-AAv2].
(Transposing tabular data is, of course, also fundamental, but it can also be seen as a basic functional programming operation.)
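No join-across example is given here, so its signature is not assumed. As a rough, package-independent sketch of what an SQL-style inner join does on positional-of-hashes data (all dataset and variable names below are made up for illustration):

```raku
# Made-up sample data sharing the key column "id"; not taken from the package.
my @passengers =
    { id => 1, sex => 'female' },
    { id => 2, sex => 'male'   },
    { id => 3, sex => 'male'   };
my @cabins =
    { id => 1, cabin => 'A' },
    { id => 3, cabin => 'C' };

# Index the second dataset by key for direct lookup.
my %cabin-by-id = @cabins.map(-> %c { %c<id> => %c });

# Inner join: keep only records whose key occurs in both datasets,
# and merge the fields of the matching records.
my @joined = @passengers
    .grep(-> %p { %cabin-by-id{ %p<id> }:exists })
    .map(-> %p {
        my %c := %cabin-by-id{ %p<id> };
        %( |%p.pairs, |%c.pairs )
    });

.say for @joined.sort(*<id>);
```

Record 2 has no match in the second dataset, so it is dropped; a left join would keep it with the cabin field missing.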
Making contingency tables -- or cross tabulation -- is a fundamental statistics and data analysis operation, [Wk1, AA1].
Here is an example using the Titanic dataset (that is provided by this package through the function get-titanic-dataset):
    use Data::Reshapers;

    my @tbl = get-titanic-dataset();
    my $res = cross-tabulate( @tbl, 'passengerSex', 'passengerClass');
    say $res;
    # {female => {1st => 144, 2nd => 106, 3rd => 216}, male => {1st => 179, 2nd => 171, 3rd => 493}}

    say to-pretty-table($res);
    # +--------+-----+-----+-----+
    # |        | 1st | 2nd | 3rd |
    # +--------+-----+-----+-----+
    # | female | 144 | 106 | 216 |
    # | male   | 179 | 171 | 493 |
    # +--------+-----+-----+-----+
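To make the operation itself concrete, here is a minimal plain-Raku sketch of what a two-variable cross tabulation computes: a count for each combination of values. It is not the package's implementation, and the records are made up for illustration.

```raku
# Made-up sample data; not taken from the package.
my @data =
    { sex => 'female', class => '1st' },
    { sex => 'male',   class => '3rd' },
    { sex => 'male',   class => '3rd' },
    { sex => 'female', class => '3rd' };

# Count how often each (sex, class) combination occurs.
# Nested hash entries autovivify, so no initialization is needed.
my %counts;
%counts{ $_<sex> }{ $_<class> }++ for @data;

say %counts<male><3rd>;     # 2
say %counts<female><1st>;   # 1
```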
Conversion to long format allows column names to be treated as data.
(More precisely, when converting to long format, the specified column names of a tabular dataset become values in a dedicated column, e.g. "Variable", in the long format.)
    my @tbl1 = @tbl.roll(3);
    .say for @tbl1;

    .say for to-long-format( @tbl1 );

    my @lfRes1 = to-long-format( @tbl1, 'id', [], variablesTo => "VAR", valuesTo => "VAL2" );
    .say for @lfRes1;
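The idea behind the conversion can be sketched in plain Raku, independent of the package. The column names "Variable" and "Value" and the sample records are assumptions chosen for illustration:

```raku
# Made-up sample data; not taken from the package.
my @wide =
    { id => 1, sex => 'female', class => '1st' },
    { id => 2, sex => 'male',   class => '3rd' };

# Every non-id column of a record becomes one (id, Variable, Value) record.
my @long = gather for @wide -> %row {
    for %row.keys.grep(* ne 'id').sort -> $column {
        take %( id => %row<id>, Variable => $column, Value => %row{$column} );
    }
}

.say for @long;
say @long.elems;   # 4
```

Two records with two non-id columns each give four long-format records.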
Here we transform the long format result @lfRes1 above into wide format; the result has the same records as @tbl1:
    say to-pretty-table( to-wide-format( @lfRes1, 'id', 'VAR', 'VAL2' ) );
    # +-------------------+----------------+--------------+--------------+-----+
    # | passengerSurvival | passengerClass | passengerAge | passengerSex | id  |
    # +-------------------+----------------+--------------+--------------+-----+
    # | died              | 1st            | 20           | male         | 308 |
    # | died              | 2nd            | 40           | female       | 412 |
    # | survived          | 2nd            | 50           | female       | 441 |
    # | died              | 3rd            | 20           | male         | 741 |
    # | died              | 3rd            | -1           | male         | 932 |
    # +-------------------+----------------+--------------+--------------+-----+
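The reverse direction can be sketched in plain Raku as well. This is an illustration of the idea under made-up data, not the package's implementation:

```raku
# Made-up long-format data; not taken from the package.
my @long =
    { id => 1, VAR => 'sex',   VAL2 => 'female' },
    { id => 1, VAR => 'class', VAL2 => '1st'    },
    { id => 2, VAR => 'sex',   VAL2 => 'male'   },
    { id => 2, VAR => 'class', VAL2 => '3rd'    };

# Group by id: each (VAR, VAL2) pair of a group becomes a column of one record.
my %by-id;
%by-id{ $_<id> }{ $_<VAR> } = $_<VAL2> for @long;

# Rebuild one wide record per id. Note that the id comes back as a
# string here, because it was used as a hash key.
my @wide = %by-id.sort(*.key).map(-> $p { %( id => $p.key, |$p.value.pairs ) });

.say for @wide;
```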
Using a cross tabulation of passengerSex against passengerSurvival:

    my $res = cross-tabulate( @tbl, 'passengerSex', 'passengerSurvival');
    my $tres = transpose( $res );

    say to-pretty-table($res, title => "Original");
    # +--------------------------+
    # |         Original         |
    # +--------+------+----------+
    # |        | died | survived |
    # +--------+------+----------+
    # | female | 127  | 339      |
    # | male   | 682  | 161      |
    # +--------+------+----------+

    say to-pretty-table($tres, title => "Transposed");
    # +--------------------------+
    # |        Transposed        |
    # +----------+--------+------+
    # |          | female | male |
    # +----------+--------+------+
    # | died     | 127    | 682  |
    # | survived | 339    | 161  |
    # +----------+--------+------+
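For a hash of hashes such as the cross tabulation result, transposing amounts to swapping the outer (row) and inner (column) keys. A minimal plain-Raku sketch of that idea, using the counts shown in the tables above (this is not the package's implementation):

```raku
# Counts copied from the tables above.
my %table =
    female => { died => 127, survived => 339 },
    male   => { died => 682, survived => 161 };

# Swap the outer (row) and inner (column) keys.
my %transposed;
for %table.kv -> $row, %columns {
    %transposed{ $_.key }{ $row } = $_.value for %columns.pairs;
}

say %transposed<died><male>;         # 682
say %transposed<survived><female>;   # 339
```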
[X] Simpler, more convenient interface.
[ ] More extensive long format tests.
[ ] More extensive wide format tests.
[ ] Implement verifications for
  [X] Positional-of-hashes
  [X] Positional-of-arrays
  [X] Positional-of-key-to-array-pairs
  [ ] Positional-of-hashes, each record of which has:
  [ ] Positional-of-arrays, each record of which has:
[X] Implement "nice tabular visualization" using Pretty::Table and/or Text::Table::Simple.
[X] Document examples using pretty tables.
[X] Implement transposing operation for:
[X] Implement to-pretty-table for:
[ ] Implement join-across:
[ ] Implement to long format conversion for:
[ ] Speed/performance profiling.
[AA1] Anton Antonov, "Contingency tables creation examples", (2016), MathematicaForPrediction at WordPress.
[Wk1] Wikipedia entry, Contingency table.
[Wk2] Wikipedia entry, Wide and narrow data.
[AAf1] Anton Antonov, CrossTabulate, (2019), Wolfram Function Repository.
[AAf2] Anton Antonov, LongFormDataset, (2020), Wolfram Function Repository.
[AAf3] Anton Antonov, WideFormDataset, (2021), Wolfram Function Repository.
[AAf4] Anton Antonov, RecordsSummary, (2019), Wolfram Function Repository.
[AAv1] Anton Antonov, "Multi-language Data-Wrangling Conversational Agent", (2020), YouTube channel of Wolfram Research, Inc. (Wolfram Technology Conference 2020 presentation.)
[AAv2] Anton Antonov, "Data Transformation Workflows with Anton Antonov, Session #1", (2020), YouTube channel of Wolfram Research, Inc.
[AAv3] Anton Antonov, "Data Transformation Workflows with Anton Antonov, Session #2", (2020), YouTube channel of Wolfram Research, Inc.