Top level raku Data ANalysis Module that provides a base set of raku-style datatype roles, accessors & methods, primarily:
- DataSlices
- Series
- DataFrames
A common basis for bindings such as ... Dan::Pandas (via Inline::Python), Dan::Polars(tbd) (via NativeCall / Rust FFI), etc.
It's rather a zen concept since raku contains many Data Analysis constructs & concepts natively anyway (see note 7 below)
Contributions via PR are very welcome - please see the backlog Issue, or just email [email protected] to share ideas!
more examples in bin/synopsis.raku
### Series ###

my \s = Series.new( [b=>1, a=>0, c=>2] );               #from Array of Pairs
# -or- Series.new( [rand xx 5], index => <a b c d e>);
# -or- Series.new( data => [1, 3, 5, NaN, 6, 8], index => <a b c d e f>, name => 'john' );

say ~s;

# Accessors
say s[1];                   #0 (positional)
say s<b c>;                 #1 2 (associative with slice)

# Map/Reduce
say s.map(*+2);             #(3 2 4)
say [+] s;                  #3

# Hyper
say s >>+>> 2;              #(3 2 4)
say s >>+<< s;              #(2 0 4)

# Update
s.data[1] = 1;              # set value
s.splice(1,2,(j=>3));       # update index & value

# Combine
my \t = Series.new( [f=>1, e=>0, d=>2] );
s.concat: t;                # concatenate

say "=============================================";

### DataFrames ###

my \dates = (Date.new("2022-01-01"), *+1 ... *);
my \df = DataFrame.new( [[rand xx 4] xx 6], index => dates, columns => <A B C D> );
# -or- DataFrame.new( [rand xx 5], columns => <A B C D>);
# -or- DataFrame.new( [rand xx 5] );

say ~df;

say "---------------------------------------------";

# Data Accessors [row;col]
say df[0;0];
df[0;0] = 3;                # set value

# Cascading Accessors (ok to mix Positional and Associative)
say df[0][0];
say df[0]<A>;
say df{"2022-01-03"}[1];

# Object Accessors & Slices (see note 1)
say ~df[0];                 # 1d Row 0 (DataSlice)
say ~df[*]<A>;              # 1d Col A (Series)
say ~df[0..*-2][1..*-1];    # 2d DataFrame
say ~df{dates[0..1]}^;      # the ^ postfix converts an Array of DataSlices into a new DataFrame

say "---------------------------------------------";

### DataFrame Operations ###

# 2d Map/Reduce
say df.map(*.map(*+2).eager);
say [+] df[*;1];
say [+] df[*;*];

# Hyper
say df >>+>> 2;
say df >>+<< df;

# Transpose
say ~df.T;

# Describe
say ~df[0..^3]^;            # head
say ~df[(*-3..*-1)]^;       # tail
say ~df.shape;
say ~df.describe;

# Sort
say ~df.sort: { .[1] };         # sort by 2nd col (ascending)
say ~df.sort: { -.[1] };        # sort by 2nd col (descending)
say ~df.sort: { df[$++]<C> };   # sort by col C
say ~df.sort: { df.ix[$++] };   # sort by index

# Grep (binary filter)
say ~df.grep( { .[1] < 0.5 } );                                 # by 2nd column
say ~df.grep( { df.ix[$++] eq <2022-01-02 2022-01-06>.any } );  # by index (multiple)

say "---------------------------------------------";

my \df2 = DataFrame.new([
        A => 1.0,
        B => Date.new("2022-01-01"),
        C => Series.new(1, index => [0..^4], dtype => Num),
        D => [3 xx 4],
        E => Categorical.new(<test train test train>),
        F => "foo",
]);
say ~df2;

say df2.data;
say df2.dtypes;
say df2.index;      #Hash (name => row number)   -or- df.ix; #Array
say df2.columns;    #Hash (label => col number)  -or- df.cx; #Array

say "---------------------------------------------";

### DataFrame Splicing ### (see notes 2 & 3)

# row-wise splice:
my $ds = df2[1];                        # get a DataSlice
$ds.splice($ds.index<d>,1,7);           # tweak it a bit
df2.splice( 1, 2, [j => $ds] );         # default

# column-wise splice:
my $se = df2.series: <a>;               # get a Series
$se.splice(2,1,7);                      # tweak it a bit
df2.splice( :ax, 1, 2, [K => $se] );    # axis => 1

say "---------------------------------------------";

### DataFrame Concatenation ### (see notes 4 & 5)

my \dfa = DataFrame.new(
        [['a', 1], ['b', 2]],
        columns => <letter number>,
);
#`[
      letter  number
 0    a       1
 1    b       2
#]

my \dfc = DataFrame.new(
        [['c', 3, 'cat'], ['d', 4, 'dog']],
        columns => <letter number animal>,
);
#`[
      letter  number  animal
 0    c       3       cat
 1    d       4       dog
#]

dfa.concat: dfc;                        # row-wise / outer join is default
#`[
      letter  number  animal
 0    a       1       NaN
 1    b       2       NaN
 0⋅1  c       3       cat
 1⋅1  d       4       dog
#]

dfa.concat: dfc, join => 'inner';
#`[
      letter  number
 0    a       1
 1    b       2
 0⋅1  c       3
 1⋅1  d       4
#]

my \dfd = DataFrame.new(
        [['bird', 'polly'], ['monkey', 'george']],
        columns => <animal name>,
);

dfa.concat: dfd, axis => 1;             #column-wise
#`[
      letter  number  animal  name
 0    a       1       bird    polly
 1    b       2       monkey  george
#]

say "=============================================";
Notes:
[1] raku accessors may use any function that makes a List, e.g.
Positional slices: [1,3,4], [0..3], [0..*-2], [*]
Associative slices: <A C D>, {'A'..'C'}
viz. https://docs.raku.org/language/subscripts
[2] splice is the core update method
for all add, drop, move, delete, update & insert operations
viz. https://docs.raku.org/routine/splice
[3] named parameter 'axis' indicates if row(0) or col(1)
if omitted, default=0 (row) / 'ax' is an alias
use a Pair literal like :!axis, :axis(1) or :ax
[4] concat is the core combine method
for all join, merge & combine operations
duplicate labels are extended with $mark ~ $i++
# $mark = '⋅'; # unicode Dot Operator U+22C5
use :ii (:ignore-index)
to reset the index (row or col)
[5] concat supports join => outer|inner|right|left
unknown values are set to NaN
default is outer, :jn is alias, and you can go :jn
set axis param (see splice above) for col-wise concatenation
[6] relies on hypers instead of overriding dyadic operators [+-*/]
say ~my \quants = Series.new([100, 15, 50, 15, 25]);
say ~my \prices = Series.new([1.1, 4.3, 2.2, 7.41, 2.89]);
say ~my \costs  = Series.new( quants >>*<< prices );
[7] what are we getting from raku core that others do in libraries?
- pipes & maps
- multi-dimensional arrays
- slicing & indexing
- references & views
- map, reduce, hyper operators
- operator overloading
- concurrency
- types (incl. NaN)
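
To make notes 1 and 7 concrete, here is a minimal sketch that uses only core raku, with no Dan classes involved. The variable names (`@data`, `@index`, `%by-label`, `@grid`) are made up for illustration; the point is only that slicing, map/reduce, hypers, multi-dimensional subscripts and NaN are available before any module is loaded.

```raku
# Core raku only: nothing here comes from Dan.
my @data  = 1, 0, 2;                # illustrative values
my @index = <b a c>;                # illustrative labels
my %by-label = @index Z=> @data;    # label => value lookup

# Slicing & indexing (positional and associative)
say @data[0..*-2];                  # (1 0)
say %by-label<b c>;                 # (1 2)

# Map / reduce
say @data.map(* + 2);               # (3 2 4)
say [+] @data;                      # 3

# Hyper operators (scalar on the right, then element-wise)
say @data >>+>> 2;                  # [3 2 4]
say @data >>*<< @data;              # [1 0 4]

# Multi-dimensional subscripts on nested arrays
my @grid = [1, 2], [3, 4];
say @grid[1;0];                     # 3
say [+] @grid[*;1];                 # 6

# NaN is an ordinary Num and propagates through arithmetic
say NaN ~~ Num;                     # True
say [+] 1, NaN, 3;                  # NaN
```

The Dan accessors shown in the synopsis follow the same subscript shapes, which is why a Series or DataFrame can be sliced with `[...]`, `<...>` and `[row;col]` in the same way as the plain arrays and hashes above.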