Torashii Math
Basic mathematical and statistical operations used in the model.
center_xsection(target_col, over_col, standardize=False)
Cross-sectionally center (and optionally standardize) the target_col column of a Polars DataFrame, partitioned by over_col.
This returns a Polars expression, so it can be chained in a select or with_columns invocation without needing to set a new intermediate DataFrame or materialize lazy evaluation.
Parameters
target_col: the column to be centered (and optionally standardized)
over_col: the column over which centering is applied, cross-sectionally
standardize: boolean indicating whether to also standardize the target column
Returns
Polars Expr
Source code in torashii/math.py
exp_weights(window, half_life)
Generate exponentially decaying weights over window trailing values, decaying by half each half_life index.
Parameters
window: integer number of points in the trailing lookback period
half_life: integer number of index steps over which a weight decays by half
Returns
numpy array
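One way to read this description is sketched below. It is not the code in torashii/math.py: the name exp_weights_sketch, the oldest-first ordering, and the choice to leave the weights unnormalized are all assumptions.

```python
import numpy as np


def exp_weights_sketch(window: int, half_life: int) -> np.ndarray:
    """Hypothetical stand-in: weights that halve every half_life steps into the past."""
    # Age of each observation: window - 1 for the oldest, 0 for the most recent
    ages = np.arange(window - 1, -1, -1)
    # Assumption: weights are ordered oldest-first and are not normalized to sum to 1
    return 0.5 ** (ages / half_life)


w = exp_weights_sketch(window=5, half_life=2)
print(w)
```

With half_life=2, the observation two steps behind the most recent one carries half its weight, and the one four steps behind carries a quarter.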
norm_xsection(target_col, over_col, lower=0, upper=1)
Cross-sectionally normalize the target_col column of a Polars DataFrame, partitioned by over_col, with rescaling to the interval [lower, upper].
This returns a Polars expression, so it can be chained in a select or with_columns invocation without needing to set a new intermediate DataFrame or materialize lazy evaluation.
NaN values are ignored when computing each group's max and min, but NaN inputs remain NaN in the normalized output.
Parameters
target_col: str name of the column to normalize
over_col: str name of the column to partition the normalization by
lower: lower bound of the rescaling interval, defaults to 0 to construct a percent
upper: upper bound of the rescaling interval, defaults to 1 to construct a percent
Returns
Polars Expr
percentiles_xsection(target_col, over_col, lower_pct, upper_pct, fill_val=0.0)
Cross-sectionally mark all values of target_col that fall outside the lower_pct percentile or upper_pct percentile, within each over_col group. This is essentially an anti-winsorization, suitable for building high - low portfolios. The fill_val replaces each value that lies between the percentile cutoffs.
This returns a Polars expression, so it can be chained in a select or with_columns invocation without needing to set a new intermediate DataFrame or materialize lazy evaluation.
Parameters
target_col: str column name to have non-percentile thresholded values masked
over_col: str column name to apply masking over, cross-sectionally
lower_pct: float lower percentile under which to keep values
upper_pct: float upper percentile over which to keep values
fill_val: numeric value for masking
Returns
Polars Expr
winsorize(data, percentile=0.05, axis=0)
Winsorize each vector of a 2D numpy array to symmetric percentiles given by percentile.
This takes and returns a numpy array, not a Polars expression, so it cannot be chained in a select or with_columns invocation. For DataFrame or LazyFrame input, see winsorize_xsection.
Parameters
data: numpy array containing original data to be winsorized
percentile: float indicating the percentiles to apply winsorization at
axis: int indicating which axis to apply winsorization over (i.e. orientation if data is 2D)
Returns
numpy array
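Symmetric winsorization on a numpy array can be sketched as clipping to a lower and upper quantile. The code below is an illustration, not the code in torashii/math.py: winsorize_sketch is a hypothetical name, and the use of np.nanquantile (which ignores NaN when computing cutoffs) is an assumption.

```python
import numpy as np


def winsorize_sketch(data: np.ndarray, percentile: float = 0.05, axis: int = 0) -> np.ndarray:
    """Hypothetical stand-in: clip each vector to its [percentile, 1 - percentile] quantiles."""
    # keepdims=True keeps the cutoffs broadcastable against data along the chosen axis
    lo = np.nanquantile(data, percentile, axis=axis, keepdims=True)
    hi = np.nanquantile(data, 1.0 - percentile, axis=axis, keepdims=True)
    return np.clip(data, lo, hi)


data = np.array(
    [
        [1.0, 100.0],
        [2.0, 200.0],
        [3.0, 300.0],
        [4.0, 400.0],
        [1000.0, 500.0],
    ]
)

# axis=0 winsorizes each column independently
clipped = winsorize_sketch(data, percentile=0.25, axis=0)
print(clipped)
```

The outlier 1000.0 in the first column is pulled in to that column's upper cutoff, while the second column is clipped against its own cutoffs.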
winsorize_xsection(df, data_cols, group_col, percentile=0.05)
Cross-sectionally winsorize the data_cols of df, grouped on group_col, to the symmetric percentile given by percentile.
Parameters
df: Polars DataFrame or LazyFrame containing feature data to winsorize
data_cols: collection of strings indicating the columns of df to be winsorized
group_col: str column of df to use as the cross-sectional group
percentile: float value indicating the symmetric winsorization threshold
Returns
Polars DataFrame or LazyFrame