recipes 0.1.2

- Edwin Thoen suggested adding validation checks for certain data characteristics. This fed into the existing notion of expanding recipes beyond steps (see the non-step steps project). A new set of operations, called checks, can now be used. These should throw an informative error when the check conditions are not met and return the existing data otherwise.
- Steps now have a `skip` option that will not apply preprocessing when `bake` is used. See the article on skipping steps for more information.
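
As a rough illustration of the two changes above, the following sketch adds one check and one skipped step to a small recipe. The data frames and column names are made up for the example, and the `tryCatch` wrapper is only one assumed way to observe the error a failed check throws; `retain = TRUE` is passed explicitly so that `juice` can return the processed training set.

```r
library(recipes)

# Toy data; the column names are hypothetical.
train   <- data.frame(y = c(1, 2, 3, 4), x = c(10, 20, 30, 40))
new_bad <- data.frame(y = c(5, 6), x = c(50, NA))

rec <- recipe(y ~ x, data = train) %>%
  check_missing(x) %>%            # a check: errors if x contains NA
  step_log(y, skip = TRUE)        # a step that bake() will not apply

# retain = TRUE keeps the processed training set for juice().
prepped <- prep(rec, training = train, retain = TRUE)

# The skipped step was applied to the training set during prep() ...
trained <- juice(prepped)

# ... but is not applied when baking data, so y comes back unlogged.
baked <- bake(prepped, train)

# Data with a missing x should fail the check instead of being returned.
result <- tryCatch(
  bake(prepped, new_bad),
  error = function(e) "check failed"
)
```

Here `trained$y` holds the logged outcome, while `baked$y` is left on its original scale because the step is skipped at `bake` time.
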
- `check_missing` will validate that none of the specified variables contain missing data.
- `step_num2factor` can be used to convert numeric data (especially integers) to factors.
- `step_novel` adds a new factor level to nominal variables that will be used when new data contain a level that did not exist when the recipe was prepared.
- `step_profile` can be used to generate design matrix grids for prediction profile plots of additive models where one variable is varied over a grid and all of the others are fixed at a single value.
- `step_downsample` and `step_upsample` can be used to change the number of rows in the data based on the frequency distributions of a factor variable in the training set. By default, this operation is only applied to the training set; `bake` ignores this operation.
- `step_spatialsign` now has the option of removing missing data prior to computing the norm.

recipes 0.1.1

- The default selector for `bake` was changed from `all_predictors()` to `everything()`.
- The `verbose` option for `prep` now defaults to `FALSE`.
- `step_dummy` was fixed so that the correct binary variables are generated regardless of the levels or values of the incoming factor. Also, `step_dummy` now requires factor inputs.
- `step_dummy` also has a new default naming function that works better for factors. However, the functions that can be passed to `step_dummy` now take an extra argument (`ordinal`).
- `step_interact` now allows selectors (e.g. `all_predictors()` or `starts_with("prefix")`) to be used in the interaction formula.
- `step_YeoJohnson` gained an `na.rm` option.
- `dplyr::one_of` was added to the list of selectors.
- `step_bs` adds B-spline basis functions.
- `step_unorder` converts ordered factors to unordered factors.
- `step_count` counts the number of times a pattern occurs in a string.
- `step_string2factor` and `step_factor2string` can be used to move between encodings.
- `step_lowerimpute` is for numeric data where the values cannot be measured below a specific value. For these cases, random uniform values are used for the truncated values.
- `step_zv` was added.
- `tidy` methods were added for recipes and many (but not all) steps.
- In `bake.recipe`, the argument `newdata` no longer has a default.
- `bake` and `juice` can now save the final processed data set in sparse format. Note that, as the steps are processed, a non-sparse data frame is used to store the results.

recipes 0.1.0

First CRAN release.
- `prepare` was renamed to `prep` per issue #59.

recipes 0.0.1.9003

- `learn` has become `prepare` and `process` has become `bake`.

recipes 0.0.1.9002

New steps:

- `step_lincomb` removes variables involved in linear combinations to resolve them.
- `step_bin2factor`
- `step_regex` applies a regular expression to a character or factor vector to create dummy variables.

Other changes:

- `step_dummy` and `step_interact` do a better job of respecting missing values in the data set.

recipes 0.0.1.9001

- `recipe` objects were changed so that pipes can be used to create the recipe with a formula.
- `process.recipe` lost the `role` argument in favor of a general set of selectors. If no selector is used, all the predictors are returned.