core

config

class flatiron.core.config.BaseConfig(**data)[source]

Bases: BaseModel

Base class for flatiron config models. Extra fields are forbidden.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

class flatiron.core.config.CallbacksConfig(**data)[source]

Bases: BaseConfig

Configuration for callbacks.

See: https://thenewflesh.github.io/flatiron/core.html#module-flatiron.core.tools
See: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint

project

Name of project.

Type:

str

root

Tensorboard parent directory. Default: /mnt/storage.

Type:

str or Path

monitor

Metric to monitor. Default: ‘val_loss’.

Type:

str, optional

verbose

Log callback actions. Default: 0.

Type:

int, optional

save_best_only

Save only best model. Default: False.

Type:

bool, optional

mode

Overwrite best model via mode(old metric, new metric). Options: [auto, min, max]. Default: ‘auto’.

Type:

str, optional

save_weights_only

Only save model weights. Default: False.

Type:

bool, optional

save_freq

Save after each epoch or N batches. Options: ‘epoch’ or int. Default: ‘epoch’.

Type:

str or int, optional

initial_value_threshold

Initial best value of metric. Default: None.

Type:

float, optional

initial_value_threshold: Optional[float]
mode: Annotated[str]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

monitor: str
project: str
root: str
save_best_only: bool
save_freq: Union[str, int]
save_weights_only: bool
verbose: int
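
Example (a hedged sketch; the project name is hypothetical and fields with documented defaults are assumed omissible):

from flatiron.core.config import CallbacksConfig

config = CallbacksConfig(
    project='my-project',    # name of project
    root='/mnt/storage',     # tensorboard parent directory
    save_best_only=True,     # keep only the best checkpoint
)
print(config.model_dump())
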
class flatiron.core.config.DatasetConfig(**data)[source]

Bases: BaseConfig

Configuration for Dataset.

See: https://thenewflesh.github.io/flatiron/core.html#module-flatiron.core.dataset

source

Dataset directory or CSV filepath.

Type:

str

ext_regex

File extension pattern. Default: ‘npy|exr|png|jpeg|jpg|tiff’.

Type:

str, optional

labels

Label channels. Default: None.

Type:

object, optional

label_axis

Label axis. Default: -1.

Type:

int, optional

test_size

Test set size as a proportion. Default: 0.2.

Type:

float, optional

limit

Limit data by number of samples. Default: None.

Type:

str or int, optional

reshape

Reshape concatenated data to incorporate frames as the first dimension: (FRAME, …). Analogous to the first dimension being batch. Default: True.

Type:

bool, optional

shuffle

Randomize data before splitting. Default: True.

Type:

bool, optional

seed

Shuffle seed number. Default: None.

Type:

int, optional

ext_regex: str
label_axis: int
labels: Union[int, str, list[int], list[str], None]
limit: Optional[Annotated[int]]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

reshape: bool
seed: Optional[int]
shuffle: bool
source: str
test_size: Optional[Annotated[float]]
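
Example (a hedged sketch; the source path is hypothetical and fields with documented defaults are assumed omissible):

from flatiron.core.config import DatasetConfig

config = DatasetConfig(
    source='/mnt/data/dataset',  # dataset directory or CSV filepath
    test_size=0.2,               # proportion held out for testing
    labels=[-1],                 # label channels
)
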
class flatiron.core.config.FrameworkConfig(**data)[source]

Bases: BaseModel

Configuration for deep learning framework.

device: str
model_config: ClassVar[ConfigDict] = {}

name: Annotated[str]
class flatiron.core.config.LoggerConfig(**data)[source]

Bases: BaseConfig

Configuration for logger.

See: https://thenewflesh.github.io/flatiron/core.html#module-flatiron.core.logging

slack_channel

Slack channel name. Default: None.

Type:

str, optional

slack_url

Slack URL. Default: None.

Type:

str, optional

slack_methods

Pipeline methods to be logged to Slack. Default: [load, compile, train].

Type:

list[str], optional

timezone

Timezone. Default: UTC.

Type:

str, optional

level

Log level. Default: warn.

Type:

str or int, optional

classmethod _validate_slack_methods(value)[source]
level: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

slack_channel: Optional[str]
slack_methods: list[str]
slack_url: Optional[str]
timezone: str
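
Example (a hedged sketch; the channel and URL are hypothetical placeholders, and all fields have documented defaults):

from flatiron.core.logging import LoggerConfig  # noqa: placeholder import path
from flatiron.core.config import LoggerConfig

# Slack logging activates only when both slack_channel and slack_url
# are provided
config = LoggerConfig(
    slack_channel='training',
    slack_url='https://hooks.slack.com/services/XXX',
    level='warn',
)
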
class flatiron.core.config.LossConfig(**data)[source]

Bases: BaseModel

Configuration for loss.

name

Name of loss. Default: 'MeanSquaredError'.

Type:

str, optional

model_config: ClassVar[ConfigDict] = {}

name: str
class flatiron.core.config.OptimizerConfig(**data)[source]

Bases: BaseModel

Configuration for optimizer.

See: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer

name

Name of optimizer. Default: 'SGD'.

Type:

str, optional

model_config: ClassVar[ConfigDict] = {}

name: str
class flatiron.core.config.PipelineConfig(**data)[source]

Bases: BaseConfig

Configuration for PipelineBase classes.

See: https://thenewflesh.github.io/flatiron/core.html#module-flatiron.core.pipeline

framework

Deep learning framework config.

Type:

dict

dataset

Dataset configuration.

Type:

dict

optimizer

Optimizer configuration.

Type:

dict

loss

Loss configuration.

Type:

dict

metrics

Metric dicts. Default: [dict(name='Mean')].

Type:

list[dict], optional

compile

Compile configuration.

Type:

dict

callbacks

Callbacks configuration.

Type:

dict

logger

Logger configuration.

Type:

dict

train

Train configuration.

Type:

dict

classmethod _validate_metrics(items)[source]
callbacks: CallbacksConfig
dataset: DatasetConfig
framework: FrameworkConfig
logger: LoggerConfig
loss: LossConfig
metrics: list[dict[str, Any]]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

optimizer: OptimizerConfig
train: TrainConfig
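
Example (a hedged sketch of the nested structure; values mirror the generate_config defaults and the listed fields are assumed sufficient):

from flatiron.core.config import PipelineConfig

config = PipelineConfig.model_validate(dict(
    framework=dict(name='torch', device='cuda'),
    dataset=dict(source='/mnt/data/dataset'),
    optimizer=dict(name='SGD'),
    loss=dict(name='CrossEntropyLoss'),
    metrics=[dict(name='MeanMetric')],
    callbacks=dict(project='project-name', root='/tensorboard/parent/dir'),
    logger=dict(),
    train=dict(batch_size=32, epochs=30),
))
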
class flatiron.core.config.TrainConfig(**data)[source]

Bases: BaseConfig

Configuration for calls to model train function.

See: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

batch_size

Number of samples per update. Default: 32.

Type:

int, optional

epochs

Number of epochs to train model. Default: 30.

Type:

int, optional

verbose

Verbosity of model logging. Options: ‘auto’, 0, 1, 2. 0 is silent. 1 is progress bar. 2 is one line per epoch. Auto is usually 1. Default: auto.

Type:

str or int, optional

validation_split

Fraction of training data to use for validation. Default: 0.

Type:

float, optional

seed

Seed value. Default: 42.

Type:

int, optional

shuffle

Shuffle training data per epoch. Default: True.

Type:

bool, optional

initial_epoch

Epoch at which to start training (useful for resuming a previous training run). Default: 1.

Type:

int, optional

validation_freq

Number of training epochs before new validation. Default: 1.

Type:

int, optional

batch_size: int
epochs: int
initial_epoch: int
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

seed: int
shuffle: bool
validation_freq: int
validation_split: float
verbose: Union[str, int]
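
Example (a hedged sketch; every listed field has a documented default, so any subset is assumed valid):

from flatiron.core.config import TrainConfig

config = TrainConfig(
    batch_size=64,         # samples per update
    epochs=10,             # passes over the training data
    validation_split=0.1,  # fraction of training data used for validation
)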

dataset

class flatiron.core.dataset.Dataset(info, ext_regex='npy|exr|png|jpeg|jpg|tiff', calc_file_size=True, labels=None, label_axis=-1)[source]

Bases: object

__getitem(frame)

Get data by frame. This is needed to avoid recursion errors when overloading __getitem__.

Raises:

IndexError – If frame is missing or multiple frames were found.

Returns:

Data of given frame.

Return type:

object

__init__(info, ext_regex='npy|exr|png|jpeg|jpg|tiff', calc_file_size=True, labels=None, label_axis=-1)[source]

Construct a Dataset instance. If labels is an integer, it is assumed to be the axis upon which the data will be split.

Parameters:
  • info (pd.DataFrame) – Info DataFrame.

  • ext_regex (str, optional) – File extension pattern. Default: ‘npy|exr|png|jpeg|jpg|tiff’.

  • calc_file_size (bool, optional) – Calculate file size in GB. Default: True.

  • labels (object, optional) – Label channels. Default: None.

  • label_axis (int, optional) – Label axis. Default: -1.

Raises:
  • EnforceError – If info is not an instance of DataFrame.

  • EnforceError – If required columns not found in info.

static _get_stats(info)[source]

Creates table of statistics from given info DataFrame.

Parameters:

info (pd.DataFrame) – Info DataFrame.

Returns:

Stats DataFrame.

Return type:

pd.DataFrame

_read_file(filepath)[source]

Read given file.

Parameters:

filepath (str) – Filepath.

Raises:

IOError – If extension is not supported.

Returns:

File content.

Return type:

object

_read_file_as_array(filepath)[source]

Read file as numpy array.

Parameters:

filepath (str) – Filepath.

Returns:

Array.

Return type:

np.ndarray

static _resolve_limit(limit)[source]

Resolves a given limit into a number of samples and limit type.

Parameters:

limit (str, int, None) – Limit descriptor.

Returns:

Number of samples and limit type.

Return type:

tuple[int, str]

property asset_name: str

Returns: str: Asset name of Dataset.

property asset_path: str

Returns: str: Asset path of Dataset.

property filepaths: list[str]

Returns: list[str]: Filepaths sorted by frame.

get_arrays(frame)[source]

Get data and convert into numpy arrays according to labels.

Parameters:

frame (int) – Frame.

Raises:

IndexError – If frame is missing or multiple frames were found.

Returns:

List of arrays from the given frame.

Return type:

list[np.ndarray]

get_filepath(frame)[source]

Get filepath of given frame.

Raises:

IndexError – If frame is missing or multiple frames were found.

Returns:

Filepath of given frame.

Return type:

str

property info: DataFrame

Returns: DataFrame: Copy of info DataFrame.

load(limit=None, shuffle=False, reshape=True)[source]

Load data from files.

Parameters:
  • limit (str or int, optional) – Limit data by number of samples or memory size. Default: None.

  • shuffle (bool, optional) – Shuffle frames before loading. Default: False.

  • reshape (bool, optional) – Reshape concatenated data to incorporate frames as the first dimension: (FRAME, …). Analogous to the first dimension being batch. Default: True.

Returns:

self.

Return type:

Dataset

classmethod read_csv(filepath, **kwargs)[source]

Construct Dataset instance from given csv filepath.

Parameters:

filepath (str or Path) – Info CSV filepath.

Raises:

EnforceError – If filepath does not exist or is not a CSV.

Returns:

Dataset instance.

Return type:

Dataset

classmethod read_directory(directory, **kwargs)[source]

Construct dataset from directory.

Parameters:

directory (str or Path) – Dataset directory.

Raises:
  • EnforceError – If directory does not exist.

  • EnforceError – If more or less than 1 CSV file found in directory.

Returns:

Dataset instance.

Return type:

Dataset

property stats: DataFrame

Generates a table of statistics of info data.

Metrics include:

  • min

  • max

  • mean

  • std

  • loaded

  • total

Units include:

  • gb

  • frame

  • sample

Returns:

Table of statistics.

Return type:

DataFrame

train_test_split(test_size=0.2, limit=None, shuffle=True, seed=None)[source]

Split into train and test Datasets.

Parameters:
  • test_size (float, optional) – Test set size as a proportion. Default: 0.2.

  • limit (int, optional) – Limit the total length of train and test. Default: None.

  • shuffle (bool, optional) – Randomize data before splitting. Default: True.

  • seed (int, optional) – Seed number. Default: None.

Returns:

Train Dataset, Test Dataset.

Return type:

tuple[Dataset, Dataset]

unload()[source]

Delete self.data and reset self.info.

Returns:

self.

Return type:

Dataset

xy_split()[source]

Split data into x and y arrays, according to self.labels as the split index and self.label_axis as the split axis.

Raises:
  • EnforceError – If data has not been loaded.

  • EnforceError – If self.labels is not a list of a single integer.

Returns:

x and y arrays.

Return type:

tuple[np.ndarray, np.ndarray]
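
Example (a hedged sketch of the documented workflow; the directory path is hypothetical and read_directory is assumed to forward keyword arguments to the constructor):

from flatiron.core.dataset import Dataset

dataset = Dataset.read_directory('/mnt/data/dataset', labels=[-1])
train, test = dataset.train_test_split(test_size=0.2, shuffle=True)
train.load(reshape=True)             # read file contents into memory
x_train, y_train = train.xy_split()  # split on labels / label_axis
train.unload()                       # free memory when done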

logging

class flatiron.core.logging.SlackLogger(message, config, slack_channel=None, slack_url=None, timezone='UTC', level='warn', **kwargs)[source]

Bases: LogRuntime

SlackLogger is a class for logging information to stdout and Slack.

__init__(message, config, slack_channel=None, slack_url=None, timezone='UTC', level='warn', **kwargs)[source]

SlackLogger is a class for logging information to stdout and Slack.

If slack_url and slack_channel are specified, SlackLogger will attempt to log custom formatted output to Slack.

Parameters:
  • message (str) – Log message or Slack title.

  • config (dict) – Config dict.

  • slack_channel (str, optional) – Slack channel name. Default: None.

  • slack_url (str, optional) – Slack URL. Default: None.

  • timezone (str, optional) – Timezone. Default: UTC.

  • level (str or int, optional) – Log level. Default: warn.

  • **kwargs (optional) – LogRuntime kwargs.
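
Example (a hedged sketch; with slack_channel and slack_url left unset, output is assumed to go to stdout only):

from flatiron.core.logging import SlackLogger

logger = SlackLogger(
    'train',                         # log message or Slack title
    dict(epochs=30, batch_size=32),  # config dict
    timezone='UTC',
    level='warn',
)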

multidataset

class flatiron.core.multidataset.MultiDataset(datasets)[source]

Bases: object

This class combines a dictionary of Dataset instances into a single dataset. Datasets are merged by frame.

__init__(datasets)[source]

Constructs a MultiDataset instance.

Parameters:

datasets (dict[str, Dataset]) – Dictionary of Dataset instances.

get_arrays(frame)[source]

For each dataset, get data and convert into numpy arrays according to labels.

Parameters:

frame (int) – Frame.

Raises:

IndexError – If frame is missing or multiple frames were found.

Returns:

Dict where values are lists of arrays from the given frame.

Return type:

dict

get_filepaths(frame)[source]

For each dataset, get filepath of given frame.

Returns:

Dict where values are filepaths of the given frame.

Return type:

dict

property info: DataFrame

Returns: DataFrame: Copy of info DataFrame.

load(limit=None, reshape=True)[source]

For each dataset, load data from files.

Parameters:
  • limit (str or int, optional) – Limit data by number of samples or memory size. Default: None.

  • reshape (bool, optional) – Reshape concatenated data to incorporate frames as the first dimension: (FRAME, …). Analogous to the first dimension being batch. Default: True.

Returns:

self.

Return type:

MultiDataset

train_test_split(test_size=0.2, limit=None, shuffle=True, seed=None)[source]

Split into train and test MultiDatasets.

Parameters:
  • test_size (float, optional) – Test set size as a proportion. Default: 0.2.

  • limit (int, optional) – Limit the total length of train and test. Default: None.

  • shuffle (bool, optional) – Randomize data before splitting. Default: True.

  • seed (int, optional) – Seed number. Default: None.

Returns:

Train MultiDataset, Test MultiDataset.

Return type:

tuple[MultiDataset, MultiDataset]

unload()[source]

For each dataset, delete self.data and reset self.info.

Returns:

self.

Return type:

MultiDataset

xy_split()[source]

For each dataset, split data into x and y arrays, according to self.labels as the split index and self.label_axis as the split axis.

Raises:
  • EnforceError – If data has not been loaded.

  • EnforceError – If self.labels is not a list of a single integer.

Returns:

Dict where values are x and y arrays.

Return type:

dict
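
Example (a hedged sketch; the dataset keys and paths are hypothetical):

from flatiron.core.dataset import Dataset
from flatiron.core.multidataset import MultiDataset

multi = MultiDataset(dict(
    rgb=Dataset.read_directory('/mnt/data/rgb'),
    depth=Dataset.read_directory('/mnt/data/depth'),
))
train, test = multi.train_test_split(test_size=0.2)
train.load()
arrays = train.xy_split()  # dict where values are x and y arrays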

pipeline

class flatiron.core.pipeline.PipelineBase(config)[source]

Bases: ABC

__init__(config)[source]

PipelineBase is a base class for machine learning pipelines.

Parameters:

config (dict) – PipelineBase config.

property _engine: Any

Uses config to retrieve flatiron engine subpackage.

Returns:

flatiron.tf or flatiron.torch

Return type:

Any

_logger(method, message, config)[source]

Retrieves a logger given a method name, message, and config.

Parameters:
  • method (str) – Name of method calling logger.

  • message (str) – Log message or Slack title.

  • config (dict) – Config dict.

Returns:

Configured logger instance.

Return type:

ficl.SlackLogger

build()[source]

Build machine learning model and assign it to self.model. Calls self.model_func with model params.

Returns:

Self.

Return type:

PipelineBase

compile()[source]

Sets self._compiled to a dictionary of compiled objects.

Returns:

Self.

Return type:

PipelineBase

classmethod from_string(text)[source]

Construct PipelineBase instance from given YAML text.

Parameters:

text (str) – YAML text.

Returns:

PipelineBase instance.

Return type:

PipelineBase

classmethod generate_config(framework='torch', project='project-name', callback_root='/tensorboard/parent/dir', dataset='/mnt/data/dataset', optimizer='SGD', loss='CrossEntropyLoss', metrics=['MeanMetric'])[source]

Prints a generated pipeline config based on given parameters.

Parameters:
  • framework (str) – Framework name. Default: torch.

  • project (str) – Project name. Default: project-name.

  • callback_root (str) – Callback root path. Default: /tensorboard/parent/dir.

  • dataset (str) – Dataset path. Default: /mnt/data/dataset.

  • optimizer (str) – Optimizer name. Default: SGD.

  • loss (str) – Loss name. Default: CrossEntropyLoss.

  • metrics (list[str]) – Metric names. Default: [‘MeanMetric’].

Return type:

None

load()[source]

Loads train and test datasets into memory. Calls load on self._train_data and self._test_data.

Raises:

RuntimeError – If train and test data are not datasets.

Returns:

Self.

Return type:

PipelineBase

abstract model_config()[source]

Subclasses of PipelineBase will need to define a config class for models created in the build method.

Returns:

Pydantic BaseModel config class.

Return type:

BaseModel

abstract model_func()[source]

Subclasses of PipelineBase need to define a function that builds and returns a machine learning model.

Returns:

Machine learning model.

Return type:

object

classmethod read_yaml(filepath)[source]

Construct PipelineBase instance from given yaml file.

Parameters:

filepath (str or Path) – YAML file.

Returns:

PipelineBase instance.

Return type:

PipelineBase

run()[source]

Run the following pipeline operations:

  • build

  • compile

  • train_test_split

  • load (for tensorflow only)

  • train

Returns:

Self.

Return type:

PipelineBase

train()[source]

Call model train function with params.

Returns:

Self.

Return type:

PipelineBase

train_test_split()[source]

Split dataset into train and test sets.

Assigns the following instance members:

  • _train_data

  • _test_data

Returns:

Self.

Return type:

PipelineBase

unload()[source]

Unload train and test datasets from memory. Calls unload on self._train_data and self._test_data.

Raises:
  • RuntimeError – If train and test data are not datasets.

  • RuntimeError – If train and test data are not loaded.

Returns:

Self.

Return type:

PipelineBase
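
Example (a hedged sketch of the subclassing contract; MyModelConfig, get_my_model, and the YAML path are hypothetical, and whether model_func returns the model or its builder function is an assumption here):

from pydantic import BaseModel
import flatiron.core.pipeline as ficp

class MyModelConfig(BaseModel):
    hidden_units: int = 128  # hypothetical model parameter

def get_my_model(hidden_units=128):
    raise NotImplementedError  # hypothetical model builder

class MyPipeline(ficp.PipelineBase):
    def model_config(self):
        return MyModelConfig

    def model_func(self):
        return get_my_model

pipeline = MyPipeline.read_yaml('/path/to/pipeline.yaml')
pipeline.run()  # build, compile, train_test_split, load, train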

resolve

flatiron.core.resolve._generate_config(framework='torch', project='project-name', callback_root='/tensorboard/parent/dir', dataset='/mnt/data/dataset', optimizer='SGD', loss='CrossEntropyLoss', metrics=['MeanMetric'])[source]

Generate a pipeline config based on given parameters.

Parameters:
  • framework (str) – Framework name. Default: torch.

  • project (str) – Project name. Default: project-name.

  • callback_root (str) – Callback root path. Default: /tensorboard/parent/dir.

  • dataset (str) – Dataset path. Default: /mnt/data/dataset.

  • optimizer (str) – Optimizer name. Default: SGD.

  • loss (str) – Loss name. Default: CrossEntropyLoss.

  • metrics (list[str]) – Metric names. Default: [‘MeanMetric’].

Returns:

Generated config.

Return type:

dict

flatiron.core.resolve._resolve_field(config, field)[source]

Resolve and validate given pipeline config field.

Parameters:
  • config (dict) – Pipeline config.

  • field (str) – Config field name.

Returns:

Updated pipeline config.

Return type:

dict

flatiron.core.resolve._resolve_model(config, model)[source]

Resolve and validate given model config.

Parameters:
  • config (dict) – Model config.

  • model (BaseModel) – Model config class.

Returns:

Validated model config.

Return type:

dict

flatiron.core.resolve._resolve_pipeline(config)[source]

Resolve and validate given pipeline config.

Parameters:

config (dict) – Pipeline config.

Returns:

Validated pipeline config.

Return type:

dict

flatiron.core.resolve._resolve_subconfig(subconfig, class_prefix, prepend, config_module, other_module)[source]

For use in _resolve_field. Resolves and validates the given subconfig. If the class is not a custom definition found in the config module or the other module, a standard definition will be resolved from the config module. class_prefix and prepend are used to modify the config name field in order to make it a valid class name.

Parameters:
  • subconfig (dict) – Subconfig.

  • class_prefix (str) – Class prefix.

  • prepend (bool) – Prepend class prefix.

  • config_module (str) – Module name.

  • other_module (str) – Module name.

Returns:

Validated subconfig.

Return type:

dict

flatiron.core.resolve.resolve_config(config, model)[source]

Resolves given Pipeline config. Config fields include:

  • framework

  • model

  • dataset

  • optimizer

  • loss

  • metrics

  • callbacks

  • train

  • logger

Parameters:
  • config (dict) – Config dict.

  • model (BaseModel) – Model config class.

Returns:

Resolved config.

Return type:

dict

tools

flatiron.core.tools.enforce_callbacks(log_directory, checkpoint_pattern)[source]

Enforces callback parameters.

Parameters:
  • log_directory (str or Path) – Tensorboard project log directory.

  • checkpoint_pattern (str) – Filepath pattern for checkpoint callback.

Raises:
  • EnforceError – If log directory does not exist.

  • EnforceError – If checkpoint pattern does not contain ‘{epoch}’.

Return type:

None

flatiron.core.tools.enforce_getter(value)[source]

Enforces value is a dict with a name key.

Parameters:

value (dict) – Dict.

Raises:

EnforceError – If value is not a dict with a name key.

Return type:

None

flatiron.core.tools.get_module(name)[source]

Get a module from a given name.

Parameters:

name (str) – Module name.

Raises:

NotImplementedError – If module is not found.

Returns:

Module.

Return type:

object

flatiron.core.tools.get_module_class(name, module)[source]

Get a class from a given module.

Parameters:
  • name (str) – Class name.

  • module (str) – Module name.

Raises:

NotImplementedError – If class is not found in module.

Returns:

Module class.

Return type:

class

flatiron.core.tools.get_module_function(name, module)[source]

Get a function from a given module.

Parameters:
  • name (str) – Function name.

  • module (str) – Module name.

Raises:

NotImplementedError – If function is not found in module.

Returns:

Module function.

Return type:

function

flatiron.core.tools.get_tensorboard_project(project, root='/mnt/storage', timezone='UTC', extension='keras')[source]

Creates directory structure for Tensorboard project.

Parameters:
  • project (str) – Name of project.

  • root (str or Path) – Tensorboard parent directory. Default: /mnt/storage.

  • timezone (str, optional) – Timezone. Default: UTC.

  • extension (str, optional) – File extension. Options: [keras, safetensors]. Default: keras.

Raises:

EnforceError – If extension is not keras, pth or safetensors.

Returns:

Project details.

Return type:

dict

flatiron.core.tools.is_custom_definition(config, module)[source]

Determines whether config refers to custom-defined code.

Parameters:
  • config (dict) – Instance config.

  • module (str) – Always __name__.

Raises:

EnforceError – If config is not a dict with a name key.

Returns:

True if config refers to custom-defined code.

Return type:

bool

flatiron.core.tools.pad_layer_name(name, length=18)[source]

Pads underscores in a given layer name to make the string achieve a given length.

Parameters:
  • name (str) – Layer name to be padded.

  • length (int) – Length of output string. Default: 18.

Returns:

Padded layer name.

Return type:

str
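
Example (a hedged sketch; where the underscores are inserted is an assumption, the documented contract is only the output length):

import flatiron.core.tools as fict

name = fict.pad_layer_name('conv_1', length=18)
assert len(name) == 18  # documented contract: output length is 18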

flatiron.core.tools.resolve_kwargs(kwargs, engine, optimizer, return_type='both')[source]

Filters keyword arguments based on prefix and returns them with the prefix removed.

Parameters:
  • kwargs (dict) – Kwargs dict.

  • engine (str) – Deep learning framework.

  • optimizer (str) – Optimizer name.

  • return_type (str, optional) – Which kind of keys to return. Options: [prefixed, unprefixed, both]. Default: both.

Returns:

Resolved kwargs.

Return type:

dict

flatiron.core.tools.resolve_module_config(config, module)[source]

Given a config and a module, returns a validated dict.

Parameters:
  • config (dict) – Instance config.

  • module (str) – Always __name__.

Raises:

EnforceError – If config is not a dict with a name key.

Returns:

Resolved config dict.

Return type:

dict

flatiron.core.tools.slack_it(title, channel, url, config=None, stopwatch=None, timezone='UTC', suppress=False)[source]

Compose a message from given arguments and post it to slack.

Parameters:
  • title (str) – Post title.

  • channel (str) – Slack channel.

  • url (str) – Slack URL.

  • config (dict, optional) – Parameter dict. Default: None.

  • stopwatch (StopWatch, optional) – StopWatch instance. Default: None.

  • timezone (str, optional) – Timezone. Default: UTC.

  • suppress (bool, optional) – Return message, rather than post it to Slack. Default: False.

Returns:

Slack response.

Return type:

HTTPResponse

flatiron.core.tools.train_test_split(data, test_size=0.2, shuffle=True, seed=None, limit=None)[source]

Split DataFrame into train and test DataFrames.

Parameters:
  • data (pd.DataFrame) – DataFrame.

  • test_size (float, optional) – Test set size as a proportion. Default: 0.2.

  • shuffle (bool, optional) – Randomize data before splitting. Default: True.

  • seed (int, optional) – Seed number. Default: None.

  • limit (int, optional) – Limit the total length of train and test. Default: None.

Raises:
  • EnforceError – If data is not a DataFrame.

  • EnforceError – If test_size is not between 0 and 1.

Returns:

Train and test DataFrames.

Return type:

tuple[pd.DataFrame, pd.DataFrame]
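
Example (a hedged sketch using a toy DataFrame):

import pandas as pd
import flatiron.core.tools as fict

data = pd.DataFrame(dict(frame=range(100)))
train, test = fict.train_test_split(data, test_size=0.2, seed=42)
print(len(train), len(test))  # expected proportions: 80 and 20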

flatiron.core.tools.unindent(text, spaces=4)[source]

Unindents given block of text according to given number of spaces.

Parameters:
  • text (str) – Text block to unindent.

  • spaces (int, optional) – Number of spaces to remove. Default: 4.

Returns:

Unindented text.

Return type:

str
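
Example (a hedged sketch; the exact output assumes each line simply loses the given number of leading spaces):

import flatiron.core.tools as fict

text = '    a\n        b'
print(fict.unindent(text, spaces=4))  # assumed result: 'a\n    b'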

validators

flatiron.core.validators.is_base_two(number)[source]

Validates that number is a power of two.

Parameters:

number (int) – Number.

Raises:

ValueError – If number is not a power of two.

Returns:

Input number.

Return type:

int
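
Example (a hedged sketch; assumes the power-of-two reading of "base two", and that validators return their input on success):

import flatiron.core.validators as ficv

assert ficv.is_base_two(8) == 8  # valid input is returned
try:
    ficv.is_base_two(6)
except ValueError:
    pass  # invalid input raises ValueError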

flatiron.core.validators.is_callback_mode(mode)[source]

Validates that mode is a legal callback mode.

Parameters:

mode (str) – Callback mode.

Raises:

ValueError – If mode type is not legal.

Returns:

Input callback mode.

Return type:

str

flatiron.core.validators.is_engine(engine)[source]

Validates that engine is a legal deep learning framework.

Parameters:

engine (str) – Deep learning framework.

Raises:

ValueError – If engine is not legal.

Returns:

Input engine.

Return type:

str

flatiron.core.validators.is_even(number)[source]

Validates that number is even.

Parameters:

number (int) – Number.

Raises:

ValueError – If number is not even.

Returns:

Input number.

Return type:

int

flatiron.core.validators.is_odd(number)[source]

Validates that number is odd.

Parameters:

number (int) – Number.

Raises:

ValueError – If number is not odd.

Returns:

Input number.

Return type:

int

flatiron.core.validators.is_padding(pad_type)[source]

Validates that pad_type is a legal padding type.

Parameters:

pad_type (str) – Padding type.

Raises:

ValueError – If padding type is not legal.

Returns:

Input padding type.

Return type:

str

flatiron.core.validators.is_pipeline_method(method)[source]

Validates that method is a legal pipeline method.

Parameters:

method (str) – Pipeline method.

Raises:

ValueError – If method is not legal.

Returns:

Input pipeline method.

Return type:

str