core¶
config¶
- class hidebound.core.config.Config(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
A class for validating configurations supplied to Database.
- ingress_directory¶
Root directory to recurse.
- Type:
str or Path
- staging_directory¶
Directory where hidebound data will be staged.
- Type:
str or Path
- include_regex¶
Include filenames that match this regex. Default: ‘’.
- Type:
str, optional
- exclude_regex¶
Exclude filenames that match this regex. Default: ‘.DS_Store’.
- Type:
str, optional
- write_mode¶
How assets will be extracted to hidebound/content directory. Default: copy.
- Type:
str, optional
- workflow¶
Ordered steps of workflow. Default: [‘delete’, ‘update’, ‘create’, ‘export’].
- Type:
list[str], optional
- redact_regex¶
Regex pattern matched to config keys. Values of matching keys will be redacted. Default: “(_key|_id|_token|url)$”.
- Type:
str, optional
- redact_hash¶
Whether to replace redacted values with “REDACTED” or a hash of the value. Default: True.
- Type:
bool, optional
- specification_files¶
List of asset specification files. Default: [].
- Type:
list[str], optional
- exporters¶
Dictionary of exporter configs, where the key is the exporter name and the value is its config. Default: {}.
- Type:
dict, optional
- webhooks¶
List of webhooks to be called after export. Default: [].
- Type:
list[dict], optional
- dask¶
Dask configuration. Default: {}.
- Type:
dict, optional
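As a rough illustration, the attributes above could be gathered into a plain dictionary like the one below. The directory paths and the sample secrets are hypothetical. The redact helper is only a sketch of how redact_regex and redact_hash might interact; it assumes that redact_hash=True means a hash replaces the value, and it uses SHA-256 as a stand-in for whatever hash hidebound actually applies.

```python
import hashlib
import re

# Hypothetical paths; the remaining values follow the defaults listed above.
config = {
    'ingress_directory': '/mnt/storage/ingress',
    'staging_directory': '/mnt/storage/hidebound',
    'include_regex': '',
    'exclude_regex': r'\.DS_Store',
    'write_mode': 'copy',
    'workflow': ['delete', 'update', 'create', 'export'],
    'redact_regex': '(_key|_id|_token|url)$',
    'redact_hash': True,
    'specification_files': [],
    'exporters': {},
    'webhooks': [],
    'dask': {},
}


def redact(data, regex, use_hash):
    """Sketch only: redact values whose keys match regex."""
    output = {}
    for key, value in data.items():
        if re.search(regex, key):
            if use_hash:
                # Assumption: SHA-256 stands in for hidebound's own hash.
                value = hashlib.sha256(str(value).encode()).hexdigest()
            else:
                value = 'REDACTED'
        output[key] = value
    return output


# Hypothetical config fragment containing sensitive keys.
secrets = {'api_token': 'abc123', 'url': 'http://example.com', 'name': 'foo'}
redacted = redact(secrets, config['redact_regex'], use_hash=False)
hashed = redact(secrets, config['redact_regex'], use_hash=True)
```

Only keys ending in _key, _id, _token or url are touched by the default pattern, so the name key passes through unchanged.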
- class WebhookConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
data:
DictType
= <DictType(BaseType) instance on WebhookConfig as 'data'>¶
-
headers:
DictType
= <DictType(StringType) instance on WebhookConfig as 'headers'>¶
-
json:
DictType
= <DictType(BaseType) instance on WebhookConfig as 'json'>¶
-
method:
StringType
= <StringType() instance on WebhookConfig as 'method'>¶
-
params:
DictType
= <DictType(BaseType) instance on WebhookConfig as 'params'>¶
-
timeout:
IntType
= <IntType() instance on WebhookConfig as 'timeout'>¶
-
url:
URLType
= <URLType() instance on WebhookConfig as 'url'>¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
dask:
ModelType
= <ModelType(DaskConnectionConfig) instance on Config as 'dask'>¶
-
exclude_regex:
StringType
= <StringType() instance on Config as 'exclude_regex'>¶
-
exporters:
ListType
= <ListType(BaseType) instance on Config as 'exporters'>¶
-
include_regex:
StringType
= <StringType() instance on Config as 'include_regex'>¶
-
ingress_directory:
StringType
= <StringType() instance on Config as 'ingress_directory'>¶
-
redact_hash:
BooleanType
= <BooleanType() instance on Config as 'redact_hash'>¶
-
redact_regex:
StringType
= <StringType() instance on Config as 'redact_regex'>¶
-
specification_files:
ListType
= <ListType(StringType) instance on Config as 'specification_files'>¶
-
staging_directory:
StringType
= <StringType() instance on Config as 'staging_directory'>¶
-
webhooks:
ListType
= <ListType(ModelType) instance on Config as 'webhooks'>¶
-
workflow:
ListType
= <ListType(StringType) instance on Config as 'workflow'>¶
-
write_mode:
StringType
= <StringType() instance on Config as 'write_mode'>¶
- hidebound.core.config.is_specification_file(filepath)[source]¶
Validator for specification files given to Database.
- Parameters:
filepath (str or Path) – Filepath of python specification file.
- Raises:
ValidationError – If module could not be imported.
ValidationError – If module has no SPECIFICATIONS attribute.
ValidationError – If module SPECIFICATIONS attribute is not a list.
ValidationError – If modules classes in SPECIFICATIONS attribute are not subclasses of SpecificationBase.
ValidationError – If keys in SPECIFICATIONS attribute are not lowercase versions of class names.
- Return type:
None
connection¶
- class hidebound.core.connection.DaskConnection(config)[source]¶
Bases:
object
- __init__(config)[source]¶
Instantiates a DaskConnection.
- Parameters:
config (dict) – DaskConnection config.
- Raises:
DataError – If config is invalid.
- property cluster_type: str¶
Returns: str: Cluster type.
- property gateway_config: dict¶
Returns: dict: gateway cluster config.
- property local_config: dict¶
Returns: dict: Local cluster config.
- property num_partitions: int¶
Returns: int: Number of partitions.
- class hidebound.core.connection.DaskConnectionConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
A class for validating DaskConnection configurations.
- cluster_type¶
Dask cluster type. Options include: local, gateway. Default: local.
- Type:
str, optional
- num_partitions¶
Number of partitions each DataFrame is to be split into. Default: 1.
- Type:
int, optional
- local_num_workers¶
Number of workers to run on local cluster. Default: 1.
- Type:
int, optional
- local_threads_per_worker¶
Number of threads to run per worker on local cluster. Default: 1.
- Type:
int, optional
- local_multiprocessing¶
Whether to use multiprocessing for local cluster. Default: True.
- Type:
bool, optional
- gateway_address¶
Dask Gateway server address. Default: ‘http://proxy-public/services/dask-gateway’.
- Type:
str, optional
- gateway_proxy_address¶
Dask Gateway scheduler proxy server address. Default: ‘gateway://traefik-daskhub-dask-gateway.core:80’
- Type:
str, optional
- gateway_public_address¶
The address to the gateway server, as accessible from a web browser. Default: ‘https://dask-gateway/services/dask-gateway/’.
- Type:
str, optional
- gateway_auth_type¶
Dask Gateway authentication type. Default: basic.
- Type:
str, optional
- gateway_api_token¶
Authentication API token.
- Type:
str, optional
- gateway_api_user¶
Basic authentication user name.
- Type:
str, optional
- gateway_cluster_options¶
Dask Gateway cluster options. Default: [].
- Type:
list, optional
- gateway_min_workers¶
Minimum number of Dask Gateway workers. Default: 1.
- Type:
int, optional
- gateway_max_workers¶
Maximum number of Dask Gateway workers. Default: 8.
- Type:
int, optional
- gateway_shutdown_on_close¶
Whether to shut down cluster upon close. Default: True.
- Type:
bool, optional
- gateway_timeout¶
Dask Gateway connection timeout in seconds. Default: 30.
- Type:
int, optional
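For orientation, a dask configuration using the defaults documented above might look like the following dictionary. The select_by_prefix helper is hypothetical; it only illustrates how the local_ and gateway_ groups of keys could be separated, loosely in the spirit of the local_config and gateway_config properties of DaskConnection, and is not taken from hidebound.

```python
# Defaults as listed in the attribute documentation above.
dask_config = {
    'cluster_type': 'local',
    'num_partitions': 1,
    'local_num_workers': 1,
    'local_threads_per_worker': 1,
    'local_multiprocessing': True,
    'gateway_address': 'http://proxy-public/services/dask-gateway',
    'gateway_proxy_address': 'gateway://traefik-daskhub-dask-gateway.core:80',
    'gateway_public_address': 'https://dask-gateway/services/dask-gateway/',
    'gateway_auth_type': 'basic',
    'gateway_cluster_options': [],
    'gateway_min_workers': 1,
    'gateway_max_workers': 8,
    'gateway_shutdown_on_close': True,
    'gateway_timeout': 30,
}


def select_by_prefix(config, prefix):
    """Hypothetical helper: keep only keys that start with prefix."""
    return {k: v for k, v in config.items() if k.startswith(prefix)}


local_settings = select_by_prefix(dask_config, 'local_')
gateway_settings = select_by_prefix(dask_config, 'gateway_')
```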
- class ClusterOption(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
default:
BaseType
= <BaseType() instance on ClusterOption as 'default'>¶
-
field:
StringType
= <StringType() instance on ClusterOption as 'field'>¶
-
label:
StringType
= <StringType() instance on ClusterOption as 'label'>¶
- option_type = <StringType() instance on ClusterOption as 'option_type'>¶
- options = <ListType(BaseType) instance on ClusterOption as 'options'>¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
cluster_type:
StringType
= <StringType() instance on DaskConnectionConfig as 'cluster_type'>¶
-
gateway_address:
URLType
= <URLType() instance on DaskConnectionConfig as 'gateway_address'>¶
- gateway_api_token = <StringType() instance on DaskConnectionConfig as 'gateway_api_token'>¶
- gateway_api_user = <StringType() instance on DaskConnectionConfig as 'gateway_api_user'>¶
- gateway_auth_type = <StringType() instance on DaskConnectionConfig as 'gateway_auth_type'>¶
-
gateway_cluster_options:
ListType
= <ListType(ModelType) instance on DaskConnectionConfig as 'gateway_cluster_options'>¶
-
gateway_max_workers:
IntType
= <IntType() instance on DaskConnectionConfig as 'gateway_max_workers'>¶
-
gateway_min_workers:
IntType
= <IntType() instance on DaskConnectionConfig as 'gateway_min_workers'>¶
-
gateway_proxy_address:
StringType
= <StringType() instance on DaskConnectionConfig as 'gateway_proxy_address'>¶
-
gateway_public_address:
URLType
= <URLType() instance on DaskConnectionConfig as 'gateway_public_address'>¶
-
gateway_shutdown_on_close:
BooleanType
= <BooleanType() instance on DaskConnectionConfig as 'gateway_shutdown_on_close'>¶
-
gateway_timeout:
IntType
= <IntType() instance on DaskConnectionConfig as 'gateway_timeout'>¶
-
local_multiprocessing:
BooleanType
= <BooleanType() instance on DaskConnectionConfig as 'local_multiprocessing'>¶
-
local_num_workers:
IntType
= <IntType() instance on DaskConnectionConfig as 'local_num_workers'>¶
-
local_threads_per_worker:
IntType
= <IntType() instance on DaskConnectionConfig as 'local_threads_per_worker'>¶
-
num_partitions:
IntType
= <IntType() instance on DaskConnectionConfig as 'num_partitions'>¶
database¶
- class hidebound.core.database.Database(ingress_dir, staging_dir, specifications=[], include_regex='', exclude_regex='\\\\.DS_Store', write_mode='copy', exporters=[], webhooks=[], dask={}, testing=False)[source]¶
Bases:
object
Generates a DataFrame using the files within a given directory as rows.
- __init__(ingress_dir, staging_dir, specifications=[], include_regex='', exclude_regex='\\\\.DS_Store', write_mode='copy', exporters=[], webhooks=[], dask={}, testing=False)[source]¶
Creates an instance of Database but does not populate it with data.
- Parameters:
ingress_dir (str or Path) – Root directory to recurse.
staging_dir (str or Path) – Directory where hidebound data will be staged.
specifications (list[SpecificationBase], optional) – List of asset specifications. Default: [].
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘.DS_Store’.
write_mode (str, optional) – How assets will be extracted to hidebound/content directory. Default: copy.
exporters (list[dict], optional) – List of exporter configs. Default: [].
webhooks (list[dict], optional) – List of webhooks to call. Default: [].
dask (dict, optional) – Dask configuration. Default: {}.
testing (bool, optional) – Used for testing. Default: False.
- Raises:
TypeError – If specifications contains a non-SpecificationBase object.
ValueError – If write_mode is not “copy” or “move”.
FileNotFoundError – If root is not a directory or does not exist.
FileNotFoundError – If staging_dir is not a directory or does not exist.
NameError – If staging_dir is not named “hidebound”.
- Returns:
Database instance.
- Return type:
Database
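The checks listed under Raises can be pictured with a small stand-alone function. This is a sketch under stated assumptions, not hidebound's code: check_database_args is a hypothetical name, and the order of the checks is a guess.

```python
import tempfile
from pathlib import Path


def check_database_args(ingress_dir, staging_dir, write_mode='copy'):
    """Hypothetical mirror of the documented constructor checks."""
    if write_mode not in ('copy', 'move'):
        raise ValueError(f'Invalid write mode: {write_mode}.')
    ingress = Path(ingress_dir)
    staging = Path(staging_dir)
    if not ingress.is_dir():
        raise FileNotFoundError(f'{ingress} is not a directory or does not exist.')
    if not staging.is_dir():
        raise FileNotFoundError(f'{staging} is not a directory or does not exist.')
    if staging.name != 'hidebound':
        raise NameError(f'{staging} directory is not named hidebound.')


# Temporary directories that satisfy all of the checks.
root = Path(tempfile.mkdtemp())
ingress = root / 'ingress'
staging = root / 'hidebound'
ingress.mkdir()
staging.mkdir()
check_database_args(ingress, staging)
```

Note that the staging directory itself has to be named hidebound; pointing staging_dir at its parent would trip the NameError check.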
- call_webhooks()[source]¶
Calls webhooks defined in config.
- Yields:
requests.Response – Webhook response.
- create()[source]¶
Extract valid assets as data and metadata within the hidebound directory.
Writes:
- file content to hb_parent/hidebound/content, under the same directory structure
- asset metadata as json to hb_parent/hidebound/metadata/asset
- file metadata as json to hb_parent/hidebound/metadata/file
- asset metadata as single json to hb_parent/hidebound/metadata/asset-chunk
- file metadata as single json to hb_parent/hidebound/metadata/file-chunk
- Raises:
RuntimeError – If data has not been initialized.
- Returns:
self.
- Return type:
Database
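The target locations that create() writes to can be spelled out with pathlib. The staging_layout function and the /tmp/hb_parent path are hypothetical; only the directory names come from the description above.

```python
from pathlib import Path


def staging_layout(hb_parent):
    """Hypothetical helper listing the directories create() writes to."""
    hidebound = Path(hb_parent) / 'hidebound'
    metadata = hidebound / 'metadata'
    return {
        'content': hidebound / 'content',
        'asset': metadata / 'asset',
        'file': metadata / 'file',
        'asset-chunk': metadata / 'asset-chunk',
        'file-chunk': metadata / 'file-chunk',
    }


layout = staging_layout('/tmp/hb_parent')
```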
- delete()[source]¶
Deletes hidebound/content and hidebound/metadata directories and all their contents.
- Returns:
self.
- Return type:
Database
- export()[source]¶
Exports all the files found in the hidebound root directory. Calls webhooks afterwards.
- Returns:
Self.
- Return type:
Database
- static from_config(config)[source]¶
Constructs a Database instance given a valid config.
- Parameters:
config (dict) – Dictionary that meets Config class standards.
- Raises:
DataError – If config is invalid.
- Returns:
Database instance.
- Return type:
Database
- static from_json(filepath)[source]¶
Constructs a Database instance from a given json file.
- Parameters:
filepath (str or Path) – Filepath of json config file.
- Returns:
Database instance.
- Return type:
Database
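A JSON config file suitable for from_json could be prepared roughly as follows. The temporary directories and the file name hidebound_config.json are hypothetical, and whether these four keys alone make a complete config is an assumption. The hidebound call itself is left as a comment so the snippet runs without the library installed.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical working area; the staging directory must be named "hidebound".
root = Path(tempfile.mkdtemp())
ingress = root / 'ingress'
staging = root / 'hidebound'
ingress.mkdir()
staging.mkdir()

config = {
    'ingress_directory': ingress.as_posix(),
    'staging_directory': staging.as_posix(),
    'write_mode': 'copy',
    'workflow': ['delete', 'update', 'create', 'export'],
}
config_path = root / 'hidebound_config.json'
config_path.write_text(json.dumps(config, indent=4))

# With hidebound installed, the file could then be loaded along these lines:
#     from hidebound.core.database import Database
#     db = Database.from_json(config_path)
loaded = json.loads(config_path.read_text())
```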
- static from_yaml(filepath)[source]¶
Constructs a Database instance from a given yaml file.
- Parameters:
filepath (str or Path) – Filepath of yaml config file.
- Returns:
Database instance.
- Return type:
Database
- read(group_by_asset=False)[source]¶
Return a DataFrame which can easily be queried and has only cells with scalar values.
- Parameters:
group_by_asset (bool, optional) – Whether to group the data by asset. Default: False.
- Raises:
RuntimeError – If data has not been initialized.
- Returns:
Formatted data.
- Return type:
DataFrame
database_tools¶
- hidebound.core.database_tools.DF¶
A library of tools for Database to use in construction of its central DataFrame.
alias of Union[DataFrame, DataFrame]
- hidebound.core.database_tools.add_asset_id(data)[source]¶
Adds asset_id column derived from UUID hash of asset filepath.
- Parameters:
data (pd.DataFrame) – DataFrame.
- Returns:
DataFrame with asset_id column.
- Return type:
pd.DataFrame
- hidebound.core.database_tools.add_asset_name(data)[source]¶
Adds asset_name column derived from filepath.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
DataFrame with updated asset_name column.
- Return type:
DataFrame
- hidebound.core.database_tools.add_asset_path(data)[source]¶
Adds asset_path column derived from filepath.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
DataFrame with asset_path column.
- Return type:
DataFrame
- hidebound.core.database_tools.add_asset_traits(data)[source]¶
Adds traits derived from aggregation of file traits. Add asset_traits column and one column per traits key.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
DataFrame with asset_traits column.
- Return type:
DataFrame
- hidebound.core.database_tools.add_asset_type(data)[source]¶
Adds asset_type column derived from specification.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
DataFrame with asset_type column.
- Return type:
DataFrame
- hidebound.core.database_tools.add_file_traits(data)[source]¶
Adds traits derived from file in filepath. Add file_traits column and one column per traits key.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
DataFrame with updated file_error columns.
- Return type:
DataFrame
- hidebound.core.database_tools.add_relative_path(data, column, root_dir)[source]¶
Adds relative path column derived from given column.
- Parameters:
data (DataFrame) – DataFrame.
column (str) – Column to be made relative.
root_dir (Path or str) – Root path to be removed.
- Returns:
DataFrame with updated [column]_relative column.
- Return type:
DataFrame
- hidebound.core.database_tools.add_specification(data, specifications)[source]¶
Adds specification data to given DataFrame.
Columns added:
specification
specification_class
file_error
- Parameters:
data (DataFrame) – DataFrame.
specifications (dict) – Dictionary of specifications.
- Returns:
DataFrame with specification, specification_class and file_error columns.
- Return type:
DataFrame
- hidebound.core.database_tools.cleanup(data)[source]¶
Ensures only specific columns are present and in correct order and Paths are converted to strings.
- Parameters:
data (DataFrame) – DataFrame.
- Returns:
Cleaned up DataFrame.
- Return type:
DataFrame
- hidebound.core.database_tools.get_data_for_write(data, source_dir, target_dir)[source]¶
Split given data into five DataFrames used for writing files.
- Parameters:
data (DataFrame) – DataFrame to be transformed.
source_dir (str or Path) – Source directory of asset files.
target_dir (str or Path) – Target directory where data will be written.
DataFrames:
File data - For writing asset file data to a target filepath.
Asset metadata - For writing asset metadata to a target json file.
File metadata - For writing file metadata to a target json file.
Asset chunk - For writing asset metadata chunk to a target json file.
File chunk - For writing file metadata chunk to a target json file.
- Returns:
file_data, asset_metadata, file_metadata, asset_chunk, file_chunk.
- Return type:
tuple[DataFrame]
logging¶
- class hidebound.core.logging.ProgressLogger(name, filepath='/var/log/hidebound/hidebound-progress.log', level=20)[source]¶
Bases:
object
Logs progress to quasi-JSON files.
- __init__(name, filepath='/var/log/hidebound/hidebound-progress.log', level=20)[source]¶
Create ProgressLogger instance.
- Parameters:
name (str) – Logger name.
filepath (str or Path, optional) – Log filepath. Default: /var/log/hidebound/hidebound-progress.log.
level (int, optional) – Log level. Default: INFO.
- static _get_logger(name, filepath, level=20)[source]¶
Creates a JSON logger.
- Parameters:
name (str) – Name of logger.
filepath (str or Path) – Filepath of JSON log.
level (int, optional) – Log level. Default: INFO.
- Returns:
JSON logger.
- Return type:
Logger
- critical(message, step=None, total=None, **kwargs)[source]¶
Log given message with CRITICAL log level.
- Parameters:
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- debug(message, step=None, total=None, **kwargs)[source]¶
Log given message with DEBUG log level.
- Parameters:
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- error(message, step=None, total=None, **kwargs)[source]¶
Log given message with ERROR log level.
- Parameters:
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- fatal(message, step=None, total=None, **kwargs)[source]¶
Log given message with FATAL log level.
- Parameters:
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- property filepath: str¶
Filepath of progress log.
- Type:
str
- info(message, step=None, total=None, **kwargs)[source]¶
Log given message with INFO log level.
- Parameters:
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- log(level, message, step=None, total=None, **kwargs)[source]¶
Log given message with given level.
- Parameters:
level (int) – Log level.
message (str) – Log message.
step (int, optional) – Step in progress. Default: None.
total (int, optional) – Total number of steps. Default: None.
- Return type:
None
- property logs: List[dict]¶
Logs read from filepath.
- Type:
list[dict]
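The idea of logging progress as JSON records and reading them back as a list of dictionaries can be sketched with the standard logging module. Everything named here (JsonLineFormatter, get_json_logger, read_logs, the record keys) is hypothetical; the actual file format used by ProgressLogger is not specified above.

```python
import json
import logging
import tempfile
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Formats each record as one JSON object per line (assumed layout)."""

    def format(self, record):
        payload = {
            'level': record.levelname,
            'message': record.getMessage(),
            'step': getattr(record, 'step', None),
            'total': getattr(record, 'total', None),
        }
        return json.dumps(payload)


def get_json_logger(name, filepath, level=logging.INFO):
    """Creates a file logger that emits JSON lines."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.FileHandler(filepath)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


def read_logs(filepath):
    """Reads the log file back as a list of dictionaries."""
    with open(filepath) as f:
        return [json.loads(line) for line in f if line.strip()]


log_path = Path(tempfile.mkdtemp()) / 'progress.log'
logger = get_json_logger('progress-sketch', log_path)
# step and total travel with the record through the extra mapping.
logger.info('create: writing files', extra={'step': 1, 'total': 4})
logs = read_logs(log_path)
```

The default level of 20 in the signatures above corresponds to logging.INFO.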
parser¶
- class hidebound.core.parser.AssetNameParser(fields)[source]¶
Bases:
object
A class for converting asset names to metadata and metadata to asset names, according to a dynamically defined grammar.
-
COORDINATE_INDICATOR:
str
= 'c'¶
-
COORDINATE_PADDING:
int
= 4¶
-
DESCRIPTOR_INDICATOR:
str
= 'd-'¶
-
EXTENSION_INDICATOR:
str
= '.'¶
-
FIELD_SEPARATOR:
str
= '_'¶
-
FRAME_INDICATOR:
str
= 'f'¶
-
FRAME_PADDING:
int
= 4¶
-
LEGAL_FIELDS:
List
[str
] = ['project', 'specification', 'descriptor', 'version', 'coordinate', 'frame', 'extension']¶
-
PROJECT_INDICATOR:
str
= 'p-'¶
-
SPECIFICATION_INDICATOR:
str
= 's-'¶
-
TOKEN_SEPARATOR:
str
= '-'¶
-
VERSION_INDICATOR:
str
= 'v'¶
-
VERSION_PADDING:
int
= 3¶
- __init__(fields)[source]¶
Create an AssetNameParser instance with given fields.
- Parameters:
fields (list[str]) – An ordered list of asset fields.
- Raises:
ValueError – If fields is empty.
ValueError – If fields are duplicated.
ValueError – If illegal fields are given.
ValueError – If illegal field order given.
- Returns:
instance.
- Return type:
AssetNameParser
- static _get_extension_parser(grammar)[source]¶
Creates a parser for file extensions.
- Parameters:
grammar (dict) – AssetNameParser grammar dictionary.
- Returns:
Parser.
- Return type:
Group
- static _get_grammar()[source]¶
Create parser grammar dictionary.
- Returns:
Grammar.
- Return type:
dict
- static _get_parser(grammar, fields)[source]¶
Creates a parser for asset names.
- Parameters:
grammar (dict) – AssetNameParser grammar dictionary.
fields (list[str]) – List of fields.
- Returns:
Parser.
- Return type:
Group
- static _get_specification_parser()[source]¶
Returns a parser for finding a specification within an arbitrary string.
- Returns:
Parser.
- Return type:
Group
- static _raise_field_error(field, part)[source]¶
A convenience function used for raising custom ParseExceptions.
- Parameters:
field (str) – Field.
part (str) – Part of field.
- Returns:
lambda s, l, i, e: raise_error(field, s, i)
- Return type:
function
- parse(text)[source]¶
Parse a given string.
- Parameters:
text (str) – String to be parsed.
- Raises:
ParseException – If parse fails.
- Returns:
parser.
- Return type:
dict
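To make the role of the indicator, separator and padding constants concrete, here is a regex-based sketch for one fixed field order (project, specification, descriptor, version, frame, extension). It is not the library's grammar, which is built dynamically from the given fields. The functions to_filename and parse_filename, the allowed character sets and the sample metadata are all assumptions.

```python
import re

# Constants copied from the class attributes above.
FIELD_SEPARATOR = '_'
PROJECT_INDICATOR = 'p-'
SPECIFICATION_INDICATOR = 's-'
DESCRIPTOR_INDICATOR = 'd-'
VERSION_INDICATOR = 'v'
VERSION_PADDING = 3
FRAME_INDICATOR = 'f'
FRAME_PADDING = 4
EXTENSION_INDICATOR = '.'


def to_filename(meta):
    """Hypothetical: compose a filename from metadata."""
    parts = [
        PROJECT_INDICATOR + meta['project'],
        SPECIFICATION_INDICATOR + meta['specification'],
        DESCRIPTOR_INDICATOR + meta['descriptor'],
        VERSION_INDICATOR + str(meta['version']).zfill(VERSION_PADDING),
        FRAME_INDICATOR + str(meta['frame']).zfill(FRAME_PADDING),
    ]
    return FIELD_SEPARATOR.join(parts) + EXTENSION_INDICATOR + meta['extension']


# Assumed value alphabets: lowercase letters and digits.
PATTERN = re.compile(
    rf'^{re.escape(PROJECT_INDICATOR)}(?P<project>[a-z0-9]+)'
    rf'{FIELD_SEPARATOR}{re.escape(SPECIFICATION_INDICATOR)}(?P<specification>[a-z0-9]+)'
    rf'{FIELD_SEPARATOR}{re.escape(DESCRIPTOR_INDICATOR)}(?P<descriptor>[a-z0-9-]+)'
    rf'{FIELD_SEPARATOR}{VERSION_INDICATOR}(?P<version>\d{{{VERSION_PADDING}}})'
    rf'{FIELD_SEPARATOR}{FRAME_INDICATOR}(?P<frame>\d{{{FRAME_PADDING}}})'
    rf'{re.escape(EXTENSION_INDICATOR)}(?P<extension>[a-z0-9]+)$'
)


def parse_filename(text):
    """Hypothetical: parse a filename back into metadata."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f'Could not parse: {text}')
    output = match.groupdict()
    output['version'] = int(output['version'])
    output['frame'] = int(output['frame'])
    return output


meta = {
    'project': 'proj001',
    'specification': 'spec001',
    'descriptor': 'pizza',
    'version': 3,
    'frame': 12,
    'extension': 'png',
}
name = to_filename(meta)
parsed = parse_filename(name)
```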
specification_base¶
- class hidebound.core.specification_base.ComplexSpecificationBase(data={})[source]¶
Bases:
SpecificationBase
The base class for assets that consist of multiple directories of files.
- asset_type¶
Complex.
- Type:
str
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
asset_type:
str
= 'complex'¶
-
descriptor:
ListType
= <ListType(StringType) instance on ComplexSpecificationBase as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on ComplexSpecificationBase as 'extension'>¶
-
project:
ListType
= <ListType(StringType) instance on ComplexSpecificationBase as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on ComplexSpecificationBase as 'specification'>¶
-
version:
ListType
= <ListType(IntType) instance on ComplexSpecificationBase as 'version'>¶
- class hidebound.core.specification_base.FileSpecificationBase(data={})[source]¶
Bases:
SpecificationBase
The base class for assets that consist of a single file.
- asset_type¶
File.
- Type:
str
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
asset_type:
str
= 'file'¶
-
descriptor:
ListType
= <ListType(StringType) instance on FileSpecificationBase as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on FileSpecificationBase as 'extension'>¶
- get_asset_path(filepath)[source]¶
Returns the filepath.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Asset path.
- Return type:
Path
-
project:
ListType
= <ListType(StringType) instance on FileSpecificationBase as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on FileSpecificationBase as 'specification'>¶
- to_filepaths(root)[source]¶
Generates a complete list of filepaths given a root directory and filepath pattern.
- Parameters:
root (str or Path) – Directory containing asset.
pattern (str) – Filepath pattern.
- Returns:
List of filepaths.
- Return type:
list[str]
-
version:
ListType
= <ListType(IntType) instance on FileSpecificationBase as 'version'>¶
- class hidebound.core.specification_base.SequenceSpecificationBase(data={})[source]¶
Bases:
SpecificationBase
The base class for assets that consist of a sequence of files under a single directory.
- asset_type¶
Sequence.
- Type:
str
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
asset_type:
str
= 'sequence'¶
-
descriptor:
ListType
= <ListType(StringType) instance on SequenceSpecificationBase as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on SequenceSpecificationBase as 'extension'>¶
- get_asset_path(filepath)[source]¶
Returns the directory containing the asset files.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Asset path.
- Return type:
Path
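The difference between the two get_asset_path variants (the file itself for file assets, the containing directory for sequence assets) can be shown with pathlib. The two functions and the sample path are hypothetical stand-ins, not the methods themselves.

```python
from pathlib import Path


def file_asset_path(filepath):
    """Stand-in for the file case: the asset path is the file itself."""
    return Path(filepath)


def sequence_asset_path(filepath):
    """Stand-in for the sequence case: the asset path is the parent directory."""
    return Path(filepath).parent


# Hypothetical frame of a sequence asset.
frame = (
    '/ingress/p-proj001_s-spec001_d-pizza_v001/'
    'p-proj001_s-spec001_d-pizza_v001_f0001.png'
)
file_path = file_asset_path(frame)
sequence_path = sequence_asset_path(frame)
```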
-
project:
ListType
= <ListType(StringType) instance on SequenceSpecificationBase as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on SequenceSpecificationBase as 'specification'>¶
- to_filepaths(root)[source]¶
Generates a complete list of filepaths given a root directory and filepath pattern.
- Parameters:
root (str or Path) – Directory containing asset.
pattern (str) – Filepath pattern.
- Returns:
List of filepaths.
- Return type:
list[str]
-
version:
ListType
= <ListType(IntType) instance on SequenceSpecificationBase as 'version'>¶
- class hidebound.core.specification_base.SpecificationBase(data={})[source]¶
Bases:
Model
The base class for all Hidebound specifications.
- asset_type¶
Type of asset. Options include: file, sequence, complex.
- Type:
str
- filename_fields¶
List of fields found in the asset filenames.
- Type:
list[str]
- asset_name_fields¶
List of fields found in the asset name.
- Type:
list[str]
- project¶
Project name.
- Type:
str
- descriptor¶
Asset descriptor.
- Type:
str
- version¶
Asset version.
- Type:
int
- extension¶
File extension.
- Type:
str
- __init__(data={})[source]¶
Returns a new specification instance.
- Parameters:
data (dict, optional) – Dictionary of asset data.
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- _to_filepaths(root, pattern)[source]¶
Generates a complete list of filepaths given a root directory and filepath pattern.
- Parameters:
root (str or Path) – Directory containing asset.
pattern (str) – Filepath pattern.
- Returns:
List of filepaths.
- Return type:
list[str]
-
asset_name_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version']¶
-
asset_type:
str
= 'specification'¶
-
descriptor:
ListType
= <ListType(StringType) instance on SpecificationBase as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on SpecificationBase as 'extension'>¶
-
file_traits:
Dict
[str
,Any
] = {}¶
-
filename_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version', 'extension']¶
- get_asset_id(filepath)[source]¶
Returns a hash UUID of the asset directory or file, depending on asset type.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Asset id.
- Return type:
str
- get_asset_name(filepath)[source]¶
Returns the expected asset name given a filepath.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Asset name.
- Return type:
str
- get_asset_path(filepath)[source]¶
Returns the expected asset path given a filepath.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Raises:
NotImplementedError – If method not defined in subclass.
- Returns:
Asset path.
- Return type:
Path
- get_file_traits(filepath)[source]¶
Returns a dictionary of file traits from given filepath. Returns error in respective key if one is encountered.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Traits.
- Return type:
dict
- get_filename_traits(filepath)[source]¶
Returns a dictionary of filename traits from given filepath. Returns error in filename_error key if one is encountered.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Traits.
- Return type:
dict
- get_name_patterns()[source]¶
Generates asset name and filename patterns from class fields.
- Returns:
Asset name pattern, filename pattern.
- Return type:
tuple(str)
- get_traits(filepath)[source]¶
Returns a dictionary of file and filename traits from given filepath. Errors are captured in their respective keys.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Returns:
Traits.
- Return type:
dict
-
project:
ListType
= <ListType(StringType) instance on SpecificationBase as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on SpecificationBase as 'specification'>¶
- validate_filepath(filepath)[source]¶
Attempts to parse the given filepath.
- Parameters:
filepath (str or Path) – filepath to asset file.
- Raises:
ValidationError – If parse fails.
ValidationError – If asset directory name is invalid.
- Return type:
None
-
version:
ListType
= <ListType(IntType) instance on SpecificationBase as 'version'>¶
specifications¶
- class hidebound.core.specifications.Raw001(data={})[source]¶
Bases:
SequenceSpecificationBase
Raw JPEG sequences with 1 or 3 channels.
- filename_fields¶
project, specification, descriptor, version, frame, extension
- Type:
list[str]
- asset_name_fields¶
project, specification, descriptor, version
- Type:
list[str]
- height¶
Image height. Must be 1024.
- Type:
int
- width¶
Image width. Must be 1024.
- Type:
int
- extension¶
File extension. Must be “png”.
- Type:
str
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
asset_name_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version']¶
-
channels:
ListType
= <ListType(IntType) instance on Raw001 as 'channels'>¶
-
descriptor:
ListType
= <ListType(StringType) instance on Raw001 as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on Raw001 as 'extension'>¶
-
file_traits:
Dict
[str
,Any
] = {'channels': <function get_num_image_channels>, 'height': <function get_image_height>, 'width': <function get_image_width>}¶
-
filename_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version', 'frame', 'extension']¶
-
frame:
ListType
= <ListType(IntType) instance on Raw001 as 'frame'>¶
-
height:
ListType
= <ListType(IntType) instance on Raw001 as 'height'>¶
-
project:
ListType
= <ListType(StringType) instance on Raw001 as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on Raw001 as 'specification'>¶
-
version:
ListType
= <ListType(IntType) instance on Raw001 as 'version'>¶
-
width:
ListType
= <ListType(IntType) instance on Raw001 as 'width'>¶
- class hidebound.core.specifications.Raw002(data={})[source]¶
Bases:
SequenceSpecificationBase
Raw JPEG sequences with 1 or 3 channels and coordinates.
- filename_fields¶
project, specification, descriptor, version, frame, extension
- Type:
list[str]
- asset_name_fields¶
project, specification, descriptor, version
- Type:
list[str]
- height¶
Image height. Must be 1024.
- Type:
int
- width¶
Image width. Must be 1024.
- Type:
int
- extension¶
File extension. Must be “png”.
- Type:
str
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
-
asset_name_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version']¶
-
channels:
ListType
= <ListType(IntType) instance on Raw002 as 'channels'>¶
-
coordinate:
ListType
= <ListType(ListType) instance on Raw002 as 'coordinate'>¶
-
descriptor:
ListType
= <ListType(StringType) instance on Raw002 as 'descriptor'>¶
-
extension:
ListType
= <ListType(StringType) instance on Raw002 as 'extension'>¶
-
file_traits:
Dict
[str
,Any
] = {'channels': <function get_num_image_channels>, 'height': <function get_image_height>, 'width': <function get_image_width>}¶
-
filename_fields:
List
[str
] = ['project', 'specification', 'descriptor', 'version', 'coordinate', 'frame', 'extension']¶
-
frame:
ListType
= <ListType(IntType) instance on Raw002 as 'frame'>¶
-
height:
ListType
= <ListType(IntType) instance on Raw002 as 'height'>¶
-
project:
ListType
= <ListType(StringType) instance on Raw002 as 'project'>¶
-
specification:
ListType
= <ListType(StringType) instance on Raw002 as 'specification'>¶
-
version:
ListType
= <ListType(IntType) instance on Raw002 as 'version'>¶
-
width:
ListType
= <ListType(IntType) instance on Raw002 as 'width'>¶
tools¶
- hidebound.core.tools.DFS¶
The tools module contains general functions useful to other hidebound modules.
alias of Union[DataFrame, Series, DataFrame, Series]
- hidebound.core.tools.delete_empty_directories(directory)[source]¶
Recurses given directory tree and deletes directories that do not contain files or directory trees with files. .DS_Store files do not count as files. Does not delete given directory.
- Parameters:
directory (str or Path) – Directory to recurse.
- Raises:
EnforceError – If argument is not a directory or does not exist.
- Return type:
None
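The behaviour described above can be approximated with a bottom-up os.walk. This is a minimal sketch, not the library's implementation: the function name is hypothetical, and NotADirectoryError stands in for the documented EnforceError.

```python
import os
import tempfile
from pathlib import Path


def delete_empty_directories_sketch(directory):
    """Sketch: remove subdirectories that hold nothing but .DS_Store files."""
    root = Path(directory)
    if not root.is_dir():
        # Stand-in for the EnforceError documented above.
        raise NotADirectoryError(f'{root} is not a directory or does not exist.')
    # Walk bottom-up so children are removed before their parents are checked.
    for dirpath, _, _ in os.walk(root, topdown=False):
        path = Path(dirpath)
        if path == root:
            continue  # the given directory itself is never deleted
        remaining = [x for x in os.listdir(path) if x != '.DS_Store']
        if not remaining:
            ds_store = path / '.DS_Store'
            if ds_store.exists():
                ds_store.unlink()
            path.rmdir()


# Hypothetical tree: a/b is empty, c holds only .DS_Store, d holds a real file.
root = Path(tempfile.mkdtemp())
(root / 'a' / 'b').mkdir(parents=True)
(root / 'c').mkdir()
(root / 'c' / '.DS_Store').write_text('')
(root / 'd').mkdir()
(root / 'd' / 'file.txt').write_text('data')
delete_empty_directories_sketch(root)
remaining_dirs = sorted(p.name for p in root.iterdir())
```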
- hidebound.core.tools.directory_to_dataframe(directory, include_regex='', exclude_regex='\\\\.DS_Store')[source]¶
Recursively list files within a given directory as rows in a pd.DataFrame.
- Parameters:
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘.DS_Store’.
- Returns:
pd.DataFrame with one file per row.
- Return type:
pd.DataFrame
- hidebound.core.tools.error_to_string(error)[source]¶
Formats error as string.
- Parameters:
error (Exception) – Error.
- Returns:
Error message.
- Return type:
str
- hidebound.core.tools.get_lut(data, column, aggregator, meta='__no_default__')[source]¶
Constructs a lookup table with the given column as its keys and the aggregator results as its values. Data is grouped by given column and the given aggregator is applied to each group of values.
- Parameters:
data (DataFrame) – DataFrame.
column (str) – Column to be used as the key.
aggregator (function) – Function that expects a group DataFrame and returns a scalar.
meta (object, optional) – Metadata inference. Default: ‘__no_default__’.
- Returns:
DataFrame with key and value columns.
- Return type:
DataFrame
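The group-then-aggregate idea behind this lookup table can be sketched without pandas or dask. In the sketch below a list of dicts stands in for DataFrame rows, and the name `build_lut` and the `key`/`value` field names are assumptions made for illustration, not hidebound's API.

```python
from collections import defaultdict


def build_lut(rows, column, aggregator):
    # Group rows by the value found in the given column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    # Apply the aggregator to each group to get one scalar per key.
    return [
        {'key': key, 'value': aggregator(group)}
        for key, group in groups.items()
    ]


rows = [
    {'asset': 'a', 'size': 1},
    {'asset': 'a', 'size': 3},
    {'asset': 'b', 'size': 5},
]
lut = build_lut(rows, 'asset', lambda group: sum(r['size'] for r in group))
```

Each key appears once in the result, paired with the scalar the aggregator returned for its group.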
- hidebound.core.tools.get_meta_kwargs(data, meta)[source]¶
Convenience utility for coercing the meta keyword between pandas and dask.
- Parameters:
data (DataFrame or Series) – Pandas or dask object.
meta (object) – Meta keyword argument.
- Returns:
Appropriate keyword args.
- Return type:
dict
- hidebound.core.tools.lut_combinator(data, key_column, value_column, aggregator, meta='__no_default__')[source]¶
Constructs a lookup table from given key_column, then applies it to given data as value column.
- Parameters:
data (DataFrame) – DataFrame.
key_column (str) – Column to be used as the lut keys.
value_column (str) – Column to be used as the values.
aggregator (function) – Function that expects a pd.DataFrame.
meta (object, optional) – Metadata inference. Default: ‘__no_default__’.
- Returns:
DataFrame with value column.
- Return type:
DataFrame
- hidebound.core.tools.pred_combinator(data, predicate, true_func, false_func, meta='object')[source]¶
Apply true_func to rows where predicate is true and false_func to rows where it is false.
- Parameters:
data (DataFrame) – DataFrame or Series.
predicate (function) – Function that expects a row and returns a bool.
true_func (function) – Function that expects a row. Called when predicate is true.
false_func (function) – Function that expects a row. Called when predicate is false.
meta (object, optional) – Metadata inference. Default: ‘object’.
- Returns:
Apply results.
- Return type:
DataFrame or Series
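The branching behaviour described above can be illustrated on a plain list, which here stands in for the rows of a DataFrame or Series. The name `pred_combine` is hypothetical and the sketch ignores the meta argument, which only matters for dask.

```python
def pred_combine(rows, predicate, true_func, false_func):
    # Route each row through true_func or false_func based on the predicate.
    return [
        true_func(row) if predicate(row) else false_func(row)
        for row in rows
    ]


values = [1, 2, 3, 4]
result = pred_combine(
    values,
    lambda x: x % 2 == 0,  # predicate: is the value even?
    lambda x: x * 10,      # called when the predicate is true
    lambda x: -x,          # called when the predicate is false
)
```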
- hidebound.core.tools.read_json(filepath)[source]¶
Convenience function for reading JSON files. Files may include comments.
- Parameters:
filepath (Path or str) – Filepath.
- Raises:
JSONDecodeError – If no JSON data could be decoded.
- Returns:
JSON object.
- Return type:
object
- hidebound.core.tools.str_to_bool(string)[source]¶
Converts a string to a boolean value.
- Parameters:
string (str) – String to be converted.
- Returns:
Boolean
- Return type:
bool
- hidebound.core.tools.time_string()[source]¶
- Returns:
String representing current time.
- Return type:
str
- hidebound.core.tools.to_prototype(dicts)[source]¶
Converts a list of dicts into a dict of lists.
Example:
>>> dicts = [dict(a=1, b=2, c=3), dict(a=10, b=20)]
>>> to_prototype(dicts)
{'a': [1, 10], 'b': [2, 20], 'c': [3]}
- Parameters:
dicts (list[dict]) – List of dicts.
- Returns:
Prototype dictionary.
- Return type:
dict
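One way to get the behaviour shown in the example for to_prototype is to append each value to a per-key list. The function below is a sketch under that assumption, named `to_prototype_sketch` to make clear it is not the library's code.

```python
def to_prototype_sketch(dicts):
    prototype = {}
    for item in dicts:
        for key, value in item.items():
            # Create the list on first sight of a key, then append to it.
            prototype.setdefault(key, []).append(value)
    return prototype


dicts = [dict(a=1, b=2, c=3), dict(a=10, b=20)]
prototype = to_prototype_sketch(dicts)
```

Keys missing from some dicts simply end up with shorter lists, as with `c` in the documented example.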
- hidebound.core.tools.traverse_directory(directory, include_regex='', exclude_regex='', entry_type='file')[source]¶
Recursively list all files or directories within a given directory.
- Parameters:
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘’.
entry_type (str, optional) – Kind of directory entry to return. Options include: file, directory. Default: file.
- Raises:
FileNotFoundError – If argument is not a directory or does not exist.
EnforceError – If entry_type is not file or directory.
- Yields:
Path – File.
- Return type:
Generator
[Path
,None
,None
]
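A generator with the same shape as traverse_directory can be sketched with `os.walk` and `re.search`. The sketch makes several assumptions: patterns are matched against the entry name only, an empty pattern disables that filter, and `ValueError` stands in for EnforceError. The name `walk_entries` is hypothetical.

```python
import os
import re
import tempfile
from pathlib import Path


def walk_entries(directory, include_regex='', exclude_regex='', entry_type='file'):
    # Because this is a generator, these checks run when iteration starts.
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError(f'{directory} is not a directory or does not exist.')
    if entry_type not in ('file', 'directory'):
        raise ValueError(f'Illegal entry type: {entry_type}.')

    for root, dirnames, filenames in os.walk(directory):
        names = filenames if entry_type == 'file' else dirnames
        for name in names:
            if include_regex and not re.search(include_regex, name):
                continue
            if exclude_regex and re.search(exclude_regex, name):
                continue
            yield Path(root, name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'sub').mkdir()
    Path(tmp, 'sub', 'image.png').write_text('')
    Path(tmp, 'sub', '.DS_Store').write_text('')
    Path(tmp, 'notes.txt').write_text('')
    found = sorted(p.name for p in walk_entries(tmp, exclude_regex=r'\.DS_Store'))
    dirs = [p.name for p in walk_entries(tmp, entry_type='directory')]
```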
traits¶
- hidebound.core.traits.get_image_height(filepath)[source]¶
Gets the height of the given image.
- Parameters:
filepath (str or Path) – Filepath to image file.
- Returns:
Image height.
- Return type:
int
validators¶
- hidebound.core.validators.coordinates_begin_at(items, origin)[source]¶
Validates that the minimum coordinate of a given list equals a given origin.
- Parameters:
items (list[list[int]]) – List of coordinates.
origin (list[int]) – Origin coordinate.
- Raises:
ValidationError – If coordinates do not begin at origin.
- Returns:
State of items.
- Return type:
bool
- hidebound.core.validators.has_dense_coordinates(items)[source]¶
Validates that list of coordinates is dense (every point is filled).
- Parameters:
items (list[list[int]]) – List of coordinates.
- Raises:
ValidationError – If coordinates are not dense.
- Returns:
Density of coordinates.
- Return type:
bool
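A density check of this kind can be sketched by comparing the given points against the full grid spanned by their per-axis minimum and maximum. That reading of "dense" is an assumption, as is the requirement that all coordinates have the same length. The name `is_dense` is hypothetical.

```python
from itertools import product


def is_dense(items):
    points = {tuple(item) for item in items}
    if not points:
        return True
    # Transpose the points to get one tuple of values per axis.
    axes = list(zip(*points))
    # Every integer point inside the bounding box must be present.
    grid = product(*(range(min(axis), max(axis) + 1) for axis in axes))
    return points == set(grid)


dense = is_dense([[0, 0], [0, 1], [1, 0], [1, 1]])
sparse = is_dense([[0, 0], [1, 1]])
```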
- hidebound.core.validators.has_uniform_coordinate_count(items)[source]¶
Validates that non-unique list of coordinates has a uniform count per coordinate.
- Parameters:
items (list[list[int]]) – List of coordinates.
- Raises:
ValidationError – If coordinate count is non-uniform.
- Returns:
Uniformity of coordinates.
- Return type:
bool
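Uniformity of counts can be sketched with `collections.Counter`: tally how often each coordinate occurs and check that all tallies agree. The name `has_uniform_count` is hypothetical and this is an illustration, not the library's implementation.

```python
from collections import Counter


def has_uniform_count(items):
    # Lists are unhashable, so count coordinates as tuples.
    counts = Counter(tuple(item) for item in items)
    # Uniform means every coordinate occurs the same number of times.
    return len(set(counts.values())) <= 1


uniform = has_uniform_count([[0, 0], [0, 0], [0, 1], [0, 1]])
uneven = has_uniform_count([[0, 0], [0, 0], [0, 1]])
```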
- hidebound.core.validators.is_attribute_of(name, object)[source]¶
Validates that each name is an attribute of given object.
- Parameters:
name (str) – Attribute name.
object (object) – Object.
- Raises:
ValidationError – If a name is not an attribute of given object.
- Returns:
All names are attributes of object.
- Return type:
bool
- hidebound.core.validators.is_aws_region(item)[source]¶
Validates an AWS region name.
- Parameters:
item (str) – AWS region name.
- Raises:
ValidationError – If region name is invalid.
- Returns:
Validity of region name.
- Return type:
bool
- hidebound.core.validators.is_bucket_name(item)[source]¶
Validates a bucket name.
- Parameters:
item (str) – Bucket name.
- Raises:
ValidationError – If bucket name is invalid.
- Returns:
Validity of bucket name.
- Return type:
bool
- hidebound.core.validators.is_cluster_option_type(item)[source]¶
Validates that a given cluster option type is legal. Legal types include:
bool
float
int
mapping
select
string
- Parameters:
item (str) – Cluster option type.
- Raises:
ValidationError – If cluster option type is illegal.
- Returns:
Validity of cluster option type.
- Return type:
bool
- hidebound.core.validators.is_coordinate(item)[source]¶
Validates a coordinate.
- Parameters:
item (list[int]) – Coordinate.
- Raises:
ValidationError – If coordinate is invalid.
- Returns:
Validity of coordinate.
- Return type:
bool
- hidebound.core.validators.is_descriptor(item)[source]¶
Validates a descriptor.
- Parameters:
item (str) – Descriptor.
- Raises:
ValidationError – If descriptor is invalid.
- Returns:
Validity of descriptor.
- Return type:
bool
- hidebound.core.validators.is_directory(item)[source]¶
Validates that item is a directory.
- Parameters:
item (str) – Directory path.
- Raises:
ValidationError – If item is not a directory or does not exist.
- Returns:
State of item.
- Return type:
bool
- hidebound.core.validators.is_eq(a, b)[source]¶
Validates that a and b are equal.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a does not equal b.
- Returns:
Equality of a and b.
- Return type:
bool
- hidebound.core.validators.is_extension(item)[source]¶
Validates a file extension.
- Parameters:
item (str) – File extension.
- Raises:
ValidationError – If extension is invalid.
- Returns:
Validity of extension.
- Return type:
bool
- hidebound.core.validators.is_file(item)[source]¶
Validates that item is a file.
- Parameters:
item (str) – Filepath.
- Raises:
ValidationError – If item is not a file or does not exist.
- Returns:
State of item.
- Return type:
bool
- hidebound.core.validators.is_frame(item)[source]¶
Validates a frame.
- Parameters:
item (int) – Frame.
- Raises:
ValidationError – If frame is invalid.
- Returns:
Validity of frame.
- Return type:
bool
- hidebound.core.validators.is_gt(a, b)[source]¶
Validates that a is greater than b.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a is not greater than b.
- Returns:
A is greater than b.
- Return type:
bool
- hidebound.core.validators.is_gte(a, b)[source]¶
Validates that a is greater than or equal to b.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a is not greater than or equal to b.
- Returns:
A is greater than or equal to b.
- Return type:
bool
- hidebound.core.validators.is_hidebound_directory(directory)[source]¶
Ensures directory name is “hidebound”.
- Parameters:
directory (str or Path) – Hidebound directory.
- Raises:
ValidationError – If directory is not named “hidebound”.
- Return type:
None
- hidebound.core.validators.is_homogenous(items)[source]¶
Validates that all items are equal.
- Parameters:
items (list) – List of items.
- Raises:
ValidationError – If items are not all the same.
- Returns:
Homogeneity of items.
- Return type:
bool
- hidebound.core.validators.is_http_method(method)[source]¶
Ensures given method is a legal HTTP method. Legal methods include:
get
put
post
delete
patch
- Parameters:
method (str) – HTTP method.
- Raises:
ValidationError – If method is not a legal HTTP method.
- Return type:
None
- hidebound.core.validators.is_in(a, b)[source]¶
Validates that each a is in b.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a is not in b.
- Returns:
All a’s are in b.
- Return type:
bool
- hidebound.core.validators.is_legal_directory(item)[source]¶
Validates that directory path is legal. Legal directory paths must:
Begin with /
Not end with /
Contain only the characters: /, a-z, A-Z, 0-9, _, -
- Parameters:
item (str) – Directory path.
- Raises:
ValidationError – If directory path is invalid.
- Returns:
Validity of directory path.
- Return type:
bool
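The three rules listed above fit in a single regular expression. The sketch below is one possible encoding, not necessarily the pattern hidebound uses, and the name `is_legal_directory_sketch` is hypothetical. Note that under these rules the bare root path `/` is illegal, since it both begins and ends with `/`.

```python
import re

# A leading slash, any run of legal characters, then a final non-slash character.
LEGAL_DIRECTORY = re.compile(r'/[/a-zA-Z0-9_-]*[a-zA-Z0-9_-]')


def is_legal_directory_sketch(path):
    return LEGAL_DIRECTORY.fullmatch(path) is not None


checks = {
    '/foo/bar-1': is_legal_directory_sketch('/foo/bar-1'),  # legal
    '/foo/': is_legal_directory_sketch('/foo/'),            # ends with /
    'foo/bar': is_legal_directory_sketch('foo/bar'),        # no leading /
    '/foo bar': is_legal_directory_sketch('/foo bar'),      # illegal space
}
```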
- hidebound.core.validators.is_lt(a, b)[source]¶
Validates that a is less than b.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a is not less than b.
- Returns:
A is less than b.
- Return type:
bool
- hidebound.core.validators.is_lte(a, b)[source]¶
Validates that a is less than or equal to b.
- Parameters:
a (object) – Object.
b (object) – Object.
- Raises:
ValidationError – If a is not less than or equal to b.
- Returns:
A is less than or equal to b.
- Return type:
bool
- hidebound.core.validators.is_metadata_type(item)[source]¶
Validates that a given metadata type is legal. Legal types include:
asset
file
asset-chunk
file-chunk
- Parameters:
item (str) – Metadata type.
- Raises:
ValidationError – If metadata type is illegal.
- Returns:
Validity of metadata type.
- Return type:
bool
- hidebound.core.validators.is_not_missing_values(items)[source]¶
Validates that sequence of integers is not missing any values.
- Parameters:
items (list[int]) – Integers.
- Raises:
ValidationError – If items is missing values.
- Returns:
State of item.
- Return type:
bool
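A gap check like this can be sketched by comparing the given integers with the full range between their minimum and maximum. Treating "not missing any values" as "covers every integer from min to max" is an assumption, and the name `has_no_gaps` is hypothetical.

```python
def has_no_gaps(items):
    if not items:
        return True
    # Order and duplicates do not matter, only coverage of the range.
    return set(items) == set(range(min(items), max(items) + 1))


complete = has_no_gaps([3, 1, 2])
gapped = has_no_gaps([1, 2, 4])
```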
- hidebound.core.validators.is_one_of(item, models)[source]¶
Validates whether given item matches at least one given model.
- Parameters:
item (dict) – Item to be validated.
models (list[Model]) – List of schematics Models.
- Raises:
ValidationError – If no valid model could be found for given item.
- Return type:
None
- hidebound.core.validators.is_project(item)[source]¶
Validates a project name.
- Parameters:
item (str) – Project name.
- Raises:
ValidationError – If project name is invalid.
- Returns:
Validity of project name.
- Return type:
bool
- hidebound.core.validators.is_version(item)[source]¶
Validates a version.
- Parameters:
item (int) – Version.
- Raises:
ValidationError – If version is invalid.
- Returns:
Validity of version.
- Return type:
bool
- hidebound.core.validators.is_workflow(steps)[source]¶
Ensures given workflow steps are legal. Legal workflow steps include:
delete
update
create
export
- Parameters:
steps (list[str]) – List of workflow steps.
- Raises:
ValidationError – If a step is not a legal workflow step.
- Return type:
None
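A membership check against the four legal steps listed above might look like the sketch below. `ValueError` stands in for the ValidationError named in the docs, and the names `LEGAL_STEPS` and `check_workflow` are hypothetical.

```python
LEGAL_STEPS = ('delete', 'update', 'create', 'export')


def check_workflow(steps):
    # Collect every illegal step so the error message names them all.
    illegal = [step for step in steps if step not in LEGAL_STEPS]
    if illegal:
        raise ValueError(f'Illegal workflow steps: {illegal}.')


check_workflow(['delete', 'update', 'create', 'export'])  # passes silently

try:
    check_workflow(['update', 'publish'])
    workflow_error = None
except ValueError as error:
    workflow_error = str(error)
```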
- hidebound.core.validators.validate(message)[source]¶
A decorator for predicate functions that raises a ValidationError if it returns False.
- Parameters:
message (str) – Error message if predicate returns False.
- Raises:
ValidationError – If predicate returns False.
- Returns:
Function that returns a boolean.
- Return type:
function
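The decorator pattern described above can be sketched as follows. This is an illustration of the idea rather than hidebound's code: the local `ValidationError` class is a stand-in for the exception named in the docs, and `validate_sketch` and `is_positive` are hypothetical names.

```python
from functools import wraps


class ValidationError(Exception):
    """Stand-in for the ValidationError named in the docs (assumption)."""


def validate_sketch(message):
    def decorator(predicate):
        @wraps(predicate)
        def wrapper(*args, **kwargs):
            # Turn a False result into an exception carrying the message.
            if not predicate(*args, **kwargs):
                raise ValidationError(message)
            return True
        return wrapper
    return decorator


@validate_sketch('Value is not positive.')
def is_positive(value):
    return value > 0


passed = is_positive(3)
try:
    is_positive(-1)
    failure = None
except ValidationError as error:
    failure = str(error)
```

Wrapping predicates this way keeps each validator a plain boolean function while giving callers a uniform exception to catch.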
- hidebound.core.validators.validate_each(message, list_first_arg=False)[source]¶
A decorator for predicate functions that raises a ValidationError if it returns False when applied to each argument individually.
- Parameters:
message (str) – Error message if predicate returns False.
list_first_arg (bool, optional) – Set to True if first argument is a list. Default: False.
- Raises:
ValidationError – If predicate returns False.
- Returns:
Function that returns a boolean.
- Return type:
function