core

config

class hidebound.core.config.Config(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

A class for validating configurations supplied to Database.

ingress_directory

Root directory to recurse.

Type:

str or Path

staging_directory

Directory where hidebound data will be staged.

Type:

str or Path

include_regex

Include filenames that match this regex. Default: ‘’.

Type:

str, optional

exclude_regex

Exclude filenames that match this regex. Default: ‘.DS_Store’.

Type:

str, optional
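
The interaction between include_regex and exclude_regex can be sketched roughly as follows. The helper name filter_filenames and the use of re.search (rather than a full match) are assumptions for illustration, not hidebound's confirmed implementation.

```python
import re


def filter_filenames(filenames, include_regex='', exclude_regex=r'\.DS_Store'):
    # Hypothetical helper: keep names that match include_regex
    # and do not match exclude_regex.
    kept = []
    for name in filenames:
        if re.search(include_regex, name) is None:
            continue
        if exclude_regex and re.search(exclude_regex, name) is not None:
            continue
        kept.append(name)
    return kept


names = ['a_v001.png', '.DS_Store', 'notes.txt']
default_result = filter_filenames(names)
png_result = filter_filenames(names, include_regex=r'\.png$')
print(default_result)
print(png_result)
```

With the empty default include_regex every filename is included, so only the exclude pattern removes entries.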

write_mode

How assets will be extracted to hidebound/content directory. Default: copy.

Type:

str, optional

workflow

Ordered steps of workflow. Default: [‘delete’, ‘update’, ‘create’, ‘export’].

Type:

list[str], optional

redact_regex

Regex pattern matched to config keys. Values of matching keys will be redacted. Default: “(_key|_id|_token|url)$”.

Type:

str, optional

redact_hash

Whether to replace redacted values with “REDACTED” or a hash of the value. Default: True.

Type:

bool, optional
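
To see which config keys the default redact_regex would select, the sketch below applies the documented pattern to a few key names. The helper keys_to_redact and the example values are hypothetical, and how hidebound produces the replacement value (the “REDACTED” string or a hash) is deliberately left out.

```python
import re

REDACT_REGEX = r'(_key|_id|_token|url)$'  # documented default pattern


def keys_to_redact(config, redact_regex=REDACT_REGEX):
    # Hypothetical helper: list config keys whose names match the pattern.
    return [key for key in config if re.search(redact_regex, key)]


example = {
    'ingress_directory': '/mnt/ingress',  # illustrative values only
    'write_mode': 'copy',
    'api_key': 'abc123',
    'gateway_api_token': 'secret',
    'url': 'http://example.com/hook',
}
matched = keys_to_redact(example)
print(matched)
```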

specification_files

List of asset specification files. Default: [].

Type:

list[str], optional

exporters

Dictionary of exporter configs, where the key is the exporter name and the value is its config. Default: {}.

Type:

dict, optional

webhooks

List of webhooks to be called after export. Default: [].

Type:

list[dict], optional

dask

Dask configuration. Default: {}.

Type:

dict, optional
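
A config dictionary built from the attributes above might look like the following sketch. Both directory paths are placeholders, the exporters key is omitted, and the small check for required keys is illustrative only; actual validation is performed by the Config model.

```python
# Illustrative config; both directory paths are placeholders.
config = {
    'ingress_directory': '/mnt/storage/ingress',
    'staging_directory': '/mnt/storage/hidebound',
    'include_regex': '',
    'exclude_regex': r'\.DS_Store',  # escaped form of the documented default
    'write_mode': 'copy',
    'workflow': ['delete', 'update', 'create', 'export'],
    'redact_regex': '(_key|_id|_token|url)$',
    'redact_hash': True,
    'specification_files': [],
    'webhooks': [],
    'dask': {},
}

# The two directory attributes are the only ones not marked optional.
required = ['ingress_directory', 'staging_directory']
missing = [key for key in required if key not in config]
print(missing)
```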

class WebhookConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
data: DictType = <DictType(BaseType) instance on WebhookConfig as 'data'>
headers: DictType = <DictType(StringType) instance on WebhookConfig as 'headers'>
json: DictType = <DictType(BaseType) instance on WebhookConfig as 'json'>
method: StringType = <StringType() instance on WebhookConfig as 'method'>
params: DictType = <DictType(BaseType) instance on WebhookConfig as 'params'>
timeout: IntType = <IntType() instance on WebhookConfig as 'timeout'>
url: URLType = <URLType() instance on WebhookConfig as 'url'>
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
dask: ModelType = <ModelType(DaskConnectionConfig) instance on Config as 'dask'>
exclude_regex: StringType = <StringType() instance on Config as 'exclude_regex'>
exporters: ListType = <ListType(BaseType) instance on Config as 'exporters'>
include_regex: StringType = <StringType() instance on Config as 'include_regex'>
ingress_directory: StringType = <StringType() instance on Config as 'ingress_directory'>
redact_hash: BooleanType = <BooleanType() instance on Config as 'redact_hash'>
redact_regex: StringType = <StringType() instance on Config as 'redact_regex'>
specification_files: ListType = <ListType(StringType) instance on Config as 'specification_files'>
staging_directory: StringType = <StringType() instance on Config as 'staging_directory'>
webhooks: ListType = <ListType(ModelType) instance on Config as 'webhooks'>
workflow: ListType = <ListType(StringType) instance on Config as 'workflow'>
write_mode: StringType = <StringType() instance on Config as 'write_mode'>
hidebound.core.config.is_specification_file(filepath)[source]

Validator for specification files given to Database.

Parameters:

filepath (str or Path) – Filepath of python specification file.

Raises:
  • ValidationError – If module could not be imported.

  • ValidationError – If module has no SPECIFICATIONS attribute.

  • ValidationError – If module SPECIFICATIONS attribute is not a list.

  • ValidationError – If classes in the module's SPECIFICATIONS attribute are not subclasses of SpecificationBase.

  • ValidationError – If keys in SPECIFICATIONS attribute are not lowercase versions of class names.

Return type:

None

connection

class hidebound.core.connection.DaskConnection(config)[source]

Bases: object

__init__(config)[source]

Instantiates a DaskConnection.

Parameters:

config (dict) – DaskConnection config.

Raises:

DataError – If config is invalid.

property cluster_type: str

Returns: str: Cluster type.

property gateway_config: dict

Returns: dict: Gateway cluster config.

property local_config: dict

Returns: dict: Local cluster config.

property num_partitions: int

Returns: int: Number of partitions.

class hidebound.core.connection.DaskConnectionConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

A class for validating DaskConnection configurations.

cluster_type

Dask cluster type. Options include: local, gateway. Default: local.

Type:

str, optional

num_partitions

Number of partitions each DataFrame is to be split into. Default: 1.

Type:

int, optional

local_num_workers

Number of workers to run on local cluster. Default: 1.

Type:

int, optional

local_threads_per_worker

Number of threads to run per worker on local cluster. Default: 1.

Type:

int, optional

local_multiprocessing

Whether to use multiprocessing for local cluster. Default: True.

Type:

bool, optional

gateway_address

Dask Gateway server address. Default: ‘http://proxy-public/services/dask-gateway’.

Type:

str, optional

gateway_proxy_address

Dask Gateway scheduler proxy server address. Default: ‘gateway://traefik-daskhub-dask-gateway.core:80’.

Type:

str, optional

gateway_public_address

The address to the gateway server, as accessible from a web browser. Default: ‘https://dask-gateway/services/dask-gateway/’.

Type:

str, optional

gateway_auth_type

Dask Gateway authentication type. Default: basic.

Type:

str, optional

gateway_api_token

Authentication API token.

Type:

str, optional

gateway_api_user

Basic authentication user name.

Type:

str, optional

gateway_cluster_options

Dask Gateway cluster options. Default: [].

Type:

list, optional

gateway_min_workers

Minimum number of Dask Gateway workers. Default: 1.

Type:

int, optional

gateway_max_workers

Maximum number of Dask Gateway workers. Default: 8.

Type:

int, optional

gateway_shutdown_on_close

Whether to shut down cluster upon close. Default: True.

Type:

bool, optional

gateway_timeout

Dask Gateway connection timeout in seconds. Default: 30.

Type:

int, optional
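
As a minimal sketch of how the documented defaults combine with user overrides, the snippet below merges a partial Dask config onto a defaults dictionary. The helper merge_dask_config is hypothetical, and treating local and gateway as the only valid cluster types is an assumption drawn from the options listed above.

```python
# Documented defaults for a subset of DaskConnectionConfig attributes.
DASK_DEFAULTS = {
    'cluster_type': 'local',
    'num_partitions': 1,
    'local_num_workers': 1,
    'local_threads_per_worker': 1,
    'local_multiprocessing': True,
    'gateway_min_workers': 1,
    'gateway_max_workers': 8,
    'gateway_shutdown_on_close': True,
    'gateway_timeout': 30,
}


def merge_dask_config(overrides):
    # Hypothetical helper: overlay user values on the defaults
    # and reject cluster types other than the two listed ones.
    merged = {**DASK_DEFAULTS, **overrides}
    if merged['cluster_type'] not in ('local', 'gateway'):
        raise ValueError(f"Unknown cluster_type: {merged['cluster_type']}")
    return merged


dask_config = merge_dask_config(
    {'cluster_type': 'gateway', 'gateway_max_workers': 4}
)
print(dask_config['cluster_type'], dask_config['gateway_max_workers'])
```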

class ClusterOption(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
default: BaseType = <BaseType() instance on ClusterOption as 'default'>
field: StringType = <StringType() instance on ClusterOption as 'field'>
label: StringType = <StringType() instance on ClusterOption as 'label'>
option_type = <StringType() instance on ClusterOption as 'option_type'>
options = <ListType(BaseType) instance on ClusterOption as 'options'>
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
cluster_type: StringType = <StringType() instance on DaskConnectionConfig as 'cluster_type'>
gateway_address: URLType = <URLType() instance on DaskConnectionConfig as 'gateway_address'>
gateway_api_token = <StringType() instance on DaskConnectionConfig as 'gateway_api_token'>
gateway_api_user = <StringType() instance on DaskConnectionConfig as 'gateway_api_user'>
gateway_auth_type = <StringType() instance on DaskConnectionConfig as 'gateway_auth_type'>
gateway_cluster_options: ListType = <ListType(ModelType) instance on DaskConnectionConfig as 'gateway_cluster_options'>
gateway_max_workers: IntType = <IntType() instance on DaskConnectionConfig as 'gateway_max_workers'>
gateway_min_workers: IntType = <IntType() instance on DaskConnectionConfig as 'gateway_min_workers'>
gateway_proxy_address: StringType = <StringType() instance on DaskConnectionConfig as 'gateway_proxy_address'>
gateway_public_address: URLType = <URLType() instance on DaskConnectionConfig as 'gateway_public_address'>
gateway_shutdown_on_close: BooleanType = <BooleanType() instance on DaskConnectionConfig as 'gateway_shutdown_on_close'>
gateway_timeout: IntType = <IntType() instance on DaskConnectionConfig as 'gateway_timeout'>
local_multiprocessing: BooleanType = <BooleanType() instance on DaskConnectionConfig as 'local_multiprocessing'>
local_num_workers: IntType = <IntType() instance on DaskConnectionConfig as 'local_num_workers'>
local_threads_per_worker: IntType = <IntType() instance on DaskConnectionConfig as 'local_threads_per_worker'>
num_partitions: IntType = <IntType() instance on DaskConnectionConfig as 'num_partitions'>

database

class hidebound.core.database.Database(ingress_dir, staging_dir, specifications=[], include_regex='', exclude_regex='\\\\.DS_Store', write_mode='copy', exporters=[], webhooks=[], dask={}, testing=False)[source]

Bases: object

Generates a DataFrame using the files within a given directory as rows.

__init__(ingress_dir, staging_dir, specifications=[], include_regex='', exclude_regex='\\\\.DS_Store', write_mode='copy', exporters=[], webhooks=[], dask={}, testing=False)[source]

Creates an instance of Database but does not populate it with data.

Parameters:
  • ingress_dir (str or Path) – Root directory to recurse.

  • staging_dir (str or Path) – Directory where hidebound data will be staged.

  • specifications (list[SpecificationBase], optional) – List of asset specifications. Default: [].

  • include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘.DS_Store’.

  • write_mode (str, optional) – How assets will be extracted to hidebound/content directory. Default: copy.

  • exporters (list[dict], optional) – List of exporter configs. Default: [].

  • webhooks (list[dict], optional) – List of webhooks to call. Default: [].

  • dask (dict, optional) – Dask configuration. Default: {}.

  • testing (bool, optional) – Used for testing. Default: False.

Raises:
  • TypeError – If specifications contains a non-SpecificationBase object.

  • ValueError – If write_mode is not “copy” or “move”.

  • FileNotFoundError – If root is not a directory or does not exist.

  • FileNotFoundError – If staging_dir is not a directory or does not exist.

  • NameError – If staging_dir is not named “hidebound”.

Returns:

Database instance.

Return type:

Database
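
The documented error conditions can be mirrored in a small pre-flight check, shown here as a sketch. The function check_database_args is hypothetical, and the order of the checks and the error messages are assumptions; only the exception types and conditions come from the Raises list above.

```python
import tempfile
from pathlib import Path


def check_database_args(ingress_dir, staging_dir, write_mode='copy'):
    # Hypothetical pre-flight check mirroring the documented Raises list.
    if write_mode not in ('copy', 'move'):
        raise ValueError(f'Invalid write mode: {write_mode}')
    if not Path(ingress_dir).is_dir():
        raise FileNotFoundError(f'{ingress_dir} is not a directory')
    if not Path(staging_dir).is_dir():
        raise FileNotFoundError(f'{staging_dir} is not a directory')
    if Path(staging_dir).name != 'hidebound':
        raise NameError(f'{staging_dir} is not named hidebound')


name_error = None
with tempfile.TemporaryDirectory() as root:
    ingress = Path(root, 'ingress')
    staging = Path(root, 'hidebound')
    ingress.mkdir()
    staging.mkdir()
    check_database_args(ingress, staging)  # valid arguments pass silently
    try:
        # An existing directory with the wrong name triggers NameError.
        check_database_args(ingress, ingress)
    except NameError as error:
        name_error = str(error)
print(name_error)
```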

call_webhooks()[source]

Calls webhooks defined in config.

Yields:

requests.Response – Webhook response.

create()[source]

Extract valid assets as data and metadata within the hidebound directory.

Writes:

  • file content to hb_parent/hidebound/content, under the same directory structure

  • asset metadata as json to hb_parent/hidebound/metadata/asset

  • file metadata as json to hb_parent/hidebound/metadata/file

  • asset metadata as single json to hb_parent/hidebound/metadata/asset-chunk

  • file metadata as single json to hb_parent/hidebound/metadata/file-chunk

Raises:

RuntimeError – If data has not been initialized.

Returns:

self.

Return type:

Database

delete()[source]

Deletes hidebound/content and hidebound/metadata directories and all their contents.

Returns:

self.

Return type:

Database

export()[source]

Exports all the files found in the hidebound root directory. Calls webhooks afterwards.

Returns:

self.

Return type:

Database

static from_config(config)[source]

Constructs a Database instance given a valid config.

Parameters:

config (dict) – Dictionary that meets Config class standards.

Raises:

DataError – If config is invalid.

Returns:

Database instance.

Return type:

Database

static from_json(filepath)[source]

Constructs a Database instance from a given json file.

Parameters:

filepath (str or Path) – Filepath of json config file.

Returns:

Database instance.

Return type:

Database

static from_yaml(filepath)[source]

Constructs a Database instance from a given yaml file.

Parameters:

filepath (str or Path) – Filepath of yaml config file.

Returns:

Database instance.

Return type:

Database

read(group_by_asset=False)[source]

Return a DataFrame which can easily be queried and has only cells with scalar values.

Parameters:

group_by_asset (bool, optional) – Whether to group the data by asset. Default: False.

Raises:

RuntimeError – If data has not been initialized.

Returns:

Formatted data.

Return type:

DataFrame

search(query, group_by_asset=False)[source]

Search data according to given SQL query.

Parameters:
  • query (str) – SQL query. Make sure to use “FROM data” in query.

  • group_by_asset (bool, optional) – Whether to group the data by asset. Default: False.

Returns:

Formatted data.

Return type:

DataFrame
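
The shape of a query that uses “FROM data” can be illustrated with Python's built-in sqlite3 module as a stand-in. This is not the engine Database.search uses (that is not stated here), and the SQL dialect it accepts may differ; the two columns are taken from column names mentioned elsewhere in this reference, and the row values are made up.

```python
import sqlite3

# Stand-in table named "data"; the rows are made-up examples.
rows = [
    ('p-proj001_s-spec001_d-desc_v001', 1),
    ('p-proj001_s-spec001_d-desc_v002', 0),
]
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE data (asset_name TEXT, asset_valid INTEGER)')
connection.executemany('INSERT INTO data VALUES (?, ?)', rows)

# The query refers to the table as "data", as the search method requires.
query = 'SELECT asset_name FROM data WHERE asset_valid = 1'
result = [row[0] for row in connection.execute(query)]
connection.close()
print(result)
```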

update()[source]

Recurse root directory, populate self.data with its files, locate and validate assets.

Returns:

self.

Return type:

Database
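
Because delete, update, create and export each return the Database instance, an ordered workflow can be run by calling the steps in sequence. The sketch below uses a hypothetical stand-in class that only records calls; it is not the real Database, and run_workflow is an illustrative helper rather than part of hidebound.

```python
class StandInDatabase:
    # Hypothetical stand-in that records calls; each step returns self,
    # as the documented Database methods do.
    def __init__(self):
        self.calls = []

    def _record(self, name):
        self.calls.append(name)
        return self

    def delete(self):
        return self._record('delete')

    def update(self):
        return self._record('update')

    def create(self):
        return self._record('create')

    def export(self):
        return self._record('export')


def run_workflow(database, workflow):
    # Look up each step by name and call it in the given order.
    for step in workflow:
        database = getattr(database, step)()
    return database


db = run_workflow(StandInDatabase(), ['delete', 'update', 'create', 'export'])
print(db.calls)
```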

database_tools

hidebound.core.database_tools.DF

A library of tools for Database to use in construction of its central DataFrame.

alias of Union[DataFrame, DataFrame]

hidebound.core.database_tools.add_asset_id(data)[source]

Adds asset_id column derived from UUID hash of asset filepath.

Parameters:

data (pd.DataFrame) – DataFrame.

Returns:

DataFrame with asset_id column.

Return type:

pd.DataFrame

hidebound.core.database_tools.add_asset_name(data)[source]

Adds asset_name column derived from filepath.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with updated asset_name column.

Return type:

DataFrame

hidebound.core.database_tools.add_asset_path(data)[source]

Adds asset_path column derived from filepath.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with asset_path column.

Return type:

DataFrame

hidebound.core.database_tools.add_asset_traits(data)[source]

Adds traits derived from aggregation of file traits. Add asset_traits column and one column per traits key.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with asset_traits column.

Return type:

DataFrame

hidebound.core.database_tools.add_asset_type(data)[source]

Adds asset_type column derived from specification.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with asset_type column.

Return type:

DataFrame

hidebound.core.database_tools.add_file_traits(data)[source]

Adds traits derived from file in filepath. Add file_traits column and one column per traits key.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with updated file_error columns.

Return type:

DataFrame

hidebound.core.database_tools.add_relative_path(data, column, root_dir)[source]

Adds relative path column derived from given column.

Parameters:
  • data (DataFrame) – DataFrame.

  • column (str) – Column to be made relative.

  • root_dir (Path or str) – Root path to be removed.

Returns:

DataFrame with updated [column]_relative column.

Return type:

DataFrame
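
The effect of removing a root directory from a path column can be sketched without pandas. The function below is a hypothetical stand-in that works on a list of dictionaries rather than a DataFrame, and its use of PurePosixPath.relative_to is an assumption about how the root is removed.

```python
from pathlib import PurePosixPath


def add_relative_path_rows(rows, column, root_dir):
    # Stand-in operating on a list of dicts instead of a DataFrame.
    # Writes the result to a new "<column>_relative" key.
    new_column = column + '_relative'
    for row in rows:
        path = PurePosixPath(row[column])
        row[new_column] = str(path.relative_to(root_dir))
    return rows


rows = [{'filepath': '/mnt/ingress/proj/a.png'}]
rows = add_relative_path_rows(rows, 'filepath', '/mnt/ingress')
print(rows[0]['filepath_relative'])
```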

hidebound.core.database_tools.add_specification(data, specifications)[source]

Adds specification data to given DataFrame.

Columns added:

  • specification

  • specification_class

  • file_error

Parameters:
  • data (DataFrame) – DataFrame.

  • specifications (dict) – Dictionary of specifications.

Returns:

DataFrame with specification, specification_class and file_error columns.

Return type:

DataFrame

hidebound.core.database_tools.cleanup(data)[source]

Ensures only specific columns are present and in correct order and Paths are converted to strings.

Parameters:

data (DataFrame) – DataFrame.

Returns:

Cleaned up DataFrame.

Return type:

DataFrame

hidebound.core.database_tools.get_data_for_write(data, source_dir, target_dir)[source]

Split given data into five DataFrames used for writing files.

Parameters:
  • data (DataFrame) – DataFrame to be transformed.

  • source_dir (str or Path) – Source directory of asset files.

  • target_dir (str or Path) – Target directory where data will be written.

DataFrames:

  • File data - For writing asset file data to a target filepath.

  • Asset metadata - For writing asset metadata to a target json file.

  • File metadata - For writing file metadata to a target json file.

  • Asset chunk - For writing asset metadata chunk to a target json file.

  • File chunk - For writing file metadata chunk to a target json file.

Returns:

file_data, asset_metadata, file_metadata, asset_chunk, file_chunk.

Return type:

tuple[DataFrame]

hidebound.core.database_tools.validate_assets(data)[source]

Validates assets according to their specification. Add asset_error and asset_valid columns.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with asset_error and asset_valid columns.

Return type:

DataFrame

hidebound.core.database_tools.validate_filepath(data)[source]

Validates filepath column of given DataFrame. Adds error to error column if invalid.

Parameters:

data (DataFrame) – DataFrame.

Returns:

DataFrame with updated file_error columns.

Return type:

DataFrame

logging

class hidebound.core.logging.DummyLogger[source]

Bases: object

Dummy class for logging.

critical(*args, **kwargs)[source]

Does nothing.

Return type:

None

debug(*args, **kwargs)[source]

Does nothing.

Return type:

None

error(*args, **kwargs)[source]

Does nothing.

Return type:

None

fatal(*args, **kwargs)[source]

Does nothing.

Return type:

None

info(*args, **kwargs)[source]

Does nothing.

Return type:

None

warning(*args, **kwargs)[source]

Does nothing.

Return type:

None

class hidebound.core.logging.ProgressLogger(name, filepath='/var/log/hidebound/hidebound-progress.log', level=20)[source]

Bases: object

Logs progress to quasi-JSON files.

__init__(name, filepath='/var/log/hidebound/hidebound-progress.log', level=20)[source]

Create ProgressLogger instance.

Parameters:
  • name (str) – Logger name.

  • filepath (str or Path, optional) – Log filepath. Default: /var/log/hidebound/hidebound-progress.log.

  • level (int, optional) – Log level. Default: INFO.

static _get_logger(name, filepath, level=20)[source]

Creates a JSON logger.

Parameters:
  • name (str) – Name of logger.

  • filepath (str or Path) – Filepath of JSON log.

  • level (int, optional) – Log level. Default: INFO.

Returns:

JSON logger.

Return type:

Logger

critical(message, step=None, total=None, **kwargs)[source]

Log given message with CRITICAL log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

debug(message, step=None, total=None, **kwargs)[source]

Log given message with DEBUG log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

error(message, step=None, total=None, **kwargs)[source]

Log given message with ERROR log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

fatal(message, step=None, total=None, **kwargs)[source]

Log given message with FATAL log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

property filepath: str

Filepath of progress log.

Type:

str

info(message, step=None, total=None, **kwargs)[source]

Log given message with INFO log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

log(level, message, step=None, total=None, **kwargs)[source]

Log given message with given level.

Parameters:
  • level (int) – Log level.

  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

property logs: List[dict]

Logs read from filepath.

Type:

list[dict]

static read(filepath)[source]

Read a given progress log file.

Parameters:

filepath (str or Path) – Log path.

Returns:

Logs.

Return type:

list[dict]

warning(message, step=None, total=None, **kwargs)[source]

Log given message with WARNING log level.

Parameters:
  • message (str) – Log message.

  • step (int, optional) – Step in progress. Default: None.

  • total (int, optional) – Total number of steps. Default: None.

Return type:

None

hidebound.core.logging.get_progress(logpath='/var/log/hidebound/hidebound-progress.log')[source]

Get last line of given progress file. Returns {} if logpath is not a file.

Parameters:

logpath (str or Path, optional) – Path to log file.

Returns:

Progress dictionary.

Return type:

dict
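
The documented behavior (return the last entry, or {} when the path is not a file) can be sketched as follows. The function read_last_progress is hypothetical, and the assumption that the log holds one JSON object per line is not confirmed by this reference, which only describes the files as quasi-JSON.

```python
import json
import tempfile
from pathlib import Path


def read_last_progress(logpath):
    # Hypothetical reader; assumes one JSON object per line.
    path = Path(logpath)
    if not path.is_file():
        return {}
    lines = [line for line in path.read_text().splitlines() if line.strip()]
    if not lines:
        return {}
    return json.loads(lines[-1])


with tempfile.TemporaryDirectory() as root:
    log = Path(root, 'progress.log')
    log.write_text(
        json.dumps({'message': 'update', 'step': 1, 'total': 2}) + '\n'
        + json.dumps({'message': 'create', 'step': 2, 'total': 2}) + '\n'
    )
    last = read_last_progress(log)
    absent = read_last_progress(Path(root, 'absent.log'))
print(last)
print(absent)
```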

parser

class hidebound.core.parser.AssetNameParser(fields)[source]

Bases: object

A class for converting asset names to metadata and metadata to asset names, according to a dynamically defined grammar.

COORDINATE_INDICATOR: str = 'c'
COORDINATE_PADDING: int = 4
DESCRIPTOR_INDICATOR: str = 'd-'
EXTENSION_INDICATOR: str = '.'
FIELD_SEPARATOR: str = '_'
FRAME_INDICATOR: str = 'f'
FRAME_PADDING: int = 4
LEGAL_FIELDS: List[str] = ['project', 'specification', 'descriptor', 'version', 'coordinate', 'frame', 'extension']
PROJECT_INDICATOR: str = 'p-'
SPECIFICATION_INDICATOR: str = 's-'
TOKEN_SEPARATOR: str = '-'
VERSION_INDICATOR: str = 'v'
VERSION_PADDING: int = 3
__init__(fields)[source]

Create an AssetNameParser instance with given fields.

Parameters:

fields (list[str]) – An ordered list of asset fields.

Raises:
  • ValueError – If fields is empty.

  • ValueError – If fields are duplicated.

  • ValueError – If illegal fields are given.

  • ValueError – If illegal field order given.

Returns:

instance.

Return type:

AssetNameParser

static _get_extension_parser(grammar)[source]

Creates a parser for file extensions.

Parameters:

grammar (dict) – AssetNameParser grammar dictionary.

Returns:

Parser.

Return type:

Group

static _get_grammar()[source]

Create parser grammar dictionary.

Returns:

Grammar.

Return type:

dict

static _get_parser(grammar, fields)[source]

Creates a parser for asset names.

Parameters:
  • grammar (dict) – AssetNameParser grammar dictionary.

  • fields (list[str]) – List of fields.

Returns:

Parser.

Return type:

Group

static _get_specification_parser()[source]

Returns a parser for finding a specification within an arbitrary string.

Returns:

Parser.

Return type:

Group

static _raise_field_error(field, part)[source]

A convenience function used for raising custom ParseExceptions.

Parameters:
  • field (str) – Field.

  • part (str) – Part of field.

Returns:

lambda s, l, i, e: raise_error(field, s, i)

Return type:

function

parse(text)[source]

Parse a given string.

Parameters:

text (str) – String to be parsed.

Raises:

ParseException – If parse fails.

Returns:

Parsed metadata.

Return type:

dict

static parse_specification(text)[source]

Parse a string for a specification.

Parameters:

text (str) – String to be parsed.

Raises:

ParseException – If specification is not found.

Returns:

Dictionary with “specification” key.

Return type:

dict

to_string(dict_)[source]

Converts a given dictionary to a string.

Parameters:

dict_ (dict) – Dictionary.

Returns:

Asset name.

Return type:

str
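
The class constants suggest how an asset name is assembled from metadata. The sketch below composes a name from the documented indicators, separators and padding widths; compose_asset_name is a hypothetical function, not the library's to_string, and joining the extension directly with EXTENSION_INDICATOR (without a field separator) is an assumption.

```python
# Constants copied from the AssetNameParser class attributes.
FIELD_SEPARATOR = '_'
TOKEN_SEPARATOR = '-'
PROJECT_INDICATOR = 'p-'
SPECIFICATION_INDICATOR = 's-'
DESCRIPTOR_INDICATOR = 'd-'
VERSION_INDICATOR, VERSION_PADDING = 'v', 3
COORDINATE_INDICATOR, COORDINATE_PADDING = 'c', 4
FRAME_INDICATOR, FRAME_PADDING = 'f', 4
EXTENSION_INDICATOR = '.'


def compose_asset_name(metadata):
    # Hypothetical composer: emits fields in the LEGAL_FIELDS order.
    fields = []
    if 'project' in metadata:
        fields.append(PROJECT_INDICATOR + metadata['project'])
    if 'specification' in metadata:
        fields.append(SPECIFICATION_INDICATOR + metadata['specification'])
    if 'descriptor' in metadata:
        fields.append(DESCRIPTOR_INDICATOR + metadata['descriptor'])
    if 'version' in metadata:
        version = str(metadata['version']).zfill(VERSION_PADDING)
        fields.append(VERSION_INDICATOR + version)
    if 'coordinate' in metadata:
        tokens = [str(c).zfill(COORDINATE_PADDING) for c in metadata['coordinate']]
        fields.append(COORDINATE_INDICATOR + TOKEN_SEPARATOR.join(tokens))
    if 'frame' in metadata:
        frame = str(metadata['frame']).zfill(FRAME_PADDING)
        fields.append(FRAME_INDICATOR + frame)
    name = FIELD_SEPARATOR.join(fields)
    if 'extension' in metadata:
        name += EXTENSION_INDICATOR + metadata['extension']
    return name


asset_name = compose_asset_name({
    'project': 'proj001',
    'specification': 'spec001',
    'descriptor': 'desc',
    'version': 3,
    'coordinate': [0, 5],
    'frame': 12,
    'extension': 'png',
})
print(asset_name)
```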

specification_base

class hidebound.core.specification_base.ComplexSpecificationBase(data={})[source]

Bases: SpecificationBase

The base class for assets that consist of multiple directories of files.

asset_type

Complex.

Type:

str

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
asset_type: str = 'complex'
descriptor: ListType = <ListType(StringType) instance on ComplexSpecificationBase as 'descriptor'>
extension: ListType = <ListType(StringType) instance on ComplexSpecificationBase as 'extension'>
project: ListType = <ListType(StringType) instance on ComplexSpecificationBase as 'project'>
specification: ListType = <ListType(StringType) instance on ComplexSpecificationBase as 'specification'>
version: ListType = <ListType(IntType) instance on ComplexSpecificationBase as 'version'>
class hidebound.core.specification_base.FileSpecificationBase(data={})[source]

Bases: SpecificationBase

The base class for assets that consist of a single file.

asset_type

File.

Type:

str

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
asset_type: str = 'file'
descriptor: ListType = <ListType(StringType) instance on FileSpecificationBase as 'descriptor'>
extension: ListType = <ListType(StringType) instance on FileSpecificationBase as 'extension'>
get_asset_path(filepath)[source]

Returns the filepath.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Asset path.

Return type:

Path

project: ListType = <ListType(StringType) instance on FileSpecificationBase as 'project'>
specification: ListType = <ListType(StringType) instance on FileSpecificationBase as 'specification'>
to_filepaths(root)[source]

Generates a complete list of filepaths given a root directory and filepath pattern.

Parameters:
  • root (str or Path) – Directory containing asset.

  • pattern (str) – Filepath pattern.

Returns:

List of filepaths.

Return type:

list[str]

version: ListType = <ListType(IntType) instance on FileSpecificationBase as 'version'>
class hidebound.core.specification_base.SequenceSpecificationBase(data={})[source]

Bases: SpecificationBase

The base class for assets that consist of a sequence of files under a single directory.

asset_type

Sequence.

Type:

str

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
asset_type: str = 'sequence'
descriptor: ListType = <ListType(StringType) instance on SequenceSpecificationBase as 'descriptor'>
extension: ListType = <ListType(StringType) instance on SequenceSpecificationBase as 'extension'>
get_asset_path(filepath)[source]

Returns the directory containing the asset files.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Asset path.

Return type:

Path

project: ListType = <ListType(StringType) instance on SequenceSpecificationBase as 'project'>
specification: ListType = <ListType(StringType) instance on SequenceSpecificationBase as 'specification'>
to_filepaths(root)[source]

Generates a complete list of filepaths given a root directory and filepath pattern.

Parameters:
  • root (str or Path) – Directory containing asset.

  • pattern (str) – Filepath pattern.

Returns:

List of filepaths.

Return type:

list[str]

version: ListType = <ListType(IntType) instance on SequenceSpecificationBase as 'version'>
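
The difference between the two get_asset_path variants above can be sketched with pathlib: a file asset's path is the file itself, while a sequence asset's path is the directory that contains its files. The two functions below are hypothetical stand-ins for the methods, and the example filepath is made up.

```python
from pathlib import PurePosixPath


def file_asset_path(filepath):
    # File assets: the asset path is the filepath itself.
    return PurePosixPath(filepath)


def sequence_asset_path(filepath):
    # Sequence assets: the asset path is the containing directory.
    return PurePosixPath(filepath).parent


filepath = (
    '/mnt/ingress/p-proj001_s-spec001_d-desc_v001/'
    'p-proj001_s-spec001_d-desc_v001_f0001.png'
)
file_path = file_asset_path(filepath)
sequence_path = sequence_asset_path(filepath)
print(file_path)
print(sequence_path)
```
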
class hidebound.core.specification_base.SpecificationBase(data={})[source]

Bases: Model

The base class for all Hidebound specifications.

asset_type

Type of asset. Options include: file, sequence, complex.

Type:

str

filename_fields

List of fields found in the asset filenames.

Type:

list[str]

asset_name_fields

List of fields found in the asset name.

Type:

list[str]

project

Project name.

Type:

str

descriptor

Asset descriptor.

Type:

str

version

Asset version.

Type:

int

extension

File extension.

Type:

str

__init__(data={})[source]

Returns a new specification instance.

Parameters:

data (dict, optional) – Dictionary of asset data.

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
_to_filepaths(root, pattern)[source]

Generates a complete list of filepaths given a root directory and filepath pattern.

Parameters:
  • root (str or Path) – Directory containing asset.

  • pattern (str) – Filepath pattern.

Returns:

List of filepaths.

Return type:

list[str]

asset_name_fields: List[str] = ['project', 'specification', 'descriptor', 'version']
asset_type: str = 'specification'
descriptor: ListType = <ListType(StringType) instance on SpecificationBase as 'descriptor'>
extension: ListType = <ListType(StringType) instance on SpecificationBase as 'extension'>
file_traits: Dict[str, Any] = {}
filename_fields: List[str] = ['project', 'specification', 'descriptor', 'version', 'extension']
get_asset_id(filepath)[source]

Returns a hash UUID of the asset directory or file, depending on asset type.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Asset id.

Return type:

str

get_asset_name(filepath)[source]

Returns the expected asset name given a filepath.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Asset name.

Return type:

str

get_asset_path(filepath)[source]

Returns the expected asset path given a filepath.

Parameters:

filepath (str or Path) – filepath to asset file.

Raises:

NotImplementedError – If method not defined in subclass.

Returns:

Asset path.

Return type:

Path

get_file_traits(filepath)[source]

Returns a dictionary of file traits from given filepath. Returns error in respective key if one is encountered.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Traits.

Return type:

dict

get_filename_traits(filepath)[source]

Returns a dictionary of filename traits from given filepath. Returns error in filename_error key if one is encountered.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Traits.

Return type:

dict

get_name_patterns()[source]

Generates asset name and filename patterns from class fields.

Returns:

Asset name pattern, filename pattern.

Return type:

tuple(str)

get_traits(filepath)[source]

Returns a dictionary of file and filename traits from given filepath. Errors are captured in their respective keys.

Parameters:

filepath (str or Path) – filepath to asset file.

Returns:

Traits.

Return type:

dict

project: ListType = <ListType(StringType) instance on SpecificationBase as 'project'>
specification: ListType = <ListType(StringType) instance on SpecificationBase as 'specification'>
validate_filepath(filepath)[source]

Attempts to parse the given filepath.

Parameters:

filepath (str or Path) – filepath to asset file.

Raises:
  • ValidationError – If parse fails.

  • ValidationError – If asset directory name is invalid.

Return type:

None

version: ListType = <ListType(IntType) instance on SpecificationBase as 'version'>

specifications

class hidebound.core.specifications.Raw001(data={})[source]

Bases: SequenceSpecificationBase

Raw JPEG sequences with 1 or 3 channels.

filename_fields

project, specification, descriptor, version, frame, extension

Type:

list[str]

asset_name_fields

project, specification, descriptor, version

Type:

list[str]

height

Image height. Must be 1024.

Type:

int

width

Image width. Must be 1024.

Type:

int

extension

File extension. Must be “png”.

Type:

str

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
asset_name_fields: List[str] = ['project', 'specification', 'descriptor', 'version']
channels: ListType = <ListType(IntType) instance on Raw001 as 'channels'>
descriptor: ListType = <ListType(StringType) instance on Raw001 as 'descriptor'>
extension: ListType = <ListType(StringType) instance on Raw001 as 'extension'>
file_traits: Dict[str, Any] = {'channels': <function get_num_image_channels>, 'height': <function get_image_height>, 'width': <function get_image_width>}
filename_fields: List[str] = ['project', 'specification', 'descriptor', 'version', 'frame', 'extension']
frame: ListType = <ListType(IntType) instance on Raw001 as 'frame'>
height: ListType = <ListType(IntType) instance on Raw001 as 'height'>
project: ListType = <ListType(StringType) instance on Raw001 as 'project'>
specification: ListType = <ListType(StringType) instance on Raw001 as 'specification'>
version: ListType = <ListType(IntType) instance on Raw001 as 'version'>
width: ListType = <ListType(IntType) instance on Raw001 as 'width'>
class hidebound.core.specifications.Raw002(data={})[source]

Bases: SequenceSpecificationBase

Raw JPEG sequences with 1 or 3 channels and coordinates.

filename_fields

project, specification, descriptor, version, coordinate, frame, extension

Type:

list[str]

asset_name_fields

project, specification, descriptor, version

Type:

list[str]

height

Image height. Must be 1024.

Type:

int

width

Image width. Must be 1024.

Type:

int

extension

File extension. Must be “png”.

Type:

str

_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
asset_name_fields: List[str] = ['project', 'specification', 'descriptor', 'version']
channels: ListType = <ListType(IntType) instance on Raw002 as 'channels'>
coordinate: ListType = <ListType(ListType) instance on Raw002 as 'coordinate'>
descriptor: ListType = <ListType(StringType) instance on Raw002 as 'descriptor'>
extension: ListType = <ListType(StringType) instance on Raw002 as 'extension'>
file_traits: Dict[str, Any] = {'channels': <function get_num_image_channels>, 'height': <function get_image_height>, 'width': <function get_image_width>}
filename_fields: List[str] = ['project', 'specification', 'descriptor', 'version', 'coordinate', 'frame', 'extension']
frame: ListType = <ListType(IntType) instance on Raw002 as 'frame'>
height: ListType = <ListType(IntType) instance on Raw002 as 'height'>
project: ListType = <ListType(StringType) instance on Raw002 as 'project'>
specification: ListType = <ListType(StringType) instance on Raw002 as 'specification'>
version: ListType = <ListType(IntType) instance on Raw002 as 'version'>
width: ListType = <ListType(IntType) instance on Raw002 as 'width'>

tools

The tools module contains general functions useful to other hidebound modules.

hidebound.core.tools.DFS

alias of Union[DataFrame, Series, DataFrame, Series]

hidebound.core.tools.delete_empty_directories(directory)[source]

Recurses the given directory tree and deletes directories that contain neither files nor directory trees with files. .DS_Store files do not count as files. Does not delete the given directory itself.

Parameters:

directory (str or Path) – Directory to recurse.

Raises:

EnforceError – If argument is not a directory or does not exist.

Return type:

None
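
The pruning behavior described above can be illustrated with a small standard-library sketch. This is not hidebound's implementation: the name delete_empty_directories_sketch is hypothetical, and NotADirectoryError stands in for the documented EnforceError.

```python
import os
import tempfile
from pathlib import Path


def delete_empty_directories_sketch(directory):
    """Hypothetical stand-in that mimics the documented behavior."""
    root = Path(directory)
    if not root.is_dir():
        # The real function is documented to raise EnforceError instead.
        raise NotADirectoryError(f'{root} is not a directory or does not exist.')

    # Walk bottom-up so children are pruned before their parents are checked.
    for dirpath, _, _ in os.walk(root, topdown=False):
        path = Path(dirpath)
        if path == root:
            continue  # the given directory itself is never deleted
        entries = list(path.iterdir())
        # A directory is empty if it holds nothing but .DS_Store files.
        if all(e.is_file() and e.name == '.DS_Store' for e in entries):
            for entry in entries:
                entry.unlink()
            path.rmdir()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'keep' / 'sub').mkdir(parents=True)
    (root / 'keep' / 'sub' / 'data.txt').write_text('x')
    (root / 'drop' / 'nested').mkdir(parents=True)
    (root / 'drop' / '.DS_Store').write_text('')

    delete_empty_directories_sketch(root)
    remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob('*'))

print(remaining)
```

Walking bottom-up means a directory whose only children were empty directories is itself empty by the time it is inspected.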

hidebound.core.tools.directory_to_dataframe(directory, include_regex='', exclude_regex='\\\\.DS_Store')[source]

Recursively lists files within a given directory as rows in a pd.DataFrame.

Parameters:
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘.DS_Store’.

Returns:

pd.DataFrame with one file per row.

Return type:

pd.DataFrame

hidebound.core.tools.error_to_string(error)[source]

Formats error as string.

Parameters:

error (Exception) – Error.

Returns:

Error message.

Return type:

str

hidebound.core.tools.get_lut(data, column, aggregator, meta='__no_default__')[source]

Constructs a lookup table with the given column as its keys and the aggregator results as its values. Data is grouped by given column and the given aggregator is applied to each group of values.

Parameters:
  • data (DataFrame) – DataFrame.

  • column (str) – Column to be used as the key.

  • aggregator (function) – Function that expects a group DataFrame and returns a scalar.

  • meta (object, optional) – Metadata inference. Default: ‘__no_default__’.

Returns:

DataFrame with key and value columns.

Return type:

DataFrame

hidebound.core.tools.get_meta_kwargs(data, meta)[source]

Convenience utility for coercing the meta keyword between pandas and dask.

Parameters:
  • data (DataFrame or Series) – Pandas or dask object.

  • meta (object) – Meta key word argument.

Returns:

Appropriate keyword args.

Return type:

dict

hidebound.core.tools.lut_combinator(data, key_column, value_column, aggregator, meta='__no_default__')[source]

Constructs a lookup table from given key_column, then applies it to given data as value column.

Parameters:
  • data (DataFrame) – DataFrame.

  • key_column (str) – Column to be used as the lut keys.

  • value_column (str) – Column to be used as the values.

  • aggregator (function) – Function that expects a pd.DataFrame.

  • meta (object, optional) – Metadata inference. Default: ‘__no_default__’.

Returns:

DataFrame with value column.

Return type:

DataFrame

hidebound.core.tools.pred_combinator(data, predicate, true_func, false_func, meta='object')[source]

Applies true_func to rows where predicate is true and false_func to rows where it is false.

Parameters:
  • data (DataFrame or Series) – DataFrame or Series.

  • predicate (function) – Function that expects a row and returns a bool.

  • true_func (function) – Function that expects a row. Called when predicate is true.

  • false_func (function) – Function that expects a row. Called when predicate is false.

  • meta (object, optional) – Metadata inference. Default: ‘object’.

Returns:

Apply results.

Return type:

DataFrame or Series

hidebound.core.tools.read_json(filepath)[source]

Convenience function for reading JSON files. Files may include comments.

Parameters:

filepath (Path or str) – Filepath.

Raises:

JSONDecodeError – If no JSON data could be decoded.

Returns:

JSON object.

Return type:

object

hidebound.core.tools.str_to_bool(string)[source]

Converts a string to a boolean value.

Parameters:

string (str) – String to be converted.

Returns:

Boolean

Return type:

bool

hidebound.core.tools.time_string()[source]
Returns:

String representing current time.

Return type:

str

hidebound.core.tools.to_prototype(dicts)[source]

Converts a list of dicts into a dict of lists. Example:

>>> dicts = [dict(a=1, b=2, c=3), dict(a=10, b=20)]
>>> to_prototype(dicts)
{'a': [1, 10], 'b': [2, 20], 'c': [3]}

Parameters:

dicts (list[dict]) – List of dicts.

Returns:

Prototype dictionary.

Return type:

dict
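
For readers curious how such a conversion can be written, here is a minimal sketch. The function to_prototype_sketch is hypothetical and is not hidebound's code. It only reproduces the input and output shown in the example above.

```python
from collections import defaultdict


def to_prototype_sketch(dicts):
    """Hypothetical stand-in: merge a list of dicts into a dict of lists."""
    output = defaultdict(list)
    for item in dicts:
        for key, value in item.items():
            # Keys missing from a dict simply contribute no value.
            output[key].append(value)
    return dict(output)


dicts = [dict(a=1, b=2, c=3), dict(a=10, b=20)]
print(to_prototype_sketch(dicts))
```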

hidebound.core.tools.traverse_directory(directory, include_regex='', exclude_regex='', entry_type='file')[source]

Recursively lists all files or directories within a given directory.

Parameters:
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘’.

  • entry_type (str, optional) – Kind of directory entry to return. Options include: file, directory. Default: file.

Raises:
  • FileNotFoundError – If argument is not a directory or does not exist.

  • EnforceError – If entry_type is not file or directory.

Yields:

Path – File or directory, depending on entry_type.

Return type:

Generator[Path, None, None]
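
The filtering described above can be sketched with os.walk and re. This is an illustration under stated assumptions, not the library's implementation. The name traverse_directory_sketch is hypothetical, the regexes are assumed to be searched against the entry name rather than the full path, and ValueError stands in for the documented EnforceError.

```python
import os
import re
import tempfile
from pathlib import Path


def traverse_directory_sketch(
    directory, include_regex='', exclude_regex='', entry_type='file'
):
    """Hypothetical stand-in that yields matching files or directories."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f'{root} is not a directory or does not exist.')
    if entry_type not in ('file', 'directory'):
        # The real function is documented to raise EnforceError instead.
        raise ValueError(f'Illegal entry type: {entry_type}')

    for dirpath, dirnames, filenames in os.walk(root):
        names = filenames if entry_type == 'file' else dirnames
        for name in names:
            # Assumption: an empty regex disables that filter.
            if include_regex and not re.search(include_regex, name):
                continue
            if exclude_regex and re.search(exclude_regex, name):
                continue
            yield Path(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a').mkdir()
    (root / 'a' / 'x.png').write_text('')
    (root / 'a' / '.DS_Store').write_text('')

    files = sorted(
        p.name for p in traverse_directory_sketch(root, exclude_regex=r'\.DS_Store')
    )
    dirs = [p.name for p in traverse_directory_sketch(root, entry_type='directory')]

print(files, dirs)
```

Because the sketch is a generator, its argument checks run on first iteration rather than at call time.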

hidebound.core.tools.write_json(data, filepath)[source]

Convenience function for writing objects to JSON files. Writes lists with 1 item per line.

Parameters:
  • data (object) – Object to be written.

  • filepath (Path or str) – Filepath.

Return type:

None

traits

hidebound.core.traits.get_image_height(filepath)[source]

Gets the height of the given image.

Parameters:

filepath (str or Path) – filepath to image file.

Returns:

Image height.

Return type:

int

hidebound.core.traits.get_image_width(filepath)[source]

Gets the width of the given image.

Parameters:

filepath (str or Path) – filepath to image file.

Returns:

Image width.

Return type:

int

hidebound.core.traits.get_num_image_channels(filepath)[source]

Gets the number of channels of the given image.

Parameters:

filepath (str or Path) – filepath to image file.

Returns:

Number of channels.

Return type:

int

validators

hidebound.core.validators.coordinates_begin_at(items, origin)[source]

Validates that the minimum coordinate of a given list equals a given origin.

Parameters:
  • items (list[list[int]]) – List of coordinates.

  • origin (list[int]) – Origin coordinate.

Raises:

ValidationError – If coordinates do not begin at origin.

Returns:

State of items.

Return type:

bool

hidebound.core.validators.has_dense_coordinates(items)[source]

Validates that list of coordinates is dense (every point is filled).

Parameters:

items (list[list[int]]) – List of coordinates.

Raises:

ValidationError – If coordinates are not dense.

Returns:

Density of coordinates.

Return type:

bool
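
One way to read "dense" is that every integer point inside the bounding box of the coordinates is present. The sketch below works under that assumption. The function name is hypothetical, all coordinates are assumed to have the same dimension, and the sketch returns a bool instead of raising ValidationError.

```python
from itertools import product


def has_dense_coordinates_sketch(items):
    """Hypothetical check: every point of the bounding box is present."""
    points = {tuple(item) for item in items}
    if not points:
        return True  # arbitrary choice in this sketch for empty input
    # Transpose the points into one tuple of values per axis.
    axes = list(zip(*points))
    ranges = [range(min(axis), max(axis) + 1) for axis in axes]
    return points == set(product(*ranges))


print(has_dense_coordinates_sketch([[0, 0], [0, 1], [1, 0], [1, 1]]))
print(has_dense_coordinates_sketch([[0, 0], [1, 1]]))
```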

hidebound.core.validators.has_uniform_coordinate_count(items)[source]

Validates that non-unique list of coordinates has a uniform count per coordinate.

Parameters:

items (list[list[int]]) – List of coordinates.

Raises:

ValidationError – If coordinate count is non-uniform.

Returns:

Uniformity of coordinates.

Return type:

bool
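
A uniform count can be checked by counting how often each coordinate occurs and confirming that all counts agree. The following is a hypothetical sketch, not the library's code, and it returns a bool instead of raising ValidationError.

```python
from collections import Counter


def has_uniform_coordinate_count_sketch(items):
    """Hypothetical check: each distinct coordinate occurs equally often."""
    # Lists are unhashable, so convert each coordinate to a tuple first.
    counts = Counter(tuple(item) for item in items)
    return len(set(counts.values())) <= 1


print(has_uniform_coordinate_count_sketch([[0, 0], [0, 0], [0, 1], [0, 1]]))
print(has_uniform_coordinate_count_sketch([[0, 0], [0, 0], [0, 1]]))
```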

hidebound.core.validators.is_attribute_of(name, object)[source]

Validates that each name is an attribute of given object.

Parameters:
  • name (str) – Attribute name.

  • object (object) – Object.

Raises:

ValidationError – If a name is not an attribute of given object.

Returns:

All names are attributes of object.

Return type:

bool

hidebound.core.validators.is_aws_region(item)[source]

Validates an AWS region name.

Parameters:

item (str) – AWS region name.

Raises:

ValidationError – If region name is invalid.

Returns:

Validity of region name.

Return type:

bool

hidebound.core.validators.is_bucket_name(item)[source]

Validates a bucket name.

Parameters:

item (str) – bucket name.

Raises:

ValidationError – If bucket name is invalid.

Returns:

Validity of bucket name.

Return type:

bool

hidebound.core.validators.is_cluster_option_type(item)[source]

Validates that a given cluster option type is legal. Legal types include:

  • bool

  • float

  • int

  • mapping

  • select

  • string

Parameters:

item (str) – Cluster option type.

Raises:

ValidationError – If cluster option type is illegal.

Returns:

Validity of cluster option type.

Return type:

bool

hidebound.core.validators.is_coordinate(item)[source]

Validates a coordinate.

Parameters:

item (list[int]) – Coordinate.

Raises:

ValidationError – If coordinate is invalid.

Returns:

Validity of coordinate.

Return type:

bool

hidebound.core.validators.is_descriptor(item)[source]

Validates a descriptor.

Parameters:

item (str) – Descriptor.

Raises:

ValidationError – If descriptor is invalid.

Returns:

Validity of descriptor.

Return type:

bool

hidebound.core.validators.is_directory(item)[source]

Validates that item is a directory.

Parameters:

item (str) – Directory path.

Raises:

ValidationError – If item is not a directory or does not exist.

Returns:

State of item.

Return type:

bool

hidebound.core.validators.is_eq(a, b)[source]

Validates that a and b are equal.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a does not equal b.

Returns:

Equality of a and b.

Return type:

bool

hidebound.core.validators.is_extension(item)[source]

Validates a file extension.

Parameters:

item (str) – File extension.

Raises:

ValidationError – If extension is invalid.

Returns:

Validity of extension.

Return type:

bool

hidebound.core.validators.is_file(item)[source]

Validates that item is a file.

Parameters:

item (str) – Filepath.

Raises:

ValidationError – If item is not a file or does not exist.

Returns:

State of item.

Return type:

bool

hidebound.core.validators.is_frame(item)[source]

Validates a frame.

Parameters:

item (int) – Frame.

Raises:

ValidationError – If frame is invalid.

Returns:

Validity of frame.

Return type:

bool

hidebound.core.validators.is_gt(a, b)[source]

Validates that a is greater than b.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a is not greater than b.

Returns:

A is greater than b.

Return type:

bool

hidebound.core.validators.is_gte(a, b)[source]

Validates that a is greater than or equal to b.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a is not greater than or equal to b.

Returns:

A is greater than or equal to b.

Return type:

bool

hidebound.core.validators.is_hidebound_directory(directory)[source]

Ensures directory name is “hidebound”.

Parameters:

directory (str or Path) – Hidebound directory.

Raises:

ValidationError – If directory is not named “hidebound”.

Return type:

None

hidebound.core.validators.is_homogenous(items)[source]

Validates that all items are equal.

Parameters:

items (list) – List of items.

Raises:

ValidationError – If items are not all the same.

Returns:

Homogeneity of items.

Return type:

bool

hidebound.core.validators.is_http_method(method)[source]

Ensures given method is a legal HTTP method. Legal methods include:

  • get

  • put

  • post

  • delete

  • patch

Parameters:

method (str) – HTTP method.

Raises:

ValidationError – If method is not a legal HTTP method.

Return type:

None

hidebound.core.validators.is_in(a, b)[source]

Validates that each a is in b.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a is not in b.

Returns:

All a’s are in b.

Return type:

bool

Validates that directory path is legal. Legal directory paths must:

  • Begin with /

  • Not end with /

  • Contain only the characters: /, a-z, A-Z, 0-9, _, -

Parameters:

item (str) – Directory path.

Raises:

ValidationError – If directory path is invalid.

Returns:

Validity of directory path.

Return type:

bool
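
The three path rules listed above map directly onto three checks. The sketch below is illustrative only. The name follows_directory_rules is hypothetical, and the function returns a bool instead of raising ValidationError.

```python
import re


def follows_directory_rules(path):
    """Hypothetical check of the three documented path rules."""
    return (
        path.startswith('/')
        and not path.endswith('/')
        # Only /, a-z, A-Z, 0-9, _ and - are allowed.
        and re.fullmatch(r'[/a-zA-Z0-9_-]+', path) is not None
    )


for candidate in ['/foo/bar-1_b', '/foo/', 'foo/bar', '/foo bar']:
    print(candidate, follows_directory_rules(candidate))
```

Note that the root path "/" both begins and ends with a slash, so it fails the second rule in this sketch.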

hidebound.core.validators.is_lt(a, b)[source]

Validates that a is less than b.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a is not less than b.

Returns:

A is less than b.

Return type:

bool

hidebound.core.validators.is_lte(a, b)[source]

Validates that a is less than or equal to b.

Parameters:
  • a (object) – Object.

  • b (object) – Object.

Raises:

ValidationError – If a is not less than or equal to b.

Returns:

A is less than or equal to b.

Return type:

bool

hidebound.core.validators.is_metadata_type(item)[source]

Validates that a given metadata type is legal. Legal types include:

  • asset

  • file

  • asset-chunk

  • file-chunk

Parameters:

item (str) – Metadata type.

Raises:

ValidationError – If metadata type is illegal.

Returns:

Validity of metadata type.

Return type:

bool

hidebound.core.validators.is_not_missing_values(items)[source]

Validates that sequence of integers is not missing any values.

Parameters:

items (list[int]) – Integers.

Raises:

ValidationError – If items is missing values.

Returns:

State of item.

Return type:

bool
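
Assuming "missing" means a gap between the smallest and largest value, the check can be sketched as a set comparison. The function name is hypothetical, and the sketch returns a bool instead of raising ValidationError.

```python
def is_not_missing_values_sketch(items):
    """Hypothetical check: no integer between min and max is absent."""
    if not items:
        return True  # arbitrary choice in this sketch for empty input
    expected = set(range(min(items), max(items) + 1))
    return expected == set(items)


print(is_not_missing_values_sketch([1, 2, 3]))
print(is_not_missing_values_sketch([1, 3]))
```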

hidebound.core.validators.is_one_of(item, models)[source]

Validates whether given item matches at least one given model.

Parameters:
  • item (dict) – Item to be validated.

  • models (list[Model]) – List of schematics Models.

Raises:

ValidationError – If no valid model could be found for given item.

Return type:

None

hidebound.core.validators.is_project(item)[source]

Validates a project name.

Parameters:

item (str) – Project name.

Raises:

ValidationError – If project name is invalid.

Returns:

Validity of project name.

Return type:

bool

hidebound.core.validators.is_version(item)[source]

Validates a version.

Parameters:

item (int) – Version.

Raises:

ValidationError – If version is invalid.

Returns:

Validity of version.

Return type:

bool

hidebound.core.validators.is_workflow(steps)[source]

Ensures given workflow steps are legal. Legal workflow steps include:

  • delete

  • update

  • create

  • export

Parameters:

steps (list[str]) – List of workflow steps.

Raises:

ValidationError – If any step is not a legal workflow step.

Return type:

None
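
A membership check against the four legal steps could look like the sketch below. It is an assumption-laden illustration. ValidationError here is a local stand-in class, check_workflow_sketch is a hypothetical name, and the sketch checks membership only, since nothing above says whether step order or duplicates are validated.

```python
LEGAL_STEPS = ('delete', 'update', 'create', 'export')


class ValidationError(Exception):
    """Local stand-in for the ValidationError used by the validators."""


def check_workflow_sketch(steps):
    """Hypothetical check: raise if any step is not a legal workflow step."""
    illegal = [step for step in steps if step not in LEGAL_STEPS]
    if illegal:
        raise ValidationError(
            f'{illegal} are not legal workflow steps. '
            f'Legal steps: {list(LEGAL_STEPS)}.'
        )


check_workflow_sketch(['delete', 'export'])  # passes silently
try:
    check_workflow_sketch(['delete', 'publish'])
except ValidationError as error:
    print(error)
```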

hidebound.core.validators.validate(message)[source]

A decorator for predicate functions that raises a ValidationError if it returns False.

Parameters:

message (str) – Error message if predicate returns False.

Raises:

ValidationError – If predicate returns False.

Returns:

Function that returns a boolean.

Return type:

function
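
The decorator pattern described above can be sketched as follows. This is not hidebound's implementation. ValidationError is a local stand-in class, validate_sketch and is_gt_sketch are hypothetical names, and formatting the message with the positional arguments is an assumption made for the example.

```python
from functools import wraps


class ValidationError(Exception):
    """Local stand-in for the ValidationError used by the validators."""


def validate_sketch(message):
    """Hypothetical decorator: raise if the wrapped predicate returns False."""
    def decorator(predicate):
        @wraps(predicate)
        def wrapper(*args, **kwargs):
            if not predicate(*args, **kwargs):
                # Assumption: the message is a format string over the args.
                raise ValidationError(message.format(*args))
            return True
        return wrapper
    return decorator


@validate_sketch('{} is not greater than {}.')
def is_gt_sketch(a, b):
    return a > b


print(is_gt_sketch(2, 1))
try:
    is_gt_sketch(1, 2)
except ValidationError as error:
    print(error)
```

Wrapping predicates this way keeps each validator a one-line boolean expression while the decorator owns the error reporting.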

hidebound.core.validators.validate_each(message, list_first_arg=False)[source]

A decorator for predicate functions that raises a ValidationError if it returns False when applied to each argument individually.

Parameters:
  • message (str) – Error message if predicate returns False.

  • list_first_arg (bool, optional) – Set to True if first argument is a list. Default: False.

Raises:

ValidationError – If predicate returns False.

Returns:

Function that returns a boolean.

Return type:

function
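
The sketch below illustrates the per-item variant for the case where the first argument is a list, which corresponds to list_first_arg=True. It deliberately does not model the list_first_arg=False case. ValidationError is a local stand-in class, and validate_each_sketch, is_in_sketch and the message formatting are assumptions made for illustration.

```python
from functools import wraps


class ValidationError(Exception):
    """Local stand-in for the ValidationError used by the validators."""


def validate_each_sketch(message):
    """Hypothetical decorator: apply a predicate to each item of a list."""
    def decorator(predicate):
        @wraps(predicate)
        def wrapper(items, *rest):
            for item in items:
                # The remaining arguments are passed unchanged for every item.
                if not predicate(item, *rest):
                    raise ValidationError(message.format(item, *rest))
            return True
        return wrapper
    return decorator


@validate_each_sketch('{} is not in {}.')
def is_in_sketch(a, b):
    return a in b


print(is_in_sketch([1, 3], [1, 3]))
try:
    is_in_sketch([1, 2], [1, 3])
except ValidationError as error:
    print(error)
```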