blob_etl

class rolling_pin.blob_etl.BlobETL(blob, separator='/')[source]

Bases: object

Converts blob data internally into a flat dictionary that is universally searchable, editable and convertible back to the data’s original structure, new blob structures or directed graphs.

__init__(blob, separator='/')[source]

Constructs a BlobETL instance.

Parameters:
  • blob (object) – Iterable object.

  • separator (str, optional) – String to be used as a field separator in each key. Default: ‘/’.

__module__ = 'rolling_pin.blob_etl'
__weakref__

list of weak references to the object (if defined)

delete(predicate, by='key')[source]

Delete data items by key, value or key + value, according to a given predicate.

Parameters:
  • predicate (Callable[[Any], bool]) – Function that returns a boolean value.

  • by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.

Raises:

ValueError – If by keyword is not key, value, or key+value.

Returns:

New BlobETL instance.

Return type:

BlobETL

filter(predicate, by='key', invert=False)[source]

Filter data items by key, value or key + value, according to a given predicate.

Parameters:
  • predicate (Callable[[Any], bool]) – Function that returns a boolean value.

  • by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.

  • invert (bool, optional) – Whether to invert the predicate. Default: False.

Raises:

ValueError – If by keyword is not key, value, or key+value.

Returns:

New BlobETL instance.

Return type:

BlobETL
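The three by modes can be pictured on a plain flat dictionary. The sketch below is illustrative only and is not rolling_pin's implementation: filter_flat is a hypothetical helper, and passing the key and value as two separate arguments in key+value mode is an assumption.

```python
from typing import Any, Callable, Dict


def filter_flat(
    data: Dict[str, Any],
    predicate: Callable[..., bool],
    by: str = 'key',
    invert: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: keep items of a flat dict that pass predicate."""
    if by not in ('key', 'value', 'key+value'):
        raise ValueError(f'Invalid by argument: {by}.')

    output = {}
    for key, value in data.items():
        if by == 'key':
            keep = predicate(key)
        elif by == 'value':
            keep = predicate(value)
        else:
            # Assumption: key+value hands both items to the predicate.
            keep = predicate(key, value)
        # invert flips the predicate result; the input dict is not mutated.
        if bool(keep) != invert:
            output[key] = value
    return output


flat = {'users/name/first': 'tom', 'users/name/last': 'smith', 'users/age': 42}
names = filter_flat(flat, lambda k: 'name' in k)
not_names = filter_flat(flat, lambda k: 'name' in k, invert=True)
strings = filter_flat(flat, lambda v: isinstance(v, str), by='value')
```

Under the same assumptions, deleting by a predicate would correspond to the invert=True case.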

query(regex, ignore_case=True, invert=False)[source]

Filter data items by key according to given regular expression.

Parameters:
  • regex (str) – Regular expression.

  • ignore_case (bool, optional) – Whether to ignore case in the regular expression search. Default: True.

  • invert (bool, optional) – Whether to invert the predicate. Default: False.

Returns:

New BlobETL instance.

Return type:

BlobETL
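A regular expression query over flat keys might look like the following. This is a minimal sketch, not the library's code: query_flat is a hypothetical name, and using re.search (rather than a full match) is an assumption.

```python
import re
from typing import Any, Dict


def query_flat(
    data: Dict[str, Any],
    regex: str,
    ignore_case: bool = True,
    invert: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: keep items whose key matches regex."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    # Assumption: a key matches if the pattern is found anywhere in it.
    return {
        key: value
        for key, value in data.items()
        if bool(pattern.search(key)) != invert
    }


flat = {'Users/Name/First': 'tom', 'users/age': 42}
insensitive = query_flat(flat, 'name')
sensitive = query_flat(flat, 'name', ignore_case=False)
inverted = query_flat(flat, 'name', invert=True)
```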

set(predicate=None, key_setter=None, value_setter=None)[source]

Filter data items by key, value or key + value, according to a given predicate. Then set each matching item’s key by a given function and its value by a given function.

Parameters:
  • predicate (function, optional) – Function of the form: lambda k, v: bool. Default: None –> lambda k, v: True.

  • key_setter (function, optional) – Function of the form: lambda k, v: str. Default: None –> lambda k, v: k.

  • value_setter (function, optional) – Function of the form: lambda k, v: object. Default: None –> lambda k, v: v.

Returns:

New BlobETL instance.

Return type:

BlobETL
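The documented defaults (a predicate that is always true, and setters that return the key and value unchanged) can be sketched on a flat dictionary as follows. set_flat is a hypothetical helper, and keeping non-matching items unchanged is an assumption.

```python
from typing import Any, Callable, Dict, Optional

Func = Optional[Callable[[str, Any], Any]]


def set_flat(
    data: Dict[str, Any],
    predicate: Func = None,
    key_setter: Func = None,
    value_setter: Func = None,
) -> Dict[str, Any]:
    """Hypothetical helper: rewrite keys and values of matching items."""
    # Defaults mirror the documented ones.
    predicate = predicate or (lambda k, v: True)
    key_setter = key_setter or (lambda k, v: k)
    value_setter = value_setter or (lambda k, v: v)

    output = {}
    for key, value in data.items():
        if predicate(key, value):
            output[key_setter(key, value)] = value_setter(key, value)
        else:
            # Assumption: items that fail the predicate pass through as-is.
            output[key] = value
    return output


flat = {'name/first': 'tom', 'name/last': 'smith'}
result = set_flat(
    flat,
    predicate=lambda k, v: k.endswith('first'),
    key_setter=lambda k, v: k.replace('first', 'given'),
    value_setter=lambda k, v: v.title(),
)
```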

set_field(index, field_setter)[source]

Sets a field at a given index according to a given function.

Parameters:
  • index (int) – Field index.

  • field_setter (function) – Function of form lambda str: str.

Returns:

New BlobETL instance.

Return type:

BlobETL
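A field here is one separator-delimited segment of a key. The sketch below shows the idea on a flat dictionary; set_field_flat is a hypothetical helper, and leaving keys with too few fields untouched is an assumption.

```python
from typing import Any, Callable, Dict


def set_field_flat(
    data: Dict[str, Any],
    index: int,
    field_setter: Callable[[str], str],
    separator: str = '/',
) -> Dict[str, Any]:
    """Hypothetical helper: rewrite the field at index in every key."""
    output = {}
    for key, value in data.items():
        fields = key.split(separator)
        # Assumption: keys that lack a field at index are left unchanged.
        if -len(fields) <= index < len(fields):
            fields[index] = field_setter(fields[index])
        output[separator.join(fields)] = value
    return output


flat = {'a/b/c': 1, 'x/y': 2}
second = set_field_flat(flat, 1, str.upper)
third = set_field_flat(flat, 2, str.upper)
```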

to_dataframe(group_by=None)[source]

Convert data to pandas DataFrame.

Parameters:

group_by (int, optional) – Field index to group rows of data by. Default: None.

Returns:

DataFrame.

Return type:

DataFrame

to_dict()[source]
Returns:

Nested representation of internal data.

Return type:

dict
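Converting a flat dictionary back to a nested one amounts to splitting each key on the separator and walking down the fields. A minimal sketch, assuming dict-only nesting (embedded list types are ignored) and that no key is a prefix of another; unflatten is a hypothetical name.

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    """Hypothetical helper: rebuild a nested dict from a flat one."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        fields = key.split(separator)
        cursor = nested
        # Create intermediate dicts for every field except the last.
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        cursor[fields[-1]] = value
    return nested


flat = {'name/first': 'tom', 'name/last': 'smith', 'age': 42}
nested = unflatten(flat)
```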

to_dot_graph(orthogonal_edges=False, orient='tb', color_scheme=None)[source]

Converts internal dictionary into pydot graph. Key and value nodes and edges are colored differently.

Parameters:
  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

Raises:

ValueError – If orient is invalid.

Returns:

Dot graph representation of dictionary.

Return type:

pydot.Dot

to_flat_dict()[source]
Returns:

Flat dictionary with embedded types.

Return type:

dict

to_html(layout='dot', orthogonal_edges=False, orient='tb', color_scheme=None, as_png=False)[source]

For use in inline rendering of graph data in Jupyter Lab.

Parameters:
  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Returns:

HTML object for inline display.

Return type:

IPython.display.HTML

to_networkx_graph()[source]

Converts internal dictionary into a networkx directed graph.

Returns:

Graph representation of dictionary.

Return type:

networkx.DiGraph

to_prototype()[source]

Convert data to prototypical representation.

Example:

>>> data = {
    'users': [
        {
            'name': {
                'first': 'tom',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'dick',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'jane',
                'last': 'doe',
            }
        },
    ]
}
>>> BlobETL(data).to_prototype().to_dict()
{
    '^users': {
        '<list_[0-9]+>': {
            'name': {
                'first$': Counter({'dick': 1, 'jane': 1, 'tom': 1}),
                'last$': Counter({'doe': 1, 'smith': 2})
            }
        }
    }
}
Returns:

New BlobETL instance.

Return type:

BlobETL
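The example above suggests that list indices are generalized into a regex and leaf values are counted. That reading can be sketched on an already flattened dictionary. This is an interpretation of the example output, not the library's code: prototype_flat is a hypothetical helper, and the `<list_N>` key form of the flat input is an assumption inferred from the `<list_[0-9]+>` token shown above.

```python
import re
from collections import Counter
from typing import Any, Dict


def prototype_flat(flat: Dict[str, Any]) -> Dict[str, Counter]:
    """Hypothetical helper: count values per generalized key."""
    prototype: Dict[str, Counter] = {}
    for key, value in flat.items():
        # Replace concrete list indices with a regex over all indices.
        generic = re.sub(r'<list_\d+>', '<list_[0-9]+>', key)
        # Anchor the key so it reads as a full regular expression.
        generic = f'^{generic}$'
        prototype.setdefault(generic, Counter())[value] += 1
    return prototype


flat = {
    'users/<list_0>/name/first': 'tom',
    'users/<list_0>/name/last': 'smith',
    'users/<list_1>/name/first': 'dick',
    'users/<list_1>/name/last': 'smith',
    'users/<list_2>/name/first': 'jane',
    'users/<list_2>/name/last': 'doe',
}
proto = prototype_flat(flat)
```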

to_records()[source]
Returns:

Data in records format.

Return type:

list[dict]

update(item)[source]

Updates internal dictionary with given dictionary or BlobETL instance. Given dictionary is first flattened with embedded types.

Parameters:

item (dict or BlobETL) – Dictionary to be used for update.

Returns:

New BlobETL instance.

Return type:

BlobETL

write(fullpath, layout='dot', orthogonal_edges=False, orient='tb', color_scheme=None)[source]

Writes internal dictionary to a given filepath. Formats supported: svg, dot, png, json.

Parameters:
  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

Raises:

ValueError – If invalid file extension given.

Returns:

self.

Return type:

BlobETL

conform_config

class rolling_pin.conform_config.ConformConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

A class for validating configurations supplied to ConformETL.

source_rules

A list of rules for parsing directories. Default: [].

Type:

Rules

rename_rules

A list of rules for renaming source filepaths to target filepaths. Default: [].

Type:

Rules

group_rules

A list of rules for grouping files. Default: [].

Type:

Rules

line_rules

A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].

Type:

Rules

class GroupRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
name: StringType = <StringType() instance on GroupRule as 'name'>
regex: StringType = <StringType() instance on GroupRule as 'regex'>
class LineRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
exclude: StringType = <StringType() instance on LineRule as 'exclude'>
group: StringType = <StringType() instance on LineRule as 'group'>
include: StringType = <StringType() instance on LineRule as 'include'>
regex: StringType = <StringType() instance on LineRule as 'regex'>
replace: StringType = <StringType() instance on LineRule as 'replace'>
class RenameRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
regex: StringType = <StringType() instance on RenameRule as 'regex'>
replace: StringType = <StringType() instance on RenameRule as 'replace'>
class SourceRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
exclude: StringType = <StringType() instance on SourceRule as 'exclude'>
include: StringType = <StringType() instance on SourceRule as 'include'>
path: StringType = <StringType() instance on SourceRule as 'path'>
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
group_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'group_rules'>
line_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'line_rules'>
rename_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'rename_rules'>
source_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'source_rules'>
rolling_pin.conform_config.is_dir(dirpath)[source]

Validates whether a given dirpath exists.

Parameters:

dirpath (str) – Directory path.

Raises:

ValidationError – If dirpath is not a directory or does not exist.

Return type:

None

conform_etl

class rolling_pin.conform_etl.ConformETL(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]

Bases: object

ConformETL creates a DataFrame from a given directory of source files. Then it generates target paths given a set of rules. Finally, the conform method is called and the source files are copied to their target filepaths.

__init__(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters:
  • source_rules (Rules) – A list of rules for parsing directories. Default: [].

  • rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].

  • group_rules (Rules) – A list of rules for grouping files. Default: [].

  • line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Raises:

DataError – If configuration is invalid.
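How rename and group rules might turn source paths into target paths can be sketched with plain regular expressions. This is illustrative only and does not use the library: conform_paths is a hypothetical helper, the rule fields (regex, replace, name) follow the RenameRule and GroupRule models, and applying rename rules in order and matching group regexes against the source path are assumptions.

```python
import re
from typing import Dict, List


def conform_paths(
    sources: List[str],
    rename_rules: List[Dict[str, str]],
    group_rules: List[Dict[str, str]],
) -> List[Dict[str, object]]:
    """Hypothetical helper: derive target paths and groups per source."""
    rows = []
    for source in sources:
        target = source
        # Assumption: rename rules are applied one after another.
        for rule in rename_rules:
            target = re.sub(rule['regex'], rule['replace'], target)
        # Assumption: a file joins every group whose regex matches its source.
        groups = [
            rule['name']
            for rule in group_rules
            if re.search(rule['regex'], source)
        ]
        rows.append({'source': source, 'target': target, 'groups': groups})
    return rows


rows = conform_paths(
    sources=['/src/proj/foo.py', '/src/proj/foo_test.py'],
    rename_rules=[{'regex': '/src/proj', 'replace': '/dst'}],
    group_rules=[{'name': 'test', 'regex': r'_test\.py$'}],
)
```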

__module__ = 'rolling_pin.conform_etl'
__repr__()[source]

String representation of conform DataFrame.

Returns:

Table optimized for output to shell.

Return type:

str

__weakref__

list of weak references to the object (if defined)

static _get_data(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters:
  • source_rules (Rules) – A list of rules for parsing directories. Default: [].

  • rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].

  • group_rules (Rules) – A list of rules for grouping files. Default: [].

  • line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Returns:

Conform DataFrame.

Return type:

DataFrame

conform(groups='all')[source]

Copies source files to target filepaths.

Parameters:

groups (str or list[str]) – Groups of files which are to be conformed. ‘all’ means all groups. Default: ‘all’.

Return type:

None

classmethod from_yaml(filepath)[source]

Construct ConformETL instance from given yaml file.

Parameters:

filepath (str or Path) – YAML file.

Raises:

EnforceError – If file does not end in yml or yaml.

Returns:

ConformETL instance.

Return type:

ConformETL

property groups

List of groups found within self._data.

Type:

list[str]

to_blob()[source]

Converts self into a BlobETL object with target column as keys and source columns as values.

Returns:

BlobETL of target and source filepaths.

Return type:

BlobETL

to_dataframe()[source]
Returns:

Copy of internal data.

Return type:

DataFrame

to_html(orient='lr', color_scheme={'background': '#242424', 'edge': '#DE958E', 'edge_library': '#B6ECF3', 'edge_module': '#DE958E', 'edge_subpackage': '#A0D17B', 'edge_value': '#B6ECF3', 'node': '#343434', 'node_font': '#DE958E', 'node_library_font': '#B6ECF3', 'node_module_font': '#DE958E', 'node_subpackage_font': '#A0D17B', 'node_value': '#343434', 'node_value_font': '#B6ECF3'}, as_png=False)[source]

For use in inline rendering of graph data in Jupyter Lab. Graph from target to source filepath. Target is in red, source is in cyan.

Parameters:
  • orient (str, optional) –

    Graph layout orientation. Default: lr. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.conform_etl.CONFORM_COLOR_SCHEME.

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Returns:

HTML object for inline display.

Return type:

IPython.display.HTML

radon_etl

class rolling_pin.radon_etl.RadonETL(fullpath)[source]

Bases: object

Conforms all four radon reports (raw metrics, Halstead, maintainability and cyclomatic complexity) into a single DataFrame that can then be plotted.

__init__(fullpath)[source]

Constructs a RadonETL instance.

Parameters:

fullpath (str or Path) – Python file or directory of python files.

__module__ = 'rolling_pin.radon_etl'
__weakref__

list of weak references to the object (if defined)

static _get_cyclomatic_complexity_dataframe(report)[source]

Converts radon cyclomatic complexity report into a pandas DataFrame.

Parameters:

report (dict) – Radon report blob.

Returns:

Cyclomatic complexity DataFrame.

Return type:

DataFrame

static _get_halstead_dataframe(report)[source]

Converts radon Halstead report into a pandas DataFrame.

Parameters:

report (dict) – Radon report blob.

Returns:

Halstead DataFrame.

Return type:

DataFrame

static _get_maintainability_index_dataframe(report)[source]

Converts radon maintainability index report into a pandas DataFrame.

Parameters:

report (dict) – Radon report blob.

Returns:

Maintainability DataFrame.

Return type:

DataFrame

_get_radon_data()[source]

Constructs a DataFrame representing all the radon reports generated for a given python file or directory containing python files.

Returns:

Radon report DataFrame.

Return type:

DataFrame

static _get_radon_report(fullpath)[source]

Gets all four reports from radon and aggregates them into a single blob object.

Parameters:

fullpath (str or Path) – Python file or directory of python files.

Returns:

Radon report blob.

Return type:

dict

static _get_raw_metrics_dataframe(report)[source]

Converts radon raw metrics report into a pandas DataFrame.

Parameters:

report (dict) – Radon report blob.

Returns:

Raw metrics DataFrame.

Return type:

DataFrame

property cyclomatic_complexity_metrics

DataFrame of radon cyclomatic complexity metrics.

Type:

DataFrame

property data

DataFrame of all radon metrics.

Type:

DataFrame

property halstead_metrics

DataFrame of radon Halstead metrics.

Type:

DataFrame

property maintainability_index

DataFrame of radon maintainability index metrics.

Type:

DataFrame

property raw_metrics

DataFrame of radon raw metrics.

Type:

DataFrame

property report

Dictionary of all radon metrics.

Type:

dict

write_plots(fullpath)[source]

Writes metrics plots to given file.

Parameters:

fullpath (Path or str) – Target file.

Returns:

self.

Return type:

RadonETL

write_tables(target_dir)[source]

Writes metrics tables as HTML files to given directory.

Parameters:

target_dir (Path or str) – Target directory.

Returns:

self.

Return type:

RadonETL

repo_etl

class rolling_pin.repo_etl.RepoETL(root, include_regex='.*\\.py$', exclude_regex='(__init__|test_|_test|mock_)\\.py$')[source]

Bases: object

RepoETL is a class for extracting 1st order dependencies of modules within a given repository. This information is stored internally as a DataFrame and can be rendered as networkx, pydot or SVG graphs.

__init__(root, include_regex='.*\\.py$', exclude_regex='(__init__|test_|_test|mock_)\\.py$')[source]

Construct RepoETL instance.

Parameters:
  • root (str or Path) – Full path to repository root directory.

  • include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.

  • exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|test_|_test|mock_)\.py$’.

Raises:

ValueError – If include or exclude regex does not end in ‘.py$’.
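The include and exclude patterns act as a two-stage file filter. A minimal sketch on a list of paths, assuming patterns are applied with re.search and validated by a simple suffix check; select_modules is a hypothetical helper, not part of rolling_pin.

```python
import re
from typing import List


def select_modules(
    paths: List[str],
    include_regex: str = r'.*\.py$',
    exclude_regex: str = r'(__init__|test_|_test|mock_)\.py$',
) -> List[str]:
    """Hypothetical helper: keep included paths that are not excluded."""
    for regex in (include_regex, exclude_regex):
        # Mirrors the documented ValueError condition (assumed suffix check).
        if not regex.endswith('.py$'):
            raise ValueError(f"Invalid regex: {regex}. Does not end in '.py$'.")
    return [
        path
        for path in paths
        if re.search(include_regex, path) and not re.search(exclude_regex, path)
    ]


paths = ['pkg/__init__.py', 'pkg/core.py', 'pkg/core_test.py', 'README.md']
selected = select_modules(paths)
```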

__module__ = 'rolling_pin.repo_etl'
__weakref__

list of weak references to the object (if defined)

static _anneal_coordinate(data, anneal_axis='x', pin_axis='y', iterations=10)[source]

Iteratively align nodes in the anneal axis according to the mean position of their connected nodes. Node anneal coordinates are rectified at the end of each iteration according to a pin axis, so that they do not overlap. This means that they are sorted at each level of the pin axis.

Parameters:
  • data (DataFrame) – DataFrame with x column.

  • anneal_axis (str, optional) – Coordinate column to be annealed. Default: ‘x’.

  • pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.

  • iterations (int, optional) – Number of times to update x coordinates. Default: 10.

Returns:

DataFrame with annealed anneal axis coordinates.

Return type:

DataFrame
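The two steps described above (move each node toward the mean of its connected nodes, then rectify per pin level) can be sketched without pandas. This is a simplified reading, not the library's code: anneal_sketch, the dict-based node layout, the edge list, and the use of integer slots to keep nodes from overlapping are all assumptions.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

Coords = Dict[str, Dict[str, float]]


def anneal_sketch(
    nodes: Coords, edges: List[Tuple[str, str]], iterations: int = 10
) -> Coords:
    """Hypothetical helper: anneal x coordinates while y stays pinned."""
    neighbors: Dict[str, Set[str]] = {name: set() for name in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    coords = {name: dict(pos) for name, pos in nodes.items()}
    for _ in range(iterations):
        # Step 1: propose the mean x of connected nodes for each node.
        proposed = {}
        for name, pos in coords.items():
            linked = neighbors[name]
            proposed[name] = (
                mean(coords[n]['x'] for n in linked) if linked else pos['x']
            )

        # Step 2: rectify. Within each y level, sort by proposed x and
        # assign distinct slots so that nodes never overlap.
        levels: Dict[float, List[str]] = {}
        for name, pos in coords.items():
            levels.setdefault(pos['y'], []).append(name)
        for names in levels.values():
            names.sort(key=lambda n: (proposed[n], n))
            for slot, name in enumerate(names):
                coords[name]['x'] = float(slot)
    return coords


nodes = {
    'a': {'x': 0.0, 'y': 0.0},
    'b': {'x': 1.0, 'y': 0.0},
    'c': {'x': 2.0, 'y': 0.0},
    'd': {'x': 0.0, 'y': 1.0},
}
result = anneal_sketch(nodes, edges=[('c', 'd')])
```

In this input, node c is connected to d and so is pulled from the far end of its level toward d, while every node keeps its y coordinate.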

static _calculate_coordinates(data)[source]

Calculate initial x, y coordinates for each node in given DataFrame. Nodes are stratified by type along the y axis.

Parameters:

data (DataFrame) – DataFrame of nodes.

Returns:

DataFrame with x and y coordinate columns.

Return type:

DataFrame

static _center_coordinate(data, center_axis='x', pin_axis='y')[source]

Sorts center_axis coordinates at each level of the pin axis.

Parameters:
  • data (DataFrame) – DataFrame with x column.

  • center_axis (str, optional) – Coordinate column to be centered. Default: ‘x’.

  • pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.

Returns:

DataFrame with centered center axis coordinates.

Return type:

DataFrame

static _get_data(root, include_regex='.*\\.py$', exclude_regex='(__init__|_test)\\.py$')[source]

Recursively aggregates and filters all the files found with a given directory into a DataFrame. Data is used to create directed graphs.

DataFrame has these columns:

  • node_name - name of node

  • node_type - type of node, can be [module, subpackage, library]

  • x - node’s x coordinate

  • y - node’s y coordinate

  • dependencies - parent nodes

  • subpackages - parent nodes of type subpackage

  • fullpath - fullpath to the module a node represents

Parameters:
  • root (str or Path) – Root directory to be searched.

  • include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.

  • exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|_test)\.py$’.

Raises:
  • ValueError – If include or exclude regex does not end in ‘.py$’.

  • FileNotFoundError – If no files are found after filtering.

Returns:

DataFrame of file information.

Return type:

DataFrame

static _get_imports(fullpath)[source]

Gets import statements from a given python module.

Parameters:

fullpath (str or Path) – Path to python module.

Returns:

List of imported modules.

Return type:

list(str)
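One way to collect imported module names is to parse the source with the standard ast module. This is a sketch of the general technique, not necessarily how rolling_pin does it: get_imports_sketch is a hypothetical helper, and it takes source text rather than a file path to stay self-contained.

```python
import ast
from typing import List


def get_imports_sketch(source: str) -> List[str]:
    """Hypothetical helper: list module names imported by Python source."""
    imports = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b.c" yields one alias per module.
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a.b import c" contributes "a.b"; bare relative
            # imports ("from . import x") have no module and are skipped.
            imports.append(node.module)
    return imports


source = '''
import os
from collections import Counter
import xml.etree as et
from . import sibling
'''
found = get_imports_sketch(source)
```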

static _to_networkx_graph(data)[source]

Converts given DataFrame into networkx directed graph.

Parameters:

data (DataFrame) – DataFrame of nodes.

Returns:

Graph of nodes.

Return type:

networkx.DiGraph

to_dataframe()[source]
Returns:

DataFrame of nodes representing repo modules.

Return type:

DataFrame

to_dot_graph(orient='tb', orthogonal_edges=False, color_scheme=None)[source]

Converts internal data into pydot graph.

Parameters:
  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

Raises:

ValueError – If orient is invalid.

Returns:

Dot graph of nodes.

Return type:

pydot.Dot

to_html(layout='dot', orthogonal_edges=False, color_scheme=None, as_png=False)[source]

For use in inline rendering of graph data in Jupyter Lab.

Parameters:
  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Returns:

HTML object for inline display.

Return type:

IPython.display.HTML

to_networkx_graph()[source]

Converts internal data into networkx directed graph.

Returns:

Graph of nodes.

Return type:

networkx.DiGraph

write(fullpath, layout='dot', orient='tb', orthogonal_edges=False, color_scheme=None)[source]

Writes internal data to a given filepath. Formats supported: svg, dot, png, json.

Parameters:
  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • orthogonal_edges (bool, optional) – Whether graph edges should be orthogonal, i.e. drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME.

Raises:

ValueError – If invalid file extension given.

Returns:

Self.

Return type:

RepoETL

toml_etl

class rolling_pin.toml_etl.TomlETL(data)[source]

Bases: object

__init__(data)[source]

Creates a TomlETL instance from a given dictionary.

Parameters:

data (dict) – Dictionary.

__module__ = 'rolling_pin.toml_etl'
__weakref__

list of weak references to the object (if defined)

delete(regex)[source]

Returns portion of data whose keys do not match a given regular expression.

Parameters:

regex (str) – Regular expression applied to keys.

Returns:

New TomlETL instance.

Return type:

TomlETL

edit(patch)[source]

Apply edit to internal data given TOML patch. Patch is always of the form ‘[key]=[value]’ and in TOML format.

Parameters:

patch (str) – TOML patch to be applied.

Raises:
  • TOMLDecoderError – If patch cannot be decoded.

  • EnforceError – If ‘=’ not found in patch.

Returns:

New TomlETL instance with edits.

Return type:

TomlETL

classmethod from_string(text)[source]

Creates a TomlETL instance from a given TOML string.

Parameters:

text (str) – TOML string.

Returns:

TomlETL instance.

Return type:

TomlETL

classmethod from_toml(filepath)[source]

Creates a TomlETL instance from a given TOML file.

Parameters:

filepath (str or Path) – TOML file.

Returns:

TomlETL instance.

Return type:

TomlETL

search(regex)[source]

Returns portion of data whose keys match a given regular expression.

Parameters:

regex (str) – Regular expression applied to keys.

Returns:

New TomlETL instance.

Return type:

TomlETL

to_dict()[source]

Converts instance to dictionary copy.

Returns:

Dictionary copy of instance.

Return type:

dict

to_string()[source]

Converts instance to a TOML formatted string.

Returns:

TOML string.

Return type:

str

write(filepath)[source]

Writes instance to given TOML file.

Parameters:

filepath (str or Path) – Target filepath.

Return type:

None

tools

rolling_pin.tools.LOGGER = <Logger rolling_pin.tools (WARNING)>

Contains basic functions for more complex ETL functions and classes.

rolling_pin.tools.copy_file(source, target)[source]

Copies a source file to a target file, creating directories as needed.

Parameters:
  • source (str or Path) – Source filepath.

  • target (str or Path) – Target filepath.

Raises:

AssertionError – If source is not a file.

Return type:

None
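The documented behavior (assert the source is a file, create missing parent directories, then copy) can be sketched with the standard library. copy_file_sketch is a hypothetical stand-in; the use of shutil.copy2 is an assumption.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union


def copy_file_sketch(source: Union[str, Path], target: Union[str, Path]) -> None:
    """Hypothetical stand-in: copy source to target, creating directories."""
    source = Path(source)
    target = Path(target)
    assert source.is_file(), f'{source} is not a file.'
    # Create any missing parent directories of the target.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: file metadata is preserved along with content.
    shutil.copy2(source, target)


with tempfile.TemporaryDirectory() as root:
    src = Path(root, 'src.txt')
    src.write_text('hello')
    dst = Path(root, 'a', 'b', 'dst.txt')
    copy_file_sketch(src, dst)
    copied = dst.read_text()
```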

rolling_pin.tools.directory_to_dataframe(directory, include_regex='', exclude_regex='\\.DS_Store')[source]

Recursively list files within a given directory as rows in a pd.DataFrame.

Parameters:
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: ‘’ (empty string).

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘\.DS_Store’.

Returns:

pd.DataFrame with one file per row.

Return type:

pd.DataFrame

rolling_pin.tools.dot_to_html(dot, layout='dot', as_png=False)[source]

Converts a given pydot graph into an IPython.display.HTML object. Used for inline display of graph data in Jupyter Lab.

Parameters:
  • dot (pydot.Dot) – Pydot Graph instance.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Raises:

ValueError – If invalid layout given.

Returns:

HTML instance.

Return type:

IPython.display.HTML

rolling_pin.tools.filter_text(text, include_regex=None, exclude_regex=None, replace_regex=None, replace_value=None)[source]

Filter given text by applying regular expressions to each line.

Parameters:
  • text (str) – Newline separated lines.

  • include_regex (str, optional) – Keep lines that match given regex. Default: None.

  • exclude_regex (str, optional) – Remove lines that match given regex. Default: None.

  • replace_regex (str, optional) – Substitutes regex matches in lines with replace_value. Default: None.

  • replace_value (str, optional) – Regex substitution value. Default: ‘’.

Raises:

AssertionError – If source is not a file.

Returns:

Filtered text.

Return type:

str
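
A line-by-line filter of this kind might look like the sketch below. The order of operations (include, then exclude, then replace) and the treatment of a missing replace_value as an empty string are assumptions; `filter_text_sketch` is a hypothetical name.

```python
import re


def filter_text_sketch(text, include_regex=None, exclude_regex=None,
                       replace_regex=None, replace_value=None):
    # Hypothetical stand-in, not the rolling_pin implementation.
    lines = text.split('\n')
    if include_regex is not None:
        # Keep only lines that match.
        lines = [x for x in lines if re.search(include_regex, x)]
    if exclude_regex is not None:
        # Drop lines that match.
        lines = [x for x in lines if not re.search(exclude_regex, x)]
    if replace_regex is not None:
        # Assumption: a missing replace_value means "delete the match".
        value = '' if replace_value is None else replace_value
        lines = [re.sub(replace_regex, value, x) for x in lines]
    return '\n'.join(lines)


log = 'INFO start\nDEBUG noise\nINFO done\nERROR failed'
result = filter_text_sketch(
    log,
    include_regex='^(INFO|ERROR)',
    exclude_regex='done',
    replace_regex='^INFO ',
)
```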

rolling_pin.tools.flatten(item, separator='/', embed_types=True)[source]

Flattens an iterable object into a flat dictionary.

Parameters:
  • item (object) – Iterable object.

  • separator (str, optional) – Field separator in keys. Default: ‘/’.

Returns:

Dictionary representation of given object.

Return type:

dict
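
To illustrate the idea of flattening, the sketch below joins nested dictionary keys with the separator. It handles dictionaries only and ignores lists and embed_types, so it is a simplified assumption rather than the library's behavior; `flatten_sketch` is a hypothetical name.

```python
def flatten_sketch(item, separator='/', prefix=''):
    # Hypothetical stand-in, not the rolling_pin implementation.
    # Handles nested dicts only; lists and type embedding are out of scope.
    output = {}
    for key, value in item.items():
        path = f'{prefix}{separator}{key}' if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, extending the key path.
            output.update(flatten_sketch(value, separator, path))
        else:
            # Leaves (including empty dicts) are stored as-is.
            output[path] = value
    return output


nested = {'a': {'b': 1, 'c': {'d': 2}}, 'e': 3}
flat = flatten_sketch(nested)
```

With this input, `flat` maps `a/b`, `a/c/d` and `e` to their leaf values.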

rolling_pin.tools.get_parent_fields(key, separator='/')[source]

Get all the parent fields of a given key, split by given separator.

Parameters:
  • key (str) – Key.

  • separator (str, optional) – String that splits key into fields. Default: ‘/’.

Returns:

List of absolute parent fields.

Return type:

list(str)
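
One reading of "absolute parent fields" is every proper prefix of the key, each rebuilt with the separator. The sketch below follows that reading; that the key itself is excluded is an assumption, and `get_parent_fields_sketch` is a hypothetical name.

```python
def get_parent_fields_sketch(key, separator='/'):
    # Hypothetical stand-in, not the rolling_pin implementation.
    fields = key.split(separator)
    # Build each proper prefix: 'a', 'a/b', ... but never the full key.
    return [separator.join(fields[:i]) for i in range(1, len(fields))]


parents = get_parent_fields_sketch('a/b/c')
```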

rolling_pin.tools.is_dictlike(item)[source]

Determines if given item is dict-like.

Parameters:

item (object) – Object to be tested.

Returns:

Whether given item is dict-like.

Return type:

bool

rolling_pin.tools.is_iterable(item)[source]

Determines if given item is iterable.

Parameters:

item (object) – Object to be tested.

Returns:

Whether given item is iterable.

Return type:

bool

rolling_pin.tools.is_listlike(item)[source]

Determines if given item is list-like.

Parameters:

item (object) – Object to be tested.

Returns:

Whether given item is list-like.

Return type:

bool

rolling_pin.tools.list_all_files(directory, include_regex=None, exclude_regex=None)[source]

Recursively lists all files within a given directory.

Parameters:
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: None.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: None.

Raises:

FileNotFoundError – If argument is not a directory or does not exist.

Yields:

Path – File.

Return type:

Generator[Path, None, None]
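
A generator with this shape can be sketched with os.walk. That the regexes are tested against the filename only (not the full path) is an assumption drawn from the parameter descriptions; `list_all_files_sketch` is a hypothetical name.

```python
import os
import re
import tempfile
from pathlib import Path


def list_all_files_sketch(directory, include_regex=None, exclude_regex=None):
    # Hypothetical stand-in, not the rolling_pin implementation.
    directory = Path(directory)
    # Because this is a generator, the check runs on first iteration.
    if not directory.is_dir():
        raise FileNotFoundError(
            f'{directory} is not a directory or does not exist.'
        )
    for root, _, files in os.walk(directory):
        for name in files:
            if include_regex is not None and not re.search(include_regex, name):
                continue
            if exclude_regex is not None and re.search(exclude_regex, name):
                continue
            yield Path(root, name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'sub').mkdir()
    for rel in ['a.txt', 'sub/b.txt', 'sub/c.log']:
        Path(tmp, rel).write_text('x')
    # Collect names eagerly while the temporary directory still exists.
    found = sorted(
        p.name for p in list_all_files_sketch(tmp, include_regex=r'\.txt$')
    )
```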

rolling_pin.tools.move_file(source, target)[source]

Moves a source file to a target file, creating directories as needed.

Parameters:
  • source (str or Path) – Source filepath.

  • target (str or Path) – Target filepath.

Raises:

AssertionError – If source is not a file.

Return type:

None

rolling_pin.tools.nest(flat_dict, separator='/')[source]

Converts a flat dictionary into a nested dictionary by splitting keys by a given separator.

Parameters:
  • flat_dict (dict) – Flat dictionary.

  • separator (str, optional) – Field separator within given dictionary’s keys. Default: ‘/’.

Returns:

Nested dictionary.

Return type:

dict
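
Nesting is the inverse of flattening: each key is split on the separator and the value is placed at the end of the resulting path. The sketch below shows the idea for plain dictionaries; `nest_sketch` is a hypothetical name and not the library's code.

```python
def nest_sketch(flat_dict, separator='/'):
    # Hypothetical stand-in, not the rolling_pin implementation.
    output = {}
    for key, value in flat_dict.items():
        fields = key.split(separator)
        cursor = output
        # Walk (and create) intermediate dictionaries for parent fields.
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        # The last field holds the value itself.
        cursor[fields[-1]] = value
    return output


nested = nest_sketch({'a/b': 1, 'a/c/d': 2, 'e': 3})
```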

rolling_pin.tools.read_text(filepath)[source]

Convenience function for reading text from given file.

Parameters:

filepath (str or Path) – File to be read.

Raises:

AssertionError – If source is not a file.

Returns:

Text read from the given file.

Return type:

str

rolling_pin.tools.replace_and_format(regex, replace, string, flags=0)[source]

Performs a regex substitution on a given string and formats any named group found in the result with groupdict data from the pattern. Groups beginning with ‘i’ will be converted to integers. Groups beginning with ‘f’ will be converted to floats.


Named group anatomy:

  • (?P<NAME>PATTERN)

  • NAME becomes a key and whatever matches PATTERN becomes its value.

>>> import re
>>> re.search(r'(?P<i>\d+)', 'foobar123').groupdict()
{'i': '123'}

Examples:

Special groups:
  • (?P<i>\d) - string matched by ‘\d’ will be converted to an integer

  • (?P<f>\d) - string matched by ‘\d’ will be converted to a float

  • (?P<i_foo>\d) - string matched by ‘\d’ will be converted to an integer

  • (?P<f_bar>\d) - string matched by ‘\d’ will be converted to a float

Named groups (long):
>>> proj = r'(?P<p>[a-z0-9]+)'
>>> spec = r'(?P<s>[a-z0-9]+)'
>>> desc = r'(?P<d>[a-z0-9\-]+)'
>>> ver = r'(?P<iv>\d+)'
>>> frame = r'(?P<i_f>\d+)'
>>> regex = rf'{proj}\.{spec}\.{desc}\.v{ver}\.{frame}.*'
>>> replace = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
>>> string = 'proj.spec.desc.v1.25.png'
>>> replace_and_format(regex, replace, string, flags=re.IGNORECASE)
p-proj_s-spec_d-desc_v001_f0025.jpeg

Named groups (short):
>>> replace_and_format(
...     r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
...     'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
...     'proj.spec.desc.v1.25.png',
... )
p-proj_s-spec_d-desc_v001_f0025.jpeg

No groups:
>>> replace_and_format('foo', 'bar', 'foobar')
barbar

Parameters:
  • regex (str) – Regex pattern to search string with.

  • replace (str) – Replacement string which may contain format variables, i.e. ‘{variable}’.

  • string (str) – String to be converted.

  • flags (object, optional) – re.sub flags. Default: 0.

Returns:

Converted string.

Return type:

str
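
The behavior described above can be approximated in a few lines: substitute with re.sub, then format the result with the converted named groups. This is a sketch of one way to get the documented output, not the library's source; `replace_and_format_sketch` is a hypothetical name, and returning the string unchanged when nothing matches is an assumption.

```python
import re


def replace_and_format_sketch(regex, replace, string, flags=0):
    # Hypothetical stand-in, not the rolling_pin implementation.
    match = re.search(regex, string, flags=flags)
    if match is None:
        # Assumption: leave the string untouched when nothing matches.
        return string

    groups = {}
    for name, value in match.groupdict().items():
        if value is None:
            # Optional group that did not participate in the match.
            groups[name] = value
        elif name.startswith('i'):
            groups[name] = int(value)
        elif name.startswith('f'):
            groups[name] = float(value)
        else:
            groups[name] = value

    # Substitute first, then fill '{name}' fields from the named groups.
    result = re.sub(regex, replace, string, flags=flags)
    return result.format(**groups)


pattern = (
    r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)'
    r'\.v(?P<iv>\d+)\.(?P<i_f>\d+).*'
)
renamed = replace_and_format_sketch(
    pattern,
    'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
    'proj.spec.desc.v1.25.png',
    flags=re.IGNORECASE,
)
```

The integer conversion is what allows format specs such as `{iv:03d}` to work, since `d` formatting would fail on a plain string.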

rolling_pin.tools.unembed(item)[source]

Converts embedded types in dictionary keys into Python types.

Parameters:

item (object) – Dictionary with embedded types.

Returns:

Converted object.

Return type:

object

rolling_pin.tools.write_dot_graph(dot, fullpath, layout='dot')[source]

Writes a pydot.Dot object to a given filepath. Formats supported: svg, dot, png.

Parameters:
  • dot (pydot.Dot) – Pydot Dot instance.

  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

Raises:

ValueError – If invalid file extension given.

Return type:

None

rolling_pin.tools.write_text(text, filepath)[source]

Convenience function for writing text to given file. Creates directories as needed.

Parameters:
  • text (str) – Text to be written.

  • filepath (str or Path) – File to be written.

Return type:

None
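
A write helper that creates directories as needed can be sketched with pathlib, shown here with a read-back to confirm the round trip. `write_text_sketch` is a hypothetical name and the details are assumptions, not the library's code.

```python
import tempfile
from pathlib import Path


def write_text_sketch(text, filepath):
    # Hypothetical stand-in, not the rolling_pin implementation.
    filepath = Path(filepath)
    # Create any missing parent directories before writing.
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with open(filepath, 'w') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    # Neither 'logs' nor '2024' exists before the call.
    target = Path(root, 'logs', '2024', 'notes.txt')
    write_text_sketch('line one\nline two', target)
    round_trip = target.read_text()
```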