blob_etl

class rolling_pin.blob_etl.BlobETL(blob: Any, separator: str = '/')[source]

Bases: object

Converts blob data internally into a flat dictionary that is universally searchable, editable and convertible back to the data’s original structure, new blob structures or directed graphs.

__init__(blob: Any, separator: str = '/') None[source]

Constructs a BlobETL instance.

Parameters
  • blob (object) – Iterable object.

  • separator (str, optional) – String to be used as a field separator in each key. Default: ‘/’.
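
The flattening idea behind the constructor can be sketched as follows. This is a minimal illustration, not the library's implementation: the `flatten` function is hypothetical, and the `<list_N>` encoding of list indices is an assumption inferred from the `to_prototype` example in this reference.

```python
from typing import Any, Dict


def flatten(blob: Any, separator: str = '/', prefix: str = '') -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by joined field names."""
    if isinstance(blob, dict):
        items = [(str(key), val) for key, val in blob.items()]
    elif isinstance(blob, (list, tuple)):
        # Assumption: list positions are embedded in the key as <list_N>.
        items = [(f'<list_{i}>', val) for i, val in enumerate(blob)]
    else:
        # Leaf value: the accumulated prefix is the full key.
        return {prefix: blob}

    output: Dict[str, Any] = {}
    for key, val in items:
        full_key = f'{prefix}{separator}{key}' if prefix else key
        output.update(flatten(val, separator, full_key))
    return output


flat = flatten({'a': {'b': 1, 'c': [10, 20]}})
dotted = flatten({'a': {'b': 1}}, separator='.')
```

Every key in the flat dictionary is a full path, which is what makes key-based searching and editing straightforward.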

__module__ = 'rolling_pin.blob_etl'
__weakref__

list of weak references to the object (if defined)

delete(predicate: Callable[[Any], bool], by: str = 'key') BlobETL[source]

Delete data items by key, value or key + value, according to a given predicate.

Parameters
  • predicate – Function that returns a boolean value.

  • by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.

Raises

ValueError – If by keyword is not key, value, or key+value.

Returns

New BlobETL instance.

Return type

BlobETL

filter(predicate: Callable[[Any], bool], by: str = 'key', invert: bool = False) BlobETL[source]

Filter data items by key, value or key + value, according to a given predicate.

Parameters
  • predicate – Function that returns a boolean value.

  • by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.

  • invert (bool, optional) – Whether to invert the predicate. Default: False.

Raises

ValueError – If by keyword is not key, value, or key+value.

Returns

New BlobETL instance.

Return type

BlobETL
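
A rough sketch of predicate filtering on a flat dictionary is given here. The `filter_items` function is hypothetical, and it assumes that `key+value` hands both the key and the value to the predicate as two arguments. Under that reading, `delete` behaves like `filter` with `invert=True`.

```python
from typing import Any, Callable, Dict


def filter_items(
    data: Dict[str, Any],
    predicate: Callable[..., bool],
    by: str = 'key',
    invert: bool = False,
) -> Dict[str, Any]:
    """Return a new dict of the items for which the predicate holds."""
    # Map each option to the arguments handed to the predicate.
    handlers = {
        'key': lambda key, val: predicate(key),
        'value': lambda key, val: predicate(val),
        'key+value': lambda key, val: predicate(key, val),  # assumption
    }
    if by not in handlers:
        raise ValueError(f'Invalid by argument: {by}.')
    test = handlers[by]
    return {k: v for k, v in data.items() if bool(test(k, v)) != invert}


flat = {'a/b': 1, 'a/c': 20, 'd': 3}
kept = filter_items(flat, lambda k: k.startswith('a/'))
big = filter_items(flat, lambda v: v > 2, by='value')
dropped = filter_items(flat, lambda k: k.startswith('a/'), invert=True)
```

The input dictionary is left untouched, which mirrors the documented behavior of returning a new instance.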

query(regex: str, ignore_case: bool = True, invert: bool = False) BlobETL[source]

Filter data items by key according to given regular expression.

Parameters
  • regex (str) – Regular expression.

  • ignore_case (bool, optional) – Whether to ignore case in the regular expression search. Default: True.

  • invert (bool, optional) – Whether to invert the predicate. Default: False.

Returns

New BlobETL instance.

Return type

BlobETL
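
Key querying by regular expression might look roughly like this. The standalone `query` function is hypothetical, and the use of `re.search` (a match anywhere in the key) rather than a full match is an assumption.

```python
import re
from typing import Any, Dict


def query(
    data: Dict[str, Any],
    regex: str,
    ignore_case: bool = True,
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items whose key matches the regex, or those that do not if inverted."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    return {k: v for k, v in data.items() if bool(pattern.search(k)) != invert}


flat = {'Users/name': 'tom', 'users/age': 30, 'groups/id': 7}
any_case = query(flat, '^users')
exact_case = query(flat, '^users', ignore_case=False)
others = query(flat, '^users', invert=True)
```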

set(predicate: Optional[Callable[[Any, Any], bool]] = None, key_setter: Optional[Callable[[Any, Any], str]] = None, value_setter: Optional[Callable[[Any, Any], Any]] = None) BlobETL[source]

Filter data items by key, value or key + value, according to a given predicate. Then set each matching item’s key by a given function and its value by a given function.

Parameters
  • predicate (function, optional) – Function of the form: lambda k, v: bool. Default: None –> lambda k, v: True.

  • key_setter (function, optional) – Function of the form: lambda k, v: str. Default: None –> lambda k, v: k.

  • value_setter (function, optional) – Function of the form: lambda k, v: object. Default: None –> lambda k, v: v.

Returns

New BlobETL instance.

Return type

BlobETL
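
The three optional callables can be sketched on a plain flat dictionary as follows. The `set_items` function is hypothetical, and it assumes that items not selected by the predicate are carried over unchanged.

```python
from typing import Any, Callable, Dict, Optional


def set_items(
    data: Dict[str, Any],
    predicate: Optional[Callable[[Any, Any], bool]] = None,
    key_setter: Optional[Callable[[Any, Any], str]] = None,
    value_setter: Optional[Callable[[Any, Any], Any]] = None,
) -> Dict[str, Any]:
    """Rewrite the key and value of every item selected by the predicate."""
    # Defaults follow the documented forms of each callable.
    predicate = predicate or (lambda k, v: True)
    key_setter = key_setter or (lambda k, v: k)
    value_setter = value_setter or (lambda k, v: v)

    output: Dict[str, Any] = {}
    for key, val in data.items():
        if predicate(key, val):
            output[key_setter(key, val)] = value_setter(key, val)
        else:
            output[key] = val  # assumption: unmatched items are kept as-is
    return output


flat = {'a/b': 1, 'a/c': 2}
result = set_items(
    flat,
    predicate=lambda k, v: v > 1,
    key_setter=lambda k, v: k.upper(),
    value_setter=lambda k, v: v * 10,
)
unchanged = set_items(flat)
```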

set_field(index: int, field_setter: Callable[[str], str]) BlobETL[source]

Sets a field at a given index according to a given function.

Parameters
  • index (int) – Field index.

  • field_setter (function) – Function of form lambda str: str.

Returns

New BlobETL instance.

Return type

BlobETL
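
Since each key is a separator-joined sequence of fields, setting a field amounts to split, edit, rejoin. The sketch below is a hypothetical standalone version; leaving keys that are too short for the index unchanged is an assumption.

```python
from typing import Any, Callable, Dict


def set_field(
    data: Dict[str, Any],
    index: int,
    field_setter: Callable[[str], str],
    separator: str = '/',
) -> Dict[str, Any]:
    """Apply field_setter to the field at the given index of every key."""
    output: Dict[str, Any] = {}
    for key, val in data.items():
        fields = key.split(separator)
        try:
            fields[index] = field_setter(fields[index])
        except IndexError:
            pass  # assumption: keys without that field are left unchanged
        output[separator.join(fields)] = val
    return output


flat = {'a/b/c': 1, 'x': 2}
result = set_field(flat, 1, str.upper)
```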

to_dataframe(group_by: Optional[int] = None) DataFrame[source]

Convert data to pandas DataFrame.

Parameters

group_by (int, optional) – Field index to group rows of data by. Default: None.

Returns

DataFrame.

Return type

DataFrame

to_dict() Dict[str, Any][source]
Returns

Nested representation of internal data.

Return type

dict
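
Rebuilding a nested dictionary from flat keys can be sketched as below. The `unflatten` function is hypothetical and simplified: it only rebuilds dictionaries and does not restore lists from embedded type markers.

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    """Turn separator-joined keys back into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for key, val in flat.items():
        fields = key.split(separator)
        cursor = nested
        # Walk or create one dictionary per intermediate field.
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        cursor[fields[-1]] = val
    return nested


nested = unflatten({'a/b': 1, 'a/c/d': 2, 'e': 3})
```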

to_dot_graph(orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) Dot[source]

Converts internal dictionary into pydot graph. Key and value nodes and edges are colored differently.

Parameters
  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If orient is invalid.

Returns

Dot graph representation of dictionary.

Return type

pydot.Dot

to_flat_dict() Dict[str, Any][source]
Returns

Flat dictionary with embedded types.

Return type

dict

to_html(layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) Union[Image, HTML][source]

For use in inline rendering of graph data in Jupyter Lab.

Parameters
  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

to_networkx_graph() DiGraph[source]

Converts internal dictionary into a networkx directed graph.

Returns

Graph representation of dictionary.

Return type

networkx.DiGraph

to_prototype() BlobETL[source]

Convert data to prototypical representation.

Example:

>>> data = {
    'users': [
        {
            'name': {
                'first': 'tom',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'dick',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'jane',
                'last': 'doe',
            }
        },
    ]
}
>>> BlobETL(data).to_prototype().to_dict()
{
    '^users': {
        '<list_[0-9]+>': {
            'name': {
                'first$': Counter({'dick': 1, 'jane': 1, 'tom': 1}),
                'last$': Counter({'doe': 1, 'smith': 2})
            }
        }
    }
}
Returns

New BlobETL instance.

Return type

BlobETL
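
The example above suggests that a prototype generalizes list indices into a regular expression and counts the values seen under each generalized key. A minimal sketch of that idea on flat keys follows. The `to_prototype` function here is hypothetical, assumes the `<list_N>` key encoding, and requires hashable values.

```python
import re
from collections import Counter, defaultdict
from typing import Any, Dict


def to_prototype(flat: Dict[str, Any], separator: str = '/') -> Dict[str, Counter]:
    """Collapse list indices in keys and count values per generalized key."""
    proto: Dict[str, Counter] = defaultdict(Counter)
    for key, val in flat.items():
        # Replace concrete list positions with a pattern matching any index.
        fields = [
            re.sub(r'^<list_\d+>$', '<list_[0-9]+>', field)
            for field in key.split(separator)
        ]
        pattern = '^' + separator.join(fields) + '$'
        proto[pattern][val] += 1
    return dict(proto)


flat = {
    'users/<list_0>/name/first': 'tom',
    'users/<list_0>/name/last': 'smith',
    'users/<list_1>/name/first': 'dick',
    'users/<list_1>/name/last': 'smith',
    'users/<list_2>/name/first': 'jane',
    'users/<list_2>/name/last': 'doe',
}
proto = to_prototype(flat)
```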

to_records() List[Dict][source]
Returns

Data in records format.

Return type

list[dict]

update(item: Union[Dict, BlobETL]) BlobETL[source]

Updates internal dictionary with given dictionary or BlobETL instance. Given dictionary is first flattened with embedded types.

Parameters

item (dict or BlobETL) – Dictionary to be used for update.

Returns

New BlobETL instance.

Return type

BlobETL

write(fullpath: Union[str, Path], layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) BlobETL[source]

Writes internal dictionary to a given filepath. Formats supported: svg, dot, png, json.

Parameters
  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If invalid file extension given.

Returns

self.

Return type

BlobETL

conform_config

class rolling_pin.conform_config.ConformConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

A class for validating configurations supplied to ConformETL.

source_rules

A list of rules for parsing directories. Default: [].

Type

Rules

rename_rules

A list of rules for renaming source filepaths to target filepaths. Default: [].

Type

Rules

group_rules

A list of rules for grouping files. Default: [].

Type

Rules

line_rules

A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].

Type

Rules

class GroupRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
name: StringType = <StringType() instance on GroupRule as 'name'>
regex: StringType = <StringType() instance on GroupRule as 'regex'>
class LineRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
exclude: StringType = <StringType() instance on LineRule as 'exclude'>
group: StringType = <StringType() instance on LineRule as 'group'>
include: StringType = <StringType() instance on LineRule as 'include'>
regex: StringType = <StringType() instance on LineRule as 'regex'>
replace: StringType = <StringType() instance on LineRule as 'replace'>
class RenameRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
regex: StringType = <StringType() instance on RenameRule as 'regex'>
replace: StringType = <StringType() instance on RenameRule as 'replace'>
class SourceRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]

Bases: Model

__annotations__ = {}
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
exclude: StringType = <StringType() instance on SourceRule as 'exclude'>
include: StringType = <StringType() instance on SourceRule as 'include'>
path: StringType = <StringType() instance on SourceRule as 'path'>
__module__ = 'rolling_pin.conform_config'
_schema = <schematics.deprecated.patch_schema.<locals>.Schema object>
group_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'group_rules'>
line_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'line_rules'>
rename_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'rename_rules'>
source_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'source_rules'>
rolling_pin.conform_config.is_dir(dirpath: str) None[source]

Validates whether a given dirpath exists.

Parameters

dirpath (str) – Directory path.

Raises

ValidationError – If dirpath is not a directory or does not exist.
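
A validator of this kind can be sketched with the standard library alone. In the sketch below, `ValueError` is a stand-in for the validation error named above, and the function body is an assumption rather than the library's code.

```python
import os
import tempfile


def is_dir(dirpath: str) -> None:
    """Raise if dirpath is not an existing directory; return None otherwise."""
    if not os.path.isdir(dirpath):
        # Stand-in exception type for this sketch.
        raise ValueError(f'{dirpath} is not a directory or does not exist.')


with tempfile.TemporaryDirectory() as tmp:
    accepted = is_dir(tmp)  # an existing directory passes silently
    try:
        is_dir(os.path.join(tmp, 'missing'))
        rejected = False
    except ValueError:
        rejected = True
```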

conform_etl

class rolling_pin.conform_etl.ConformETL(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = [])[source]

Bases: object

ConformETL creates a DataFrame from a given directory of source files. Then it generates target paths given a set of rules. Finally, the conform method is called and the source files are copied to their target filepaths.

__init__(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) None[source]

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters
  • source_rules (Rules) – A list of rules for parsing directories. Default: [].

  • rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].

  • group_rules (Rules) – A list of rules for grouping files. Default: [].

  • line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Raises

DataError – If configuration is invalid.
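
To make the rule lists more concrete, here is a rough sketch of how rename and group rules could be applied to a source filepath. The rule keys (`regex`, `replace`, `name`) come from the rule classes documented under conform_config. The sample paths, the helper functions, the sequential `re.sub` application and the matching of group rules against the target path are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical rules for illustration only.
rename_rules = [
    {'regex': '/old_project/', 'replace': '/new_project/'},
    {'regex': r'\.txt$', 'replace': '.md'},
]
group_rules = [
    {'name': 'docs', 'regex': r'\.md$'},
]


def rename(source: str, rules: List[Dict[str, str]]) -> str:
    """Apply each rename rule in order to derive a target filepath."""
    target = source
    for rule in rules:
        target = re.sub(rule['regex'], rule['replace'], target)
    return target


def assign_groups(filepath: str, rules: List[Dict[str, str]]) -> List[str]:
    """Return the names of all group rules whose regex matches the filepath."""
    return [rule['name'] for rule in rules if re.search(rule['regex'], filepath)]


target = rename('/tmp/old_project/readme.txt', rename_rules)
groups = assign_groups(target, group_rules)
```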

__module__ = 'rolling_pin.conform_etl'
__repr__() str[source]

String representation of conform DataFrame.

Returns

Table optimized for output to shell.

Return type

str

__weakref__

list of weak references to the object (if defined)

static _get_data(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) DataFrame[source]

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters
  • source_rules (Rules) – A list of rules for parsing directories. Default: [].

  • rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].

  • group_rules (Rules) – A list of rules for grouping files. Default: [].

  • line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Returns

Conform DataFrame.

Return type

DataFrame

conform(groups: Union[str, List[str]] = 'all') None[source]

Copies source files to target filepaths.

Parameters

groups (str or list[str]) – Groups of files which are to be conformed. ‘all’ means all groups. Default: ‘all’.

classmethod from_yaml(filepath: Union[str, Path]) ConformETL[source]

Construct ConformETL instance from given yaml file.

Parameters

filepath (str or Path) – YAML file.

Raises

EnforceError – If file does not end in yml or yaml.

Returns

ConformETL instance.

Return type

ConformETL

property groups

List of groups found within self._data.

Type

list[str]

to_blob() BlobETL[source]

Converts self into a BlobETL object with target column as keys and source columns as values.

Returns

BlobETL of target and source filepaths.

Return type

BlobETL

to_dataframe() DataFrame[source]
Returns

Copy of internal data.

Return type

DataFrame

to_html(orient: str = 'lr', color_scheme: Dict[str, str] = {'background': '#242424', 'edge': '#DE958E', 'edge_library': '#B6ECF3', 'edge_module': '#DE958E', 'edge_subpackage': '#A0D17B', 'edge_value': '#B6ECF3', 'node': '#343434', 'node_font': '#DE958E', 'node_library_font': '#B6ECF3', 'node_module_font': '#DE958E', 'node_subpackage_font': '#A0D17B', 'node_value': '#343434', 'node_value_font': '#B6ECF3'}, as_png: bool = False) Union[Image, HTML][source]

For use in inline rendering of graph data in Jupyter Lab. Graph from target to source filepath. Target is in red, source is in cyan.

Parameters
  • orient (str, optional) –

    Graph layout orientation. Default: lr. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.conform_etl.CONFORM_COLOR_SCHEME

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

radon_etl

class rolling_pin.radon_etl.RadonETL(fullpath: Union[str, Path])[source]

Bases: object

Conforms all four radon reports (raw metrics, Halstead, maintainability and cyclomatic complexity) into a single DataFrame that can then be plotted.

__init__(fullpath: Union[str, Path]) None[source]

Constructs a RadonETL instance.

Parameters

fullpath (str or Path) – Python file or directory of python files.

__module__ = 'rolling_pin.radon_etl'
__weakref__

list of weak references to the object (if defined)

static _get_cyclomatic_complexity_dataframe(report: Dict) DataFrame[source]

Converts radon cyclomatic complexity report into a pandas DataFrame.

Parameters

report (dict) – Radon report blob.

Returns

Cyclomatic complexity DataFrame.

Return type

DataFrame

static _get_halstead_dataframe(report: Dict) DataFrame[source]

Converts radon Halstead report into a pandas DataFrame.

Parameters

report (dict) – Radon report blob.

Returns

Halstead DataFrame.

Return type

DataFrame

static _get_maintainability_index_dataframe(report: Dict) DataFrame[source]

Converts radon maintainability index report into a pandas DataFrame.

Parameters

report (dict) – Radon report blob.

Returns

Maintainability DataFrame.

Return type

DataFrame

_get_radon_data() DataFrame[source]

Constructs a DataFrame representing all the radon reports generated for a given python file or directory containing python files.

Returns

Radon report DataFrame.

Return type

DataFrame

static _get_radon_report(fullpath: Union[str, Path]) Dict[str, Any][source]

Gets all four reports from radon and aggregates them into a single blob object.

Parameters

fullpath (str or Path) – Python file or directory of python files.

Returns

Radon report blob.

Return type

dict

static _get_raw_metrics_dataframe(report: Dict) DataFrame[source]

Converts radon raw metrics report into a pandas DataFrame.

Parameters

report (dict) – Radon report blob.

Returns

Raw metrics DataFrame.

Return type

DataFrame

property cyclomatic_complexity_metrics

DataFrame of radon cyclomatic complexity metrics.

Type

DataFrame

property data

DataFrame of all radon metrics.

Type

DataFrame

property halstead_metrics

DataFrame of radon Halstead metrics.

Type

DataFrame

property maintainability_index

DataFrame of radon maintainability index metrics.

Type

DataFrame

property raw_metrics

DataFrame of radon raw metrics.

Type

DataFrame

property report

Dictionary of all radon metrics.

Type

dict

write_plots(fullpath: Union[str, Path]) RadonETL[source]

Writes metrics plots to given file.

Parameters

fullpath (Path or str) – Target file.

Returns

self.

Return type

RadonETL

write_tables(target_dir: Union[str, Path]) RadonETL[source]

Writes metrics tables as HTML files to given directory.

Parameters

target_dir (Path or str) – Target directory.

Returns

self.

Return type

RadonETL

repo_etl

class rolling_pin.repo_etl.RepoETL(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$')[source]

Bases: object

RepoETL is a class for extracting 1st order dependencies of modules within a given repository. This information is stored internally as a DataFrame and can be rendered as networkx, pydot or SVG graphs.

__init__(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$') None[source]

Construct RepoETL instance.

Parameters
  • root (str or Path) – Full path to repository root directory.

  • include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.

  • exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|test_|_test|mock_)\.py$’.

Raises

ValueError – If include or exclude regex does not end in ‘.py$’.
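
The include and exclude expressions can be pictured as a two-stage filter over candidate filepaths. The following is a sketch under assumptions: `filter_paths` is a hypothetical helper, `re.search` is assumed as the matching function, and the suffix check simply mirrors the documented error condition.

```python
import re
from typing import List


def filter_paths(
    paths: List[str],
    include_regex: str = r'.*\.py$',
    exclude_regex: str = r'(__init__|test_|_test|mock_)\.py$',
) -> List[str]:
    """Keep paths that match include_regex and do not match exclude_regex."""
    for regex in (include_regex, exclude_regex):
        if not regex.endswith('.py$'):
            raise ValueError(f"Regex does not end in '.py$': {regex}")
    return [
        path for path in paths
        if re.search(include_regex, path) and not re.search(exclude_regex, path)
    ]


paths = ['pkg/tools.py', 'pkg/__init__.py', 'pkg/tools_test.py', 'README.md']
modules = filter_paths(paths)
```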

__module__ = 'rolling_pin.repo_etl'
__weakref__

list of weak references to the object (if defined)

static _anneal_coordinate(data: DataFrame, anneal_axis: str = 'x', pin_axis: str = 'y', iterations: int = 10) DataFrame[source]

Iteratively align nodes in the anneal axis according to the mean position of their connected nodes. Node anneal coordinates are rectified at the end of each iteration according to a pin axis, so that they do not overlap. This means that they are sorted at each level of the pin axis.

Parameters
  • data (DataFrame) – DataFrame with x column.

  • anneal_axis (str, optional) – Coordinate column to be annealed. Default: ‘x’.

  • pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.

  • iterations (int, optional) – Number of times to update x coordinates. Default: 10.

Returns

DataFrame with annealed anneal axis coordinates.

Return type

DataFrame
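
The annealing loop described above can be sketched in plain Python. This is a simplified illustration, not the library's DataFrame-based code: the `anneal_x` function, the node and edge structures, and the choice to rectify by assigning integer slots in sorted order are all assumptions.

```python
from statistics import mean
from typing import Dict, List


def anneal_x(
    nodes: Dict[str, Dict[str, float]],
    edges: Dict[str, List[str]],
    iterations: int = 10,
) -> Dict[str, Dict[str, float]]:
    """Pull each node toward the mean x of its connected nodes, per y level."""
    nodes = {name: dict(node) for name, node in nodes.items()}  # do not mutate input
    for _ in range(iterations):
        # Step 1: propose a new x from the mean x of connected nodes.
        proposed = {}
        for name, node in nodes.items():
            linked = [nodes[n]['x'] for n in edges.get(name, []) if n in nodes]
            proposed[name] = mean(linked) if linked else node['x']

        # Step 2: rectify. Within each y level, sort by proposed x and
        # assign distinct integer slots so nodes cannot overlap.
        levels: Dict[float, List[str]] = {}
        for name, node in nodes.items():
            levels.setdefault(node['y'], []).append(name)
        for names in levels.values():
            names.sort(key=lambda n: (proposed[n], n))
            for slot, name in enumerate(names):
                nodes[name]['x'] = slot
    return nodes


# Two levels whose edges initially cross: c connects to b, d connects to a.
nodes = {
    'a': {'x': 0, 'y': 0}, 'b': {'x': 1, 'y': 0},
    'c': {'x': 0, 'y': 1}, 'd': {'x': 1, 'y': 1},
}
edges = {'c': ['b'], 'd': ['a']}
result = anneal_x(nodes, edges)
```

In this small case the lower level is reordered so that `c` sits under `b` and `d` sits under `a`, which removes the edge crossing.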

static _calculate_coordinates(data: DataFrame) DataFrame[source]

Calculate initial x, y coordinates for each node in given DataFrame. Nodes are stratified by type along the y axis.

Parameters

data (DataFrame) – DataFrame of nodes.

Returns

DataFrame with x and y coordinate columns.

Return type

DataFrame

static _center_coordinate(data, center_axis='x', pin_axis='y')[source]

Centers center_axis coordinates at each level of the pin axis.

Parameters
  • data (DataFrame) – DataFrame with x column.

  • center_axis (str, optional) – Coordinate column to be centered. Default: ‘x’.

  • pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.

Returns

DataFrame with centered center axis coordinates.

Return type

DataFrame

static _get_data(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|_test)\\.py$') DataFrame[source]

Recursively aggregates and filters all the files found with a given directory into a DataFrame. Data is used to create directed graphs.

DataFrame has these columns:

  • node_name - name of node

  • node_type - type of node, can be [module, subpackage, library]

  • x - node’s x coordinate

  • y - node’s y coordinate

  • dependencies - parent nodes

  • subpackages - parent nodes of type subpackage

  • fullpath - fullpath to the module a node represents

Parameters
  • root (str or Path) – Root directory to be searched.

  • include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.

  • exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|_test)\.py$’.

Raises
  • ValueError – If include or exclude regex does not end in ‘.py$’.

  • FileNotFoundError – If no files are found after filtering.

Returns

DataFrame of file information.

Return type

DataFrame

static _get_imports(fullpath: Union[str, Path]) List[str][source]

Gets import statements from a given python module.

Parameters

fullpath (str or Path) – Path to python module.

Returns

List of imported modules.

Return type

list[str]
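
One way to collect imported module names is to parse the module with the standard `ast` module. The sketch below is an assumption about approach, not the library's code: the `get_imports` function takes source text rather than a filepath, and it skips purely relative imports such as `from . import x`.

```python
import ast
from typing import List


def get_imports(source: str) -> List[str]:
    """Return names of modules imported anywhere in the given source text."""
    imports: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b" yields one alias per module.
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a.b import c" contributes the module "a.b".
            imports.append(node.module)
    return imports


source = (
    'import os\n'
    'import re, sys\n'
    'from collections import Counter\n'
    'from . import sibling\n'
)
found = get_imports(source)
```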

static _to_networkx_graph(data)[source]

Converts given DataFrame into networkx directed graph.

Parameters

data (DataFrame) – DataFrame of nodes.

Returns

Graph of nodes.

Return type

networkx.DiGraph

to_dataframe() DataFrame[source]
Returns

DataFrame of nodes representing repo modules.

Return type

DataFrame

to_dot_graph(orient='tb', orthogonal_edges=False, color_scheme=None)[source]

Converts internal data into pydot graph.

Parameters
  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If orient is invalid.

Returns

Dot graph of nodes.

Return type

pydot.Dot

to_html(layout: str = 'dot', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) HTML[source]

For use in inline rendering of graph data in Jupyter Lab.

Parameters
  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

to_networkx_graph()[source]

Converts internal data into networkx directed graph.

Returns

Graph of nodes.

Return type

networkx.DiGraph

write(fullpath: Union[str, Path], layout: str = 'dot', orient: str = 'tb', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None) RepoETL[source]

Writes internal data to a given filepath. Formats supported: svg, dot, png, json.

Parameters
  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • orient (str, optional) –

    Graph layout orientation. Default: tb. Options include:

    • tb - top to bottom

    • bt - bottom to top

    • lr - left to right

    • rl - right to left

  • orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.

  • color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If invalid file extension given.

Returns

Self.

Return type

RepoETL

toml_etl

class rolling_pin.toml_etl.TomlETL(data: dict[str, Any])[source]

Bases: object

__init__(data: dict[str, Any]) None[source]

Creates a TomlETL instance from a given dictionary.

Parameters

data (dict) – Dictionary.

__module__ = 'rolling_pin.toml_etl'
__weakref__

list of weak references to the object (if defined)

delete(regex: str) TomlETL[source]

Returns portion of data whose keys do not match a given regular expression.

Parameters

regex (str) – Regular expression applied to keys.

Returns

New TomlETL instance.

Return type

TomlETL

edit(patch: str) TomlETL[source]

Apply edit to internal data given TOML patch. Patch is always of the form ‘[key]=[value]’ and in TOML format.

Parameters

patch (str) – TOML patch to be applied.

Raises
  • TOMLDecoderError – If patch cannot be decoded.

  • EnforceError – If ‘=’ not found in patch.

Returns

New TomlETL instance with edits.

Return type

TomlETL

classmethod from_string(text: Type[T]) T[source]

Creates a TomlETL instance from a given TOML string.

Parameters

text (str) – TOML string.

Returns

TomlETL instance.

Return type

TomlETL

classmethod from_toml(filepath: Type[T]) T[source]

Creates a TomlETL instance from a given TOML file.

Parameters

filepath (str or Path) – TOML file.

Returns

TomlETL instance.

Return type

TomlETL

search(regex: str) TomlETL[source]

Returns portion of data whose keys match a given regular expression.

Parameters

regex (str) – Regular expression applied to keys.

Returns

New TomlETL instance.

Return type

TomlETL

to_dict() dict[source]

Converts instance to dictionary copy.

Returns

Dictionary copy of instance.

Return type

dict

to_string() str[source]

Converts instance to a TOML formatted string.

Returns

TOML string.

Return type

str

write(filepath: Union[str, Path]) None[source]

Writes instance to given TOML file.

Parameters

filepath (str or Path) – Target filepath.

tools

rolling_pin.tools.LOGGER = <Logger rolling_pin.tools (WARNING)>

Contains basic functions for more complex ETL functions and classes.

rolling_pin.tools.copy_file(source: Union[str, Path], target: Union[str, Path]) None[source]

Copies a source file to a target file, creating directories as needed.

Parameters
  • source (str or Path) – Source filepath.

  • target (str or Path) – Target filepath.

Raises

AssertionError – If source is not a file.
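
The documented contract (assert the source is a file, create missing parent directories, then copy) might look roughly like the sketch below. The helper name copy_with_parents and the use of shutil.copy2 are assumptions made for illustration; the library may copy differently.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union


def copy_with_parents(source: Union[str, Path], target: Union[str, Path]) -> None:
    # Hypothetical helper mirroring the documented contract of copy_file.
    source, target = Path(source), Path(target)
    assert source.is_file(), f'{source} is not a file.'
    # Create any missing parent directories of the target.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'src.txt')
    src.write_text('hello')
    dst = Path(tmp, 'new', 'nested', 'dst.txt')
    copy_with_parents(src, dst)
    copied = dst.read_text()
    source_kept = src.is_file()
```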

rolling_pin.tools.directory_to_dataframe(directory: Union[str, Path], include_regex: str = '', exclude_regex: str = '\\.DS_Store') DataFrame[source]

Recursively lists files within a given directory as rows in a pd.DataFrame.

Parameters
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘\.DS_Store’.

Returns

pd.DataFrame with one file per row.

Return type

pd.DataFrame

rolling_pin.tools.dot_to_html(dot: Dot, layout: str = 'dot', as_png: bool = False) Union[HTML, Image][source]

Converts a given pydot graph into an IPython.display.HTML object. Used for inline display of graph data in JupyterLab.

Parameters
  • dot (pydot.Dot) – Pydot Graph instance.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

  • as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Raises

ValueError – If invalid layout given.

Returns

HTML instance.

Return type

IPython.display.HTML

rolling_pin.tools.filter_text(text: str, include_regex: Optional[str] = None, exclude_regex: Optional[str] = None, replace_regex: Optional[str] = None, replace_value: Optional[str] = None) str[source]

Filter given text by applying regular expressions to each line.

Parameters
  • text (str) – Newline separated lines.

  • include_regex (str, optional) – Keep lines that match given regex. Default: None.

  • exclude_regex (str, optional) – Remove lines that match given regex. Default: None.

  • replace_regex (str, optional) – Substitutes regex matches in lines with replace_value. Default: None.

  • replace_value (str, optional) – Regex substitution value. Default: ‘’.

Raises

AssertionError – If source is not a file.

Returns

Filtered text.

Return type

str
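
A line-by-line filter of this kind can be sketched as follows. The helper name filter_lines is hypothetical, and the order of operations (include, then exclude, then substitute) is an assumption rather than something the documentation above states.

```python
import re
from typing import Optional


def filter_lines(
    text: str,
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    replace_regex: Optional[str] = None,
    replace_value: str = '',
) -> str:
    # Hypothetical stand-in for filter_text; operation order is assumed.
    lines = text.split('\n')
    if include_regex is not None:
        lines = [x for x in lines if re.search(include_regex, x)]
    if exclude_regex is not None:
        lines = [x for x in lines if not re.search(exclude_regex, x)]
    if replace_regex is not None:
        lines = [re.sub(replace_regex, replace_value, x) for x in lines]
    return '\n'.join(lines)


log = 'INFO start\nDEBUG noise\nINFO stop\nERROR boom'
result = filter_lines(
    log,
    include_regex='^INFO|^ERROR',
    exclude_regex='stop',
    replace_regex='^INFO ',
)
```

Here the DEBUG line fails the include test, the "stop" line is excluded, and the remaining INFO prefix is stripped by the substitution.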

rolling_pin.tools.flatten(item: Iterable, separator: str = '/', embed_types: bool = True) Dict[str, Any][source]

Flattens an iterable object into a flat dictionary.

Parameters
  • item (object) – Iterable object.

  • separator (str, optional) – Field separator in keys. Default: ‘/’.

Returns

Dictionary representation of given object.

Return type

dict
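
The core idea of flattening can be sketched for nested dictionaries alone. This is a simplified illustration, not the library's flatten: the helper name flatten_dict is hypothetical, lists are treated as leaf values, and the embed_types behaviour is not modelled.

```python
from typing import Any, Dict


def flatten_dict(item: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    # Hypothetical helper: handles nested dicts only; other values are leaves.
    output: Dict[str, Any] = {}

    def recurse(node: Dict[str, Any], prefix: str) -> None:
        for key, val in node.items():
            path = f'{prefix}{separator}{key}' if prefix else str(key)
            if isinstance(val, dict) and val:
                # Non-empty dicts are descended into; their keys extend the path.
                recurse(val, path)
            else:
                output[path] = val

    recurse(item, '')
    return output


flat = flatten_dict({'a': {'b': 1, 'c': {'d': 2}}, 'e': 3})
```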

rolling_pin.tools.get_parent_fields(key: str, separator: str = '/') List[str][source]

Get all the parent fields of a given key, split by given separator.

Parameters
  • key (str) – Key.

  • separator (str, optional) – String that splits key into fields. Default: ‘/’.

Returns

List of absolute parent fields.

Return type

list(str)
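
One plausible reading of "absolute parent fields" is every proper prefix of the key, each rebuilt with the separator. The sketch below follows that reading; the helper name parent_fields, the shortest-first ordering and the exclusion of the key itself are all assumptions.

```python
from typing import List


def parent_fields(key: str, separator: str = '/') -> List[str]:
    # Hypothetical helper: every proper prefix of the key, shortest first.
    fields = key.split(separator)
    return [separator.join(fields[:i]) for i in range(1, len(fields))]


parents = parent_fields('a/b/c')
```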

rolling_pin.tools.is_dictlike(item: Any) bool[source]

Determines if given item is dict-like.

Parameters

item (object) – Object to be tested.

Returns

Whether given item is dict-like.

Return type

bool

rolling_pin.tools.is_iterable(item: Any) bool[source]

Determines if given item is iterable.

Parameters

item (object) – Object to be tested.

Returns

Whether given item is iterable.

Return type

bool

rolling_pin.tools.is_listlike(item: Any) bool[source]

Determines if given item is list-like.

Parameters

item (object) – Object to be tested.

Returns

Whether given item is list-like.

Return type

bool

rolling_pin.tools.list_all_files(directory: Union[str, Path], include_regex: Optional[str] = None, exclude_regex: Optional[str] = None) Generator[Path, None, None][source]

Recursively list all files within a given directory.

Parameters
  • directory (str or Path) – Directory to walk.

  • include_regex (str, optional) – Include filenames that match this regex. Default: None.

  • exclude_regex (str, optional) – Exclude filenames that match this regex. Default: None.

Raises

FileNotFoundError – If argument is not a directory or does not exist.

Yields

Path – File.
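
A generator with this shape can be sketched with os.walk. The helper name walk_files is hypothetical, and applying the regexes to the bare filename (rather than the full path) is an assumption. Because it is a generator, the directory check in this sketch only runs once iteration starts.

```python
import os
import re
import tempfile
from pathlib import Path
from typing import Generator, Optional, Union


def walk_files(
    directory: Union[str, Path],
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
) -> Generator[Path, None, None]:
    # Hypothetical stand-in for list_all_files; regexes test the filename only.
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError(f'{directory} is not a directory or does not exist.')
    for root, _, files in os.walk(directory):
        for name in files:
            if include_regex is not None and not re.search(include_regex, name):
                continue
            if exclude_regex is not None and re.search(exclude_regex, name):
                continue
            yield Path(root, name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'sub').mkdir()
    Path(tmp, 'a.txt').write_text('a')
    Path(tmp, 'sub', 'b.py').write_text('b')
    Path(tmp, '.DS_Store').write_text('')
    found = sorted(p.name for p in walk_files(tmp, exclude_regex=r'\.DS_Store'))
    python_only = [p.name for p in walk_files(tmp, include_regex=r'\.py$')]
```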

rolling_pin.tools.move_file(source: Union[str, Path], target: Union[str, Path]) None[source]

Moves a source file to a target file, creating directories as needed.

Parameters
  • source (str or Path) – Source filepath.

  • target (str or Path) – Target filepath.

Raises

AssertionError – If source is not a file.

rolling_pin.tools.nest(flat_dict: Dict[str, Any], separator: str = '/') Dict[str, Any][source]

Converts a flat dictionary into a nested dictionary by splitting keys by a given separator.

Parameters
  • flat_dict (dict) – Flat dictionary.

  • separator (str, optional) – Field separator within given dictionary’s keys. Default: ‘/’.

Returns

Nested dictionary.

Return type

dict
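
Nesting is the inverse of flattening: each key is split on the separator and intermediate dictionaries are created along the way. The sketch below illustrates this with a hypothetical helper named nest_dict; it assumes no key is both a leaf and a parent (for example ‘a’ and ‘a/b’ together), a case this sketch does not handle.

```python
from typing import Any, Dict


def nest_dict(flat_dict: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    # Hypothetical helper: split each key and walk or create intermediate dicts.
    output: Dict[str, Any] = {}
    for key, val in flat_dict.items():
        fields = key.split(separator)
        cursor = output
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        # The last field holds the value itself.
        cursor[fields[-1]] = val
    return output


nested = nest_dict({'a/b': 1, 'a/c/d': 2, 'e': 3})
```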

rolling_pin.tools.read_text(filepath: Union[str, Path]) str[source]

Convenience function for reading text from given file.

Parameters

filepath (str or Path) – File to be read.

Raises

AssertionError – If source is not a file.

Returns

Text read from the file.

Return type

str

rolling_pin.tools.replace_and_format(regex: str, replace: str, string: str, flags: Any = 0) str[source]

Performs a regex substitution on a given string and formats any named group found in the result with groupdict data from the pattern. Groups beginning with ‘i’ are converted to integers. Groups beginning with ‘f’ are converted to floats.


Named group anatomy:

  • (?P<NAME>PATTERN)

  • NAME becomes a key and whatever matches PATTERN becomes its value.

>>> re.search(r'(?P<i>\d+)', 'foobar123').groupdict()
{'i': '123'}

Examples:

Special groups:
  • (?P<i>\d) - string matched by ‘\d’ will be converted to an integer

  • (?P<f>\d) - string matched by ‘\d’ will be converted to a float

  • (?P<i_foo>\d) - string matched by ‘\d’ will be converted to an integer

  • (?P<f_bar>\d) - string matched by ‘\d’ will be converted to a float

Named groups (long):

>>> proj = r'(?P<p>[a-z0-9]+)'
>>> spec = r'(?P<s>[a-z0-9]+)'
>>> desc = r'(?P<d>[a-z0-9\-]+)'
>>> ver = r'(?P<iv>\d+)'
>>> frame = r'(?P<i_f>\d+)'
>>> regex = rf'{proj}\.{spec}\.{desc}\.v{ver}\.{frame}.*'
>>> replace = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
>>> string = 'proj.spec.desc.v1.25.png'
>>> replace_and_format(regex, replace, string, flags=re.IGNORECASE)
'p-proj_s-spec_d-desc_v001_f0025.jpeg'

Named groups (short):

>>> replace_and_format(
...     r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
...     'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
...     'proj.spec.desc.v1.25.png',
... )
'p-proj_s-spec_d-desc_v001_f0025.jpeg'

No groups:

>>> replace_and_format('foo', 'bar', 'foobar')
'barbar'

Parameters
  • regex (str) – Regex pattern to search string with.

  • replace (str) – Replacement string which may contain format variables, i.e. ‘{variable}’.

  • string (str) – String to be converted.

  • flags (object, optional) – re.sub flags. Default: 0.

Returns

Converted string.

Return type

str
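
The substitute-then-format technique described above can be re-created in a few lines. This is a sketch under stated assumptions, not the library's source: the helper name sub_and_format is hypothetical, group values are taken from the first match only, and a string with no match is returned unchanged.

```python
import re
from typing import Any, Dict


def sub_and_format(regex: str, replace: str, string: str, flags: Any = 0) -> str:
    # Hypothetical re-creation of the documented behaviour of replace_and_format.
    match = re.search(regex, string, flags=flags)
    if match is None:
        return string
    groups: Dict[str, Any] = {}
    for name, val in match.groupdict().items():
        # Group names starting with 'i' become ints, with 'f' become floats.
        if val is not None and name.startswith('i'):
            val = int(val)
        elif val is not None and name.startswith('f'):
            val = float(val)
        groups[name] = val
    # Substitute first, then fill the '{name}' fields from the converted groups.
    result = re.sub(regex, replace, string, flags=flags)
    return result.format(**groups)


renamed = sub_and_format(
    r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
    'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
    'proj.spec.desc.v1.25.png',
)
rounded = sub_and_format(r'(?P<f_x>\d+\.\d+)', '{f_x:.1f}', '3.14159')
plain = sub_and_format('foo', 'bar', 'foobar')
```

Converting the ‘i’ and ‘f’ groups before formatting is what allows numeric format specs such as ‘{iv:03d}’ to work; applied to the raw matched strings they would raise a ValueError.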

rolling_pin.tools.unembed(item: Any) Any[source]

Converts embedded types in dictionary keys into Python types.

Parameters

item (object) – Dictionary with embedded types.

Returns

Converted object.

Return type

object

rolling_pin.tools.write_dot_graph(dot: Dot, fullpath: Union[str, Path], layout: str = 'dot') None[source]

Writes a pydot.Dot object to a given filepath. Formats supported: svg, dot, png.

Parameters
  • dot (pydot.Dot) – Pydot Dot instance.

  • fullpath (str or Path) – File to be written to.

  • layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

Raises

ValueError – If invalid file extension given.

rolling_pin.tools.write_text(text: str, filepath: Union[str, Path]) None[source]

Convenience function for writing text to given file. Creates directories as needed.

Parameters
  • text (str) – Text to be written.

  • filepath (str or Path) – File to be written.