blob_etl¶

class rolling_pin.blob_etl.BlobETL(blob: Any, separator: str = '/')[source]¶

Bases: object

Converts blob data internally into a flat dictionary that is universally searchable, editable and convertible back to the data’s original structure, new blob structures or directed graphs.

__init__(blob: Any, separator: str = '/') → None[source]¶

Constructs a BlobETL instance.

Parameters

blob (object) – Iterable object.
separator (str, optional) – String to be used as a field separator in each key. Default: ‘/’.
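
As a rough illustration of the flat-dictionary idea, the sketch below flattens a nested blob into separator-joined keys. It is not the library's implementation: the helper name flatten and the `<list_N>` encoding of list indices are assumptions, the latter modeled on the `<list_[0-9]+>` pattern shown in the to_prototype example.

```python
from typing import Any, Dict


def flatten(blob: Any, separator: str = '/') -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict."""
    output: Dict[str, Any] = {}

    def recurse(item: Any, prefix: str) -> None:
        if isinstance(item, dict) and item:
            for key, val in item.items():
                recurse(val, f'{prefix}{separator}{key}' if prefix else str(key))
        elif isinstance(item, list) and item:
            for i, val in enumerate(item):
                # Hypothetical list-index encoding (an assumption).
                name = f'<list_{i}>'
                recurse(val, f'{prefix}{separator}{name}' if prefix else name)
        else:
            # Leaf value (or empty container): store under the joined key.
            output[prefix] = item

    recurse(blob, '')
    return output


flat = flatten({'a': {'b': 1, 'c': [10, 20]}})
print(flat)
```

Every leaf ends up under one string key, which is what makes key-based searching and editing straightforward.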

delete(predicate: Callable[[Any], bool], by: str = 'key') → BlobETL[source]¶

Delete data items by key, value or key + value, according to a given predicate.

Parameters

predicate – Function that returns a boolean value.
by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.

Raises

ValueError – If by keyword is not key, value, or key+value.

Returns

New BlobETL instance.

Return type

BlobETL

filter(predicate: Callable[[Any], bool], by: str = 'key', invert: bool = False) → BlobETL[source]¶

Filter data items by key, value or key + value, according to a given predicate.

Parameters

predicate – Function that returns a boolean value.
by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.
invert (bool, optional) – Whether to invert the predicate. Default: False.

Raises

ValueError – If by keyword is not key, value, or key+value.

Returns

New BlobETL instance.

Return type

BlobETL
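
The filtering behaviour described above can be sketched on a plain flat dictionary. The function name filter_items is hypothetical, and passing a (key, value) tuple to the predicate for key+value is an assumption, since the documentation does not say how the pair is handed over.

```python
from typing import Any, Callable, Dict


def filter_items(
    data: Dict[str, Any],
    predicate: Callable[[Any], bool],
    by: str = 'key',
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items of a flat dict for which the predicate holds."""
    if by not in ('key', 'value', 'key+value'):
        raise ValueError(f'Invalid by argument: {by}.')

    output = {}
    for key, val in data.items():
        if by == 'key':
            arg: Any = key
        elif by == 'value':
            arg = val
        else:
            arg = (key, val)  # assumption: the pair is passed as a tuple
        keep = predicate(arg)
        if invert:
            keep = not keep
        if keep:
            output[key] = val
    return output


data = {'a/b': 1, 'a/c': 2, 'd': 3}
print(filter_items(data, lambda k: k.startswith('a')))
print(filter_items(data, lambda v: v > 1, by='value'))
```

With invert=True the same predicate selects the complementary set of items.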

query(regex: str, ignore_case: bool = True, invert: bool = False) → BlobETL[source]¶

Filter data items by key according to given regular expression.

Parameters

regex (str) – Regular expression.
ignore_case (bool, optional) – Whether to ignore case in the regular expression search. Default: True.
invert (bool, optional) – Whether to invert the predicate. Default: False.

Returns

New BlobETL instance.

Return type

BlobETL
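
A minimal sketch of a regex query over flat keys follows. It assumes re.search semantics (a match anywhere in the key); the function name query_keys is hypothetical and the real method may match differently.

```python
import re
from typing import Any, Dict


def query_keys(
    data: Dict[str, Any],
    regex: str,
    ignore_case: bool = True,
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items whose key matches (or, if inverted, does not match) regex."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    output = {}
    for key, val in data.items():
        matched = pattern.search(key) is not None
        # Keep the item when the match result differs from the invert flag.
        if matched != invert:
            output[key] = val
    return output


data = {'Users/name': 'tom', 'users/age': 30, 'groups/id': 7}
print(query_keys(data, '^users'))
print(query_keys(data, '^users', ignore_case=False))
```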

set(predicate: Optional[Callable[[Any, Any], bool]] = None, key_setter: Optional[Callable[[Any, Any], str]] = None, value_setter: Optional[Callable[[Any, Any], Any]] = None) → BlobETL[source]¶

Filter data items by key, value or key + value, according to a given predicate. Then set each matching item’s key by a given function and its value by a given function.

Parameters

predicate (function, optional) – Function of the form: lambda k, v: bool. Default: None –> lambda k, v: True.
key_setter (function, optional) – Function of the form: lambda k, v: str. Default: None –> lambda k, v: k.
value_setter (function, optional) – Function of the form: lambda k, v: object. Default: None –> lambda k, v: v.

Returns

New BlobETL instance.

Return type

BlobETL
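
The predicate and setter defaults listed above can be sketched as follows. The name set_items is hypothetical, and keeping non-matching items unchanged is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, Dict, Optional


def set_items(
    data: Dict[str, Any],
    predicate: Optional[Callable[[Any, Any], bool]] = None,
    key_setter: Optional[Callable[[Any, Any], str]] = None,
    value_setter: Optional[Callable[[Any, Any], Any]] = None,
) -> Dict[str, Any]:
    """Rewrite keys and values of items selected by the predicate."""
    # Defaults mirror the documented ones: match all, keep key, keep value.
    predicate = predicate or (lambda k, v: True)
    key_setter = key_setter or (lambda k, v: k)
    value_setter = value_setter or (lambda k, v: v)

    output = {}
    for key, val in data.items():
        if predicate(key, val):
            output[key_setter(key, val)] = value_setter(key, val)
        else:
            output[key] = val  # assumption: unmatched items pass through
    return output


data = {'a/b': 1, 'a/c': 2}
print(set_items(data, predicate=lambda k, v: v > 1, value_setter=lambda k, v: v * 10))
```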

set_field(index: int, field_setter: Callable[[str], str]) → BlobETL[source]¶

Sets a field at a given index according to a given function.

Parameters

index (int) – Field index.
field_setter (function) – Function of form lambda str: str.

Returns

New BlobETL instance.

Return type

BlobETL
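
One way to picture field-level editing is to split each key on the separator, rewrite the field at the given index, and rejoin. This is a sketch under that assumption; the function name set_key_field is hypothetical, and skipping keys that are too short is a choice made here, not documented behaviour.

```python
from typing import Any, Callable, Dict


def set_key_field(
    data: Dict[str, Any],
    index: int,
    field_setter: Callable[[str], str],
    separator: str = '/',
) -> Dict[str, Any]:
    """Apply field_setter to the field at index in every key."""
    output = {}
    for key, val in data.items():
        fields = key.split(separator)
        # Only edit keys that actually have a field at this index.
        if -len(fields) <= index < len(fields):
            fields[index] = field_setter(fields[index])
        output[separator.join(fields)] = val
    return output


data = {'a/b': 1, 'a/c': 2}
print(set_key_field(data, 0, str.upper))
```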

to_dataframe(group_by: Optional[int] = None) → DataFrame[source]¶

Convert data to pandas DataFrame.

Parameters: group_by (int, optional) – Field index to group rows of data by. Default: None.
Returns: DataFrame.
Return type: DataFrame

to_dict() → Dict[str, Any][source]¶

Returns: Nested representation of internal data.
Return type: dict
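
The reverse direction, from flat keys back to a nested dictionary, might look like the sketch below. The helper name nest is hypothetical, and the sketch only rebuilds dictionaries; reconstructing lists from encoded indices is left out.

```python
from typing import Any, Dict


def nest(flat: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    """Rebuild a nested dict from separator-joined keys."""
    output: Dict[str, Any] = {}
    for key, val in flat.items():
        fields = key.split(separator)
        cursor = output
        # Walk or create the intermediate dictionaries.
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        cursor[fields[-1]] = val
    return output


print(nest({'a/b': 1, 'a/c/d': 2}))
```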

to_dot_graph(orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) → Dot[source]¶

Converts internal dictionary into pydot graph. Key and value nodes and edges are colored differently.

Parameters

orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If orient is invalid.

Returns

Dot graph representation of dictionary.

Return type

pydot.Dot

to_flat_dict() → Dict[str, Any][source]¶

Returns: Flat dictionary with embedded types.
Return type: dict

to_html(layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) → Union[Image, HTML][source]¶

For use in inline rendering of graph data in Jupyter Lab.

Parameters

layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

to_networkx_graph() → DiGraph[source]¶

Converts internal dictionary into a networkx directed graph.

Returns: Graph representation of dictionary.
Return type: networkx.DiGraph

to_prototype() → BlobETL[source]¶

Convert data to prototypical representation.

Example:¶

>>> data = {
    'users': [
        {
            'name': {
                'first': 'tom',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'dick',
                'last': 'smith',
            }
        },{
            'name': {
                'first': 'jane',
                'last': 'doe',
            }
        },
    ]
}
>>> BlobETL(data).to_prototype().to_dict()
{
    '^users': {
        '<list_[0-9]+>': {
            'name': {
                'first$': Counter({'dick': 1, 'jane': 1, 'tom': 1}),
                'last$': Counter({'doe': 1, 'smith': 2})
            }
        }
    }
}

Returns: New BlobETL instance.
Return type: BlobETL
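
The aggregation in the example above can be approximated on a flat dictionary: list indices in keys are generalized to a pattern and the values under each generalized key are counted. This is only a sketch; the function name prototype is hypothetical, the `<list_N>` key encoding is assumed, and the `^` and `$` anchors seen in the real output are omitted.

```python
import re
from collections import Counter
from typing import Any, Dict


def prototype(flat: Dict[str, Any]) -> Dict[str, Counter]:
    """Collapse list indices in keys and count the values per key."""
    output: Dict[str, Counter] = {}
    for key, val in flat.items():
        # Replace concrete indices with a pattern (assumed key encoding).
        proto_key = re.sub(r'<list_\d+>', '<list_[0-9]+>', key)
        output.setdefault(proto_key, Counter())[val] += 1
    return output


flat = {
    'users/<list_0>/name/last': 'smith',
    'users/<list_1>/name/last': 'smith',
    'users/<list_2>/name/last': 'doe',
}
print(prototype(flat))
```

Values must be hashable for Counter to tally them.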

to_records() → List[Dict][source]¶

Returns: Data in records format.
Return type: list[dict]

update(item: Union[Dict, BlobETL]) → BlobETL[source]¶

Updates internal dictionary with given dictionary or BlobETL instance. Given dictionary is first flattened with embedded types.

Parameters: item (dict or BlobETL) – Dictionary to be used for update.
Returns: New BlobETL instance.
Return type: BlobETL

write(fullpath: Union[str, Path], layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) → BlobETL[source]¶

Writes internal dictionary to a given filepath. Formats supported: svg, dot, png, json.

Parameters

fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If invalid file extension given.

Returns

self.

Return type

BlobETL

conform_config¶

class rolling_pin.conform_config.ConformConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶

Bases: Model

A class for validating configurations supplied to ConformETL.

source_rules¶

A list of rules for parsing directories. Default: [].

Type: Rules

rename_rules¶

A list of rules for renaming source filepaths to target filepaths. Default: [].

Type: Rules

group_rules¶

A list of rules for grouping files. Default: [].

Type: Rules

line_rules¶

A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].

Type: Rules

class GroupRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶

Bases: Model

name: StringType = <StringType() instance on GroupRule as 'name'>¶

regex: StringType = <StringType() instance on GroupRule as 'regex'>¶

class LineRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶

Bases: Model

exclude: StringType = <StringType() instance on LineRule as 'exclude'>¶

group: StringType = <StringType() instance on LineRule as 'group'>¶

include: StringType = <StringType() instance on LineRule as 'include'>¶

regex: StringType = <StringType() instance on LineRule as 'regex'>¶

replace: StringType = <StringType() instance on LineRule as 'replace'>¶

class RenameRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶

Bases: Model

regex: StringType = <StringType() instance on RenameRule as 'regex'>¶

replace: StringType = <StringType() instance on RenameRule as 'replace'>¶

class SourceRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶

Bases: Model

exclude: StringType = <StringType() instance on SourceRule as 'exclude'>¶

include: StringType = <StringType() instance on SourceRule as 'include'>¶

path: StringType = <StringType() instance on SourceRule as 'path'>¶

group_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'group_rules'>¶

line_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'line_rules'>¶

rename_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'rename_rules'>¶

source_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'source_rules'>¶
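
To make the rule fields concrete, here is a sketch of a configuration expressed as a plain dictionary, with a small checker for misspelled field names. The field names come from the rule classes above; the paths, regexes, the ALLOWED_FIELDS table and the unknown_fields helper are illustrative assumptions and not part of the library.

```python
from typing import Dict, List

# Field names per rule type, taken from the rule classes documented above.
ALLOWED_FIELDS = {
    'source_rules': {'path', 'include', 'exclude'},
    'rename_rules': {'regex', 'replace'},
    'group_rules': {'name', 'regex'},
    'line_rules': {'group', 'include', 'exclude', 'regex', 'replace'},
}

# Hypothetical configuration; all paths and patterns are made up.
config: Dict[str, List[Dict[str, str]]] = {
    'source_rules': [
        {'path': '/tmp/project', 'include': r'\.py$', 'exclude': '__pycache__'},
    ],
    'rename_rules': [{'regex': '/tmp/project', 'replace': '/tmp/target'}],
    'group_rules': [{'name': 'init', 'regex': r'__init__\.py$'}],
    'line_rules': [{'group': 'init', 'exclude': 'test'}],
}


def unknown_fields(cfg: Dict[str, List[Dict[str, str]]]) -> List[str]:
    """Return 'section.field' names that are not documented fields."""
    bad = []
    for section, rules in cfg.items():
        allowed = ALLOWED_FIELDS.get(section, set())
        for rule in rules:
            bad.extend(f'{section}.{key}' for key in rule if key not in allowed)
    return sorted(bad)


print(unknown_fields(config))
```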

rolling_pin.conform_config.is_dir(dirpath: str) → None[source]¶

Validates whether a given dirpath exists.

Parameters: dirpath (str) – Directory path.
Raises: ValidationError – If dirpath is not a directory or does not exist.

conform_etl¶

class rolling_pin.conform_etl.ConformETL(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = [])[source]¶

Bases: object

ConformETL creates a DataFrame from a given directory of source files. Then it generates target paths given a set of rules. Finally, the conform method is called and the source files are copied to their target filepaths.

__init__(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) → None[source]¶

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters

source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Raises

DataError – If configuration is invalid.
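
How rename rules might turn a source filepath into a target filepath can be sketched with successive regex substitutions. Applying the rules in order with re.sub is an assumption about the mechanism, and apply_rename_rules is a hypothetical helper.

```python
import re
from typing import Dict, List


def apply_rename_rules(source: str, rename_rules: List[Dict[str, str]]) -> str:
    """Derive a target path by applying each regex/replace rule in order."""
    target = source
    for rule in rename_rules:
        target = re.sub(rule['regex'], rule['replace'], target)
    return target


rules = [
    {'regex': '^/src', 'replace': '/dst'},
    {'regex': 'foo', 'replace': 'bar'},
]
print(apply_rename_rules('/src/a/foo.py', rules))
```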

__repr__() → str[source]¶

String representation of conform DataFrame.

Returns: Table optimized for output to shell.
Return type: str

static _get_data(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) → DataFrame[source]¶

Generates DataFrame from given source_rules and then generates target paths for them given other rules.

Parameters

source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].

Returns

Conform DataFrame.

Return type

DataFrame

conform(groups: Union[str, List[str]] = 'all') → None[source]¶

Copies source files to target filepaths.

Parameters: groups (str or list[str]) – Groups of files which are to be conformed. ‘all’ means all groups. Default: ‘all’.

classmethod from_yaml(filepath: Union[str, Path]) → ConformETL[source]¶

Construct ConformETL instance from given yaml file.

Parameters: filepath (str or Path) – YAML file.
Raises: EnforceError – If file does not end in yml or yaml.
Returns: ConformETL instance.
Return type: ConformETL

property groups¶

List of groups found in self._data.

Type: list[str]

to_blob() → BlobETL[source]¶

Converts self into a BlobETL object with target column as keys and source columns as values.

Returns: BlobETL of target and source filepaths.
Return type: BlobETL

to_dataframe() → DataFrame[source]¶

Returns: Copy of internal data.
Return type: DataFrame

to_html(orient: str = 'lr', color_scheme: Dict[str, str] = {'background': '#242424', 'edge': '#DE958E', 'edge_library': '#B6ECF3', 'edge_module': '#DE958E', 'edge_subpackage': '#A0D17B', 'edge_value': '#B6ECF3', 'node': '#343434', 'node_font': '#DE958E', 'node_library_font': '#B6ECF3', 'node_module_font': '#DE958E', 'node_subpackage_font': '#A0D17B', 'node_value': '#343434', 'node_value_font': '#B6ECF3'}, as_png: bool = False) → Union[Image, HTML][source]¶

For use in inline rendering of graph data in Jupyter Lab. Graph from target to source filepath. Target is in red, source is in cyan.

Parameters

orient (str, optional) –
Graph layout orientation. Default: lr. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.conform_etl.CONFORM_COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

radon_etl¶

class rolling_pin.radon_etl.RadonETL(fullpath: Union[str, Path])[source]¶

Bases: object

Conforms all four radon reports (raw metrics, Halstead, maintainability and cyclomatic complexity) into a single DataFrame that can then be plotted.

__init__(fullpath: Union[str, Path]) → None[source]¶

Constructs a RadonETL instance.

Parameters: fullpath (str or Path) – Python file or directory of python files.

static _get_cyclomatic_complexity_dataframe(report: Dict) → DataFrame[source]¶

Converts radon cyclomatic complexity report into a pandas DataFrame.

Parameters: report (dict) – Radon report blob.
Returns: Cyclomatic complexity DataFrame.
Return type: DataFrame

static _get_halstead_dataframe(report: Dict) → DataFrame[source]¶

Converts radon Halstead report into a pandas DataFrame.

Parameters: report (dict) – Radon report blob.
Returns: Halstead DataFrame.
Return type: DataFrame

static _get_maintainability_index_dataframe(report: Dict) → DataFrame[source]¶

Converts radon maintainability index report into a pandas DataFrame.

Parameters: report (dict) – Radon report blob.
Returns: Maintainability DataFrame.
Return type: DataFrame

_get_radon_data() → DataFrame[source]¶

Constructs a DataFrame representing all the radon reports generated for a given python file or directory containing python files.

Returns: Radon report DataFrame.
Return type: DataFrame

static _get_radon_report(fullpath: Union[str, Path]) → Dict[str, Any][source]¶

Gets all four reports from radon and aggregates them into a single blob object.

Parameters: fullpath (str or Path) – Python file or directory of python files.
Returns: Radon report blob.
Return type: dict

static _get_raw_metrics_dataframe(report: Dict) → DataFrame[source]¶

Converts radon raw metrics report into a pandas DataFrame.

Parameters: report (dict) – Radon report blob.
Returns: Raw metrics DataFrame.
Return type: DataFrame

property cyclomatic_complexity_metrics¶

DataFrame of radon cyclomatic complexity metrics.

Type: DataFrame

property data¶

DataFrame of all radon metrics.

Type: DataFrame

property halstead_metrics¶

DataFrame of radon Halstead metrics.

Type: DataFrame

property maintainability_index¶

DataFrame of radon maintainability index metrics.

Type: DataFrame

property raw_metrics¶

DataFrame of radon raw metrics.

Type: DataFrame

property report¶

Dictionary of all radon metrics.

Type: dict

write_plots(fullpath: Union[str, Path]) → RadonETL[source]¶

Writes metrics plots to given file.

Parameters: fullpath (Path or str) – Target file.
Returns: self.
Return type: RadonETL

write_tables(target_dir: Union[str, Path]) → RadonETL[source]¶

Writes metrics tables as HTML files to given directory.

Parameters: target_dir (Path or str) – Target directory.
Returns: self.
Return type: RadonETL

repo_etl¶

class rolling_pin.repo_etl.RepoETL(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$')[source]¶

Bases: object

RepoETL is a class for extracting 1st order dependencies of modules within a given repository. This information is stored internally as a DataFrame and can be rendered as networkx, pydot or SVG graphs.

__init__(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$') → None[source]¶

Construct RepoETL instance.

Parameters

root (str or Path) – Full path to repository root directory.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|test_|_test|mock_)\.py$’.

Raises

ValueError – If include or exclude regex does not end in ‘.py$’.
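
The include and exclude patterns can be pictured as two regex filters applied to each candidate path. The sketch below assumes re.search semantics and a literal check for the ‘.py$’ suffix; select_files is a hypothetical helper, not the library's code.

```python
import re
from typing import List


def select_files(
    paths: List[str],
    include_regex: str = r'.*\.py$',
    exclude_regex: str = r'(__init__|test_|_test|mock_)\.py$',
) -> List[str]:
    """Keep paths that match include_regex and do not match exclude_regex."""
    for regex in (include_regex, exclude_regex):
        if not regex.endswith('.py$'):
            raise ValueError(f"Regex does not end in '.py$': {regex}")
    return [
        path for path in paths
        if re.search(include_regex, path) and not re.search(exclude_regex, path)
    ]


paths = ['pkg/a.py', 'pkg/__init__.py', 'pkg/a_test.py', 'pkg/readme.md']
print(select_files(paths))
```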

static _anneal_coordinate(data: DataFrame, anneal_axis: str = 'x', pin_axis: str = 'y', iterations: int = 10) → DataFrame[source]¶

Iteratively align nodes in the anneal axis according to the mean position of their connected nodes. Node anneal coordinates are rectified at the end of each iteration according to a pin axis, so that they do not overlap. This means that they are sorted at each level of the pin axis.

Parameters

data (DataFrame) – DataFrame with x column.
anneal_axis (str, optional) – Coordinate column to be annealed. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.
iterations (int, optional) – Number of times to update x coordinates. Default: 10.

Returns

DataFrame with annealed anneal axis coordinates.

Return type

DataFrame

static _calculate_coordinates(data: DataFrame) → DataFrame[source]¶

Calculate initial x, y coordinates for each node in given DataFrame. Nodes are stratified by type along the y axis.

Parameters: data (DataFrame) – DataFrame of nodes.
Returns: DataFrame with x and y coordinate columns.
Return type: DataFrame

static _center_coordinate(data, center_axis='x', pin_axis='y')[source]¶

Centers center_axis coordinates at each level of the pin axis.

Parameters

data (DataFrame) – DataFrame with x column.
center_axis (str, optional) – Coordinate column to be centered. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.

Returns

DataFrame with centered center axis coordinates.

Return type

DataFrame

static _get_data(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|_test)\\.py$') → DataFrame[source]¶

Recursively aggregates and filters all the files found within a given directory into a DataFrame. Data is used to create directed graphs.

DataFrame has these columns:

node_name - name of node

node_type - type of node, can be [module, subpackage, library]

x - node’s x coordinate

y - node’s y coordinate

dependencies - parent nodes

subpackages - parent nodes of type subpackage

fullpath - fullpath to the module a node represents

Parameters

root (str or Path) – Root directory to be searched.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|_test)\.py$’.

Raises

ValueError – If include or exclude regex does not end in ‘.py$’.
FileNotFoundError – If no files are found after filtering.

Returns

DataFrame of file information.

Return type

DataFrame

static _get_imports(fullpath: Union[str, Path]) → List[str][source]¶

Gets import statements from a given python module.

Parameters: fullpath (str or Path) – Path to python module.
Returns: List of imported modules.
Return type: list[str]
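
One way to collect imported module names is to parse the source with the standard ast module, as sketched below. This is an illustration only: the documented method takes a file path, whereas the hypothetical get_imports here takes source text, and the library may extract imports by other means.

```python
import ast
from typing import List


def get_imports(source: str) -> List[str]:
    """Return module names referenced by import statements in source."""
    imports = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # 'import a, b' yields one alias per module.
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # 'from a import b' contributes the module 'a'.
            imports.append(node.module)
    return imports


code = 'import os\nfrom pathlib import Path\nimport re, sys\n'
print(sorted(get_imports(code)))
```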

static _to_networkx_graph(data)[source]¶

Converts given DataFrame into networkx directed graph.

Parameters: data (DataFrame) – DataFrame of nodes.
Returns: Graph of nodes.
Return type: networkx.DiGraph

to_dataframe() → DataFrame[source]¶

Returns: DataFrame of nodes representing repo modules.
Return type: DataFrame

to_dot_graph(orient='tb', orthogonal_edges=False, color_scheme=None)[source]¶

Converts internal data into pydot graph.

Parameters

orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If orient is invalid.

Returns

Dot graph of nodes.

Return type

pydot.Dot

to_html(layout: str = 'dot', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) → HTML[source]¶

For use in inline rendering of graph data in Jupyter Lab.

Parameters

layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on Github. Default: False.

Returns

HTML object for inline display.

Return type

IPython.display.HTML

to_networkx_graph()[source]¶

Converts internal data into networkx directed graph.

Returns: Graph of nodes.
Return type: networkx.DiGraph

write(fullpath: Union[str, Path], layout: str = 'dot', orient: str = 'tb', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None) → RepoETL[source]¶

Writes internal data to a given filepath. Formats supported: svg, dot, png, json.

Parameters

fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
- tb - top to bottom
- bt - bottom to top
- lr - left to right
- rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles only. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME

Raises

ValueError – If invalid file extension given.

Returns

Self.

Return type

RepoETL

toml_etl¶

class rolling_pin.toml_etl.TomlETL(data: dict[str, Any])[source]¶

Bases: object

__init__(data: dict[str, Any]) → None[source]¶

Creates a TomlETL instance from a given dictionary.

Parameters: data (dict) – Dictionary.

delete(regex: str) → TomlETL[source]¶

Returns portion of data whose keys do not match a given regular expression.

Parameters: regex (str) – Regular expression applied to keys.
Returns: New TomlETL instance.
Return type: TomlETL

edit(patch: str) → TomlETL[source]¶

Apply edit to internal data given TOML patch. Patch is always of the form ‘[key]=[value]’ and in TOML format.

Parameters

patch (str) – TOML patch to be applied.

Raises

TOMLDecoderError – If patch cannot be decoded.
EnforceError – If ‘=’ not found in patch.

Returns

New TomlETL instance with edits.

Return type

TomlETL

classmethod from_string(text: Type[T]) → T[source]¶

Creates a TomlETL instance from a given TOML string.

Parameters: text (str) – TOML string.
Returns: TomlETL instance.
Return type: TomlETL

classmethod from_toml(filepath: Type[T]) → T[source]¶

Creates a TomlETL instance from a given TOML file.

Parameters: filepath (str or Path) – TOML file.
Returns: TomlETL instance.
Return type: TomlETL

search(regex: str) → TomlETL[source]¶

Returns portion of data whose keys match a given regular expression.

Parameters: regex (str) – Regular expression applied to keys.
Returns: New TomlETL instance.
Return type: TomlETL

to_dict() → dict[source]¶

Converts instance to dictionary copy.

Returns: Dictionary copy of instance.
Return type: dict

to_string() → str[source]¶

Converts instance to a TOML formatted string.

Returns: TOML string.
Return type: str

write(filepath: Union[str, Path]) → None[source]¶

Writes instance to given TOML file.

Parameters: filepath (str or Path) – Target filepath.

tools¶

Contains basic functions for more complex ETL functions and classes.

rolling_pin.tools.LOGGER = <Logger rolling_pin.tools (WARNING)>¶

rolling_pin.tools.copy_file(source: Union[str, Path], target: Union[str, Path]) → None[source]¶

Copies a source file to a target file, creating directories as needed.

Parameters

source (str or Path) – Source filepath.
target (str or Path) – Target filepath.

Raises

AssertionError – If source is not a file.

rolling_pin.tools.directory_to_dataframe(directory: Union[str, Path], include_regex: str = '', exclude_regex: str = '\\.DS_Store') → DataFrame[source]¶

Recursively lists files within a given directory as rows in a pd.DataFrame.

Parameters

directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘\.DS_Store’.

Returns

pd.DataFrame with one file per row.

Return type

pd.DataFrame

rolling_pin.tools.dot_to_html(dot: Dot, layout: str = 'dot', as_png: bool = False) → Union[HTML, Image][source]¶

Converts a given pydot graph into an IPython.display.HTML object. Used for inline display of graph data in JupyterLab.

Parameters

dot (pydot.Dot) – Pydot Graph instance.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.

Raises

ValueError – If invalid layout given.

Returns

HTML instance.

Return type

IPython.display.HTML

rolling_pin.tools.filter_text(text: str, include_regex: Optional[str] = None, exclude_regex: Optional[str] = None, replace_regex: Optional[str] = None, replace_value: Optional[str] = None) → str[source]¶

Filter given text by applying regular expressions to each line.

Parameters

text (str) – Newline separated lines.
include_regex (str, optional) – Keep lines that match given regex. Default: None.
exclude_regex (str, optional) – Remove lines that match given regex. Default: None.
replace_regex (str, optional) – Substitutes regex matches in lines with replace_value. Default: None.
replace_value (str, optional) – Regex substitution value. Default: None.

Raises

AssertionError – If source is not a file.

Returns

Filtered text.

Return type

str
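
The line-by-line filtering described above can be sketched with the standard re module. This is an illustration under stated assumptions, not the library's code: the name filter_text_sketch, the order of operations (include, then exclude, then replace), the use of re.search, and treating a missing replace_value as an empty string are all assumptions.

```python
import re
from typing import Optional


def filter_text_sketch(
    text: str,
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    replace_regex: Optional[str] = None,
    replace_value: Optional[str] = None,
) -> str:
    # Hypothetical stand-in for filter_text.
    lines = text.split('\n')
    if include_regex is not None:
        # Keep only lines that match.
        lines = [x for x in lines if re.search(include_regex, x)]
    if exclude_regex is not None:
        # Drop lines that match.
        lines = [x for x in lines if not re.search(exclude_regex, x)]
    if replace_regex is not None:
        # Assumption: a missing replace_value means "replace with nothing".
        value = replace_value or ''
        lines = [re.sub(replace_regex, value, x) for x in lines]
    return '\n'.join(lines)


text = 'import os\n# comment\nimport sys\nx = 1'
result = filter_text_sketch(
    text,
    include_regex='^import',
    exclude_regex='sys',
    replace_regex='import ',
    replace_value='using ',
)
```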

rolling_pin.tools.flatten(item: Iterable, separator: str = '/', embed_types: bool = True) → Dict[str, Any][source]¶

Flattens an iterable object into a flat dictionary.

Parameters

item (object) – Iterable object.
separator (str, optional) – Field separator in keys. Default: ‘/’.

Returns

Dictionary representation of given object.

Return type

dict
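
To make the flattening idea concrete, here is a minimal sketch that joins nested keys with the separator. It is not the library's implementation: the name flatten_sketch is hypothetical, list items are keyed by their plain index, and the embed_types behavior is deliberately left out.

```python
from typing import Any, Dict


def flatten_sketch(item: Any, separator: str = '/') -> Dict[str, Any]:
    # Hypothetical stand-in for flatten, without embedded types.
    output: Dict[str, Any] = {}

    def recurse(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            children = value.items()
        elif isinstance(value, (list, tuple)):
            # Assumption: list items are keyed by their index.
            children = enumerate(value)
        else:
            # Leaf value: store it under the accumulated key.
            output[prefix] = value
            return
        for key, child in children:
            name = str(key) if prefix == '' else f'{prefix}{separator}{key}'
            recurse(child, name)

    recurse(item, '')
    return output


blob = {'a': {'b': 1, 'c': [10, 20]}, 'd': 'x'}
flat = flatten_sketch(blob)
```

Each leaf value ends up under a single string key, which is what makes regex-based searching and editing of the whole structure straightforward.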

rolling_pin.tools.get_parent_fields(key: str, separator: str = '/') → List[str][source]¶

Get all the parent fields of a given key, split by given separator.

Parameters

key (str) – Key.
separator (str, optional) – String that splits key into fields. Default: ‘/’.

Returns

List of absolute parent fields.

Return type

list(str)
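
One reading of "absolute parent fields" is that each parent is given as its full path from the root rather than as a single field name. The sketch below follows that reading; the name parent_fields_sketch and the exact output order are assumptions, not confirmed library behavior.

```python
from typing import List


def parent_fields_sketch(key: str, separator: str = '/') -> List[str]:
    # Hypothetical stand-in for get_parent_fields.
    fields = key.split(separator)
    # Join progressively longer prefixes, excluding the key itself.
    return [separator.join(fields[:i]) for i in range(1, len(fields))]


parents = parent_fields_sketch('a/b/c')
no_parents = parent_fields_sketch('a')
```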

rolling_pin.tools.is_dictlike(item: Any) → bool[source]¶

Determines if given item is dict-like.

Parameters: item (object) – Object to be tested.
Returns: Whether given item is dict-like.
Return type: bool

rolling_pin.tools.is_iterable(item: Any) → bool[source]¶

Determines if given item is iterable.

Parameters: item (object) – Object to be tested.
Returns: Whether given item is iterable.
Return type: bool

rolling_pin.tools.is_listlike(item: Any) → bool[source]¶

Determines if given item is list-like.

Parameters: item (object) – Object to be tested.
Returns: Whether given item is list-like.
Return type: bool

rolling_pin.tools.list_all_files(directory: Union[str, Path], include_regex: Optional[str] = None, exclude_regex: Optional[str] = None) → Generator[Path, None, None][source]¶

Recursively lists all files within a given directory.

Parameters

directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: None.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: None.

Raises

FileNotFoundError – If argument is not a directory or does not exist.

Yields

Path – File.

rolling_pin.tools.move_file(source: Union[str, Path], target: Union[str, Path]) → None[source]¶

Moves a source file to a target file, creating directories as needed.

Parameters

source (str or Path) – Source filepath.
target (str or Path) – Target filepath.

Raises

AssertionError – If source is not a file.

rolling_pin.tools.nest(flat_dict: Dict[str, Any], separator: str = '/') → Dict[str, Any][source]¶

Converts a flat dictionary into a nested dictionary by splitting keys by a given separator.

Parameters

flat_dict (dict) – Flat dictionary.
separator (str, optional) – Field separator within given dictionary’s keys. Default: ‘/’.

Returns

Nested dictionary.

Return type

dict
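
Nesting is the inverse of flattening: each key is split on the separator and the pieces become successive dictionary levels. A minimal sketch follows; the name nest_sketch is hypothetical, and the sketch assumes no key is both a leaf and a parent of another key.

```python
from typing import Any, Dict


def nest_sketch(flat_dict: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    # Hypothetical stand-in for nest.
    output: Dict[str, Any] = {}
    for key, value in flat_dict.items():
        fields = key.split(separator)
        cursor = output
        # Walk (or create) one dictionary per parent field.
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        # The last field holds the value.
        cursor[fields[-1]] = value
    return output


nested = nest_sketch({'a/b': 1, 'a/c/d': 2, 'e': 3})
```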

rolling_pin.tools.read_text(filepath: Union[str, Path]) → str[source]¶

Convenience function for reading text from given file.

Parameters: filepath (str or Path) – File to be read.
Raises: AssertionError – If source is not a file.
Returns: Text read from the file.
Return type: str

rolling_pin.tools.replace_and_format(regex: str, replace: str, string: str, flags: Any = 0) → str[source]¶

Performs a regex substitution on a given string and formats any named group found in the result with groupdict data from the pattern. Groups beginning with ‘i’ will be converted to integers. Groups beginning with ‘f’ will be converted to floats.

Named group anatomy:¶

(?P<NAME>PATTERN)

NAME becomes a key and whatever matches PATTERN becomes its value.
>>> re.search(r'(?P<i>\d+)', 'foobar123').groupdict()
{'i': '123'}

Examples:¶

Special groups:

(?P<i>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f>\d) - string matched by ‘\d’ will be converted to a float
(?P<i_foo>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f_bar>\d) - string matched by ‘\d’ will be converted to a float

Named groups (long):

>>> proj = '(?P<p>[a-z0-9]+)'
>>> spec = '(?P<s>[a-z0-9]+)'
>>> desc = r'(?P<d>[a-z0-9\-]+)'
>>> ver = r'(?P<iv>\d+)'
>>> frame = r'(?P<i_f>\d+)'
>>> regex = rf'{proj}\.{spec}\.{desc}\.v{ver}\.{frame}.*'
>>> replace = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
>>> string = 'proj.spec.desc.v1.25.png'
>>> replace_and_format(regex, replace, string, flags=re.IGNORECASE)
p-proj_s-spec_d-desc_v001_f0025.jpeg

Named groups (short):

>>> replace_and_format(
...     r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
...     'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
...     'proj.spec.desc.v1.25.png',
... )
p-proj_s-spec_d-desc_v001_f0025.jpeg

No groups:

>>> replace_and_format('foo', 'bar', 'foobar')
barbar

Parameters

regex (str) – Regex pattern to search string with.
replace (str) – Replacement string which may contain format variables, i.e. ‘{variable}’.
string (str) – String to be converted.
flags (object, optional) – re.sub flags. Default: 0.

Returns

Converted string.

Return type

str
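
The technique described above (search, convert named groups by prefix, substitute, then format) can be sketched in a few lines. This is an illustration, not the library's implementation: the name replace_and_format_sketch is hypothetical, and using str.format on the substituted string with the first match's groups is an assumption.

```python
import re
from typing import Any, Dict


def replace_and_format_sketch(regex: str, replace: str, string: str, flags: Any = 0) -> str:
    # Hypothetical stand-in for replace_and_format.
    match = re.search(regex, string, flags=flags)
    groups: Dict[str, Any] = {}
    if match is not None:
        for name, value in match.groupdict().items():
            if value is None:
                groups[name] = value
            elif name.startswith('i'):
                groups[name] = int(value)    # 'i' prefix: integer
            elif name.startswith('f'):
                groups[name] = float(value)  # 'f' prefix: float
            else:
                groups[name] = value
    output = re.sub(regex, replace, string, flags=flags)
    if groups:
        # Fill '{name}' placeholders left in the substituted string.
        output = output.format(**groups)
    return output


long_result = replace_and_format_sketch(
    r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
    'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
    'proj.spec.desc.v1.25.png',
    flags=re.IGNORECASE,
)
float_result = replace_and_format_sketch(r'(?P<f_scale>[\d.]+)x', '{f_scale:.2f}', '1.5x')
plain_result = replace_and_format_sketch('foo', 'bar', 'foobar')
```

Converting before formatting is what allows numeric format specs such as {iv:03d} to work, since they would fail on the raw matched strings.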

rolling_pin.tools.unembed(item: Any) → Any[source]¶

Converts embedded types in dictionary keys into Python types.

Parameters: item (object) – Dictionary with embedded types.
Returns: Converted object.
Return type: object

rolling_pin.tools.write_dot_graph(dot: Dot, fullpath: Union[str, Path], layout: str = 'dot') → None[source]¶

Writes a pydot.Dot object to a given filepath. Formats supported: svg, dot, png.

Parameters

dot (pydot.Dot) – Pydot Dot instance.
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.

Raises

ValueError – If invalid file extension given.

rolling_pin.tools.write_text(text: str, filepath: Union[str, Path]) → None[source]¶

Convenience function for writing text to given file. Creates directories as needed.

Parameters

text (str) – Text to be written.
filepath (str or Path) – File to be written.