blob_etl
- class rolling_pin.blob_etl.BlobETL(blob: Any, separator: str = '/')[source]¶
Bases:
object
Converts blob data internally into a flat dictionary that is universally searchable, editable and convertible back to the data’s original structure, new blob structures or directed graphs.
- __dict__ = mappingproxy({'__module__': 'rolling_pin.blob_etl', '__doc__': "\n Converts blob data internally into a flat dictionary that is universally\n searchable, editable and convertable back to the data's original structure,\n new blob structures or directed graphs.\n ", '__init__': <function BlobETL.__init__>, 'query': <function BlobETL.query>, 'filter': <function BlobETL.filter>, 'delete': <function BlobETL.delete>, 'set': <function BlobETL.set>, 'update': <function BlobETL.update>, 'set_field': <function BlobETL.set_field>, 'to_dict': <function BlobETL.to_dict>, 'to_flat_dict': <function BlobETL.to_flat_dict>, 'to_records': <function BlobETL.to_records>, 'to_dataframe': <function BlobETL.to_dataframe>, 'to_prototype': <function BlobETL.to_prototype>, 'to_networkx_graph': <function BlobETL.to_networkx_graph>, 'to_dot_graph': <function BlobETL.to_dot_graph>, 'to_html': <function BlobETL.to_html>, 'write': <function BlobETL.write>, '__dict__': <attribute '__dict__' of 'BlobETL' objects>, '__weakref__': <attribute '__weakref__' of 'BlobETL' objects>, '__annotations__': {'_data': 'Dict[str, Any]', '_separator': 'str'}})¶
- __init__(blob: Any, separator: str = '/') None [source]¶
Constructs BlobETL instance.
- Parameters
blob (object) – Iterable object.
separator (str, optional) – String to be used as a field separator in each key. Default: ‘/’.
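The flat-dictionary idea behind the class can be illustrated with a minimal, standard-library sketch. This is not rolling_pin's implementation: `flatten_blob` and `unflatten_blob` are hypothetical helpers, and the `<list_N>` token used for list positions is an assumption inferred from the `'<list_[0-9]+>'` key shown in the to_prototype example in this reference.

```python
from typing import Any, Dict


def flatten_blob(blob: Any, separator: str = '/') -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict (hypothetical helper)."""
    flat: Dict[str, Any] = {}

    def walk(item: Any, prefix: str) -> None:
        if isinstance(item, dict):
            children = [(str(k), v) for k, v in item.items()]
        elif isinstance(item, (list, tuple)):
            # Assumed token format for list positions: <list_N>.
            children = [(f'<list_{i}>', v) for i, v in enumerate(item)]
        else:
            flat[prefix] = item  # leaf value
            return
        for name, value in children:
            key = name if prefix == '' else prefix + separator + name
            walk(value, key)

    walk(blob, '')
    return flat


def unflatten_blob(flat: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    """Rebuild nested dicts from flat keys; list tokens stay as plain dict keys here."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        fields = key.split(separator)
        cursor = nested
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})
        cursor[fields[-1]] = value
    return nested
```

Because every leaf ends up under one string key, searching and editing reduce to operations on keys and values of an ordinary dictionary.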
- delete(predicate: Callable[[Any], bool], by: str = 'key') BlobETL [source]¶
Delete data items by key, value or key + value, according to a given predicate.
- Parameters
predicate – Function that returns a boolean value.
by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.
- Raises
ValueError – If by keyword is not key, value, or key+value.
- Returns
New BlobETL instance.
- Return type
BlobETL
- filter(predicate: Callable[[Any], bool], by: str = 'key', invert: bool = False) BlobETL [source]¶
Filter data items by key, value or key + value, according to a given predicate.
- Parameters
predicate – Function that returns a boolean value.
by (str, optional) – Value handed to predicate. Options include: key, value, key+value. Default: key.
invert (bool, optional) – Whether to invert the predicate. Default: False.
- Raises
ValueError – If by keyword is not key, value, or key+value.
- Returns
New BlobETL instance.
- Return type
BlobETL
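The predicate-based filter and delete semantics described above can be sketched on a plain flat dictionary. This is an illustration under stated assumptions, not the library's code: `filter_items` and `delete_items` are hypothetical names, and passing key and value as two positional arguments for `key+value` is an assumption.

```python
from typing import Any, Callable, Dict


def filter_items(
    data: Dict[str, Any],
    predicate: Callable[..., bool],
    by: str = 'key',
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items for which the predicate holds (or fails, if invert is True)."""
    if by not in ('key', 'value', 'key+value'):
        raise ValueError(f'Invalid by argument: {by}. Options: key, value, key+value.')
    output = {}
    for key, value in data.items():
        if by == 'key':
            result = predicate(key)
        elif by == 'value':
            result = predicate(value)
        else:
            # Assumption: key and value are passed as two positional arguments.
            result = predicate(key, value)
        if bool(result) != invert:
            output[key] = value
    return output


def delete_items(
    data: Dict[str, Any], predicate: Callable[..., bool], by: str = 'key'
) -> Dict[str, Any]:
    """Deleting is treated here as filtering with the predicate inverted."""
    return filter_items(data, predicate, by=by, invert=True)
```

Both helpers return a new dictionary and leave the input untouched, mirroring the "New BlobETL instance" return value documented above.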
- query(regex: str, ignore_case: bool = True, invert: bool = False) BlobETL [source]¶
Filter data items by key according to given regular expression.
- Parameters
regex (str) – Regular expression.
ignore_case (bool, optional) – Whether to ignore case in the regular expression search. Default: True.
invert (bool, optional) – Whether to invert the predicate. Default: False.
- Returns
New BlobETL instance.
- Return type
BlobETL
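A regex query over keys could look like the following sketch. `query_keys` is a hypothetical helper; using `re.search` (match anywhere in the key) rather than `re.match` is an assumption, since this reference does not say which is used.

```python
import re
from typing import Any, Dict


def query_keys(
    data: Dict[str, Any],
    regex: str,
    ignore_case: bool = True,
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items whose key matches regex (or does not match, if invert is True)."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    # Assumption: re.search semantics, so the pattern may match anywhere in the key.
    return {
        key: value
        for key, value in data.items()
        if (pattern.search(key) is not None) != invert
    }
```

With the default `ignore_case=True`, a pattern such as `'^users'` matches both `'users/age'` and `'Users/name'`.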
- set(predicate: Optional[Callable[[Any, Any], bool]] = None, key_setter: Optional[Callable[[Any, Any], str]] = None, value_setter: Optional[Callable[[Any, Any], Any]] = None) BlobETL [source]¶
Filter data items by key, value or key + value, according to a given predicate. Then set each matching item’s key by a given function and its value by a given function.
- Parameters
predicate (function, optional) – Function of the form: lambda k, v: bool. Default: None –> lambda k, v: True.
key_setter (function, optional) – Function of the form: lambda k, v: str. Default: None –> lambda k, v: k.
value_setter (function, optional) – Function of the form: lambda k, v: object. Default: None –> lambda k, v: v.
- Returns
New BlobETL instance.
- Return type
BlobETL
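The three optional callables and their documented defaults can be sketched as follows. `set_items` is a hypothetical helper, and keeping non-matching items unchanged is an assumption about behavior this reference does not spell out.

```python
from typing import Any, Callable, Dict, Optional


def set_items(
    data: Dict[str, Any],
    predicate: Optional[Callable[[Any, Any], bool]] = None,
    key_setter: Optional[Callable[[Any, Any], str]] = None,
    value_setter: Optional[Callable[[Any, Any], Any]] = None,
) -> Dict[str, Any]:
    """Rewrite keys and values of items selected by predicate."""
    # Defaults follow the parameter descriptions above.
    predicate = predicate or (lambda k, v: True)
    key_setter = key_setter or (lambda k, v: k)
    value_setter = value_setter or (lambda k, v: v)

    output = {}
    for key, value in data.items():
        if predicate(key, value):
            output[key_setter(key, value)] = value_setter(key, value)
        else:
            # Assumption: items that fail the predicate are kept unchanged.
            output[key] = value
    return output
```

Called with no arguments beyond the data, the sketch returns an equal copy, which matches the identity defaults listed above.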
- set_field(index: int, field_setter: Callable[[str], str]) BlobETL [source]¶
Sets a field at a given index according to a given function.
- Parameters
index (int) – Field index.
field_setter (function) – Function of form lambda str: str.
- Returns
New BlobETL instance.
- Return type
BlobETL
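Setting a field by index amounts to splitting each key on the separator, transforming one position, and rejoining. The sketch below uses a hypothetical `set_field_at` helper; leaving keys with too few fields untouched is an assumption.

```python
from typing import Any, Callable, Dict


def set_field_at(
    data: Dict[str, Any],
    index: int,
    field_setter: Callable[[str], str],
    separator: str = '/',
) -> Dict[str, Any]:
    """Apply field_setter to the field at the given index of every key."""
    output = {}
    for key, value in data.items():
        fields = key.split(separator)
        # Assumption: keys that do not have a field at this index are left as-is.
        if -len(fields) <= index < len(fields):
            fields[index] = field_setter(fields[index])
        output[separator.join(fields)] = value
    return output
```

For example, index 0 rewrites the top-level field of every key, while index 1 only affects keys with at least two fields.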
- to_dataframe(group_by: Optional[int] = None) DataFrame [source]¶
Convert data to pandas DataFrame.
- Parameters
group_by (int, optional) – Field index to group rows of data by. Default: None.
- Returns
DataFrame.
- Return type
DataFrame
- to_dot_graph(orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) Dot [source]¶
Converts internal dictionary into pydot graph. Key and value nodes and edges are colored differently.
- Parameters
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises
ValueError – If orient is invalid.
- Returns
Dot graph representation of dictionary.
- Return type
pydot.Dot
- to_flat_dict() Dict[str, Any] [source]¶
- Returns
Flat dictionary with embedded types.
- Return type
dict
- to_html(layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) Union[Image, HTML] [source]¶
For use in inline rendering of graph data in Jupyter Lab.
- Parameters
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns
HTML object for inline display.
- Return type
IPython.display.HTML
- to_networkx_graph() DiGraph [source]¶
Converts internal dictionary into a networkx directed graph.
- Returns
Graph representation of dictionary.
- Return type
networkx.DiGraph
- to_prototype() BlobETL [source]¶
Convert data to prototypical representation.
Example:
>>> data = {
...     'users': [
...         {'name': {'first': 'tom', 'last': 'smith'}},
...         {'name': {'first': 'dick', 'last': 'smith'}},
...         {'name': {'first': 'jane', 'last': 'doe'}},
...     ]
... }
>>> BlobETL(data).to_prototype().to_dict()
{
    '^users': {
        '<list_[0-9]+>': {
            'name': {
                'first$': Counter({'dick': 1, 'jane': 1, 'tom': 1}),
                'last$': Counter({'doe': 1, 'smith': 2})
            }
        }
    }
}
- Returns
New BlobETL instance.
- Return type
BlobETL
- update(item: Union[Dict, BlobETL]) BlobETL [source]¶
Updates internal dictionary with given dictionary or BlobETL instance. Given dictionary is first flattened with embedded types.
- write(fullpath: Union[str, Path], layout: str = 'dot', orthogonal_edges: bool = False, orient: str = 'tb', color_scheme: Optional[Dict[str, str]] = None) BlobETL [source]¶
Writes internal dictionary to a given filepath. Formats supported: svg, dot, png, json.
- Parameters
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises
ValueError – If invalid file extension given.
- Returns
self.
- Return type
BlobETL
conform_config
- class rolling_pin.conform_config.ConformConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
A class for validating configurations supplied to ConformETL.
- source_rules¶
A list of rules for parsing directories. Default: [].
- Type
Rules
- rename_rules¶
A list of rules for renaming source filepaths to target filepaths. Default: [].
- Type
Rules
- group_rules¶
A list of rules for grouping files. Default: [].
- Type
Rules
- line_rules¶
A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].
- Type
Rules
- class GroupRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- __annotations__ = {}¶
- __module__ = 'rolling_pin.conform_config'¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- name: StringType = <StringType() instance on GroupRule as 'name'>¶
- regex: StringType = <StringType() instance on GroupRule as 'regex'>¶
- class LineRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- __annotations__ = {}¶
- __module__ = 'rolling_pin.conform_config'¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- exclude: StringType = <StringType() instance on LineRule as 'exclude'>¶
- group: StringType = <StringType() instance on LineRule as 'group'>¶
- include: StringType = <StringType() instance on LineRule as 'include'>¶
- regex: StringType = <StringType() instance on LineRule as 'regex'>¶
- replace: StringType = <StringType() instance on LineRule as 'replace'>¶
- class RenameRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- __annotations__ = {}¶
- __module__ = 'rolling_pin.conform_config'¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- regex: StringType = <StringType() instance on RenameRule as 'regex'>¶
- replace: StringType = <StringType() instance on RenameRule as 'replace'>¶
- class SourceRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]¶
Bases:
Model
- __annotations__ = {}¶
- __module__ = 'rolling_pin.conform_config'¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- exclude: StringType = <StringType() instance on SourceRule as 'exclude'>¶
- include: StringType = <StringType() instance on SourceRule as 'include'>¶
- path: StringType = <StringType() instance on SourceRule as 'path'>¶
- __module__ = 'rolling_pin.conform_config'¶
- _schema = <schematics.deprecated.patch_schema.<locals>.Schema object>¶
- group_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'group_rules'>¶
- line_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'line_rules'>¶
- rename_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'rename_rules'>¶
- source_rules: ListType = <ListType(ModelType) instance on ConformConfig as 'source_rules'>¶
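The rule models above imply a configuration shaped like the dictionary below. This is a sketch for orientation only: the field names are taken from the SourceRule, RenameRule, GroupRule and LineRule attributes listed in this reference, but the paths and patterns are made up, which fields are required is not stated here, and `unknown_fields` is a hypothetical checker rather than the schematics validation that ConformConfig performs.

```python
from typing import Dict, List

# Field names per rule type, as listed in this reference.
ALLOWED_FIELDS: Dict[str, set] = {
    'source_rules': {'path', 'include', 'exclude'},
    'rename_rules': {'regex', 'replace'},
    'group_rules': {'name', 'regex'},
    'line_rules': {'group', 'include', 'exclude', 'regex', 'replace'},
}

# Illustrative values only; none of these paths or patterns come from the library.
config: Dict[str, List[Dict[str, str]]] = {
    'source_rules': [
        {'path': '/tmp/project', 'include': r'\.py$', 'exclude': r'_test\.py$'},
    ],
    'rename_rules': [
        {'regex': '/tmp/project', 'replace': '/tmp/target'},
    ],
    'group_rules': [
        {'name': 'init', 'regex': r'__init__\.py$'},
    ],
    'line_rules': [
        {'group': 'init', 'regex': 'foo', 'replace': 'bar'},
    ],
}


def unknown_fields(config: Dict[str, List[Dict[str, str]]]) -> List[str]:
    """List sections or rule fields that are not named in this reference."""
    problems = []
    for section, rules in config.items():
        allowed = ALLOWED_FIELDS.get(section)
        if allowed is None:
            problems.append(section)
            continue
        for i, rule in enumerate(rules):
            for field in rule:
                if field not in allowed:
                    problems.append(f'{section}[{i}].{field}')
    return problems
```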
conform_etl
- class rolling_pin.conform_etl.ConformETL(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = [])[source]¶
Bases:
object
ConformETL creates a DataFrame from a given directory of source files. Then it generates target paths given a set of rules. Finally, the conform method is called and the source files are copied to their target filepaths.
- __dict__ = mappingproxy({'__module__': 'rolling_pin.conform_etl', '__doc__': '\n ConformETL creates a DataFrame from a given directory of source files.\n Then it generates target paths given a set of rules.\n Finally, the conform method is called and the source files are copied to\n their target filepaths.\n ', '_get_data': <staticmethod(<function ConformETL._get_data>)>, 'from_yaml': <classmethod(<function ConformETL.from_yaml>)>, '__init__': <function ConformETL.__init__>, '__repr__': <function ConformETL.__repr__>, 'groups': <property object>, 'to_dataframe': <function ConformETL.to_dataframe>, 'to_blob': <function ConformETL.to_blob>, 'to_html': <function ConformETL.to_html>, 'conform': <function ConformETL.conform>, '__dict__': <attribute '__dict__' of 'ConformETL' objects>, '__weakref__': <attribute '__weakref__' of 'ConformETL' objects>, '__annotations__': {'_data': 'DataFrame', '_line_rules': 'Rules'}})¶
- __init__(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) None [source]¶
Generates DataFrame from given source_rules and then generates target paths for them given other rules.
- Parameters
source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].
- Raises
DataError – If configuration is invalid.
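One plausible reading of how rename rules turn a source filepath into a target filepath is sequential regex substitution. The sketch below assumes exactly that: `apply_rename_rules` is a hypothetical helper, and applying every rule in order with `re.sub` is an assumption, not confirmed by this reference.

```python
import re
from typing import Dict, List


def apply_rename_rules(source: str, rename_rules: List[Dict[str, str]]) -> str:
    """Derive a target path by applying each regex/replace rule in order."""
    target = source
    for rule in rename_rules:
        # Assumption: each rule is a plain re.sub over the current path.
        target = re.sub(rule['regex'], rule['replace'], target)
    return target
```

Under this reading, rule order matters: a later rule sees the output of the earlier ones.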
- __repr__() str [source]¶
String representation of conform DataFrame.
- Returns
Table optimized for output to shell.
- Return type
str
- static _get_data(source_rules: List[Dict[str, str]] = [], rename_rules: List[Dict[str, str]] = [], group_rules: List[Dict[str, str]] = [], line_rules: List[Dict[str, str]] = []) DataFrame [source]¶
Generates DataFrame from given source_rules and then generates target paths for them given other rules.
- Parameters
source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies on files belonging to a given group. Default: [].
- Returns
Conform DataFrame.
- Return type
DataFrame
- conform(groups: Union[str, List[str]] = 'all') None [source]¶
Copies source files to target filepaths.
- Parameters
groups (str or list[str]) – Groups of files which are to be conformed. ‘all’ means all groups. Default: ‘all’.
- classmethod from_yaml(filepath: Union[str, Path]) ConformETL [source]¶
Construct ConformETL instance from given yaml file.
- Parameters
filepath (str or Path) – YAML file.
- Raises
EnforceError – If file does not end in yml or yaml.
- Returns
ConformETL instance.
- Return type
ConformETL
- property groups¶
List of groups found in self._data.
- Type
list[str]
- to_blob() BlobETL [source]¶
Converts self into a BlobETL object with target column as keys and source columns as values.
- Returns
BlobETL of target and source filepaths.
- Return type
BlobETL
- to_html(orient: str = 'lr', color_scheme: Dict[str, str] = {'background': '#242424', 'edge': '#DE958E', 'edge_library': '#B6ECF3', 'edge_module': '#DE958E', 'edge_subpackage': '#A0D17B', 'edge_value': '#B6ECF3', 'node': '#343434', 'node_font': '#DE958E', 'node_library_font': '#B6ECF3', 'node_module_font': '#DE958E', 'node_subpackage_font': '#A0D17B', 'node_value': '#343434', 'node_value_font': '#B6ECF3'}, as_png: bool = False) Union[Image, HTML] [source]¶
For use in inline rendering of graph data in Jupyter Lab. Graph from target to source filepath. Target is in red, source is in cyan.
- Parameters
orient (str, optional) –
Graph layout orientation. Default: lr. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.conform_etl.CONFORM_COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns
HTML object for inline display.
- Return type
IPython.display.HTML
radon_etl
- class rolling_pin.radon_etl.RadonETL(fullpath: Union[str, Path])[source]¶
Bases:
object
Conforms all four radon reports (raw metrics, Halstead, maintainability and cyclomatic complexity) into a single DataFrame that can then be plotted.
- __dict__ = mappingproxy({'__module__': 'rolling_pin.radon_etl', '__doc__': '\n Conforms all four radon reports (raw metrics, Halstead, maintainability and\n cyclomatic complexity) into a single DataFrame that can then be plotted.\n ', '__init__': <function RadonETL.__init__>, 'report': <property object>, 'data': <property object>, 'raw_metrics': <property object>, 'maintainability_index': <property object>, 'cyclomatic_complexity_metrics': <property object>, 'halstead_metrics': <property object>, '_get_radon_data': <function RadonETL._get_radon_data>, '_get_radon_report': <staticmethod(<function RadonETL._get_radon_report>)>, '_get_raw_metrics_dataframe': <staticmethod(<function RadonETL._get_raw_metrics_dataframe>)>, '_get_maintainability_index_dataframe': <staticmethod(<function RadonETL._get_maintainability_index_dataframe>)>, '_get_cyclomatic_complexity_dataframe': <staticmethod(<function RadonETL._get_cyclomatic_complexity_dataframe>)>, '_get_halstead_dataframe': <staticmethod(<function RadonETL._get_halstead_dataframe>)>, 'write_plots': <function RadonETL.write_plots>, 'write_tables': <function RadonETL.write_tables>, '__dict__': <attribute '__dict__' of 'RadonETL' objects>, '__weakref__': <attribute '__weakref__' of 'RadonETL' objects>, '__annotations__': {}})¶
- __init__(fullpath: Union[str, Path]) None [source]¶
Constructs a RadonETL instance.
- Parameters
fullpath (str or Path) – Python file or directory of python files.
- static _get_cyclomatic_complexity_dataframe(report: Dict) DataFrame [source]¶
Converts radon cyclomatic complexity report into a pandas DataFrame.
- Parameters
report (dict) – Radon report blob.
- Returns
Cyclomatic complexity DataFrame.
- Return type
DataFrame
- static _get_halstead_dataframe(report: Dict) DataFrame [source]¶
Converts radon Halstead report into a pandas DataFrame.
- Parameters
report (dict) – Radon report blob.
- Returns
Halstead DataFrame.
- Return type
DataFrame
- static _get_maintainability_index_dataframe(report: Dict) DataFrame [source]¶
Converts radon maintainability index report into a pandas DataFrame.
- Parameters
report (dict) – Radon report blob.
- Returns
Maintainability DataFrame.
- Return type
DataFrame
- _get_radon_data() DataFrame [source]¶
Constructs a DataFrame representing all the radon reports generated for a given python file or directory containing python files.
- Returns
Radon report DataFrame.
- Return type
DataFrame
- static _get_radon_report(fullpath: Union[str, Path]) Dict[str, Any] [source]¶
Gets all four reports from radon and aggregates them into a single blob object.
- Parameters
fullpath (str or Path) – Python file or directory of python files.
- Returns
Radon report blob.
- Return type
dict
- static _get_raw_metrics_dataframe(report: Dict) DataFrame [source]¶
Converts radon raw metrics report into a pandas DataFrame.
- Parameters
report (dict) – Radon report blob.
- Returns
Raw metrics DataFrame.
- Return type
DataFrame
- property cyclomatic_complexity_metrics¶
DataFrame of radon cyclomatic complexity metrics.
- Type
DataFrame
- property data¶
DataFrame of all radon metrics.
- Type
DataFrame
- property halstead_metrics¶
DataFrame of radon Halstead metrics.
- Type
DataFrame
- property maintainability_index¶
DataFrame of radon maintainability index metrics.
- Type
DataFrame
- property raw_metrics¶
DataFrame of radon raw metrics.
- Type
DataFrame
- property report¶
Dictionary of all radon metrics.
- Type
dict
repo_etl
- class rolling_pin.repo_etl.RepoETL(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$')[source]¶
Bases:
object
RepoETL is a class for extracting 1st order dependencies of modules within a given repository. This information is stored internally as a DataFrame and can be rendered as networkx, pydot or SVG graphs.
- __dict__ = mappingproxy({'__module__': 'rolling_pin.repo_etl', '__doc__': '\n RepoETL is a class for extracting 1st order dependencies of modules within a\n given repository. This information is stored internally as a DataFrame and\n can be rendered as networkx, pydot or SVG graphs.\n ', '__init__': <function RepoETL.__init__>, '_get_imports': <staticmethod(<function RepoETL._get_imports>)>, '_get_data': <staticmethod(<function RepoETL._get_data>)>, '_calculate_coordinates': <staticmethod(<function RepoETL._calculate_coordinates>)>, '_anneal_coordinate': <staticmethod(<function RepoETL._anneal_coordinate>)>, '_center_coordinate': <staticmethod(<function RepoETL._center_coordinate>)>, '_to_networkx_graph': <staticmethod(<function RepoETL._to_networkx_graph>)>, 'to_networkx_graph': <function RepoETL.to_networkx_graph>, 'to_dot_graph': <function RepoETL.to_dot_graph>, 'to_dataframe': <function RepoETL.to_dataframe>, 'to_html': <function RepoETL.to_html>, 'write': <function RepoETL.write>, '__dict__': <attribute '__dict__' of 'RepoETL' objects>, '__weakref__': <attribute '__weakref__' of 'RepoETL' objects>, '__annotations__': {'_root': 'Union[str, Path]', '_data': 'DataFrame'}})¶
- __init__(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|test_|_test|mock_)\\.py$') None [source]¶
Construct RepoETL instance.
- Parameters
root (str or Path) – Full path to repository root directory.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|test_|_test|mock_)\.py$’.
- Raises
ValueError – If include or exclude regex does not end in ‘.py$’.
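The interplay of the include and exclude patterns, and the documented ValueError, can be sketched with the standard library. `select_modules` is a hypothetical helper operating on a list of path strings; matching with `re.search` is an assumption.

```python
import re
from typing import List


def select_modules(
    paths: List[str],
    include_regex: str = r'.*\.py$',
    exclude_regex: str = r'(__init__|test_|_test|mock_)\.py$',
) -> List[str]:
    """Keep paths that match include_regex and do not match exclude_regex."""
    for name, regex in (('include_regex', include_regex), ('exclude_regex', exclude_regex)):
        # Mirrors the documented check: both patterns must end in '.py$'.
        if not regex.endswith('.py$'):
            raise ValueError(f"{name} does not end in '.py$': {regex}")
    return [
        path
        for path in paths
        # Assumption: patterns are applied with re.search.
        if re.search(include_regex, path) and not re.search(exclude_regex, path)
    ]
```

Note that with the default exclude pattern, a name such as `core_test.py` is excluded (it ends in `_test.py`), whereas `test_core.py` is not, because `test_` must sit directly before `.py`.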
- static _anneal_coordinate(data: DataFrame, anneal_axis: str = 'x', pin_axis: str = 'y', iterations: int = 10) DataFrame [source]¶
Iteratively align nodes in the anneal axis according to the mean position of their connected nodes. Node anneal coordinates are rectified at the end of each iteration according to a pin axis, so that they do not overlap. This means that they are sorted at each level of the pin axis.
- Parameters
data (DataFrame) – DataFrame with x column.
anneal_axis (str, optional) – Coordinate column to be annealed. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.
iterations (int, optional) – Number of times to update x coordinates. Default: 10.
- Returns
DataFrame with annealed anneal axis coordinates.
- Return type
DataFrame
- static _calculate_coordinates(data: DataFrame) DataFrame [source]¶
Calculate initial x, y coordinates for each node in given DataFrame. Nodes are stratified by type along the y axis.
- Parameters
data (DataFrame) – DataFrame of nodes.
- Returns
DataFrame with x and y coordinate columns.
- Return type
DataFrame
- static _center_coordinate(data, center_axis='x', pin_axis='y')[source]¶
Centers the sorted center_axis coordinates at each level of the pin axis.
- Parameters
data (DataFrame) – DataFrame with x column.
center_axis (str, optional) – Coordinate column to be centered. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.
- Returns
DataFrame with centered center axis coordinates.
- Return type
DataFrame
- static _get_data(root: Union[str, Path], include_regex: str = '.*\\.py$', exclude_regex: str = '(__init__|_test)\\.py$') DataFrame [source]¶
Recursively aggregates and filters all the files found within a given directory into a DataFrame. Data is used to create directed graphs.
DataFrame has these columns:
node_name - name of node
node_type - type of node, can be [module, subpackage, library]
x - node’s x coordinate
y - node’s y coordinate
dependencies - parent nodes
subpackages - parent nodes of type subpackage
fullpath - fullpath to the module a node represents
- Parameters
root (str or Path) – Root directory to be searched.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|_test)\.py$’.
- Raises
ValueError – If include or exclude regex does not end in ‘.py$’.
FileNotFoundError – If no files are found after filtering.
- Returns
DataFrame of file information.
- Return type
DataFrame
- static _get_imports(fullpath: Union[str, Path]) List[str] [source]¶
Gets import statements from a given python module.
- Parameters
fullpath (str or Path) – Path to python module.
- Returns
List of imported modules.
- Return type
list(str)
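Extracting imported module names can be sketched with the standard `ast` module. This is a swapped-in technique, not necessarily how `_get_imports` works: the hypothetical `get_imports` below takes source text instead of a file path so that it stays self-contained, and it skips relative imports that name no module.

```python
import ast
from typing import List


def get_imports(source: str) -> List[str]:
    """Return module names from import statements found in Python source text."""
    tree = ast.parse(source)
    modules: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a, b" yields one alias per module.
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            # "from a.b import c" contributes the module "a.b".
            modules.append(node.module)
    return modules
```

To apply it to a file, the text could be read first, for example with `Path(fullpath).read_text()`.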
- static _to_networkx_graph(data)[source]¶
Converts given DataFrame into networkx directed graph.
- Parameters
data (DataFrame) – DataFrame of nodes.
- Returns
Graph of nodes.
- Return type
networkx.DiGraph
- to_dot_graph(orient='tb', orthogonal_edges=False, color_scheme=None)[source]¶
Converts internal data into pydot graph.
- Parameters
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises
ValueError – If orient is invalid.
- Returns
Dot graph of nodes.
- Return type
pydot.Dot
- to_html(layout: str = 'dot', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None, as_png: bool = False) HTML [source]¶
For use in inline rendering of graph data in Jupyter Lab.
- Parameters
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns
HTML object for inline display.
- Return type
IPython.display.HTML
- to_networkx_graph()[source]¶
Converts internal data into networkx directed graph.
- Returns
Graph of nodes.
- Return type
networkx.DiGraph
- write(fullpath: Union[str, Path], layout: str = 'dot', orient: str = 'tb', orthogonal_edges: bool = False, color_scheme: Optional[Dict[str, str]] = None) RepoETL [source]¶
Writes internal data to a given filepath. Formats supported: svg, dot, png, json.
- Parameters
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with only right angles. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises
ValueError – If invalid file extension given.
- Returns
Self.
- Return type
RepoETL
toml_etl
- class rolling_pin.toml_etl.TomlETL(data: dict[str, Any])[source]¶
Bases:
object
- __dict__ = mappingproxy({'__module__': 'rolling_pin.toml_etl', 'from_string': <classmethod(<function TomlETL.from_string>)>, 'from_toml': <classmethod(<function TomlETL.from_toml>)>, '__init__': <function TomlETL.__init__>, 'to_dict': <function TomlETL.to_dict>, 'to_string': <function TomlETL.to_string>, 'write': <function TomlETL.write>, 'edit': <function TomlETL.edit>, 'delete': <function TomlETL.delete>, 'search': <function TomlETL.search>, '__dict__': <attribute '__dict__' of 'TomlETL' objects>, '__weakref__': <attribute '__weakref__' of 'TomlETL' objects>, '__doc__': None, '__annotations__': {}})¶
- __init__(data: dict[str, Any]) None [source]¶
Creates a TomlETL instance from a given dictionary.
- Parameters
data (dict) – Dictionary.
- delete(regex: str) TomlETL [source]¶
Returns portion of data whose keys do not match a given regular expression.
- Parameters
regex (str) – Regular expression applied to keys.
- Returns
New TomlETL instance.
- Return type
TomlETL
- edit(patch: str) TomlETL [source]¶
Apply edit to internal data given TOML patch. Patch is always of the form ‘[key]=[value]’ and in TOML format.
- Parameters
patch (str) – TOML patch to be applied.
- Raises
TOMLDecoderError – If patch cannot be decoded.
EnforceError – If ‘=’ not found in patch.
- Returns
New TomlETL instance with edits.
- Return type
TomlETL
- classmethod from_string(text: Type[T]) T [source]¶
Creates a TomlETL instance from a given TOML string.
- Parameters
text (str) – TOML string.
- Returns
TomlETL instance.
- Return type
TomlETL
- classmethod from_toml(filepath: Type[T]) T [source]¶
Creates a TomlETL instance from a given TOML file.
- Parameters
filepath (str or Path) – TOML file.
- Returns
TomlETL instance.
- Return type
TomlETL
- search(regex: str) TomlETL [source]¶
Returns portion of data whose keys match a given regular expression.
- Parameters
regex (str) – Regular expression applied to keys.
- Returns
New TomlETL instance.
- Return type
TomlETL
- to_dict() dict [source]¶
Converts instance to dictionary copy.
- Returns
Dictionary copy of instance.
- Return type
dict
tools
- rolling_pin.tools.LOGGER = <Logger rolling_pin.tools (WARNING)>¶
Contains basic functions for more complex ETL functions and classes.
- rolling_pin.tools.copy_file(source: Union[str, Path], target: Union[str, Path]) None [source]¶
Copy a source file to a target file, creating directories as needed.
- Parameters
source (str or Path) – Source filepath.
target (str or Path) – Target filepath.
- Raises
AssertionError – If source is not a file.
- rolling_pin.tools.directory_to_dataframe(directory: Union[str, Path], include_regex: str = '', exclude_regex: str = '\\.DS_Store') DataFrame [source]¶
Recursively list files within a given directory as rows in a pd.DataFrame.
- Parameters
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘\.DS_Store’.
- Returns
pd.DataFrame with one file per row.
- Return type
pd.DataFrame
- rolling_pin.tools.dot_to_html(dot: Dot, layout: str = 'dot', as_png: bool = False) Union[HTML, Image] [source]¶
Converts a given pydot graph into an IPython.display.HTML object. Used in Jupyter Lab inline display of graph data.
- Parameters
dot (pydot.Dot) – Pydot Graph instance.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Raises
ValueError – If invalid layout given.
- Returns
HTML instance.
- Return type
IPython.display.HTML
- rolling_pin.tools.filter_text(text: str, include_regex: Optional[str] = None, exclude_regex: Optional[str] = None, replace_regex: Optional[str] = None, replace_value: Optional[str] = None) str [source]¶
Filter given text by applying regular expressions to each line.
- Parameters
text (str) – Newline separated lines.
include_regex (str, optional) – Keep lines that match given regex. Default: None.
exclude_regex (str, optional) – Remove lines that match given regex. Default: None.
replace_regex (str, optional) – Substitutes regex matches in lines with replace_value. Default: None.
replace_value (str, optional) – Regex substitution value. Default: ‘’.
- Raises
AssertionError – If source is not a file.
- Returns
Filtered text.
- Return type
str
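Line-wise filtering of this kind can be sketched with the `re` module. The function below is a hypothetical stand-in, not the library's code; the order of the steps (include, then exclude, then replace) and the treatment of a missing replace_value as an empty string are assumptions.

```python
import re
from typing import Optional


def filter_text_sketch(
    text: str,
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    replace_regex: Optional[str] = None,
    replace_value: Optional[str] = None,
) -> str:
    # Hypothetical stand-in for filter_text; step order is an assumption.
    lines = text.split('\n')

    # Keep only lines that match include_regex.
    if include_regex is not None:
        lines = [line for line in lines if re.search(include_regex, line)]

    # Drop lines that match exclude_regex.
    if exclude_regex is not None:
        lines = [line for line in lines if not re.search(exclude_regex, line)]

    # Substitute matches within each remaining line.
    if replace_regex is not None:
        value = replace_value or ''
        lines = [re.sub(replace_regex, value, line) for line in lines]

    return '\n'.join(lines)


text = 'alpha 1\nbeta 2\ngamma 3\ndelta 4'
result = filter_text_sketch(
    text,
    include_regex='^[abg]',
    exclude_regex='^beta',
    replace_regex=r'\d',
    replace_value='#',
)
# result == 'alpha #\ngamma #'
```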
- rolling_pin.tools.flatten(item: Iterable, separator: str = '/', embed_types: bool = True) Dict[str, Any] [source]¶
Flattens an iterable object into a flat dictionary.
- Parameters
item (object) – Iterable object.
separator (str, optional) – Field separator in keys. Default: ‘/’.
embed_types (bool, optional) – Whether to embed type information in keys. Default: True.
- Returns
Dictionary representation of given object.
- Return type
dict
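The general flattening technique can be sketched as a recursive walk that joins keys with the separator. This is an illustrative sketch only: `flatten_sketch` is a hypothetical name, it ignores the embed_types option entirely, and keying list items by their index is an assumption rather than the library's documented output.

```python
from typing import Any, Dict


def flatten_sketch(item: Any, separator: str = '/') -> Dict[str, Any]:
    # Hypothetical stand-in for flatten; type embedding is not modelled here.
    output: Dict[str, Any] = {}

    def recurse(value: Any, prefix: str) -> None:
        children = value
        # Treat lists and tuples as dictionaries keyed by index (assumption).
        if isinstance(value, (list, tuple)):
            children = dict(enumerate(value))

        if isinstance(children, dict) and children:
            for key, val in children.items():
                new_key = f'{prefix}{separator}{key}' if prefix else str(key)
                recurse(val, new_key)
        else:
            # Leaf value, or an empty container, is stored under the full key.
            output[prefix] = value

    recurse(item, '')
    return output


blob = {'a': {'b': 1, 'c': [10, 20]}, 'd': 'x'}
flat = flatten_sketch(blob)
# flat == {'a/b': 1, 'a/c/0': 10, 'a/c/1': 20, 'd': 'x'}
```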
- rolling_pin.tools.get_parent_fields(key: str, separator: str = '/') List[str] [source]¶
Get all the parent fields of a given key, split by given separator.
- Parameters
key (str) – Key.
separator (str, optional) – String that splits key into fields. Default: ‘/’.
- Returns
List of absolute parent fields.
- Return type
list(str)
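One plausible reading of "absolute parent fields" is every proper prefix of the key, each written out in full. The sketch below assumes that reading; `get_parent_fields_sketch` is a hypothetical name and the output shown is that of the sketch, not a verified result from the library.

```python
from typing import List


def get_parent_fields_sketch(key: str, separator: str = '/') -> List[str]:
    # Hypothetical stand-in: each parent is the key truncated after i fields.
    fields = key.split(separator)
    return [separator.join(fields[:i]) for i in range(1, len(fields))]


parents = get_parent_fields_sketch('a/b/c')
# parents == ['a', 'a/b']
```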
- rolling_pin.tools.is_dictlike(item: Any) bool [source]¶
Determines if given item is dict-like.
- Parameters
item (object) – Object to be tested.
- Returns
Whether given item is dict-like.
- Return type
bool
- rolling_pin.tools.is_iterable(item: Any) bool [source]¶
Determines if given item is iterable.
- Parameters
item (object) – Object to be tested.
- Returns
Whether given item is iterable.
- Return type
bool
- rolling_pin.tools.is_listlike(item: Any) bool [source]¶
Determines if given item is list-like.
- Parameters
item (object) – Object to be tested.
- Returns
Whether given item is list-like.
- Return type
bool
- rolling_pin.tools.list_all_files(directory: Union[str, Path], include_regex: Optional[str] = None, exclude_regex: Optional[str] = None) Generator[Path, None, None] [source]¶
Recursively list all files within a given directory.
- Parameters
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: None.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: None.
- Raises
FileNotFoundError – If argument is not a directory or does not exist.
- Yields
Path – File.
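A generator with this shape can be sketched with `os.walk`. The code below is a hypothetical stand-in (`list_all_files_sketch`), not the library's implementation; matching the regexes against the bare filename with `re.search` is an assumption. Because it is a generator, the FileNotFoundError in this sketch surfaces when iteration starts rather than at call time.

```python
import os
import re
from pathlib import Path
from typing import Generator, Optional, Union


def list_all_files_sketch(
    directory: Union[str, Path],
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
) -> Generator[Path, None, None]:
    # Hypothetical stand-in for list_all_files.
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError(f'{directory} is not a directory or does not exist.')

    # Walk the tree and yield each file that passes both filters.
    for root, _, files in os.walk(directory):
        for name in files:
            if include_regex is not None and not re.search(include_regex, name):
                continue
            if exclude_regex is not None and re.search(exclude_regex, name):
                continue
            yield Path(root, name)
```

Yielding paths lazily keeps memory use flat on large directory trees; wrap the call in `list(...)` when a concrete list is needed.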
- rolling_pin.tools.move_file(source: Union[str, Path], target: Union[str, Path]) None [source]¶
Moves a source file to a target file, creating directories as needed.
- Parameters
source (str or Path) – Source filepath.
target (str or Path) – Target filepath.
- Raises
AssertionError – If source is not a file.
- rolling_pin.tools.nest(flat_dict: Dict[str, Any], separator: str = '/') Dict[str, Any] [source]¶
Converts a flat dictionary into a nested dictionary by splitting keys by a given separator.
- Parameters
flat_dict (dict) – Flat dictionary.
separator (str, optional) – Field separator within given dictionary’s keys. Default: ‘/’.
- Returns
Nested dictionary.
- Return type
dict
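Nesting is the inverse of flattening: each key is split on the separator and the value is placed at the end of the resulting path. The sketch below is a hypothetical stand-in (`nest_sketch`), not the library's code, and it assumes no key is both a leaf and a parent of another key.

```python
from typing import Any, Dict


def nest_sketch(flat_dict: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    # Hypothetical stand-in for nest; assumes keys do not conflict.
    output: Dict[str, Any] = {}
    for key, value in flat_dict.items():
        fields = key.split(separator)

        # Descend through the parent fields, creating dictionaries as needed.
        cursor = output
        for field in fields[:-1]:
            cursor = cursor.setdefault(field, {})

        # The last field holds the value itself.
        cursor[fields[-1]] = value
    return output


nested = nest_sketch({'a/b': 1, 'a/c/d': 2, 'e': 3})
# nested == {'a': {'b': 1, 'c': {'d': 2}}, 'e': 3}
```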
- rolling_pin.tools.read_text(filepath: Union[str, Path]) str [source]¶
Convenience function for reading text from given file.
- Parameters
filepath (str or Path) – File to be read.
- Raises
AssertionError – If source is not a file.
- Returns
Text of the given file.
- Return type
str
- rolling_pin.tools.replace_and_format(regex: str, replace: str, string: str, flags: Any = 0) str [source]¶
Performs a regex substitution on a given string and formats any named groups found in the result with groupdict data from the pattern. Groups beginning with ‘i’ will be converted to integers. Groups beginning with ‘f’ will be converted to floats.
Named group anatomy:¶
(?P<NAME>PATTERN)
NAME becomes a key and whatever matches PATTERN becomes its value.
>>> re.search(r'(?P<i>\d+)', 'foobar123').groupdict()
{'i': '123'}
Examples:¶
- Special groups:
(?P<i>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f>\d) - string matched by ‘\d’ will be converted to a float
(?P<i_foo>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f_bar>\d) - string matched by ‘\d’ will be converted to a float
- Named groups (long):
>>> proj = r'(?P<p>[a-z0-9]+)'
>>> spec = r'(?P<s>[a-z0-9]+)'
>>> desc = r'(?P<d>[a-z0-9\-]+)'
>>> ver = r'(?P<iv>\d+)'
>>> frame = r'(?P<i_f>\d+)'
>>> regex = rf'{proj}\.{spec}\.{desc}\.v{ver}\.{frame}.*'
>>> replace = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
>>> string = 'proj.spec.desc.v1.25.png'
>>> replace_and_format(regex, replace, string, flags=re.IGNORECASE)
p-proj_s-spec_d-desc_v001_f0025.jpeg
- Named groups (short):
>>> replace_and_format(
...     r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
...     'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
...     'proj.spec.desc.v1.25.png',
... )
p-proj_s-spec_d-desc_v001_f0025.jpeg
- No groups:
>>> replace_and_format('foo', 'bar', 'foobar')
barbar
- Parameters
regex (str) – Regex pattern to search string with.
replace (str) – Replacement string, which may contain format variables, e.g. ‘{variable}’.
string (str) – String to be converted.
flags (object, optional) – re.sub flags. Default: 0.
- Returns
Converted string.
- Return type
str
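The substitute-then-format technique described above can be sketched with `re.search`, `re.sub` and `str.format`. This is a hypothetical reconstruction from the description (`replace_and_format_sketch` is not a library name), and it assumes the text outside the match contains no literal braces, since those would be interpreted by `str.format`.

```python
import re
from typing import Any, Dict


def replace_and_format_sketch(regex: str, replace: str, string: str, flags: Any = 0) -> str:
    # Hypothetical stand-in for replace_and_format.
    groups: Dict[str, Any] = {}
    match = re.search(regex, string, flags=flags)
    if match:
        for name, value in match.groupdict().items():
            # Groups starting with 'i' become ints, those with 'f' become floats.
            if value is not None and name.startswith('i'):
                groups[name] = int(value)
            elif value is not None and name.startswith('f'):
                groups[name] = float(value)
            else:
                groups[name] = value

    # Substitute first, then fill the format variables left in the result.
    output = re.sub(regex, replace, string, flags=flags)
    return output.format(**groups)


pattern = (
    r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)'
    r'\.v(?P<iv>\d+)\.(?P<i_f>\d+).*'
)
template = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
renamed = replace_and_format_sketch(pattern, template, 'proj.spec.desc.v1.25.png')
# renamed == 'p-proj_s-spec_d-desc_v001_f0025.jpeg'
```

Converting the captured strings before formatting is what lets numeric format specs such as `:03d` work; applied to a raw string capture they would raise a ValueError.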
- rolling_pin.tools.unembed(item: Any) Any [source]¶
Converts embedded types in dictionary keys into Python types.
- Parameters
item (object) – Dictionary with embedded types.
- Returns
Converted object.
- Return type
object
- rolling_pin.tools.write_dot_graph(dot: Dot, fullpath: Union[str, Path], layout: str = 'dot') None [source]¶
Writes a pydot.Dot object to a given filepath. Formats supported: svg, dot, png.
- Parameters
dot (pydot.Dot) – Pydot Dot instance.
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
- Raises
ValueError – If invalid file extension given.