blob_etl
- class rolling_pin.blob_etl.BlobETL(blob, separator='/')[source]
Bases:
object
Converts blob data internally into a flat dictionary that is universally searchable and editable, and convertible back to the data’s original structure, to new blob structures, or to directed graphs.
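Here is a minimal stdlib sketch of the flat-dictionary model. `flatten_sketch` and `nest_sketch` are hypothetical helpers, not rolling_pin's own `flatten` and `nest`. They assume `/` as the separator and wrap list indices in a `<list_N>` field. That marker format is loosely modeled on the `<list_[0-9]+>` field in the `to_prototype` example. Flattening and then nesting should round-trip plain dicts and lists.

```python
import re
from typing import Any, Dict

# Hypothetical marker for list indices, e.g. "<list_0>".
LIST_FIELD = re.compile(r"^<list_(\d+)>$")


def flatten_sketch(item: Any, separator: str = "/", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by field paths."""
    if isinstance(item, dict):
        children = [(str(key), value) for key, value in item.items()]
    elif isinstance(item, list):
        # Wrap list indices in a marker so nest_sketch can rebuild the list.
        children = [(f"<list_{i}>", value) for i, value in enumerate(item)]
    else:
        return {prefix: item}  # leaf value
    flat: Dict[str, Any] = {}
    for field, value in children:
        path = f"{prefix}{separator}{field}" if prefix else field
        flat.update(flatten_sketch(value, separator, path))
    return flat


def _rebuild_lists(node: Any) -> Any:
    """Turn dicts whose keys are all list markers back into ordered lists."""
    if not isinstance(node, dict):
        return node
    node = {key: _rebuild_lists(value) for key, value in node.items()}
    if node and all(LIST_FIELD.match(key) for key in node):
        ordered = sorted(node.items(), key=lambda kv: int(LIST_FIELD.match(kv[0]).group(1)))
        return [value for _, value in ordered]
    return node


def nest_sketch(flat: Dict[str, Any], separator: str = "/") -> Any:
    """Inverse of flatten_sketch: rebuild the nested structure from field paths."""
    root: Dict[str, Any] = {}
    for key, value in flat.items():
        fields = key.split(separator)
        node = root
        for field in fields[:-1]:
            node = node.setdefault(field, {})
        node[fields[-1]] = value
    return _rebuild_lists(root)
```

This sketch drops empty dicts and lists. A key that already contains the separator would also be split on the way back, so a production version needs to handle both cases.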
- __init__(blob, separator='/')[source]
Constructs a BlobETL instance.
- Parameters:
blob (object) – Iterable object.
separator (str, optional) – String to be used as a field separator in each key. Default: ‘/’.
- delete(predicate, by='key')[source]
Deletes data items by key, value, or key + value, according to a given predicate.
- Parameters:
predicate (Callable[[Any], bool]) – Function that returns a boolean value.
by (str, optional) – Which part of each item is handed to the predicate. Options include: key, value, key+value. Default: key.
- Raises:
ValueError – If by keyword is not key, value, or key+value.
- Returns:
New BlobETL instance.
- Return type:
BlobETL
- filter(predicate, by='key', invert=False)[source]
Filters data items by key, value, or key + value, according to a given predicate.
- Parameters:
predicate (Callable[[Any], bool]) – Function that returns a boolean value.
by (str, optional) – Which part of each item is handed to the predicate. Options include: key, value, key+value. Default: key.
invert (bool, optional) – Whether to invert the predicate. Default: False.
- Raises:
ValueError – If by keyword is not key, value, or key+value.
- Returns:
New BlobETL instance.
- Return type:
BlobETL
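Here is a minimal sketch of predicate filtering over a flat dictionary, assuming the data is a plain dict of separator-joined keys. `filter_flat` is a hypothetical name. The documented annotation `Callable[[Any], bool]` suggests a one-argument predicate. For `by='key+value'`, this sketch therefore passes a single `(key, value)` tuple. That choice is an assumption and is not confirmed by the source.

```python
from typing import Any, Callable, Dict

# Which part of each (key, value) item the predicate receives, keyed by `by`.
# Passing key+value as one (key, value) tuple is an assumption.
_SELECTORS: Dict[str, Callable[[str, Any], Any]] = {
    "key": lambda key, value: key,
    "value": lambda key, value: value,
    "key+value": lambda key, value: (key, value),
}


def filter_flat(
    data: Dict[str, Any],
    predicate: Callable[[Any], bool],
    by: str = "key",
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items whose selected part satisfies predicate (or fails it, if invert)."""
    if by not in _SELECTORS:
        raise ValueError(f"Invalid by keyword: {by!r}. Options: key, value, key+value.")
    select = _SELECTORS[by]
    # bool(...) != invert keeps matches normally and non-matches when inverted.
    return {k: v for k, v in data.items() if bool(predicate(select(k, v))) != invert}
```

Under this sketch, deleting the items that match a predicate is the same as calling `filter_flat(..., invert=True)`.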
- query(regex, ignore_case=True, invert=False)[source]
Filters data items by key according to a given regular expression.
- Parameters:
regex (str) – Regular expression.
ignore_case (bool, optional) – Whether to ignore case in the regular expression search. Default: True.
invert (bool, optional) – Whether to invert the regular expression match. Default: False.
- Returns:
New BlobETL instance.
- Return type:
BlobETL
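Key filtering by regular expression can be sketched the same way. `query_flat` is hypothetical. Using `re.search` (match anywhere in the key) rather than `re.match` is an assumption. Anchors such as `^` and `$` make the match stricter.

```python
import re
from typing import Any, Dict


def query_flat(
    data: Dict[str, Any],
    regex: str,
    ignore_case: bool = True,
    invert: bool = False,
) -> Dict[str, Any]:
    """Keep items whose key matches regex (or does not match, if invert)."""
    pattern = re.compile(regex, re.IGNORECASE if ignore_case else 0)
    # re.search matches anywhere in the key.
    return {k: v for k, v in data.items() if (pattern.search(k) is not None) != invert}
```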
- set(predicate=None, key_setter=None, value_setter=None)[source]
Filters data items by a given predicate, then sets each matching item's key and value using the given functions.
- Parameters:
predicate (function, optional) – Function of the form: lambda k, v: bool. Default: None, which means lambda k, v: True.
key_setter (function, optional) – Function of the form: lambda k, v: str. Default: None, which means lambda k, v: k.
value_setter (function, optional) – Function of the form: lambda k, v: object. Default: None, which means lambda k, v: v.
- Returns:
New BlobETL instance.
- Return type:
BlobETL
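Here is a sketch of the predicate/setter pattern with the documented defaults (match everything, keep key and value). `set_flat` is a hypothetical helper. The docstring does not say what happens to items the predicate rejects. This sketch assumes they pass through unchanged.

```python
from typing import Any, Callable, Dict, Optional

KV = Callable[[str, Any], Any]


def set_flat(
    data: Dict[str, Any],
    predicate: Optional[KV] = None,
    key_setter: Optional[KV] = None,
    value_setter: Optional[KV] = None,
) -> Dict[str, Any]:
    """Rewrite the key and value of every item that satisfies predicate."""
    # Defaults mirror the documented ones: match everything, change nothing.
    predicate = predicate or (lambda k, v: True)
    key_setter = key_setter or (lambda k, v: k)
    value_setter = value_setter or (lambda k, v: v)
    output: Dict[str, Any] = {}
    for k, v in data.items():
        if predicate(k, v):
            output[key_setter(k, v)] = value_setter(k, v)
        else:
            output[k] = v  # assumption: non-matching items pass through unchanged
    return output
```

If `key_setter` maps two items to the same key, the later item wins in this sketch.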
- set_field(index, field_setter)[source]
Sets the field at a given index of each key according to a given function.
- Parameters:
index (int) – Field index.
field_setter (function) – Function of the form: lambda str: str.
- Returns:
New BlobETL instance.
- Return type:
BlobETL
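Here is a sketch of rewriting one field of every key, assuming fields are the segments between separators. `set_field_flat` is hypothetical. Leaving keys that are too short for the index untouched is an assumption.

```python
from typing import Any, Callable, Dict


def set_field_flat(
    data: Dict[str, Any],
    index: int,
    field_setter: Callable[[str], str],
    separator: str = "/",
) -> Dict[str, Any]:
    """Apply field_setter to the field at position index of every key."""
    output: Dict[str, Any] = {}
    for key, value in data.items():
        fields = key.split(separator)
        # Assumption: keys too short to have that field are left unchanged.
        if -len(fields) <= index < len(fields):
            fields[index] = field_setter(fields[index])
        output[separator.join(fields)] = value
    return output
```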
- to_dataframe(group_by=None)[source]
Convert data to pandas DataFrame.
- Parameters:
group_by (int, optional) – Field index to group rows of data by. Default: None.
- Returns:
DataFrame.
- Return type:
DataFrame
- to_dot_graph(orthogonal_edges=False, orient='tb', color_scheme=None)[source]
Converts internal dictionary into pydot graph. Key and value nodes and edges are colored differently.
- Parameters:
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (Optional[Dict[str, str]]) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises:
ValueError – If orient is invalid.
- Returns:
Dot graph representation of dictionary.
- Return type:
pydot.Dot
- to_html(layout='dot', orthogonal_edges=False, orient='tb', color_scheme=None, as_png=False)[source]
For use in inline rendering of graph data in Jupyter Lab.
- Parameters:
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (Optional[Dict[str, str]]) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns:
HTML object for inline display.
- Return type:
IPython.display.HTML
- to_networkx_graph()[source]
Converts internal dictionary into a networkx directed graph.
- Returns:
Graph representation of dictionary.
- Return type:
networkx.DiGraph
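Converting a flat dictionary into a directed graph amounts to turning each key's field path into a chain of edges that ends in a value node. The stdlib sketch below collects those edges as pairs. `flat_dict_edges` and the `"path = value"` label for value nodes are hypothetical, and rolling_pin's actual node naming may differ.

```python
from typing import Any, Dict, Set, Tuple


def flat_dict_edges(data: Dict[str, Any], separator: str = "/") -> Set[Tuple[str, str]]:
    """Derive parent -> child edges from the keys and values of a flat dict."""
    edges: Set[Tuple[str, str]] = set()
    for key, value in data.items():
        fields = key.split(separator)
        # Absolute path of every field: "a", "a/b", "a/b/c", ...
        paths = [separator.join(fields[: i + 1]) for i in range(len(fields))]
        edges.update(zip(paths, paths[1:]))  # key node -> key node
        # Hypothetical value-node label; one value node per key.
        edges.add((paths[-1], f"{paths[-1]} = {value!r}"))
    return edges
```

If networkx is installed, create `graph = networkx.DiGraph()` and call `graph.add_edges_from(flat_dict_edges(data))` to build the directed graph from these pairs.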
- to_prototype()[source]
Convert data to prototypical representation.
Example:
>>> data = {
...     'users': [
...         {'name': {'first': 'tom', 'last': 'smith'}},
...         {'name': {'first': 'dick', 'last': 'smith'}},
...         {'name': {'first': 'jane', 'last': 'doe'}},
...     ]
... }
>>> BlobETL(data).to_prototype().to_dict()
{
    '^users': {
        '<list_[0-9]+>': {
            'name': {
                'first$': Counter({'dick': 1, 'jane': 1, 'tom': 1}),
                'last$': Counter({'doe': 1, 'smith': 2})
            }
        }
    }
}
- Returns:
New BlobETL instance.
- Return type:
BlobETL
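The prototype output in the example can be approximated with a short sketch. It assumes list items were flattened into `<list_N>` fields (a hypothetical marker) and replaces each with the generic `<list_[0-9]+>` pattern. It then anchors the first and last fields with `^` and `$`, as the example shows, and tallies leaf values in a `collections.Counter`. `prototype_sketch` is not the library's implementation, and the anchoring rule is inferred from the single example only.

```python
import re
from collections import Counter, defaultdict
from typing import Any, DefaultDict, Dict

# Matches index markers such as <list_0>; the marker format is an assumption.
LIST_INDEX = re.compile(r"<list_[0-9]+>")


def prototype_sketch(flat: Dict[str, Any], separator: str = "/") -> Dict[str, Counter]:
    """Collapse list indices and count the leaf values under each generic key."""
    prototype: DefaultDict[str, Counter] = defaultdict(Counter)
    for key, value in flat.items():
        fields = [LIST_INDEX.sub("<list_[0-9]+>", field) for field in key.split(separator)]
        fields[0] = "^" + fields[0]    # anchor the first field, as in the example
        fields[-1] = fields[-1] + "$"  # anchor the last field
        prototype[separator.join(fields)][value] += 1  # values must be hashable
    return dict(prototype)
```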
- update(item)[source]
Updates the internal dictionary with a given dictionary or BlobETL instance. A given dictionary is first flattened with embedded types.
- write(fullpath, layout='dot', orthogonal_edges=False, orient='tb', color_scheme=None)[source]
Writes internal dictionary to a given filepath. Formats supported: svg, dot, png, json.
- Parameters:
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (Optional[Dict[str, str]]) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises:
ValueError – If invalid file extension given.
- Returns:
Self.
- Return type:
BlobETL
conform_config
- class rolling_pin.conform_config.ConformConfig(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]
Bases:
Model
A class for validating configurations supplied to ConformETL.
- source_rules
A list of rules for parsing directories. Default: [].
- Type:
Rules
- rename_rules
A list of rules for renaming source filepaths to target filepaths. Default: [].
- Type:
Rules
- group_rules
A list of rules for grouping files. Default: [].
- Type:
Rules
- line_rules
A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].
- Type:
Rules
- class GroupRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]
Bases:
Model
- name: StringType
- regex: StringType
- class LineRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]
Bases:
Model
- exclude: StringType
- group: StringType
- include: StringType
- regex: StringType
- replace: StringType
- class RenameRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]
Bases:
Model
- regex: StringType
- replace: StringType
- class SourceRule(raw_data=None, trusted_data=None, deserialize_mapping=None, init=True, partial=True, strict=True, validate=False, app_data=None, lazy=False, **kwargs)[source]
Bases:
Model
- exclude: StringType
- include: StringType
- path: StringType
- group_rules: ListType(ModelType)
- line_rules: ListType(ModelType)
- rename_rules: ListType(ModelType)
- source_rules: ListType(ModelType)
conform_etl
- class rolling_pin.conform_etl.ConformETL(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]
Bases:
object
ConformETL creates a DataFrame from a given directory of source files, then generates target paths from a set of rules. Calling the conform method copies the source files to their target filepaths.
- __init__(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]
Generates a DataFrame from the given source_rules, then generates target paths for those files using the other rules.
- Parameters:
source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].
- Raises:
DataError – If configuration is invalid.
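Here is a sketch of how rename and group rules might turn source paths into target paths and groups. The rule fields (`regex`, `replace`, `name`) come from ConformConfig's RenameRule and GroupRule. Several details are assumptions: rules are plain dicts, rename rules apply in sequence, the first matching group rule wins, and groups match against the source path. `apply_rename_rules` and `assign_group` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional

Rule = Dict[str, str]  # hypothetical stand-in for the schematics rule models


def apply_rename_rules(source: str, rename_rules: List[Rule]) -> str:
    """Derive a target path by applying each regex/replace rule in order."""
    target = source
    for rule in rename_rules:
        target = re.sub(rule["regex"], rule["replace"], target)
    return target


def assign_group(source: str, group_rules: List[Rule]) -> Optional[str]:
    """Return the name of the first group rule whose regex matches the path."""
    for rule in group_rules:
        if re.search(rule["regex"], source):
            return rule["name"]
    return None  # assumption: unmatched files get no group
```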
- __repr__()[source]
String representation of conform DataFrame.
- Returns:
Table optimized for output to shell.
- Return type:
str
- static _get_data(source_rules=[], rename_rules=[], group_rules=[], line_rules=[])[source]
Generates a DataFrame from the given source_rules, then generates target paths for those files using the other rules.
- Parameters:
source_rules (Rules) – A list of rules for parsing directories. Default: [].
rename_rules (Rules) – A list of rules for renaming source filepaths to target filepaths. Default: [].
group_rules (Rules) – A list of rules for grouping files. Default: [].
line_rules (Rules) – A list of rules for performing line copies and substitutions on files belonging to a given group. Default: [].
- Returns:
Conform DataFrame.
- Return type:
DataFrame
- conform(groups='all')[source]
Copies source files to target filepaths.
- Parameters:
groups (str or list[str]) – Groups of files which are to be conformed. ‘all’ means all groups. Default: ‘all’.
- Return type:
None
- classmethod from_yaml(filepath)[source]
Construct ConformETL instance from given yaml file.
- Parameters:
filepath (str or Path) – YAML file.
- Raises:
EnforceError – If file does not end in yml or yaml.
- Returns:
ConformETL instance.
- Return type:
ConformETL
- property groups: List[str]
List of groups found in self._data.
- Type:
list[str]
- to_blob()[source]
Converts self into a BlobETL object with target column as keys and source columns as values.
- Returns:
BlobETL of target and source filepaths.
- Return type:
BlobETL
- to_html(orient='lr', color_scheme={'background': '#242424', 'edge': '#DE958E', 'edge_library': '#B6ECF3', 'edge_module': '#DE958E', 'edge_subpackage': '#A0D17B', 'edge_value': '#B6ECF3', 'node': '#343434', 'node_font': '#DE958E', 'node_library_font': '#B6ECF3', 'node_module_font': '#DE958E', 'node_subpackage_font': '#A0D17B', 'node_value': '#343434', 'node_value_font': '#B6ECF3'}, as_png=False)[source]
For use in inline rendering of graph data in Jupyter Lab. Graph from target to source filepath. Target is in red, source is in cyan.
- Parameters:
orient (str, optional) –
Graph layout orientation. Default: lr. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
color_scheme (Dict[str, str], optional) – Color scheme to be applied to graph. Default: rolling_pin.conform_etl.CONFORM_COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns:
HTML object for inline display.
- Return type:
IPython.display.HTML
radon_etl
- class rolling_pin.radon_etl.RadonETL(fullpath)[source]
Bases:
object
Conforms all four radon reports (raw metrics, Halstead, maintainability and cyclomatic complexity) into a single DataFrame that can then be plotted.
- __init__(fullpath)[source]
Constructs a RadonETL instance.
- Parameters:
fullpath (str or Path) – Python file or directory of Python files.
- static _get_cyclomatic_complexity_dataframe(report)[source]
Converts radon cyclomatic complexity report into a pandas DataFrame.
- Parameters:
report (dict) – Radon report blob.
- Returns:
Cyclomatic complexity DataFrame.
- Return type:
DataFrame
- static _get_halstead_dataframe(report)[source]
Converts radon Halstead report into a pandas DataFrame.
- Parameters:
report (dict) – Radon report blob.
- Returns:
Halstead DataFrame.
- Return type:
DataFrame
- static _get_maintainability_index_dataframe(report)[source]
Converts radon maintainability index report into a pandas DataFrame.
- Parameters:
report (dict) – Radon report blob.
- Returns:
Maintainability DataFrame.
- Return type:
DataFrame
- _get_radon_data()[source]
Constructs a DataFrame representing all the radon reports generated for a given Python file or directory containing Python files.
- Returns:
Radon report DataFrame.
- Return type:
DataFrame
- static _get_radon_report(fullpath)[source]
Gets all four reports from radon and aggregates them into a single blob object.
- Parameters:
fullpath (str or Path) – Python file or directory of Python files.
- Returns:
Radon report blob.
- Return type:
dict
- static _get_raw_metrics_dataframe(report)[source]
Converts radon raw metrics report into a pandas DataFrame.
- Parameters:
report (dict) – Radon report blob.
- Returns:
Raw metrics DataFrame.
- Return type:
DataFrame
- property cyclomatic_complexity_metrics: DataFrame
DataFrame of radon cyclomatic complexity metrics.
- Type:
DataFrame
- property data: DataFrame
DataFrame of all radon metrics.
- Type:
DataFrame
- property halstead_metrics: DataFrame
DataFrame of radon Halstead metrics.
- Type:
DataFrame
- property maintainability_index: DataFrame
DataFrame of radon maintainability index metrics.
- Type:
DataFrame
- property raw_metrics: DataFrame
DataFrame of radon raw metrics.
- Type:
DataFrame
- property report: Dict
Dictionary of all radon metrics.
- Type:
dict
repo_etl
- class rolling_pin.repo_etl.RepoETL(root, include_regex='.*\\\\.py$', exclude_regex='(__init__|test_|_test|mock_)\\\\.py$')[source]
Bases:
object
RepoETL is a class for extracting first-order dependencies of modules within a given repository. This information is stored internally as a DataFrame and can be rendered as networkx, pydot or SVG graphs.
- __init__(root, include_regex='.*\\\\.py$', exclude_regex='(__init__|test_|_test|mock_)\\\\.py$')[source]
Constructs a RepoETL instance.
- Parameters:
root (str or Path) – Full path to repository root directory.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|test_|_test|mock_)\.py$’.
- Raises:
ValueError – If include or exclude regex does not end in ‘\.py$’.
- static _anneal_coordinate(data, anneal_axis='x', pin_axis='y', iterations=10)[source]
Iteratively aligns nodes in the anneal axis according to the mean position of their connected nodes. Node anneal coordinates are rectified at the end of each iteration according to a pin axis, so that they do not overlap. This means that they are sorted at each level of the pin axis.
- Parameters:
data (DataFrame) – DataFrame with x column.
anneal_axis (str, optional) – Coordinate column to be annealed. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.
iterations (int, optional) – Number of times to update x coordinates. Default: 10.
- Returns:
DataFrame with annealed anneal axis coordinates.
- Return type:
DataFrame
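Here is a dependency-free sketch of the annealing step described above. It assumes nodes are plain dicts with `x`, `y` and a list of linked node names. This is a hypothetical layout; the library works on a DataFrame. Each iteration proposes the mean x of a node's links. It then rectifies each y level by sorting on the proposal and assigning integer slots 0, 1, 2, ..., and that slot spacing is an assumption. `anneal_x` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical node record: {"x": float, "y": int, "links": [names of connected nodes]}
Nodes = Dict[str, dict]


def anneal_x(nodes: Nodes, iterations: int = 10) -> Nodes:
    """Pull nodes toward the mean x of their links, then de-overlap per y level."""
    for _ in range(iterations):
        # 1. Propose a new x for every node from the current positions.
        proposed: Dict[str, float] = {}
        for name, node in nodes.items():
            linked = [nodes[other]["x"] for other in node["links"] if other in nodes]
            proposed[name] = mean(linked) if linked else node["x"]
        # 2. Rectify: at each y level (the pin axis), sort by proposed x and
        #    assign distinct integer slots so nodes never overlap.
        levels: Dict[int, List[str]] = defaultdict(list)
        for name, node in nodes.items():
            levels[node["y"]].append(name)
        for names in levels.values():
            ordered = sorted(names, key=lambda n: (proposed[n], n))
            for slot, name in enumerate(ordered):
                nodes[name]["x"] = float(slot)
    return nodes
```

Every proposal is computed from the previous iteration's positions. When two levels link to each other, this simultaneous update can make nodes swap back and forth. The sketch therefore converges most cleanly when one level has no links of its own, such as external libraries with no dependencies.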
- static _calculate_coordinates(data)[source]
Calculates initial x, y coordinates for each node in the given DataFrame. Nodes are stratified by type along the y axis.
- Parameters:
data (DataFrame) – DataFrame of nodes.
- Returns:
DataFrame with x and y coordinate columns.
- Return type:
DataFrame
- static _center_coordinate(data, center_axis='x', pin_axis='y')[source]
Sorts and centers center_axis coordinates at each level of the pin axis.
- Parameters:
data (DataFrame) – DataFrame with x column.
center_axis (str, optional) – Coordinate column to be centered. Default: ‘x’.
pin_axis (str, optional) – Coordinate column to be held constant. Default: ‘y’.
- Returns:
DataFrame with centered center axis coordinates.
- Return type:
DataFrame
- static _get_data(root, include_regex='.*\\\\.py$', exclude_regex='(__init__|_test)\\\\.py$')[source]
Recursively aggregates and filters all the files found within a given directory into a DataFrame. Data is used to create directed graphs.
DataFrame has these columns:
node_name - name of node
node_type - type of node, can be [module, subpackage, library]
x - node’s x coordinate
y - node’s y coordinate
dependencies - parent nodes
subpackages - parent nodes of type subpackage
fullpath - fullpath to the module a node represents
- Parameters:
root (str or Path) – Root directory to be searched.
include_regex (str, optional) – Files to be included in recursive directory search. Default: ‘.*\.py$’.
exclude_regex (str, optional) – Files to be excluded in recursive directory search. Default: ‘(__init__|_test)\.py$’.
- Raises:
ValueError – If include or exclude regex does not end in ‘\.py$’.
FileNotFoundError – If no files are found after filtering.
- Returns:
DataFrame of file information.
- Return type:
DataFrame
- static _get_imports(fullpath)[source]
Gets import statements from a given Python module.
- Parameters:
fullpath (str or Path) – Path to Python module.
- Returns:
List of imported modules.
- Return type:
list(str)
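One way to collect a module's imports uses the standard-library `ast` module. `get_imports_sketch` is hypothetical. It takes source text rather than a path so it can run without touching disk; read the file first with `Path(fullpath).read_text()` to match the documented parameter. The library may extract imports differently.

```python
import ast
from typing import List


def get_imports_sketch(source_code: str) -> List[str]:
    """Return the sorted, de-duplicated module names imported by source_code."""
    modules: List[str] = []
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, ast.Import):
            # import a, b.c as d  ->  "a", "b.c"
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # from x.y import z  ->  "x.y"; "from . import z" has no module and is skipped
            modules.append(node.module)
    return sorted(set(modules))
```

`ast.walk` visits every node, so imports inside functions are included too.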
- static _to_networkx_graph(data, escape_chars=False)[source]
Converts given DataFrame into networkx directed graph.
- Parameters:
data (DataFrame) – DataFrame of nodes.
escape_chars (bool, optional) – Escape special characters. Used to avoid dot file errors. Default: False.
- Returns:
Graph of nodes.
- Return type:
networkx.DiGraph
- to_dataframe()[source]
- Returns:
DataFrame of nodes representing repo modules.
- Return type:
DataFrame
- to_dot_graph(orient='tb', orthogonal_edges=False, color_scheme=None)[source]
Converts internal data into pydot graph.
- Parameters:
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
color_scheme (dict, optional) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises:
ValueError – If orient is invalid.
- Returns:
Dot graph of nodes.
- Return type:
pydot.Dot
- to_html(layout='dot', orthogonal_edges=False, color_scheme=None, as_png=False)[source]
For use in inline rendering of graph data in Jupyter Lab.
- Parameters:
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
color_scheme (Optional[Dict[str, str]]) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Returns:
HTML object for inline display.
- Return type:
IPython.display.HTML
- to_networkx_graph()[source]
Converts internal data into networkx directed graph.
- Returns:
Graph of nodes.
- Return type:
networkx.DiGraph
- write(fullpath, layout='dot', orient='tb', orthogonal_edges=False, color_scheme=None)[source]
Writes internal data to a given filepath. Formats supported: svg, dot, png, json.
- Parameters:
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
orient (str, optional) –
Graph layout orientation. Default: tb. Options include:
tb - top to bottom
bt - bottom to top
lr - left to right
rl - right to left
orthogonal_edges (bool, optional) – Whether graph edges should be drawn with right angles. Default: False.
color_scheme (Optional[Dict[str, str]]) – Color scheme to be applied to graph. Default: rolling_pin.tools.COLOR_SCHEME
- Raises:
ValueError – If invalid file extension given.
- Returns:
Self.
- Return type:
RepoETL
toml_etl
- class rolling_pin.toml_etl.TomlETL(data)[source]
Bases:
object
- __init__(data)[source]
Creates a TomlETL instance from a given dictionary.
- Parameters:
data (dict) – Dictionary.
- delete(regex)[source]
Returns the portion of data whose keys do not match a given regular expression.
- Parameters:
regex (str) – Regular expression applied to keys.
- Returns:
New TomlETL instance.
- Return type:
TomlETL
- edit(patch)[source]
Applies an edit to the internal data given a TOML patch. The patch is always of the form ‘[key]=[value]’ and must be valid TOML.
- Parameters:
patch (str) – TOML patch to be applied.
- Raises:
TOMLDecoderError – If patch cannot be decoded.
EnforceError – If ‘=’ not found in patch.
- Returns:
New TomlETL instance with edits.
- Return type:
TomlETL
- classmethod from_string(text)[source]
Creates a TomlETL instance from a given TOML string.
- Parameters:
text (str) – TOML string.
- Returns:
TomlETL instance.
- Return type:
TomlETL
- classmethod from_toml(filepath)[source]
Creates a TomlETL instance from a given TOML file.
- Parameters:
filepath (str or Path) – TOML file.
- Returns:
TomlETL instance.
- Return type:
TomlETL
- search(regex)[source]
Returns the portion of data whose keys match a given regular expression.
- Parameters:
regex (str) – Regular expression applied to keys.
- Returns:
New TomlETL instance.
- Return type:
TomlETL
- to_dict()[source]
Converts instance to dictionary copy.
- Returns:
Dictionary copy of instance.
- Return type:
dict
tools
Contains basic functions for more complex ETL functions and classes.
- rolling_pin.tools.LOGGER = <Logger rolling_pin.tools (WARNING)>
- rolling_pin.tools.copy_file(source, target)[source]
Copies a source file to a target file, creating directories as needed.
- Parameters:
source (str or Path) – Source filepath.
target (str or Path) – Target filepath.
- Raises:
AssertionError – If source is not a file.
- Return type:
None
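Here is a minimal equivalent built on `shutil.copy2` and `Path.mkdir(parents=True, exist_ok=True)`. `copy_file_sketch` is a hypothetical name. It raises `AssertionError` explicitly, matching the documented exception, instead of using an `assert` statement, which Python strips when run with `-O`.

```python
import shutil
from pathlib import Path
from typing import Union


def copy_file_sketch(source: Union[str, Path], target: Union[str, Path]) -> None:
    """Copy source to target, creating any missing parent directories."""
    source, target = Path(source), Path(target)
    if not source.is_file():
        raise AssertionError(f"{source} is not a file.")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies contents and metadata
```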
- rolling_pin.tools.directory_to_dataframe(directory, include_regex='', exclude_regex='\\\\.DS_Store')[source]
Recursively lists files within a given directory as rows in a pd.DataFrame.
- Parameters:
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: ‘’.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: ‘\.DS_Store’.
- Returns:
pd.DataFrame with one file per row.
- Return type:
pd.DataFrame
- rolling_pin.tools.dot_to_html(dot, layout='dot', as_png=False)[source]
Converts a given pydot graph into an IPython.display.HTML object. Used for inline display of graph data in Jupyter Lab.
- Parameters:
dot (pydot.Dot) – Pydot Graph instance.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
as_png (bool, optional) – Display graph as a PNG image instead of SVG. Useful for display on GitHub. Default: False.
- Raises:
ValueError – If invalid layout given.
- Returns:
HTML instance.
- Return type:
IPython.display.HTML
- rolling_pin.tools.filter_text(text, include_regex=None, exclude_regex=None, replace_regex=None, replace_value=None)[source]
Filter given text by applying regular expressions to each line.
- Parameters:
text (str) – Newline separated lines.
include_regex (str, optional) – Keep lines that match given regex. Default: None.
exclude_regex (str, optional) – Remove lines that match given regex. Default: None.
replace_regex (str, optional) – Substitutes regex matches in lines with replace_value. Default: None.
replace_value (str, optional) – Regex substitution value. Default: ‘’.
- Raises:
AssertionError – If source is not a file.
- Returns:
Filtered text.
- Return type:
str
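Here is a line-by-line sketch of the same filtering. `filter_text_sketch` is hypothetical, and applying include, then exclude, then replace in that order is an assumption. These parameters line up with the `include`, `exclude`, `regex` and `replace` fields of ConformConfig's LineRule, though this section does not say how ConformETL connects them.

```python
import re
from typing import Optional


def filter_text_sketch(
    text: str,
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    replace_regex: Optional[str] = None,
    replace_value: Optional[str] = None,
) -> str:
    """Keep, drop and rewrite newline-separated lines with regular expressions."""
    lines = text.split("\n")
    if include_regex is not None:
        lines = [line for line in lines if re.search(include_regex, line)]
    if exclude_regex is not None:
        lines = [line for line in lines if not re.search(exclude_regex, line)]
    if replace_regex is not None:
        value = "" if replace_value is None else replace_value  # documented default: ''
        lines = [re.sub(replace_regex, value, line) for line in lines]
    return "\n".join(lines)
```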
- rolling_pin.tools.flatten(item, separator='/', embed_types=True)[source]
Flattens an iterable object into a flat dictionary.
- Parameters:
item (object) – Iterable object.
separator (str, optional) – Field separator in keys. Default: ‘/’.
- Returns:
Dictionary representation of given object.
- Return type:
dict
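The flattening idea can be sketched as a recursive walk that joins dict keys and list indices with the separator. flatten_sketch is hypothetical and simplified: it does not reproduce the embedded type information that the real flatten presumably writes into keys when embed_types=True.

```python
from typing import Any, Dict


def flatten_sketch(item: Any, separator: str = '/') -> Dict[str, Any]:
    # Hypothetical sketch: no embedded type markers are added to keys.
    output: Dict[str, Any] = {}

    def walk(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            pairs = list(value.items())
        elif isinstance(value, list):
            # List positions become key fields.
            pairs = list(enumerate(value))
        else:
            # Leaf value: store it under the accumulated key.
            output[prefix] = value
            return
        for key, child in pairs:
            path = f'{prefix}{separator}{key}' if prefix else str(key)
            walk(child, path)

    walk(item, '')
    return output


print(flatten_sketch({'a': {'b': 1, 'c': [2, 3]}}))
# {'a/b': 1, 'a/c/0': 2, 'a/c/1': 3}
```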
- rolling_pin.tools.get_parent_fields(key, separator='/')[source]
Get all the parent fields of a given key, split by given separator.
- Parameters:
key (str) – Key.
separator (str, optional) – String that splits key into fields. Default: ‘/’.
- Returns:
List of absolute parent fields.
- Return type:
list(str)
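A minimal sketch of what "absolute parent fields" likely means: every ancestor path of the key, from the root down, excluding the key itself. get_parent_fields_sketch is hypothetical, and the exclusion of the full key is an assumption.

```python
from typing import List


def get_parent_fields_sketch(key: str, separator: str = '/') -> List[str]:
    # Hypothetical sketch: join successively longer prefixes of the split
    # key; the full key itself is assumed not to count as a parent.
    fields = key.split(separator)
    return [separator.join(fields[:i]) for i in range(1, len(fields))]


print(get_parent_fields_sketch('a/b/c'))
# ['a', 'a/b']
```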
- rolling_pin.tools.is_dictlike(item)[source]
Determines if given item is dict-like.
- Parameters:
item (object) – Object to be tested.
- Returns:
Whether given item is dict-like.
- Return type:
bool
- rolling_pin.tools.is_iterable(item)[source]
Determines if given item is iterable.
- Parameters:
item (object) – Object to be tested.
- Returns:
Whether given item is iterable.
- Return type:
bool
- rolling_pin.tools.is_listlike(item)[source]
Determines if given item is list-like.
- Parameters:
item (object) – Object to be tested.
- Returns:
Whether given item is list-like.
- Return type:
bool
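The three predicates above can be approximated with collections.abc and iter(). These *_sketch functions are hypothetical; in particular, treating strings and bytes as iterable but not list-like, and mappings as not list-like, are assumptions rather than documented behavior.

```python
from collections.abc import Mapping
from typing import Any


def is_iterable_sketch(item: Any) -> bool:
    # An object counts as iterable if iter() accepts it.
    try:
        iter(item)
    except TypeError:
        return False
    return True


def is_dictlike_sketch(item: Any) -> bool:
    # Assumption: "dict-like" means implementing the Mapping interface.
    return isinstance(item, Mapping)


def is_listlike_sketch(item: Any) -> bool:
    # Assumption: "list-like" means iterable, but neither a mapping nor a
    # str/bytes object.
    if isinstance(item, (str, bytes, Mapping)):
        return False
    return is_iterable_sketch(item)
```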
- rolling_pin.tools.list_all_files(directory, include_regex=None, exclude_regex=None)[source]
Recursively list all files within a given directory.
- Parameters:
directory (str or Path) – Directory to walk.
include_regex (str, optional) – Include filenames that match this regex. Default: None.
exclude_regex (str, optional) – Exclude filenames that match this regex. Default: None.
- Raises:
FileNotFoundError – If argument is not a directory or does not exist.
- Yields:
Path – File.
- Return type:
Generator[Path, None, None]
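A recursive file listing with include and exclude regexes can be sketched with pathlib. list_all_files_sketch is hypothetical; it assumes the regexes are searched against the full POSIX path string rather than the bare filename.

```python
import re
import tempfile
from pathlib import Path
from typing import Generator, Optional, Union


def list_all_files_sketch(
    directory: Union[str, Path],
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
) -> Generator[Path, None, None]:
    directory = Path(directory)
    # Mirrors the documented FileNotFoundError. Because this is a
    # generator, the check runs on the first next() call.
    if not directory.is_dir():
        raise FileNotFoundError(f'{directory} is not a directory or does not exist.')

    for path in sorted(directory.rglob('*')):
        if not path.is_file():
            continue
        # Assumption: regexes are searched against the full path string.
        filepath = path.as_posix()
        if include_regex is not None and not re.search(include_regex, filepath):
            continue
        if exclude_regex is not None and re.search(exclude_regex, filepath):
            continue
        yield path


# Usage: build a throwaway tree and list only the .txt files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'sub').mkdir()
    Path(tmp, 'sub', 'a.txt').write_text('a')
    Path(tmp, 'b.log').write_text('b')
    found = [p.name for p in list_all_files_sketch(tmp, include_regex=r'\.txt$')]
print(found)  # ['a.txt']
```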
- rolling_pin.tools.move_file(source, target)[source]
Moves a source file to a target file, creating directories as needed.
- Parameters:
source (str or Path) – Source filepath.
target (str or Path) – Target filepath.
- Raises:
AssertionError – If source is not a file.
- Return type:
None
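The move-with-parent-creation behavior can be sketched with pathlib and shutil. move_file_sketch is hypothetical; it mirrors the documented AssertionError with an assert statement (note that asserts are skipped when Python runs with -O).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union


def move_file_sketch(source: Union[str, Path], target: Union[str, Path]) -> None:
    source = Path(source)
    target = Path(target)
    # Mirrors the documented AssertionError when source is not a file.
    assert source.is_file(), f'{source} is not a file.'
    # Create any missing parent directories of the target.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'a.txt')
    src.write_text('hello')
    dst = Path(tmp, 'x', 'y', 'a.txt')
    move_file_sketch(src, dst)
    moved = (not src.exists(), dst.read_text())
print(moved)  # (True, 'hello')
```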
- rolling_pin.tools.nest(flat_dict, separator='/')[source]
Converts a flat dictionary into a nested dictionary by splitting keys by a given separator.
- Parameters:
flat_dict (dict) – Flat dictionary.
separator (str, optional) – Field separator within given dictionary’s keys. Default: ‘/’.
- Returns:
Nested dictionary.
- Return type:
dict
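Nesting reverses the flat-key scheme: each key is split on the separator and intermediate dictionaries are created as needed. nest_sketch is hypothetical and simplified: every level becomes a dict, and it does not attempt to rebuild lists from index fields.

```python
from typing import Any, Dict


def nest_sketch(flat_dict: Dict[str, Any], separator: str = '/') -> Dict[str, Any]:
    # Hypothetical sketch: all levels become dicts; lists are not rebuilt.
    output: Dict[str, Any] = {}
    for key, value in flat_dict.items():
        *parents, leaf = key.split(separator)
        cursor = output
        # Walk down, creating intermediate dicts on demand.
        for field in parents:
            cursor = cursor.setdefault(field, {})
        cursor[leaf] = value
    return output


print(nest_sketch({'a/b': 1, 'a/c': 2, 'd': 3}))
# {'a': {'b': 1, 'c': 2}, 'd': 3}
```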
- rolling_pin.tools.read_text(filepath)[source]
Convenience function for reading text from given file.
- Parameters:
filepath (str or Path) – File to be read.
- Raises:
AssertionError – If filepath is not a file.
- Returns:
Text read from the file.
- Return type:
str
- rolling_pin.tools.replace_and_format(regex, replace, string, flags=0)[source]
Perform a regex substitution on a given string and format any named groups found in the result with groupdict data from the pattern. Groups beginning with ‘i’ will be converted to integers. Groups beginning with ‘f’ will be converted to floats.
Named group anatomy:
(?P<NAME>PATTERN)
NAME becomes a key and whatever matches PATTERN becomes its value.
>>> re.search(r'(?P<i>\d+)', 'foobar123').groupdict()
{'i': '123'}
Examples:
- Special groups:
(?P<i>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f>\d) - string matched by ‘\d’ will be converted to a float
(?P<i_foo>\d) - string matched by ‘\d’ will be converted to an integer
(?P<f_bar>\d) - string matched by ‘\d’ will be converted to a float
- Named groups (long):
>>> proj = r'(?P<p>[a-z0-9]+)'
>>> spec = r'(?P<s>[a-z0-9]+)'
>>> desc = r'(?P<d>[a-z0-9\-]+)'
>>> ver = r'(?P<iv>\d+)'
>>> frame = r'(?P<i_f>\d+)'
>>> regex = rf'{proj}\.{spec}\.{desc}\.v{ver}\.{frame}.*'
>>> replace = 'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg'
>>> string = 'proj.spec.desc.v1.25.png'
>>> replace_and_format(regex, replace, string, flags=re.IGNORECASE)
'p-proj_s-spec_d-desc_v001_f0025.jpeg'
- Named groups (short):
>>> replace_and_format(
...     r'(?P<p>[a-z0-9]+)\.(?P<s>[a-z0-9]+)\.(?P<d>[a-z0-9\-]+)\.v(?P<iv>\d+)\.(?P<i_f>\d+).*',
...     'p-{p}_s-{s}_d-{d}_v{iv:03d}_f{i_f:04d}.jpeg',
...     'proj.spec.desc.v1.25.png',
... )
'p-proj_s-spec_d-desc_v001_f0025.jpeg'
- No groups:
>>> replace_and_format('foo', 'bar', 'foobar')
'barbar'
- Parameters:
regex (str) – Regex pattern to search string with.
replace (str) – Replacement string which may contain format variables, i.e. ‘{variable}’.
string (str) – String to be converted.
flags (object, optional) – re.sub flags. Default: 0.
- Returns:
Converted string.
- Return type:
str
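The described behavior can be sketched in four steps: a plain re.sub, a re.search to collect named groups, prefix-based casting, and str.format on the result. replace_and_format_sketch is hypothetical and not rolling_pin's code; it relies on the fact that braces are not special in re.sub replacement strings, so '{name}' fields survive the substitution untouched.

```python
import re
from typing import Any, Dict


def replace_and_format_sketch(regex: str, replace: str, string: str, flags: Any = 0) -> str:
    # Step 1: plain regex substitution; '{name}' fields pass through.
    result = re.sub(regex, replace, string, flags=flags)

    # Step 2: collect named groups from the first match, if any.
    match = re.search(regex, string, flags=flags)
    groups: Dict[str, Any] = match.groupdict() if match else {}
    if not groups:
        return result

    # Step 3: cast by name prefix, 'i' -> int and 'f' -> float.
    for name, value in groups.items():
        if value is None:
            continue
        if name.startswith('i'):
            groups[name] = int(value)
        elif name.startswith('f'):
            groups[name] = float(value)

    # Step 4: fill the format fields with the converted group values.
    return result.format(**groups)


print(replace_and_format_sketch(r'(?P<f>\d+\.\d+)', '{f:.1f}', '3.14'))  # 3.1
```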
- rolling_pin.tools.unembed(item)[source]
Convert embedded types in dictionary keys into Python types.
- Parameters:
item (object) – Dictionary with embedded types.
- Returns:
Converted object.
- Return type:
object
- rolling_pin.tools.write_dot_graph(dot, fullpath, layout='dot')[source]
Writes a pydot.Dot object to a given filepath. Formats supported: svg, dot, png.
- Parameters:
dot (pydot.Dot) – Pydot Dot instance.
fullpath (str or Path) – File to be written to.
layout (str, optional) – Graph layout style. Options include: circo, dot, fdp, neato, sfdp, twopi. Default: dot.
- Raises:
ValueError – If invalid file extension given.
- Return type:
None