core package¶

Submodules¶

core.image_scanner module¶

contains the ImageScanner class which is used for scanning images

class core.image_scanner.ImageScanner(image, min_resolution=(100, 100), max_resolution=(200, 200), patch_resolution=None, resample=0, rotation=None, **kwargs)[source]¶

Bases: object

Used for scanning images and producing image patches through various techniques

get_resolutions(num=10, spacing='even')[source]¶

generates a list of patch resolutions

Parameters:	num (opt) – number of resolutions returned; default: 10
	spacing (opt) – spacing between resolution sizes; options include: ‘even’, ‘random’; default: ‘even’
Yields:	tuple – (x, y) resolution
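
A minimal sketch of how such a generator could behave, assuming ‘even’ means linearly spaced steps between the minimum and maximum resolution and ‘random’ means uniform draws within that range. The name sketch_resolutions and its seed argument are hypothetical; the actual ImageScanner implementation may differ.

```python
import random


def sketch_resolutions(min_resolution=(100, 100), max_resolution=(200, 200),
                       num=10, spacing='even', seed=None):
    """Yield `num` (x, y) patch resolutions between the two bounds (hypothetical helper)."""
    (min_x, min_y), (max_x, max_y) = min_resolution, max_resolution
    rng = random.Random(seed)
    for i in range(num):
        if spacing == 'even':
            # fraction of the way from the minimum to the maximum resolution
            t = i / (num - 1) if num > 1 else 0.0
        elif spacing == 'random':
            # assumption: x and y share one random fraction
            t = rng.random()
        else:
            raise ValueError("spacing must be 'even' or 'random'")
        yield (round(min_x + t * (max_x - min_x)),
               round(min_y + t * (max_y - min_y)))


resolutions = list(sketch_resolutions(num=3))
# resolutions == [(100, 100), (150, 150), (200, 200)]
```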

grid_scan(resolutions=10, spacing='even', **kwargs)[source]¶

scans entire image in a grid-like fashion

Parameters:	resolutions (opt) – number of sampling patch resolutions to use; a single grid produces multiple patches (image / sampling resolution); default: 10
	spacing (opt) – spacing between resolution sizes; options include: ‘even’, ‘random’; default: ‘even’
Yields:	PIL.Image – cropped (resized and/or rotated) patch
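
The grid idea can be illustrated without any imaging library by computing crop boxes. This is only a sketch: sketch_grid_boxes is a hypothetical helper, and it assumes non-overlapping tiles with partial tiles at the edges skipped, which may not match what grid_scan does.

```python
def sketch_grid_boxes(image_size, patch_size):
    """Yield (left, upper, right, lower) boxes tiling an image (hypothetical helper).

    Partial tiles at the right and bottom edges are skipped.
    """
    width, height = image_size
    patch_w, patch_h = patch_size
    for top in range(0, height - patch_h + 1, patch_h):
        for left in range(0, width - patch_w + 1, patch_w):
            yield (left, top, left + patch_w, top + patch_h)


# a 400x300 image tiled with 100x100 patches gives a 4x3 grid
boxes = list(sketch_grid_boxes((400, 300), (100, 100)))
```

Each box uses the (left, upper, right, lower) order that PIL.Image.crop accepts, so a patch could be cut with image.crop(box).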

random_scan(patches=100, **kwargs)[source]¶

generates patches of random sample size and location from image

Parameters:	patches (opt) – number of patches returned; default: 100
Yields:	PIL.Image – cropped (resized and/or rotated) patch
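
A rough sketch of random sampling, again expressed as crop boxes. The helper sketch_random_boxes and its seed argument are hypothetical, and it assumes the image is at least as large as min_resolution; the real method may choose sizes and positions differently.

```python
import random


def sketch_random_boxes(image_size, min_resolution=(100, 100),
                        max_resolution=(200, 200), patches=100, seed=None):
    """Yield `patches` random (left, upper, right, lower) boxes (hypothetical helper)."""
    width, height = image_size
    rng = random.Random(seed)
    for _ in range(patches):
        # random patch size, clamped so the patch always fits inside the image
        patch_w = rng.randint(min_resolution[0], min(max_resolution[0], width))
        patch_h = rng.randint(min_resolution[1], min(max_resolution[1], height))
        # random top-left corner such that the patch stays within bounds
        left = rng.randint(0, width - patch_w)
        top = rng.randint(0, height - patch_h)
        yield (left, top, left + patch_w, top + patch_h)


random_boxes = list(sketch_random_boxes((640, 480), patches=5, seed=0))
```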

core.pipeline module¶

a library of core functions which define the image processing/learning pipeline

core.pipeline.get_info(source, spec=['name', 'extension'], sep=None, ignore=['\\.DS_Store'])[source]¶

creates a descriptive DataFrame based upon files contained within a source directory

Parameters:	source (str) – full path to directory of files
	spec (opt) – naming specification of files; default: [‘name’, ‘extension’]
	sep (opt) – regular expression with which to separate filename components; recommended value: ‘.’; default: None (split name from extension)
	ignore (opt) – list of regex patterns used for ignoring files; default: [‘\\.DS_Store’]
Returns:	an info DataFrame
Return type:	DataFrame
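
To make the roles of spec, sep and ignore concrete, here is a small sketch that parses a list of filenames into dicts. It is not the library's code: sketch_parse_names is a hypothetical helper, it works on names instead of reading a directory, and stripping the dot from the extension is an assumption.

```python
import os
import re


def sketch_parse_names(filenames, spec=('name', 'extension'), sep=None,
                       ignore=(r'\.DS_Store',)):
    """Return one dict per file, keyed by `spec` (hypothetical helper)."""
    rows = []
    for filename in filenames:
        # skip files matching any ignore pattern
        if any(re.search(pattern, filename) for pattern in ignore):
            continue
        if sep is None:
            # default: split the name from its extension
            name, extension = os.path.splitext(filename)
            parts = [name, extension.lstrip('.')]
        else:
            # split the filename into components on the separator regex
            parts = re.split(sep, filename)
        rows.append(dict(zip(spec, parts)))
    return rows


rows = sketch_parse_names(['cat_001.jpg', '.DS_Store', 'dog_002.png'])
labelled = sketch_parse_names(['cat_001.jpg'],
                              spec=['label', 'id', 'extension'], sep=r'[_.]')
```

A list of dicts like this can be handed to pandas.DataFrame to obtain a table with one column per spec entry.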

core.pipeline.info_split(info, test_size=0.2)[source]¶

split info object by rows into train and test objects

Parameters:	info (DataFrame) – info object to split
	test_size (opt) – fraction of info rows that will be allocated to the test object; default: 0.2
Returns:	train, test
Return type:	2 DataFrames
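
The row-wise split can be sketched on a plain list, which keeps the example free of dependencies. sketch_split_rows and its seed argument are hypothetical, and shuffling before splitting is an assumption about how the rows are chosen.

```python
import random


def sketch_split_rows(rows, test_size=0.2, seed=None):
    """Split rows into (train, test) by shuffled index (hypothetical helper)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # number of rows allocated to the test set
    n_test = int(round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


train, test = sketch_split_rows(list(range(10)), test_size=0.2, seed=1)
```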

core.pipeline.process_data(info, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])[source]¶

processes images listed in a given info object into usable data

Parameters:	info (DataFrame) – info object containing ‘source’, ‘label’ and ‘params’ columns
	features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_std’, ‘fft_max’]
Returns:	processed image data
Return type:	DataFrame

core.pipeline.get_data(info, hdf_path=None, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])[source]¶

generates machine-learning-ready data from an info object

Parameters:	info (DataFrame) – info object containing ‘source’, ‘label’ and ‘params’ columns
	hdf_path (opt) – full path of the file in which to store generated data; default: None
	multiprocess (opt) – use multiprocessing; default: True
	processes (opt) – number of processes to employ for multiprocessing; default: 24
	features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_std’, ‘fft_max’]
Returns:	machine-learning-ready data
Return type:	DataFrame

core.pipeline.compile_predictions(pred)[source]¶

groups predictions made on patches of an image into a set of labels and confidences

Parameters:	pred (array-like) – output from call to [some sklearn model].predict
Returns:	compiled predictions
Return type:	DataFrame
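
One plausible way to turn per-patch predictions into labels with confidences is a simple vote count. This is a sketch under that assumption; sketch_compile is a hypothetical helper, and the real function may weight or format its output differently.

```python
from collections import Counter


def sketch_compile(pred):
    """Return (label, confidence) pairs, most frequent label first (hypothetical helper)."""
    counts = Counter(pred)
    total = sum(counts.values())
    # confidence is taken to be the share of patches voting for each label
    return [(label, count / total) for label, count in counts.most_common()]


compiled = sketch_compile(['cat', 'cat', 'dog', 'cat'])
# compiled == [('cat', 0.75), ('dog', 0.25)]
```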

core.pipeline.archive_data(train_info, test_info, hdf_path, cross_val=True, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_max', 'fft_std'])[source]¶

convenience function for archiving train, validation and test data

Parameters:	train_info (DataFrame) – info object to use for training
	test_info (DataFrame) – info object to use for testing
	hdf_path (str) – full path of the file in which to store data
	cross_val (opt) – use cross validation; default: True
	multiprocess (opt) – use multiprocessing; default: True
	processes (opt) – number of processes to employ for multiprocessing; default: 24
	features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_max’, ‘fft_std’]
Returns:	train_x, valid_x, test_x, train_y, valid_y, test_y (train_x, test_x, train_y, test_y if cross_val=False)
Return type:	DataFrames

core.pipeline.read_archive(hdf_path, items=['train_x', 'valid_x', 'test_x', 'train_y', 'valid_y', 'test_y'])[source]¶

convenience function used for retrieving data from an HDF archive

Parameters:	hdf_path (str) – full path of the file in which data is stored
	items (opt) – items to be retrieved; default: [‘train_x’, ‘valid_x’, ‘test_x’, ‘train_y’, ‘valid_y’, ‘test_y’]

core.utils module¶

A utilities library for various io/data aggregation tasks

core.utils.get_report(y_true, y_pred)[source]¶

returns a classification report as a DataFrame, rather than as text

Parameters:	y_true (array-like) – list of true labels
	y_pred (array-like) – list of predicted labels
Returns:	classification report
Return type:	DataFrame

core.utils.pil_to_opencv(image)[source]¶

converts PIL.Image into cv2 image

Parameters:	image (PIL.Image) – pillow image
Returns:	opencv image (object is in BGR color space)
Return type:	cv2 image
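
The difference between the two libraries' conventions comes down to channel order: Pillow images are normally RGB, while OpenCV works in BGR. The sketch below shows the reordering on nested lists of pixel tuples. sketch_swap_channels is a hypothetical helper, not part of core.utils.

```python
def sketch_swap_channels(pixels):
    """Reverse the channel order of every pixel: RGB -> BGR or BGR -> RGB."""
    return [[tuple(reversed(pixel)) for pixel in row] for row in pixels]


rgb = [[(255, 0, 0), (0, 255, 0)]]   # one row: a red and a green pixel
bgr = sketch_swap_channels(rgb)
# bgr == [[(0, 0, 255), (0, 255, 0)]]
```

With NumPy arrays of shape (height, width, 3), the same reordering is commonly written as array[:, :, ::-1].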

core.utils.opencv_to_pil(image)[source]¶

converts cv2 image into PIL.Image

Parameters:	image (cv2 image) – cv2 image
Returns:	pillow image (object is in BGR color space)
Return type:	PIL.Image

core.utils.generate_samples(image, label, params)[source]¶

convenience function for generating samples from a provided image along with its label and parameters

Parameters:	image (PIL.Image) – pillow image
	label (str) – image label
	params (dict) – params to provide to ImageScanner
Returns:	matrix of patches
Return type:	list

core.utils.get_channel_histogram(image, channel, bins=256, normalize=False, **kwargs)[source]¶

generates frequency data for a given channel of a provided image

Parameters:	image (cv2 image) – opencv image to be processed
	channel (str) – color channel to be processed; acceptable values: r, g, b, h, s, v
	bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
	normalize (opt) – normalize histogram data; default: False
Returns:	raveled array
Return type:	numpy.array
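
The effect of bins and normalize can be shown on a flat list of channel values. This is a sketch only: sketch_histogram and its value_range argument are hypothetical, and normalizing so that the bins sum to 1 is an assumption about what normalize means here.

```python
def sketch_histogram(values, bins=256, normalize=False, value_range=(0, 256)):
    """Count channel values into equal-width bins (hypothetical helper)."""
    low, high = value_range
    bin_width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value < high:
            # guard against floating-point rounding at the upper edge
            index = min(int((value - low) / bin_width), bins - 1)
            counts[index] += 1
    if normalize:
        total = sum(counts)
        return [count / total for count in counts] if total else [0.0] * bins
    return counts


hist = sketch_histogram([0, 0, 64, 128, 255], bins=4)
# hist == [2, 1, 1, 1]
```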

core.utils.create_histogram_stats(data, chan_data, channel)[source]¶

convenience function for appending statistics based upon provided histogram data to data

Parameters:	data (DataFrame) – data to be appended to
	chan_data (DataFrame) – channel histogram data
	channel (str) – name of channel
Returns:	None
Return type:	None

core.utils.get_histograms(image, bins=256, normalize=False, colorspace='rgb')[source]¶

generates histogram data for each channel of an image

Parameters:	image (cv2 image) – opencv image to be processed
	bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
	normalize (opt) – normalize histogram data; default: False
	colorspace (opt) – colorspace of provided image; acceptable values: ‘rgb’, ‘hsv’; default: ‘rgb’
Returns:	dict of channel histograms
Return type:	dict

core.utils.plot_channel_histogram(image, channel, bins=256, normalize=False)[source]¶

plots a histogram of a channel of a provided image

Parameters:	image (cv2 image) – opencv image to be processed
	channel (str) – color channel
	bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
	normalize (opt) – normalize histogram data; default: False
Returns:	None
Return type:	None

core.utils.plot_histograms(image, bins=256, normalize=False)[source]¶

plots a histogram of all channels of a provided image

Parameters:	image (cv2 image) – opencv image to be processed
	bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
	normalize (opt) – normalize histogram data; default: False
Returns:	None
Return type:	None

core.utils.execute_python_subshells(script, iterable)[source]¶

a simple, hacky workaround for multiprocessing’s bugginess; executes a new python subshell per item

Parameters:	script (str) – full path of python script to run (check /bin)
	iterable (iter) – list of arguments to provide to each call
Returns:	None
Return type:	None
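
The subshell approach can be sketched with the standard library's subprocess module. The helper below is hypothetical: it assumes each item is passed as a single command-line argument and, unlike the documented function, returns the exit codes so the result can be inspected.

```python
import subprocess
import sys


def sketch_run_subshells(script, iterable):
    """Run `script` once per item in a fresh interpreter; return the exit codes."""
    return_codes = []
    for item in iterable:
        # each call starts a separate Python process, so no state is shared
        completed = subprocess.run([sys.executable, script, str(item)])
        return_codes.append(completed.returncode)
    return return_codes
```

Running the items one after another like this trades parallelism for isolation: a crash in one call cannot take the others down with it.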

Module contents¶