core package

Submodules

core.image_scanner module

contains the ImageScanner class which is used for scanning images

class core.image_scanner.ImageScanner(image, min_resolution=(100, 100), max_resolution=(200, 200), patch_resolution=None, resample=0, rotation=None, **kwargs)[source]

Bases: object

Used for scanning images and producing image patches through various techniques

get_resolutions(num=10, spacing='even')[source]

generates a list of patch resolutions

Parameters:
  • num (opt) – number of resolutions returned; default: 10
  • spacing (opt) – spacing between resolution sizes; options include: ‘even’, ‘random’; default: ‘even’
Yields:

tuple – (x, y) resolution
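
As an illustration of what ‘even’ and ‘random’ spacing could mean, here is a minimal sketch. It assumes resolutions are interpolated between min_resolution and max_resolution; the helper name sketch_resolutions and its seed argument are hypothetical and not part of ImageScanner.

```python
import random


def sketch_resolutions(min_resolution=(100, 100), max_resolution=(200, 200),
                       num=10, spacing="even", seed=None):
    """Yield `num` (x, y) resolutions between the two bounds (hypothetical helper)."""
    rng = random.Random(seed)
    (min_x, min_y), (max_x, max_y) = min_resolution, max_resolution
    for i in range(num):
        if spacing == "even":
            # Fractions 0, 1/(num-1), ..., 1 place the resolutions at equal steps.
            t = i / (num - 1) if num > 1 else 0.0
        elif spacing == "random":
            # Assumption: 'random' draws a uniform position between the bounds.
            t = rng.random()
        else:
            raise ValueError(f"unknown spacing: {spacing!r}")
        yield (round(min_x + t * (max_x - min_x)),
               round(min_y + t * (max_y - min_y)))


even = list(sketch_resolutions(num=3))
print(even)  # [(100, 100), (150, 150), (200, 200)]
```

The actual method may round or order its resolutions differently.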

grid_scan(resolutions=10, spacing='even', **kwargs)[source]

scans entire image in a grid-like fashion

Parameters:
  • resolutions (opt) – number of sampling patch resolutions to return; a single grid produces multiple patches (image / sampling resolution); default: 10
  • spacing (opt) – spacing between resolution sizes; options include: ‘even’, ‘random’; default: ‘even’
Yields:

PIL.Image – cropped (resized and/or rotated) patch
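
The note that a single grid produces multiple patches can be made concrete with a small sketch. It assumes the grid is a non-overlapping tiling and that partial patches at the edges are skipped; sketch_grid_boxes is a hypothetical helper, not the library's implementation.

```python
def sketch_grid_boxes(image_size, patch_size):
    """Yield (left, upper, right, lower) boxes tiling the image (hypothetical helper)."""
    width, height = image_size
    patch_w, patch_h = patch_size
    # Only whole patches are produced; a partial strip at the right or
    # bottom edge is skipped (an assumption made for this sketch).
    for upper in range(0, height - patch_h + 1, patch_h):
        for left in range(0, width - patch_w + 1, patch_w):
            yield (left, upper, left + patch_w, upper + patch_h)


boxes = list(sketch_grid_boxes((400, 300), (200, 100)))
print(len(boxes))  # 6
```

Each box uses the (left, upper, right, lower) order that PIL.Image.crop accepts, so a 400 x 300 image scanned at 200 x 100 would give six patches under these assumptions.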

random_scan(patches=100, **kwargs)[source]

generates patches of random sample size and location from image

Parameters: patches (opt) – number of patches returned; default: 100
Yields: PIL.Image – cropped (resized and/or rotated) patch
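
A rough sketch of random sampling follows. It assumes patch sizes are drawn uniformly between min_resolution and max_resolution and that the image is at least min_resolution in both dimensions; sketch_random_boxes and its seed argument are hypothetical.

```python
import random


def sketch_random_boxes(image_size, min_resolution=(100, 100),
                        max_resolution=(200, 200), patches=100, seed=None):
    """Yield random (left, upper, right, lower) boxes inside the image (hypothetical helper)."""
    rng = random.Random(seed)
    width, height = image_size
    for _ in range(patches):
        # Patch size is capped so that it never exceeds the image itself.
        patch_w = rng.randint(min_resolution[0], min(max_resolution[0], width))
        patch_h = rng.randint(min_resolution[1], min(max_resolution[1], height))
        # The top-left corner is chosen so the whole patch stays in bounds.
        left = rng.randint(0, width - patch_w)
        upper = rng.randint(0, height - patch_h)
        yield (left, upper, left + patch_w, upper + patch_h)


boxes = list(sketch_random_boxes((640, 480), patches=5, seed=1))
```

The real method also resizes and/or rotates each patch, which this sketch leaves out.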

core.pipeline module

a library of core functions which define the image processing/learning pipeline

core.pipeline.get_info(source, spec=['name', 'extension'], sep=None, ignore=['\\.DS_Store'])[source]

creates a descriptive DataFrame based upon files contained within a source directory

Parameters:
  • source (str) – fullpath to directory of files
  • spec (opt) – naming specification of files; default: [‘name’, ‘extension’]
  • sep (opt) – regular expression with which to separate filename components; recommended value: ‘.’; default: None (split name from extension)
  • ignore (opt) – list of regex patterns used for ignoring files; default: [‘\\.DS_Store’]
Returns:

an info DataFrame

Return type:

DataFrame
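
How spec, sep and ignore might interact can be sketched as below. This is an illustration under assumptions, not the function's source: the helpers sketch_parse_filename and sketch_is_ignored are hypothetical, the file name and the spec keys ‘label’ and ‘index’ are made up, and a plain dict stands in for a DataFrame row.

```python
import os
import re


def sketch_parse_filename(filename, spec=("name", "extension"), sep=None):
    """Map filename components onto the names listed in `spec` (hypothetical helper)."""
    if sep is None:
        # Default described above: split the name from its extension.
        name, extension = os.path.splitext(filename)
        parts = [name, extension.lstrip(".")]
    else:
        parts = re.split(sep, filename)
    return dict(zip(spec, parts))


def sketch_is_ignored(filename, ignore=(r"\.DS_Store",)):
    """Return True if any ignore pattern matches the filename."""
    return any(re.search(pattern, filename) for pattern in ignore)


row = sketch_parse_filename("oak_042.png",
                            spec=["label", "index", "extension"],
                            sep=r"[_.]")
print(row)  # {'label': 'oak', 'index': '042', 'extension': 'png'}
```

One such dict per file that is not ignored would then form the rows of the info object.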

core.pipeline.info_split(info, test_size=0.2)[source]

split info object by rows into train and test objects

Parameters:
  • info (DataFrame) – info object to split
  • test_size (opt) – fraction of info rows that will be allocated to the test object; default: 0.2
Returns:

train, test

Return type:

2 DataFrames
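
A row-wise split of this kind can be sketched as follows. The sketch assumes rows are shuffled before splitting, which the documentation does not state; sketch_split and its seed argument are hypothetical, and a plain list stands in for the DataFrame.

```python
import random


def sketch_split(rows, test_size=0.2, seed=None):
    """Split rows into (train, test) lists (hypothetical helper)."""
    indices = list(range(len(rows)))
    # Assumption: indices are shuffled so the test rows are a random sample.
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


train, test = sketch_split(list(range(10)), test_size=0.2, seed=0)
print(len(train), len(test))  # 8 2
```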

core.pipeline.process_data(info, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])[source]

processes images listed in a given info object into usable data

Parameters:
  • info (DataFrame) – info object containing ‘source’, ‘label’ and ‘params’ columns
  • features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_std’, ‘fft_max’]
Returns:

processed image data

Return type:

DataFrame

core.pipeline.get_data(info, hdf_path=None, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])[source]

generates machine-learning-ready data from an info object

Parameters:
  • info (DataFrame) – info object containing ‘source’, ‘label’ and ‘params’ columns
  • hdf_path (opt) – full path of the file in which to store generated data; default: None
  • multiprocess (opt) – use multiprocessing; default: True
  • processes (opt) – number of processes to employ for multiprocessing; default: 24
  • features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_std’, ‘fft_max’]
Returns:

machine-learning-ready data

Return type:

DataFrame

core.pipeline.compile_predictions(pred)[source]

groups predictions made on patches of an image into a set of labels and confidences

Parameters: pred (array-like) – output from call to [some sklearn model].predict
Returns: compiled predictions
Return type: DataFrame

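
One plausible reading of "labels and confidences" is that each label's confidence is the share of patches predicted as that label. The sketch below assumes exactly that; sketch_compile is a hypothetical helper that returns a list of tuples rather than a DataFrame.

```python
from collections import Counter


def sketch_compile(pred):
    """Group per-patch predictions into (label, confidence) pairs (hypothetical helper)."""
    counts = Counter(pred)
    total = sum(counts.values())
    # most_common() orders labels from most to least frequent.
    return [(label, count / total) for label, count in counts.most_common()]


compiled = sketch_compile(["oak", "oak", "pine", "oak"])
print(compiled)  # [('oak', 0.75), ('pine', 0.25)]
```
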
core.pipeline.archive_data(train_info, test_info, hdf_path, cross_val=True, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_max', 'fft_std'])[source]

convenience function for archiving train, validation and test data

Parameters:
  • train_info (DataFrame) – info object to use for training
  • test_info (DataFrame) – info object to use for testing
  • hdf_path (str) – fullpath of file with which to store data
  • cross_val (opt) – use cross validation; default: True
  • multiprocess (opt) – use multiprocessing; default: True
  • processes (opt) – number of processes to employ for multiprocessing; default: 24
  • features (opt) – list of features to include in the output data; default: [‘r’, ‘g’, ‘b’, ‘h’, ‘s’, ‘v’, ‘fft_max’, ‘fft_std’]
Returns:

train_x, valid_x, test_x, train_y, valid_y, test_y (train_x, test_x, train_y, test_y if cross_val=False)

Return type:

DataFrames

core.pipeline.read_archive(hdf_path, items=['train_x', 'valid_x', 'test_x', 'train_y', 'valid_y', 'test_y'])[source]

convenience function used for retrieving data within a hdf archive

Parameters:
  • hdf_path (str) – full path of the file in which the data is stored
  • items (opt) – items to be retrieved; default: [‘train_x’, ‘valid_x’, ‘test_x’, ‘train_y’, ‘valid_y’, ‘test_y’]

core.utils module

A utilities library for various io/data aggregation tasks

core.utils.get_report(y_true, y_pred)[source]

returns a classification report as a DataFrame, rather than as text

Parameters:
  • y_true (array-like) – list of true labels
  • y_pred (array-like) – list of predicted labels
Returns:

classification report

Return type:

DataFrame

core.utils.pil_to_opencv(image)[source]

converts PIL.Image into cv2 image

Parameters: image (PIL.Image) – pillow image
Returns: opencv image object in BGR color space
Return type: cv2 image

core.utils.opencv_to_pil(image)[source]

converts cv2 image into PIL.Image

Parameters: image (cv2 image) – cv2 image
Returns: pillow image object is in BGR color space
Return type: PIL.Image

core.utils.generate_samples(image, label, params)[source]

convenience function for generating samples from a provided image along with its label and parameters

Parameters:
  • image (PIL.Image) – pillow image
  • label (str) – image label
  • params (dict) – params to provide to ImageScanner
Returns:

matrix of patches

Return type:

list

core.utils.get_channel_histogram(image, channel, bins=256, normalize=False, **kwargs)[source]

generates frequency data for a given channel of a provided image

Parameters:
  • image (cv2 image) – opencv image to be processed
  • channel (str) – color channel to be processed acceptable values: r, g, b, h, s, v
  • bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
  • normalize (opt) – normalize histogram data; default: False
Returns:

raveled array

Return type:

numpy.array
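
The effect of bins and normalize can be shown on a flat list of 8-bit channel values. This sketch assumes values in the range 0 to 255 and normalization to a sum of one; the library may bin or normalize differently, and sketch_histogram is a hypothetical helper that uses plain lists instead of numpy arrays.

```python
def sketch_histogram(values, bins=256, normalize=False, value_range=256):
    """Count channel values into `bins` equal-width bins (hypothetical helper)."""
    counts = [0] * bins
    for value in values:
        # Integer arithmetic maps 0..value_range-1 onto bin indices 0..bins-1.
        counts[value * bins // value_range] += 1
    if normalize:
        total = sum(counts)
        # Assumption: normalizing means scaling the counts so they sum to one.
        counts = [c / total for c in counts] if total else counts
    return counts


hist = sketch_histogram([0, 0, 128, 255], bins=4)
print(hist)  # [2, 0, 1, 1]
```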

core.utils.create_histogram_stats(data, chan_data, channel)[source]

convenience function for appending statistics based upon provided histogram data to data

Parameters:
  • data (DataFrame) – data to be appended to
  • chan_data (DataFrame) – channel histogram data
  • channel (str) – name of channel
Returns:

None

Return type:

None

core.utils.get_histograms(image, bins=256, normalize=False, colorspace='rgb')[source]

generates histogram data for each channel of an image

Parameters:
  • image (cv2 image) – opencv image to be processed
  • bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
  • normalize (opt) – normalize histogram data; default: False
  • colorspace (opt) – colorspace of provided image; acceptable values: ‘rgb’, ‘hsv’; default: ‘rgb’
Returns:

dict of channel histograms

Return type:

dict

core.utils.plot_channel_histogram(image, channel, bins=256, normalize=False)[source]

plots a histogram of a single channel of a provided image

Parameters:
  • image (cv2 image) – opencv image to be processed
  • channel (str) – color channel
  • bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
  • normalize (opt) – normalize histogram data; default: False
Returns:

None

Return type:

None

core.utils.plot_histograms(image, bins=256, normalize=False)[source]

plots a histogram of all channels of a provided image

Parameters:
  • image (cv2 image) – opencv image to be processed
  • bins (opt) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
  • normalize (opt) – normalize histogram data; default: False
Returns:

None

Return type:

None

core.utils.execute_python_subshells(script, iterable)[source]

a simple, hacky workaround for multiprocessing’s bugginess; executes a new python subshell per item

Parameters:
  • script (str) – fullpath of python script to run (check /bin)
  • iterable (iter) – list of arguments to provide to each call
Returns:

None

Return type:

None
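
The "one subshell per item" idea can be sketched with the standard subprocess module. The sketch assumes each item is passed as a single command-line argument and that the calls run one after another; sketch_run_subshells is a hypothetical helper, not the library's implementation.

```python
import subprocess
import sys


def sketch_run_subshells(script, iterable):
    """Run `script` once per item in a fresh interpreter (hypothetical helper)."""
    for item in iterable:
        # sys.executable reuses the interpreter that is running this code.
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([sys.executable, script, str(item)], check=True)
```

Because every call starts a new interpreter, no state is shared between items, which avoids the multiprocessing issues mentioned above at the cost of interpreter start-up time per item.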

Module contents