core package
Submodules
core.image_scanner module
contains the ImageScanner class which is used for scanning images
-
class core.image_scanner.ImageScanner(image, min_resolution=(100, 100), max_resolution=(200, 200), patch_resolution=None, resample=0, rotation=None, **kwargs)
Bases: object
Used for scanning images and producing image patches through various techniques
-
get_resolutions(num=10, spacing='even')
generates a list of patch resolutions
Parameters: - num (optional) – number of resolutions returned; default: 10
- spacing (optional) – spacing between resolution sizes; options include: 'even', 'random'; default: 'even'
Yields: tuple – (x, y) resolution
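To illustrate what 'even' and 'random' spacing could mean, here is a minimal standalone sketch. It is not the ImageScanner implementation: the name sketch_resolutions, the seed argument, and the linear interpolation between min_resolution and max_resolution are all assumptions made for this example.

```python
import random


def sketch_resolutions(min_resolution=(100, 100), max_resolution=(200, 200),
                       num=10, spacing='even', seed=None):
    """Yield `num` (x, y) resolutions between the two bounds (hypothetical helper)."""
    if spacing not in ('even', 'random'):
        raise ValueError("spacing must be 'even' or 'random'")
    rng = random.Random(seed)
    for i in range(num):
        if spacing == 'even':
            # Fractions 0, 1/(num-1), ..., 1; a single resolution uses the minimum.
            t = i / (num - 1) if num > 1 else 0.0
        else:
            # Assumption: 'random' draws a uniform position between the bounds.
            t = rng.random()
        yield tuple(
            round(lo + t * (hi - lo))
            for lo, hi in zip(min_resolution, max_resolution)
        )
```

With the default bounds, list(sketch_resolutions(num=3)) gives the minimum, the midpoint, and the maximum resolution.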
-
grid_scan(resolutions=10, spacing='even', **kwargs)
scans entire image in a grid-like fashion
Parameters: - resolutions (optional) – number of sampling patch resolutions to return; a single grid produces multiple patches (image / sampling resolution); default: 10
- spacing (optional) – spacing between resolution sizes; options include: 'even', 'random'; default: 'even'
Yields: PIL.Image – cropped (resized and/or rotated) patch
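A grid scan at one patch resolution can be pictured as tiling the image with crop boxes. The sketch below only computes the boxes; grid_boxes is a hypothetical helper, and skipping partial patches at the right and bottom edges is an assumption, not documented behaviour of grid_scan.

```python
def grid_boxes(image_size, patch_size):
    """Yield (left, upper, right, lower) boxes tiling the image (hypothetical helper)."""
    width, height = image_size
    patch_w, patch_h = patch_size
    # Only whole patches are produced; partial patches at the
    # right and bottom edges are skipped (an assumption of this sketch).
    for upper in range(0, height - patch_h + 1, patch_h):
        for left in range(0, width - patch_w + 1, patch_w):
            yield (left, upper, left + patch_w, upper + patch_h)
```

Each box uses the (left, upper, right, lower) order that PIL's Image.crop accepts, so a patch could be cut with image.crop(box).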
-
core.pipeline module
a library of core functions which define the image processing/learning pipeline
-
core.pipeline.get_info(source, spec=['name', 'extension'], sep=None, ignore=['\\.DS_Store'])
creates a descriptive DataFrame based upon files contained within a source directory
Parameters: - source (str) – full path to directory of files
- spec (optional) – naming specification of files; default: ['name', 'extension']
- sep (optional) – regular expression with which to separate filename components; recommended value: '.'; default: None (split name from extension)
- ignore (optional) – list of regex patterns used for ignoring files; default: ['\\.DS_Store']
Returns: an info DataFrame
Return type: DataFrame
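The interplay of spec, sep and ignore can be sketched on a plain list of filenames. This is an illustration only: parse_filenames is a hypothetical helper, it returns a list of dicts rather than a DataFrame, and it does not read a directory the way get_info does.

```python
import os
import re


def parse_filenames(filenames, spec=('name', 'extension'), sep=None,
                    ignore=(r'\.DS_Store',)):
    """Return one dict per file, keyed by `spec` (hypothetical helper)."""
    rows = []
    for filename in filenames:
        # Skip any file matching one of the ignore patterns.
        if any(re.search(pattern, filename) for pattern in ignore):
            continue
        if sep is None:
            # Default: split the name from its extension only.
            name, extension = os.path.splitext(filename)
            parts = [name, extension.lstrip('.')]
        else:
            # Otherwise split the filename on the separator regex.
            parts = re.split(sep, filename)
        rows.append(dict(zip(spec, parts)))
    return rows
```

For example, a spec of ('label', 'index', 'extension') with sep=r'[_.]' turns 'cat_01.jpg' into a row with label 'cat', index '01' and extension 'jpg'. The resulting list of dicts can be passed to pandas.DataFrame to get a tabular info object.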
-
core.pipeline.info_split(info, test_size=0.2)
split info object by rows into train and test objects
Parameters: - info (DataFrame) – info object to split
- test_size (optional) – fraction of info indices that will be allocated to the test object; default: 0.2
Returns: train, test
Return type: 2 DataFrames
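A row-wise split of this kind can be sketched with the standard library alone. The helper split_rows and its seed argument are hypothetical, and whether info_split shuffles rows before splitting is an assumption of this example.

```python
import random


def split_rows(rows, test_size=0.2, seed=None):
    """Split a list of rows into (train, test) lists (hypothetical helper)."""
    indices = list(range(len(rows)))
    # Assumption: rows are assigned to the test set at random.
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    # Preserve the original row order within each part.
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test
```

With ten rows and test_size=0.2, two rows land in the test part and eight in the train part.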
-
core.pipeline.process_data(info, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])
processes images listed in a given info object into usable data
Parameters: - info (DataFrame) – info object containing 'source', 'label' and 'params' columns
- features (optional) – list of features to include in the output data; default: ['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max']
Returns: processed image data
Return type: DataFrame
-
core.pipeline.get_data(info, hdf_path=None, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max'])
generates machine-learning-ready data from an info object
Parameters: - info (DataFrame) – info object containing 'source', 'label' and 'params' columns
- hdf_path (optional) – full path of the file in which to store generated data; default: None
- multiprocess (optional) – use multiprocessing; default: True
- processes (optional) – number of processes to employ for multiprocessing; default: 24
- features (optional) – list of features to include in the output data; default: ['r', 'g', 'b', 'h', 's', 'v', 'fft_std', 'fft_max']
Returns: machine-learning-ready data
Return type: DataFrame
-
core.pipeline.compile_predictions(pred)
groups predictions made on patches of an image into a set of labels and confidences
Parameters: pred (array-like) – output from call to [some sklearn model].predict
Returns: compiled predictions
Return type: DataFrame
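One plausible way to turn per-patch predictions into labels and confidences is a simple vote count. The sketch below assumes that a label's confidence is the fraction of patches predicted as that label; this is an assumption, not the documented behaviour of compile_predictions, and compile_patch_predictions is a hypothetical name.

```python
from collections import Counter


def compile_patch_predictions(pred):
    """Return [(label, confidence), ...] sorted by descending confidence."""
    counts = Counter(pred)
    total = sum(counts.values())
    # Confidence here is the share of patches voting for each label.
    return [(label, count / total) for label, count in counts.most_common()]
```

For the patch predictions ['oak', 'oak', 'pine', 'oak'], this yields 'oak' with confidence 0.75 and 'pine' with confidence 0.25.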
-
core.pipeline.archive_data(train_info, test_info, hdf_path, cross_val=True, multiprocess=True, processes=24, features=['r', 'g', 'b', 'h', 's', 'v', 'fft_max', 'fft_std'])
convenience function for archiving train, validation and test data
Parameters: - train_info (DataFrame) – info object to use for training
- test_info (DataFrame) – info object to use for testing
- hdf_path (str) – full path of the file in which to store data
- cross_val (optional) – use cross validation; default: True
- multiprocess (optional) – use multiprocessing; default: True
- processes (optional) – number of processes to employ for multiprocessing; default: 24
- features (optional) – list of features to include in the output data; default: ['r', 'g', 'b', 'h', 's', 'v', 'fft_max', 'fft_std']
Returns: train_x, valid_x, test_x, train_y, valid_y, test_y (train_x, test_x, train_y, test_y if cross_val=False)
Return type: DataFrames
-
core.pipeline.read_archive(hdf_path, items=['train_x', 'valid_x', 'test_x', 'train_y', 'valid_y', 'test_y'])
convenience function used for retrieving data within an hdf archive
Parameters: - hdf_path (str) – full path of the file in which data is stored
- items (optional) – items to be retrieved; default: ['train_x', 'valid_x', 'test_x', 'train_y', 'valid_y', 'test_y']
core.utils module
A utilities library for various io/data aggregation tasks
-
core.utils.get_report(y_true, y_pred)
returns a classification report as a DataFrame, rather than as text
Parameters: - y_true (array-like) – list of true labels
- y_pred (array-like) – list of predicted labels
Returns: classification report
Return type: DataFrame
-
core.utils.pil_to_opencv(image)
converts PIL.Image into cv2 image
Parameters: image (PIL.Image) – pillow image
Returns: opencv image; object is in BGR color space
Return type: cv2 image
-
core.utils.opencv_to_pil(image)
converts cv2 image into PIL.Image
Parameters: image (cv2 image) – cv2 image
Returns: pillow image; object is in BGR color space
Return type: PIL.Image
-
core.utils.generate_samples(image, label, params)
convenience function for generating samples from a provided image along with its label and parameters
Returns: matrix of patches
-
core.utils.get_channel_histogram(image, channel, bins=256, normalize=False, **kwargs)
generates frequency data for a given channel of a provided image
Parameters: - image (cv2 image) – opencv image to be processed
- channel (str) – color channel to be processed; acceptable values: r, g, b, h, s, v
- bins (optional) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
- normalize (optional) – normalize histogram data; default: False
Returns: raveled array
Return type: numpy.array
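The effect of bins and normalize can be shown on a flat list of channel values, without OpenCV. The sketch below is a pure-Python illustration: channel_histogram and its value_range argument are hypothetical, and normalizing so that the bins sum to 1 is an assumption about what normalize means here.

```python
def channel_histogram(values, bins=256, normalize=False, value_range=(0, 256)):
    """Count channel values into equal-width bins (hypothetical helper)."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value < high:
            # Clamp the index to guard against floating-point rounding.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    if normalize:
        total = sum(counts)
        # Avoid dividing by zero for an empty channel.
        return [c / total for c in counts] if total else [0.0] * bins
    return counts
```

With bins=4, the values [0, 10, 200, 255] fall into the first and last bins, giving counts [2, 0, 0, 2], or [0.5, 0.0, 0.0, 0.5] when normalize=True.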
-
core.utils.create_histogram_stats(data, chan_data, channel)
convenience function for appending statistics based upon provided histogram data to data
Parameters: - data (DataFrame) – data to be appended to
- chan_data (DataFrame) – channel histogram data
- channel (str) – name of channel
Returns: None
-
core.utils.get_histograms(image, bins=256, normalize=False, colorspace='rgb')
generates histogram data for each channel of an image
Parameters: - image (cv2 image) – opencv image to be processed
- bins (optional) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
- normalize (optional) – normalize histogram data; default: False
- colorspace (optional) – colorspace of provided image; acceptable values: 'rgb', 'hsv'; default: 'rgb'
Returns: dict of channel histograms
Return type: dict
-
core.utils.plot_channel_histogram(image, channel, bins=256, normalize=False)
plots a histogram of one channel of a provided image
Parameters: - image (cv2 image) – opencv image to be processed
- channel (str) – color channel
- bins (optional) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
- normalize (optional) – normalize histogram data; default: False
Returns: None
-
core.utils.plot_histograms(image, bins=256, normalize=False)
plots a histogram of all channels of a provided image
Parameters: - image (cv2 image) – opencv image to be processed
- bins (optional) – number of bins to split histogram into; default: 256 (number of channel values for sRGB images)
- normalize (optional) – normalize histogram data; default: False
Returns: None