gala.classify: Classifier tools
-
gala.classify.concatenate_data_elements(alldata)
Return one big learning set from a list of learning sets.
A learning set is a list/tuple of length 4 containing features, labels, weights, and node merge history.
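The concatenation can be pictured with a minimal sketch. It assumes each learning set is a 4-tuple whose components hold one entry per sample, and it uses plain Python lists as stand-ins for arrays. The helper name `concatenate_learning_sets` is hypothetical and is not presented as gala's implementation.

```python
from itertools import chain


def concatenate_learning_sets(alldata):
    """Join a list of 4-component learning sets component by component.

    Hypothetical stand-in: plain lists replace the arrays gala would use.
    """
    # zip(*alldata) groups all features together, all labels together, etc.
    return [list(chain.from_iterable(parts)) for parts in zip(*alldata)]


# Two toy learning sets: (features, labels, weights, merge history).
first = ([[0.1, 0.2]], [1], [1.0], [(3, 4)])
second = ([[0.5, 0.6], [0.7, 0.8]], [-1, 1], [0.5, 2.0], [(5, 6), (7, 8)])

features, labels, weights, history = concatenate_learning_sets([first, second])
print(len(features), labels)
```

The result keeps the four-component layout, so it can be treated as a single learning set wherever one is expected.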
-
gala.classify.default_classifier_extension(cl, use_joblib=True)
Return the default classifier file extension for the given classifier.
Parameters: cl : sklearn estimator or VigraRandomForest object
A classifier to be saved.
use_joblib : bool, optional
Whether or not joblib will be used to save the classifier.
Returns: ext : string
File extension
Examples
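>>> from sklearn.ensemble import RandomForestClassifier
>>> from gala.classify import default_classifier_extension
>>> cl = RandomForestClassifier()
>>> default_classifier_extension(cl)
'.classifier.joblib'
>>> default_classifier_extension(cl, False)
'.classifier'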
-
gala.classify.get_classifier(name='random forest', *args, **kwargs)
Return a classifier given a name.
Parameters: name : string
The name of the classifier, e.g. ‘random forest’ or ‘naive bayes’.
*args, **kwargs :
Additional arguments to pass to the constructor of the classifier.
Returns: cl : classifier
A classifier object implementing the scikit-learn interface.
Raises: NotImplementedError
If the classifier name is not recognized.
Examples
>>> from sklearn.ensemble import RandomForestClassifier
>>> from gala.classify import get_classifier
>>> cl = get_classifier('random forest', n_estimators=47)
>>> isinstance(cl, RandomForestClassifier)
True
>>> cl.n_estimators
47
>>> from numpy.testing import assert_raises
>>> assert_raises(NotImplementedError, get_classifier, 'perfect class')
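The name-based lookup described above can be illustrated with a small dispatch table. This is only a sketch under stated assumptions: `ToyForest`, `ToyBayes`, `REGISTRY`, and `make_classifier` are hypothetical names invented for illustration, and nothing here claims to mirror how gala resolves names internally.

```python
class ToyForest:
    """Hypothetical stand-in for a random forest estimator."""

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators


class ToyBayes:
    """Hypothetical stand-in for a naive Bayes estimator."""


# Hypothetical registry mapping classifier names to constructors.
REGISTRY = {'random forest': ToyForest, 'naive bayes': ToyBayes}


def make_classifier(name='random forest', *args, **kwargs):
    """Look up a constructor by name and forward the remaining arguments."""
    try:
        constructor = REGISTRY[name]
    except KeyError:
        # Unknown names are reported the way the documentation describes.
        raise NotImplementedError('Classifier %r not recognized.' % name)
    return constructor(*args, **kwargs)


cl = make_classifier('random forest', n_estimators=47)
print(cl.n_estimators)
```

Forwarding `*args` and `**kwargs` unchanged keeps the lookup independent of any particular constructor signature.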
-
gala.classify.load_classifier(fn)
Load a classifier previously saved to disk, given a filename.
Supported classifier types are:
- scikit-learn classifiers saved using either pickle or joblib persistence
- vigra random forest classifiers saved in HDF5 format
Parameters: fn : string
Filename in which the classifier is stored.
Returns: cl : classifier object
One of the supported classifier types. Each supports at least the standard scikit-learn methods fit() and predict_proba().
-
gala.classify.sample_training_data(features, labels, num_samples=None)
Get a random sample from a classification training dataset.
Parameters: features : np.ndarray [M x N]
The M (number of samples) by N (number of features) feature matrix.
labels : np.ndarray [M] or [M x 1]
The training label for each feature vector.
num_samples : int, optional
The size of the training sample to draw. The full dataset is returned if num_samples is None or num_samples >= M.
Returns: feat : np.ndarray [num_samples x N]
The sampled feature vectors.
lab : np.ndarray [num_samples] or [num_samples x 1]
The sampled training labels.
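The sampling behaviour can be sketched as follows. The sketch assumes sampling without replacement, which the documentation above does not state, and it uses plain Python lists in place of NumPy arrays. The helper `sample_rows` and its `seed` parameter are hypothetical additions for illustration.

```python
import random


def sample_rows(features, labels, num_samples=None, seed=None):
    """Draw matching rows from features and labels.

    Hypothetical stand-in; assumes sampling without replacement.
    """
    m = len(features)
    # Return the full dataset when no subsample is requested or possible.
    if num_samples is None or num_samples >= m:
        return features, labels
    rng = random.Random(seed)
    # One shared index list keeps each feature row paired with its label.
    idxs = rng.sample(range(m), num_samples)
    return [features[i] for i in idxs], [labels[i] for i in idxs]


features = [[i, 2 * i] for i in range(10)]
labels = [i % 2 for i in range(10)]
feat, lab = sample_rows(features, labels, num_samples=4, seed=0)
print(len(feat), len(lab))
```

Using the same indices for both sequences is the essential point: sampling features and labels separately would break their correspondence.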
-
gala.classify.save_classifier(cl, fn, use_joblib=True, **kwargs)
Save a classifier to disk.
Parameters: cl : classifier object
Pickleable object or a classify.VigraRandomForest object.
fn : string
Writeable path/filename.
use_joblib : bool, optional
Whether to prefer joblib persistence to pickle.
kwargs : keyword arguments
Keyword arguments to be passed on to either pickle.dump (pck.dump) or joblib.dump.
Returns: None
Notes
For joblib persistence, compress=3 is the default.
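As a rough illustration of the pickle path, the sketch below saves and reloads an object while forwarding keyword arguments to `pickle.dump`. The helpers `save_with_pickle` and `load_with_pickle` are hypothetical, and a dict stands in for a fitted classifier. With joblib installed, the analogous call would be `joblib.dump(cl, fn, compress=3)`, matching the default noted above.

```python
import os
import pickle
import tempfile


def save_with_pickle(obj, fn, **kwargs):
    """Write obj to fn, forwarding keyword arguments to pickle.dump."""
    with open(fn, 'wb') as f:
        pickle.dump(obj, f, **kwargs)


def load_with_pickle(fn):
    """Read back an object previously written by save_with_pickle."""
    with open(fn, 'rb') as f:
        return pickle.load(f)


# A dict stands in for a fitted classifier (hypothetical example data).
model = {'kind': 'toy', 'n_estimators': 47}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'model.classifier')
    save_with_pickle(model, path, protocol=pickle.HIGHEST_PROTOCOL)
    restored = load_with_pickle(path)

print(restored == model)
```

Only load pickle or joblib files from trusted sources, since unpickling can execute arbitrary code.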