classifiers Module

Classification algorithms for supervised learning tasks.

@author: drusk

class pml.supervised.classifiers.AbstractClassifier(training_set)

The base class that classification algorithms should extend. It provides the functionality common to all classifiers.

__init__(training_set)

Constructs the classifier. Subclasses may have additional parameters in their constructors.

Args:
    training_set:
        A labelled DataSet object used to train the classifier.
Raises:
    UnlabelledDataSetError if the training set is not labelled.
classify(sample)

Predicts a sample’s classification based on the training set.

Args:
    sample:
        The sample or observation to be classified.
Returns:
    The sample’s classification.
Raises:
    InconsistentFeaturesError if the sample doesn’t have the same features as the training data.
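
The constructor and classify contract described above can be sketched with a toy classifier. Everything in this block is a hypothetical stand-in written for illustration: ToyDataSet, MajorityClassifier, and the two exception classes only mimic the documented behaviour and are not pml's implementation. In real code a classifier would extend AbstractClassifier instead.

```python
from collections import Counter


# Hypothetical stand-ins that mimic the documented behaviour.
# They are NOT pml's actual classes.
class UnlabelledDataSetError(Exception):
    """Stand-in: raised when the training set has no labels."""


class InconsistentFeaturesError(Exception):
    """Stand-in: raised when a sample's features differ from the training data."""


class ToyDataSet:
    """Hypothetical minimal dataset: a list of feature dicts plus optional labels."""

    def __init__(self, samples, labels=None):
        self.samples = samples
        self.labels = labels

    def is_labelled(self):
        return self.labels is not None

    def feature_list(self):
        return sorted(self.samples[0].keys())


class MajorityClassifier:
    """Toy classifier that follows the documented constructor/classify contract."""

    def __init__(self, training_set):
        # Reject unlabelled training data, as the documented constructor does.
        if not training_set.is_labelled():
            raise UnlabelledDataSetError("training set must be labelled")
        self.training_set = training_set

    def classify(self, sample):
        # Reject samples whose feature names differ from the training data.
        if sorted(sample.keys()) != self.training_set.feature_list():
            raise InconsistentFeaturesError("sample features differ from training data")
        # Predict the most frequent label seen in the training set.
        return Counter(self.training_set.labels).most_common(1)[0][0]


train = ToyDataSet([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}], labels=["a", "b", "a"])
clf = MajorityClassifier(train)
prediction = clf.classify({"x": 5.0})
```

The majority-vote rule is deliberately trivial; the point of the sketch is where the two documented errors are raised, not the prediction logic.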
classify_all(dataset)

Predicts the classification of each sample in a dataset.

Args:
    dataset: DataSet compatible object (see DataSet constructor)
        The dataset whose samples (observations) will be classified.
Returns:
    A ClassifiedDataSet containing the classification result for each sample, together with the original data.
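
One plausible reading of classify_all is that it applies classify to each sample in turn and keeps the results alongside the original data. The sketch below illustrates that idea only. The helper classify_all_sketch, the threshold rule, and the use of plain lists and dicts are assumptions made for illustration; pml itself returns a ClassifiedDataSet, and its internals are not shown in this page.

```python
def classify_all_sketch(classify, samples):
    """Hypothetical helper: apply a classify callable to every sample, in order."""
    return [classify(sample) for sample in samples]


def threshold_classify(sample):
    # Toy rule used only for illustration.
    return "high" if sample["x"] >= 2.0 else "low"


samples = [{"x": 1.0}, {"x": 2.5}, {"x": 3.0}]
results = classify_all_sketch(threshold_classify, samples)

# Keep each original sample next to its classification, loosely mirroring
# how a ClassifiedDataSet holds both the data and the results.
paired = list(zip(samples, results))
```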
class pml.supervised.classifiers.ClassifiedDataSet(dataset, classifications)

A collection of data which has been analysed by a classification algorithm. It contains both the original DataSet and the results of the classification. It provides methods for analysing these classification results.

__init__(dataset, classifications)

Creates a new ClassifiedDataSet.

Args:
    dataset: model.DataSet
        A dataset which has been classified but does not hold the results.
    classifications: pandas.Series
        A Series with the classification results.
compute_accuracy()

Calculates the accuracy of the classification results.

Returns:
    The accuracy of the classification results, i.e. the number of samples correctly classified divided by the total number of samples. This is a floating point number between 0 and 1 (a fraction, not a percentage).
Raises:
    UnlabelledDataSetError if the dataset is not labelled.
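
The accuracy calculation described above amounts to counting matches between true and predicted labels. The following is a minimal sketch of that arithmetic, not pml's code: accuracy_sketch is a hypothetical function, and plain lists stand in for the dataset's labels and the pandas Series of classifications.

```python
def accuracy_sketch(true_labels, predicted_labels):
    """Hypothetical helper: fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty dataset")
    # Count the positions where the prediction equals the true label.
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


true_labels = ["a", "b", "a", "b"]
predicted = ["a", "b", "b", "b"]

# Three of the four predictions match, so the accuracy is 0.75.
accuracy = accuracy_sketch(true_labels, predicted)
```

Accuracy needs the true labels to compare against, which is why the documented method raises UnlabelledDataSetError for an unlabelled dataset.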
get_classifications()

Retrieves the classifications computed for this dataset.

Returns:
    A pandas Series containing each sample’s classification.
