onlinehd.OnlineHD

class onlinehd.OnlineHD(classes: int, features: int, dim: int = 4000)

Bases: object

Hyperdimensional classification algorithm. OnlineHD uses a (classes, dim)-sized tensor, initialized with zeros, as its model. Each dim-sized row of this matrix is the high-dimensional representation of one class, called its class hypervector.

Parameters
  • classes (int, > 0) – The number of classes of the problem.

  • features (int, > 0) – Dimensionality of original data.

  • dim (int, > 0) – The target dimensionality of the high dimensional representation.
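As a minimal sketch of the model state described above, assuming plain Python lists in place of the torch.Tensor that OnlineHD actually stores, the model is a zero matrix with one row per class. `init_model` is a hypothetical helper for illustration, not part of onlinehd.

```python
def init_model(classes: int, dim: int) -> list[list[float]]:
    """Hypothetical helper: zero-initialized (classes, dim) model.

    Row c is the class hypervector of class c.
    """
    # Build each row separately so rows are independent lists;
    # [[0.0] * dim] * classes would alias one row `classes` times.
    return [[0.0] * dim for _ in range(classes)]


model = init_model(classes=5, dim=4000)
class_hv = model[2]  # the hypervector of class 2, a dim-sized vector
```

Building rows independently matters here because every class hypervector is later updated on its own.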

Example

>>> import torch
>>> import onlinehd
>>> dim = 10000
>>> n_samples = 1000
>>> features = 100
>>> classes = 5
>>> x = torch.randn(n_samples, features) # dummy data
>>> y = torch.randint(0, classes, [n_samples]) # dummy data
>>> model = onlinehd.OnlineHD(classes, features, dim=dim)
>>> if torch.cuda.is_available():
...     print('Training on GPU!')
...     model = model.to('cuda')
...     x = x.to('cuda')
...     y = y.to('cuda')
...
Training on GPU!
>>> model = model.fit(x, y, epochs=10)
>>> ypred = model(x)
>>> ypred.size()
torch.Size([1000])

__call__(x: torch.Tensor, encoded: bool = False)

Returns the predicted class of each data point in x.

Parameters
  • x (torch.Tensor) – The data points to predict. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).

  • encoded (bool) – Specifies if input data is already encoded.

Returns

The predicted class of each data point. Has size (n?,).

Return type

torch.Tensor
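The predicted class can be read as the class whose hypervector scores highest for each data point. The sketch below assumes the per-class scores are already computed (for example, the cosine similarities returned by scores()) and uses plain Python lists; `predict_from_scores` is a hypothetical name, not an onlinehd function.

```python
def predict_from_scores(scores: list[list[float]]) -> list[int]:
    """Hypothetical helper: index of the highest-scoring class per row.

    scores[i][c] is the similarity between data point i and class c.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


scores = [[0.1, 0.7, 0.2], [0.9, -0.3, 0.0]]
print(predict_from_scores(scores))  # [1, 0]
```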

encode(x: torch.Tensor)

Encodes input data into its high-dimensional representation, mapping size (n?, features) to size (n?, dim).

See also

onlinehd.Encoder for more information.
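For intuition, here is a sketch of a nonlinear random-projection encoder that maps a features-sized vector to a dim-sized hypervector. It assumes the form h_j = cos(x·b_j + c_j) · sin(x·b_j), with b_j drawn from a standard normal and c_j uniform in [0, 2π). This form is an assumption for illustration; consult onlinehd.Encoder for the scheme the library actually uses. `make_encoder` is hypothetical.

```python
import math
import random


def make_encoder(features: int, dim: int, seed: int = 0):
    """Hypothetical sketch: random basis (dim x features) plus phase offsets."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0.0, 1.0) for _ in range(features)] for _ in range(dim)]
    base = [rng.uniform(0.0, 2 * math.pi) for _ in range(dim)]

    def encode(x: list[list[float]]) -> list[list[float]]:
        out = []
        for sample in x:
            h = []
            for b, phase in zip(basis, base):
                # Project the sample onto one random direction...
                proj = sum(bi * xi for bi, xi in zip(b, sample))
                # ...then apply the (assumed) nonlinearity.
                h.append(math.cos(proj + phase) * math.sin(proj))
            out.append(h)
        return out

    return encode


encode = make_encoder(features=3, dim=8)
H = encode([[0.5, -1.0, 2.0], [0.0, 0.0, 0.0]])
print(len(H), len(H[0]))  # 2 8
```

Each output component is a product of a cosine and a sine, so it stays within [-1, 1].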

fit(x: torch.Tensor, y: torch.Tensor, encoded: bool = False, lr: float = 0.035, epochs: int = 120, batch_size: Union[int, None, float] = 1024, one_pass_fit: bool = True, bootstrap: Union[float, str] = 0.01)

Starts the learning process, using the data points x as input and y as their labels.

Parameters
  • x (torch.Tensor) – Input data points. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).

  • encoded (bool) – Specifies if input data is already encoded.

  • lr (float, > 0) – Learning rate.

  • epochs (int, > 0) – Max number of epochs allowed.

  • batch_size (int, > 0 and <= n?, or float, > 0 and <= 1, or None) – If int, the number of samples to use in each batch. If float, the fraction of the samples to use in each batch. If None, the whole dataset is used as a single batch per epoch (equivalent to passing 1.0 or n?).

  • one_pass_fit (bool) – Whether to use the one-pass learning process. If True, the iterative method is still run after the one-pass fit for the specified number of epochs.

  • bootstrap (float, > 0 and <= 1, or 'single-per-class') – To initialize the class hypervectors, OnlineHD performs naive accumulation over a small fraction of the data; this argument sets that fraction. If 'single-per-class' is used, a single data point per class is used as the starting class hypervector.
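The accepted forms of batch_size and bootstrap can be made concrete with two hypothetical helpers, `resolve_batch_size` and `bootstrap_indices`. They are not onlinehd functions. How a fractional value is rounded to a sample count, and which samples the float form of bootstrap draws, are assumptions of this sketch.

```python
import random


def resolve_batch_size(batch_size, n: int) -> int:
    """Map the documented batch_size forms to a sample count (rounding assumed)."""
    if batch_size is None:
        return n  # whole dataset as one batch
    if isinstance(batch_size, float):
        if not 0.0 < batch_size <= 1.0:
            raise ValueError("float batch_size must be in (0, 1]")
        return max(1, int(batch_size * n))  # fraction of the samples
    if not 0 < batch_size <= n:
        raise ValueError("int batch_size must be in (0, n]")
    return batch_size


def bootstrap_indices(labels, bootstrap, seed: int = 0) -> list[int]:
    """Pick the samples used for the initial naive accumulation."""
    if bootstrap == 'single-per-class':
        first = {}
        for i, y in enumerate(labels):
            first.setdefault(y, i)  # keep the first sample seen for each class
        return sorted(first.values())
    n = len(labels)
    k = max(1, int(bootstrap * n))  # assumed: random subset of this size
    return sorted(random.Random(seed).sample(range(n), k))


print(resolve_batch_size(0.25, 1000))                        # 250
print(bootstrap_indices([2, 0, 2, 1, 0], 'single-per-class'))  # [0, 1, 3]
```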

Warning

Using one_pass_fit is not advisable for very large datasets or when training on a GPU. This option is expected to cause high memory usage, and it does not benefit from parallelization.

Returns

self

Return type

OnlineHD
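The iterative phase of fit refines the class hypervectors sample by sample. Below is a sketch of an adaptive update in the spirit of the OnlineHD method. It assumes that on a misclassification the true class moves toward the encoded sample and the wrongly predicted class moves away from it, each step scaled by lr · (1 − similarity). This rule, and the helpers `cosine` and `online_update`, are illustrative assumptions and not the library's code.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def online_update(model, h, y, lr: float = 0.035) -> int:
    """One assumed adaptive update for encoded sample h with true label y."""
    sims = [cosine(h, c) for c in model]
    pred = max(range(len(model)), key=sims.__getitem__)
    if pred != y:
        # Pull the true class toward h and push the wrong class away,
        # each weighted by how dissimilar it currently is to h.
        wy = lr * (1.0 - sims[y])
        wp = lr * (1.0 - sims[pred])
        model[y] = [c + wy * v for c, v in zip(model[y], h)]
        model[pred] = [c - wp * v for c, v in zip(model[pred], h)]
    return pred


model = [[1.0, 0.0], [0.0, 1.0]]
pred = online_update(model, [0.9, 0.1], y=1)  # misclassified as class 0
```

Weighting by (1 − similarity) means samples that are already well represented by a class change it very little.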

predict(x: torch.Tensor, encoded: bool = False)

Returns the predicted class of each data point in x. See __call__() for details.

probabilities(x: torch.Tensor, encoded: bool = False)

Returns, for each data point in x, the probability of belonging to each class.

Parameters
  • x (torch.Tensor) – The data points to use. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).

  • encoded (bool) – Specifies if input data is already encoded.

Returns

The class probability of each data point. Has size (n?, classes).

Return type

torch.Tensor
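One common way to turn per-class scores into probabilities is a row-wise softmax. The sketch below assumes that is how the (n?, classes) scores become probabilities. This is an assumption, not confirmed by this page. `softmax_rows` is a hypothetical helper.

```python
import math


def softmax_rows(scores: list[list[float]]) -> list[list[float]]:
    """Row-wise softmax over per-class scores (assumed mapping)."""
    probs = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


p = softmax_rows([[0.1, 0.7, 0.2], [0.0, 0.0, 0.0]])
```

Softmax preserves the ordering of the scores, so the most probable class is the same one the prediction methods return.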

scores(x: torch.Tensor, encoded: bool = False)

Returns pairwise cosine similarity between datapoints in x and each class hypervector. Calling model.scores(x, encoded=True) is the same as spatial.cos_cdist(x, model.model).

Parameters
  • x (torch.Tensor) – The data points to score. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).

  • encoded (bool) – Specifies if input data is already encoded.

Returns

The cosine similarity between the encoded input data and the class hypervectors. Has size (n?, classes).

Return type

torch.Tensor

See also

spatial.cos_cdist() for details.
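The pairwise cosine similarity can be sketched in plain Python. The result has one row per data point and one column per class hypervector. This sketch returns 0.0 when either vector has zero norm, for example a class hypervector of a freshly zero-initialized model. That choice is an assumption; this page does not say how spatial.cos_cdist handles that case. The function name below mirrors the library's but is a standalone illustration.

```python
import math


def cos_cdist(x: list[list[float]], model: list[list[float]]) -> list[list[float]]:
    """Standalone sketch: result[i][c] = cos(x[i], model[c])."""
    def norm(v):
        return math.sqrt(sum(t * t for t in v))

    model_norms = [norm(c) for c in model]  # compute each class norm once
    out = []
    for xi in x:
        nx = norm(xi)
        row = []
        for c, nc in zip(model, model_norms):
            denom = nx * nc
            dot = sum(a * b for a, b in zip(xi, c))
            row.append(dot / denom if denom else 0.0)  # zero-norm -> 0.0 (assumed)
        out.append(row)
    return out


S = cos_cdist([[1.0, 0.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 3.0], [0.0, 0.0]])
```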

to(*args)

Moves the data to the specified device (e.g. cuda or cpu) or changes the dtype of the data representation (e.g. half or double). Because the internal data is stored as torch.Tensor objects, the arguments can be anything that torch accepts. The change is done in-place.

Parameters

device (str or torch.device) – The target device (e.g. 'cuda' or 'cpu') or dtype.

Returns

self

Return type

OnlineHD