molml.kernel module

A module to compute kernel-based representations.

The methods in this module are intended to be used directly as kernels for kernel methods (e.g. SVMs or KRR). The resulting features therefore depend on the number of molecules used to fit the transformer: each molecule is represented by a single vector of length n_fit_molecules.

class molml.kernel.AtomKernel(input_type=None, n_jobs=1, gamma=1e-07, transformer=None, same_element=True, kernel='rbf')

Bases: molml.base.InputTypeMixin, molml.base.BaseFeature

Computes a kernel between molecules using atom similarity.

Because this kernel is atom-wise, it stays size consistent. For extensive properties, it should therefore scale properly with the size of the molecule compared to other kernel methods.
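The size-consistency claim can be illustrated with a small, dependency-free sketch. It assumes that the molecule-level kernel is the plain sum of RBF similarities over all atom pairs, optionally restricted to pairs of the same element; the aggregation is an assumption and is not spelled out here. The names `rbf` and `atom_kernel` are hypothetical and are not part of molml.

```python
import math


def rbf(x, y, gamma):
    """RBF similarity exp(-gamma * ||x - y||^2) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def atom_kernel(feats_a, nums_a, feats_b, nums_b, gamma=1e-7, same_element=True):
    """Sum atom-atom similarities between two molecules (assumed aggregation)."""
    total = 0.0
    for x, zx in zip(feats_a, nums_a):
        for y, zy in zip(feats_b, nums_b):
            if same_element and zx != zy:
                continue  # skip pairs of different elements
            total += rbf(x, y, gamma)
    return total


# Toy molecule: one carbon and two hydrogens with made-up 2D features.
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.5]]
nums = [6, 1, 1]

# Two non-interacting copies of the same molecule, treated as one system.
double_feats = feats + feats
double_nums = nums + nums

k_single = atom_kernel(feats, nums, feats, nums, gamma=0.5)
k_double = atom_kernel(double_feats, double_nums, feats, nums, gamma=0.5)
```

Under these assumptions, `k_double` is twice `k_single` up to floating-point rounding: duplicating the system doubles the kernel value, which is the behaviour wanted for extensive properties.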

Parameters:
input_type : string, default=None

Specifies the format of the input values (must be one of ‘list’ or ‘filename’). Note: this input type depends on the value from transformer; see below for more details. If this value is None, it takes the value from transformer, or defaults to ‘list’ if there is no transformer. If a value is given and it does not match the value given for the transformer, a ValueError is raised.

n_jobs : int, default=1

Specifies the number of processes to create when generating the features. Positive numbers specify a specific number of processes, and numbers less than 1 will use the number of cores the computer has.

gamma : float, default=1e-7

The hyperparameter to use for the width of the RBF or Laplace kernels.

transformer : BaseFeature, default=None

The transformer to use to convert molecules to atom-wise features. If this is not given, then it is assumed that the features have already been created and will be passed directly to fit/transform. Note: if no transformer is given, then the assumed input type is a list of (numbers, features) pairs, where numbers is an iterable of the atomic numbers and features is a numpy array of the features (shape=(n_atoms, n_features)).

same_element : bool, default=True

Require that the atom-atom similarity only be computed if the two atoms are the same element.

kernel : string or callable, default=‘rbf’

The kernel function to use when computing the atom-atom interactions. The possible string options are the keys of KERNELS. If a callable object is given, then it must take two arrays and return the pairwise kernel metric between them.
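As an illustration of the contract for a callable kernel, the following sketch takes two collections of atom feature rows and returns the full pairwise matrix. It uses nested lists so that it runs without dependencies; with molml the inputs would be numpy arrays and the result should be an array as well. The function name `laplace_pairwise` is hypothetical, and since it is not stated here whether gamma is forwarded to a user-supplied callable, the width is fixed through a default argument.

```python
import math


def laplace_pairwise(a, b, gamma=0.1):
    """Return the len(a) x len(b) matrix of exp(-gamma * ||a_i - b_j||_1)."""
    return [
        [math.exp(-gamma * sum(abs(p - q) for p, q in zip(row_a, row_b)))
         for row_b in b]
        for row_a in a
    ]


a = [[0.0, 0.0], [1.0, 1.0]]              # two atoms, two features each
b = [[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]]  # three atoms, two features each
pairwise = laplace_pairwise(a, b)         # 2 rows, 3 columns
```

The key point is the shape: one row per atom in the first argument and one column per atom in the second, so that every atom-atom pair has an entry.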

Raises:
ValueError

If the input_type of the transformer and the input_type keyword given do not match.

References

Barker, J.; Bulin, J.; Hamaekers, J. LC-GAP: Localized Coulomb Descriptors for the Gaussian Approximation Potential. 2016

Attributes:
_features : numpy.array, shape=(n_mols, (n_atoms, n_features))

A numpy array of numpy arrays (that may be different lengths) that stores all of the atom features for the training molecules.

_numbers : numpy.array, shape=(n_mols, (n_atoms))

A numpy array of numpy arrays (that may be different lengths) that stores all the atomic numbers for the training molecules.

ATTRIBUTES = ('_features', '_numbers')
LABELS = None
compute_kernel(self, b_feats, b_nums, symmetric=False)

Compute a kernel between molecules based on atom features.

Parameters:
b_feats : list of numpy.array, shape=(n_molecules_b, )

Each array is of shape (n_atoms, n_features), where n_atoms is for that particular molecule.

b_nums : list of lists, shape=(n_molecules_b, )

Contains the elements of all the atoms for each molecule in group b.

symmetric : bool, default=False

Whether or not the kernel is symmetric. If it is, only half of the matrix needs to be computed, which cuts the computational cost roughly in half. This is mainly an optimization for computing the (train, train) kernel.

Returns:
kernel : numpy.array, shape=(n_molecules_b, n_molecules_fit)

The kernel matrix between the two sets of molecules.
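The symmetric shortcut is a general technique and can be sketched independently of molml. The example below assumes a symmetric similarity function and a single set of items compared against itself, which mirrors the (train, train) case; `pairwise_matrix` and `similarity` are hypothetical names used only for illustration.

```python
def pairwise_matrix(items, func, symmetric=False):
    """Build the len(items) x len(items) matrix of func(items[i], items[j])."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        # With symmetric=True only the lower triangle (j <= i) is evaluated.
        upper = i + 1 if symmetric else n
        for j in range(upper):
            value = func(items[i], items[j])
            calls += 1
            matrix[i][j] = value
            if symmetric:
                matrix[j][i] = value  # mirror into the upper triangle
    return matrix, calls


def similarity(x, y):
    """Toy symmetric similarity between two numbers."""
    return 1.0 / (1.0 + abs(x - y))


values = [0.0, 1.0, 3.0, 6.0]
full, full_calls = pairwise_matrix(values, similarity)
half, half_calls = pairwise_matrix(values, similarity, symmetric=True)
```

Both calls produce the same matrix, but the symmetric variant evaluates the function n(n+1)/2 times instead of n² times (10 instead of 16 for the four items here).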

fit(self, X, y=None)

Fit the model.

If there is no self.transformer, then this assumes that the input is a list of (features, numbers) pairs where features is a numpy array of features (shape=(n_atoms, n_features)), and numbers is a list of atomic numbers in the molecule.

Otherwise, it directly passes these values to the transformer to compute the features, and extracts all the atomic numbers.

Parameters:
X : list, shape=(n_samples, )

A list of objects to use to fit.

Returns:
self : object

Returns the instance itself.

fit_transform(self, X, y=None)

A slightly cheaper way of fitting and then transforming.

The saving comes from the resulting kernel matrix being symmetric, so only half of it has to be computed.

Parameters:
X : list, shape=(n_samples, )

A list of objects to use to fit and transform.

Returns:
kernel : array, shape=(n_samples, n_samples)

The resulting kernel matrix.

transform(self, X, y=None)

Transform features/molecules into a kernel matrix.

If there is no self.transformer, then this assumes that the input is a list of (features, numbers) pairs where features is a numpy array of features (shape=(n_atoms, n_features)), and numbers is a list of atomic numbers in the molecule.

Otherwise, it directly passes these values to the transformer to compute the features, and extracts all the atomic numbers.

Parameters:
X : list, shape=(n_samples, )

A list of objects to use to transform.

Returns:
kernel : array, shape=(n_samples, n_samples_fit)

The resulting kernel matrix.

Raises:
ValueError

If the transformer has not been fit.
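The shape contract of fit and transform, and the error raised when transform is called first, can be sketched with a small stand-in class. This is not molml's implementation: `TinyAtomKernel` is a hypothetical, dependency-free class that assumes a summed same-element RBF kernel and takes precomputed inputs. The (features, numbers) pair order follows the description of transform above and should be checked against the installed version.

```python
import math


class TinyAtomKernel:
    """Hypothetical stand-in that mimics the fit/transform shape contract."""

    def __init__(self, gamma=1e-7):
        self.gamma = gamma
        self._features = None
        self._numbers = None

    def fit(self, X):
        # Each entry of X is assumed to be a (features, numbers) pair.
        self._features = [feats for feats, _ in X]
        self._numbers = [nums for _, nums in X]
        return self

    def _molecule_kernel(self, feats_a, nums_a, feats_b, nums_b):
        # Assumed aggregation: sum of RBF values over same-element atom pairs.
        total = 0.0
        for x, zx in zip(feats_a, nums_a):
            for y, zy in zip(feats_b, nums_b):
                if zx == zy:
                    sq = sum((p - q) ** 2 for p, q in zip(x, y))
                    total += math.exp(-self.gamma * sq)
        return total

    def transform(self, X):
        if self._features is None:
            raise ValueError("This instance has not been fit.")
        # One row per input molecule, one column per fitted molecule.
        return [
            [self._molecule_kernel(f, n, ff, fn)
             for ff, fn in zip(self._features, self._numbers)]
            for f, n in X
        ]


# Made-up features; each molecule is a (features, numbers) pair.
water = ([[8.0, 0.0], [1.0, 0.5], [1.0, 0.5]], [8, 1, 1])
h2 = ([[1.0, 0.2], [1.0, 0.2]], [1, 1])
methane_like = ([[6.0, 0.0], [1.0, 0.4]], [6, 1])

model = TinyAtomKernel(gamma=0.1).fit([water, h2])
K = model.transform([water, h2, methane_like])  # 3 rows, 2 columns
```

Fitting on two molecules and transforming three gives a matrix with three rows and two columns, matching shape=(n_samples, n_samples_fit).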