molml.utils module

A collection of assorted utility functions.

class molml.utils.LazyValues(connections=None, coords=None, numbers=None, elements=None, unit_cell=None)

Bases: object

An object to store molecule graph properties in a lazy fashion.

This object defers the computation of molecule graph properties until they are actually needed. The prime example is the computation of connections.

Parameters:
connections : dict, key->list of keys, default=None

A dictionary edge table with all the bidirectional connections.

numbers : array-like, shape=(n_atoms, ), default=None

The atomic numbers of all the atoms.

coords : array-like, shape=(n_atoms, 3), default=None

The xyz coordinates of all the atoms (in angstroms).

elements : array-like, shape=(n_atoms, ), default=None

The element symbols of all the atoms.

unit_cell : array-like, shape=(3, 3), default=None

An array of unit cell basis vectors, where the vectors are columns.

Attributes:
connections : dict, key->list of keys

A dictionary edge table with all the bidirectional connections. If the initialized value for this was None, then this will be computed from the coords and numbers/elements.

numbers : array, shape=(n_atoms, )

The atomic numbers of all the atoms. If the initialized value for this was None, then this will be computed from the elements.

coords : array, shape=(n_atoms, 3)

The xyz coordinates of all the atoms (in angstroms).

elements : array, shape=(n_atoms, )

The element symbols of all the atoms. If the initialized value for this was None, then this will be computed from the numbers.

unit_cell : array, shape=(3, 3)

An array of unit cell basis vectors, where the vectors are columns.
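The lazy behaviour described above can be illustrated with a small sketch. This is not molml's implementation: the class name `LazySketch` and the four-element lookup table are hypothetical, and only the `numbers`/`elements` pair is modelled.

```python
class LazySketch:
    """Toy stand-in for the lazy pattern; not the molml class itself."""

    # Tiny illustrative lookup table (assumption); a real table covers all elements.
    _SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

    def __init__(self, numbers=None, elements=None):
        self._numbers = numbers
        self._elements = elements

    @property
    def numbers(self):
        # Derive atomic numbers from the symbols only on first access.
        if self._numbers is None:
            if self._elements is None:
                raise ValueError("Need either numbers or elements.")
            self._numbers = [self._SYMBOL_TO_NUMBER[e] for e in self._elements]
        return self._numbers

    @property
    def elements(self):
        # Derive element symbols from the atomic numbers only on first access.
        if self._elements is None:
            if self._numbers is None:
                raise ValueError("Need either numbers or elements.")
            inverse = {v: k for k, v in self._SYMBOL_TO_NUMBER.items()}
            self._elements = [inverse[n] for n in self._numbers]
        return self._elements


water = LazySketch(elements=["O", "H", "H"])
water_numbers = water.numbers  # computed here, then cached
```

The derived value is stored after the first access, so later reads do not repeat the work.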

connections
coords
elements
fill_in_crystal(self, radius=None, units=None)

Duplicate the atoms to form a crystal.

Parameters:
radius : float, default=None

Specifies the radius within which unit cell points are included.

units : list or int, default=None

Specifies the number of unit cells to include on each axis. These will all be equal if it is an int.

Raises:
ValueError

If radius and units are both None, or if both are not None. Exactly one of the two must be given.
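The argument rule above (exactly one of radius and units, with an int for units expanded to all three axes) could be checked along these lines. The helper name `normalize_crystal_args` is hypothetical and the sketch covers only the validation, not the atom duplication itself.

```python
def normalize_crystal_args(radius=None, units=None):
    """Hypothetical helper: validate and normalize the radius/units pair."""
    # Both None, or both given, is rejected: exactly one must be supplied.
    if (radius is None) == (units is None):
        raise ValueError("Specify exactly one of radius or units.")
    if units is not None:
        if isinstance(units, int):
            # A single int means the same count on every axis.
            units = [units, units, units]
        else:
            units = list(units)
    return radius, units


print(normalize_crystal_args(units=2))
```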

numbers
unit_cell
molml.utils.cosine_decay(R, r_cut=6.0)

Compute all the cutoff distances.

The cutoff is defined as

\[\begin{split}f_{R_{c}}(R_{ij}) = \begin{cases} 0.5 \left( \cos\left( \frac{\pi R_{ij}}{R_c} \right) + 1 \right), & R_{ij} \le R_c \\ 0, & \text{otherwise} \end{cases}\end{split}\]
Parameters:
R : array, shape=(N_atoms, N_atoms)

A distance matrix for all the atoms (for example, as computed by scipy.spatial.distance.cdist).

r_cut : float, default=6.

The maximum distance allowed for atoms to be considered local to the “central atom”.

Returns:
values : array, shape=(N_atoms, N_atoms)

The new distance matrix with the cutoff function applied
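As a worked illustration of the cutoff formula, the following sketch applies it element-wise to a nested list using only the standard library. The function name `cosine_decay_sketch` is hypothetical; the real function operates on arrays.

```python
import math


def cosine_decay_sketch(R, r_cut=6.0):
    """Apply 0.5 * (cos(pi * r / r_cut) + 1) for r <= r_cut, else 0."""
    return [
        [0.5 * (math.cos(math.pi * r / r_cut) + 1.0) if r <= r_cut else 0.0
         for r in row]
        for row in R
    ]


# Three atoms at pairwise distances of 3, 6 and 7 angstroms.
R = [[0.0, 3.0, 7.0],
     [3.0, 0.0, 6.0],
     [7.0, 6.0, 0.0]]
decayed = cosine_decay_sketch(R)
```

The value is 1 at zero distance, falls to about 0.5 at half the cutoff, reaches 0 at the cutoff, and stays 0 beyond it.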

molml.utils.deslugify(string)

Convert a string to a feature name and its parameters.

Parameters:
string : str

The slug string to extract values from.

Returns:
name : str

The name of the class corresponding to the string.

final_params : dict

A dictionary of the feature parameters.

molml.utils.get_angles(coords)

Get the angles between all triples of coords.

The resulting values are in the range \([0, \pi]\), and all invalid values are NaNs.

Parameters:
coords : numpy.array, shape=(n_atoms, n_dim)

An array of all the coordinates.

Returns:
res : numpy.array, shape=(n_atoms, n_atoms, n_atoms)

An array of the angles of all triples.
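A minimal sketch of how such an angle array might be built is shown below. It assumes that the middle index of each triple is the vertex of the angle and that degenerate triples (where an arm has zero length) are the invalid cases mapped to NaN. Both are assumptions, and the function names are hypothetical.

```python
import math


def angle_at_vertex(a, b, c):
    """Angle a-b-c in radians, or NaN if either arm has zero length."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return float("nan")
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def all_angles(coords):
    """Nested list of shape (n, n, n) holding the angle of every triple."""
    n = len(coords)
    return [[[angle_at_vertex(coords[i], coords[j], coords[k])
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]


coords = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
angles = all_angles(coords)
```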

molml.utils.get_bond_type(element1, element2, dist)

Get the bond type between two elements based on their distance.

If there is no bond, return None.

Parameters:
element1 : str

The element of the first atom

element2 : str

The element of the second atom

dist : float

The distance between the two atoms

Returns:
key : str

The type of the bond

molml.utils.get_connections(elements1, coords1, elements2=None, coords2=None)

Return a dictionary edge list

If two sets of elements and coordinates are given, then they will be treated as two disjoint sets of atoms.

Each value is a tuple of the index of the connecting atom and the bond order as a string, where the bond order is one of [‘1’, ‘Ar’, ‘2’, ‘3’].

Note: If two sets are given, this returns only the connections from the first set to the second. This is in contrast to returning connections from both directions.

Parameters:
elements1 : list

All the elements in set 1.

coords1 : array, shape=(n_atoms, 3)

The coordinates of the atoms in set 1.

elements2 : list, default=None

All the elements in set 2.

coords2 : array, shape=(n_atoms, 3), default=None

The coordinates of the atoms in set 2.

Returns:
connections : dict, int->dict

Contains all atoms that are connected to each atom and bond type.

molml.utils.get_coulomb_matrix(numbers, coords, alpha=1, use_decay=False)

Return the coulomb matrix for the given coords and numbers.

\[\begin{split}C_{ij} = \begin{cases} \frac{Z_i Z_j}{\| r_i - r_j \|^\alpha} & i \neq j\\ \frac{1}{2} Z_i^{2.4} & i = j \end{cases}\end{split}\]
Parameters:
numbers : array-like, shape=(n_atoms, )

The atomic numbers of all the atoms

coords : array-like, shape=(n_atoms, 3)

The xyz coordinates of all the atoms (in angstroms)

alpha : number, default=1

Some value to exponentiate the distance in the coulomb matrix.

use_decay : bool, default=False

This setting defines an extra decay for the values as they get further away from the “central atom”. This is to alleviate issues that arise as atoms enter or leave the cutoff radius.

Returns:
top : array, shape=(n_atoms, n_atoms)

The coulomb matrix
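The formula above can be evaluated directly, as in the following standard-library sketch. The name `coulomb_matrix_sketch` is hypothetical, and the `use_decay` option is deliberately left out.

```python
import math


def coulomb_matrix_sketch(numbers, coords, alpha=1):
    """Evaluate the Coulomb matrix formula entry by entry (no decay)."""
    n = len(numbers)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                C[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                # Off-diagonal term: Z_i * Z_j / |r_i - r_j| ** alpha
                dist = math.dist(coords[i], coords[j])
                C[i][j] = numbers[i] * numbers[j] / dist ** alpha
    return C


# Two hydrogen atoms 0.74 angstroms apart.
C = coulomb_matrix_sketch([1, 1], [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]])
```

The matrix is symmetric because both the product of atomic numbers and the distance are symmetric in i and j.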

molml.utils.get_depth_threshold_mask_connections(connections, min_depth=0, max_depth=numpy.inf)

Get the depth threshold mask from connections.

Parameters:
connections : dict, index->list of indices

A dictionary that contains lists of all connected atoms.

min_depth : int, default=0

The minimum depth to allow in the masking

max_depth : int, default=numpy.inf

The maximum depth to allow in the masking

Returns:
mask : numpy.array, shape=(len(connections), len(connections))

A mask of all the atom pairs whose graph distance is at least min_depth and at most max_depth.
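One way to picture the mask is as a breadth-first search from every atom, keeping the pairs whose depth falls inside the allowed window. The sketch below assumes that atoms are keyed 0..n-1 and that both bounds are inclusive. These are assumptions, and `depth_mask_sketch` is a hypothetical name.

```python
from collections import deque


def depth_mask_sketch(connections, min_depth=0, max_depth=float("inf")):
    """Boolean n x n mask of pairs with min_depth <= depth <= max_depth."""
    n = len(connections)
    mask = [[False] * n for _ in range(n)]
    for start in range(n):
        # Breadth-first search gives the bond-count depth from `start`.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in connections[node]:
                if neighbor not in depth:
                    depth[neighbor] = depth[node] + 1
                    queue.append(neighbor)
        # Atoms never reached stay False (infinite depth).
        for end, d in depth.items():
            mask[start][end] = min_depth <= d <= max_depth
    return mask


chain = {0: [1], 1: [0, 2], 2: [1]}
mask = depth_mask_sketch(chain, min_depth=1, max_depth=1)
```

With both bounds set to 1, only directly bonded pairs are selected.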

molml.utils.get_dict_func_getter(d, label='')
molml.utils.get_element_pairs(elements)

Extract all the element pairs in a molecule.

Parameters:
elements : list

All the elements in the molecule

Returns:
value : list

All the element pairs in the molecule

molml.utils.get_graph_distance(connections)

Compute the graph distance between all pairs of atoms using the Floyd-Warshall algorithm.

Parameters:
connections : dict, index->list of indices

A dictionary that contains lists of all connected atoms.

Returns:
dist : numpy.array, shape=(len(connections), len(connections))

The graph distance between all pairs of atoms
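For reference, a plain-Python sketch of Floyd-Warshall on a connection dictionary looks like this. It assumes atoms are keyed 0..n-1 and that every bond has length 1; `graph_distance_sketch` is a hypothetical name, not the library function.

```python
def graph_distance_sketch(connections):
    """All-pairs shortest path lengths (in bonds) via Floyd-Warshall."""
    n = len(connections)
    inf = float("inf")
    # Start with 0 on the diagonal, 1 for bonded pairs, infinity elsewhere.
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, neighbors in connections.items():
        for j in neighbors:
            dist[i][j] = 1
    # Allow each atom k in turn to act as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# A three-atom chain plus one isolated atom.
connections = {0: [1], 1: [0, 2], 2: [1], 3: []}
dist = graph_distance_sketch(connections)
```

Pairs with no path between them keep an infinite distance.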

molml.utils.get_index_mapping(values, depth, add_unknown)

Determine the ordering and mapping of feature groups.

Parameters:
values : list

A list of possible values.

depth : int

The number of elements to use from each entry in values.

add_unknown : bool

Whether or not to include an extra collector for unknown values.

Returns:
map_func : function(key)->int

A function that gives the mapping index for a given key.

length : int

The length of the mapping values.

both : bool

Indicates whether both values are needed in a loop (A, B) vs (B, A).

molml.utils.get_smoothing_function(key)
molml.utils.get_spacing_function(key)
molml.utils.lerp_smooth(x)
molml.utils.load_json(f)

Load the model data from a json file

Parameters:
f : str or file descriptor

The path to load the data from, or a file descriptor to load it from.

Returns:
obj : Transformer

The transformer object.

molml.utils.multi_beta(f)
molml.utils.needs_reversal(chain)

Determine if the chain needs to be reversed.

This puts the chains into a canonical ordering, so that a chain and its reverse map to the same representation.

Parameters:
chain : tuple

A tuple of elements to treat as a chain

Returns:
needs_flip : bool

Whether or not the chain needs to be reversed
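One plausible canonical-ordering rule, sketched here as an assumption rather than as the library's exact logic, compares mirrored positions starting at the center of the chain and moving outward, and asks for a reversal at the first pair that is out of order. The name `needs_reversal_sketch` is hypothetical.

```python
def needs_reversal_sketch(chain):
    """Return True if reversing the chain gives the canonical ordering."""
    half, remainder = divmod(len(chain), 2)
    # Indices of the innermost mirrored pair (the middle item of an
    # odd-length chain is skipped because it mirrors itself).
    left = half - 1
    right = half + remainder
    while left >= 0 and right < len(chain):
        if chain[left] > chain[right]:
            return True   # out of order: reverse the chain
        if chain[left] < chain[right]:
            return False  # already in canonical order
        # Tie: move one step outward and compare again.
        left -= 1
        right += 1
    return False  # palindromic chains need no reversal


examples = [("O", "C", "H"), ("H", "C", "O"), ("C", "H", "C")]
flags = [needs_reversal_sketch(c) for c in examples]
```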

molml.utils.sort_chain(chain)

Sort a chain from the inside out.

Parameters:
chain : tuple

A tuple of elements to treat as a chain

Returns:
chain : tuple

The sorted chain