# molml.utils module

A collection of assorted utility functions.

class molml.utils.IndexMap(values, depth, add_unknown=False, use_comb_idxs=False)

Bases: object

An object to handle dynamic mapping of groups to indices.

The intention of the class is to allow for dynamic subselection from lists to give new mapping groups.

For example, a group of values that have length three might be reduced as follows:

('A', 'B', 'C') -> ('A', 'C')

using some predefined index selection. Then, this new reduced value is used to find the index for this new value just like a dict that maps the new shorter values to an int index.

This class also allows handling of groups that are not in the map.
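The subselection-then-lookup idea described above can be sketched in plain Python. This is only an illustration and does not use or reproduce the internals of `IndexMap`; the names `subselect`, `lookup`, `idxs`, and `UNKNOWN` are hypothetical, and the index selection `(0, 2)` is an assumed example.

```python
# Hypothetical illustration of the idea; not molml's IndexMap implementation.

def subselect(value, idxs):
    """Keep only the entries of `value` at the given positions."""
    return tuple(value[i] for i in idxs)


values = [("A", "B", "C"), ("A", "B", "D"), ("C", "B", "A")]
idxs = (0, 2)  # assumed predefined index selection

# Map each distinct reduced value to an int index, like a dict would.
reduced = sorted({subselect(v, idxs) for v in values})
mapping = {key: i for i, key in enumerate(reduced)}

# Optional extra slot for groups that are not in the map.
UNKNOWN = len(mapping)


def lookup(value):
    """Reduce `value`, then return its index (or UNKNOWN if it is absent)."""
    return mapping.get(subselect(value, idxs), UNKNOWN)
```

Here `lookup(("A", "X", "C"))` resolves through the reduced key `("A", "C")`, so the middle element does not affect the result.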

Parameters:

- values : list of tuples
  A collection of values over which the mapping will be done.
- depth : int
  The number of values that are retained in the subselection.
- add_unknown : bool, default=False
  Whether or not to allocate an UNKNOWN index.
- use_comb_idxs : bool, default=False
  Whether or not to use all combinations of indices when doing the subselection. If this is False, it will use the old style of trying to select indices from the center of the value outward.

get_idx_iter(self, key, other=None)
static get_index_mapping(values, depth, idx_groups)

Determine the ordering and mapping of feature groups.

Parameters:

- values : list
  A list of possible values.
- depth : int
  The number of elements to use from each value.
- idx_groups : list of list of int
  A list of lists of indices to select.

Returns:

- mapping : dict(key)->int
  A dict that gives the mapping index for a given key.

get_value_order(self)
is_valid(self, values)
class molml.utils.LazyValues(connections=None, coords=None, numbers=None, elements=None, unit_cell=None)

Bases: object

An object to store molecule graph properties in a lazy fashion.

This object computes molecule graph properties only when they are needed. The prime example is the computation of connections.

Parameters:

- connections : dict, key->list of keys, default=None
  A dictionary edge table with all the bidirectional connections.
- numbers : array-like, shape=(n_atoms, ), default=None
  The atomic numbers of all the atoms.
- coords : array-like, shape=(n_atoms, 3), default=None
  The xyz coordinates of all the atoms (in angstroms).
- elements : array-like, shape=(n_atoms, ), default=None
  The element symbols of all the atoms.
- unit_cell : array-like, shape=(3, 3), default=None
  An array of unit cell basis vectors, where the vectors are columns.

Attributes:

- connections : dict, key->list of keys
  A dictionary edge table with all the bidirectional connections. If the initialized value for this was None, then this will be computed from the coords and numbers/elements.
- numbers : array, shape=(n_atoms, )
  The atomic numbers of all the atoms. If the initialized value for this was None, then this will be computed from the elements.
- coords : array, shape=(n_atoms, 3)
  The xyz coordinates of all the atoms (in angstroms).
- elements : array, shape=(n_atoms, )
  The element symbols of all the atoms. If the initialized value for this was None, then this will be computed from the numbers.
- unit_cell : array, shape=(3, 3)
  An array of unit cell basis vectors, where the vectors are columns.

connections
coords
elements
fill_in_crystal(self, radius=None, units=None)

Duplicate the atoms to form a crystal.

Parameters:

- radius : float, default=None
  Specifies the radius of unit cell points to include.
- units : list or int, default=None
  Specifies the number of unit cells to include on each axis. These will all be equal if it is an int.

Raises:

- ValueError
  If radius and units are both None, or if both are given.

numbers
unit_cell
molml.utils.cosine_decay(R, r_cut=6.0)

Compute all the cutoff distances.

The cutoff is defined as

$f_{R_c}(R_{ij}) = \begin{cases} 0.5 \left( \cos\left( \frac{\pi R_{ij}}{R_c} \right) + 1 \right), & R_{ij} \le R_c \\ 0, & \text{otherwise} \end{cases}$

Parameters:

- R : array, shape=(N_atoms, N_atoms)
  A distance matrix for all the atoms (for example, from scipy.spatial.distance.cdist).
- r_cut : float, default=6.
  The maximum distance allowed for atoms to be considered local to the "central atom".

Returns:

- values : array, shape=(N_atoms, N_atoms)
  The new distance matrix with the cutoff function applied.

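The cutoff formula above can be written directly with NumPy. The following is a minimal sketch of the formula only, not the library's implementation; the function name `cosine_cutoff` is hypothetical.

```python
import numpy as np


def cosine_cutoff(R, r_cut=6.0):
    """Apply 0.5 * (cos(pi * R / r_cut) + 1) where R <= r_cut, else 0."""
    R = np.asarray(R, dtype=float)
    inside = 0.5 * (np.cos(np.pi * R / r_cut) + 1.0)
    # Distances beyond the cutoff radius contribute nothing.
    return np.where(R <= r_cut, inside, 0.0)


# Example distance matrix (hypothetical values, in the same units as r_cut).
R_example = np.array([[0.0, 3.0], [6.0, 7.0]])
cutoff_example = cosine_cutoff(R_example, r_cut=6.0)
```

The function decays smoothly from 1 at zero distance to 0 at `r_cut`, so values change continuously as atoms cross the cutoff radius.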
molml.utils.deslugify(string)

Convert a string to a feature name and its parameters.

Parameters:

- string : str
  The slug string to extract values from.

Returns:

- name : str
  The name of the class corresponding to the string.
- final_params : dict
  A dictionary of the feature parameters.

molml.utils.get_angles(coords)

Get the angles between all triples of coords.

The resulting values lie in $$[0, \pi]$$, and all invalid values are NaN.

Parameters:

- coords : numpy.array, shape=(n_atoms, n_dim)
  An array of all the coordinates.

Returns:

- res : numpy.array, shape=(n_atoms, n_atoms, n_atoms)
  An array of the angles of all triples.

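As a small illustration of what a single entry of such an angle array represents, the sketch below computes the angle for one triple of points. It is not the library's code; the helper name `angle_at` is hypothetical, and treating the middle index `j` as the vertex is an assumption, since the index convention is not stated here.

```python
import numpy as np


def angle_at(coords, i, j, k):
    """Angle (radians) between the vectors j->i and j->k.

    Assumption: j is the vertex of the angle. Degenerate triples
    (e.g. i == j) give NaN because a zero-length vector has no direction.
    """
    coords = np.asarray(coords, dtype=float)
    u = coords[i] - coords[j]
    v = coords[k] - coords[j]
    with np.errstate(invalid="ignore", divide="ignore"):
        cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))


# Hypothetical coordinates: a right angle at the origin.
pts = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
right_angle = angle_at(pts, 0, 1, 2)
```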
molml.utils.get_bond_type(element1, element2, dist)

Get the bond type between two elements based on their distance.

If there is no bond, return None.

Parameters:

- element1 : str
  The element of the first atom.
- element2 : str
  The element of the second atom.
- dist : float
  The distance between the two atoms.

Returns:

- key : str
  The type of the bond.

molml.utils.get_connections(elements1, coords1, elements2=None, coords2=None)

Return a dictionary edge list.

If two sets of elements and coordinates are given, then they will be treated as two disjoint sets of atoms.

Each value is a tuple of the index of the connecting atom and the bond order as a string, where the bond order is one of ['1', 'Ar', '2', '3'].

Note: If two sets are given, this returns only the connections from the first set to the second. This is in contrast to returning connections from both directions.

Parameters:

- elements1 : list
  All the elements in set 1.
- coords1 : array, shape=(n_atoms, 3)
  The coordinates of the atoms in set 1.
- elements2 : list, default=None
  All the elements in set 2.
- coords2 : array, shape=(n_atoms, 3), default=None
  The coordinates of the atoms in set 2.

Returns:

- connections : dict, int->dict
  Contains all atoms that are connected to each atom and bond type.

molml.utils.get_coulomb_matrix(numbers, coords, alpha=1, use_decay=False)

Return the Coulomb matrix for the given coords and numbers.

$C_{ij} = \begin{cases} \frac{Z_i Z_j}{\| r_i - r_j \|^\alpha}, & i \neq j \\ \frac{1}{2} Z_i^{2.4}, & i = j \end{cases}$

Parameters:

- numbers : array-like, shape=(n_atoms, )
  The atomic numbers of all the atoms.
- coords : array-like, shape=(n_atoms, 3)
  The xyz coordinates of all the atoms (in angstroms).
- alpha : number, default=1
  The exponent applied to the distance in the Coulomb matrix.
- use_decay : bool, default=False
  This setting defines an extra decay for the values as they get further away from the "central atom". This is to alleviate issues that arise as atoms enter or leave the cutoff radius.

Returns:

- top : array, shape=(n_atoms, n_atoms)
  The Coulomb matrix.

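The formula above can be sketched with NumPy as follows. This is an illustration of the equation only, under the assumption that no extra decay is applied (the `use_decay` option is not modeled); the function name `coulomb_matrix_sketch` is hypothetical and is not part of the library.

```python
import numpy as np


def coulomb_matrix_sketch(numbers, coords, alpha=1):
    """Build C_ij = Z_i Z_j / |r_i - r_j|^alpha (i != j), 0.5 * Z_i^2.4 (i == j)."""
    numbers = np.asarray(numbers, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise Euclidean distances, shape (n_atoms, n_atoms).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Placeholder on the diagonal to avoid dividing by zero; the diagonal
    # is overwritten with the self-interaction term below.
    np.fill_diagonal(dist, 1.0)

    C = np.outer(numbers, numbers) / dist ** alpha
    np.fill_diagonal(C, 0.5 * numbers ** 2.4)
    return C


# Hypothetical two-atom example: Z = 1 and Z = 8, separated by 2 angstroms.
C_example = coulomb_matrix_sketch([1, 8], [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
```

The result is symmetric, with the off-diagonal entries here equal to 8 / 2 = 4.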
molml.utils.get_depth_threshold_mask_connections(connections, min_depth=0, max_depth=numpy.inf)

Get the depth threshold mask from connections.

Parameters:

- connections : dict, index->list of indices
  A dictionary that contains lists of all connected atoms.
- min_depth : int, default=0
  The minimum depth to allow in the masking.
- max_depth : int, default=numpy.inf
  The maximum depth to allow in the masking.

Returns:

- mask : numpy.array, shape=(len(connections), len(connections))
  A mask of all the atoms that are less than or equal to max_depth away.

molml.utils.get_dict_func_getter(d, label='')
molml.utils.get_element_pairs(elements)

Extract all the element pairs in a molecule.

Parameters:

- elements : list
  All the elements in the molecule.

Returns:

- value : list
  All the element pairs in the molecule.

molml.utils.get_graph_distance(connections)

Compute the graph distance between all pairs of atoms using the Floyd-Warshall algorithm.

Parameters:

- connections : dict, index->list of indices
  A dictionary that contains lists of all connected atoms.

Returns:

- dist : numpy.array, shape=(len(connections), len(connections))
  The graph distance between all pairs of atoms.

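For readers unfamiliar with Floyd-Warshall, the sketch below shows the algorithm on a connection dictionary of the shape described above. It is not the library's implementation; the name `graph_distance_sketch` is hypothetical, and it assumes the keys are the integers 0 to n-1 and that every bond counts as distance 1.

```python
import numpy as np


def graph_distance_sketch(connections):
    """All-pairs shortest path lengths (in bonds) via Floyd-Warshall.

    Assumes keys are 0..n-1. Unreachable pairs keep a distance of inf.
    """
    n = len(connections)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)

    # Directly connected atoms are one step apart.
    for i, neighbors in connections.items():
        for j in neighbors:
            dist[i, j] = 1.0

    # Allow paths through each intermediate atom k in turn.
    for k in range(n):
        dist = np.minimum(dist, dist[:, k, None] + dist[None, k, :])
    return dist


# Hypothetical graph: a chain 0-1-2 plus an isolated atom 3.
chain = {0: [1], 1: [0, 2], 2: [1], 3: []}
chain_dist = graph_distance_sketch(chain)
```

In this example atoms 0 and 2 are two bonds apart, while atom 3 is unreachable from the others.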
molml.utils.get_smoothing_function(key)
molml.utils.get_spacing_function(key)
molml.utils.lerp_smooth(x)
molml.utils.load_json(f)

Load the model data from a JSON file.

Parameters:

- f : str or file descriptor
  The path to load the data from, or a file descriptor to load it from.

Returns:

- obj : Transformer
  The transformer object.

molml.utils.multi_beta(f)
molml.utils.needs_reversal(chain)

Determine if the chain needs to be reversed.

This is to set the chains such that they are in a canonical ordering.

Parameters:

- chain : tuple
  A tuple of elements to treat as a chain.

Returns:

- needs_flip : bool
  Whether or not the chain needs to be reversed.

molml.utils.sort_chain(chain)

Sort a chain from the inside out.

Parameters:

- chain : tuple
  A tuple of elements to treat as a chain.

Returns:

- chain : tuple
  The sorted chain.