aimsim.ops package

Submodules

aimsim.ops.clustering module

Operations for clustering molecules.

class aimsim.ops.clustering.Cluster(n_clusters, clustering_method, **kwargs)

Bases: object

Wrapper class for different clustering algorithms.

clustering_method

Label for the specific algorithm used. Supported values:

‘complete_linkage’, ‘complete’: complete linkage agglomerative hierarchical clustering [2].
‘average_linkage’, ‘average’: average linkage agglomerative hierarchical clustering [2].
‘single_linkage’, ‘single’: single linkage agglomerative hierarchical clustering [2].
‘ward’: Ward’s algorithm [2]. This method is useful for Euclidean descriptors.

Type: str

n_clusters

Number of clusters.

Type: int

model_

The clustering estimator.

Type: sklearn.cluster.AgglomerativeClustering

labels_

Cluster labels of the training set samples.

Type: np.ndarray of shape (n_samples,)

fit(X): Fit the estimator.

predict(X): Get prediction from the estimator.

get_labels(): Get cluster labels of the training set samples.

References

[1] Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).
[2] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
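The linkage options accepted by clustering_method differ only in how the distance between two clusters is derived from the pairwise distance matrix. The following is a minimal pure-Python sketch of the three linkage criteria, assuming D is a symmetric list-of-lists distance matrix and a cluster is a list of row indices; the function names and the example matrix are hypothetical and not part of AIMSim. Ward’s method is omitted because it minimizes within-cluster variance, which presumes Euclidean descriptors.

```python
from itertools import product
from statistics import mean


def single_linkage(D, a, b):
    """Smallest pairwise distance between members of clusters a and b."""
    return min(D[i][j] for i, j in product(a, b))


def complete_linkage(D, a, b):
    """Largest pairwise distance between members of clusters a and b."""
    return max(D[i][j] for i, j in product(a, b))


def average_linkage(D, a, b):
    """Mean pairwise distance between members of clusters a and b."""
    return mean(D[i][j] for i, j in product(a, b))


# Hypothetical symmetric distance matrix for four molecules.
D = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
a, b = [0, 1], [2, 3]
print(single_linkage(D, a, b))             # 0.5
print(complete_linkage(D, a, b))           # 0.9
print(round(average_linkage(D, a, b), 3))  # 0.7
```

Single linkage tends to chain clusters together through close neighbours, while complete linkage favours compact clusters; average linkage sits between the two.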

__init__(n_clusters, clustering_method, **kwargs)

Constructor for the Cluster class.

Parameters

n_clusters (int) – Number of clusters.
clustering_method (str) – Label for the specific algorithm used. Supported values:
  ‘complete_linkage’, ‘complete’ for complete linkage agglomerative hierarchical clustering [2].
  ‘average_linkage’, ‘average’ for average linkage agglomerative hierarchical clustering [2].
  ‘single_linkage’, ‘single’ for single linkage agglomerative hierarchical clustering [2].
  ‘ward’ for Ward’s algorithm [2]. This method is useful for Euclidean descriptors.
kwargs (dict) – Keyword arguments passed to the estimators. For agglomerative hierarchical clustering, refer to the following documentation page: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html


fit(X): Fit the estimator. :param X: Distance matrix. :type X: np.ndarray or list

get_labels()

Get cluster labels of the training set samples.

Returns: self.labels_, the cluster labels of the training set samples.
Return type: np.ndarray of shape (n_samples,)

predict(X)

Get predictions from the estimator.

Parameters: X (np.ndarray or list) – Samples to predict on.
Raises: sklearn.exceptions.NotFittedError – If the estimator is not fitted.
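fit() takes a distance matrix and get_labels() returns one cluster label per training sample. The sketch below illustrates that contract with a naive complete-linkage implementation in plain Python; TinyAgglomerative and the local NotFittedError class are hypothetical stand-ins, not AIMSim or scikit-learn code. In scikit-learn itself, the comparable estimator is AgglomerativeClustering(n_clusters=..., metric="precomputed", linkage="complete") (the metric parameter was named affinity before scikit-learn 1.2), although its label numbering may differ from this sketch.

```python
from itertools import product


class NotFittedError(Exception):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


class TinyAgglomerative:
    """Naive complete-linkage clustering on a precomputed distance matrix."""

    def __init__(self, n_clusters):
        if n_clusters < 1:
            raise ValueError("n_clusters must be at least 1.")
        self.n_clusters = n_clusters

    def fit(self, X):
        # Start with every sample in its own cluster.
        clusters = [[i] for i in range(len(X))]
        while len(clusters) > self.n_clusters:
            # Find the closest pair of clusters under complete linkage.
            best = None
            for p in range(len(clusters)):
                for q in range(p + 1, len(clusters)):
                    d = max(X[i][j] for i, j in product(clusters[p], clusters[q]))
                    if best is None or d < best[0]:
                        best = (d, p, q)
            _, p, q = best
            # Merge cluster q into cluster p (q > p, so index p stays valid).
            clusters[p].extend(clusters[q])
            del clusters[q]
        # Give every sample the index of the cluster it ended up in.
        self.labels_ = [0] * len(X)
        for label, members in enumerate(clusters):
            for i in members:
                self.labels_[i] = label
        return self

    def get_labels(self):
        if not hasattr(self, "labels_"):
            raise NotFittedError("Call fit() before get_labels().")
        return self.labels_


D = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
print(TinyAgglomerative(n_clusters=2).fit(D).get_labels())  # [0, 0, 1, 1]
```

Every merge rescans all pairs of clusters, so the sketch costs roughly O(n³) overall; it is meant to show the fit/get_labels flow, not to replace the scikit-learn estimator.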

aimsim.ops.descriptor module

This module contains methods to featurize molecules.

class aimsim.ops.descriptor.Descriptor(value=None)

Bases: object

Class for descriptors.

label_

Label used to denote the type of descriptor being used.

Type: str

numpy_

Value of the descriptor in the numpy format.

Type: np.ndarray

rdkit_

Value of the descriptor in the rdkit format.

Type: rdkit.DataStructs.cDataStructs.UIntSparseIntVec

to_numpy(): Get the numpy_ attribute. If it does not exist, it is created.

to_rdkit(): Get the rdkit_ attribute. If it does not exist, it is created.

check_init(): check if the Descriptor object is initialized. This is done by checking the existence of the numpy_ or rdkit_ attribute.

__init__(value=None)

check_init()

Check initialization status of the Descriptor object.

Returns: True if object is initialized.
Return type: (bool)

static fold_to_equal_length(fingerprint1, fingerprint2)

Return two fingerprint arrays of equal length by folding the longer fingerprint to the size of the shorter one.

Parameters

fingerprint1 (Descriptor) – Fingerprint one.
fingerprint2 (Descriptor) – Fingerprint two.

Returns

(np.ndarray, np.ndarray)
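Folding shortens a bit vector by combining bits. One common scheme, assumed here, sets bit i of the folded vector if any input bit whose index is congruent to i modulo the target length is set; AIMSim’s own folding (for example through RDKit) may differ in detail. The sketch uses plain 0/1 lists, and its fold() helper also illustrates what get_folded_fprint(fold_to_length) computes for a single fingerprint under the same assumption. Both function names are hypothetical.

```python
def fold(bits, fold_to_length):
    """Fold a 0/1 list by OR-ing bits whose indices agree modulo fold_to_length."""
    if not 1 <= fold_to_length <= len(bits):
        raise ValueError("fold_to_length must be between 1 and len(bits).")
    folded = [0] * fold_to_length
    for i, bit in enumerate(bits):
        if bit:
            folded[i % fold_to_length] = 1
    return folded


def fold_to_equal_length(fp1, fp2):
    """Fold the longer fingerprint down to the length of the shorter one."""
    n = min(len(fp1), len(fp2))
    return fold(fp1, n), fold(fp2, n)


a = [1, 0, 0, 0, 0, 0, 1, 0]  # 8 bits, on-bits at indices 0 and 6
b = [0, 1, 0, 1]              # 4 bits
print(fold_to_equal_length(a, b))  # ([1, 0, 1, 0], [0, 1, 0, 1])
```

Folding loses information whenever two on-bits land in the same position, so similarities computed on folded fingerprints are generally higher than on the originals.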

static get_all_supported_descriptors()

Return a list of all descriptors supported by AIMSim, including those which can be used but are considered experimental or are complex/rarely used and are therefore excluded from the UI.

Returns: List of strings of all supported descriptors.
Return type: List

get_folded_fprint(fold_to_length)

Get the folded value of a fingerprint to a specified length.

Parameters: fold_to_length (int) – Number of bits to fold to.

Returns: Folded fingerprint.
Return type: (np.ndarray)

get_label()

get_params()

static get_supported_fprints(): Return a list of strings for the currently implemented molecular fingerprints. :returns: List of strings. :rtype: List

is_fingerprint()

make_fingerprint(molecule_graph, fingerprint_type, fingerprint_params=None)

Make fingerprint of a molecule based on a graph representation. Set the state of the descriptor to this fingerprint.

Parameters

molecule_graph (RDKit object) – The graph object used to make a fingerprint.
fingerprint_type (str) – Label for the type of fingerprint. Call get_supported_descriptors()[‘fingerprints’] for the list of supported fingerprints.
fingerprint_params (dict) – Keyword arguments used to modify parameters of fingerprint. Default is None.

set_manually(arbitrary_descriptor_val)

Set the descriptor value manually based on a user-specified value.

Parameters: arbitrary_descriptor_val (np.ndarray or list) – Vectorized representation of descriptor values.

static shorten_label(label)

Shorten the label of a fingerprint. Useful for plotting purposes.

Parameters: label (str) – Label of fingerprint to shorten.
Returns: Shortened label.
Return type: (str)
Raises: InvalidConfigurationError – If label is not in get_supported_descriptors().

Currently implemented shortening strategies:

Fingerprints: remove ‘_fingerprint’ from the label
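A minimal sketch of the shortening strategy described above. The SUPPORTED set and the local InvalidConfigurationError class are hypothetical stand-ins for AIMSim’s supported-descriptor list and exception; only the rule of removing ‘_fingerprint’ from fingerprint labels comes from this documentation.

```python
class InvalidConfigurationError(ValueError):
    """Hypothetical stand-in for AIMSim's InvalidConfigurationError."""


# Illustrative subset of descriptor labels; not AIMSim's actual list.
SUPPORTED = {"morgan_fingerprint", "topological_fingerprint", "maccs_keys"}


def shorten_label(label):
    """Shorten a descriptor label for plotting, e.g. 'morgan_fingerprint' -> 'morgan'."""
    if label not in SUPPORTED:
        raise InvalidConfigurationError(f"{label!r} is not a supported descriptor.")
    # Fingerprints: remove '_fingerprint' from the label; other labels pass through.
    return label.replace("_fingerprint", "")


print(shorten_label("morgan_fingerprint"))  # morgan
print(shorten_label("maccs_keys"))          # maccs_keys
```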

to_numpy()

Return numpy_ attribute of Descriptor. Attribute will be initialized if not present.

Returns: Numpy vector of descriptor.
Return type: (np.ndarray)

to_rdkit()

Return rdkit_ attribute of Descriptor.

Returns

Fingerprint value as a bit vector.

Return type

(DataStructs.ExplicitBitVect)

Raises

(NotInitializedError) – If the object is not initialized with a fingerprint.
(ValueError) – If the object was initialized only with an arbitrary numpy descriptor, which cannot be converted to a bit vector.
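to_numpy() and to_rdkit() convert between a dense vector and a bit vector, creating the missing representation on demand, and only a strictly binary vector has a faithful bit-vector form. The sketch below mirrors that behaviour with plain Python; SketchDescriptor and to_bits() are hypothetical, a frozenset of on-bit indices stands in for the RDKit bit vector, and RuntimeError stands in for NotInitializedError.

```python
class SketchDescriptor:
    """Hypothetical stand-in for Descriptor using plain Python containers.

    numpy_ holds a dense list; bits_ holds (frozenset of on-bit indices, length)
    and plays the role of the rdkit_ bit vector. Each is built on demand.
    """

    def __init__(self, dense=None, on_bits=None, n_bits=None):
        if dense is not None:
            self.numpy_ = list(dense)
        if on_bits is not None:
            self.bits_ = (frozenset(on_bits), n_bits)

    def check_init(self):
        return hasattr(self, "numpy_") or hasattr(self, "bits_")

    def to_numpy(self):
        if not self.check_init():
            raise RuntimeError("Descriptor is not initialized.")
        if not hasattr(self, "numpy_"):
            on_bits, n_bits = self.bits_
            self.numpy_ = [1 if i in on_bits else 0 for i in range(n_bits)]
        return self.numpy_

    def to_bits(self):
        if not self.check_init():
            raise RuntimeError("Descriptor is not initialized.")
        if not hasattr(self, "bits_"):
            # Only a strictly 0/1 vector has a faithful bit-vector form.
            if any(v not in (0, 1) for v in self.numpy_):
                raise ValueError("A non-binary descriptor cannot be a bit vector.")
            on_bits = frozenset(i for i, v in enumerate(self.numpy_) if v == 1)
            self.bits_ = (on_bits, len(self.numpy_))
        return self.bits_


print(SketchDescriptor(on_bits=[0, 3], n_bits=5).to_numpy())  # [1, 0, 0, 1, 0]
```

Calling to_bits() on SketchDescriptor(dense=[0.5, 2.0]) raises ValueError, which corresponds to the arbitrary-numpy-descriptor case listed above.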

aimsim.ops.similarity_measures module

This module contains methods to find similarities between molecules.

class aimsim.ops.similarity_measures.RegisteringType(name, bases, attrs)

Bases: type

__init__(name, bases, attrs)

class aimsim.ops.similarity_measures.SimilarityMeasure(metric)

Bases: object

__init__(metric)

static get_compatible_metrics()

Return a dictionary mapping each fingerprint type to the metrics it supports.

Returns: Compatible fingerprints mapped to their supported metrics.
Return type: dict

static get_supported_binary_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics, which only support binary vectors.

Returns: List of strings.
Return type: List

static get_supported_general_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics, which support vectors other than binary vectors.

Returns: List of strings.
Return type: List
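The split between binary-only and general metrics follows from how each similarity is defined. As an illustration (not a statement of how AIMSim classifies these particular metrics), the Tanimoto coefficient is defined on sets of on-bits, while cosine similarity is defined for any real-valued vectors. The function names below are hypothetical.

```python
import math


def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    union = on_bits_a | on_bits_b
    if not union:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(on_bits_a & on_bits_b) / len(union)


def cosine(u, v):
    """Cosine similarity of two equal-length real-valued vectors."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector.")
    return sum(x * y for x, y in zip(u, v)) / norm


print(tanimoto({0, 2, 5}, {0, 5, 7}))  # 0.5
print(cosine([3.0, 4.0], [6.0, 8.0]))  # 1.0
```

A general metric such as cosine can still be applied to 0/1 vectors, whereas a set-based metric has no meaning for arbitrary real values.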

static get_supported_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics.

Returns: List of strings.
Return type: List

static get_uniq_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics, with redundant tags removed so that each unique metric appears exactly once.

Returns: List of strings.
Return type: List
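Several tags can name the same similarity measure: for binary vectors, for example, the Jaccard and Tanimoto coefficients coincide, as do the Ochiai coefficient and cosine similarity. A minimal sketch of deduplicating such aliases, using a hypothetical ALIASES table rather than AIMSim’s actual list of tags:

```python
# Hypothetical alias table: tag -> canonical metric name (not AIMSim's list).
ALIASES = {
    "tanimoto": "tanimoto",
    "jaccard": "tanimoto",  # identical to Tanimoto on binary vectors
    "cosine": "cosine",
    "ochiai": "cosine",     # identical to cosine on binary vectors
}


def get_supported_metrics():
    """Every accepted tag, aliases included."""
    return sorted(ALIASES)


def get_uniq_metrics():
    """One entry per distinct metric, redundant tags removed."""
    return sorted(set(ALIASES.values()))


print(get_supported_metrics())  # ['cosine', 'jaccard', 'ochiai', 'tanimoto']
print(get_uniq_metrics())       # ['cosine', 'tanimoto']
```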

is_distance_metric()

Check if the similarity measure is a distance metric.

Returns: True if it is a distance metric.
Return type: bool

aimsim.ops.similarity_measures.register(*args, type='discrete', to_distance=None)
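RegisteringType and register(*args, type='discrete', to_distance=None) are listed without docstrings. Their signatures suggest a decorator-based registry in which the metaclass collects metric functions tagged by register. The sketch below is a hypothetical reconstruction of that pattern, assuming *args are metric names (aliases), type is a metadata tag, and to_distance converts a similarity into a distance; none of these semantics are confirmed by this documentation, and SimilarityMeasureSketch is not AIMSim’s class.

```python
class RegisteringType(type):
    """Hypothetical metaclass: collect functions tagged by @register into cls.registry."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        cls.registry = {}
        for attr in attrs.values():
            # Unwrap staticmethod objects to reach the tagged function.
            func = attr.__func__ if isinstance(attr, staticmethod) else attr
            for metric_name in getattr(func, "_metric_names", ()):
                cls.registry[metric_name] = func


def register(*args, type="discrete", to_distance=None):
    """Hypothetical decorator: tag a function with metric names and metadata."""

    def decorator(func):
        func._metric_names = args
        func._metric_type = type
        func._to_distance = to_distance
        return func

    return decorator


class SimilarityMeasureSketch(metaclass=RegisteringType):
    @staticmethod
    @register("tanimoto", "jaccard", to_distance=lambda s: 1.0 - s)
    def _tanimoto(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    def __init__(self, metric):
        if metric not in self.registry:
            raise ValueError(f"Unsupported metric: {metric!r}")
        self.metric = metric
        self._func = self.registry[metric]

    def __call__(self, a, b):
        return self._func(a, b)

    def distance(self, a, b):
        to_distance = self._func._to_distance
        if to_distance is None:
            raise ValueError(f"No distance conversion registered for {self.metric!r}.")
        return to_distance(self(a, b))


m = SimilarityMeasureSketch("jaccard")
print(m({0, 1}, {1, 2}))  # 0.3333333333333333
print(m.distance({0, 1}, {1, 2}))
```

Registering metrics declaratively keeps the list of supported names, their aliases, and any distance conversion next to the function that implements each metric.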
