aimsim.ops package

Submodules

aimsim.ops.clustering module

Operation for clustering molecules

class aimsim.ops.clustering.Cluster(n_clusters, clustering_method, **kwargs)

Bases: object

Wrapper class for different clustering algorithms.

clustering_method

Label for the specific algorithm used.

‘complete_linkage’, ‘complete’:

Complete linkage agglomerative hierarchical clustering [2].

‘average_linkage’, ‘average’:

Average linkage agglomerative hierarchical clustering [2].

‘single_linkage’, ‘single’:

Single linkage agglomerative hierarchical clustering [2].

‘ward’:

Ward’s algorithm [2]. This method is useful for Euclidean descriptors.

Type

str

n_clusters

Number of clusters.

Type

int

model_

The clustering estimator.

Type

sklearn.cluster.AgglomerativeClustering

labels_

Cluster labels of the training set samples.

Type

np.ndarray of shape (n_samples,)

fit(X)

Fit the estimator.

predict(X)

Get prediction from the estimator.

get_labels()

Get cluster labels of the training set samples.

References

[1] Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).

[2] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53

__init__(n_clusters, clustering_method, **kwargs)

Constructor for the Cluster class.

Parameters
  • n_clusters (int) – Number of clusters.

  • clustering_method (str) – Label for the specific algorithm used. ‘complete_linkage’, ‘complete’ for complete linkage agglomerative hierarchical clustering [2]. ‘average_linkage’, ‘average’ for average linkage agglomerative hierarchical clustering [2]. ‘single_linkage’, ‘single’ for single linkage agglomerative hierarchical clustering [2]. ‘ward’ for Ward’s algorithm [2]. This method is useful for Euclidean descriptors.

  • kwargs (dict) – Keyword arguments. These are passed to the estimators. Refer to the following documentation page for agglomerative hierarchical clustering: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html


fit(X)

Fit the estimator.

Parameters

X (np.ndarray or list) – Distance matrix.

get_labels()

Get cluster labels of the training set samples.

Returns

self.labels_, the cluster labels of the training set samples.

Return type

np.ndarray of shape (n_samples,)

predict(X)

Get predictions from the estimator.

Parameters

X (np.ndarray or list) – Samples to predict on.

Raises

sklearn.exceptions.NotFittedError if estimator is not fitted.
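The linkage labels above differ only in how the distance between two clusters is derived from the pairwise distances of their members. The following is a minimal, pure-Python sketch of that idea on a precomputed distance matrix. It is not AIMSim's implementation (the class wraps sklearn.cluster.AgglomerativeClustering); the function name `agglomerative_labels` and the toy matrix are hypothetical, and Ward's method is left out because it relies on Euclidean feature vectors rather than arbitrary distances.

```python
def agglomerative_labels(distances, n_clusters, linkage="complete"):
    """Naive agglomerative clustering on a square distance matrix (illustrative)."""
    # The linkage rule turns member-to-member distances into one
    # cluster-to-cluster distance.
    combine = {
        "complete": max,                                      # farthest pair
        "single": min,                                        # closest pair
        "average": lambda values: sum(values) / len(values),  # mean of all pairs
    }[linkage]

    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(distances))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [distances[i][j] for i in clusters[a] for j in clusters[b]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters (b > a, so index a stays valid).
        clusters[a].extend(clusters.pop(b))

    # Convert the cluster membership lists into one label per sample.
    labels = [0] * len(distances)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical distance matrix: samples 0 and 1 are close, as are 2 and 3.
distance_matrix = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
labels = agglomerative_labels(distance_matrix, n_clusters=2, linkage="complete")
print(labels)
```

The returned list plays the same role as labels_: one cluster index per training sample, in input order.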

aimsim.ops.descriptor module

This module contains methods to featurize molecules.

class aimsim.ops.descriptor.Descriptor(value=None)

Bases: object

Class for descriptors.

label_

Label used to denote the type of descriptor being used.

Type

str

numpy_

Value of the descriptor in the numpy format.

Type

np.ndarray

rdkit_

Value of the descriptor in the rdkit format.

Type

rdkit.DataStructs.cDataStructs.UIntSparseIntVec

to_numpy()

Get the numpy_ attribute. If it does not exist, it is created.

to_rdkit()

Get the rdkit_ attribute. If it does not exist, it is created.

check_init()

Check if the Descriptor object is initialized. This is done by checking the existence of the numpy_ or rdkit_ attribute.

__init__(value=None)

check_init()

Check initialization status of the Descriptor object.

Returns

True if object is initialized.

Return type

(bool)

static fold_to_equal_length(fingerprint1, fingerprint2)

Get back two fingerprint arrays of equal length. The longer fingerprint is folded to the size of the smaller one.

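Parameters
  • fingerprint1 (np.ndarray) – First fingerprint.

  • fingerprint2 (np.ndarray) – Second fingerprint.

Returns

The two fingerprints, folded to equal length.

Return type

(np.ndarray, np.ndarray)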

static get_all_supported_descriptors()

Returns a list of descriptors which _can_ be used with AIMSim but are considered experimental or are complex/rarely used and are therefore excluded from the UI.

Returns

List of strings of all supported descriptors.

Return type

List

get_folded_fprint(fold_to_length)

Get the folded value of a fingerprint to a specified length.

Parameters

fold_to_length (int) – Number of bits to fold to.

Returns

Folded fingerprint.

Return type

(np.ndarray)
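As an illustration of folding, the sketch below halves a bit vector repeatedly and combines the two halves with a bitwise OR until the target length is reached. This is a common folding scheme and is assumed here; the source does not state which scheme AIMSim uses. The helper names `fold_fingerprint` and `fold_pair_to_equal_length` are hypothetical, and plain Python lists stand in for np.ndarray.

```python
def fold_fingerprint(bits, fold_to_length):
    """Fold a bit list by OR-ing its halves until it has fold_to_length bits."""
    bits = list(bits)
    if fold_to_length <= 0 or len(bits) < fold_to_length:
        raise ValueError("Target length must be positive and not exceed the input length.")
    ratio, remainder = divmod(len(bits), fold_to_length)
    # Halving only reaches the target if the length ratio is a power of two.
    if remainder or ratio & (ratio - 1):
        raise ValueError("Input length must be the target length times a power of two.")
    while len(bits) > fold_to_length:
        half = len(bits) // 2
        bits = [a | b for a, b in zip(bits[:half], bits[half:])]
    return bits


def fold_pair_to_equal_length(fingerprint1, fingerprint2):
    """Fold the longer of two fingerprints down to the length of the shorter one."""
    target = min(len(fingerprint1), len(fingerprint2))
    return fold_fingerprint(fingerprint1, target), fold_fingerprint(fingerprint2, target)


long_fp = [1, 0, 0, 0, 0, 1, 0, 0]   # 8 bits
short_fp = [0, 1, 1, 0]              # 4 bits
folded_long, folded_short = fold_pair_to_equal_length(long_fp, short_fp)
print(folded_long, folded_short)
```

Folding loses information: two bits that land on the same position after folding can no longer be told apart, which is why the shorter fingerprint is left untouched and only the longer one is reduced.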

get_label()

get_params()

static get_supported_fprints()

Return a list of strings for the currently implemented molecular fingerprints.

Returns

List of strings.

Return type

List

is_fingerprint()

make_fingerprint(molecule_graph, fingerprint_type, fingerprint_params=None)

Make fingerprint of a molecule based on a graph representation. Set the state of the descriptor to this fingerprint.

Parameters
  • molecule_graph (RDKIT object) – The graph object used to make a fingerprint.

  • fingerprint_type (str) – Label for the type of fingerprint. See get_supported_descriptors()[‘fingerprints’] for the list of supported fingerprints.

  • fingerprint_params (dict) – Keyword arguments used to modify parameters of fingerprint. Default is None.

set_manually(arbitrary_descriptor_val)

Set the descriptor value manually based on user specified value.

Parameters

arbitrary_descriptor_val (np.ndarray or list) – Vectorized representation of descriptor values.

static shorten_label(label)

Shorten the label of a fingerprint. Useful for plotting purposes.

Parameters

label (str) – Label of fingerprint to shorten.

Returns

Shortened label.

Return type

(str)

Raises

InvalidConfigurationError – if label not in get_supported_descriptors()

Currently implemented shortening strategies:
  1. Fingerprints: remove ‘_fingerprint’ from the label

to_numpy()

Return numpy_ attribute of Descriptor. Attribute will be initialized if not present.

Returns

Numpy vector of descriptor.

Return type

(np.ndarray)

to_rdkit()

Return rdkit_ attribute of Descriptor.

Returns

Fingerprint value as a bit vector.

Return type

(DataStructs.ExplicitBitVect)

Raises
  • (NotInitializedError) – If object not initialized with a fingerprint.

  • (ValueError) – If only arbitrary numpy descriptor is used to initialize the object. This cannot be converted to bit vectors.

aimsim.ops.similarity_measures module

This module contains methods to find similarities between molecules.

class aimsim.ops.similarity_measures.RegisteringType(name, bases, attrs)

Bases: type

__init__(name, bases, attrs)

class aimsim.ops.similarity_measures.SimilarityMeasure(metric)

Bases: object

__init__(metric)

static get_compatible_metrics()

Return a dictionary with which types of metrics each fingerprint supports.

Returns

Compatible fingerprints mapped to their supported metrics.

Return type

dict

static get_supported_binary_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics, which only support binary vectors.

Returns

List of strings.

Return type

List

static get_supported_general_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics, which support vectors other than binary vectors.

Returns

List of strings.

Return type

List
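To make the distinction between binary-only and general metrics concrete, the sketch below contrasts a Tanimoto similarity, which is defined on on/off bits, with a cosine similarity, which also accepts real-valued vectors. These two are textbook examples chosen for illustration; whether AIMSim files them under these exact lists is an assumption, and the function names are hypothetical.

```python
import math


def tanimoto_similarity(bits_a, bits_b):
    """Shared on-bits divided by on-bits present in either vector (binary only)."""
    both = sum(1 for a, b in zip(bits_a, bits_b) if a and b)
    either = sum(1 for a, b in zip(bits_a, bits_b) if a or b)
    # Returning 0.0 for two all-zero vectors is an arbitrary choice in this sketch.
    return both / either if either else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


binary_score = tanimoto_similarity([1, 1, 0, 1], [1, 0, 0, 1])    # 2 shared of 3 set bits
general_score = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])  # parallel vectors
print(binary_score, general_score)
```

Tanimoto only looks at which positions are set, so feeding it count or real-valued descriptors would discard their magnitudes; that is the practical reason to keep the two lists separate.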

static get_supported_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics.

Returns

List of strings.

Return type

List

static get_uniq_metrics()

Return a list of strings for the currently implemented similarity measures, aka metrics. Each metric appears exactly once, with redundant tags removed.

Returns

List of strings.

Return type

List

is_distance_metric()

Check if the similarity measure is a distance metric.

Returns

True if it is a distance metric.

Return type

bool

aimsim.ops.similarity_measures.register(*args, type='discrete', to_distance=None)
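The signatures of RegisteringType and register suggest a decorator-plus-metaclass registration pattern. The sketch below shows how such a pattern generally works; it is not AIMSim's implementation. It assumes that the positional arguments are name tags for one metric and that to_distance is a callable converting a similarity to a distance. The attribute names (`registered_names`, `metric_type`, `registry`) and the `ToyMeasure` class are hypothetical.

```python
def register(*args, type="discrete", to_distance=None):
    """Attach registration metadata to a method (assumed semantics)."""
    def decorator(func):
        func.registered_names = args      # all tags that refer to this metric
        func.metric_type = type           # e.g. 'discrete' for binary-only metrics
        func.to_distance = to_distance    # optional similarity-to-distance conversion
        return func
    return decorator


class RegisteringType(type):
    """Metaclass that collects every decorated method at class creation."""
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        cls.registry = {}
        for attr in attrs.values():
            # Undecorated attributes have no tags and are skipped.
            for tag in getattr(attr, "registered_names", ()):
                cls.registry[tag] = attr


class ToyMeasure(metaclass=RegisteringType):
    @register("tanimoto", "jaccard", type="discrete", to_distance=lambda s: 1.0 - s)
    def tanimoto(self, a, b):
        both = sum(1 for x, y in zip(a, b) if x and y)
        either = sum(1 for x, y in zip(a, b) if x or y)
        return both / either

    @register("dot_product", "dot", type="continuous")
    def dot_product(self, a, b):
        return sum(x * y for x, y in zip(a, b))


supported = sorted(ToyMeasure.registry)                                  # every tag
unique = sorted({f.__name__ for f in ToyMeasure.registry.values()})      # one per metric
similarity = ToyMeasure.registry["jaccard"](ToyMeasure(), [1, 1, 0], [1, 0, 0])
distance = ToyMeasure.registry["jaccard"].to_distance(similarity)
print(supported, unique, similarity, distance)
```

Here `supported` lists all tags, while `unique` collapses tags that point to the same method, which mirrors the difference between a list of all supported metrics and a list with redundant tags removed.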

Module contents