aimsim.ops package
Submodules
aimsim.ops.clustering module
Operation for clustering molecules
- class aimsim.ops.clustering.Cluster(n_clusters, clustering_method, **kwargs)
Bases:
object
Wrapper class for different clustering algorithms.
- clustering_method
Label for the specific algorithm used.
- ‘complete_linkage’, ‘complete’:
complete linkage agglomerative hierarchical clustering [2].
- ‘average_linkage’, ‘average’:
average linkage agglomerative hierarchical clustering [2].
- ‘single_linkage’, ‘single’:
single linkage agglomerative hierarchical clustering [2].
- ‘ward’:
Ward’s algorithm [2]. This method is useful for Euclidean descriptors.
- Type
str
- n_clusters
Number of clusters.
- Type
int
- model_
The clustering estimator.
- Type
sklearn.cluster.AgglomerativeClustering
- labels_
Cluster labels of the training set samples.
- Type
np.ndarray of shape (n_samples,)
- fit(X)
Fit the estimator.
- predict(X)
Get prediction from the estimator.
- get_labels()
Get cluster labels of the training set samples.
References
- [1] Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).
- [2] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
- __init__(n_clusters, clustering_method, **kwargs)
Constructor for the Cluster class.
- Parameters
n_clusters (int) – Number of clusters.
clustering_method (str) – Label for the specific algorithm used:
- ‘complete_linkage’, ‘complete’ for complete linkage agglomerative hierarchical clustering [2].
- ‘average_linkage’, ‘average’ for average linkage agglomerative hierarchical clustering [2].
- ‘single_linkage’, ‘single’ for single linkage agglomerative hierarchical clustering [2].
- ‘ward’ for Ward’s algorithm [2]. This method is useful for Euclidean descriptors.
kwargs (dict) – Keyword arguments passed to the underlying estimator. For agglomerative hierarchical clustering, refer to https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
- fit(X)
Fit the estimator.
- Parameters
X (np.ndarray or list) – Distance matrix.
- get_labels()
Get cluster labels of the training set samples.
- Returns
self.labels_, the cluster labels of the training set samples.
- Return type
np.ndarray of shape (n_samples,)
- predict(X)
Get predictions from the estimator.
- Parameters
X (np.ndarray or list) – Samples to predict on.
- Raises
sklearn.exceptions.NotFittedError – If the estimator is not fitted.
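To make the linkage labels above concrete, here is a minimal pure-Python sketch of agglomerative clustering on a precomputed distance matrix. It is not the AIMSim or scikit-learn implementation: `agglomerate` and `LINKAGE_RULES` are hypothetical names used only for illustration, and Ward’s method is omitted because it needs more than a pairwise reduction.

```python
from itertools import combinations

# Hypothetical lookup (not part of AIMSim): each linkage rule reduces the
# pairwise distances between two clusters to a single number.
LINKAGE_RULES = {
    "single": min,                         # closest pair of members
    "complete": max,                       # farthest pair of members
    "average": lambda d: sum(d) / len(d),  # mean over all pairs
}


def agglomerate(distance_matrix, n_clusters, linkage="complete"):
    """Merge the two closest clusters until n_clusters remain; return labels."""
    reduce_fn = LINKAGE_RULES[linkage]
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(distance_matrix))]
    while len(clusters) > n_clusters:

        def cluster_distance(pair):
            a, b = pair
            return reduce_fn(
                [distance_matrix[i][j] for i in clusters[a] for j in clusters[b]]
            )

        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b

    # Convert the cluster membership lists into one label per sample.
    labels = [0] * len(distance_matrix)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy symmetric distance matrix: samples 0 and 1 are close, as are 2 and 3.
distances = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(agglomerate(distances, n_clusters=2, linkage="complete"))
```

The sketch recomputes every cluster distance on each merge, which is fine for a toy matrix but far slower than a production estimator.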
aimsim.ops.descriptor module
This module contains methods to featurize molecules.
- class aimsim.ops.descriptor.Descriptor(value=None)
Bases:
object
Class for descriptors.
- label_
Label used to denote the type of descriptor being used.
- Type
str
- numpy_
Value of the descriptor in the numpy format.
- Type
np.ndarray
- rdkit_
Value of the descriptor in the rdkit format.
- Type
rdkit.DataStructs.cDataStructs.UIntSparseIntVec
- check_init()
Check whether the Descriptor object is initialized by testing for the existence of the numpy_ or rdkit_ attribute.
- __init__(value=None)
- check_init()
Check initialization status of the Descriptor object.
- Returns
True if object is initialized.
- Return type
(bool)
- static fold_to_equal_length(fingerprint1, fingerprint2)
Get back two fingerprint arrays of equal length. The longer fingerprint is folded to the size of the smaller one.
- Parameters
fingerprint1 (Descriptor) – Fingerprint one.
fingerprint2 (Descriptor) – Fingerprint two.
- Returns
(np.ndarray, np.ndarray)
- static get_all_supported_descriptors()
Return a list of descriptors which can be used with AIMSim but are considered experimental, complex, or rarely used and are therefore excluded from the UI.
- Returns
List of strings of all supported descriptors.
- Return type
List
- get_folded_fprint(fold_to_length)
Get the folded value of a fingerprint to a specified length.
- Parameters
fold_to_length (int) – Number of bits to fold to.
- Returns
Folded fingerprint.
- Return type
(np.ndarray)
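The folding described above can be illustrated with one common scheme: repeatedly split the bit vector in half and OR the halves together. Whether AIMSim folds in exactly this way is an assumption. `fold_fingerprint` and `fold_pair_to_equal_length` are hypothetical helper names, and plain Python lists stand in for the np.ndarray values.

```python
def fold_fingerprint(bits, fold_to_length):
    """Fold a binary fingerprint by OR-ing its two halves until it fits."""
    bits = list(bits)
    if fold_to_length <= 0 or len(bits) < fold_to_length:
        raise ValueError("Cannot fold to a longer or empty length.")
    while len(bits) > fold_to_length:
        if len(bits) % 2:
            raise ValueError("Length must be even at every folding step.")
        half = len(bits) // 2
        # A bit is set in the folded vector if it was set in either half.
        bits = [a | b for a, b in zip(bits[:half], bits[half:])]
    if len(bits) != fold_to_length:
        raise ValueError("Target length is not reachable by halving.")
    return bits


def fold_pair_to_equal_length(bits1, bits2):
    """Fold the longer of two fingerprints down to the length of the shorter."""
    target = min(len(bits1), len(bits2))
    return fold_fingerprint(bits1, target), fold_fingerprint(bits2, target)


long_fp = [1, 0, 0, 0, 0, 1, 0, 0]
short_fp = [0, 1, 1, 0]
print(fold_pair_to_equal_length(long_fp, short_fp))
```

Folding loses information: two bits that land on the same position after folding can no longer be told apart, so similarity scores on folded fingerprints are approximations.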
- get_label()
- get_params()
- static get_supported_fprints()
Return a list of strings for the currently implemented molecular fingerprints.
- Returns
List of strings.
- Return type
List
- is_fingerprint()
- make_fingerprint(molecule_graph, fingerprint_type, fingerprint_params=None)
Make fingerprint of a molecule based on a graph representation. Set the state of the descriptor to this fingerprint.
- Parameters
molecule_graph (RDKIT object) – The graph object used to make a fingerprint.
fingerprint_type (str) – Label for the type of fingerprint. Invokes get_supported_descriptors()[‘fingerprints’] for the list of supported fingerprints.
fingerprint_params (dict) – Keyword arguments used to modify parameters of fingerprint. Default is None.
- set_manually(arbitrary_descriptor_val)
Set the descriptor value manually based on user specified value.
- Parameters
arbitrary_descriptor_val (np.ndarray or list) – Vectorized representation of descriptor values.
- static shorten_label(label)
Shorten the label of a fingerprint. Useful for plotting purposes.
- Parameters
label (str) – Label of fingerprint to shorten.
- Returns
Shortened label.
- Return type
(str)
- Raises
InvalidConfigurationError – If label is not in get_supported_descriptors().
- Currently implemented shortening strategies:
Fingerprints: remove ‘_fingerprint’ from the label.
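As a rough illustration of the shortening strategy above, the sketch below validates a label and strips a trailing ‘_fingerprint’. The names `shorten` and `EXAMPLE_LABELS` are hypothetical, the labels are placeholders rather than the real supported list, ValueError stands in for InvalidConfigurationError, and treating ‘_fingerprint’ as a suffix is an assumption.

```python
# Placeholder labels for illustration only; the real list comes from
# the library's get_supported_descriptors().
EXAMPLE_LABELS = ["morgan_fingerprint", "topological_fingerprint"]


def shorten(label, supported=EXAMPLE_LABELS):
    """Return the label without a trailing '_fingerprint'."""
    if label not in supported:
        # Stand-in for InvalidConfigurationError in this sketch.
        raise ValueError(f"{label!r} is not a supported descriptor label")
    suffix = "_fingerprint"
    return label[: -len(suffix)] if label.endswith(suffix) else label


print(shorten("morgan_fingerprint"))
```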
- to_numpy()
Return numpy_ attribute of Descriptor. Attribute will be initialized if not present.
- Returns
Numpy vector of descriptor.
- Return type
(np.ndarray)
- to_rdkit()
Return rdkit_ attribute of Descriptor.
- Returns
Fingerprint value as a bit vector.
- Return type
(DataStructs.ExplicitBitVect)
- Raises
(NotInitializedError) – If object not initialized with a fingerprint.
(ValueError) – If only arbitrary numpy descriptor is used to initialize the object. This cannot be converted to bit vectors.
aimsim.ops.similarity_measures module
This module contains methods to find similarities between molecules.
- class aimsim.ops.similarity_measures.RegisteringType(name, bases, attrs)
Bases:
type
- __init__(name, bases, attrs)
- class aimsim.ops.similarity_measures.SimilarityMeasure(metric)
Bases:
object
- __init__(metric)
- static get_compatible_metrics()
Return a dictionary with which types of metrics each fingerprint supports.
- Returns
Mapping of compatible fingerprints to their supported metrics.
- Return type
dict
- static get_supported_binary_metrics()
Return a list of strings for the currently implemented similarity measures, aka metrics, which only support binary vectors.
- Returns
List of strings.
- Return type
List
- static get_supported_general_metrics()
Return a list of strings for the currently implemented similarity measures, aka metrics, which support vectors other than binary vectors.
- Returns
List of strings.
- Return type
List
- static get_supported_metrics()
Return a list of strings for the currently implemented similarity measures, aka metrics.
- Returns
List of strings.
- Return type
List
- static get_uniq_metrics()
Return a list of strings for the currently implemented similarity measures, aka metrics. Each metric is represented once, with redundant tags removed.
- Returns
List of strings.
- Return type
List
- is_distance_metric()
Check if the similarity measure is a distance metric.
- Returns
True if it is a distance metric.
- Return type
bool
- aimsim.ops.similarity_measures.register(*args, type='discrete', to_distance=None)
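The signatures of `RegisteringType` and `register` suggest a decorator-plus-metaclass registry, although this page does not document their behavior. The sketch below shows how such a pattern can work in general and should not be read as the AIMSim implementation: the attribute names (`registered_labels`, `metric_type`, `registry`), the `Measures` class, the metric labels, and the idea that `to_distance` is a callable converting a similarity into a distance are all assumptions.

```python
def register(*labels, type="discrete", to_distance=None):
    """Tag a function with metric labels and metadata (illustrative only)."""

    def decorator(func):
        func.registered_labels = labels
        func.metric_type = type
        func.to_distance = to_distance
        return func

    return decorator


class RegisteringType(type):
    """Metaclass that collects tagged methods into a class-level registry."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        cls.registry = {}
        for attr in attrs.values():
            # Untagged attributes have no labels and are skipped.
            for label in getattr(attr, "registered_labels", ()):
                cls.registry[label] = attr


class Measures(metaclass=RegisteringType):
    # Two labels (aliases) point at the same implementation.
    @register("tanimoto", "jaccard", type="discrete",
              to_distance=lambda s: 1 - s)
    def tanimoto(self, a, b):
        common = sum(x & y for x, y in zip(a, b))
        total = sum(x | y for x, y in zip(a, b))
        return common / total if total else 0.0


metric = Measures.registry["jaccard"]
similarity = metric(Measures(), [1, 1, 0, 1], [1, 0, 0, 1])
print(similarity, metric.to_distance(similarity))
```

A registry like this lets lookups by label, alias deduplication, and per-metric metadata (such as whether a distance conversion exists) be derived from the decorated methods instead of being maintained in separate lists.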