aimsim.tasks package

Submodules

aimsim.tasks.cluster_data module

Data clustering task.

class aimsim.tasks.cluster_data.ClusterData(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Constructor for the ClusterData class.

Parameters

configs (dict) – Dictionary of configurations. Default is None.
**kwargs – Keyword arguments to modify configs fields.

Notes

The configuration structure with default values is:

'n_clusters' (int): (no default listed)
'clustering_method' (str): None
cluster_plot_settings: {'cluster_colors' (list): colors from tab20 cmap,
                        'response' (str): 'Response'}
embedding_plot_settings: {'plot_color': 'red',
                          'plot_title': '2-D projected space',
                          'xlabel': 'Dimension 1',
                          'ylabel': 'Dimension 2',
                          'embedding': {'method': "mds",
                                        'params': {'random_state': 42}}}
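As a rough illustration of how the documented defaults, the configs dictionary and **kwargs might combine, the sketch below merges them in that order. The helper build_configs and the precedence it applies (keyword arguments override configs, which override the defaults) are assumptions for illustration, not ClusterData's actual code. cluster_colors is left out to avoid a matplotlib dependency, and the value 'kmedoids' in the usage line is only a placeholder.

```python
import copy

# Defaults transcribed from the Notes above. 'n_clusters' has no documented
# default and 'cluster_colors' needs matplotlib, so both are omitted here
# (assumptions of this sketch).
DEFAULTS = {
    "clustering_method": None,
    "cluster_plot_settings": {"response": "Response"},
    "embedding_plot_settings": {
        "plot_color": "red",
        "plot_title": "2-D projected space",
        "xlabel": "Dimension 1",
        "ylabel": "Dimension 2",
        "embedding": {"method": "mds", "params": {"random_state": 42}},
    },
}


def build_configs(configs=None, **kwargs):
    """Hypothetical helper: defaults, then `configs`, then keyword overrides."""
    merged = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    merged.update(configs or {})
    merged.update(kwargs)  # keyword arguments take precedence over `configs`
    return merged


example = build_configs({"n_clusters": 5}, clustering_method="kmedoids")
```

Because dict.update is shallow, passing embedding_plot_settings in this sketch replaces the whole nested dictionary rather than merging into it.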

aimsim.tasks.compare_target_molecule module

class aimsim.tasks.compare_target_molecule.CompareTargetMolecule(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs): Get configurations. :param configs: Parameters of the task :type configs: dict

get_hits_dissimilar_to(molecule_set=None)

Get a sorted list of the num_hits molecules in the set that are most dissimilar to a query molecule. This is defined as the set of molecules with the highest (query_molecule, set_molecule) dissimilarity, sorted in decreasing order of dissimilarity.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities is not set.

Returns

np.ndarray(int): Ids of the most dissimilar molecules, in decreasing order of dissimilarity.

np.ndarray(float): Corresponding similarity values.

Return type

tuple(np.ndarray(int), np.ndarray(float))

get_hits_similar_to(molecule_set=None)

Get a sorted list of the num_hits molecules in the set that are most similar to a query molecule. This is defined as the set of molecules with the highest (query_molecule, set_molecule) similarity, sorted in decreasing order of similarity.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities is not set.

Returns

np.ndarray(int): Ids of the most similar molecules, in decreasing order of similarity.

np.ndarray(float): Corresponding similarity values.

Return type

tuple(np.ndarray(int), np.ndarray(float))
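Both methods above reduce to ranking a vector of query-to-set similarities. The sketch below shows one way to do that with NumPy, assuming the similarities are already computed and that "most dissimilar" simply means "lowest similarity" (true when dissimilarity is a decreasing function of similarity, such as 1 - similarity). The function top_hits is hypothetical and is not part of CompareTargetMolecule.

```python
import numpy as np


def top_hits(similarities, num_hits, most_similar=True):
    """Return (indices, similarity values) of the num_hits best matches.

    `similarities[i]` is the query molecule's similarity to set molecule i.
    With most_similar=False the lowest similarities come first, i.e. the
    molecules in decreasing order of dissimilarity.
    """
    sims = np.asarray(similarities, dtype=float)
    order = np.argsort(sims, kind="stable")  # ascending similarity
    if most_similar:
        order = order[::-1]  # decreasing similarity
    ids = order[:num_hits]
    return ids, sims[ids]
```

For example, top_hits([0.2, 0.9, 0.5, 0.7], 2) ranks molecules 1 and 3 first, while most_similar=False ranks molecules 0 and 2 first.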

aimsim.tasks.extended_similarity_indices module

Calculates the Extended Similarity Indexes as shown in this table: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3/tables/1 and described in:

Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics - https://doi.org/10.1186/s13321-021-00505-3

Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection - https://link.springer.com/article/10.1186/s13321-021-00504-4

Both gen_sim_dict and calculate_counters were provided by Ramon Alain Miranda Quintana and are similar to the code available here: https://github.com/ramirandaq/MultipleComparisons

class aimsim.tasks.extended_similarity_indices.ExtendedSimilarityIndices(configs)

Bases: Task

calculate_counters(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')

Calculate the 1-similarity, 0-similarity, and dissimilarity counters.

Parameters

c_total (np.ndarray) – Vector containing the sums of each column of the fingerprint matrix.
n_fingerprints (int) – Number of objects to be compared.
c_threshold ({None, 'dissimilar', int, float}) – Coincidence threshold.
  None: default, c_threshold = n_fingerprints % 2.
  'dissimilar': c_threshold = ceil(n_fingerprints / 2).
  int: integer number < n_fingerprints.
  float: real number in the (0, 1) interval, giving the fraction of the total data that will serve as the threshold.
w_factor ({"fraction", "power_n"}) – Type of weight function that will be used.
  'fraction': similarity = d[k]/n_fingerprints
              dissimilarity = 1 - (d[k] - n_fingerprints % 2)/n_fingerprints
  'power_n': similarity = n**-(n_fingerprints - d[k])
             dissimilarity = n**-(d[k] - n_fingerprints % 2)
             where n is the integer suffix of the label (e.g. 'power_3').
  Other values: similarity = dissimilarity = 1.

Returns

counters – Dictionary with the weighted and non-weighted counters.

Return type

dict

Notes

Please cite the original papers on the n-ary indices: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3 https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00504-4
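To make the threshold and weight rules concrete, here is a minimal NumPy sketch of the counters with 'fraction' weights. It assumes that d[k] is |2 * c_total[k] - n_fingerprints| and that a column counts as 1-similarity when 2 * c_total[k] - n_fingerprints exceeds the threshold, as 0-similarity when n_fingerprints - 2 * c_total[k] exceeds it, and as dissimilarity otherwise, following the n-ary index papers cited above. The function name and the dictionary keys are illustrative, not a guaranteed match for the AIMSim implementation, and 'power_n' weighting is left out.

```python
import math

import numpy as np


def calculate_counters_sketch(c_total, n_fingerprints, c_threshold=None):
    """Hypothetical n-ary counters using 'fraction' weights only."""
    c_total = np.asarray(c_total)
    n = n_fingerprints
    # Resolve the coincidence threshold as described in the parameter list.
    if c_threshold is None:
        c_threshold = n % 2
    elif isinstance(c_threshold, str):
        if c_threshold != "dissimilar":
            raise ValueError("c_threshold must be None, 'dissimilar', an int or a float")
        c_threshold = math.ceil(n / 2)
    elif isinstance(c_threshold, float):
        if not 0 < c_threshold < 1:
            raise ValueError("a float c_threshold must lie in (0, 1)")
        c_threshold = c_threshold * n
    elif c_threshold >= n:
        raise ValueError("an integer c_threshold must be smaller than n_fingerprints")

    diff = 2 * c_total - n                  # signed coincidence of each column
    a_mask = diff > c_threshold             # columns counted as 1-similarity
    d_mask = -diff > c_threshold            # columns counted as 0-similarity
    dis_mask = np.abs(diff) <= c_threshold  # columns counted as dissimilarity

    # 'fraction' weights: similarity = d/n, dissimilarity = 1 - (d - n % 2)/n
    w_a = float(np.sum(np.abs(diff[a_mask]) / n))
    w_d = float(np.sum(np.abs(diff[d_mask]) / n))
    w_dis = float(np.sum(1 - (np.abs(diff[dis_mask]) - n % 2) / n))

    a, d, dis = int(a_mask.sum()), int(d_mask.sum()), int(dis_mask.sum())
    return {
        "a": a, "w_a": w_a, "d": d, "w_d": w_d,
        "total_sim": a + d, "total_w_sim": w_a + w_d,
        "total_dis": dis, "total_w_dis": w_dis,
        "p": a + d + dis, "w_p": w_a + w_d + w_dis,
    }


fps = np.array([[1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 1]])
counters = calculate_counters_sketch(fps.sum(axis=0), len(fps))
```

For these three fingerprints the column sums are [3, 1, 1, 3] and the default threshold is 1, so the first and last columns count as 1-similarity and the middle two as dissimilarity.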

gen_sim_dict(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')

aimsim.tasks.identify_outliers module

Subclass of Task that implements an IsolationForest to identify outliers.

class aimsim.tasks.identify_outliers.IdentifyOutliers(configs=None, **kwargs)

Bases: Task

Subclass of Task to identify outliers via an IsolationForest.

Parent class: Task (abstract class).

__init__(configs=None, **kwargs): Get configurations. :param configs: Parameters of the task :type configs: dict

aimsim.tasks.measure_search module

Identify the most appropriate choice of fingerprint and similarity measure by evaluating the response of the nearest and furthest neighbors. This is called measure choice for brevity (although both the measure and the features are chosen).

class aimsim.tasks.measure_search.MeasureSearch(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs): Get configurations. :param configs: Parameters of the task :type configs: dict

get_best_measure(molecule_set_configs, fingerprint_type=None, similarity_measure=None, subsample_subset_size=0.01, optim_algo='max_min', only_metric=False, show_top=0)

Get the best measure for the quantity of interest.

Parameters

molecule_set_configs (dict) – All configurations (except fingerprint_type and similarity_measure) needed to form the MoleculeSet.
fingerprint_type (str) – Label to indicate which fingerprint to use. If supplied, the fingerprint is fixed and optimization is carried out over similarity measures. Use None to indicate that optimization needs to be carried out over fingerprints. Default is None.
similarity_measure (str) – Label to indicate which similarity measure to use. If supplied, the similarity measure is fixed and optimization is carried out over fingerprints. Use None to indicate that optimization needs to be carried out over similarity measures. Default is None.
subsample_subset_size (float) – Fraction of molecule_set to subsample. This is separate from the sample_ratio parameter used when creating a MoleculeSet, since a more aggressive subsampling strategy is recommended for this task due to the combinatorial explosion of looking at multiple fingerprints and similarity measures. Default is 0.01.
optim_algo (str) – Label to indicate the optimization algorithm chosen. Options are:
  'max': The measure choice which maximizes the correlation of properties between nearest neighbors (most similar).
  'min': The measure choice which minimizes the absolute value of the property correlation between furthest neighbors (most dissimilar).
  'max_min': The measure choice which maximizes the correlation of properties between nearest neighbors (most similar) and minimizes the absolute value of the property correlation between furthest neighbors (most dissimilar). This is the default.
only_metric (bool) – If True only similarity measures satisfying the metricity property (i.e. can be converted to distance metrics) are selected.

Returns

Top performer with fields:

fingerprint_type (str): Label for fingerprint type.
similarity_measure (str): Label for similarity measure.
nearest_neighbor_correlation (float): Correlation of the property of a molecule and its nearest neighbor.
furthest_neighbor_correlation (float): Correlation of the property of a molecule and its furthest neighbor.
score_ (float): Overall score based on the optimization strategy. More is better.

Return type

(NamedTuple)
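The selection step can be pictured as scoring every (fingerprint, measure) pair and keeping the best. The sketch below assumes the neighbor correlations are already computed and uses hypothetical scoring rules for the three optim_algo options ('max' scores the nearest-neighbor correlation, 'min' penalizes the absolute furthest-neighbor correlation, 'max_min' combines the two as a difference). Only the NamedTuple field names come from the documented return value; the scoring formulas, the function names and the labels fp_a, fp_b, measure_x and measure_y are illustrative.

```python
from typing import Dict, NamedTuple, Tuple


class TopPerformer(NamedTuple):
    # Field names follow the documented return value of get_best_measure.
    fingerprint_type: str
    similarity_measure: str
    nearest_neighbor_correlation: float
    furthest_neighbor_correlation: float
    score_: float


def score(nn_corr: float, fn_corr: float, optim_algo: str = "max_min") -> float:
    """Hypothetical scores for the three optim_algo options (higher is better)."""
    if optim_algo == "max":
        return nn_corr
    if optim_algo == "min":
        return -abs(fn_corr)
    if optim_algo == "max_min":
        return nn_corr - abs(fn_corr)
    raise ValueError(f"Unknown optim_algo: {optim_algo!r}")


def pick_best(correlations: Dict[Tuple[str, str], Tuple[float, float]],
              optim_algo: str = "max_min") -> TopPerformer:
    """`correlations` maps (fingerprint_type, similarity_measure) to
    (nearest_neighbor_correlation, furthest_neighbor_correlation)."""
    results = [
        TopPerformer(fp, measure, nn, fn, score(nn, fn, optim_algo))
        for (fp, measure), (nn, fn) in correlations.items()
    ]
    return max(results, key=lambda r: r.score_)


correlations = {
    ("fp_a", "measure_x"): (0.8, 0.1),
    ("fp_b", "measure_y"): (0.9, 0.6),
}
best = pick_best(correlations)
```

Here 'max_min' prefers fp_a (0.8 - 0.1 beats 0.9 - 0.6), whereas 'max' alone would pick fp_b, which shows why the combined criterion can change the outcome.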

aimsim.tasks.see_property_variation_with_similarity module

Task to visualize property similarity over a dataset.

class aimsim.tasks.see_property_variation_with_similarity.SeePropertyVariationWithSimilarity(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Get configurations.

Parameters

configs (dict) – Parameters of the task.

get_property_correlations_in_most_dissimilar(molecule_set)

Get the correlation between the property of molecules and that of their furthest (most dissimilar) neighbors.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.

Returns

Correlation between properties.

Return type

(float)

get_property_correlations_in_most_similar(molecule_set)

Get the correlation between the property of molecules and that of their nearest (most similar) neighbors.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.

Returns

Correlation between properties.

Return type

(float)
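Both correlation methods pair each molecule with one neighbor and correlate the two property values. The sketch below shows that computation on a precomputed similarity matrix, assuming Pearson correlation and that a molecule is never its own neighbor; the function name and the matrix input are assumptions, since the real methods take a MoleculeSet.

```python
import numpy as np


def neighbor_property_correlations(similarity, prop):
    """Return (nearest, furthest) Pearson correlations between each molecule's
    property and the property of its most and least similar neighbor."""
    sim = np.array(similarity, dtype=float)  # copy so the caller's matrix is untouched
    prop = np.asarray(prop, dtype=float)
    np.fill_diagonal(sim, -np.inf)  # a molecule is not its own nearest neighbor
    nearest = np.argmax(sim, axis=1)
    np.fill_diagonal(sim, np.inf)  # ... nor its own furthest neighbor
    furthest = np.argmin(sim, axis=1)
    nn_corr = np.corrcoef(prop, prop[nearest])[0, 1]
    fn_corr = np.corrcoef(prop, prop[furthest])[0, 1]
    return nn_corr, fn_corr


prop = np.array([0.0, 1.0, 2.0, 3.0])
# Toy similarity: molecules with closer property values are more similar.
sim = -np.abs(prop[:, None] - prop[None, :])
nn_corr, fn_corr = neighbor_property_correlations(sim, prop)
```

With this toy similarity, the nearest-neighbor correlation is positive (about 0.63) and the furthest-neighbor correlation is strongly negative (about -0.89), the pattern one would hope to see when the similarity measure tracks the property.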

aimsim.tasks.task module

Abstract task class.

class aimsim.tasks.task.Task(configs)

Bases: ABC

__init__(configs): Get configurations. :param configs: Parameters of the task :type configs: dict

aimsim.tasks.task_manager module

Class to call all tasks in sequence.

class aimsim.tasks.task_manager.TaskManager(tasks)

Bases: object

__init__(tasks)

Sequentially launches all the tasks from the configuration file.

Parameters: tasks (dict) – The tasks field of the config yaml containing various tasks and their parameters.
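Sequential launching from the tasks field can be pictured as iterating over a name-to-configs mapping and dispatching each entry. In the sketch below a plain dict stands in for the parsed YAML, and the registry, the task names "cluster" and "visualize", and run_tasks itself are hypothetical; TaskManager's real lookup of task classes is not shown in this reference.

```python
from typing import Any, Callable, Dict, List, Optional


def run_tasks(tasks: Dict[str, Optional[dict]],
              registry: Dict[str, Callable[[dict], Any]]) -> List[Any]:
    """Launch every task in `tasks` in order, looking each one up in `registry`."""
    results = []
    for name, task_configs in tasks.items():  # dicts keep insertion order (Python 3.7+)
        if name not in registry:
            raise KeyError(f"Unknown task: {name}")
        # An empty YAML entry parses as None, so fall back to an empty dict.
        results.append(registry[name](task_configs or {}))
    return results


registry = {
    "cluster": lambda c: ("cluster", c.get("n_clusters")),
    "visualize": lambda c: ("visualize", c),
}
outputs = run_tasks({"cluster": {"n_clusters": 3}, "visualize": None}, registry)
```

Failing loudly on an unknown name, rather than skipping it, keeps a typo in the configuration file from silently dropping a task.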

aimsim.tasks.visualize_dataset module

Create similarity plots for the dataset.

class aimsim.tasks.visualize_dataset.VisualizeDataset(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Constructor for the VisualizeDataset class.

Parameters

configs (dict) – Dictionary of configurations. Default is None.
**kwargs – Keyword arguments to modify configs fields.

Notes

The configuration structure with default values is:

heatmap_plot_settings: {}  # pass-through keywords for aimsim.utils.plotting_scripts.plot_heatmap
similarity_plot_settings: {}  # pass-through keywords for aimsim.utils.plotting_scripts.plot_density
embedding_plot_settings: {'plot_color': 'red',
                          'plot_title': '2-D projected space',
                          'xlabel': 'Dimension 1',
                          'ylabel': 'Dimension 2',
                          'embedding': {'method': "mds",
                                        'params': {'random_state': 42}}}
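The default embedding method is "mds". The MDS variant the package uses is not specified here, so the sketch below swaps in classical (Torgerson) MDS implemented with NumPy, which may differ from it. Classical MDS is deterministic, so it needs no random_state. The input is a symmetric dissimilarity matrix; deriving one from similarities (for example as 1 - similarity) is an assumption left to the caller, and classical_mds is a hypothetical name.

```python
import numpy as np


def classical_mds(dissimilarity, n_components=2):
    """Embed points from a symmetric dissimilarity matrix via classical MDS."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop tiny negative values
    return eigvecs[:, order] * scale


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dists)
```

When the dissimilarities are exact Euclidean distances between points in two dimensions, as in this example, the 2-D embedding reproduces every pairwise distance (up to rotation and reflection of the coordinates).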

Module contents