aimsim.tasks package

Submodules

aimsim.tasks.cluster_data module

Data clustering task.

class aimsim.tasks.cluster_data.ClusterData(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Constructor for the ClusterData class.

Parameters
  • configs (dict) – Dictionary of configurations. Default is None.

  • **kwargs – Keyword arguments to modify configs fields.

Notes

The configuration structure, with default values, is:

    'n_clusters' (int):
    'clustering_method' (str): None
    cluster_plot_settings:
        {'cluster_colors' (list): colors from tab20 cmap,
         'response' (str): 'Response'}
    embedding_plot_settings:
        {'plot_color': 'red',
         'plot_title': '2-D projected space',
         'xlabel': 'Dimension 1',
         'ylabel': 'Dimension 2',
         'embedding': {'method': 'mds',
                       'params': {'random_state': 42}}}
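As a rough illustration of how a configs dictionary and **kwargs overrides could combine with the defaults listed above, here is a minimal sketch. The helper build_configs and the constant DEFAULT_CLUSTER_CONFIGS are hypothetical names, not part of AIMSim; the n_clusters value and the clustering method string are placeholder inputs; 'cluster_colors' is omitted to keep the sketch free of plotting dependencies.

```python
import copy

# Defaults transcribed from the Notes above. 'n_clusters' has no documented
# default, so it is left for the caller to supply.
DEFAULT_CLUSTER_CONFIGS = {
    "clustering_method": None,
    "cluster_plot_settings": {"response": "Response"},
    "embedding_plot_settings": {
        "plot_color": "red",
        "plot_title": "2-D projected space",
        "xlabel": "Dimension 1",
        "ylabel": "Dimension 2",
        "embedding": {"method": "mds", "params": {"random_state": 42}},
    },
}


def build_configs(configs=None, **kwargs):
    """Hypothetical helper: defaults, then configs, then kwargs (last wins).

    The merge is shallow: passing a whole 'embedding_plot_settings' dict
    replaces the default sub-dictionary instead of merging into it.
    """
    merged = copy.deepcopy(DEFAULT_CLUSTER_CONFIGS)
    merged.update(copy.deepcopy(configs or {}))
    merged.update(kwargs)
    return merged


# Placeholder values chosen for illustration only.
cfg = build_configs({"n_clusters": 5}, clustering_method="kmedoids")
```

Whether the real constructor merges nested settings shallowly or deeply is not stated in this reference, so the shallow merge here is an assumption.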

aimsim.tasks.compare_target_molecule module

class aimsim.tasks.compare_target_molecule.CompareTargetMolecule(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Get configurations.

Parameters

configs (dict) – Parameters of the task.

get_hits_dissimilar_to(molecule_set=None)

Get a sorted list of the num_hits Molecule objects in the Set that are most dissimilar to a query Molecule. This is defined as the sorted set (decreasing dissimilarity) of molecules with the highest (query_molecule, set_molecule) dissimilarity.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities are not set.

Returns

  • np.ndarray(int) – Ids of the most dissimilar molecules, in decreasing order of dissimilarity.

  • np.ndarray(float) – Corresponding similarity values.

get_hits_similar_to(molecule_set=None)

Get a sorted list of the num_hits Molecule objects in the Set that are most similar to a query Molecule. This is defined as the sorted set (decreasing similarity) of molecules with the highest (query_molecule, set_molecule) similarity.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities are not set.

Returns

  • np.ndarray(int) – Ids of the most similar molecules, in decreasing order of similarity.

  • np.ndarray(float) – Corresponding similarity values.
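The ordering described for both hit methods can be sketched with NumPy as follows. This is an illustrative stand-in, not the AIMSim implementation: the function get_hits and its arguments are hypothetical, and it assumes the similarities between the query molecule and each set molecule are already available as a 1-D array.

```python
import numpy as np


def get_hits(similarities, num_hits, most_similar=True):
    """Hypothetical helper returning (ids, similarity values) of the top hits.

    most_similar=True  -> decreasing similarity
    most_similar=False -> decreasing dissimilarity (i.e. increasing similarity)
    """
    sims = np.asarray(similarities, dtype=float)
    order = np.argsort(sims)      # indices in increasing similarity
    if most_similar:
        order = order[::-1]       # flip to decreasing similarity
    ids = order[:num_hits]
    return ids, sims[ids]


# Made-up similarities between a query molecule and four set molecules.
query_similarities = [0.2, 0.9, 0.5, 0.7]
similar_ids, similar_vals = get_hits(query_similarities, num_hits=2)
dissimilar_ids, dissimilar_vals = get_hits(
    query_similarities, num_hits=2, most_similar=False
)
```

Both calls return the ids together with the matching similarity values, mirroring the two-part return documented above.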

aimsim.tasks.extended_similarity_indices module

Calculates the Extended Similarity Indices as shown in this table: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3/tables/1 and described in the papers cited in the Notes of calculate_counters.

Both gen_sim_dict and calculate_counters were provided by Ramon Alain Miranda Quintana and are similar to the code available here: https://github.com/ramirandaq/MultipleComparisons

class aimsim.tasks.extended_similarity_indices.ExtendedSimilarityIndices(configs)

Bases: Task

calculate_counters(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')

Calculate the 1-similarity, 0-similarity, and dissimilarity counters.

Parameters
  • c_total (np.ndarray) – Vector containing the sums of each column of the fingerprint matrix.

  • n_fingerprints (int) – Number of objects to be compared.

  • c_threshold ({None, 'dissimilar', int, float}) –

    Coincidence threshold.

    None : Default, c_threshold = n_fingerprints % 2

    'dissimilar' : c_threshold = ceil(n_fingerprints / 2)

    int : Integer number < n_fingerprints

    float : Real number in the (0, 1) interval. Indicates the fraction of the total data that will serve as the threshold.

  • w_factor ({"fraction", "power_n"}) –

    Type of weight function that will be used.

    'fraction' : similarity = d[k]/n
                 dissimilarity = 1 - (d[k] - n_fingerprints % 2)/n_fingerprints

    'power_n' : similarity = n**-(n_fingerprints - d[k])
                dissimilarity = n**-(d[k] - n_fingerprints % 2)

    other values : similarity = dissimilarity = 1

Returns

counters – Dictionary with the weighted and non-weighted counters.

Return type

dict

Notes

Please cite the original papers on the n-ary indices:

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00504-4
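To make the c_threshold and w_factor options of calculate_counters more concrete, the sketch below evaluates them for given inputs. It is a reading of the parameter descriptions above, not the library's code. The function names are hypothetical, and several points are assumptions: a float threshold is interpreted as c_threshold * n_fingerprints, the n in the 'fraction' formula is taken to be n_fingerprints, the n in 'power_n' is taken to be a separate base (the base argument), and d_k is passed in directly because its definition is not given in this reference.

```python
from math import ceil


def resolve_threshold(n_fingerprints, c_threshold=None):
    """Hypothetical helper: turn the c_threshold option into a number."""
    if c_threshold is None:
        return n_fingerprints % 2
    if c_threshold == "dissimilar":
        return ceil(n_fingerprints / 2)
    if isinstance(c_threshold, int):
        if c_threshold >= n_fingerprints:
            raise ValueError("integer c_threshold must be < n_fingerprints")
        return c_threshold
    if isinstance(c_threshold, float) and 0 < c_threshold < 1:
        # Assumption: the float is a fraction of the total number of objects.
        return c_threshold * n_fingerprints
    raise ValueError("unsupported c_threshold")


def weights(d_k, n_fingerprints, w_factor="fraction", base=2):
    """Hypothetical helper: (similarity, dissimilarity) weights for one d[k]."""
    parity = n_fingerprints % 2
    if w_factor == "fraction":
        # Assumption: 'n' in the documented formula means n_fingerprints.
        similarity = d_k / n_fingerprints
        dissimilarity = 1 - (d_k - parity) / n_fingerprints
    elif w_factor == "power_n":
        # Assumption: 'n' in the documented formula is a power base.
        similarity = base ** -(n_fingerprints - d_k)
        dissimilarity = base ** -(d_k - parity)
    else:
        similarity = dissimilarity = 1
    return similarity, dissimilarity


default_threshold = resolve_threshold(5)
fraction_weights = weights(3, 5, "fraction")
power_weights = weights(3, 5, "power_n", base=2)
```

Keeping threshold resolution separate from the weighting makes each documented option easy to check in isolation.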

gen_sim_dict(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')

aimsim.tasks.identify_outliers module

Subclass of Task that implements an IsolationForest to identify outliers.

class aimsim.tasks.identify_outliers.IdentifyOutliers(configs=None, **kwargs)

Bases: Task

Subclass of Task to identify outliers via an IsolationForest.

Parameters

Task (abstract class) – Parent abstract class.

__init__(configs=None, **kwargs)

Get configurations.

Parameters

configs (dict) – Parameters of the task.

aimsim.tasks.see_property_variation_with_similarity module

Task to visualize property similarity over a dataset.

class aimsim.tasks.see_property_variation_with_similarity.SeePropertyVariationWithSimilarity(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Get configurations.

Parameters

configs (dict) – Parameters of the task.

get_property_correlations_in_most_dissimilar(molecule_set)

Get the correlation between the property of molecules and that of their furthest (most dissimilar) neighbors.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.

Returns

Correlation between properties.

Return type

float

get_property_correlations_in_most_similar(molecule_set)

Get the correlation between the property of molecules and that of their nearest (most similar) neighbors.

Parameters

molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.

Returns

Correlation between properties.

Return type

float
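The idea behind the nearest-neighbor correlation can be sketched with NumPy as shown below. This is an assumption-laden illustration, not the AIMSim implementation: the function name is hypothetical, the inputs are a plain pairwise similarity matrix and a property vector instead of a MoleculeSet, and a Pearson correlation is assumed because the reference does not say which correlation is used.

```python
import numpy as np


def nearest_neighbor_property_correlation(similarity_matrix, properties):
    """Hypothetical helper: correlate each molecule's property with the
    property of its most similar neighbor (self-matches excluded)."""
    sim = np.array(similarity_matrix, dtype=float)   # copy; input untouched
    np.fill_diagonal(sim, -np.inf)                   # never pick the molecule itself
    nearest = np.argmax(sim, axis=1)                 # index of the nearest neighbor
    props = np.asarray(properties, dtype=float)
    # Assumption: Pearson correlation between own and neighbor properties.
    return float(np.corrcoef(props, props[nearest])[0, 1])


# Made-up data: molecules 0 and 1 resemble each other, as do 2 and 3.
toy_similarities = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
toy_properties = [1.0, 1.2, 5.0, 5.1]
correlation = nearest_neighbor_property_correlation(toy_similarities, toy_properties)
```

For the furthest-neighbor variant, the same sketch would fill the diagonal with np.inf and use np.argmin instead.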

aimsim.tasks.task module

Abstract task class.

class aimsim.tasks.task.Task(configs)

Bases: ABC

__init__(configs)

Get configurations.

Parameters

configs (dict) – Parameters of the task.

aimsim.tasks.task_manager module

Class to call all tasks in sequence.

class aimsim.tasks.task_manager.TaskManager(tasks)

Bases: object

__init__(tasks)

Sequentially launches all the tasks from the configuration file.

Parameters

tasks (dict) – The tasks field of the config yaml containing various tasks and their parameters.
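The sequential-launch pattern described here can be sketched in a few lines. Everything below is a toy stand-in under stated assumptions: the run method, the TASK_REGISTRY mapping, the example task classes, and the run_all function are hypothetical and are not AIMSim's API; only the ideas of an abstract Task taking configs and a tasks dictionary of names and parameters come from this reference.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Toy abstract task that stores its configuration."""

    def __init__(self, configs):
        self.configs = configs

    @abstractmethod
    def run(self, data):  # hypothetical method name
        ...


class CountItems(Task):
    def run(self, data):
        return len(data)


class SumItems(Task):
    def run(self, data):
        return sum(data) * self.configs.get("scale", 1)


# Hypothetical mapping from task names (as they might appear in a config
# file) to task classes.
TASK_REGISTRY = {"count_items": CountItems, "sum_items": SumItems}


def run_all(tasks, data):
    """Instantiate and run each configured task in order."""
    results = {}
    for name, params in tasks.items():
        task = TASK_REGISTRY[name](params or {})
        results[name] = task.run(data)
    return results


results = run_all({"count_items": None, "sum_items": {"scale": 2}}, [1, 2, 3])
```

Because Python dictionaries preserve insertion order, the tasks run in the order in which they are listed.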

aimsim.tasks.visualize_dataset module

Create similarity plots for the dataset.

class aimsim.tasks.visualize_dataset.VisualizeDataset(configs=None, **kwargs)

Bases: Task

__init__(configs=None, **kwargs)

Constructor for the VisualizeDataset class.

Parameters
  • configs (dict) – Dictionary of configurations. Default is None.

  • **kwargs – Keyword arguments to modify configs fields.

Notes

The configuration structure, with default values, is:

    heatmap_plot_settings: {}     # pass-through keywords for aimsim.utils.plotting_scripts.plot_heatmap
    similarity_plot_settings: {}  # pass-through keywords for aimsim.utils.plotting_scripts.plot_density
    embedding_plot_settings:
        {'plot_color': 'red',
         'plot_title': '2-D projected space',
         'xlabel': 'Dimension 1',
         'ylabel': 'Dimension 2',
         'embedding': {'method': 'mds',
                       'params': {'random_state': 42}}}

Module contents