aimsim.tasks package
Submodules
aimsim.tasks.cluster_data module
Data clustering task.
- class aimsim.tasks.cluster_data.ClusterData(configs=None, **kwargs)
Bases:
Task
- __init__(configs=None, **kwargs)
Constructor for the ClusterData class.
- Parameters
configs (dict) – Dictionary of configurations. Default is None.
**kwargs – Keyword arguments to modify configs fields.
Notes
The configuration structure with default values is:
'n_clusters' (int):
'clustering_method' (str): None
cluster_plot_settings:
    {'cluster_colors' (list): colors from tab20 cmap,
     'response' (str): 'Response'}
embedding_plot_settings:
    {'plot_color': 'red',
     'plot_title': '2-D projected space',
     'xlabel': 'Dimension 1',
     'ylabel': 'Dimension 2',
     'embedding': {'method': 'mds',
                   'params': {'random_state': 42}}}
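As a rough illustration of the layout described in the Notes, the sketch below assembles a configs dictionary in plain Python. The helper name build_cluster_configs, the value passed for n_clusters, and the 'kmedoids' label are assumptions made for the example; the way ClusterData itself merges **kwargs into configs is not shown in this reference and may differ.

```python
import copy

# Defaults copied from the Notes above.
DEFAULT_EMBEDDING_PLOT_SETTINGS = {
    "plot_color": "red",
    "plot_title": "2-D projected space",
    "xlabel": "Dimension 1",
    "ylabel": "Dimension 2",
    "embedding": {"method": "mds", "params": {"random_state": 42}},
}


def build_cluster_configs(n_clusters, clustering_method=None, **overrides):
    """Hypothetical helper: build a configs dict in the documented layout.

    'cluster_colors' is left out so the sketch does not depend on a
    plotting library; the documented default is the tab20 colormap.
    """
    configs = {
        "n_clusters": n_clusters,
        "clustering_method": clustering_method,
        "cluster_plot_settings": {"response": "Response"},
        "embedding_plot_settings": copy.deepcopy(DEFAULT_EMBEDDING_PLOT_SETTINGS),
    }
    # Assumption: keyword overrides replace top-level fields wholesale.
    configs.update(overrides)
    return configs


# Illustrative values only; no default for n_clusters is listed above.
configs = build_cluster_configs(n_clusters=5, clustering_method="kmedoids")
```

A dictionary built this way would then be passed as the configs argument of the constructor.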
aimsim.tasks.compare_target_molecule module
- class aimsim.tasks.compare_target_molecule.CompareTargetMolecule(configs=None, **kwargs)
Bases:
Task
- __init__(configs=None, **kwargs)
Get configurations.
- Parameters
configs (dict) – Parameters of the task.
- get_hits_dissimilar_to(molecule_set=None)
Get a sorted list of the num_hits molecules in the set most dissimilar to a query molecule. This is defined as the sorted set (decreasing dissimilarity) of molecules with the highest (query_molecule, set_molecule) dissimilarity.
- Parameters
molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities not set.
- Returns
np.ndarray(int): Ids of the most dissimilar molecules, in decreasing order of dissimilarity.
np.ndarray(float): Corresponding similarity values.
- Return type
np.ndarray(int), np.ndarray(float)
- get_hits_similar_to(molecule_set=None)
Get a sorted list of the num_hits molecules in the set most similar to a query molecule. This is defined as the sorted set (decreasing similarity) of molecules with the highest (query_molecule, set_molecule) similarity.
- Parameters
molecule_set (AIMSim.chemical_datastructures MoleculeSet) – MoleculeSet object used to calculate sorted similarities. Only used if self.similarities or self.sorted_similarities not set.
- Returns
np.ndarray(int): Ids of the most similar molecules, in decreasing order of similarity.
np.ndarray(float): Corresponding similarity values.
- Return type
np.ndarray(int), np.ndarray(float)
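The ranking described for get_hits_similar_to and get_hits_dissimilar_to can be sketched with plain Python lists. The function top_hits and its inputs are hypothetical and only illustrate the ordering; the actual methods return NumPy arrays and read the similarities from the task or the MoleculeSet.

```python
def top_hits(similarities, num_hits, most_similar=True):
    """Return (ids, values) for the num_hits best-ranked molecules.

    similarities: one similarity value per molecule in the set, measured
    against the query molecule (hypothetical input for this sketch).
    most_similar: rank by decreasing similarity if True, otherwise by
    decreasing dissimilarity (i.e. increasing similarity).
    """
    order = sorted(
        range(len(similarities)),
        key=lambda i: similarities[i],
        reverse=most_similar,
    )
    ids = order[:num_hits]
    return ids, [similarities[i] for i in ids]


sims = [0.10, 0.85, 0.40, 0.95, 0.05]
similar_ids, similar_vals = top_hits(sims, num_hits=2)
dissimilar_ids, dissimilar_vals = top_hits(sims, num_hits=2, most_similar=False)
```

In both cases the second returned list holds similarity values, which matches the "Corresponding similarity values" wording above.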
aimsim.tasks.extended_similarity_indices module
Calculates the Extended Similarity Indexes as shown in this table: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3/tables/1 and described in:
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics - https://doi.org/10.1186/s13321-021-00505-3
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection - https://link.springer.com/article/10.1186/s13321-021-00504-4
Both gen_sim_dict and calculate_counters were provided by Ramon Alain Miranda Quintana and are similar to the code available here: https://github.com/ramirandaq/MultipleComparisons
- class aimsim.tasks.extended_similarity_indices.ExtendedSimilarityIndices(configs)
Bases:
Task
- calculate_counters(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')
Calculate 1-similarity, 0-similarity, and dissimilarity counters
- Parameters
c_total (np.ndarray) – Vector containing the sums of each column of the fingerprint matrix.
n_fingerprints (int) – Number of objects to be compared.
c_threshold ({None, 'dissimilar', int, float}) – Coincidence threshold.
    None : Default, c_threshold = n_fingerprints % 2
    'dissimilar' : c_threshold = ceil(n_fingerprints / 2)
    int : Integer number < n_fingerprints
    float : Real number in the (0, 1) interval. Indicates the fraction of the total data that will serve as threshold.
w_factor ({"fraction", "power_n"}) – Type of weight function that will be used.
    'fraction' : similarity = d[k]/n
                 dissimilarity = 1 - (d[k] - n_fingerprints % 2)/n_fingerprints
    'power_n' : similarity = n**-(n_fingerprints - d[k])
                dissimilarity = n**-(d[k] - n_fingerprints % 2)
    other values : similarity = dissimilarity = 1
- Returns
counters – Dictionary with the weighted and non-weighted counters.
- Return type
dict
Notes
Please cite the original papers on the n-ary indices:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00505-3
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00504-4
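The threshold and weight options listed for calculate_counters can be sketched as two small functions. This is not the library's code: the function names are hypothetical, d stands for a single d[k] value supplied by the caller, the n in d[k]/n is assumed to be n_fingerprints, the n in 'power_n' is assumed to be an integer base taken from the suffix, and a float threshold is assumed to be scaled by n_fingerprints.

```python
from math import ceil


def resolve_threshold(n_fingerprints, c_threshold=None):
    """Turn the documented c_threshold options into a number."""
    if c_threshold is None:
        return n_fingerprints % 2
    if c_threshold == "dissimilar":
        return ceil(n_fingerprints / 2)
    if isinstance(c_threshold, float):
        # Assumption: a float in (0, 1) is a fraction of n_fingerprints.
        return c_threshold * n_fingerprints
    return c_threshold  # plain integer threshold


def weights(d, n_fingerprints, w_factor="fraction"):
    """Return (similarity_weight, dissimilarity_weight) for one d[k]."""
    if w_factor == "fraction":
        sim = d / n_fingerprints
        dis = 1 - (d - n_fingerprints % 2) / n_fingerprints
    elif isinstance(w_factor, str) and w_factor.startswith("power_"):
        # Assumption: 'power_2' means base 2, 'power_3' means base 3, ...
        base = int(w_factor.split("_")[-1])
        sim = base ** -(n_fingerprints - d)
        dis = base ** -(d - n_fingerprints % 2)
    else:
        sim = dis = 1
    return sim, dis


threshold = resolve_threshold(5)            # default: 5 % 2
sim_w, dis_w = weights(3, 5)                # 'fraction' weighting
pow_sim, pow_dis = weights(4, 4, "power_2")
```

With the 'fraction' option the similarity weight grows linearly with d[k], while the power option decays geometrically as d[k] moves away from n_fingerprints.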
- gen_sim_dict(c_total, n_fingerprints, c_threshold=None, w_factor='fraction')
aimsim.tasks.identify_outliers module
Subclass of Task that implements an IsolationForest to identify outliers.
- class aimsim.tasks.identify_outliers.IdentifyOutliers(configs=None, **kwargs)
Bases:
Task
Subclass of Task to identify outliers via an IsolationForest.
- Parameters
Task (abstract class) – Parent abstract class.
- __init__(configs=None, **kwargs)
Get configurations.
- Parameters
configs (dict) – Parameters of the task.
aimsim.tasks.measure_search module
Identify the most appropriate choice of fingerprint and similarity measure by evaluating the response of the nearest and furthest neighbors. This is called measure choice for brevity (although both the measure and the features are chosen).
- class aimsim.tasks.measure_search.MeasureSearch(configs=None, **kwargs)
Bases:
Task
- __init__(configs=None, **kwargs)
Get configurations.
- Parameters
configs (dict) – Parameters of the task.
- get_best_measure(molecule_set_configs, fingerprint_type=None, similarity_measure=None, subsample_subset_size=0.01, optim_algo='max_min', only_metric=False, show_top=0)
Get the best measure for quantity of interest.
- Parameters
molecule_set_configs (dict) – All configurations (except fingerprint_type and similarity_measure) needed to form the moleculeSet.
fingerprint_type (str) – Label to indicate which fingerprint to use. If supplied, fingerprint is fixed and optimization carried out over similarity measures. Use None to indicate that optimization needs to be carried out over fingerprints. Default is None.
similarity_measure (str) – Label to indicate which similarity measure to use. If supplied, the similarity measure is fixed and optimization is carried out over fingerprints. Use None to indicate that optimization needs to be carried out over similarity measures. Default is None.
subsample_subset_size (float) – Fraction of molecule_set to subsample. This is separate from the sample_ratio parameter used when creating a moleculeSet, since a more aggressive subsampling strategy is recommended for this task due to the combinatorial explosion of looking at multiple fingerprints and similarity measures. Default is 0.01.
optim_algo (str) – Label to indicate the optimization algorithm chosen. Options are:
    'max': The measure choice which maximizes correlation of properties between nearest neighbors (most similar).
    'min': The measure choice which minimizes the absolute value of property correlation between furthest neighbors (most dissimilar).
    'max_min': The measure choice which maximizes correlation of properties between nearest neighbors (most similar) and minimizes the absolute value of property correlation between furthest neighbors (most dissimilar). This is the default.
only_metric (bool) – If True only similarity measures satisfying the metricity property (i.e. can be converted to distance metrics) are selected.
- Returns
Top performer with fields:
    fingerprint_type (str): Label for fingerprint type.
    similarity_measure (str): Label for similarity measure.
    nearest_neighbor_correlation (float): Correlation of property of molecule and its nearest neighbor.
    furthest_neighbor_correlation (float): Correlation of property of molecule and its furthest neighbor.
    score_ (float): Overall score based on optimization strategy. More is better.
- Return type
(NamedTuple)
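One way to picture how the three optim_algo options could rank candidates is sketched below. The scoring formulas are an assumption consistent with the descriptions above ("more is better"), not a copy of the library's implementation; MeasureScore, score, best_measure, and the candidate labels are all hypothetical names used for illustration.

```python
from typing import NamedTuple


class MeasureScore(NamedTuple):
    """Hypothetical record mirroring the documented return fields."""
    fingerprint_type: str
    similarity_measure: str
    nearest_neighbor_correlation: float
    furthest_neighbor_correlation: float
    score_: float


def score(nearest_corr, furthest_corr, optim_algo="max_min"):
    """Assumed scoring rule: larger is better for every option."""
    if optim_algo == "max":
        return nearest_corr
    if optim_algo == "min":
        return -abs(furthest_corr)
    if optim_algo == "max_min":
        return nearest_corr - abs(furthest_corr)
    raise ValueError(f"Unknown optim_algo: {optim_algo!r}")


def best_measure(candidates, optim_algo="max_min"):
    """Pick the top performer from (fp, measure, nn_corr, fn_corr) tuples."""
    scored = [
        MeasureScore(fp, sm, nn, fn, score(nn, fn, optim_algo))
        for fp, sm, nn, fn in candidates
    ]
    return max(scored, key=lambda m: m.score_)


# Made-up correlations for three fingerprint/measure combinations.
candidates = [
    ("fingerprint_a", "measure_x", 0.80, 0.30),
    ("fingerprint_a", "measure_y", 0.70, 0.05),
    ("fingerprint_b", "measure_x", 0.90, 0.60),
]
best = best_measure(candidates)
best_by_max = best_measure(candidates, optim_algo="max")
```

Under these assumptions 'max_min' favors a combination whose nearest neighbors correlate well and whose furthest neighbors do not, whereas 'max' looks only at the nearest-neighbor correlation.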
aimsim.tasks.see_property_variation_with_similarity module
Task to visualize property similarity over a dataset.
- class aimsim.tasks.see_property_variation_with_similarity.SeePropertyVariationWithSimilarity(configs=None, **kwargs)
Bases:
Task
- __init__(configs=None, **kwargs)
Get configurations.
- Parameters
configs (dict) – Parameters of the task.
- get_property_correlations_in_most_dissimilar(molecule_set)
Get the correlation between the property of molecules and their furthest (most dissimilar) neighbors.
- Parameters
molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.
- Returns
Correlation between properties.
- Return type
(float)
- get_property_correlations_in_most_similar(molecule_set)
Get the correlation between the property of molecules and their nearest (most similar) neighbors.
- Parameters
molecule_set (AIMSim.chemical_datastructures MoleculeSet) – Molecules object of the molecule database.
- Returns
Correlation between properties.
- Return type
(float)
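The idea behind both correlation methods can be sketched without the library: pair each molecule's property with that of its nearest (or furthest) neighbor, then correlate the two lists. The sketch assumes a Pearson correlation, a precomputed similarity matrix, and exclusion of each molecule from its own neighbor search; the function names and data are hypothetical, and the actual methods work on a MoleculeSet.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def neighbor_property_correlation(similarity, properties, nearest=True):
    """Correlate each property with that of its nearest/furthest neighbor.

    similarity: square matrix (list of lists) of pairwise similarities.
    properties: one property value per molecule, in matrix order.
    """
    pick = max if nearest else min
    neighbor_props = []
    for i, row in enumerate(similarity):
        others = [j for j in range(len(row)) if j != i]  # skip self
        neighbor = pick(others, key=lambda j: row[j])
        neighbor_props.append(properties[neighbor])
    return pearson(properties, neighbor_props)


# Toy data: molecules 0/1 and 2/3 form two similar pairs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
props = [1.0, 1.2, 5.0, 5.5]
nearest_corr = neighbor_property_correlation(sim, props, nearest=True)
furthest_corr = neighbor_property_correlation(sim, props, nearest=False)
```

On this toy data the nearest-neighbor correlation is strongly positive and the furthest-neighbor correlation strongly negative, which is the pattern a well-chosen similarity measure is expected to produce when similar molecules have similar properties.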
aimsim.tasks.task module
Abstract task class.
aimsim.tasks.task_manager module
Class to call all tasks in sequence.
aimsim.tasks.visualize_dataset module
Create similarity plots for the dataset.
- class aimsim.tasks.visualize_dataset.VisualizeDataset(configs=None, **kwargs)
Bases:
Task
- __init__(configs=None, **kwargs)
Constructor for the VisualizeDataset class.
- Parameters
configs (dict) – Dictionary of configurations. Default is None.
**kwargs – Keyword arguments to modify configs fields.
Notes
The configuration structure with default values is:
heatmap_plot_settings: {}  # pass-through keywords for aimsim.utils.plotting_scripts.plot_heatmap
similarity_plot_settings: {}  # pass-through keywords for aimsim.utils.plotting_scripts.plot_density
embedding_plot_settings:
    {'plot_color': 'red',
     'plot_title': '2-D projected space',
     'xlabel': 'Dimension 1',
     'ylabel': 'Dimension 2',
     'embedding': {'method': 'mds',
                   'params': {'random_state': 42}}}