DECIMER package

Submodules

DECIMER.decimer module

DECIMER.decimer.detokenize_output(predicted_array)
Return type:

str

This function takes the predicted tokens from the DECIMER model and returns the decoded SMILES string.

Args:

predicted_array (int): Predicted tokens from DECIMER

Returns:

(str): SMILES representation of the molecule
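
As a rough illustration of the decoding step described above, the sketch below maps integer token ids back to a SMILES string. The `index_word` vocabulary, the `<start>`/`<end>` marker tokens and the helper name `detokenize_sketch` are hypothetical stand-ins; the real tokenizer is the one returned by `get_models`, and DECIMER's internal logic may differ.

```python
# Hypothetical vocabulary; the real mapping comes from the DECIMER tokenizer.
index_word = {1: "<start>", 2: "<end>", 3: "C", 4: "O", 5: "=", 6: "(", 7: ")"}


def detokenize_sketch(token_ids, index_word):
    """Join the tokens for each id and drop the assumed start/end markers."""
    tokens = [index_word[i] for i in token_ids]
    return "".join(t for t in tokens if t not in ("<start>", "<end>"))


# Token ids spelling acetic acid, framed by the assumed marker tokens.
print(detokenize_sketch([1, 3, 3, 6, 5, 4, 7, 4, 2], index_word))
```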

DECIMER.decimer.detokenize_output_add_confidence(predicted_array, confidence_array)
Return type:

list of tuples

This function takes the predicted array of tokens as well as the confidence values returned by the Transformer Decoder and returns a list of tuples that contain each token of the predicted SMILES string and the confidence value.

Args:

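predicted_array (tf.Tensor): Transformer Decoder output array (predicted tokens)
confidence_array (tf.Tensor): Transformer Decoder output array (confidence values)

Returns:

list of tuples: Each token of the predicted SMILES string paired with its confidence value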
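
The pairing of tokens with confidence values can be sketched in plain Python. The helper name `pair_tokens_with_confidence`, the marker tokens and the sample values are all hypothetical; the actual function operates on `tf.Tensor` inputs and may filter tokens differently.

```python
def pair_tokens_with_confidence(tokens, confidences, skip=("<start>", "<end>")):
    """Return (token, confidence) tuples, leaving out the assumed marker tokens."""
    return [(t, float(c)) for t, c in zip(tokens, confidences) if t not in skip]


# Hypothetical decoder output for a two-atom SMILES string.
tokens = ["<start>", "C", "O", "<end>"]
confidences = [1.0, 0.98, 0.91, 0.99]

pairs = pair_tokens_with_confidence(tokens, confidences)
print(pairs)

# The SMILES string can be recovered by joining the token half of each tuple.
print("".join(token for token, _ in pairs))
```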

DECIMER.decimer.get_models(model_urls)

Download and load models from the provided URLs.

This function downloads models from the provided URLs to a default location, then loads tokenizers and TensorFlow saved models.

Args:

model_urls (dict): A dictionary containing model names as keys and their corresponding URLs as values.

Returns:
tuple: A tuple containing loaded tokenizer and TensorFlow saved models.
  • tokenizer (object): Tokenizer for DECIMER model.

  • DECIMER_V2 (tf.saved_model): TensorFlow saved model for DECIMER.

  • DECIMER_Hand_drawn (tf.saved_model): TensorFlow saved model for DECIMER HandDrawn.

DECIMER.decimer.main()

This function takes the path of the image as user input and returns the predicted SMILES as output in the CLI.

Args:

str: image_path

Returns:

str: predicted SMILES

DECIMER.decimer.predict_SMILES(image_path, confidence=False, hand_drawn=False)
Return type:

str

Predicts SMILES representation of a molecule depicted in the given image.

Args:

image_path (str): Path of chemical structure depiction image
confidence (bool): Flag to indicate whether to return confidence values along with SMILES prediction
hand_drawn (bool): Flag to indicate whether the molecule in the image is hand-drawn

Returns:

str: SMILES representation of the molecule in the input image, optionally with confidence values
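
A minimal usage sketch based only on the signature documented above. The wrapper name `predict_with_options` and the image path are placeholders. The import is done inside the function on the assumption that loading `DECIMER.decimer` may trigger model download and loading, which you may not want at import time.

```python
def predict_with_options(image_path, confidence=False, hand_drawn=False):
    """Thin wrapper around predict_SMILES using the documented keyword arguments."""
    # Imported lazily: loading DECIMER.decimer is assumed to be expensive.
    from DECIMER.decimer import predict_SMILES

    return predict_SMILES(image_path, confidence=confidence, hand_drawn=hand_drawn)


# Example call (the path is a placeholder, so the call is left commented out):
# smiles = predict_with_options("path/to/structure.png")
# print(smiles)
```

With `confidence=True` the return value also carries confidence values, so check its shape before treating it as a plain string.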

DECIMER.config module

class DECIMER.config.Config

Bases: object

Configuration class.

initialize_encoder_config(image_embedding_dim, preprocessing_fn, backbone_fn, image_shape, do_permute=False, pretrained_weights=None)

This function initializes the Efficient-Net V2 encoder with user-defined configurations.

Args:

image_embedding_dim (int): Embedding dimension of the input image
preprocessing_fn (method): Efficient-Net preprocessing function for the input image
backbone_fn (method): Calls Efficient-Net V2 as the backbone for the encoder
image_shape (int): Shape of the input image
do_permute (bool, optional): Defaults to False.
pretrained_weights (keras weights, optional): Whether to use pretrained Efficient-Net weights. Defaults to None.

initialize_lr_config(warm_steps, n_epochs)

This function sets the configuration to initialize the learning rate.

Args:

warm_steps (int): Number of steps over which the learning rate is increased
n_epochs (int): Number of epochs

initialize_transformer_config(vocab_len, max_len, n_transformer_layers, transformer_d_dff, transformer_n_heads, image_embedding_dim, rate=0.1)

This function initializes the Transformer model as decoder with user-defined configurations.

Args:

vocab_len (int): Total number of words in the input vocabulary
max_len (int): Maximum length of the strings found in the training dataset
n_transformer_layers (int): Number of layers in the transformer model
transformer_d_dff (int): Transformer feed-forward upward projection size
transformer_n_heads (int): Number of heads in the transformer model
image_embedding_dim (int): Number of dimensions into which the image is embedded
rate (float, optional): Dropout rate, i.e. the fraction of the input units to drop. Defaults to 0.1.

class DECIMER.config.CustomSchedule(d_model, warmup_steps=4000)

Bases: LearningRateSchedule

Custom schedule for learning rate used during training.

Args:

tf (_type_): keras learning rate schedule
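
The docstring does not state the formula behind this schedule. The constructor arguments `d_model` and `warmup_steps` match the warmup schedule introduced with the original Transformer, so the sketch below assumes that formula: a linear ramp for `warmup_steps` steps followed by inverse square-root decay. Whether DECIMER uses exactly this expression is an assumption, and `warmup_learning_rate` is a hypothetical helper written in plain Python rather than as a Keras schedule.

```python
def warmup_learning_rate(step, d_model, warmup_steps=4000):
    """Learning rate at a given step (step >= 1) under the assumed warmup schedule."""
    linear_warmup = step * warmup_steps ** -1.5
    inverse_sqrt_decay = step ** -0.5
    return d_model ** -0.5 * min(linear_warmup, inverse_sqrt_decay)


# d_model=512 is an illustrative value, not a confirmed DECIMER setting.
for step in (1, 2000, 4000, 8000):
    print(step, warmup_learning_rate(step, d_model=512))
```

Under this assumption the rate peaks at `step == warmup_steps` and decreases afterwards.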

DECIMER.config.HEIF_to_pillow(image_path)

Converts Apple's HEIF format to a usable Pillow object.

Args: image_path (str): path of input image
Returns: PIL.Image

DECIMER.config.PIL_im_to_BytesIO(image)

Convert pillow image to io.BytesIO object.

Args: PIL.Image
Returns: io.BytesIO object with the image data

DECIMER.config.central_square_image(image)

This function takes a Pillow Image object and will add white padding so that the image has a square shape with the width/height of the longest side of the original image.

Args: PIL.Image
Returns: PIL.Image
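
The padding step can be sketched as follows. The geometry helper is dependency-free; `pad_to_square` shows how it could be applied with Pillow. Both names are hypothetical, and centring the original image on the canvas is an assumption taken from the word "central" in the function name, not something the description confirms.

```python
def square_padding_geometry(width, height):
    """Return the square canvas side and the paste offset that centres the image."""
    side = max(width, height)
    offset = ((side - width) // 2, (side - height) // 2)
    return side, offset


def pad_to_square(image):
    """Paste a Pillow image onto a white square canvas (requires Pillow)."""
    from PIL import Image  # imported lazily so the geometry helper has no dependencies

    side, offset = square_padding_geometry(*image.size)
    canvas = Image.new("RGB", (side, side), "white")
    canvas.paste(image, offset)
    return canvas


# A 300 x 100 image needs a 300 x 300 canvas with 100 px of padding above and below.
print(square_padding_geometry(300, 100))
```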

DECIMER.config.decode_image(image_path)

Loads an image and preprocesses the input image in several steps to get the image ready for DECIMER input.

Args:

image_path (str): path of input image

Returns:

Processed image

DECIMER.config.delete_empty_borders(image)

This function takes a Pillow Image object, converts it to grayscale and deletes white space at the borders.

Args: PIL.Image
Returns: PIL.Image

DECIMER.config.download_trained_weights(model_url, model_path, verbose=1)

This function downloads the trained models and tokenizers to a default location. After downloading the zipped file, the function unzips it automatically. If the model already exists at the default location, nothing is downloaded.

Args:

model_url (str): trained model url for downloading.
model_path (str): model default path to download.

Returns:

path (str): downloaded model.
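
The download-then-unzip behaviour described above can be sketched with the standard library alone. This is not DECIMER's implementation: `fetch_and_unzip` is a hypothetical helper, and both the skip-if-present check and the location of the temporary archive are assumptions.

```python
import os
import urllib.request
import zipfile


def fetch_and_unzip(model_url, model_path):
    """Download a zip archive and extract it, unless model_path already exists."""
    if os.path.exists(model_path):
        return model_path  # assumed behaviour: an existing model is left untouched

    parent_dir = os.path.dirname(os.path.abspath(model_path))
    os.makedirs(parent_dir, exist_ok=True)
    archive_path = model_path + ".zip"

    # Fetch the archive, unpack it next to the target path, then clean up.
    urllib.request.urlretrieve(model_url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(parent_dir)
    os.remove(archive_path)
    return model_path
```

The sketch assumes the archive unpacks into a directory named like `model_path`; a real helper would need to verify that.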

DECIMER.config.get_bnw_image(image)

Converts images to black and white.

Args: PIL.Image
Returns: PIL.Image

DECIMER.config.get_resize(image)

This function decides how to resize a given image without losing much information.

Args: PIL.Image
Returns: PIL.Image

DECIMER.config.increase_brightness(image)

This function adjusts the brightness of the given image.

Args: PIL.Image
Returns: PIL.Image

DECIMER.config.increase_contrast(image)

This function increases the contrast of an image input.

Args: PIL.Image
Returns: PIL.Image

DECIMER.config.prepare_models(encoder_config, transformer_config, replica_batch_size, verbose=0)

This function initiates the Encoder and the Transformer with the configurations set by the user, then returns the Encoder, the Transformer and the optimizer.

Args:

encoder_config ([type]): Encoder configuration set by user in the config class.
transformer_config ([type]): Transformer configuration set by user in the config class.
replica_batch_size ([type]): Per-replica batch size set by user (during distributed training).
verbose (int, optional): Defaults to 0.

Returns:

[type]: Optimizer, Encoder model and the Transformer

DECIMER.config.remove_transparent(image_path)

Removes the transparent layer from a PNG image with an alpha channel.

Args: image_path (str): path of input image
Returns: PIL.Image

DECIMER.config.resize_byratio(image)

This function takes a Pillow Image object and resizes it by ratio to 512 x 512. The LANCZOS resampling method is used for both upscaling and downscaling.

With newer Pillow versions, antialiasing is turned on when LANCZOS is used.

Args: PIL.Image
Returns: PIL.Image
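
One way to read "by ratio" is that the aspect ratio is preserved and the longer side is scaled to 512 pixels. The sketch below follows that reading, which is an assumption; `size_by_ratio` and `resize_sketch` are hypothetical names, and only the second helper needs Pillow.

```python
def size_by_ratio(width, height, target=512):
    """Scale (width, height) so that the longer side equals target pixels."""
    longest = max(width, height)
    return round(width * target / longest), round(height * target / longest)


def resize_sketch(image, target=512):
    """Resize a Pillow image with LANCZOS resampling (requires Pillow)."""
    from PIL import Image  # imported lazily; size_by_ratio itself is dependency-free

    return image.resize(size_by_ratio(*image.size, target), resample=Image.LANCZOS)


# Downscaling and upscaling both end with a 512 px longer side.
print(size_by_ratio(1024, 256))
print(size_by_ratio(200, 100))
```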