How DNAcrypt-AI works

Quick Overview

DNAcrypt-AI generates a random password (alphanumeric + symbols) or cryptographic key (alphanumeric only) of a user-defined length. It then encrypts the generated password or key by mapping it to the coordinates of randomly sampled, variable-length DNA sequences from the human genome, referenced to the hg19 and hg38 assemblies.

To decrypt the information, DNAcrypt-AI reconstructs the corresponding DNA sequences using a high-throughput sequence reconstitution pipeline (FAS2rDNA) and interprets them with a sequence-informed machine learning model (Covary).

Supported Encodings

DNAcrypt-AI supports the following character encodings:

  • Alphanumeric (Passwords & Keys):
    a–z, A–Z, 0–9

  • Symbols (Passwords only):
    ! @ # $ % ^ & * ( ) - _ + = "
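
The character sets above can be illustrated with a minimal sketch of random generation. This is not DNAcrypt-AI's own generator: the function name generate_secret and the with_symbols flag are hypothetical, the use of Python's secrets module is an assumption, and the last symbol in the list is assumed to be the ASCII double quote.

```python
import secrets
import string

# Alphanumeric set used for both passwords and keys: a-z, A-Z, 0-9
ALPHANUMERIC = string.ascii_letters + string.digits
# Symbols used for passwords only (last entry assumed to be an ASCII double quote)
SYMBOLS = '!@#$%^&*()-_+="'


def generate_secret(char_count: int, with_symbols: bool) -> str:
    """Return char_count characters drawn uniformly from the chosen set.

    Hypothetical helper: it illustrates the idea, not the tool's actual code.
    """
    if char_count < 1:
        raise ValueError("char_count must be at least 1")
    alphabet = ALPHANUMERIC + (SYMBOLS if with_symbols else "")
    # secrets.choice draws from the operating system's secure random source
    return "".join(secrets.choice(alphabet) for _ in range(char_count))


password = generate_secret(16, with_symbols=True)   # alphanumeric + symbols
key = generate_secret(32, with_symbols=False)       # alphanumeric only
```

Under these assumptions, a password draws from 77 possible characters per position and a key from 62.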

Human Genome Assemblies

DNAcrypt-AI uses the hg19 and hg38 human genome assemblies as reference spaces for encryption and decryption. These assemblies provide the biological sequence coordinates used to store and retrieve encrypted information. Work is currently underway to expand support to multi-species genome assemblies, which will further increase the genome vocabulary and entropy of DNAcrypt-AI. Suggestions and contributions to improve this capability are welcome.

Encrypting a password or key

DNAcrypt-AI runs as a Jupyter notebook in Google Colab and is designed to be easy to use. To generate and encrypt a password or key, follow the procedure below:

  1. Create a user configuration

    • Set char_count to define the desired length

    • Select a Use case:

      • Password: alphanumeric + symbols

      • Encryption: alphanumeric only

  2. Run DNAcrypt-AI

    • Select Runtime > Run all

  3. Download and store your encrypted data

    • The following files will be generated:

      • DNAcrypt_metadata.json

      • kmer_dict.json (only if a custom k-mer dictionary is used)

The encrypted files are downloaded automatically. If your browser blocks the download, you can retrieve them manually through the File browser from either of these locations:

  • /content/

  • /content/DNAcrypt/outputs/
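
If the automatic download is blocked, the two locations above can also be checked from a notebook cell. The sketch below is an illustration only: the helper find_outputs is hypothetical, and it assumes the files keep the names listed in step 3.

```python
from pathlib import Path

# Locations named in this guide, in the order they are searched
OUTPUT_DIRS = [Path("/content"), Path("/content/DNAcrypt/outputs")]
# File names listed under "Download and store your encrypted data"
EXPECTED_FILES = ["DNAcrypt_metadata.json", "kmer_dict.json"]


def find_outputs(dirs=OUTPUT_DIRS, names=EXPECTED_FILES):
    """Map each expected file name to the first directory copy that exists.

    Hypothetical helper; files that are not found are simply left out.
    """
    found = {}
    for name in names:
        for directory in dirs:
            candidate = Path(directory) / name
            if candidate.is_file():
                found[name] = candidate
                break
    return found


for name, path in find_outputs().items():
    print(f"{name}: {path}")
```

Inside Colab, a located file can then be downloaded again with files.download(str(path)) from the google.colab module. kmer_dict.json will be absent unless a custom k-mer dictionary was used.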

Decrypting a password or key

Decrypting your data is straightforward:

  1. Modify the user configuration

    • Set the Use case to Decryption

  2. Run DNAcrypt-AI

    • Select Runtime > Run all

  3. Upload your encrypted data

    • Always upload DNAcrypt_metadata.json

    • Upload kmer_dict.json only if a custom k-mer dictionary was used during encryption

  4. Wait for decryption to finish

    • Decryption typically completes within 15 minutes, depending on the size of the genome vocabulary used

Using a custom kmer_dict

A custom kmer_dict (k-mer dictionary) lets users change the k-mer-to-character encodings and so customize their character dictionaries. This is useful as a second layer of encryption, for re-encoding compromised data to generate new sequences, or for tailoring the library to specific needs.
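
To make the idea of a k-mer-to-character encoding concrete, here is a minimal sketch that assigns one randomly chosen k-mer to each supported character. Everything in it is an assumption made for illustration: the k-mer length of 4, the one-k-mer-per-character layout, and the helper make_kmer_dict are hypothetical and do not describe the actual format of kmer_dict.json.

```python
import itertools
import secrets
import string

K = 4  # assumed k-mer length: 4**4 = 256 k-mers, enough for 77 characters
CHARACTERS = string.ascii_letters + string.digits + '!@#$%^&*()-_+="'


def make_kmer_dict(characters: str = CHARACTERS, k: int = K) -> dict:
    """Return a random one-to-one mapping from k-mers to characters."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    if len(kmers) < len(characters):
        raise ValueError("k is too small to give every character its own k-mer")
    # Shuffle with a secure random source so each dictionary differs
    secrets.SystemRandom().shuffle(kmers)
    # zip stops at the shorter input, so unused k-mers are simply dropped
    return dict(zip(kmers, characters))


kmer_dict = make_kmer_dict()
char_to_kmer = {char: kmer for kmer, char in kmer_dict.items()}

# Round trip: characters -> DNA string -> characters
encoded = "".join(char_to_kmer[c] for c in "Ab3!")
decoded = "".join(kmer_dict[encoded[i:i + K]] for i in range(0, len(encoded), K))
```

Because the mapping is shuffled, the same characters encode to a different DNA string under each new dictionary, which is why the dictionary used at encryption time is needed again for decryption.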

A. During encryption

  1. In the user configuration:

    • Set char_count

    • Select the appropriate use case (Password or Encryption)

    • Choose Custom under K-mer Dictionary

  2. Run DNAcrypt-AI as usual

  3. Store the following files for future recovery:

    • DNAcrypt_metadata.json

    • kmer_dict.json

B. During decryption

  1. Select Decryption as the use case

  2. Run DNAcrypt-AI

  3. Upload both:

    • DNAcrypt_metadata.json

    • kmer_dict.json

Handling and storing encrypted data

Your encrypted files must not be modified. Any loss, alteration, or unintended addition may prevent successful decryption. Tampering with either DNAcrypt_metadata.json or kmer_dict.json will affect recovery of your password or key.

File names may be changed, but file contents must remain intact.
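
One way to confirm that file contents are still intact is to record a checksum when the files are first stored and compare it before decryption. This is a sketch of a general technique, not a feature of DNAcrypt-AI: the helpers sha256_of and is_unmodified are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so the whole file is never held in memory at once
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, recorded_digest: str) -> bool:
    """Compare the file's current digest with one recorded earlier."""
    return sha256_of(path) == recorded_digest


# Example usage (paths are placeholders):
#   recorded = sha256_of(Path("DNAcrypt_metadata.json"))   # right after encryption
#   is_unmodified(Path("renamed_copy.json"), recorded)     # before decryption
```

The digest depends only on the bytes in the file, so renaming a file leaves it unchanged while any edit to the contents alters it.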

Encrypted data is lightweight (typically under 200 KB) and can be stored in several ways:

  • Printed copy: Highly private and offline, but requires re-encoding into digital form before use

  • Digital copy: Immediately usable, but anyone with access to the files can attempt decryption