calcoloscientifico:userguide:boltz2

Differences

These are the differences between the selected revision and the current version of the page.

calcoloscientifico:userguide:boltz2 [17/12/2025 18:06] federico.prost
calcoloscientifico:userguide:boltz2 [17/12/2025 21:02] (current version) fabio.spataro
Line 1: Line 1:
 ===== Boltz2 ===== ===== Boltz2 =====
  
-[[https://build.nvidia.com/mit/boltz2|Boltz2]]
-[[https://boltz.bio/boltz2|Boltz2]]
+  * [[https://build.nvidia.com/mit/boltz2|Boltz2]]
+  * [[https://boltz.bio/boltz2|Introducing Boltz-2]]
  
 === Boltz2 Apptainer File Image === === Boltz2 Apptainer File Image ===
Line 14: Line 14:
 === Boltz2 python script === === Boltz2 python script ===
  
-<code>
+Download the ''Boltz2'' script file ''boltz2.py'' and save it:
+
+<code python boltz2.py>
 import requests import requests
 import json import json
Line 86: Line 88:
 </code> </code>
  
-=== Alphafold3 GPU job ===
+=== Boltz2 GPU job ===

-Download the Alphafold3 input file ''fold_input.json'' and save it in ''af_input'' folder:
+Script ''slurm-boltz2-gpu-a100_40g.sh'' to run ''boltz2'' on 1 node with 1 A100 (40 GB) GPU (8 tasks per node):
  
-<code json fold_input.json> +<code bash slurm-boltz2-gpu-a100_40g.sh>
-+
-  "name": "2PV7", +
-  "sequences":+
-    { +
-      "protein":+
-        "id": ["A", "B"], +
-        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG" +
-      } +
-    } +
-  ], +
-  "modelSeeds": [1], +
-  "dialect": "alphafold3", +
-  "version":+
-+
-</code> +
- +
-Script ''slurm-alphafold-gpu-a100_40g.sh'' to run ''alphafold'' on 1 node with 1 A100 (40 GB) GPU (8 tasks per node): +
- +
-<code bash slurm-alphafold-gpu-a100_40g.sh>+
 #!/bin/bash --login #!/bin/bash --login
-#SBATCH --job-name=alphafold
+#SBATCH --job-name=boltz2
 #SBATCH --output=af_output/%x.d%j/%x.o%j #SBATCH --output=af_output/%x.d%j/%x.o%j
 #SBATCH --error=af_output/%x.d%j/%x.e%j #SBATCH --error=af_output/%x.d%j/%x.e%j
Line 118: Line 101:
 #SBATCH --cpus-per-task=8 #SBATCH --cpus-per-task=8
 #SBATCH --time=0-02:00:00 #SBATCH --time=0-02:00:00
-#SBATCH --mem=10G
+#SBATCH --mem=40G
 #SBATCH --partition=gpu #SBATCH --partition=gpu
 #SBATCH --qos=gpu #SBATCH --qos=gpu
Line 129: Line 112:
  
 module load apptainer module load apptainer
-module load alphafold/3.0.1 +module load boltz2
- +
-test -n "$ALPHAFOLD_CONTAINER" || exit 1 +
- +
-set -x +
- +
-ALPHAFOLD_JSON_INPUT_FILE='fold_input.json' +
-ALPHAFOLD_INPUT_DIR="$PWD/af_input" +
-ALPHAFOLD_OUTPUT_DIR="$PWD/af_output/${SLURM_JOB_NAME}.d${SLURM_JOB_ID}" +
- +
-mkdir -p "$ALPHAFOLD_OUTPUT_DIR" +
- +
-apptainer exec \ +
-    --nv \ +
-    --bind "$ALPHAFOLD_INPUT_DIR:/root/af_input"+
-    --bind "$ALPHAFOLD_OUTPUT_DIR:/root/af_output"+
-    "$ALPHAFOLD_CONTAINER"+
-    python /app/alphafold/run_alphafold.py \ +
-    --json_path="/root/af_input/$ALPHAFOLD_JSON_INPUT_FILE"+
-    --model_dir=/root/models \ +
-    --db_dir=/root/public_databases \ +
-    --db_dir=/root/public_databases_fallback \ +
-    --output_dir=/root/af_output +
-</code> +
- +
-The processing result will be saved in the ''af output'' folder. +
- +
-Scripts for specific NVIDIA GPU models to run ''alphafold'' on 1 node with 1 GPU (8 tasks per node): +
- +
-^ GPU  ^ Path  ^ +
-| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-p100|P100 (12 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-p100.sh'' +
-| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-v100|V100 (32 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu_guest-v100_hylab.sh'' +
-| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb|A100 (40 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_40g.sh'' +
-| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#accelerator-hardware-requirements|A100 (80 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_80g.sh'' +
- +
-=== Documentation ===  +
- +
-How to get a list of all flags of ''run_alphafold.py'' (version 3.0.1): +
- +
-<code bash> +
-module load apptainer +
-module load alphafold/3.0.1 +
- +
-apptainer exec "$ALPHAFOLD_CONTAINER" python /app/alphafold/run_alphafold.py --helpfull +
-</code> +
- +
-List of all flags of ''run_alphafold.py'' (version 3.0.1): +
- +
-<code> +
-AlphaFold 3 structure prediction script. +
- +
-AlphaFold 3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of +
-this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ +
- +
-To request access to the AlphaFold 3 model parameters, follow the process set +
-out at https://github.com/google-deepmind/alphafold3. You may only use these +
-if received directly from Google. Use is subject to terms of use available at +
-https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md +
- +
-flags: +
- +
-run_alphafold.py: +
-  --buckets: Strictly increasing order of token sizes for which to cache compilations. For any input with more tokens than the largest bucket size, a new +
-    bucket is created for exactly that number of tokens. +
-    (default: '256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120'+
-    (a comma separated list) +
-  --conformer_max_iterations: Optional override for maximum number of iterations to run for RDKit conformer search. +
-    (an integer) +
-  --db_dir: Path to the directory containing the databases. Can be specified multiple times to search multiple directories in order.; +
-    repeat this option to specify a list of values +
-    (default: "['/hpc/home/sti_calcolo/public_databases']"+
-  --flash_attention_implementation: <triton|cudnn|xla>: Flash attention implementation to use. 'triton' and 'cudnn' uses a Triton and cuDNN flash attention +
-    implementation, respectively. The Triton kernel is fastest and has been tested more thoroughly. The Triton and cuDNN kernels require Ampere GPUs or later. +
-    'xla' uses an XLA attention implementation (no flash attention) and is portable across GPU devices. +
-    (default: 'triton'+
-  --gpu_device: Optional override for the GPU device to use for inference. Defaults to the 1st GPU on the system. Useful on multi-GPU systems to pin each run +
-    to a specific GPU. +
-    (default: '0'+
-    (an integer) +
-  --hmmalign_binary_path: Path to the Hmmalign binary. +
-    (default: '/hmmer/bin/hmmalign'+
-  --hmmbuild_binary_path: Path to the Hmmbuild binary. +
-    (default: '/hmmer/bin/hmmbuild'+
-  --hmmsearch_binary_path: Path to the Hmmsearch binary. +
-    (default: '/hmmer/bin/hmmsearch'+
-  --input_dir: Path to the directory containing input JSON files. +
-  --jackhmmer_binary_path: Path to the Jackhmmer binary. +
-    (default: '/hmmer/bin/jackhmmer'+
-  --jackhmmer_n_cpu: Number of CPUs to use for Jackhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup. +
-    (default: '8'+
-    (an integer) +
-  --jax_compilation_cache_dir: Path to a directory for the JAX compilation cache. +
-  --json_path: Path to the input JSON file. +
-  --max_template_date: Maximum template release date to consider. Format: YYYY-MM-DD. All templates released after this date will be ignored. +
-    (default: '2021-09-30'+
-  --mgnify_database_path: Mgnify database path, used for protein MSA search. +
-    (default: '${DB_DIR}/mgy_clusters_2022_05.fa'+
-  --model_dir: Path to the model to use for inference. +
-    (default: '/hpc/home/sti_calcolo/models'+
-  --nhmmer_binary_path: Path to the Nhmmer binary. +
-    (default: '/hmmer/bin/nhmmer'+
-  --nhmmer_n_cpu: Number of CPUs to use for Nhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup. +
-    (default: '8'+
-    (an integer) +
-  --ntrna_database_path: NT-RNA database path, used for RNA MSA search. +
-    (default: '${DB_DIR}/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta'+
-  --num_diffusion_samples: Number of diffusion samples to generate. +
-    (default: '5'+
-    (a positive integer) +
-  --num_recycles: Number of recycles to use during inference. +
-    (default: '10'+
-    (a positive integer) +
-  --num_seeds: Number of seeds to use for inference. If set, only a single seed must be provided in the input JSON. AlphaFold 3 will then generate random +
-    seeds in sequence, starting from the single seed specified in the input JSON. The full input JSON produced by AlphaFold 3 will include the generated +
-    random seeds. If not set, AlphaFold 3 will use the seeds as provided in the input JSON. +
-    (a positive integer) +
-  --output_dir: Path to a directory where the results will be saved. +
-  --pdb_database_path: PDB database directory with mmCIF files path, used for template search. +
-    (default: '${DB_DIR}/mmcif_files'+
-  --rfam_database_path: Rfam database path, used for RNA MSA search. +
-    (default: '${DB_DIR}/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta'+
-  --rna_central_database_path: RNAcentral database path, used for RNA MSA search. +
-    (default: '${DB_DIR}/rnacentral_active_seq_id_90_cov_80_linclust.fasta'+
-  --[no]run_data_pipeline: Whether to run the data pipeline on the fold inputs. +
-    (default: 'true'+
-  --[no]run_inference: Whether to run inference on the fold inputs. +
-    (default: 'true'+
-  --[no]save_embeddings: Whether to save the final trunk single and pair embeddings in the output. +
-    (default: 'false'+
-  --seqres_database_path: PDB sequence database path, used for template search. +
-    (default: '${DB_DIR}/pdb_seqres_2022_09_28.fasta'+
-  --small_bfd_database_path: Small BFD database path, used for protein MSA search. +
-    (default: '${DB_DIR}/bfd-first_non_consensus_sequences.fasta'+
-  --uniprot_cluster_annot_database_path: UniProt database path, used for protein paired MSA search. +
-    (default: '${DB_DIR}/uniprot_all_2021_04.fa'+
-  --uniref90_database_path: UniRef90 database path, used for MSA search. The MSA obtained by searching it is used to construct the profile for template +
-    search. +
-    (default: '${DB_DIR}/uniref90_2022_05.fa')+
  
-absl.app:
-  -?,--[no]help: show this help
-    (default: 'false')
-  --[no]helpfull: show full help
-    (default: 'false')
-  --[no]helpshort: show this help
-    (default: 'false')
-  --[no]helpxml: like --helpfull, but generates XML output
-    (default: 'false')
-  --[no]only_check_args: Set to true to validate args and exit.
-    (default: 'false')
-  --[no]pdb: Alias for --pdb_post_mortem.
-    (default: 'false')
-  --[no]pdb_post_mortem: Set to true to handle uncaught exceptions with PDB post mortem.
-    (default: 'false')
-  --profile_file: Dump profile information to a file (for python -m pstats). Implies --run_with_profiling.
-  --[no]run_with_pdb: Set to true for PDB debug mode
-    (default: 'false')
-  --[no]run_with_profiling: Set to true for profiling the script. Execution will be slower, and the output format might change over time.
-    (default: 'false')
-  --[no]use_cprofile_for_profiling: Use cProfile instead of the profile module for profiling. This has no effect unless --run_with_profiling is set.
-    (default: 'true')
+export NGC_API_KEY="INSERT API KEY HERE"
+export NIM_HTTP_API_PORT=$(hpc-find-free-tcp4-port 2>/dev/null || echo 8000)
+export TMPDIR=$HOME/nim-cache
+export APPTAINERENV_NGC_API_KEY=$NGC_API_KEY
+export APPTAINERENV_LOCAL_NIM_CACHE=/nim-cache
+export APPTAINERENV_NIM_CACHE=/nim-cache
+export APPTAINERENV_NIM_WORKSPACE=/nim-cache/workspace
+export APPTAINERENV_NIM_HTTP_API_PORT=$NIM_HTTP_API_PORT

-absl.logging:
-  --[no]alsologtostderr: also log to stderr?
-    (default: 'false')
-  --log_dir: directory to write logfiles into
-    (default: '')
-  --logger_levels: Specify log level of loggers. The format is a CSV list of `name:level`. Where `name` is the logger name used with `logging.getLogger()`,
-    and `level` is a level name  (INFO, DEBUG, etc). e.g. `myapp.foo:INFO,other.logger:DEBUG`
-    (default: '')
-  --[no]logtostderr: Should only log to stderr?
-    (default: 'false')
-  --[no]showprefixforinfo: If False, do not prepend prefix to info messages when it's logged to stderr, --verbosity is set to INFO level, and python logging
-    is used.
-    (default: 'true')
-  --stderrthreshold: log messages at this level, or more severe, to stderr in addition to the logfile.  Possible values are 'debug', 'info', 'warning',
-    'error', and 'fatal' Obsoletes --alsologtostderr. Using --alsologtostderr cancels the effect of this flag. Please also note that this flag is subject to
-    --verbosity and requires logfile not be stderr.
-    (default: 'fatal')
-  -v,--verbosity: Logging verbosity level. Messages logged at this level or lower will be included. Set to 1 for debug logging. If the flag was not set or
-    supplied, the value will be changed from the default of -1 (warning) to 0 (info) after flags are parsed.
-    (default: '-1')
-    (an integer)
+mkdir -p $HOME/nim-cache/workspace

-absl.testing.absltest:
-  --test_random_seed: Random seed for testing. Some test frameworks may change the default value of this flag between runs, so it is not appropriate for
-    seeding probabilistic tests.
-    (default: '301')
-    (an integer)
-  --test_randomize_ordering_seed: If positive, use this as a seed to randomize the execution order for test cases. If "random", pick a random seed to use. If
-    0 or not set, do not randomize test case execution order. This flag also overrides the TEST_RANDOMIZE_ORDERING_SEED environment variable.
-    (default: '')
-  --test_srcdir: Root of directory tree where source files live
-    (default: '')
-  --test_tmpdir: Directory for temporary testing files
-    (default: '/tmp/absl_testing')
-  --xml_output_file: File to store XML test results
-    (default: '')
+apptainer instance run \
+  --nv \
+  --bind $HOME/nim-cache:/nim-cache \
+  --bind $HOME/nim-cache:/opt/nim/.cache \
+  --bind $HOME/nim-cache/workspace:/opt/nim/workspace \
+  $BOLTZ2_CONTAINER $SLURM_JOB_ID

-chex._src.fake:
-  --[no]chex_assert_multiple_cpu_devices: Whether to fail if a number of CPU devices is less than 2.
-    (default: 'false')
-  --chex_n_cpu_devices: Number of CPU threads to use as devices in tests.
-    (default: '1')
-    (an integer)
+apptainer instance list

-chex._src.variants:
-  --[no]chex_skip_pmap_variant_if_single_device: Whether to skip pmap variant if only one device is available.
-    (default: 'true')
+# Waiting for the boltz2 server to start
+until curl -sSf http://localhost:$NIM_HTTP_API_PORT/health/ready; do
+    echo "Server not ready, waiting 10 seconds..."
+    sleep 10
+done
+echo "Boltz2 server ready!"

-absl.flags:
-  --flagfile: Insert flag definitions from the given file into the command line.
-    (default: '')
-  --undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
-    IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
-    (default: '')
+# Testing boltz2
+module load miniconda3
+source "$CONDA_PREFIX/etc/profile.d/conda.sh"
+conda activate boltz2-env
+python $HOME/boltz2.py
 </code> </code>
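
The readiness loop in the job script (''until curl -sSf .../health/ready'') retries forever, so a server that never starts would hold the GPU until the job's time limit. As a minimal sketch, assuming the same ''/health/ready'' endpoint and port variable used in the script, the wait could instead be done from Python with a bounded number of attempts. The names ''is_ready'' and ''wait_until_ready'' and the default limits are hypothetical and not part of ''boltz2.py''.

```python
import time
import urllib.error
import urllib.request


def is_ready(url, timeout=5.0):
    """Return True if a GET on `url` answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP 4xx/5xx error:
        # treat all of them as "not ready yet".
        return False


def wait_until_ready(url, interval=10.0, max_attempts=60,
                     probe=is_ready, sleep=time.sleep):
    """Poll `url` until `probe` succeeds and return the attempts used.

    Raises TimeoutError after `max_attempts` failed probes.
    `probe` and `sleep` are injectable (an assumption of this sketch)
    so the loop can be exercised without a running server.
    """
    for attempt in range(1, max_attempts + 1):
        if probe(url):
            return attempt
        sleep(interval)
    raise TimeoutError(f"{url} not ready after {max_attempts} attempts")
```

A call such as ''wait_until_ready(f"http://localhost:{port}/health/ready")'', with ''port'' read from ''NIM_HTTP_API_PORT'', would mirror the shell loop while giving up after roughly ten minutes under the defaults chosen here.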
  
calcoloscientifico/userguide/boltz2.1765991183.txt.gz · Last modified by federico.prost
