Differences

These are the differences between the selected revision and the current version of the page.

--- calcoloscientifico:userguide:alphafold [24/01/2025 11:02] – fabio.spataro
+++ calcoloscientifico:userguide:alphafold [06/02/2025 19:46] (current version) – fabio.spataro
@@ Line 13: / Line 13: @@
 <code>
 /hpc/share/containers/apptainer/alphafold/3.0.1/alphafold-3.0.1.sif
+</code>
+=== AlphaFold 3 GPU demo ===
+<code>
+mkdir -p demo/af_input
+cp -p /hpc/share/containers/apptainer/alphafold/3/af_input/fold_input.json demo/af_input
+cp -p /hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_40g.sh demo
+cd demo
+sbatch slurm-alphafold-gpu-a100_40g.sh
 </code>
@@ Line 36: / Line 46: @@
 </code>
-Script ''slurm-alphafold.sh'' to run ''alphafold'' on 1 node with 1 GPU (8 tasks per node):
+Script ''slurm-alphafold-gpu-a100_40g.sh'' to run ''alphafold'' on 1 node with 1 A100 (40 GB) GPU (8 tasks per node):
-<code bash slurm-alphafold.sh>
+<code bash slurm-alphafold-gpu-a100_40g.sh>
 #!/bin/bash --login
 #SBATCH --job-name=alphafold
@@ Line 62: / Line 72: @@
 test -n "$ALPHAFOLD_CONTAINER" || exit 1
-ALPHAFOLD_N_CPU=$SLURM_CPUS_PER_TASK
+set -x
+ALPHAFOLD_JSON_INPUT_FILE='fold_input.json'
 ALPHAFOLD_INPUT_DIR="$PWD/af_input"
 ALPHAFOLD_OUTPUT_DIR="$PWD/af_output/${SLURM_JOB_NAME}.d${SLURM_JOB_ID}"
@@ Line 69: / Line 81: @@
 apptainer exec \
-    --bind '/opt/hpc/system/nvidia/driver:/usr/local/nvidia/bin' \
+    --nv \
-    --bind '/opt/hpc/system/nvidia/driver:/usr/local/nvidia/lib' \
     --bind "$ALPHAFOLD_INPUT_DIR:/root/af_input" \
     --bind "$ALPHAFOLD_OUTPUT_DIR:/root/af_output" \
-    --bind "$ALPHAFOLD_MODEL_DIR:/root/models" \
-    --bind "$ALPHAFOLD_DB_DIR:/root/public_databases" \
     "$ALPHAFOLD_CONTAINER" \
     python /app/alphafold/run_alphafold.py \
-    --json_path=/root/af_input/fold_input.json \
+    --json_path="/root/af_input/$ALPHAFOLD_JSON_INPUT_FILE" \
     --model_dir=/root/models \
     --db_dir=/root/public_databases \
-    --pdb_database_path=/root/public_databases/mmcif_files \
+    --db_dir=/root/public_databases_fallback \
-    --output_dir=/root/af_output \
+    --output_dir=/root/af_output
-    --jackhmmer_n_cpu=$ALPHAFOLD_N_CPU \
-    --nhmmer_n_cpu=$ALPHAFOLD_N_CPU
 </code>
 The processing result will be saved in the ''af_output'' folder.
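The script above runs the data pipeline and inference in a single job. The flag list further down shows ''--[no]run_data_pipeline'' and ''--[no]run_inference'', so the two stages can in principle be run separately. The following is a minimal sketch of how the extra flag could be chosen; the function name `alphafold_stage_flag` and the stage names `data`, `inference`, and `full` are hypothetical and not part of the cluster setup.

```shell
# Hypothetical helper: print the extra run_alphafold.py flag for one stage.
#   data      -> run the data pipeline only (inference disabled)
#   inference -> run inference only (data pipeline disabled)
#   full      -> run both stages, no extra flag needed
alphafold_stage_flag() {
    case "$1" in
        data)      echo '--norun_inference' ;;
        inference) echo '--norun_data_pipeline' ;;
        full)      echo '' ;;
        *)         echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

# Usage idea (not executed here): append the flag to the python command line, e.g.
#   python /app/alphafold/run_alphafold.py ... $(alphafold_stage_flag data)
```

As far as the upstream documentation describes, an inference-only run expects an input JSON that already contains the results of the data pipeline, so the second stage would need to point ''--json_path'' at the output of the first.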
+Scripts for specific NVIDIA GPU models to run ''alphafold'' on 1 node with 1 GPU (8 tasks per node):
+^ GPU  ^ Path  ^
+| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-p100|P100 (12 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-p100.sh''  |
+| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-v100|V100 (32 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu_guest-v100_hylab.sh''  |
+| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb|A100 (40 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_40g.sh''  |
+| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#accelerator-hardware-requirements|A100 (80 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_80g.sh''  |
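The table above can be wrapped in a small lookup helper so that a job submission does not depend on typing the full path. This is only a sketch: the function name `alphafold_script_for_gpu`, the variable `ALPHAFOLD_SCRIPT_DIR`, and the short keys (`p100`, `v100`, `a100_40g`, `a100_80g`) are assumptions made for illustration; only the paths themselves come from the table.

```shell
# Directory holding the per-GPU Slurm scripts (paths taken from the table above).
ALPHAFOLD_SCRIPT_DIR='/hpc/share/containers/apptainer/alphafold/3.0.1'

# Hypothetical helper: map a short GPU key to the matching script path.
alphafold_script_for_gpu() {
    case "$1" in
        p100)     echo "$ALPHAFOLD_SCRIPT_DIR/slurm-alphafold-gpu-p100.sh" ;;
        v100)     echo "$ALPHAFOLD_SCRIPT_DIR/slurm-alphafold-gpu_guest-v100_hylab.sh" ;;
        a100_40g) echo "$ALPHAFOLD_SCRIPT_DIR/slurm-alphafold-gpu-a100_40g.sh" ;;
        a100_80g) echo "$ALPHAFOLD_SCRIPT_DIR/slurm-alphafold-gpu-a100_80g.sh" ;;
        *)        echo "unknown GPU key: $1" >&2; return 1 ;;
    esac
}

# Usage idea (not executed here): copy the chosen script into the working directory.
#   cp -p "$(alphafold_script_for_gpu a100_80g)" demo
```

An unknown key returns a non-zero status, so a wrapper script can stop before calling `sbatch` with an empty path.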
+=== Documentation ===
+How to get a list of all flags of ''run_alphafold.py'' (version 3.0.1):
+<code bash>
+module load apptainer
+module load alphafold/3.0.1
+apptainer exec "$ALPHAFOLD_CONTAINER" python /app/alphafold/run_alphafold.py --helpfull
+</code>
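The full help output is long, so it can be convenient to print the entry for a single flag. The sketch below filters help text read from standard input with `awk`; the function name `show_flag_help` is hypothetical, and the filter assumes the layout shown in the listing on this page (flag entries indented by two spaces, continuation lines by four, module headers in the first column).

```shell
# Hypothetical helper: print the help entry of one flag from --helpfull text on stdin.
show_flag_help() {
    awk -v flag="$1" '
        # A flag entry starts with two spaces and a dash, e.g. "  --db_dir:".
        # Boolean flags are listed as "--[no]name:", so both spellings are checked.
        /^  -/ {
            printing = (index($0, "--" flag ":") > 0 || index($0, "--[no]" flag ":") > 0)
        }
        # Module headers such as "absl.app:" start in the first column and end an entry.
        /^[^ ]/ { printing = 0 }
        printing { print }
    '
}

# Usage idea (not executed here):
#   apptainer exec "$ALPHAFOLD_CONTAINER" python /app/alphafold/run_alphafold.py --helpfull 2>&1 \
#       | show_flag_help db_dir
```

The flag name is passed without leading dashes, for example `db_dir` or `run_inference`.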
+List of all flags of ''run_alphafold.py'' (version 3.0.1):
+<code>
+AlphaFold 3 structure prediction script.
+AlphaFold 3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of
+this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
+To request access to the AlphaFold 3 model parameters, follow the process set
+out at https://github.com/google-deepmind/alphafold3. You may only use these
+if received directly from Google. Use is subject to terms of use available at
+https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md
+flags:
+run_alphafold.py:
+  --buckets: Strictly increasing order of token sizes for which to cache compilations. For any input with more tokens than the largest bucket size, a new
+    bucket is created for exactly that number of tokens.
+    (default: '256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120')
+    (a comma separated list)
+  --conformer_max_iterations: Optional override for maximum number of iterations to run for RDKit conformer search.
+    (an integer)
+  --db_dir: Path to the directory containing the databases. Can be specified multiple times to search multiple directories in order.;
+    repeat this option to specify a list of values
+    (default: "['/hpc/home/sti_calcolo/public_databases']")
+  --flash_attention_implementation: <triton|cudnn|xla>: Flash attention implementation to use. 'triton' and 'cudnn' uses a Triton and cuDNN flash attention
+    implementation, respectively. The Triton kernel is fastest and has been tested more thoroughly. The Triton and cuDNN kernels require Ampere GPUs or later.
+    'xla' uses an XLA attention implementation (no flash attention) and is portable across GPU devices.
+    (default: 'triton')
+  --gpu_device: Optional override for the GPU device to use for inference. Defaults to the 1st GPU on the system. Useful on multi-GPU systems to pin each run
+    to a specific GPU.
+    (default: '0')
+    (an integer)
+  --hmmalign_binary_path: Path to the Hmmalign binary.
+    (default: '/hmmer/bin/hmmalign')
+  --hmmbuild_binary_path: Path to the Hmmbuild binary.
+    (default: '/hmmer/bin/hmmbuild')
+  --hmmsearch_binary_path: Path to the Hmmsearch binary.
+    (default: '/hmmer/bin/hmmsearch')
+  --input_dir: Path to the directory containing input JSON files.
+  --jackhmmer_binary_path: Path to the Jackhmmer binary.
+    (default: '/hmmer/bin/jackhmmer')
+  --jackhmmer_n_cpu: Number of CPUs to use for Jackhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup.
+    (default: '8')
+    (an integer)
+  --jax_compilation_cache_dir: Path to a directory for the JAX compilation cache.
+  --json_path: Path to the input JSON file.
+  --max_template_date: Maximum template release date to consider. Format: YYYY-MM-DD. All templates released after this date will be ignored.
+    (default: '2021-09-30')
+  --mgnify_database_path: Mgnify database path, used for protein MSA search.
+    (default: '${DB_DIR}/mgy_clusters_2022_05.fa')
+  --model_dir: Path to the model to use for inference.
+    (default: '/hpc/home/sti_calcolo/models')
+  --nhmmer_binary_path: Path to the Nhmmer binary.
+    (default: '/hmmer/bin/nhmmer')
+  --nhmmer_n_cpu: Number of CPUs to use for Nhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup.
+    (default: '8')
+    (an integer)
+  --ntrna_database_path: NT-RNA database path, used for RNA MSA search.
+    (default: '${DB_DIR}/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta')
+  --num_diffusion_samples: Number of diffusion samples to generate.
+    (default: '5')
+    (a positive integer)
+  --num_recycles: Number of recycles to use during inference.
+    (default: '10')
+    (a positive integer)
+  --num_seeds: Number of seeds to use for inference. If set, only a single seed must be provided in the input JSON. AlphaFold 3 will then generate random
+    seeds in sequence, starting from the single seed specified in the input JSON. The full input JSON produced by AlphaFold 3 will include the generated
+    random seeds. If not set, AlphaFold 3 will use the seeds as provided in the input JSON.
+    (a positive integer)
+  --output_dir: Path to a directory where the results will be saved.
+  --pdb_database_path: PDB database directory with mmCIF files path, used for template search.
+    (default: '${DB_DIR}/mmcif_files')
+  --rfam_database_path: Rfam database path, used for RNA MSA search.
+    (default: '${DB_DIR}/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta')
+  --rna_central_database_path: RNAcentral database path, used for RNA MSA search.
+    (default: '${DB_DIR}/rnacentral_active_seq_id_90_cov_80_linclust.fasta')
+  --[no]run_data_pipeline: Whether to run the data pipeline on the fold inputs.
+    (default: 'true')
+  --[no]run_inference: Whether to run inference on the fold inputs.
+    (default: 'true')
+  --[no]save_embeddings: Whether to save the final trunk single and pair embeddings in the output.
+    (default: 'false')
+  --seqres_database_path: PDB sequence database path, used for template search.
+    (default: '${DB_DIR}/pdb_seqres_2022_09_28.fasta')
+  --small_bfd_database_path: Small BFD database path, used for protein MSA search.
+    (default: '${DB_DIR}/bfd-first_non_consensus_sequences.fasta')
+  --uniprot_cluster_annot_database_path: UniProt database path, used for protein paired MSA search.
+    (default: '${DB_DIR}/uniprot_all_2021_04.fa')
+  --uniref90_database_path: UniRef90 database path, used for MSA search. The MSA obtained by searching it is used to construct the profile for template
+    search.
+    (default: '${DB_DIR}/uniref90_2022_05.fa')
+absl.app:
+  -?,--[no]help: show this help
+    (default: 'false')
+  --[no]helpfull: show full help
+    (default: 'false')
+  --[no]helpshort: show this help
+    (default: 'false')
+  --[no]helpxml: like --helpfull, but generates XML output
+    (default: 'false')
+  --[no]only_check_args: Set to true to validate args and exit.
+    (default: 'false')
+  --[no]pdb: Alias for --pdb_post_mortem.
+    (default: 'false')
+  --[no]pdb_post_mortem: Set to true to handle uncaught exceptions with PDB post mortem.
+    (default: 'false')
+  --profile_file: Dump profile information to a file (for python -m pstats). Implies --run_with_profiling.
+  --[no]run_with_pdb: Set to true for PDB debug mode
+    (default: 'false')
+  --[no]run_with_profiling: Set to true for profiling the script. Execution will be slower, and the output format might change over time.
+    (default: 'false')
+  --[no]use_cprofile_for_profiling: Use cProfile instead of the profile module for profiling. This has no effect unless --run_with_profiling is set.
+    (default: 'true')
+absl.logging:
+  --[no]alsologtostderr: also log to stderr?
+    (default: 'false')
+  --log_dir: directory to write logfiles into
+    (default: '')
+  --logger_levels: Specify log level of loggers. The format is a CSV list of `name:level`. Where `name` is the logger name used with `logging.getLogger()`,
+    and `level` is a level name  (INFO, DEBUG, etc). e.g. `myapp.foo:INFO,other.logger:DEBUG`
+    (default: '')
+  --[no]logtostderr: Should only log to stderr?
+    (default: 'false')
+  --[no]showprefixforinfo: If False, do not prepend prefix to info messages when it's logged to stderr, --verbosity is set to INFO level, and python logging
+    is used.
+    (default: 'true')
+  --stderrthreshold: log messages at this level, or more severe, to stderr in addition to the logfile.  Possible values are 'debug', 'info', 'warning',
+    'error', and 'fatal'.  Obsoletes --alsologtostderr. Using --alsologtostderr cancels the effect of this flag. Please also note that this flag is subject to
+    --verbosity and requires logfile not be stderr.
+    (default: 'fatal')
+  -v,--verbosity: Logging verbosity level. Messages logged at this level or lower will be included. Set to 1 for debug logging. If the flag was not set or
+    supplied, the value will be changed from the default of -1 (warning) to 0 (info) after flags are parsed.
+    (default: '-1')
+    (an integer)
+absl.testing.absltest:
+  --test_random_seed: Random seed for testing. Some test frameworks may change the default value of this flag between runs, so it is not appropriate for
+    seeding probabilistic tests.
+    (default: '301')
+    (an integer)
+  --test_randomize_ordering_seed: If positive, use this as a seed to randomize the execution order for test cases. If "random", pick a random seed to use. If 0
+    or not set, do not randomize test case execution order. This flag also overrides the TEST_RANDOMIZE_ORDERING_SEED environment variable.
+    (default: '')
+  --test_srcdir: Root of directory tree where source files live
+    (default: '')
+  --test_tmpdir: Directory for temporary testing files
+    (default: '/tmp/absl_testing')
+  --xml_output_file: File to store XML test results
+    (default: '')
+chex._src.fake:
+  --[no]chex_assert_multiple_cpu_devices: Whether to fail if a number of CPU devices is less than 2.
+    (default: 'false')
+  --chex_n_cpu_devices: Number of CPU threads to use as devices in tests.
+    (default: '1')
+    (an integer)
+chex._src.variants:
+  --[no]chex_skip_pmap_variant_if_single_device: Whether to skip pmap variant if only one device is available.
+    (default: 'true')
+absl.flags:
+  --flagfile: Insert flag definitions from the given file into the command line.
+    (default: '')
+  --undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
+    IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
+    (default: '')
+</code>
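The ''--buckets'' description in the listing above can be illustrated with a short calculation. The sketch below assumes that an input is assigned to the smallest bucket that is at least as large as its token count, and that an input larger than the largest bucket gets a bucket of exactly its own size, as the flag text states. The function name `pick_bucket` and the variable `DEFAULT_BUCKETS` are hypothetical; the default list is copied from the help output.

```shell
# Default bucket sizes, copied from the --buckets help text.
DEFAULT_BUCKETS='256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120'

# Hypothetical helper: print the bucket size an input with the given token count
# would fall into. The optional second argument is a comma separated bucket list
# in strictly increasing order.
pick_bucket() {
    tokens=$1
    buckets=${2:-$DEFAULT_BUCKETS}
    old_ifs=$IFS
    IFS=,
    for b in $buckets; do
        if [ "$tokens" -le "$b" ]; then
            IFS=$old_ifs
            echo "$b"
            return 0
        fi
    done
    IFS=$old_ifs
    # Larger than every bucket: a new bucket of exactly this size is used.
    echo "$tokens"
}
```

With the default list, an input of 300 tokens would fall into the 512 bucket, while an input of 6000 tokens would get its own bucket of 6000.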