
calcoloscientifico:userguide:alphafold

Differences

These are the differences between the selected revision and the current version of the page.

calcoloscientifico:userguide:alphafold [24/01/2025 11:02] fabio.spataro calcoloscientifico:userguide:alphafold [06/02/2025 19:46] (current version) fabio.spataro
Line 13: Line 13:
 <code> <code>
 /hpc/share/containers/apptainer/alphafold/3.0.1/alphafold-3.0.1.sif /hpc/share/containers/apptainer/alphafold/3.0.1/alphafold-3.0.1.sif
 +</code>
 +
 +=== Alphafold3 GPU demo ===
 +
 +<code>
 +mkdir -p demo/af_input
 +cp -p /hpc/share/containers/apptainer/alphafold/3/af_input/fold_input.json demo/af_input
 +cp -p /hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_40g.sh demo
 +cd demo
 +sbatch slurm-alphafold-gpu-a100_40g.sh
 </code> </code>
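Before submitting the demo job, it can help to confirm that the input file was actually copied. The following is a minimal sketch, not part of the cluster tooling: the function name ''check_af_input'' is hypothetical, and its defaults simply mirror the ''af_input'' directory and ''fold_input.json'' file used above.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the cluster): checks that the
# AlphaFold 3 input file is in place before the job is submitted.
check_af_input() {
    local input_dir="${1:-af_input}"          # directory bound to /root/af_input
    local json_file="${2:-fold_input.json}"   # name of the input JSON file
    if [ ! -d "$input_dir" ]; then
        echo "missing input directory: $input_dir" >&2
        return 1
    fi
    if [ ! -s "$input_dir/$json_file" ]; then
        echo "missing or empty input file: $input_dir/$json_file" >&2
        return 1
    fi
    echo "input OK: $input_dir/$json_file"
}

# Possible use from inside the demo directory (commented out because
# sbatch is only available on the cluster):
# check_af_input af_input fold_input.json && sbatch slurm-alphafold-gpu-a100_40g.sh
```

The check only tests that the file exists and is not empty; it does not validate the JSON content.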
  
Line 36: Line 46:
 </code> </code>
  
-Script ''slurm-alphafold.sh'' to run ''alphafold'' on 1 node with 1 GPU (8 tasks per node): +Script ''slurm-alphafold-gpu-a100_40g.sh'' to run ''alphafold'' on 1 node with 1 A100 (40 GB) GPU (8 tasks per node):
  
-<code bash slurm-alphafold.sh> +<code bash slurm-alphafold-gpu-a100_40g.sh>
 #!/bin/bash --login #!/bin/bash --login
 #SBATCH --job-name=alphafold #SBATCH --job-name=alphafold
Line 62: Line 72:
 test -n "$ALPHAFOLD_CONTAINER" || exit 1 test -n "$ALPHAFOLD_CONTAINER" || exit 1
  
-ALPHAFOLD_N_CPU=$SLURM_CPUS_PER_TASK +set -x
 + 
 +ALPHAFOLD_JSON_INPUT_FILE='fold_input.json'
 ALPHAFOLD_INPUT_DIR="$PWD/af_input" ALPHAFOLD_INPUT_DIR="$PWD/af_input"
 ALPHAFOLD_OUTPUT_DIR="$PWD/af_output/${SLURM_JOB_NAME}.d${SLURM_JOB_ID}" ALPHAFOLD_OUTPUT_DIR="$PWD/af_output/${SLURM_JOB_NAME}.d${SLURM_JOB_ID}"
Line 69: Line 81:
  
 apptainer exec \ apptainer exec \
-    --bind '/opt/hpc/system/nvidia/driver:/usr/local/nvidia/bin' +    --nv \
-    --bind '/opt/hpc/system/nvidia/driver:/usr/local/nvidia/lib' \ +
     --bind "$ALPHAFOLD_INPUT_DIR:/root/af_input" \     --bind "$ALPHAFOLD_INPUT_DIR:/root/af_input" \
     --bind "$ALPHAFOLD_OUTPUT_DIR:/root/af_output" \     --bind "$ALPHAFOLD_OUTPUT_DIR:/root/af_output" \
-    --bind "$ALPHAFOLD_MODEL_DIR:/root/models" \ 
-    --bind "$ALPHAFOLD_DB_DIR:/root/public_databases" \ 
     "$ALPHAFOLD_CONTAINER" \     "$ALPHAFOLD_CONTAINER" \
     python /app/alphafold/run_alphafold.py \     python /app/alphafold/run_alphafold.py \
-    --json_path=/root/af_input/fold_input.json \ +    --json_path="/root/af_input/$ALPHAFOLD_JSON_INPUT_FILE" \
     --model_dir=/root/models \     --model_dir=/root/models \
     --db_dir=/root/public_databases \     --db_dir=/root/public_databases \
-    --pdb_database_path=/root/public_databases/mmcif_files +    --db_dir=/root/public_databases_fallback \
-    --output_dir=/root/af_output +    --output_dir=/root/af_output
-    --jackhmmer_n_cpu=$ALPHAFOLD_N_CPU \ +
-    --nhmmer_n_cpu=$ALPHAFOLD_N_CPU+
 </code> </code>
  
 The processing results are saved in the ''af_output'' folder. The processing results are saved in the ''af_output'' folder.
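The job script builds its output path as ''af_output/${SLURM_JOB_NAME}.d${SLURM_JOB_ID}'' under the submission directory. As an illustration only, the sketch below rebuilds that path from a job name and job ID; the function name ''af_output_dir'' and the job ID ''123456'' are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: rebuilds the output path used by the job script,
# <base dir>/af_output/<job name>.d<job id>.
af_output_dir() {
    local job_name="$1"
    local job_id="$2"
    local base_dir="${3:-$PWD}"   # defaults to the current directory
    printf '%s/af_output/%s.d%s\n' "$base_dir" "$job_name" "$job_id"
}

# Example with a made-up job ID; the job name comes from
# "#SBATCH --job-name=alphafold" in the script.
af_output_dir alphafold 123456 /tmp/demo
```

With these arguments the function prints ''/tmp/demo/af_output/alphafold.d123456''. On the cluster, the real job ID is the number reported by ''sbatch'' at submission time.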
 +
 +Scripts for specific NVIDIA GPU models to run ''alphafold'' on 1 node with 1 GPU (8 tasks per node):
 +
 +^ GPU  ^ Path  ^
 +| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-p100|P100 (12 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-p100.sh''  |
 +| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-v100|V100 (32 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu_guest-v100_hylab.sh''  |
 +| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb|A100 (40 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_40g.sh''  |
 +| NVIDIA [[https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#accelerator-hardware-requirements|A100 (80 GB)]]  | ''/hpc/share/containers/apptainer/alphafold/3.0.1/slurm-alphafold-gpu-a100_80g.sh''  |
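The table above can also be expressed as a small lookup, which may be convenient in wrapper scripts. This is only a sketch: the function name ''alphafold_script_for_gpu'' and the short keys (''p100'', ''v100'', ''a100_40g'', ''a100_80g'') are labels chosen for the example, while the paths are copied from the table.

```shell
#!/bin/bash
# Hypothetical lookup: prints the job script path for a short GPU key.
# The keys are labels invented for this sketch; the paths come from the table.
alphafold_script_for_gpu() {
    local base='/hpc/share/containers/apptainer/alphafold/3.0.1'
    case "$1" in
        p100)     echo "$base/slurm-alphafold-gpu-p100.sh" ;;
        v100)     echo "$base/slurm-alphafold-gpu_guest-v100_hylab.sh" ;;
        a100_40g) echo "$base/slurm-alphafold-gpu-a100_40g.sh" ;;
        a100_80g) echo "$base/slurm-alphafold-gpu-a100_80g.sh" ;;
        *)        echo "unknown GPU key: $1" >&2; return 1 ;;
    esac
}

# Print the script path for the 80 GB A100.
alphafold_script_for_gpu a100_80g
```

A possible use is ''cp -p "$(alphafold_script_for_gpu a100_80g)" demo'', following the same pattern as the GPU demo.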
 +
 +=== Documentation === 
 +
 +How to get a list of all flags of ''run_alphafold.py'' (version 3.0.1):
 +
 +<code bash>
 +module load apptainer
 +module load alphafold/3.0.1
 +
 +apptainer exec "$ALPHAFOLD_CONTAINER" python /app/alphafold/run_alphafold.py --helpfull
 +</code>
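The full help text is long, so it may be useful to extract the entry for a single flag. The sketch below assumes the layout shown in the listing on this page (flag lines indented by two spaces, continuation lines by four); the function name ''show_flag_help'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: reads --helpfull text on stdin and prints the entry for
# one flag, i.e. the line starting with "  --<flag>:" plus the indented
# continuation lines that follow it.
show_flag_help() {
    local flag="$1"
    awk -v flag="--$flag:" '
        index($0, "  " flag) == 1 { printing = 1; print; next }  # entry header
        printing && /^    /       { print; next }                # continuation line
        { printing = 0 }                                         # start of another entry
    '
}

# Demonstration on a short excerpt of the help text.
show_flag_help db_dir <<'EOF'
  --conformer_max_iterations: Optional override for maximum number of iterations to run for RDKit conformer search.
    (an integer)
  --db_dir: Path to the directory containing the databases. Can be specified multiple times to search multiple directories in order.;
    repeat this option to specify a list of values
  --flash_attention_implementation: <triton|cudnn|xla>: Flash attention implementation to use.
    (default: 'triton')
EOF
```

On the cluster the filter could be fed directly, for example ''apptainer exec "$ALPHAFOLD_CONTAINER" python /app/alphafold/run_alphafold.py --helpfull | show_flag_help db_dir''. Boolean flags are listed with a ''[no]'' prefix, so they would need to be requested as, for example, ''[no]run_inference''.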
 +
 +List of all flags of ''run_alphafold.py'' (version 3.0.1):
 +
 +<code>
 +AlphaFold 3 structure prediction script.
 +
 +AlphaFold 3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of
 +this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
 +
 +To request access to the AlphaFold 3 model parameters, follow the process set
 +out at https://github.com/google-deepmind/alphafold3. You may only use these
 +if received directly from Google. Use is subject to terms of use available at
 +https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md
 +
 +flags:
 +
 +run_alphafold.py:
 +  --buckets: Strictly increasing order of token sizes for which to cache compilations. For any input with more tokens than the largest bucket size, a new
 +    bucket is created for exactly that number of tokens.
 +    (default: '256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120')
 +    (a comma separated list)
 +  --conformer_max_iterations: Optional override for maximum number of iterations to run for RDKit conformer search.
 +    (an integer)
 +  --db_dir: Path to the directory containing the databases. Can be specified multiple times to search multiple directories in order.;
 +    repeat this option to specify a list of values
 +    (default: "['/hpc/home/sti_calcolo/public_databases']")
 +  --flash_attention_implementation: <triton|cudnn|xla>: Flash attention implementation to use. 'triton' and 'cudnn' uses a Triton and cuDNN flash attention
 +    implementation, respectively. The Triton kernel is fastest and has been tested more thoroughly. The Triton and cuDNN kernels require Ampere GPUs or later.
 +    'xla' uses an XLA attention implementation (no flash attention) and is portable across GPU devices.
 +    (default: 'triton')
 +  --gpu_device: Optional override for the GPU device to use for inference. Defaults to the 1st GPU on the system. Useful on multi-GPU systems to pin each run
 +    to a specific GPU.
 +    (default: '0')
 +    (an integer)
 +  --hmmalign_binary_path: Path to the Hmmalign binary.
 +    (default: '/hmmer/bin/hmmalign')
 +  --hmmbuild_binary_path: Path to the Hmmbuild binary.
 +    (default: '/hmmer/bin/hmmbuild')
 +  --hmmsearch_binary_path: Path to the Hmmsearch binary.
 +    (default: '/hmmer/bin/hmmsearch')
 +  --input_dir: Path to the directory containing input JSON files.
 +  --jackhmmer_binary_path: Path to the Jackhmmer binary.
 +    (default: '/hmmer/bin/jackhmmer')
 +  --jackhmmer_n_cpu: Number of CPUs to use for Jackhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup.
 +    (default: '8')
 +    (an integer)
 +  --jax_compilation_cache_dir: Path to a directory for the JAX compilation cache.
 +  --json_path: Path to the input JSON file.
 +  --max_template_date: Maximum template release date to consider. Format: YYYY-MM-DD. All templates released after this date will be ignored.
 +    (default: '2021-09-30')
 +  --mgnify_database_path: Mgnify database path, used for protein MSA search.
 +    (default: '${DB_DIR}/mgy_clusters_2022_05.fa')
 +  --model_dir: Path to the model to use for inference.
 +    (default: '/hpc/home/sti_calcolo/models')
 +  --nhmmer_binary_path: Path to the Nhmmer binary.
 +    (default: '/hmmer/bin/nhmmer')
 +  --nhmmer_n_cpu: Number of CPUs to use for Nhmmer. Default to min(cpu_count, 8). Going beyond 8 CPUs provides very little additional speedup.
 +    (default: '8')
 +    (an integer)
 +  --ntrna_database_path: NT-RNA database path, used for RNA MSA search.
 +    (default: '${DB_DIR}/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta')
 +  --num_diffusion_samples: Number of diffusion samples to generate.
 +    (default: '5')
 +    (a positive integer)
 +  --num_recycles: Number of recycles to use during inference.
 +    (default: '10')
 +    (a positive integer)
 +  --num_seeds: Number of seeds to use for inference. If set, only a single seed must be provided in the input JSON. AlphaFold 3 will then generate random
 +    seeds in sequence, starting from the single seed specified in the input JSON. The full input JSON produced by AlphaFold 3 will include the generated
 +    random seeds. If not set, AlphaFold 3 will use the seeds as provided in the input JSON.
 +    (a positive integer)
 +  --output_dir: Path to a directory where the results will be saved.
 +  --pdb_database_path: PDB database directory with mmCIF files path, used for template search.
 +    (default: '${DB_DIR}/mmcif_files')
 +  --rfam_database_path: Rfam database path, used for RNA MSA search.
 +    (default: '${DB_DIR}/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta')
 +  --rna_central_database_path: RNAcentral database path, used for RNA MSA search.
 +    (default: '${DB_DIR}/rnacentral_active_seq_id_90_cov_80_linclust.fasta')
 +  --[no]run_data_pipeline: Whether to run the data pipeline on the fold inputs.
 +    (default: 'true')
 +  --[no]run_inference: Whether to run inference on the fold inputs.
 +    (default: 'true')
 +  --[no]save_embeddings: Whether to save the final trunk single and pair embeddings in the output.
 +    (default: 'false')
 +  --seqres_database_path: PDB sequence database path, used for template search.
 +    (default: '${DB_DIR}/pdb_seqres_2022_09_28.fasta')
 +  --small_bfd_database_path: Small BFD database path, used for protein MSA search.
 +    (default: '${DB_DIR}/bfd-first_non_consensus_sequences.fasta')
 +  --uniprot_cluster_annot_database_path: UniProt database path, used for protein paired MSA search.
 +    (default: '${DB_DIR}/uniprot_all_2021_04.fa')
 +  --uniref90_database_path: UniRef90 database path, used for MSA search. The MSA obtained by searching it is used to construct the profile for template
 +    search.
 +    (default: '${DB_DIR}/uniref90_2022_05.fa')
 +
 +absl.app:
 +  -?,--[no]help: show this help
 +    (default: 'false')
 +  --[no]helpfull: show full help
 +    (default: 'false')
 +  --[no]helpshort: show this help
 +    (default: 'false')
 +  --[no]helpxml: like --helpfull, but generates XML output
 +    (default: 'false')
 +  --[no]only_check_args: Set to true to validate args and exit.
 +    (default: 'false')
 +  --[no]pdb: Alias for --pdb_post_mortem.
 +    (default: 'false')
 +  --[no]pdb_post_mortem: Set to true to handle uncaught exceptions with PDB post mortem.
 +    (default: 'false')
 +  --profile_file: Dump profile information to a file (for python -m pstats). Implies --run_with_profiling.
 +  --[no]run_with_pdb: Set to true for PDB debug mode
 +    (default: 'false')
 +  --[no]run_with_profiling: Set to true for profiling the script. Execution will be slower, and the output format might change over time.
 +    (default: 'false')
 +  --[no]use_cprofile_for_profiling: Use cProfile instead of the profile module for profiling. This has no effect unless --run_with_profiling is set.
 +    (default: 'true')
 +
 +absl.logging:
 +  --[no]alsologtostderr: also log to stderr?
 +    (default: 'false')
 +  --log_dir: directory to write logfiles into
 +    (default: '')
 +  --logger_levels: Specify log level of loggers. The format is a CSV list of `name:level`. Where `name` is the logger name used with `logging.getLogger()`,
 +    and `level` is a level name  (INFO, DEBUG, etc). e.g. `myapp.foo:INFO,other.logger:DEBUG`
 +    (default: '')
 +  --[no]logtostderr: Should only log to stderr?
 +    (default: 'false')
 +  --[no]showprefixforinfo: If False, do not prepend prefix to info messages when it's logged to stderr, --verbosity is set to INFO level, and python logging
 +    is used.
 +    (default: 'true')
 +  --stderrthreshold: log messages at this level, or more severe, to stderr in addition to the logfile.  Possible values are 'debug', 'info', 'warning',
 +    'error', and 'fatal' Obsoletes --alsologtostderr. Using --alsologtostderr cancels the effect of this flag. Please also note that this flag is subject to
 +    --verbosity and requires logfile not be stderr.
 +    (default: 'fatal')
 +  -v,--verbosity: Logging verbosity level. Messages logged at this level or lower will be included. Set to 1 for debug logging. If the flag was not set or
 +    supplied, the value will be changed from the default of -1 (warning) to 0 (info) after flags are parsed.
 +    (default: '-1')
 +    (an integer)
 +
 +absl.testing.absltest:
 +  --test_random_seed: Random seed for testing. Some test frameworks may change the default value of this flag between runs, so it is not appropriate for
 +    seeding probabilistic tests.
 +    (default: '301')
 +    (an integer)
 +  --test_randomize_ordering_seed: If positive, use this as a seed to randomize the execution order for test cases. If "random", pick a random seed to use. If
 +    0 or not set, do not randomize test case execution order. This flag also overrides the TEST_RANDOMIZE_ORDERING_SEED environment variable.
 +    (default: '')
 +  --test_srcdir: Root of directory tree where source files live
 +    (default: '')
 +  --test_tmpdir: Directory for temporary testing files
 +    (default: '/tmp/absl_testing')
 +  --xml_output_file: File to store XML test results
 +    (default: '')
 +
 +chex._src.fake:
 +  --[no]chex_assert_multiple_cpu_devices: Whether to fail if a number of CPU devices is less than 2.
 +    (default: 'false')
 +  --chex_n_cpu_devices: Number of CPU threads to use as devices in tests.
 +    (default: '1')
 +    (an integer)
 +
 +chex._src.variants:
 +  --[no]chex_skip_pmap_variant_if_single_device: Whether to skip pmap variant if only one device is available.
 +    (default: 'true')
 +
 +absl.flags:
 +  --flagfile: Insert flag definitions from the given file into the command line.
 +    (default: '')
 +  --undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
 +    IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
 +    (default: '')
 +</code>
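The ''--[no]run_data_pipeline'' and ''--[no]run_inference'' flags listed above suggest that the data pipeline and the inference step can be run as separate invocations. The sketch below only prints the corresponding arguments; it is not taken from the cluster scripts. The function name ''alphafold_stage_args'' and the stage labels ''data'' and ''inference'' are invented for the example, and the container-side paths mirror those in the job script.

```shell
#!/bin/bash
# Hypothetical helper: prints, one per line, the run_alphafold.py arguments
# for a single stage. Only flags that appear in the --helpfull listing are used.
alphafold_stage_args() {
    local stage="$1"       # "data" or "inference" (labels chosen for this sketch)
    local json_path="$2"   # container-side path of the input JSON file
    local toggle
    case "$stage" in
        data)      toggle='--norun_inference' ;;      # data pipeline only
        inference) toggle='--norun_data_pipeline' ;;  # inference only
        *)         echo "unknown stage: $stage" >&2; return 1 ;;
    esac
    printf '%s\n' "--json_path=$json_path" '--output_dir=/root/af_output' "$toggle"
}

# Arguments for a data-pipeline-only run on the demo input.
alphafold_stage_args data /root/af_input/fold_input.json
```

For the inference-only stage, ''json_path'' would have to point at the JSON file written by the data-pipeline run; its exact name and location are not covered on this page and should be checked in the output folder.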
  
calcoloscientifico/userguide/alphafold.1737712926.txt.gz · Last modified: 24/01/2025 11:02 by fabio.spataro
