
Troubleshooting

Common issues and solutions for inference.sh apps.


Import Errors

"ModuleNotFoundError" in Production

Solutions:

  1. Add __init__.py files to all packages

  2. Add the current directory to the Python path:

```python
import sys, os

sys.path.append(os.path.dirname(os.path.abspath(__file__)))
```

  3. For local packages, use editable installs:

```txt
-e ./local_package_directory
```
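The sys.path fix above can be sanity-checked with a throwaway module; a self-contained sketch (the module name `localmod` is made up for illustration):

```python
import os
import sys
import tempfile

# Create a throwaway directory containing a module...
pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "localmod.py"), "w") as f:
    f.write("VALUE = 42\n")

# ...which only becomes importable once its directory is on sys.path.
sys.path.append(pkg_dir)
import localmod

print(localmod.VALUE)  # 42
```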

Memory Issues

"CUDA out of memory"

Solutions:

  1. Reduce batch size
  2. Use mixed precision: model.to(dtype=torch.float16)
  3. Enable gradient checkpointing: model.gradient_checkpointing_enable()
  4. Clear cache: torch.cuda.empty_cache()
  5. Increase VRAM in inf.yml

Memory Leaks

Clean up after each request:

```python
import gc, torch

async def run(self, input_data):
    result = self.process(input_data)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()
    return result
```
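If several handlers need the same cleanup, it can be factored into a decorator. A sketch with the `torch` import guarded so it also runs on machines without torch installed (`cleanup_after` and `process` are illustrative names, not part of the inference.sh API):

```python
import functools
import gc

try:
    import torch
except ImportError:  # CPU-only environment without torch
    torch = None

def cleanup_after(fn):
    """Run GC (and CUDA cache cleanup when available) after each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        finally:
            if torch is not None and torch.cuda.is_available():
                torch.cuda.empty_cache()
            gc.collect()
    return wrapper

@cleanup_after
def process(x):
    return x * 2
```

An async handler would need an async wrapper, but the cleanup logic in `finally` stays the same.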

Device Errors

"Expected all tensors to be on the same device"

Ensure all tensors are on the same device:

```python
input_tensor = input_tensor.to(self.device)
```
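When inputs arrive as nested dicts or lists, a small recursive helper can move every tensor at once. A sketch (`move_to_device` is a hypothetical helper; the `torch` import is guarded so the snippet also runs without torch):

```python
try:
    import torch
except ImportError:  # allows the helper to run without torch
    torch = None

def move_to_device(obj, device):
    """Recursively move all tensors inside dicts/lists/tuples to one device."""
    if torch is not None and isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(v, device) for v in obj)
    return obj  # non-tensor leaves pass through unchanged
```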

"CUDA not available"

  1. Check inf.yml GPU requirements:

```yaml
resources:
  gpu:
    count: 1
    vram: 24  # 24GB
```

  2. Use device detection:

```python
from accelerate import Accelerator

device = Accelerator().device
```

Model Loading Errors

"File not found" After Download

Don't assume file paths:

```python
import os

from huggingface_hub import snapshot_download

model_path = snapshot_download(repo_id="org/model")
config_path = os.path.join(model_path, "config.yaml")
if os.path.exists(config_path):
    ...  # load the config here
```
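Repo layouts differ, so rather than hard-coding where a config file lives inside the snapshot, you can scan for it. A minimal sketch (`find_first` is an illustrative helper, not part of any library):

```python
import os

def find_first(root, candidates):
    """Walk a download directory and return the first matching filename."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in candidates:
            if name in filenames:
                return os.path.join(dirpath, name)
    return None  # nothing matched anywhere under root
```

Usage: `find_first(model_path, ["config.yaml", "config.json"])` returns a full path or None, so the missing case is handled explicitly instead of raising FileNotFoundError later.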

"Token required for gated model"

Add HF_TOKEN to secrets:

```yaml
secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
```

File Path Issues

Temporary Files Deleted Too Early

Use delete=False:

```python
import tempfile

with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    output_path = tmp.name
```
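With delete=False the file's lifetime is yours to manage. One way to keep that explicit is a small helper that writes the data and hands back the path (`write_output` is a made-up name for illustration):

```python
import os
import tempfile

def write_output(data: bytes) -> str:
    """Write bytes to a temp file that outlives the `with` block."""
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
        tmp.write(data)
        return tmp.name  # the file is closed but NOT deleted on exit
```

Whoever consumes the path is then responsible for calling os.unlink on it once done; otherwise the files accumulate in the temp directory.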

Path Separators

Use os.path.join:

```python
# ✅ Good
path = os.path.join("models", "config", "settings.json")
```

Dependency Issues

Version Conflicts

Pin compatible versions:

```txt
torch==2.6.0
numpy>=1.23.5,<2
```
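To surface conflicts early, you can log what actually got installed at startup. A sketch using the standard-library `importlib.metadata` (the package names checked here are just examples):

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Log what actually resolved so version conflicts surface early.
for pkg in ("torch", "numpy"):
    print(pkg, installed_version(pkg))
```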

Flash Attention Build Errors

Flash Attention can fail to compile from source; use pre-built wheels in requirements.txt instead.


Debug Mode

Add logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

async def setup(self, config):
    logging.debug(f"Config: {config}")
    logging.info("Starting model load...")
```
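Beyond plain log lines, timing each step helps pinpoint slow model loads. A sketch using a standard-library context manager (`timed` is an illustrative helper, not part of the inference.sh API):

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("app")

@contextmanager
def timed(step):
    """Log how long a named step takes — handy for slow model loads."""
    start = time.perf_counter()
    log.info("starting %s", step)
    try:
        yield
    finally:
        log.info("finished %s in %.2fs", step, time.perf_counter() - start)

with timed("model load"):
    time.sleep(0.01)  # stand-in for the real load
```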

Next

Best Practices - Optimization patterns
