Common issues and solutions for inference.sh apps.
## Import Errors

### "ModuleNotFoundError" in Production
Solutions:

- Add `__init__.py` files to all packages
- Add the current directory to the Python path:

```python
import sys, os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
```

- For local packages, use editable installs:

```txt
-e ./local_package_directory
```
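If you are unsure which package directories are missing `__init__.py`, a small scan of the source tree can find them. This is an illustrative sketch (`find_missing_init` is not a platform API):

```python
import os

def find_missing_init(root):
    """Return directories under root that hold .py files but no __init__.py."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        has_py = any(f.endswith(".py") for f in filenames)
        if has_py and "__init__.py" not in filenames:
            missing.append(dirpath)
    return missing
```

Run it against your app's source root before deploying and create an empty `__init__.py` in each reported directory.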
"CUDA out of memory"
Solutions:
- Reduce batch size
- Use mixed precision:
model.to(dtype=torch.float16) - Enable gradient checkpointing:
model.gradient_checkpointing_enable() - Clear cache:
torch.cuda.empty_cache() - Increase VRAM in
inf.yml
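The mitigations above can be applied together when loading the model. A minimal sketch, assuming a Hugging Face-style model object (`load_model_memory_safe` is an illustrative helper, not a platform API):

```python
import torch

def load_model_memory_safe(model):
    """Apply the OOM mitigations listed above before serving requests."""
    model = model.to(dtype=torch.float16)      # mixed precision halves weight memory
    if hasattr(model, "gradient_checkpointing_enable"):
        model.gradient_checkpointing_enable()  # trade compute for activation memory
    if torch.cuda.is_available():
        torch.cuda.empty_cache()               # release cached blocks back to the driver
    return model
```

If reduced precision hurts output quality, try `torch.bfloat16` on hardware that supports it.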
### Memory Leaks

Clean up after each request:

```python
import gc, torch

async def run(self, input_data):
    result = self.process(input_data)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()
    return result
```
"Expected all tensors to be on the same device"
Ensure all tensors are on the same device:
python
1input_tensor = input_tensor.to(self.device)"CUDA not available"
### "CUDA not available"

- Check `inf.yml` GPU requirements:

```yaml
resources:
  gpu:
    count: 1
    vram: 24 # 24GB
```

- Use device detection:

```python
from accelerate import Accelerator

device = Accelerator().device
```
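Putting the two device fixes together: detect one device once, then move both the model and its inputs to it. A sketch that falls back to plain `torch` when `accelerate` is not installed (`get_device` and `run_on_one_device` are illustrative helpers):

```python
import torch

def get_device():
    """Pick one device for the whole app; prefer accelerate's detection if present."""
    try:
        from accelerate import Accelerator
        return Accelerator().device
    except ImportError:
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def run_on_one_device(model, input_tensor):
    # Weights and inputs must live on the same device before the forward pass.
    device = get_device()
    model = model.to(device)
    return model(input_tensor.to(device))
```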
"File not found" After Download
Don't assume file paths:
python
1model_path = snapshot_download(repo_id="org/model")2config_path = os.path.join(model_path, "config.yaml")3if os.path.exists(config_path):4 # Load config"Token required for gated model"
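When the expected file is not where you assumed, search the downloaded snapshot instead of hard-coding a layout. A sketch (`find_file` is an illustrative helper):

```python
import os

def find_file(model_path, filename):
    """Walk a downloaded snapshot and return the full path to filename, or None."""
    for dirpath, _dirnames, filenames in os.walk(model_path):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None
```

Logging `os.listdir(model_path)` once after download is also a quick way to see what the repository actually ships.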
### "Token required for gated model"

Add `HF_TOKEN` to secrets:

```yaml
secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
```
## File Path Issues

### Temporary Files Deleted Too Early

Use `delete=False` so the file outlives the `with` block:

```python
import tempfile

with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    output_path = tmp.name
```
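Since `delete=False` leaves the file on disk, a common pattern is to write the output, return the path, and let the caller remove it when finished. A sketch (`write_temp_output` is an illustrative helper):

```python
import tempfile

def write_temp_output(data: bytes, suffix=".jpg"):
    """Write bytes to a temp file that survives the with-block; caller deletes it."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

The caller is responsible for `os.remove(path)` once the response has been sent, or disk usage will grow over time.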
### Path Separators

Use `os.path.join` instead of hard-coding separators:

```python
import os

# ✅ Good
path = os.path.join("models", "config", "settings.json")
```
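`pathlib` achieves the same portability, if you prefer it over `os.path`:

```python
from pathlib import Path

# The "/" operator inserts the correct separator for the host OS
path = Path("models") / "config" / "settings.json"
```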
## Dependency Issues

### Version Conflicts

Pin compatible versions:

```txt
torch==2.6.0
numpy>=1.23.5,<2
```
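To confirm the environment actually resolved to your pins, you can read installed versions at startup with the standard library (`installed_versions` is an illustrative helper):

```python
from importlib import metadata

def installed_versions(*packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

Logging the result once in `setup()` makes version mismatches visible in production logs.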
### Flash Attention Build Errors

Use pre-built wheels in `requirements.txt` instead of compiling `flash-attn` from source.
## Debug Mode

Add logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

async def setup(self, config):
    logging.debug(f"Config: {config}")
    logging.info("Starting model load...")
```
## Next

→ Best Practices - Optimization patterns