AILM-05

Do you limit your solution's LLM resource use per request, per step, and per action?

Explanation

This question is asking whether your LLM solution implements resource usage limits at different operational levels (per request, per step, and per action).

What it means:
- 'Per request' refers to limiting resources for each individual API call or user interaction with the LLM
- 'Per step' refers to limiting resources during each processing stage within a single request
- 'Per action' refers to limiting resources for specific operations the LLM performs

Why it's being asked: This is primarily a security concern about denial of service (DoS) attacks. Without proper resource limits, an attacker could craft requests that consume excessive computational resources (CPU, memory, tokens, etc.), potentially:
1. Degrading service for other users
2. Causing system instability or crashes
3. Generating excessive costs in pay-per-use models
4. Preventing legitimate users from accessing the service

Resource limits also help prevent unintentional issues from poorly formed requests that might otherwise consume excessive resources.

How to best answer it: Provide specific details about the resource limits implemented in your solution, including:
- Token limits (input and output)
- Request rate limits
- Computation time limits
- Memory usage limits
- Any automatic termination mechanisms for runaway processes
- How these limits are enforced at different operational levels

If you have different tiers of service with different limits, explain that structure. Include both technical and policy-based controls.
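
To make the three levels concrete, here is a purely illustrative Python sketch of how per-request, per-step, and per-action budgets might be wired together. The class names, limit values, and structure are assumptions for illustration, not a prescribed or product-specific implementation.

    import time
    from dataclasses import dataclass

    @dataclass
    class ResourceLimits:
        # Hypothetical values -- tune to your own service tiers.
        max_input_tokens: int = 4_096        # per request
        max_output_tokens: int = 8_192       # per request
        step_timeout_seconds: float = 30.0   # per step
        max_tool_calls: int = 5              # per action (e.g. tool/function calls)

    class ResourceLimitExceeded(Exception):
        pass

    def enforce_request_limits(prompt_tokens: int, limits: ResourceLimits) -> None:
        """Reject a request up front if its input already exceeds the budget."""
        if prompt_tokens > limits.max_input_tokens:
            raise ResourceLimitExceeded(
                f"input of {prompt_tokens} tokens exceeds {limits.max_input_tokens}"
            )

    class StepBudget:
        """Tracks time and action counts across the steps of a single request."""
        def __init__(self, limits: ResourceLimits):
            self.limits = limits
            self.tool_calls = 0

        def run_step(self, step_fn, *args, **kwargs):
            # Checks elapsed time after the step completes; a production system
            # would enforce a hard timeout (e.g. async cancellation) instead.
            start = time.monotonic()
            result = step_fn(*args, **kwargs)
            if time.monotonic() - start > self.limits.step_timeout_seconds:
                raise ResourceLimitExceeded("step exceeded its time budget")
            return result

        def record_tool_call(self) -> None:
            self.tool_calls += 1
            if self.tool_calls > self.limits.max_tool_calls:
                raise ResourceLimitExceeded("too many actions for one request")

A good answer does not need code, but it should be able to describe controls at each of these levels with this kind of specificity.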

Guidance

Looking for resource use limits to mitigate denial of service (DoS) attacks.

Example Responses

Example Response 1

Yes, our LLM solution implements comprehensive resource limits at multiple levels. Per request, we enforce a maximum of 4,096 input tokens and 8,192 output tokens, with automatic truncation when limits are exceeded. Per step, we implement a 30-second computation timeout for any single processing stage, after which the operation is terminated with an appropriate error message. Per action, we enforce memory usage limits of 4GB for standard requests and 8GB for complex operations, with automatic scaling controls to prevent resource exhaustion. Additionally, our rate limiting system restricts users to 60 requests per minute by default, with configurable limits for enterprise customers. All these limits are enforced through our API gateway and internal monitoring systems, with detailed logging of any limit violations to detect potential abuse patterns.
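
The "60 requests per minute" control mentioned in this response is typically enforced at the API gateway with something like a token bucket per API key. The sketch below is illustrative only; the names and values are assumptions, not taken from any particular gateway product.

    import time
    from collections import defaultdict

    class PerKeyRateLimiter:
        """Token-bucket limiter: each API key gets N requests per minute."""
        def __init__(self, requests_per_minute: int = 60):
            self.capacity = float(requests_per_minute)
            self.refill_per_second = requests_per_minute / 60.0
            self.buckets = defaultdict(
                lambda: {"tokens": self.capacity, "last": time.monotonic()}
            )

        def allow(self, api_key: str) -> bool:
            bucket = self.buckets[api_key]
            now = time.monotonic()
            elapsed = now - bucket["last"]
            # Refill the bucket in proportion to elapsed time, up to capacity.
            bucket["tokens"] = min(
                self.capacity, bucket["tokens"] + elapsed * self.refill_per_second
            )
            bucket["last"] = now
            if bucket["tokens"] >= 1.0:
                bucket["tokens"] -= 1.0
                return True
            return False  # caller would return HTTP 429 or queue the request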

Example Response 2

Yes, our platform implements multi-layered resource controls. At the request level, we limit each API call to a maximum of 32,000 combined tokens (input+output) and enforce a 3-minute total processing time. For multi-step operations, each individual step has a 45-second timeout and 2GB memory allocation. For specific actions like code generation or complex reasoning chains, we implement specialized limits: code generation is limited to 1,024 lines of output, while reasoning chains are limited to 5 sequential steps. Our system also implements dynamic resource allocation that automatically detects and terminates anomalous usage patterns, such as requests consuming 3x more resources than the typical baseline for similar operations. All limits are configurable through our administrative console for enterprise deployments, allowing organizations to set custom thresholds based on their specific security requirements.
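
The "3x the typical baseline" anomaly check described in this response could look roughly like the following sketch. It is illustrative only: the window size, threshold, and names are assumptions, and a real system would likely use per-operation baselines and percentiles rather than a simple mean.

    from collections import deque

    class AnomalousUsageDetector:
        """Flags a request whose cost exceeds a multiple of the rolling baseline."""
        def __init__(self, window: int = 1000, multiplier: float = 3.0):
            self.recent_costs = deque(maxlen=window)  # e.g. tokens or CPU-seconds
            self.multiplier = multiplier

        def baseline(self) -> float:
            if not self.recent_costs:
                return float("inf")  # no baseline yet, so nothing is flagged
            return sum(self.recent_costs) / len(self.recent_costs)

        def check(self, cost: float) -> bool:
            """Return True if the request should be terminated as anomalous."""
            anomalous = cost > self.multiplier * self.baseline()
            if not anomalous:
                # Only normal requests update the baseline, so one abusive
                # burst cannot drag the baseline upward.
                self.recent_costs.append(cost)
            return anomalous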

Example Response 3

No, our current LLM implementation does not have comprehensive resource limits at all levels requested. While we do implement basic rate limiting at the API level (100 requests per minute per API key) and have a global timeout of 5 minutes per request, we do not currently have granular controls for per-step or per-action resource usage. This is a known limitation in our current architecture that we're addressing in our next major release (scheduled for Q3 2023). Our development roadmap includes implementing token consumption limits, per-step timeouts, and memory usage constraints. In the interim, we mitigate potential DoS risks through active monitoring of our infrastructure and manual intervention when abnormal usage patterns are detected. We recognize this is not ideal from a security perspective and are prioritizing these enhancements.

Context

Tab: AI
Category: AI Large Language Model (LLM)

ResponseHub is the product I wish I had when I was a CTO

Previously I was co-founder and CTO of Progression, a VC-backed HR-tech startup used by some of the biggest names in tech.

As our sales grew, security questionnaires quickly became one of my biggest pain points. They were confusing, hard to delegate and arrived like London buses - 3 at a time!

I'm building ResponseHub so that other teams don't have to go through this. Leave the security questionnaires to us so you can get back to closing deals, shipping product and building your team.

Neil Cameron
Founder, ResponseHub