Do you limit your solution's LLM resource use per request, per step, and per action?
Explanation
Guidance
Looking for resource-use limits (e.g., caps on tokens, computation time, and memory) that mitigate denial-of-service (DoS) attacks.
Example Responses
Example Response 1
Yes, our LLM solution implements comprehensive resource limits at multiple levels. Per request, we enforce a maximum of 4,096 input tokens and 8,192 output tokens, with automatic truncation when limits are exceeded. Per step, we implement a 30-second computation timeout for any single processing stage, after which the operation is terminated with an appropriate error message. Per action, we enforce memory usage limits of 4 GB for standard requests and 8 GB for complex operations, with automatic scaling controls to prevent resource exhaustion. Additionally, our rate-limiting system restricts users to 60 requests per minute by default, with configurable limits for enterprise customers. All of these limits are enforced through our API gateway and internal monitoring systems, with detailed logging of any limit violations to detect potential abuse patterns.
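As an illustration only, the following is a minimal Python sketch of how the per-request token caps and per-step timeout described in this response might be enforced. All names (`truncate_tokens`, `run_step_with_timeout`) are hypothetical, and the numeric limits simply mirror the figures quoted above.

```python
import concurrent.futures

# Hypothetical enforcement of the per-request and per-step limits
# described above; the numeric values mirror the example response.
MAX_INPUT_TOKENS = 4_096
MAX_OUTPUT_TOKENS = 8_192
STEP_TIMEOUT_SECONDS = 30

class StepTimeoutError(Exception):
    """Raised when a single processing stage exceeds its time budget."""

def truncate_tokens(tokens: list[str], limit: int) -> list[str]:
    """Automatically truncate a token sequence that exceeds its limit."""
    return tokens[:limit]

def run_step_with_timeout(step_fn, *args):
    """Run one processing stage, failing with an error after the timeout.

    Note: a thread that has already started cannot be forcibly killed in
    Python, so a production system would run steps in a separate process
    that the supervisor can terminate outright.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(step_fn, *args)
    try:
        return future.result(timeout=STEP_TIMEOUT_SECONDS)
    except concurrent.futures.TimeoutError:
        raise StepTimeoutError(f"step exceeded {STEP_TIMEOUT_SECONDS}s limit")
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
```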
Example Response 2
Yes, our platform implements multi-layered resource controls. At the request level, we limit each API call to a maximum of 32,000 combined tokens (input plus output) and enforce a 3-minute total processing time. For multi-step operations, each individual step has a 45-second timeout and a 2 GB memory allocation. For specific actions such as code generation or complex reasoning chains, we implement specialized limits: code generation is capped at 1,024 lines of output, while reasoning chains are limited to 5 sequential steps. Our system also implements dynamic resource allocation that automatically detects and terminates anomalous usage patterns, such as requests consuming more than 3x the resources of the typical baseline for similar operations. All limits are configurable through our administrative console for enterprise deployments, allowing organizations to set custom thresholds based on their specific security requirements.
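As a sketch of the "3x baseline" anomaly rule in this response, the following hypothetical Python class flags any operation consuming more than three times the baseline of recent comparable operations. The class and method names, the window size, and the use of a median as the baseline are all illustrative assumptions, not details from the response.

```python
from collections import deque
from statistics import median

# Hypothetical sketch of the "3x baseline" anomaly rule described above.
ANOMALY_MULTIPLIER = 3.0
BASELINE_WINDOW = 100  # recent observations kept per operation type

class ResourceBaseline:
    """Tracks recent resource usage and flags anomalous operations."""

    def __init__(self) -> None:
        self._observations: dict[str, deque] = {}

    def record(self, op_type: str, resource_units: float) -> None:
        """Record the resource consumption of a completed operation."""
        window = self._observations.setdefault(
            op_type, deque(maxlen=BASELINE_WINDOW)
        )
        window.append(resource_units)

    def is_anomalous(self, op_type: str, resource_units: float) -> bool:
        """Return True if usage exceeds 3x the baseline for this op type."""
        window = self._observations.get(op_type)
        if not window:
            return False  # no baseline yet, so nothing to compare against
        return resource_units > ANOMALY_MULTIPLIER * median(window)
```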
Example Response 3
No, our current LLM implementation does not have comprehensive resource limits at all of the levels requested. While we do implement basic rate limiting at the API level (100 requests per minute per API key) and enforce a global timeout of 5 minutes per request, we do not currently have granular controls for per-step or per-action resource usage. This is a known limitation of our current architecture that we are addressing in our next major release (scheduled for Q3 2023). Our development roadmap includes token consumption limits, per-step timeouts, and memory usage constraints. In the interim, we mitigate potential DoS risk through active monitoring of our infrastructure and manual intervention when abnormal usage patterns are detected. We recognize this is not ideal from a security perspective and are prioritizing these enhancements.
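The basic per-key rate limiting this response mentions could look something like the following fixed-window sketch in Python. The `RateLimiter` class is hypothetical, and a real deployment would typically back the counters with a shared store such as Redis rather than in-process memory.

```python
import time
from collections import defaultdict

# Hypothetical fixed-window limiter matching the figure quoted above
# (100 requests per minute per API key).
REQUESTS_PER_MINUTE = 100

class RateLimiter:
    """Per-API-key fixed-window rate limiter."""

    def __init__(self) -> None:
        # Maps api_key -> (minute the window started, requests so far).
        self._windows: dict[str, tuple[int, int]] = defaultdict(lambda: (0, 0))

    def allow(self, api_key: str) -> bool:
        """Return True if this request is within the per-minute budget."""
        current_minute = int(time.time() // 60)
        window_minute, count = self._windows[api_key]
        if window_minute != current_minute:
            window_minute, count = current_minute, 0  # new window starts
        if count >= REQUESTS_PER_MINUTE:
            return False  # budget exhausted; reject the request
        self._windows[api_key] = (window_minute, count + 1)
        return True
```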
Context
- Tab: AI
- Category: AI Large Language Model (LLM)

