Fix inaccurate num_ctx documentation description

The previous description 'Sets the size of the context window used to generate the next token' was misleading.

This clarifies that num_ctx refers to the total context window size: the maximum number of tokens the model can process in total, including both prompt tokens and generated tokens (and tool-calling tokens, if applicable).

This helps users understand why they might encounter issues when exceeding the context window limit.
RoomWithOutRoof 2026-04-14 22:24:59 +08:00
parent c8fbaa9e5b
commit c7e7abb01a


@@ -150,7 +150,7 @@ PARAMETER <parameter> <parametervalue>
| Parameter | Description | Value Type | Example Usage |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | -------------------- |
-| num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
+| num_ctx | Sets the maximum number of tokens the model can process in total, including prompt tokens and generated tokens. (Default: 2048) | int | num_ctx 4096 |
| repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
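The distinction the new wording draws matters in practice: because num_ctx caps prompt and generated tokens together, a longer prompt leaves less room for the reply, and a prompt that alone exceeds num_ctx cannot be processed in full. A minimal sketch of that arithmetic (the helper name is illustrative, not part of Ollama's API):

```python
def generation_budget(num_ctx: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt is counted against num_ctx.

    num_ctx bounds prompt and generated tokens together, so the budget
    for the reply is whatever the prompt has not already consumed.
    """
    return max(0, num_ctx - prompt_tokens)

# With the default num_ctx of 2048, a 1500-token prompt leaves
# at most 548 tokens for the model's reply.
print(generation_budget(2048, 1500))  # 548
print(generation_budget(2048, 2500))  # 0: the prompt alone exceeds the window
```

This is why raising num_ctx (e.g. `PARAMETER num_ctx 4096`) is the usual fix when long prompts produce truncated or empty responses.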