# Model Configuration Guidelines

Every LLM application chart should include a `modelConfig.yaml` file in its root directory. This file provides the essential information required to load and run the model.
## `modelConfig.yaml` Example

Here's an example of what a `modelConfig.yaml` file might look like:

```yaml
source_url: https://huggingface.co/TheBloke/Yarn-Mistral-7B-128k-GGUF/resolve/main/yarn-mistral-7b-128k.Q4_K_M.gguf
id: yarnmistral7b
object: model
name: Yarn Mistral 7B Q4
version: '1.0'
description: Yarn Mistral 7B is a language model for long context and supports a 128k token context window.
format: gguf
settings:
  ctx_len: 4096
  prompt_template: |-
    {prompt}
parameters:
  temperature: 0.7
  top_p: 0.95
  stream: true
  max_tokens: 4096
  stop: []
  frequency_penalty: 0
  presence_penalty: 0
metadata:
  author: NousResearch, The Bloke
  tags:
    - 7B
    - Finetuned
  size: 4370000000
engine: nitro
```
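The sketch below shows one way an application might load and sanity-check this file. It assumes the PyYAML package is available; the list of fields checked is illustrative, not a definitive schema.

```python
# Minimal sketch (assumes PyYAML): load modelConfig.yaml and check a few
# fields. REQUIRED_FIELDS is illustrative, not an official schema.
import yaml

REQUIRED_FIELDS = ["source_url", "id", "name", "format", "settings", "parameters"]

with open("modelConfig.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

missing = [field for field in REQUIRED_FIELDS if field not in config]
if missing:
    raise ValueError(f"modelConfig.yaml is missing fields: {missing}")

print(f"Loaded model '{config['name']}' (id: {config['id']}, format: {config['format']})")
```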
## source_url

- Type: `string`

The model download source. It can be an external URL or a local file path.
## id

- Type: `string`

The model identifier, which can be referenced in the API endpoints.
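For instance, an OpenAI-compatible chat request might pass this identifier in the `model` field. The endpoint URL and payload shape below are assumptions for illustration, not a documented API; check your server's reference.

```python
# Hypothetical example: reference the model id from modelConfig.yaml in a
# chat request. The endpoint URL and payload shape are assumptions, not a
# documented API.
import json
import urllib.request

payload = {
    "model": "yarnmistral7b",  # the `id` field from modelConfig.yaml
    "messages": [{"role": "user", "content": "Hello!"}],
}
request = urllib.request.Request(
    "http://localhost:3928/v1/chat/completions",  # assumed server address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```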
## object

- Type: `string`
- Default: `model`

The type of the object.
## name

- Type: `string`

The human-readable model name displayed in the UI.
## version

- Type: `string`

The version of the model.
## description

- Type: `string`

The description of the model.
## format

- Type: `string`

The file format of the model, e.g. `gguf`.
## settings

The model settings.

Configuration example:

```yaml
settings:
  ctx_len: 4096
  prompt_template: |-
    {prompt}
```
### ctx_len

- Type: `int`

The context length of the model, i.e. the maximum number of tokens (prompt plus completion) the model can handle at once.
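In practice, the prompt and the requested completion must fit inside this window together. A rough sketch of that budget check (real engines count tokens with the model's tokenizer; whitespace splitting below is only a stand-in):

```python
# Rough sketch of the context-length budget: prompt tokens plus the
# requested completion must fit within ctx_len. Whitespace splitting is a
# crude stand-in for the model's real tokenizer.
CTX_LEN = 4096

def fits_in_context(prompt: str, max_tokens: int, ctx_len: int = CTX_LEN) -> bool:
    prompt_tokens = len(prompt.split())  # approximation only
    return prompt_tokens + max_tokens <= ctx_len

print(fits_in_context("Summarize the following article ...", max_tokens=512))
```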
### prompt_template

- Type: `string`

The prompt template of the model, used to generate the prompt part of the model input. The `{prompt}` placeholder is replaced with the user's input.
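A minimal sketch of how the template is applied. The plain `{prompt}` template passes the input through unchanged, while instruction-tuned models often need a wrapper (for example, Mistral Instruct models use `[INST] {prompt} [/INST]`):

```python
# Minimal sketch: fill the `{prompt}` placeholder from settings.prompt_template
# with the user's input before handing it to the model.
def build_prompt(template: str, user_input: str) -> str:
    return template.replace("{prompt}", user_input)

plain = build_prompt("{prompt}", "What is a context window?")
instruct = build_prompt("[INST] {prompt} [/INST]", "What is a context window?")
print(plain)
print(instruct)
```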
## parameters

Parameters of the model.

Configuration example:

```yaml
parameters:
  temperature: 0.7
  top_p: 0.95
  stream: true
  max_tokens: 4096
  stop: []
  frequency_penalty: 0
  presence_penalty: 0
```
### temperature

- Type: `float`

The sampling temperature used when the model generates text. Lower values make the output more deterministic; higher values make it more random.
### top_p

- Type: `float`

The top-p (nucleus sampling) parameter used when the model generates text, which restricts sampling to the smallest set of tokens whose cumulative probability reaches `top_p`.
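To make these two knobs concrete, here is an illustrative sampler (not the engine's actual implementation): logits are first divided by the temperature, then sampling is restricted to the nucleus of tokens whose cumulative probability reaches `top_p`.

```python
# Illustrative sampler, not the engine's actual implementation: temperature
# scaling followed by nucleus (top-p) filtering over a toy distribution.
import math
import random

def sample(logits: dict[str, float], temperature: float = 0.7, top_p: float = 0.95) -> str:
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

print(sample({"the": 2.0, "a": 1.0, "cat": 0.5}))
```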
### stream

- Type: `bool`

Indicates whether the model streams generated text token by token rather than returning it all at once.
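The toy generator below only models the behavioral difference (chunks arrive incrementally instead of as one final string); the actual wire format, e.g. server-sent events, depends on the server.

```python
# Toy illustration of stream: true — tokens are delivered incrementally and
# can be rendered as they arrive. The real transport (e.g. SSE) is
# server-specific.
from typing import Iterator

def generate_stream(tokens: list[str]) -> Iterator[str]:
    for token in tokens:
        yield token

for chunk in generate_stream(["Hello", ",", " world", "!"]):
    print(chunk, end="", flush=True)
print()
```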
### max_tokens

- Type: `int`

The maximum number of tokens the model may generate in a single response.
### stop

- Type: `array`

List of stop words. Generation halts when any of them is produced.
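A common way engines honor this list is to cut the output at the first occurrence of any stop sequence; the exact semantics are engine-specific. A sketch:

```python
# Sketch of one common stop-word behavior: truncate generated text at the
# first occurrence of any stop sequence. Exact semantics are engine-specific.
def apply_stop(text: str, stop: list[str]) -> str:
    cut = len(text)
    for word in stop:
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

print(apply_stop("Answer: 42\nUser: next question", stop=["\nUser:"]))
```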
### frequency_penalty

- Type: `float`

Frequency penalty parameter: penalizes tokens in proportion to how often they have already appeared, discouraging verbatim repetition.
### presence_penalty

- Type: `float`

Presence penalty parameter: penalizes tokens that have appeared at least once, encouraging the model to introduce new vocabulary.
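Both penalties are commonly applied OpenAI-style: each candidate token's logit is reduced by its repetition count times `frequency_penalty`, plus `presence_penalty` if the token has appeared at all. A sketch of that formulation (individual engines may differ):

```python
# Sketch of the common OpenAI-style penalty formulation; individual engines
# may implement the penalties differently.
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    counts = Counter(generated)
    return {
        tok: logit
        - counts[tok] * frequency_penalty
        - (1.0 if counts[tok] else 0.0) * presence_penalty
        for tok, logit in logits.items()
    }

logits = {"the": 2.0, "cat": 1.5, "dog": 1.0}
print(apply_penalties(logits, ["the", "the", "cat"], 0.5, 0.2))
```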
## metadata

Metadata of the model.

Configuration example:

```yaml
metadata:
  author: NousResearch, The Bloke
  tags:
    - 7B
    - Finetuned
  size: 4370000000
```
### author

- Type: `string`

The author of the model.
### tags

- Type: `array`

List of tags describing the attributes or features of the model.
### size

- Type: `int`

The size of the model file in bytes.
## engine

- Type: `string`

The inference engine used to run the model (e.g. `nitro`).