# Configure arey
This article is a detailed reference for the arey configuration file.
## Location

Arey stores the configuration file in the following locations:

- `~/.config/arey/arey.yml` for Linux, macOS, or WSL2 on Windows.
- `C:\users\<username>\.arey\arey.yml` for Windows.
The configuration file is a valid YAML file.
A default configuration file is created by arey on the first run. You can
refer here for the latest configuration file snapshot.
## Sections
### Models

The `models` section provides a list of local LLM models for the Llama.cpp or Ollama backends.
```yaml
# Example configuration for arey app
#
# Looked up from XDG_CONFIG_DIR (~/.config/arey/arey.yml) on Linux or
# ~/.arey on Windows.
models:
  tinydolphin:
    path: ~/models/tinydolphin-2.8-1.1b.Q4_K_M.gguf
    type: llama
    template: chatml
  ollama-tinydolphin:
    name: tinydolphin:latest # name of the model, see http://localhost:11434/api/tags
    type: ollama
    template: chatml
```
Each entry in `models` consists of the following elements. The key, e.g., `tinydolphin`, can be
user specified; it is used elsewhere to reference the model settings.

- `name` (only for Ollama): the Ollama model name. It should be a valid name in the local library; match it with a value from http://localhost:11434/api/tags.
- `path` (only for Llama.cpp): the model file path. Supports expansion of the user directory marker `~`.
- `type`: must be `ollama` for Ollama models. Any other value is treated as a `llama` model.
- `template`: conversation template used by the model. We use this for the `arey chat` command to convert user and assistant messages. See the Prompt templates section below for details.
### Profiles

The `profiles` section is a collection of settings used for generating the LLM response.
You can define as many profiles as necessary. The key, e.g., `precise`, is used
in other sections to refer to the settings.
```yaml
profiles:
  # See https://www.reddit.com/r/LocalLLaMA/comments/1343bgz/what_model_parameters_is_everyone_using/
  precise:
    # factual responses and straightforward assistant answers
    temperature: 0.7
    repeat_penalty: 1.176
    top_k: 40
    top_p: 0.1
  creative:
    # chatting, storywriting, and interesting assistant answers
    temperature: 0.72
    repeat_penalty: 1.1
    top_k: 0
    top_p: 0.73
  sphinx:
    # varied storywriting (on 30B/65B) and unconventional chatting
    temperature: 1.99
    repeat_penalty: 1.15
    top_k: 30
    top_p: 0.18
```
Each profile can specify the completion settings specific to the model type.
Below are a few common settings.
| Parameter | Value | Purpose |
|---|---|---|
| repeat_penalty | 1-2 | Higher values discourage repetition of tokens |
| stop | [] | Comma-separated list of stop words |
| temperature | 0.0-1.0 | Lower temperature implies a more precise response |
| top_k | 0-30 | Number of tokens to consider for sampling |
| top_p | 0.0-1.0 | Lower values sample from the most likely tokens |
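For instance, a profile can combine these parameters with stop words. The sketch below is illustrative; the `concise` profile name and the stop strings are made up, while the keys come from the table above:

```yaml
profiles:
  # Hypothetical profile: low temperature for short, factual answers.
  concise:
    temperature: 0.1
    repeat_penalty: 1.15
    top_k: 40
    top_p: 0.1
    stop: ["</s>", "USER:"] # stop generation when any of these strings appear
```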
- Ollama models: see the list of all parameters in the Modelfile API documentation.
- Llama.cpp models: see the list of all parameters in the `create_completion` API documentation.
### Chat and Task settings

The `chat` and `task` sections specify the model and profile for the `chat` and
`ask` commands respectively.
```yaml
chat:
  model: ollama-tinydolphin
  profile: precise

task:
  model: ollama-tinydolphin
  profile: precise
  settings:
    host: http://localhost:11434/
```
For either section, you can specify the model settings in a `settings` member.
The following model settings are supported for each model type.
#### Ollama models
| Setting key | Value | Remark |
|---|---|---|
| host | http://localhost:11434/ | Base URL for the Ollama server |
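As a minimal sketch, the `host` value nests under the `settings` member of whichever section uses an Ollama model; the value below is the default from the table, changed only if your Ollama server runs elsewhere:

```yaml
chat:
  model: ollama-tinydolphin
  profile: precise
  settings:
    host: http://localhost:11434/ # base URL of the Ollama server
```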
#### Llama.cpp models
| Setting key | Value | Remark |
|---|---|---|
| n_threads | Half of CPU count | Number of threads to run |
| n_ctx | 4096 | Context window size |
| n_batch | 512 | Batch size |
| n_gpu_layers | 0 | Number of layers to offload to GPU |
| use_mlock | False | Lock the model in main memory |
| verbose | False | Show verbose logs |
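For example, a `task` section backed by the Llama.cpp model from the Models section could tune these settings as below; the values are illustrative, and the keys are the ones listed in the table:

```yaml
task:
  model: tinydolphin
  profile: precise
  settings:
    n_ctx: 4096      # context window size in tokens
    n_batch: 512     # batch size for prompt processing
    n_threads: 4     # e.g., half of an 8-core CPU
    n_gpu_layers: 0  # increase to offload layers to the GPU
    use_mlock: false # lock the model in main memory
    verbose: false   # show verbose logs
```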
### Prompt templates

A prompt template allows you to specify tokens that are replaced at runtime.
See https://github.com/codito/arey/blob/master/arey/data/prompts/chatml.yml for an example.
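As an illustration only, a ChatML-style template wraps each role in the model's special tokens and leaves placeholders for runtime values. The keys and placeholder names below are hypothetical; refer to the linked chatml.yml for arey's actual schema.

```yaml
# Hypothetical sketch of a ChatML-style template; not arey's exact schema.
system: |
  <|im_start|>system
  {system_message}<|im_end|>
user: |
  <|im_start|>user
  {user_message}<|im_end|>
assistant: |
  <|im_start|>assistant
```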