# Configure arey
This article is a detailed reference for the `arey` configuration file.
## Location
Arey stores the configuration file in the following locations:

- `~/.config/arey/arey.yml` for Linux, macOS, or WSL2 on Windows.
- `C:\users\<username>\.arey\arey.yml` for Windows.
The configuration file is a valid YAML file. `arey` creates a default configuration file on the first run. Refer to the `arey` repository for the latest configuration file snapshot.
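For orientation, the sketch below assembles a minimal configuration from the sections documented in this article. The model name and values are illustrative, not shipped defaults.

```yaml
# Minimal arey.yml sketch; the sections follow this article,
# the model entry and values are illustrative.
models:
  tinydolphin:
    path: ~/models/tinydolphin-2.8-1.1b.Q4_K_M.gguf
    type: llama
    template: chatml

profiles:
  precise:
    temperature: 0.7
    top_k: 40
    top_p: 0.1

chat:
  model: tinydolphin
  profile: precise

task:
  model: tinydolphin
  profile: precise
```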
## Sections

### Models

The `models` section provides a list of local LLM models for the Llama.cpp or Ollama backends.
```yaml
# Example configuration for arey app
#
# Looked up from XDG_CONFIG_DIR (~/.config/arey/arey.yml) on Linux or
# ~/.arey on Windows.
models:
  tinydolphin:
    path: ~/models/tinydolphin-2.8-1.1b.Q4_K_M.gguf
    type: llama
    template: chatml
  ollama-tinydolphin:
    name: tinydolphin:latest # name of the model, see http://localhost:11434/api/tags
    type: ollama
    template: chatml
```
`models` is a map of model definitions. The key, e.g., `tinydolphin`, is user specified; it is used in other sections to reference the model settings (see the sketch after this list). Each entry supports the following keys:

- `name` (only for Ollama): the Ollama model name. It should be a valid name in the local library; match it with a value from http://localhost:11434/api/tags.
- `path` (only for Llama.cpp): the model file path. Supports expansion of the user directory marker `~`.
- `type`: must be `ollama` for Ollama models. Any other value is treated as a `llama` model.
- `template`: conversation template used by the model. The `arey chat` command uses it to convert user and assistant messages. See the Prompt templates section below for details.
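As a quick illustration of how the key is referenced elsewhere (the `chat` section is documented below):

```yaml
models:
  ollama-tinydolphin:         # user chosen key
    name: tinydolphin:latest
    type: ollama
    template: chatml

chat:
  model: ollama-tinydolphin   # refers to the key defined above
```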
### Profiles

The `profiles` section is a collection of settings used for generating LLM responses. You can define as many profiles as necessary. The key, e.g., `precise`, is used in other sections to refer to the settings.
```yaml
profiles:
  # See https://www.reddit.com/r/LocalLLaMA/comments/1343bgz/what_model_parameters_is_everyone_using/
  precise:
    # factual responses and straightforward assistant answers
    temperature: 0.7
    repeat_penalty: 1.176
    top_k: 40
    top_p: 0.1
  creative:
    # chatting, storywriting, and interesting assistant answers
    temperature: 0.72
    repeat_penalty: 1.1
    top_k: 0
    top_p: 0.73
  sphinx:
    # varied storywriting (on 30B/65B) and unconventional chatting
    temperature: 1.99
    repeat_penalty: 1.15
    top_k: 30
    top_p: 0.18
```
Each profile can specify completion settings specific to the model type. Below are a few common settings.
| Parameter | Value | Purpose |
| --- | --- | --- |
| `repeat_penalty` | 1-2 | Higher values discourage repetition of tokens |
| `stop` | `[]` | Comma-separated list of stop words |
| `temperature` | 0.0-1.0 | Lower temperature implies a more precise response |
| `top_k` | 0-30 | Number of tokens to consider for sampling |
| `top_p` | 0.0-1.0 | Lower values sample from the most likely tokens |
- Ollama models: see the list of all parameters in the Modelfile API documentation.
- Llama.cpp models: see the list of all parameters in the `create_completion` API documentation.
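Putting these together, a hypothetical profile that biases toward short, deterministic answers could look like this (the profile name and values are illustrative, not shipped defaults):

```yaml
profiles:
  terse:                      # hypothetical profile name
    temperature: 0.2          # low temperature for precise output
    top_k: 20
    top_p: 0.15
    repeat_penalty: 1.2       # discourage repeated tokens
    stop: ["User:", "\n\n"]   # stop generation at these words
```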
### Chat and Task settings

The `chat` and `task` settings specify the model and profile for the `chat` and `ask` commands respectively.
```yaml
chat:
  model: ollama-tinydolphin
  profile: precise

task:
  model: ollama-tinydolphin
  profile: precise
  settings:
    host: http://localhost:11434/
```
For either section, you can specify the model settings in a `settings` member. The following model settings are supported for each model type.
#### Ollama models

| Setting key | Default | Remark |
| --- | --- | --- |
| `host` | http://localhost:11434/ | Base URL for the Ollama server |
#### Llama.cpp models

| Setting key | Default | Remark |
| --- | --- | --- |
| `n_threads` | Half of CPU count | Number of threads to run |
| `n_ctx` | 4096 | Context window size |
| `n_batch` | 512 | Batch size |
| `n_gpu_layers` | 0 | Number of layers to offload to GPU |
| `use_mlock` | False | Lock the model in main memory |
| `verbose` | False | Show verbose logs |
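For instance, a hypothetical `chat` section that offloads a Llama.cpp model to the GPU could look like this (the values are illustrative; tune them for your hardware):

```yaml
chat:
  model: tinydolphin     # a llama type model from the models section
  profile: precise
  settings:
    n_ctx: 4096          # context window size
    n_gpu_layers: 32     # offload 32 layers to the GPU
    n_threads: 8
    use_mlock: true      # keep the model locked in main memory
```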
### Prompt templates

A prompt template allows you to specify tokens that are replaced at runtime. See https://github.com/codito/arey/blob/master/arey/data/prompts/chatml.yml for an example.
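To illustrate the idea (this is not arey's exact template schema; see the linked chatml.yml for the authoritative format), a ChatML style template substitutes placeholders such as the system prompt and user message at runtime:

```yaml
# Hypothetical template sketch in the ChatML style; the real schema
# is defined in arey/data/prompts/chatml.yml linked above.
prompt: |
  <|im_start|>system
  {system}<|im_end|>
  <|im_start|>user
  {user}<|im_end|>
  <|im_start|>assistant
stop_words:
  - <|im_end|>
```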