Customize Engine Settings
In this guide, we'll walk you through customizing your engine settings by configuring the `nitro.json` file.
- Navigate to the App Settings > Advanced > Open App Directory > `~/jan/engines` folder.
  - MacOS: `cd ~/jan/engines`
  - Windows: `C:/Users/<your_user_name>/jan/engines`
  - Linux: `cd ~/jan/engines`
- Modify the `nitro.json` file based on your needs. The default settings are shown below.
`~/jan/engines/nitro.json`

```json
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```
The table below describes the parameters in the `nitro.json` file; an illustrative tuned configuration follows the table.
| Parameter | Type | Description |
|---|---|---|
| `ctx_len` | Integer | Typically set to 2048, `ctx_len` provides ample context for model operations, similar to GPT-3.5. (Minimum: 1, Maximum: 4096) |
| `ngl` | Integer | Defaults to 100; `ngl` determines how many layers are offloaded to the GPU. |
| `cpu_threads` | Integer | Determines the number of CPU threads used for inference, limited by your hardware and OS. (Maximum determined by your system) |
| `cont_batching` | Boolean | Controls continuous batching, enhancing throughput for LLM inference. |
| `embedding` | Boolean | Enables embedding utilization for tasks like document-enhanced chat in RAG-based applications. |
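For example, on a machine with spare CPU cores you might raise `cpu_threads` and use the maximum context window. The values below are only a sketch and should be adjusted to your own model and hardware.

```json
{
  "ctx_len": 4096,
  "ngl": 100,
  "cpu_threads": 4,
  "cont_batching": false,
  "embedding": false
}
```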
tip

- By default, the value of `ngl` is set to 100, which offloads all layers to the GPU. If you wish to offload only about 50% of the layers, you can set `ngl` to 15, because most Mistral or Llama models have around 30 layers.
- To utilize the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inferences with embedding capabilities. Please refer to Embedding in the Nitro documentation for a more detailed explanation.
- To utilize the continuous batching feature for boosting throughput and minimizing latency in large language model (LLM) inference, include `"cont_batching": true`. For details, please refer to Continuous Batching in the Nitro documentation. A sample configuration combining these options is sketched below.
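As a rough sketch, this is what `nitro.json` might look like with roughly half the layers of a ~30-layer model offloaded to the GPU and both embedding and continuous batching enabled; the values are illustrative, not recommendations.

```json
{
  "ctx_len": 2048,
  "ngl": 15,
  "cpu_threads": 1,
  "cont_batching": true,
  "embedding": true
}
```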
Assistance and Support
If you have questions, please join our Discord community for support, updates, and discussions.