Add more configuration params #1019
JiwaniZakir left a comment
The rename from embedding_function to embeddingFunction in the chroma config block (visible in the docs diff) is a silent breaking change for any existing users who have already configured this field — there's no deprecation warning, fallback handling, or migration note. At minimum, the code that reads this config should check both keys and warn if the old snake_case form is found.
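A minimal sketch of the fallback suggested here, assuming the chroma block is available as a plain dict. The helper name `read_embedding_function` and the `default` parameter are hypothetical and not taken from the PR; only the two key names come from the diff.

```python
import warnings


def read_embedding_function(chroma_config: dict, default=None):
    """Hypothetical helper: prefer the new camelCase key, fall back to the old one."""
    if "embeddingFunction" in chroma_config:
        return chroma_config["embeddingFunction"]
    if "embedding_function" in chroma_config:
        # Old snake_case key found: still honor it, but tell the user to migrate.
        warnings.warn(
            "'embedding_function' is deprecated; use 'embeddingFunction' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return chroma_config["embedding_function"]
    return default
```

With something like this in place, existing configs keep working for a release or two while users see a clear migration hint.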
In engine.py, a new ThreadPoolExecutor is created on every call to query() but is never explicitly shut down. With maxWorkers now configurable up to 32, repeated queries could accumulate thread pool overhead. The executor should be used as a context manager (a plain with block, since ThreadPoolExecutor is not an async context manager), wrapped in a try/finally with .shutdown(wait=False), or reused as an instance variable.
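One way the context-manager option could look, as a rough sketch. The `Engine` class below is a hypothetical stand-in, not the actual code in engine.py; `sources` is assumed to be a list of callables that each take the query text.

```python
from concurrent.futures import ThreadPoolExecutor


class Engine:
    """Hypothetical stand-in for the engine class; names are illustrative."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def query(self, sources, text):
        # Leaving the 'with' block calls executor.shutdown(wait=True),
        # so worker threads are released after every query.
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = [executor.submit(source, text) for source in sources]
            # Collect results in submission order.
            return [future.result() for future in futures]
```

Reusing a single executor stored on the instance would avoid the per-query setup cost instead, at the price of needing an explicit shutdown when the engine is torn down.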
Finally, the config values read in _create_vector_embeddings (e.g., self.config["server"]["engine"]["minChunksToAnalyze"]["percentage"]) lack any runtime validation — a user setting percentage to 0.0 and minValue to 0 would result in minimum_chunks_to_analyze = 0, silently producing empty analysis. Given that ranges are documented (e.g., 0.01-1.0), there should be a validation step — either at config load time or with a clear error before use.
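A sketch of what such a check might look like, assuming the `minChunksToAnalyze` block is passed in as a dict with `percentage` and `minValue` keys. The function name is hypothetical, the 0.01-1.0 range is the documented one mentioned above, and the lower bound of 1 for `minValue` is an assumption for illustration.

```python
def validate_min_chunks(settings: dict) -> None:
    """Hypothetical validator: raise a clear error instead of analyzing 0 chunks."""
    percentage = settings["percentage"]
    min_value = settings["minValue"]
    # Documented range for percentage is 0.01-1.0.
    if not 0.01 <= percentage <= 1.0:
        raise ValueError(
            f"minChunksToAnalyze.percentage must be between 0.01 and 1.0, got {percentage}"
        )
    # Assumed lower bound: at least one chunk must always be analyzed.
    if not isinstance(min_value, int) or min_value < 1:
        raise ValueError(
            f"minChunksToAnalyze.minValue must be an integer >= 1, got {min_value}"
        )
```

Calling this once at config load time would surface the problem before any query runs, rather than in the middle of `_create_vector_embeddings`.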
This pull request introduces a comprehensive set of configuration options for performance tuning and search quality in SeaGOAT. It adds new server-side configuration sections for vector search, text search, engine processing, and query defaults, allowing users to fine-tune behavior for different repository sizes and hardware constraints. The codebase is updated to use these new config values throughout, replacing previous hardcoded limits and improving flexibility.
Configuration System Enhancements
Added new configuration sections (chroma, ripgrep, engine, query) to docs/configuration.md and seagoat/utils/config.py, with schema validation and sensible defaults for vector search, text search, engine processing, and query parameters.
Engine and Query Processing
[...] seagoat/server.py use configurable defaults for result limits and context lines, improving usability and consistency.
Vector Search Improvements
Ripgrep Text Search Improvements
These changes make SeaGOAT much more configurable, allowing users to optimize performance and search quality for their specific use case.