diff --git a/README.md b/README.md
index b1e84ea..06235e0 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,117 @@
 # LLM QA
+
+A proof-of-concept question-answering system for different types of text data.
+
+Currently implemented:
+
+- Plain text
+- Markdown
+
+## Key Features
+
+### Dockerized development environment
+
+- Easy, quick, and reproducible setup
+
+### Automatic pulling and serving of declared models
+
+- Ollama models are automatically pulled and served by the FastAPI server
+
+### Detailed logging
+
+- Key potential bottlenecks are timed and logged
+
+#### Upsert
+
+```console
+2024-02-15 01:10:54,341 - llm_qa.services.upsert - INFO - Split `MARKDOWN` type text into 8 document chunks in 0.01 seconds
+2024-02-15 01:10:54,759 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
+2024-02-15 01:11:03,121 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
+2024-02-15 01:11:03,140 - llm_qa.services.upsert - INFO - Upserted 8 document chunks to Qdrant collection `showcase` in 8.80 seconds
+2024-02-15 01:11:03,142 - uvicorn.access - INFO - 127.0.0.1:55868 - "POST /api/v1/upsert-text HTTP/1.1" 200 OK
+```
+
+#### Chat
+
+```console
+2024-02-15 01:02:03,408 - llm_qa.dependencies - INFO - Ollama auto-pull enabled, checking if model is available
+2024-02-15 01:02:03,441 - httpx - INFO - HTTP Request: POST http://ollama:11434/api/show "HTTP/1.1 200 OK"
+2024-02-15 01:02:03,441 - llm_qa.dependencies - INFO - Ollama model `openchat:7b-v3.5-0106-q4_K_M` already exists
+2024-02-15 01:02:03,645 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
+2024-02-15 01:02:03,653 - llm_qa.chains.time_logger - INFO - Chain `VectorStoreRetriever` finished in 0.08 seconds
+2024-02-15 01:02:23,192 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference-rerank/rerank "HTTP/1.1 200 OK"
+2024-02-15 01:02:23,194 - llm_qa.chains.time_logger - INFO - Chain `RerankAndTake` finished in 19.54 seconds
+2024-02-15 01:02:29,817 - llm_qa.chains.time_logger - INFO - Chain `ChatOllama` finished in 6.62 seconds
+2024-02-15 01:02:29,817 - llm_qa.services.chat - INFO - Chat chain finished in 26.27 seconds
+2024-02-15 01:02:29,823 - uvicorn.access - INFO - 127.0.0.1:50100 - "POST /api/v1/chat HTTP/1.1" 200 OK
+```
+
+### Hierarchical document chunking
+
+- Hierarchical text, such as Markdown, is split into document chunks by headers
+- All parent headers are also included in the chunk, separated by `...` (see the sketch after the example below)
+- This enriches the chunk's context and prevents the global context from being lost when the text is split
+
+Example:
+
+```md
+# AWS::SageMaker::ModelQualityJobDefinition MonitoringGroundTruthS3Input
+...
+## Syntax
+...
+### YAML
+``` [S3Uri](#cfn-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-s3uri): String ```
+```
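+
+Below is a minimal, illustrative sketch of how such chunks could be built with LangChain's `MarkdownHeaderTextSplitter`; the header names, variable names, and sample text are assumptions, not the project's actual implementation:
+
+```python
+from langchain.text_splitter import MarkdownHeaderTextSplitter
+
+# Sample input; in practice this is the Markdown text sent to the upsert endpoint.
+markdown_text = "# Guide\nIntro text.\n## Setup\nInstall the tools.\n### Linux\nUse apt."
+
+# Split on the first three header levels and keep the header values as metadata.
+header_levels = [("#", "h1"), ("##", "h2"), ("###", "h3")]
+splitter = MarkdownHeaderTextSplitter(headers_to_split_on=header_levels)
+documents = splitter.split_text(markdown_text)
+
+# Re-attach every parent header to its chunk, separated by `...` lines,
+# so the global context survives the split.
+chunks = []
+for document in documents:
+    headers = [
+        f"{marker} {document.metadata[key]}"
+        for marker, key in header_levels
+        if key in document.metadata
+    ]
+    chunks.append("\n...\n".join(headers) + "\n" + document.page_content)
+
+print(chunks[-1])  # -> "# Guide\n...\n## Setup\n...\n### Linux\nUse apt."
+```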
+
+### Retrieval query rewriting
+
+- After the first message, subsequent messages are rewritten to include the context of previous messages
+- This allows for a more natural conversation flow and retrieval of more relevant chunks
+
+Example:
+
+```md
+### User: What are all AWS regions where SageMaker is available?
+### AI: SageMaker is available in most AWS regions, except for the following: Asia Pacific (Jakarta), Africa (Cape Town), Middle East (UAE), Asia Pacific (Hyderabad), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Milan), AWS GovCloud (US-East), Europe (Spain), and Europe (Zurich) Region.
+
+### User: What about the Bedrock service?
+### Retrieval Query: What is the availability of AWS SageMaker in relation to the Bedrock service?
+```
+
+### Reranking
+
+- A larger number of document chunks is first retrieved from the vector store
+- The chunks are then reranked using a reranker model (see the sketch below)
+- This two-step process yields a more precise selection of the chunks most relevant to the user query
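+
+Below is a minimal sketch of the rerank step as a plain HTTP call to the `/rerank` route of `text-embeddings-inference` seen in the logs above; the URL, the `top_k` value, and the assumed response shape are illustrative, not the project's actual `RerankAndTake` chain:
+
+```python
+import httpx
+
+# Reranker service URL as it appears in the logs; adjust to your deployment.
+RERANK_URL = "http://text-embeddings-inference-rerank/rerank"
+
+
+def rerank_and_take(query: str, texts: list[str], top_k: int = 4) -> list[str]:
+    """Rerank retrieved chunks against the query and keep the top_k best ones."""
+    response = httpx.post(RERANK_URL, json={"query": query, "texts": texts}, timeout=60.0)
+    response.raise_for_status()
+    # Assumed response shape: [{"index": 0, "score": 0.97}, ...]
+    ranked = sorted(response.json(), key=lambda item: item["score"], reverse=True)
+    return [texts[item["index"]] for item in ranked[:top_k]]
+```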
HTTP/1.1" 200 OK -``` - -### Hierarchical document chunking - -- Hierarchical text, such as markdown, is split into document chunks by headers -- All previous parent headers are also included in the chunk, separated by `...` -- This enriches the context of the chunk and solves the problem of global context being lost when splitting the text - -Example: - -```md -# AWS::SageMaker::ModelQualityJobDefinition MonitoringGroundTruthS3Input -... -## Syntax -... -### YAML -``` [S3Uri](#cfn-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-s3uri): String ``` -``` - -### Retrieval query rewriting - -- After the first message, subsequent messages are rewritten to include previous messages context -- This allows for a more natural conversation flow and retrieval of more relevant chunks - -Example: - -```md -### User: What are all AWS regions where SageMaker is available? -### AI: SageMaker is available in most AWS regions, except for the following: Asia Pacific (Jakarta), Africa (Cape Town), Middle East (UAE), Asia Pacific (Hyderabad), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Milan), AWS GovCloud (US-East), Europe (Spain), and Europe (Zurich) Region. - -### User: What about the Bedrock service? -### Retrieval Query: What is the availability of AWS SageMaker in relation to the Bedrock service? -``` - -### Reranking - -- Retrieval of a larger number of document chunks is first performed using a vector store -- Then, the chunks are reranked using a reranker model -- This process more precisely selects the most relevant chunks for the user query - -## Development - -### Setup - -First copy the `.devcontainer/.env.example` file to `.devcontainer/.env` and adjust the settings and models to your needs. - -Then simply open the project devcontainer in a compatible IDE. -This will setup all required tools and project dependencies for Python development. -It will also run Docker containers for all required services. - -### Configuration - -Create a `llm-qa/.env` file to override selective default environment variables located in `llm-qa/.env.default`. - -### Running - -To run the FastAPI server, run the `llm_qa.web` submodule: - -```bash -poetry run python -m llm_qa.web -``` - -To run the minimal CLI client, run the `llm_qa.client` submodule: - -```bash -poetry run python -m llm_qa.client -``` - -## Deployment - -Not yet implemented.