Move README content

parent c68b66e4f6
commit b31af2f9a8

README.md | 116

@@ -1 +1,117 @@
# LLM QA

A proof-of-concept question-answering system for different types of text data.

Currently implemented:

- Plain text
- Markdown

## Key Features

### Dockerized development environment

- Easy, quick, and reproducible setup

### Automatic pull and serve of declared models

- Ollama models are automatically pulled and served by the FastAPI server
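
The auto-pull check works along these lines (a minimal sketch against the public Ollama REST API, not the repo's actual code; the service URL and model name are taken from the logs below):

```python
import httpx

OLLAMA_URL = "http://ollama:11434"  # service hostname as seen in the logs


def ensure_model(model: str) -> None:
    """Pull an Ollama model only if it is not already available."""
    # /api/show returns 404 when the model has not been pulled yet
    response = httpx.post(f"{OLLAMA_URL}/api/show", json={"name": model})
    if response.status_code == 200:
        return  # model already exists, nothing to do
    # /api/pull downloads the model; stream=False waits for completion
    httpx.post(
        f"{OLLAMA_URL}/api/pull",
        json={"name": model, "stream": False},
        timeout=None,  # pulling a multi-GB model can take a while
    ).raise_for_status()


ensure_model("openchat:7b-v3.5-0106-q4_K_M")
```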

### Detailed logging

- Key potential bottlenecks are timed and logged

#### Upsert

```console
2024-02-15 01:10:54,341 - llm_qa.services.upsert - INFO - Split `MARKDOWN` type text into 8 document chunks in 0.01 seconds
2024-02-15 01:10:54,759 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:11:03,121 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:11:03,140 - llm_qa.services.upsert - INFO - Upserted 8 document chunks to Qdrant collection `showcase` in 8.80 seconds
2024-02-15 01:11:03,142 - uvicorn.access - INFO - 127.0.0.1:55868 - "POST /api/v1/upsert-text HTTP/1.1" 200 OK
```
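
The flow behind these logs is roughly: split the text into chunks, embed the chunks, and upsert the vectors into Qdrant. A hedged sketch using `qdrant-client` and `httpx` (the collection name and embedding endpoint come from the logs; the point ids and payload shape are illustrative, not the repo's actual code):

```python
import httpx
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://qdrant:6333")  # assumed Qdrant service URL


def upsert_chunks(chunks: list[str], collection: str = "showcase") -> None:
    # Embed all chunks in one request to the text-embeddings-inference service
    embeddings = httpx.post(
        "http://text-embeddings-inference/embed",
        json={"inputs": chunks},
        timeout=None,
    ).json()  # one embedding vector per input chunk
    # One point per chunk; sequential ids are illustrative only --
    # real code would use stable ids (e.g. UUIDs) to avoid collisions
    points = [
        PointStruct(id=i, vector=vector, payload={"text": chunk})
        for i, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]
    client.upsert(collection_name=collection, points=points)
```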

#### Chat

```console
2024-02-15 01:02:03,408 - llm_qa.dependencies - INFO - Ollama auto-pull enabled, checking if model is available
2024-02-15 01:02:03,441 - httpx - INFO - HTTP Request: POST http://ollama:11434/api/show "HTTP/1.1 200 OK"
2024-02-15 01:02:03,441 - llm_qa.dependencies - INFO - Ollama model `openchat:7b-v3.5-0106-q4_K_M` already exists
2024-02-15 01:02:03,645 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:02:03,653 - llm_qa.chains.time_logger - INFO - Chain `VectorStoreRetriever` finished in 0.08 seconds
2024-02-15 01:02:23,192 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference-rerank/rerank "HTTP/1.1 200 OK"
2024-02-15 01:02:23,194 - llm_qa.chains.time_logger - INFO - Chain `RerankAndTake` finished in 19.54 seconds
2024-02-15 01:02:29,817 - llm_qa.chains.time_logger - INFO - Chain `ChatOllama` finished in 6.62 seconds
2024-02-15 01:02:29,817 - llm_qa.services.chat - INFO - Chat chain finished in 26.27 seconds
2024-02-15 01:02:29,823 - uvicorn.access - INFO - 127.0.0.1:50100 - "POST /api/v1/chat HTTP/1.1" 200 OK
```
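
The `time_logger` entries above can be produced by a small wrapper that times a chain and logs its duration. A minimal sketch, not the actual `llm_qa.chains.time_logger` implementation:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("llm_qa.chains.time_logger")


@contextmanager
def log_chain_time(name: str):
    """Log how long the wrapped block took, mirroring the log format above."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("Chain `%s` finished in %.2f seconds", name, elapsed)


# Usage (illustrative):
with log_chain_time("VectorStoreRetriever"):
    time.sleep(0.08)  # stand-in for the actual retrieval call
```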

### Hierarchical document chunking

- Hierarchical text, such as Markdown, is split into document chunks by headers
- All previous parent headers are also included in the chunk, separated by `...`
- This enriches the context of each chunk and solves the problem of global context being lost when splitting the text (a minimal sketch follows the example below)

Example:

```md
# AWS::SageMaker::ModelQualityJobDefinition MonitoringGroundTruthS3Input<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input"></a>
...
## Syntax<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-syntax"></a>
...
### YAML<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-syntax.yaml"></a>
``` [S3Uri](#cfn-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-s3uri): String ```
```
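
A minimal sketch of this chunking scheme (not the repo's actual splitter): walk the Markdown line by line, track the most recent header at each level, and prepend the parent headers, separated by `...`, to every new chunk.

```python
def split_markdown(text: str) -> list[str]:
    """Split Markdown into chunks by header, prefixing each chunk
    with all parent headers separated by `...`."""
    parents: dict[int, str] = {}  # header level -> most recent header line
    chunks: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            chunks.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Forget headers from deeper levels of a previous branch
            parents = {lvl: hdr for lvl, hdr in parents.items() if lvl < level}
            # Start the new chunk with the parent chain, separated by `...`
            for _, hdr in sorted(parents.items()):
                current.extend([hdr, "..."])
            current.append(line)
            parents[level] = line
        else:
            current.append(line)
    flush()
    return chunks
```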

### Retrieval query rewriting

- After the first message, subsequent messages are rewritten to include context from previous messages (a minimal sketch follows the example below)
- This allows for a more natural conversation flow and retrieval of more relevant chunks

Example:

```md
### User: What are all AWS regions where SageMaker is available?
### AI: SageMaker is available in most AWS regions, except for the following: Asia Pacific (Jakarta), Africa (Cape Town), Middle East (UAE), Asia Pacific (Hyderabad), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Milan), AWS GovCloud (US-East), Europe (Spain), and Europe (Zurich) Region.

### User: What about the Bedrock service?
### Retrieval Query: What is the availability of AWS SageMaker in relation to the Bedrock service?
```
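
One way to implement this step (a hedged sketch; the prompt wording and model call are illustrative, not the repo's actual chain) is to ask the LLM to turn the last user message into a standalone question:

```python
import httpx

REWRITE_PROMPT = """Given the conversation below, rewrite the last user \
message as a standalone question that includes all relevant context.

{history}

User: {question}

Standalone question:"""


def rewrite_query(history: str, question: str) -> str:
    # Call the local Ollama generate endpoint (non-streaming)
    response = httpx.post(
        "http://ollama:11434/api/generate",
        json={
            "model": "openchat:7b-v3.5-0106-q4_K_M",
            "prompt": REWRITE_PROMPT.format(history=history, question=question),
            "stream": False,
        },
        timeout=None,
    )
    return response.json()["response"].strip()
```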

### Reranking

- Retrieval of a larger number of document chunks is first performed using a vector store
- Then, the chunks are reranked using a reranker model
- This process more precisely selects the most relevant chunks for the user query
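
The `/rerank` call in the chat logs corresponds to the text-embeddings-inference rerank endpoint, which scores each candidate text against the query. A minimal sketch of the retrieve-then-rerank step (the service URL comes from the logs; the `top_k` cutoff is illustrative):

```python
import httpx


def rerank_and_take(query: str, texts: list[str], top_k: int = 4) -> list[str]:
    """Score candidate chunks against the query and keep the best `top_k`."""
    scores = httpx.post(
        "http://text-embeddings-inference-rerank/rerank",
        json={"query": query, "texts": texts},
        timeout=None,
    ).json()  # list of {"index": ..., "score": ...}, one entry per text
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [texts[item["index"]] for item in ranked[:top_k]]
```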

## Development

### Setup

First, copy the `.devcontainer/.env.example` file to `.devcontainer/.env` and adjust the settings and models to your needs.

Then simply open the project devcontainer in a compatible IDE.
This will set up all required tools and project dependencies for Python development.
It will also run Docker containers for all required services.

### Configuration

Create a `llm-qa/.env` file to selectively override the default environment variables defined in `llm-qa/.env.default`.
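
For example, a `llm-qa/.env` might look like this (the variable names here are hypothetical; check `llm-qa/.env.default` for the actual ones):

```bash
# Hypothetical overrides -- the real variable names live in llm-qa/.env.default
OLLAMA_MODEL=openchat:7b-v3.5-0106-q4_K_M
QDRANT_COLLECTION=showcase
LOG_LEVEL=DEBUG
```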

### Running

To run the FastAPI server, run the `llm_qa.web` submodule:

```bash
poetry run python -m llm_qa.web
```

To run the minimal CLI client, run the `llm_qa.client` submodule:

```bash
poetry run python -m llm_qa.client
```

## Deployment

Not yet implemented.

llm-qa/README.md | 116

@@ -1,117 +1 @@