Move README content

parent c68b66e4f6
commit b31af2f9a8

README.md | 116

@@ -1 +1,117 @@
# LLM QA

A proof-of-concept question-answering system for different types of text data.

Currently implemented:

- Plain text
- Markdown

## Key Features

### Dockerized development environment

- Easy, quick, and reproducible setup

### Automatic pull and serve of declared models

- Ollama models are automatically pulled and served by the FastAPI server
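
The auto-pull check works along these lines (a minimal sketch against the public Ollama REST API, not the repo's actual code; the service URL and model name are taken from the logs below):

```python
import httpx

OLLAMA_URL = "http://ollama:11434"  # service hostname as seen in the logs


def ensure_model(model: str) -> None:
    """Pull an Ollama model only if it is not already available."""
    # /api/show returns 404 when the model has not been pulled yet
    response = httpx.post(f"{OLLAMA_URL}/api/show", json={"name": model})
    if response.status_code == 200:
        return  # model already exists, nothing to do
    # /api/pull downloads the model; stream=False waits for completion
    httpx.post(
        f"{OLLAMA_URL}/api/pull",
        json={"name": model, "stream": False},
        timeout=None,  # pulling a multi-GB model can take a while
    ).raise_for_status()


ensure_model("openchat:7b-v3.5-0106-q4_K_M")
```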

### Detailed logging

- Key potential bottlenecks are timed and logged

#### Upsert

```console
2024-02-15 01:10:54,341 - llm_qa.services.upsert - INFO - Split `MARKDOWN` type text into 8 document chunks in 0.01 seconds
2024-02-15 01:10:54,759 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:11:03,121 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:11:03,140 - llm_qa.services.upsert - INFO - Upserted 8 document chunks to Qdrant collection `showcase` in 8.80 seconds
2024-02-15 01:11:03,142 - uvicorn.access - INFO - 127.0.0.1:55868 - "POST /api/v1/upsert-text HTTP/1.1" 200 OK
```
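
The flow behind these logs is roughly: split the text into chunks, embed the chunks, and upsert the vectors into Qdrant. A hedged sketch using `qdrant-client` and `httpx` (the collection name and embedding endpoint come from the logs; the point ids and payload shape are illustrative, not the repo's actual code):

```python
import httpx
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://qdrant:6333")  # assumed Qdrant service URL


def upsert_chunks(chunks: list[str], collection: str = "showcase") -> None:
    # Embed all chunks in one request to the text-embeddings-inference service
    embeddings = httpx.post(
        "http://text-embeddings-inference/embed",
        json={"inputs": chunks},
        timeout=None,
    ).json()  # one embedding vector per input chunk
    # One point per chunk; sequential ids are illustrative only --
    # real code would use stable ids (e.g. UUIDs) to avoid collisions
    points = [
        PointStruct(id=i, vector=vector, payload={"text": chunk})
        for i, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]
    client.upsert(collection_name=collection, points=points)
```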

#### Chat

```console
2024-02-15 01:02:03,408 - llm_qa.dependencies - INFO - Ollama auto-pull enabled, checking if model is available
2024-02-15 01:02:03,441 - httpx - INFO - HTTP Request: POST http://ollama:11434/api/show "HTTP/1.1 200 OK"
2024-02-15 01:02:03,441 - llm_qa.dependencies - INFO - Ollama model `openchat:7b-v3.5-0106-q4_K_M` already exists
2024-02-15 01:02:03,645 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference/embed "HTTP/1.1 200 OK"
2024-02-15 01:02:03,653 - llm_qa.chains.time_logger - INFO - Chain `VectorStoreRetriever` finished in 0.08 seconds
2024-02-15 01:02:23,192 - httpx - INFO - HTTP Request: POST http://text-embeddings-inference-rerank/rerank "HTTP/1.1 200 OK"
2024-02-15 01:02:23,194 - llm_qa.chains.time_logger - INFO - Chain `RerankAndTake` finished in 19.54 seconds
2024-02-15 01:02:29,817 - llm_qa.chains.time_logger - INFO - Chain `ChatOllama` finished in 6.62 seconds
2024-02-15 01:02:29,817 - llm_qa.services.chat - INFO - Chat chain finished in 26.27 seconds
2024-02-15 01:02:29,823 - uvicorn.access - INFO - 127.0.0.1:50100 - "POST /api/v1/chat HTTP/1.1" 200 OK
```
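
The `time_logger` entries above can be produced by a small wrapper that times a chain and logs its duration. A minimal sketch, not the actual `llm_qa.chains.time_logger` implementation:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("llm_qa.chains.time_logger")


@contextmanager
def log_chain_time(name: str):
    """Log how long the wrapped block took, mirroring the log format above."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("Chain `%s` finished in %.2f seconds", name, elapsed)


# Usage (illustrative):
with log_chain_time("VectorStoreRetriever"):
    time.sleep(0.08)  # stand-in for the actual retrieval call
```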

### Hierarchical document chunking

- Hierarchical text, such as Markdown, is split into document chunks by headers
- All previous parent headers are also included in the chunk, separated by `...`
- This enriches the context of each chunk and solves the problem of global context being lost when splitting the text (a minimal sketch follows the example below)

Example:

```md
# AWS::SageMaker::ModelQualityJobDefinition MonitoringGroundTruthS3Input<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input"></a>
...
## Syntax<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-syntax"></a>
...
### YAML<a name="aws-properties-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-syntax.yaml"></a>
``` [S3Uri](#cfn-sagemaker-modelqualityjobdefinition-monitoringgroundtruths3input-s3uri): String ```
```
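
A minimal sketch of this chunking scheme (not the repo's actual splitter): walk the Markdown line by line, track the most recent header at each level, and prepend the parent headers, separated by `...`, to every new chunk.

```python
def split_markdown(text: str) -> list[str]:
    """Split Markdown into chunks by header, prefixing each chunk
    with all parent headers separated by `...`."""
    parents: dict[int, str] = {}  # header level -> most recent header line
    chunks: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            chunks.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Forget headers from deeper levels of a previous branch
            parents = {lvl: hdr for lvl, hdr in parents.items() if lvl < level}
            # Start the new chunk with the parent chain, separated by `...`
            for _, hdr in sorted(parents.items()):
                current.extend([hdr, "..."])
            current.append(line)
            parents[level] = line
        else:
            current.append(line)
    flush()
    return chunks
```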

### Retrieval query rewriting

- After the first message, subsequent messages are rewritten to include context from previous messages (a minimal sketch follows the example below)
- This allows for a more natural conversation flow and retrieval of more relevant chunks

Example:

```md
### User: What are all AWS regions where SageMaker is available?
### AI: SageMaker is available in most AWS regions, except for the following: Asia Pacific (Jakarta), Africa (Cape Town), Middle East (UAE), Asia Pacific (Hyderabad), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Milan), AWS GovCloud (US-East), Europe (Spain), and Europe (Zurich) Region.

### User: What about the Bedrock service?
### Retrieval Query: What is the availability of AWS SageMaker in relation to the Bedrock service?
```
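
One way to implement this step (a hedged sketch; the prompt wording and model call are illustrative, not the repo's actual chain) is to ask the LLM to turn the last user message into a standalone question:

```python
import httpx

REWRITE_PROMPT = """Given the conversation below, rewrite the last user \
message as a standalone question that includes all relevant context.

{history}

User: {question}

Standalone question:"""


def rewrite_query(history: str, question: str) -> str:
    # Call the local Ollama generate endpoint (non-streaming)
    response = httpx.post(
        "http://ollama:11434/api/generate",
        json={
            "model": "openchat:7b-v3.5-0106-q4_K_M",
            "prompt": REWRITE_PROMPT.format(history=history, question=question),
            "stream": False,
        },
        timeout=None,
    )
    return response.json()["response"].strip()
```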

### Reranking

- Retrieval of a larger number of document chunks is first performed using a vector store
- Then, the chunks are reranked using a reranker model
- This process more precisely selects the most relevant chunks for the user query
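
The `/rerank` call in the chat logs corresponds to the text-embeddings-inference rerank endpoint, which scores each candidate text against the query. A minimal sketch of the retrieve-then-rerank step (the service URL comes from the logs; the `top_k` cutoff is illustrative):

```python
import httpx


def rerank_and_take(query: str, texts: list[str], top_k: int = 4) -> list[str]:
    """Score candidate chunks against the query and keep the best `top_k`."""
    scores = httpx.post(
        "http://text-embeddings-inference-rerank/rerank",
        json={"query": query, "texts": texts},
        timeout=None,
    ).json()  # list of {"index": ..., "score": ...}, one entry per text
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [texts[item["index"]] for item in ranked[:top_k]]
```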

## Development

### Setup

First, copy the `.devcontainer/.env.example` file to `.devcontainer/.env` and adjust the settings and models to your needs.

Then simply open the project devcontainer in a compatible IDE.
This will set up all required tools and project dependencies for Python development.
It will also run Docker containers for all required services.

### Configuration

Create a `llm-qa/.env` file to selectively override the default environment variables defined in `llm-qa/.env.default`.
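
For example, a `llm-qa/.env` might look like this (the variable names here are hypothetical; check `llm-qa/.env.default` for the actual ones):

```bash
# Hypothetical overrides -- the real variable names live in llm-qa/.env.default
OLLAMA_MODEL=openchat:7b-v3.5-0106-q4_K_M
QDRANT_COLLECTION=showcase
LOG_LEVEL=DEBUG
```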

### Running

To run the FastAPI server, run the `llm_qa.web` submodule:

```bash
poetry run python -m llm_qa.web
```

To run the minimal CLI client, run the `llm_qa.client` submodule:

```bash
poetry run python -m llm_qa.client
```

## Deployment

Not yet implemented.

llm-qa/README.md | 116

@@ -1,117 +1 @@