Once the approach was settled, the original PDF files were processed and fully converted into Word format; tables, captions, and other elements that existed only as images were converted into text accordingly. We then moved on to the code implementation. There are three main implementation approaches:

1. The sample code in the Playground uses an adapter approach that allows Cognitive Search to be specified as a datasource in the completion call (see the first sketch after this list). This approach still needs refinement: the information returned is not assembled into naturally worded content, and when that information is combined with the Prompt and sent to completion again for normal processing, the error "The extensions chat completions operation must have at least one extension" occurs. The root cause is that the adapter rewrites the target URL of the completion request; little code can be found online on how to adjust this further, so it needs separate investigation;

2. Query through Cognitive Search's SearchClient class (Hybrid mode, vector + keyword, is generally recommended), then send the query results together with the Prompt to completion for processing;

3. Handle everything through langchain. langchain supports many vector stores, so Cognitive Search is not the only option; this can be tried separately (see the second sketch after this list).
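For reference, here is a minimal sketch of the first approach, assuming the 2023-08-01-preview extensions/chat/completions endpoint and the same redacted resource names (***) used later in this article; <AZURE_OPENAI_KEY> and <SEARCH_KEY> are placeholders. The request has to stay on the extensions path: the "must have at least one extension" error quoted above is what comes back when a request reaches that path without any dataSources entry.

import requests

endpoint = "https://***.openai.azure.com"
deployment = "gpt4model"
api_version = "2023-08-01-preview"

# The "On Your Data" extensions endpoint accepts Cognitive Search as a dataSource
url = f"{endpoint}/openai/deployments/{deployment}/extensions/chat/completions?api-version={api_version}"
headers = {"api-key": "<AZURE_OPENAI_KEY>", "Content-Type": "application/json"}
body = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "https://***.search.windows.net",
                "key": "<SEARCH_KEY>",
                "indexName": "***index01",
            },
        }
    ],
    "messages": [{"role": "user", "content": "静脉留置针有什么特点?"}],
}

response = requests.post(url, headers=headers, json=body)
print(response.json())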
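And a minimal sketch of the third approach, assuming a 2023-era langchain release that ships the AzureSearch vector store; class names and constructor parameters vary between langchain versions, so treat this as illustrative only.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

# Embeddings backed by the same Azure OpenAI embedding deployment used later in this article
embeddings = OpenAIEmbeddings(
    deployment="embeddingmodel",
    openai_api_type="azure",
    openai_api_base="https://***.openai.azure.com/",
    openai_api_key="<OPENAI_API_KEY>",
    chunk_size=1,
)

# Azure Cognitive Search as the vector store; any other store supported by langchain works the same way
vector_store = AzureSearch(
    azure_search_endpoint="https://***.search.windows.net",
    azure_search_key="<SEARCH_KEY>",
    index_name="***index01",
    embedding_function=embeddings.embed_query,
)

# Hybrid (keyword + vector) retrieval, mirroring the recommendation in the second approach
docs = vector_store.similarity_search("静脉留置针有什么特点?", k=3, search_type="hybrid")
for doc in docs:
    print(doc.page_content)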

This article mainly provides code for the second approach:

import os  
import json  
import openai  
import streamlit as st
import requests
from dotenv import load_dotenv  
from tenacity import retry, wait_random_exponential, stop_after_attempt  
from azure.core.credentials import AzureKeyCredential  
from azure.search.documents import SearchClient  
from azure.search.documents.indexes import SearchIndexClient  
from azure.search.documents.models import Vector  
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    HnswVectorSearchAlgorithmConfiguration,
)
# References: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb
# Initialize openai; fetching the key via Streamlit secrets here can be replaced with any other method
openai.api_key = st.secrets["OPENAI_API_KEY"]
openai.api_type = "azure"
openai.api_version = "2023-08-01-preview"
openai.api_base = "https://***.openai.azure.com/"
deployment_id = "gpt4model"

search_endpoint = "https://***.search.windows.net"
search_key = st.secrets["SEARCH_KEY"]
search_index_name = "***index01"

credential = AzureKeyCredential(search_key)
search_client = SearchClient(endpoint=search_endpoint, index_name=search_index_name, credential=credential)

def generate_embeddings(text):
    # Call the Azure OpenAI embedding deployment to turn the input text into a vector
    response = openai.Embedding.create(
        input=text, engine="embeddingmodel")
    embeddings = response['data'][0]['embedding']
    return embeddings

# Step 1: Query from Azure Cognitive Search
# Create query vector

prompt = "静脉留置针有什么特点?"
vector = Vector(value=generate_embeddings(prompt), k=3, fields="contentVector")

# Hybrid query: keyword search over search_text combined with the vector query
results = search_client.search(
    search_text=prompt,
    top=3,
    vectors=[vector],
    select=["title", "content"],
)
# Concatenate the retrieved documents into a plain-text context block
rawdata = ''

for result in results:  
    rawdata += f"Title: {result['title']}\n"
    rawdata += f"Score: {result['@search.score']}\n"
    rawdata += f"Content: {result['content']}\n"  

# Step 2: Query from OpenAI
# Append the retrieved context to the prompt, delimited by ### markers
prompt += '###\n' + rawdata + '\n###\n'
completion = openai.ChatCompletion.create(
    engine="gpt4model",
    messages=[{"role": "user", "content": prompt}],
)
rawdata = json.dumps(completion, ensure_ascii=False)
print(rawdata)
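If only the generated answer is needed rather than the full JSON payload printed above, the text can be read from the first choice of the ChatCompletion response (standard field layout of the openai 0.x SDK):

answer = completion['choices'][0]['message']['content']
print(answer)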

How can this approach be further optimized? To be continued in the next article.
