# Hands-On: Building a Web UI for the DeepSeek-OCR Model with FastAPI, with an OpenAI-Compatible API

张开发
2026/4/17 13:34:14 · 15 min read


Building an OpenAI-compatible DeepSeek-OCR web service from scratch: when you need to turn a local AI model into an interactive web service quickly, FastAPI is one of the best tools for the job. This article walks through a complete OCR service that supports image upload and text recognition, and exposes an API fully compatible with OpenAI's, so existing OpenAI-ecosystem tools can plug in with no changes.

## 1. Environment Setup and Project Initialization

Before writing any code, set up the development environment. Python 3.12 is recommended; create an isolated environment with conda or venv:

```shell
conda create -n deepseekocr python=3.12.9
conda activate deepseekocr
pip install torch==2.6.0 transformers==4.46.3 fastapi "uvicorn[standard]" python-multipart Pillow
```

A suggested project layout:

```
project/
├─ app.py          # main backend service
├─ static/
│  └─ ui.html      # single-page front end
└─ README.md       # project notes
```

## 2. Core Design and Implementation

### 2.1 FastAPI Backend Architecture

The backend needs a few key endpoints:

- `/v1/chat/completions` — OpenAI-compatible API
- `/parserToText` — a simplified image-to-text endpoint
- `/ui` — shortcut to the front-end page

First, create the FastAPI application instance and configure CORS:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="DeepSeek-OCR Service")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```

### 2.2 Model Loading and Inference

Loading the DeepSeek-OCR model requires some care around hardware:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Adapt to the available hardware
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    model = model.eval().to(device)
    try:
        model = model.to(torch.bfloat16)   # prefer bf16 where supported
    except Exception:
        model = model.to(torch.float16)    # fall back to fp16
else:
    device = torch.device("cpu")
    model = model.eval().to(device)
```

### 2.3 Image Handling

Three input formats are supported: a Base64-encoded data URI, a local file path, and a remote HTTP(S) URL.

```python
import base64
import requests

def handle_image_input(image_url: str) -> str:
    if image_url.startswith("data:"):
        # Base64 data URI
        header, b64 = image_url.split(",", 1)
        raw = base64.b64decode(b64)
        return save_to_temp(raw)
    elif image_url.startswith(("http://", "https://")):
        # Download a remote image
        resp = requests.get(image_url, timeout=30)
        return save_to_temp(resp.content)
    else:
        # Local file path
        with open(image_url, "rb") as f:
            return save_to_temp(f.read())
```
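The handler above calls a `save_to_temp` helper that is never defined in the article. A minimal stdlib sketch (the name, signature, and `.png` default are assumptions for illustration, not the original implementation) could be:

```python
import tempfile

def save_to_temp(raw: bytes, suffix: str = ".png") -> str:
    """Write raw image bytes to a temporary file and return its path.

    Hypothetical helper: the article references save_to_temp without
    defining it; this is one plausible implementation.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(raw)
        return tmp.name
```

Returning a path (rather than the bytes) matches how `handle_image_input` is used: downstream inference code typically wants a file on disk.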
## 3. OpenAI-Compatible API

### 3.1 The /v1/chat/completions Endpoint

This is the core endpoint; its request and response formats must match OpenAI's exactly:

```python
import time
from fastapi import Request

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    payload = await request.json()
    messages = payload.get("messages")
    # Parse the message content
    prompt_text, image_path = parse_messages(messages)
    # Run OCR inference
    ocr_result = run_ocr(prompt_text, image_path)
    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "deepseek-ocr",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": ocr_result},
            "finish_reason": "stop"
        }],
        "usage": {
            # character counts used as a rough stand-in for token counts
            "prompt_tokens": len(prompt_text),
            "completion_tokens": len(ocr_result),
            "total_tokens": len(prompt_text) + len(ocr_result)
        }
    }
```

### 3.2 Message Parsing

An OpenAI-style `messages` array can mix text and image parts:

```python
from typing import List, Optional, Tuple

def parse_messages(messages: List[dict]) -> Tuple[str, Optional[str]]:
    texts = []
    image_url = None
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            texts.append(content)
        elif isinstance(content, list):
            for item in content:
                if item.get("type") == "text":
                    texts.append(item.get("text", ""))
                elif item.get("type") == "image_url" and not image_url:
                    image_url = item.get("image_url", {}).get("url")
    return "\n".join(texts), image_url
```
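Because `parse_messages` is pure Python, it is easy to verify in isolation. The sketch below repeats the function and feeds it a hypothetical OpenAI-style message list (the demo data is illustrative, not from the article):

```python
from typing import List, Optional, Tuple

def parse_messages(messages: List[dict]) -> Tuple[str, Optional[str]]:
    """Collect all text parts and the first image URL from OpenAI-style messages."""
    texts = []
    image_url = None
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            texts.append(content)
        elif isinstance(content, list):
            for item in content:
                if item.get("type") == "text":
                    texts.append(item.get("text", ""))
                elif item.get("type") == "image_url" and not image_url:
                    image_url = item.get("image_url", {}).get("url")
    return "\n".join(texts), image_url

demo = [
    {"role": "system", "content": "You are an OCR assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "Extract the text"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ]},
]
prompt, url = parse_messages(demo)
# prompt -> "You are an OCR assistant.\nExtract the text"
# url    -> "data:image/png;base64,AAAA"
```

Note that only the first image URL is kept; later images in the same request are silently ignored.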
## 4. Front-End Implementation

### 4.1 Single-Page Web UI

The front end is plain HTML and JavaScript. Its main features: image upload with preview, preset instruction selection, a custom prompt box, and result display (raw text plus rendered Markdown).

```html
<!doctype html>
<html>
<head>
  <title>DeepSeek-OCR Web UI</title>
  <style>
    /* Minimal dark theme */
    body { background: #0f172a; color: #e2e8f0; }
    .card { background: #1e293b; border-radius: 0.5rem; }
    button { background: #3b82f6; color: white; }
  </style>
</head>
<body>
  <div class="container">
    <h1>DeepSeek-OCR</h1>
    <div class="card">
      <input type="file" id="imageUpload" accept="image/*">
      <img id="imagePreview" style="max-width: 300px;">
    </div>
    <div class="card">
      <textarea id="prompt" placeholder="Enter a prompt..."></textarea>
      <button id="submit">Recognize</button>
    </div>
    <div class="card">
      <div id="result"></div>
    </div>
  </div>
  <script>
    // Front-end interaction logic
    document.getElementById("submit").addEventListener("click", async () => {
      const file = document.getElementById("imageUpload").files[0];
      const prompt = document.getElementById("prompt").value;
      // Convert the image to Base64
      const reader = new FileReader();
      reader.onload = async () => {
        const response = await fetch("/v1/chat/completions", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            model: "deepseek-ocr",
            messages: [{
              role: "user",
              content: [
                { type: "text", text: prompt },
                { type: "image_url", image_url: { url: reader.result } }
              ]
            }]
          })
        });
        const result = await response.json();
        document.getElementById("result").innerText =
          result.choices[0].message.content;
      };
      reader.readAsDataURL(file);
    });
  </script>
</body>
</html>
```

### 4.2 Image Handling and API Calls

The key step on the front end is converting the uploaded image to Base64 and sending it through the API:

```javascript
async function processImage(file) {
  return new Promise((resolve) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.readAsDataURL(file);
  });
}

async function callOCRAPI(imageData, prompt) {
  const response = await fetch("/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-ocr",
      messages: [{
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageData } }
        ]
      }]
    })
  });
  return await response.json();
}
```
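On the wire, `FileReader.readAsDataURL` produces exactly the `data:` URI that the backend's `handle_image_input` branch for Base64 input expects. The round trip can be mirrored in pure Python (the helper names `to_data_uri`/`from_data_uri` are illustrative, not from the article):

```python
import base64

def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI, like FileReader.readAsDataURL does."""
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{b64}"

def from_data_uri(uri: str) -> bytes:
    """Decode the payload back, mirroring the data: branch of handle_image_input."""
    header, b64 = uri.split(",", 1)
    return base64.b64decode(b64)

raw = b"\x89PNG\r\n\x1a\nfake"
uri = to_data_uri(raw)
assert uri.startswith("data:image/png;base64,")
assert from_data_uri(uri) == raw  # lossless round trip
```

This is why the front end can post `reader.result` directly as the `image_url.url` value with no extra handling.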
## 5. Deployment and Optimization

### 5.1 Starting and Configuring the Service

Run the service with uvicorn:

```shell
uvicorn app:app --host 0.0.0.0 --port 8000
```

For production, consider adding:

- Gunicorn with Uvicorn workers as the ASGI process server
- Nginx as a reverse proxy
- a process manager such as Supervisor

### 5.2 Performance Tips

- **Model quantization**: convert the model to FP16 or INT8 to reduce memory usage:

  ```python
  model = model.half()  # convert to FP16
  ```

- **Batching**: extend the API to process several images per request.
- **Caching**: return cached results for repeated requests with the same image.
- **Async processing**: offload long-running jobs to a queue system such as Celery:

  ```python
  import uuid
  from fastapi import File, UploadFile

  @app.post("/async-ocr")
  async def async_ocr_task(image: UploadFile = File(...)):
      task_id = str(uuid.uuid4())
      # Put the task on the queue (assumes a configured Celery app named `celery`)
      celery.send_task("process_ocr", args=[await image.read()], task_id=task_id)
      return {"task_id": task_id}
  ```

### 5.3 Security Hardening

Add API-key authentication:

```python
from fastapi.responses import JSONResponse

API_KEYS = {"your-secret-key": True}

@app.middleware("http")
async def auth_middleware(request: Request, call_next):
    if request.url.path.startswith("/v1/"):
        # Note: the OpenAI SDK sends "Bearer <key>", so strip that prefix if needed
        if request.headers.get("Authorization") not in API_KEYS:
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)
```

Limit upload size. FastAPI has no built-in `max_upload_size` parameter, so enforce the limit in middleware via the `Content-Length` header:

```python
MAX_UPLOAD_SIZE = 10 * 1024 * 1024  # 10 MB

@app.middleware("http")
async def limit_upload_size(request: Request, call_next):
    length = request.headers.get("Content-Length")
    if length and int(length) > MAX_UPLOAD_SIZE:
        return JSONResponse({"error": "Payload too large"}, status_code=413)
    return await call_next(request)
```

Add rate limiting with slowapi (its `Limiter` is wired in via app state and an exception handler, not as middleware):

```python
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
```
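The caching idea from section 5.2 can be sketched as a content-hash lookup, so identical image/prompt pairs skip inference entirely. This is a minimal in-memory sketch; the names `ocr_cache` and `cached_ocr` are hypothetical, and a production version would bound the cache size (e.g. an LRU) or use Redis:

```python
import hashlib

ocr_cache = {}

def cached_ocr(image_bytes: bytes, prompt: str, run) -> str:
    """Serve repeated (image, prompt) pairs from cache.

    `run` is the underlying inference callable, e.g. the article's run_ocr.
    The cache key is a SHA-256 over the image bytes plus the prompt.
    """
    key = hashlib.sha256(image_bytes + prompt.encode("utf-8")).hexdigest()
    if key not in ocr_cache:
        ocr_cache[key] = run(image_bytes, prompt)
    return ocr_cache[key]

# Quick check with a stub in place of real model inference
calls = []
def fake_run(img, p):
    calls.append(1)
    return "recognized text"

first = cached_ocr(b"img-bytes", "extract", fake_run)
second = cached_ocr(b"img-bytes", "extract", fake_run)
# first == second == "recognized text", and fake_run ran only once
```

Hashing the raw bytes (rather than a filename) means the same image uploaded twice under different names still hits the cache.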
## 6. Client Examples

### 6.1 The Official OpenAI SDK

Because the service speaks the OpenAI API format, the `openai` package works directly:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")
response = client.chat.completions.create(
    model="deepseek-ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the text from this image"},
            {"type": "image_url", "image_url": {"url": "path/to/image.png"}}
        ]
    }]
)
print(response.choices[0].message.content)
```

### 6.2 Raw HTTP Requests

```python
import base64
import requests

with open("receipt.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "deepseek-ocr",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the invoice details"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
            ]
        }]
    }
)
print(response.json()["choices"][0]["message"]["content"])
```

This setup delivers the core OCR functionality and, through the OpenAI-compatible interface, opens it to a much wider range of uses: integrate it into any system that already talks to OpenAI, or build business applications quickly on top of the web UI.
