ai-multimodal
mrgoonie · AI & ML
Provides unified access to Google Gemini's multimodal capabilities for processing audio, images, video, and documents, plus image generation. Handles up to 9.5 hours of audio, 6 hours of video, and 1,000-page PDFs. Includes detailed model selection guides, code examples, and troubleshooting for common API issues.
npx skills add https://github.com/mrgoonie/claudekit-skills --skill ai-multimodal
Stars 2129 · Installs 2