ai-multimodal

mrgoonie · AI & ML

Provides unified access to Google Gemini's multimodal capabilities for processing audio, images, video, and documents, plus image generation. Handles up to 9.5 hours of audio, 6 hours of video, and PDFs of up to 1,000 pages. Includes detailed model selection guides, code examples, and troubleshooting for common API issues.

npx skills add https://github.com/mrgoonie/claudekit-skills --skill ai-multimodal

Stars 2129 · Installs 2
