Gemini Pro Vision

Updated Dec 13 · Context: 65,536 tokens

$0.25 / 1M input tokens
$0.50 / 1M output tokens
$2.50 / 1K input images
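Because text and images are priced per different units, it can help to estimate what a single request costs. The sketch below assumes usage is billed linearly at the listed rates, with no rounding, minimums, or per-request fees. `estimate_cost` is a hypothetical helper for illustration and is not part of any provider SDK.

```python
# Listed prices for Gemini Pro Vision, in USD.
# Assumption: billing is strictly linear at these rates (no rounding or minimums).
INPUT_PER_M_TOKENS = 0.25   # per 1M input tokens
OUTPUT_PER_M_TOKENS = 0.50  # per 1M output tokens
PER_K_IMAGES = 2.50         # per 1K input images


def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> float:
    """Hypothetical helper: estimate the USD cost of one request."""
    return (
        input_tokens / 1_000_000 * INPUT_PER_M_TOKENS
        + output_tokens / 1_000_000 * OUTPUT_PER_M_TOKENS
        + images / 1_000 * PER_K_IMAGES
    )


# Example: 10,000 input tokens, 2,000 output tokens, and 4 images.
cost = estimate_cost(10_000, 2_000, images=4)
print(f"${cost:.4f}")  # prints $0.0135
```

In this example the images ($0.01) cost more than all the text tokens combined ($0.0035), so image count tends to dominate the bill for short prompts.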

Google’s flagship multimodal model. It accepts images and video alongside text or chat prompts and returns a text or code response.

See the benchmarks and prompting guidelines from DeepMind.

Note: Preview models are offered for testing purposes and should not be used in production apps.

#multimodal
