Why Use Gemini

https://news.ycombinator.com/item?id=42952605

Important Know-How

Best Practices

For best results:

  • Rotate pages to the correct orientation before uploading.
  • Avoid blurry pages.
  • If using a single page, place the text prompt after the page.
  • For PDF payloads under 20MB, you can pass the document inline, either as a base64-encoded document or by directly uploading a locally stored file.

  • Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20MB.

  • Gemini supports a maximum of 1,000 document pages. Each document page is equivalent to 258 tokens.

  • Larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768x768 pixels. Pages at lower sizes bring no cost reduction other than bandwidth, and pages at higher resolution bring no performance improvement.

  • Document vision only meaningfully understands PDFs. Other file types are extracted as plain text, so the model cannot interpret how those files look when rendered. File-type specifics such as charts, diagrams, HTML tags, and Markdown formatting are lost.
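
The limits in the list above can be turned into a quick pre-flight check. This is a minimal sketch, not part of the Gemini SDK: the helper names are hypothetical, and reading "20MB" as 20 * 1024 * 1024 bytes is an assumption.

```python
TOKENS_PER_PAGE = 258                   # per-page token cost from the notes above
MAX_PAGES = 1_000                       # page limit from the notes above
INLINE_LIMIT_BYTES = 20 * 1024 * 1024   # assumption: "20MB" read as 20 MiB


def estimate_page_tokens(page_count: int) -> int:
    """Estimate the token cost of a document from its page count."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count > MAX_PAGES:
        raise ValueError(f"document exceeds the {MAX_PAGES}-page limit")
    return page_count * TOKENS_PER_PAGE


def needs_file_api(pdf_bytes: int, other_bytes: int = 0) -> bool:
    """Return True when the total request size is over the inline limit.

    other_bytes covers the text prompt, system instructions, and so on.
    """
    return pdf_bytes + other_bytes > INLINE_LIMIT_BYTES


print(estimate_page_tokens(120))          # 30960
print(needs_file_api(25 * 1024 * 1024))   # True
```

The page-token estimate covers only the pages themselves; the prompt and the response add their own tokens on top.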

Packages/APIs/References used

Code Snippets

Fetch a PDF from a URL and convert it to bytes for processing

from google import genai
from google.genai import types
import httpx
 
client = genai.Client()
 
doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"
 
# Retrieve the PDF as raw bytes
doc_data = httpx.get(doc_url).content
 
prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)
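
A download can silently return an HTML error page instead of a PDF, so it may be worth checking the bytes before sending them to the model. The sketch below is an optional guard, not something the Gemini API requires. The helper name `looks_like_pdf` is hypothetical, and searching only the first 1024 bytes for the `%PDF-` header is a lenient assumption.

```python
PDF_MAGIC = b"%PDF-"  # PDF files start with this header


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries a PDF header near its start.

    Only the first 1024 bytes are searched (an assumption made here to
    tolerate a few leading junk bytes).
    """
    return PDF_MAGIC in data[:1024]


print(looks_like_pdf(b"%PDF-1.7\n..."))     # True
print(looks_like_pdf(b"<html>Not found"))   # False
```

In the snippet above, this could be called as `looks_like_pdf(doc_data)` right after the download. Note that `httpx.get` does not follow redirects unless `follow_redirects=True` is passed, and that `raise_for_status()` on the response surfaces HTTP errors early.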

Large PDFs from URLs

Use the File API to simplify uploading and processing large PDF files from URLs:

from google import genai
from google.genai import types
import io
import httpx
 
client = genai.Client()
 
long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"
 
# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)
 
sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    mime_type='application/pdf')
)
 
prompt = "Summarize this document"
 
response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[sample_doc, prompt])
print(response.text)
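
As the comment in the snippet above notes, `client.files.upload` also accepts a path, which covers locally stored PDFs. A minimal sketch under that assumption follows. The wrapper name `upload_local_pdf` is hypothetical, and `client` is expected to be a `genai.Client()` created as above.

```python
from pathlib import Path


def upload_local_pdf(client, path: str):
    """Upload a locally stored PDF through the File API and return its handle."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"no such file: {pdf_path}")
    # Pass the path as a string; the MIME type is set explicitly,
    # mirroring the upload call in the snippet above.
    return client.files.upload(
        file=str(pdf_path),
        config=dict(mime_type="application/pdf"),
    )
```

The returned handle goes into `contents` the same way as `sample_doc` does, for example `contents=[upload_local_pdf(client, "report.pdf"), prompt]`, where `report.pdf` is a placeholder file name.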

Passing multiple files

from google import genai
import io
import httpx
 
client = genai.Client()
 
doc_url_1 = "https://arxiv.org/pdf/2312.11805"
doc_url_2 = "https://arxiv.org/pdf/2403.05530"
 
# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)
 
sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)
 
prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."
 
response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
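
With more than two documents, the repeated upload calls above can be folded into a loop. This is a sketch rather than an SDK feature: `upload_pdfs` and its `fetch` parameter are hypothetical names, and `client` is assumed to be a `genai.Client()` as in the snippets above.

```python
import io
from typing import Callable, Iterable, List


def upload_pdfs(client, urls: Iterable[str], fetch: Callable[[str], bytes]) -> List[object]:
    """Download each URL with fetch and upload it through the File API.

    Returns the uploaded file handles in the same order as urls.
    """
    uploaded = []
    for url in urls:
        pdf_bytes = fetch(url)  # e.g. lambda u: httpx.get(u).content
        uploaded.append(
            client.files.upload(
                file=io.BytesIO(pdf_bytes),
                config=dict(mime_type="application/pdf"),
            )
        )
    return uploaded
```

With the names from the snippet above, `files = upload_pdfs(client, [doc_url_1, doc_url_2], lambda u: httpx.get(u).content)` returns the handles, which can then be passed as `contents=[*files, prompt]`.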