• setVeryLoud(true);@lemmy.ca
    4 days ago

    Additionally, when you feed an image into a prompt UI, the UI simply generates a text description of the image using image recognition and feeds that description into the LLM.

    All the LLM receives is something like “Picture containing a slice of pizza”. It has no control over the granularity of the image recognition software, and that software is not designed to provide anything more than OCR and a rough description of the image by way of pattern matching.
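
Roughly, the flow I'm describing could be sketched like this. To be clear, this is only an illustration of the idea, not any real product's code: `describe_image`, `Caption`, and `build_prompt` are names I made up, and the recognition step is a stub that returns a fixed caption.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """What the (hypothetical) image recognition step hands back."""
    description: str  # rough description of the image
    ocr_text: str     # any text read from the image, possibly empty


def describe_image(image_bytes: bytes) -> Caption:
    # Hypothetical stand-in for the image recognition software.
    # A real system would run a captioning/OCR model here; this stub
    # ignores the bytes and returns a fixed caption.
    return Caption(description="Picture containing a slice of pizza", ocr_text="")


def build_prompt(user_text: str, image_bytes: bytes) -> str:
    # The image is reduced to text before the LLM is involved at all.
    caption = describe_image(image_bytes)
    parts = [f"[Image: {caption.description}]"]
    if caption.ocr_text:
        parts.append(f"[Text in image: {caption.ocr_text}]")
    parts.append(user_text)
    # Only this string would be sent to the LLM; the pixels never are.
    return "\n".join(parts)


prompt = build_prompt("What toppings are on this?", b"raw image bytes")
print(prompt)
```

In a setup like this, no matter what you ask about the picture, the LLM only has that one caption line to work from and cannot ask the recognition step for more detail.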