• jj4211@lemmy.world · 1 day ago

    Also, generally the best interfaces for LLMs combine non-LLM facilities transparently. The LLM translates the prose into the format a math engine expects, then an intermediate layer recognizes a tag, submits that excerpt to the math engine, and substitutes the engine's output back into the response.
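    A minimal sketch of that intermediate layer, assuming a made-up `<math>…</math>` tag convention (the tag format and the tiny AST-based "math engine" here are illustrative, not any particular product's protocol):

```python
import ast
import operator
import re

# Hypothetical tag the LLM is assumed to emit around math excerpts.
TAG = re.compile(r"<math>(.*?)</math>")

# Whitelisted arithmetic operators for the toy "math engine".
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def evaluate(expr: str) -> float:
    """Toy math engine: safely evaluate arithmetic by walking the AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def substitute(llm_output: str) -> str:
    """Intermediate layer: replace each tagged excerpt with engine output."""
    return TAG.sub(lambda m: str(evaluate(m.group(1))), llm_output)

print(substitute("The total is <math>17 * 23 + 4</math> units."))
```

    The LLM never does the arithmetic; it only produces the tagged expression, and the layer splices in the reliable result.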

    Even when servicing a request to generate an image, the text generation model runs independently of the image generation model, and the intermediate layer combines their outputs. That split can cause fun disconnects, like the guy asking for a full glass of wine. The text generation half is completely oblivious to the image generation half. It responds in the role of a graphic artist dutifully doing the work without ever ‘seeing’ the image, and it assumes the image is good because that’s consistent with its training output. When the user corrects it, it goes about admitting that the picture (which it never ‘looked’ at) was wrong and retries the image generator with the additional context, producing a similarly botched picture.