• dual_sport_dork 🐧🗡️@lemmy.world
    17 hours ago

    Web browsers collapse whitespace by default, which means that sans any trickery or   deliberately   using    nonbreaking    spaces,   any number of spaces between words is reduced to one. Since apparently every single thing in the modern world is displayed via some kind of encapsulated little browser engine nowadays, the majority of double spaces left in the universe that are not already firmly nailed down in print now appear as singles. And thus the convention is almost totally lost.
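To make the collapsing behaviour described above concrete, here is a minimal Python sketch. It is only a rough approximation of what a browser does with ordinary text under default styling, not the actual rendering algorithm, and the function name `collapse_whitespace` is made up for illustration.

```python
import re

NBSP = "\u00a0"  # non-breaking space, which is not collapsed

def collapse_whitespace(text: str) -> str:
    """Roughly mimic how default HTML rendering collapses whitespace runs."""
    # Only ordinary whitespace (space, tab, newline, carriage return,
    # form feed) is listed here; U+00A0 is deliberately left out.
    return re.sub(r"[ \t\n\r\f]+", " ", text)

# A double space after the full stop comes out as a single space.
typed = "One sentence.  Another sentence."
print(collapse_whitespace(typed))

# A non-breaking space followed by a normal space keeps its visible width.
padded = "One sentence." + NBSP + " Another sentence."
print(collapse_whitespace(padded) == padded)
```

The explicit character class is used instead of `\s` because Python's `\s` also matches the non-breaking space, which would hide the one trick that still works.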

    • redjard@lemmy.dbzer0.com
      12 hours ago

      This seems to match up with some quick tests I did just now on DuckDuckGo's pseudonymized chatbot interface.
      ChatGPT, Llama, and Claude all managed to use double spaces themselves, and all but Llama managed to tell that I was using them too.
      It might well depend on the platform, with each model's “native” application possibly stripping double spaces on both ends.

      tests

      Mistral seems a bit confused and uses triple spaces.

      • SGforce@lemmy.ca
        13 minutes ago

        Tokenization can make it difficult for them.

        The word chunks often contain a space because it’s efficient, so I would think an extra space would stand out. Writing it back should be easier, assuming there is a dedicated “space” token like the other punctuation tokens, which there must be.

        Hard mode would be asking it how many spaces there are in your sentence. I don’t think they’d figure it out unless their own list of tokens and a description is trained into them specifically.
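The tokenization point above can be sketched with a toy example. Everything here is an assumption for illustration: `TOY_VOCAB` is a made-up vocabulary, and greedy longest-match is a simplification, so this is not how any particular model's tokenizer works (real vocabularies are far larger and often include tokens for runs of several spaces).

```python
# Hypothetical toy vocabulary, NOT taken from any real model.
# Word chunks carry their leading space; " " is the lone "space" token.
TOY_VOCAB = [" world", " hello", "hello", "world", " ", "."]

def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization over TOY_VOCAB."""
    pieces = sorted(TOY_VOCAB, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

def count_spaces(tokens: list[str]) -> int:
    """Counting spaces needs the text behind each token, not just its ID."""
    return sum(token.count(" ") for token in tokens)

single = toy_tokenize("hello world.")
double = toy_tokenize("hello  world.")
print(single)
print(double)
print(count_spaces(double))
```

In this toy setup the single-spaced sentence has no standalone space token at all, while the double-spaced one gains exactly one, which is why the extra space would stand out. The `count_spaces` helper only works because it can look inside each token's text, which is the knowledge a model would need to have learned about its own tokens.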