• turmoil@feddit.org
    21 hours ago

    The only explanation I can think of is that when the number of users gets high, the load balancer starts routing requests to lower-parameter models. That way you'd get less reliable answers at peak usage times, but the service could handle more simultaneous users.
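
To make the guess above concrete, here is a rough sketch of what load-based model routing could look like. This is purely hypothetical: the tier names, the user thresholds, and the `pick_model` function are all made up for illustration, and nothing here is known about how any real provider routes requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str              # hypothetical tier label, not a real model name
    max_active_users: int  # highest load at which this tier is still served


# Hypothetical tiers, ordered from largest to smallest model.
# The thresholds are invented numbers for illustration only.
TIERS = [
    ModelTier("large", 1_000),
    ModelTier("medium", 5_000),
]
FALLBACK_TIER = "small"  # served once load exceeds every threshold above


def pick_model(active_users: int) -> str:
    """Return the largest tier whose load threshold has not been exceeded."""
    for tier in TIERS:
        if active_users <= tier.max_active_users:
            return tier.name
    return FALLBACK_TIER


if __name__ == "__main__":
    for load in (200, 3_000, 20_000):
        print(load, "->", pick_model(load))
```

Under these assumed thresholds, a quiet period gets the large tier and a peak period falls through to the small one, which would match the pattern of less reliable answers at busy times.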