Disappointed: Vision feature is unusable (Outputs infinite <pad> tokens)

#11

by ratalai - opened 2 days ago


Before detailing the main issue, I want to note that the standard text/conversation mode works fine. In fact, it can even be jailbroken quite easily (for example, getting it to generate product serial numbers, regardless of whether they actually work).

However, I was really excited to test the vision capabilities of this model, and I'm quite disappointed to find that it doesn't seem to work at all for image recognition.

Whenever I attempt to process an image, the model gets stuck in an endless generation loop. I have tried adjusting the generation parameters to force more deterministic output, specifically by setting temperature=0 and completely disabling the thinking feature, but the result remains exactly the same.

Output:
<pad><pad><pad><pad><pad><pad><pad> ... [repeats infinitely]
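
For anyone trying to reproduce this, here is a minimal sketch of how the degenerate output could be flagged automatically instead of by eye. It only inspects the decoded string and does not call the model. The helper names `pad_fraction` and `is_pad_only` and the 0.9 threshold are my own assumptions for illustration, not part of any library.

```python
def pad_fraction(text: str, pad_token: str = "<pad>") -> float:
    """Return the fraction of characters in `text` taken up by `pad_token`.

    Hypothetical helper: assumes the pad token is decoded literally as `<pad>`.
    """
    stripped = text.strip()
    if not stripped:
        return 0.0
    # str.count counts non-overlapping occurrences of the pad token.
    return stripped.count(pad_token) * len(pad_token) / len(stripped)


def is_pad_only(text: str, pad_token: str = "<pad>", threshold: float = 0.9) -> bool:
    """Flag output that consists almost entirely of pad tokens.

    The 0.9 threshold is an arbitrary assumption; tune it as needed.
    """
    return pad_fraction(text, pad_token) >= threshold


if __name__ == "__main__":
    broken = "<pad>" * 7          # shape of the output shown above
    normal = "A cat sitting on a sofa."
    print(is_pad_only(broken), is_pad_only(normal))
```

Running a check like this over a batch of image prompts would show whether every vision request fails this way or only some of them.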
