
The tokenizer can represent an uncommon word as multiple tokens. Inputting your example on https://platform.openai.com/tokenizer (GPT-4o) gives me the following (tokens separated by "|"):

    lower|case|un|se|parated|name
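
For anyone who wants to reproduce this locally, here is a minimal sketch using OpenAI's tiktoken library, assuming the o200k_base encoding (the one GPT-4o uses) and assuming the input word was "lowercaseunseparatedname", reconstructed by joining the pieces above:

    import tiktoken

    # o200k_base is the encoding used by GPT-4o. The input word is an
    # assumption, reconstructed by concatenating the token pieces shown above.
    enc = tiktoken.get_encoding("o200k_base")
    word = "lowercaseunseparatedname"

    # encode() returns a list of token ids; decoding each id individually
    # shows how the word was split.
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    print("|".join(pieces))  # should print something like: lower|case|un|se|parated|name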
