> information (like titles) can be meaningfully extracted From a technical persp...

abecedarius · on Sept 25, 2021

Sometimes you have to paraphrase the title due to the length limit.

nl · on Sept 24, 2021

The issue is likely mostly paywalled sites and SPAs where this isn't as simple.

SquishyPanda23 · on Sept 25, 2021

Sure, and I agree that this problem isn't worth spending a lot of resources on if the level of toil is acceptable.

But most paywalled sites do display the title so that you know what great journalism you're being asked to pay for. I suspect that the typical SPA shows titles as well. So grepping should work in those cases.

But my point was that while title extraction is a hard problem requiring you to solve lots of corner cases, title grepping is simple and handles the vast majority of cases. The corner cases are then handled by humans (as, IIUC, all cases are currently).

Accepting a user-generated title and comparing it to the text gives you a boolean. If they don't match you can just ask the user to affirm that what they submitted is really the title. Then, if you like, you can have a "this title may be dodgy" icon on posts that don't match.
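A minimal sketch of that check in Python, assuming the page text has already been fetched somehow. The names `normalize` and `title_matches` are hypothetical, made up for illustration, and this is not a description of anything the site actually runs:

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Fold case and Unicode forms, then collapse every run of
    non-alphanumeric characters (punctuation, tags, whitespace) to one space."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"[\W_]+", " ", text).strip()


def title_matches(submitted_title: str, page_text: str) -> bool:
    """Return True if the submitted title appears, as whole words,
    somewhere in the page text. An empty title never matches."""
    needle = normalize(submitted_title)
    if not needle:
        return False
    # Pad with spaces so a match cannot start or end in the middle of a word.
    return f" {needle} " in f" {normalize(page_text)} "


# Hypothetical page content, used only to show the call.
page = "<html><title>Show HN: A Tiny Grep Tool</title><body>...</body></html>"
if not title_matches("Show HN: a tiny grep tool", page):
    print("this title may be dodgy")
```

A False result wouldn't reject the submission. It would just trigger the "is this really the title?" prompt, so a paraphrased or shortened title still goes through after the user affirms it.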

tgsovlerkhgsel · on Sept 25, 2021

Yet another reason to ban paywalled sites.

I really don't understand why a site that wants people to actually read the article and discuss the contents promotes articles that at least 90% of readers won't be able to (easily) access.