Discuss Scratch

celebimaster
Scratcher
68 posts

Spam characters

Lately there have been a lot of spam ASCII characters ( Like this: ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎) going around and covering up projects, profiles and comments.


Any ways to fix it?
TheHockeyist
Scratcher
1000+ posts

No. The best way is to just report it.
Scratchifier
Scratcher
1000+ posts

Perhaps Scratch should blacklist things like those… I would support that.
TheHockeyist
Scratcher
1000+ posts

But the question is… how would you keep up with the latest “fashion trends” in spam?
celebimaster
Scratcher
68 posts

TheHockeyist wrote:

No. The best way is to just report it.
I already did, didn't work.
TheHockeyist
Scratcher
1000+ posts

celebimaster wrote:

TheHockeyist wrote:

No. The best way is to just report it.
I already did, didn't work.
The ST will get to it eventually - they read every report.
Iditaroid
Scratcher
500+ posts

Scratchifier wrote:

Perhaps Scratch should blacklist things like those… I would support that.
What if that leads to issues with Scratchers typing in other languages? These characters are still things that people actually use, you know!
TheHockeyist
Scratcher
1000+ posts

Iditaroid wrote:

Scratchifier wrote:

Perhaps Scratch should blacklist things like those… I would support that.
What if that leads to issues with Scratchers typing in other languages? These characters are still things that people actually use, you know!
Yep. If people were spamming qqqqqqqqqqqqqqqq… and you didn't speak a language that had q in its alphabet, it would be like asking for "q" to be blacklisted. And then English speakers would be upset when their queens cannot quarrel in tranquility and Spanish speakers cannot ask what their cheese is (¿Qué es queso?), etc.

Last edited by TheHockeyist (Jan. 29, 2015 23:18:34)

stickfire-test
Scratcher
100+ posts

There is a difference though:










ฎ๎ is a legitimate letter. ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ isn't.

The solution to this would be to make any sequence of more than, say, 3 combining diacritics (the things that stack up on top of letters) in a row (more if any languages actually use more than 3, but I don't think any do) get cut down to just the first three, so the spam character would just get reduced to ฎ๎๎๎ and it wouldn't be able to cover other people's posts.
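
As a rough illustration of the truncation described above, here is a minimal Python sketch. It is not Scratch's actual filter: the function names and the default limit of 3 are assumptions made for this example. It checks the Unicode general category (Mn, Mc, Me) rather than `unicodedata.combining()`, because some Thai marks, including the one used in the spam (U+0E4E), have a canonical combining class of 0.

```python
import unicodedata

MAX_MARKS = 3  # assumed limit, taken from the "say, 3" suggestion above


def is_combining_mark(ch: str) -> bool:
    """Return True for characters in the mark categories Mn, Mc or Me."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def cap_combining_marks(text: str, limit: int = MAX_MARKS) -> str:
    """Keep at most `limit` consecutive combining marks; drop the rest of each run."""
    kept = []
    run = 0  # length of the current run of consecutive marks
    for ch in text:
        if is_combining_mark(ch):
            run += 1
            if run > limit:
                continue  # surplus mark: skip it
        else:
            run = 0  # any non-mark character ends the run
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    spam = "\u0e0e" + "\u0e4e" * 40  # one Thai letter with 40 stacked marks
    print(len(cap_combining_marks(spam)))  # 4: the letter plus three marks
```

Ordinary text passes through unchanged, so a filter like this would not affect people who simply write in a script that uses diacritics.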
Paddle2See
Scratch Team
1000+ posts

stickfire-test wrote:

There is a difference though:










ฎ๎ is a legitimate letter. ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ isn't.

The solution to this would be to make any sequence of more than, say, 3 combining diacritics (the things that stack up on top of letters) in a row (more if any languages actually use more than 3, but I don't think any do) get cut down to just the first three, so the spam character would just get reduced to ฎ๎๎๎ and it wouldn't be able to cover other people's posts.
Sounds like a good approach! Maybe we can generalize it to no more than 3 of ANY character. Can anybody think of a time when we would need more than 3 of the same character in a row on a comment (not counting digits)?

I would think that a dedicated spammer could find a way around this, though, by combining a couple of characters.
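
To make the trade-off concrete, here is a small Python sketch of the generalized rule: collapse any run of the same non-digit character to at most three. The function name and the limit are assumptions for illustration only, not an actual Scratch feature.

```python
from itertools import groupby


def cap_repeats(text: str, limit: int = 3) -> str:
    """Shorten every run of one repeated non-digit character to `limit` copies."""
    parts = []
    for ch, group in groupby(text):
        count = len(list(group))
        if not ch.isdigit():  # digits are exempt, as suggested above
            count = min(count, limit)
        parts.append(ch * count)
    return "".join(parts)


if __name__ == "__main__":
    print(cap_repeats("nooooooo!"))  # nooo!
    print(cap_repeats("55555"))      # 55555 (digits are left alone)
    print(cap_repeats("abababab"))   # abababab (alternating characters slip through)
```

The last call shows the workaround mentioned above: alternating two characters never forms a run, so a rule based only on identical repeats would not catch it.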
Superdoggy
Scratcher
1000+ posts

Paddle2See wrote:

stickfire-test wrote:

There is a difference though:










ฎ๎ is a legitimate letter. ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ isn't.

The solution to this would be to make any sequence of more than, say, 3 combining diacritics (the things that stack up on top of letters) in a row (more if any languages actually use more than 3, but I don't think any do) get cut down to just the first three, so the spam character would just get reduced to ฎ๎๎๎ and it wouldn't be able to cover other people's posts.
Sounds like a good approach! Maybe we can generalize it to no more than 3 of ANY character. Can anybody think of a time when we would need more than 3 of the same character in a row on a comment (not counting digits)?

I would think that a dedicated spammer could find a way around this, though, by combining a couple of characters.

1. I'm sleepy, Zzzzzzzz
2. Streeeeetched Ouuuuuuut Woooooooords
3. Silly emoticons, xDDDDDD
4. Block instructions, ((j) + (8 * (4) + ((3 * (z))))
5. It's ovar nine thousaaaaaaaaaaaaaaand!

I think that restricting the “no more than 3 in a row” rule to only special characters would be a good idea, so as not to impede normal Scratcher commenting. ;D
Zro716
Scratcher
1000+ posts

Paddle2See wrote:

Maybe we can generalize it to no more than 3 of ANY character. Can anybody think of a time when we would need more than 3 of the same character in a row on a comment (not counting digits)?

I would think that a dedicated spammer could find a way around this, though, by combining a couple of characters.
kashyyyk, home world of the wookies
mmmmm I love chocolate
nooooooooooooo!

how about instead of blacklisting triple letters, let us find things to consider spam and consolidate them in a list, then let the mods know about them
Scratchifier
Scratcher
1000+ posts

Superdoggy wrote:

Paddle2See wrote:

stickfire-test wrote:

There is a difference though:










ฎ๎ is a legitimate letter. ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ isn't.

The solution to this would be to make any sequence of more than, say, 3 combining diacritics (the things that stack up on top of letters) in a row (more if any languages actually use more than 3, but I don't think any do) get cut down to just the first three, so the spam character would just get reduced to ฎ๎๎๎ and it wouldn't be able to cover other people's posts.
Sounds like a good approach! Maybe we can generalize it to no more than 3 of ANY character. Can anybody think of a time when we would need more than 3 of the same character in a row on a comment (not counting digits)?

I would think that a dedicated spammer could find a way around this, though, by combining a couple of characters.

1. I'm sleepy, Zzzzzzzz
2. Streeeeetched Ouuuuuuut Woooooooords
3. Silly emoticons, xDDDDDD
4. Block instructions, ((j) + (8 * (4) + ((3 * (z))))
5. It's ovar nine thousaaaaaaaaaaaaaaand!

I think that restricting the “no more than 3 in a row” rule to only special characters would be a good idea, so as not to impede normal Scratcher commenting. ;D

And more:

Well………..
————————————————————————————————— (dividers!)
                          (spacers)
?????? or !!!!!!!!!
Referring to random usernames, e.g. http://scratch.mit.edu/users/meeeeegan/ (not just searched up to prove a point; I know that they're somehow related to skyset)
TheHockeyist
Scratcher
1000+ posts

Not many words with three identical letters in a row, and most of them are obscure or onomatopoeia.

Also, you might need to mention usernames (e.g. if there was a user called whatever55555 or freeeelsforsale, the filter would think you are spamming). But for me, six identical characters in a row starts to get spammy.

Last edited by TheHockeyist (Jan. 30, 2015 02:20:09)

stickfiregames
Scratcher
1000+ posts

Paddle2See wrote:

stickfire-test wrote:

There is a difference though:










ฎ๎ is a legitimate letter. ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ isn't.

The solution to this would be to make any sequence of more than, say, 3 combining diacritics (the things that stack up on top of letters) in a row (more if any languages actually use more than 3, but I don't think any do) get cut down to just the first three, so the spam character would just get reduced to ฎ๎๎๎ and it wouldn't be able to cover other people's posts.
Sounds like a good approach! Maybe we can generalize it to no more than 3 of ANY character. Can anybody think of a time when we would need more than 3 of the same character in a row on a comment (not counting digits)?

I would think that a dedicated spammer could find a way around this, though, by combining a couple of characters.
They don't have to be the same character, just any sequence of combining diacritics would be cut down. So ฬืํี๋้๎็ูัฺ้ึิํ๋ัุํํํ้ึิํ๋ัุํํํ้ึิํ๋ัุํํํํ๊ำิ๋็ูํ้ึิํ๋ัุํํํ๊ำิ๋็ู would be cut down to ฬืํี. It wouldn't prevent just spamming sequences of letters, but it would stop spam from covering up other comments.
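
The point that the marks need not be identical can be shown with a detection-only sketch in Python, which measures the longest run of consecutive combining marks of any kind. The function names and the threshold are hypothetical; a real filter might truncate instead of flagging, or pass flagged comments on to moderators.

```python
import unicodedata


def longest_mark_run(text: str) -> int:
    """Length of the longest run of consecutive combining marks (categories M*)."""
    longest = run = 0
    for ch in text:
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def looks_like_stacking_spam(text: str, limit: int = 3) -> bool:
    """Flag text where more than `limit` marks are stacked in a row (assumed limit)."""
    return longest_mark_run(text) > limit


if __name__ == "__main__":
    # One Thai letter followed by six *different* combining marks.
    mixed = "\u0e2c" + "\u0e37\u0e4d\u0e35\u0e4b\u0e49\u0e4e"
    print(longest_mark_run(mixed))          # 6
    print(looks_like_stacking_spam(mixed))  # True
```

Because the check looks only at the character category, mixing different marks does not get around it, whereas repeated ordinary letters are ignored entirely.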
TheHockeyist
Scratcher
1000+ posts

I had this conversation with Paddle2See on his profile:

TheHockeyist wrote:

By the way, I don't know of any instance where you would need more than four in a row - five could be my lower limit - I don't know of any language that could have “eeeee” or “sssss” in the middle of a word.

TheHockeyist wrote:

Four identical letters do occur in German, e.g. Nausikaaaal or Ballyhoooogenese (http://www.cs.utsa.edu/~wagner/spell.html)

Paddle2See wrote:

Good to know!

TheHockeyist wrote:

No prob. I haven't found any 5-letter examples if they do occur. The closest I have found was a Japanese example, but it looked archaic and suspiciously artificial.

TheHockeyist wrote:

Here it was: 東欧を覆う (To cover Eastern Europe) (One method of romanization would put this as “toooo o oou”)

TheHockeyist wrote:

(You could argue that would be seven identical letters together in romanization if you left out the spaces, as written Japanese does.)

Paddle2See wrote:

Okay, good research. But why not put all this on the forum topic so others can see it?

TheHockeyist wrote:

Oh… I apologize for any inconvenience.
And thus the conversation is here for all to see and comment and criticize.
ealgase
Scratcher
100+ posts

Support!
OmnipotentPotato
Scratcher
1000+ posts

Had a topic exactly like this a while ago. Support though for what Hockeyist is saying.
NolanAwesome
Scratcher
500+ posts

Support!
TheGamingStar
Scratcher
500+ posts

Support!
