Discuss Scratch
- Discussion Forums
- » Suggestions
- » After 999,999,999... (Moved from Questions About Scratch)
- GC123456
100+ posts
After 999,999,999... (Moved from Questions About Scratch)
This is only a suggestion for the Scratch Team to do.
Now that we have surpassed 360 million projects created, we have only 640 million projects to go. Based on the rate of growth on Scratch, it seems like the remainder to 1 billion can be filled up in 12-15 years.
It's never too early to come up with ideas for after the 1 billionth project.
Here's mine:
Switch to base 52. The project IDs would still have 10 characters, just as 1,000,000,000 has 10 digits in base 10. What's base 52? ABCDEFGHIJK…TUVWXYZabc…xyz (A is zero and z is 51). With base 52, we open up 52^10, or about 144 quadrillion, possibilities.
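A minimal sketch of the base-52 scheme, assuming an ID is simply the project number written in base 52 with the digit order above (A = 0, ..., Z = 25, a = 26, ..., z = 51) and left-padded with A to 10 characters. `to_base52` and `from_base52` are hypothetical helpers for illustration, not anything Scratch actually runs.

```python
import string

# Digit order from the suggestion: A = 0, ..., Z = 25, a = 26, ..., z = 51.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 52
WIDTH = 10            # same length as a 10-digit decimal ID

def to_base52(n: int, width: int = WIDTH) -> str:
    """Write a project number as a fixed-width base-52 ID."""
    if not 0 <= n < BASE ** width:
        raise ValueError("project number does not fit in the ID width")
    digits = []
    for _ in range(width):
        n, r = divmod(n, BASE)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))  # most significant digit first

def from_base52(url_id: str) -> int:
    """Turn a base-52 ID back into the project number."""
    n = 0
    for ch in url_id:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(BASE ** WIDTH)   # 144555105949057024, about 144.6 quadrillion
print(to_base52(52))   # AAAAAAAABA
```

Because the IDs are fixed-width, small project numbers get a long run of leading A's, the same way 0000000052 would look in base 10.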
Now, you might be saying, aren't some of those project IDs gonna spell out bad words? Yes, they will. And because there are SO many possibilities, what the ST should do is take the time to go through EVERY SINGLE WORD in English and eradicate any URLs that spell out ANY of these words.
But isn't that gonna severely limit the number of possibilities on Scratch?
Short answer: absolutely not.
Long answer: Let's say we count the number of words in English like this:
2-letter words: 1 million
3-letter words: 2 million
4-letter words: 3 million
5-letter words: 4 million
6-letter words: 5 million
7-letter words: 5 million
8-letter words: 4 million
9-letter words: 3 million
10-letter words: 2 million
which is WAY more than we actually have. We do not need to worry about words with 11 or more letters, as the IDs will only have 10 characters.
Now, let's count every position where a 2-letter word can appear in a URL (aa is the word, and each b is any other character):
aabbbbbbbb
baabbbbbbb
bbaabbbbbb
bbbaabbbbb
bbbbaabbbb
bbbbbaabbb
bbbbbbaabb
bbbbbbbaab
bbbbbbbbaa
Apart from it looking very cool, this shows that there are 9 possible positions for a 2-letter word in a URL. If we multiply the number of 2-letter words by that amount, we get 9 million. Nowhere near that 144 quadrillion number.
Even better, the number of possible positions for a word in a URL decreases the longer the word gets.
For 3-letter words, there are 8.
For 4-letter words, there are 7.
You get the point.
Let's multiply.
2m * 8 = 16 million
3m * 7 = 21 million
4m * 6 = 24 million
5m * 5 = 25 million
5m * 4 = 20 million
4m * 3 = 12 million
3m * 2 = 6 million
2m * 1 = 2 million
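The multiplications above can be double-checked with a short Python sketch. It uses the post's exaggerated word counts and the rule that a k-letter word can start at 10 - k + 1 positions in a 10-character ID; together with the 9 million 2-letter placements, the total comes to 135 million before the post rounds it up to 1 trillion. The sketch also adds one figure the post doesn't compute, `ids_per_placement` (an illustrative name): because a word fixes only k of the 10 characters, each word-and-position pair matches 52^(10 - k) different IDs (about 53 trillion for a 2-letter word), so the number of IDs actually removed is larger than the placement count.

```python
# Word counts per length, using the post's deliberately exaggerated figures.
WORD_COUNTS = {
    2: 1_000_000, 3: 2_000_000, 4: 3_000_000, 5: 4_000_000, 6: 5_000_000,
    7: 5_000_000, 8: 4_000_000, 9: 3_000_000, 10: 2_000_000,
}
ID_LENGTH = 10
BASE = 52

def positions(word_length: int, id_length: int = ID_LENGTH) -> int:
    """Number of starting positions for a word of this length inside an ID."""
    return id_length - word_length + 1

def ids_per_placement(word_length: int, id_length: int = ID_LENGTH) -> int:
    """IDs containing a given word at one fixed position (other characters free)."""
    return BASE ** (id_length - word_length)

# Words times positions, as multiplied out in the post.
placements = {k: n * positions(k) for k, n in WORD_COUNTS.items()}
total_placements = sum(placements.values())

print(placements[2], placements[3])  # 9000000 16000000
print(total_placements)              # 135000000
print(ids_per_placement(2))          # 53459728531456
```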
Now let's hugely (I really mean it) exaggerate them (and therefore cover all possibilities of capitalization by A LOT) and say that there are, in total, 1 trillion URLs with a word in them.
That comes nowhere near 144 quadrillion.
So the long answer is no, there is no possible way, even with such marvelous exaggeration, and even when covering every case of capitalization, that removing all possible combinations of words being in the URL would make the slightest difference in the number of possible URLs with base 52.
Now, even if there actually were 1 trillion URLs to take care of, how would the Scratch Team know which ones to skip???
Simple:
Let's say we have a URL: AAAAAABwTc
This would be dissected into:
AA, AA, AA, AA, AA, AB, Bw, wT, Tc, AAA, AAA, AAA, AAA, AAB, ABw, BwT, wTc, AAAA, AAAA, AAAA, AAAB, AABw, ABwT, BwTc, AAAAA, AAAAA, AAAAB, AAABw, AABwT, ABwTc, AAAAAA, AAAAAB, AAAABw, AAABwT, AABwTc, AAAAAAB, AAAAABw, AAAABwT, AAABwTc, AAAAAABw, AAAAABwT, AAAABwTc, AAAAAABwT, AAAAABwTc, and AAAAAABwTc. That's 45 segments (9 of length 2, 8 of length 3, and so on down to 1 of length 10).
Now the Scratch Team would look up each and every one of these segments in some giant dictionary. If even one of these segments spells out a word (and it doesn't matter how it is capitalized), then the whole URL would be thrown away.
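A minimal sketch of the dissect-and-look-up step, assuming the "giant dictionary" is just a Python set of words; `segments` and `contains_word` are hypothetical names, and the two-word dictionary in the usage lines is only a placeholder. It reproduces the 45 segments for AAAAAABwTc and ignores capitalization as described.

```python
def segments(url_id: str, min_len: int = 2) -> list[str]:
    """All contiguous substrings of length >= min_len, grouped shortest first."""
    n = len(url_id)
    return [url_id[i:i + k] for k in range(min_len, n + 1) for i in range(n - k + 1)]

def contains_word(url_id: str, dictionary: set[str]) -> bool:
    """True if any segment matches a dictionary word, ignoring capitalization."""
    words = {w.lower() for w in dictionary}
    return any(seg.lower() in words for seg in segments(url_id))

print(len(segments("AAAAAABwTc")))                  # 45
print(contains_word("AAAAAACatz", {"cat", "dog"}))  # True (matches "Cat")
print(contains_word("AAAAAABwTc", {"cat", "dog"}))  # False
```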
This sounds like a TON of work for them, and indeed it is, but there is an easy way to dramatically reduce the burden on the Scratch Team. Just keep creating projects, but this time, think 2704 (52 squared) URLs ahead. If, for example, AAAAAAOoFb spelled out a word, they might go to AAAAAAOoFc instead. If that did too, just keep going. This would all happen before the project loading screen even appears. If all 2704 pre-prepared URLs spelled out a word, scramble up the letters. For example, a URL like UwiEodMnWI might be created instead. Then the Scratch Team would check that. If the new URL didn't spell out a word, great: load the blank canvas for them. If it did, keep scrambling until a URL with no words spelled out is found, then load. Also, make sure that no URL is repeated (which should be no problem considering there are 144 quadrillion possibilities).
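A sketch of the look-ahead idea under a few assumptions that are not in the post: IDs come from a simple counter written in base 52, "spells a word" is decided by whatever `is_clean` check is passed in (such as the dictionary check the post describes), and "scramble up the letters" is read as drawing a completely random 10-character ID, since UwiEodMnWI shares almost no letters with the sequential IDs. `allocate_id`, `encode`, and the `max_tries` guard are hypothetical.

```python
import random
import string
from typing import Callable, Optional

ALPHABET = string.ascii_uppercase + string.ascii_lowercase  # A = 0 ... z = 51
WIDTH = 10
LOOKAHEAD = 52 ** 2  # 2704, "52 squared" IDs ahead

def encode(n: int) -> str:
    """Fixed-width base-52 ID for counter value n."""
    digits = []
    for _ in range(WIDTH):
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def allocate_id(counter: int, is_clean: Callable[[str], bool], used: set,
                rng: Optional[random.Random] = None,
                max_tries: int = 1_000_000) -> tuple[str, int]:
    """Return (new_id, next_counter). Hypothetical helper, not Scratch's code."""
    # Step 1: try the next LOOKAHEAD sequential IDs, skipping any that spell a word.
    for step in range(LOOKAHEAD):
        candidate = encode(counter + step)
        if candidate not in used and is_clean(candidate):
            used.add(candidate)
            return candidate, counter + step + 1
    # Step 2: every sequential ID failed, so draw random IDs until one passes.
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(WIDTH))
        if candidate not in used and is_clean(candidate):
            used.add(candidate)
            return candidate, counter + LOOKAHEAD
    raise RuntimeError("no clean, unused ID found")
```

The shared `used` set covers the "no URL is repeated" rule, including the case where a random fallback ID later comes up in the sequential range.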
If that still sounds like a problem for the Scratch Team, they might need some bigger servers (no offense) that can handle it.
(Edit: Skip XXXXXXXXXX, if we ever get there. I don't think we will.)
INSPIRED BY BASE-64: Instead of base 52, what about base 54? The additional characters would be - and _, which gives us about 66.3 quadrillion more possibilities, or about 45.8% more! Even better, because they are not letters, the Scratch Team doesn't have to do 45.8% more work on top of the tremendous burden my base-52 suggestion already places on them, just because there are 45.8% more possibilities. Yes, they still have to go through some of them, but not all.
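The base-54 arithmetic can be checked directly. This assumes the digit set is the 52 letters plus - and _ (the two extra characters that URL-safe base 64 uses); the name `ALPHABET_54` is illustrative, and the digit order does not affect the count.

```python
import string

# Assumed base-54 digit set: the 52 letters plus '-' and '_'.
ALPHABET_54 = string.ascii_uppercase + string.ascii_lowercase + "-_"
ID_LENGTH = 10

base52_total = 52 ** ID_LENGTH
base54_total = len(ALPHABET_54) ** ID_LENGTH
extra = base54_total - base52_total

print(base54_total)                          # 210832519264920576
print(extra)                                 # 66277413315863552, about 66.3 quadrillion
print(round(100 * extra / base52_total, 1))  # 45.8
```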
The more -'s and _'s in a URL, the easier it is to check for profanity, because now, if there is a URL like -------rLp, the Scratch Team only has to check the segments inside rLp. If it was --_Ep_-M-n, they'd need to check just one segment, Ep (it doesn't make sense to check individual letters: even though filtering single letters would be much easier than eliminating, for example, “pig”, it would remove a lot of possibilities!).
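A sketch of that shortcut, assuming IDs use the base-54 digits described above and that only runs of two or more letters between - and _ are checked (single letters are skipped, as suggested). `letter_runs` and `segments_to_check` are hypothetical helper names.

```python
import re

def letter_runs(url_id: str) -> list[str]:
    """Split on '-' and '_' and keep only runs of two or more letters."""
    return [run for run in re.split(r"[-_]+", url_id) if len(run) >= 2]

def segments_to_check(url_id: str) -> list[str]:
    """Substrings of length >= 2 that lie entirely inside a single letter run."""
    found = []
    for run in letter_runs(url_id):
        for k in range(2, len(run) + 1):
            for i in range(len(run) - k + 1):
                found.append(run[i:i + k])
    return found

print(segments_to_check("-------rLp"))  # ['rL', 'Lp', 'rLp']
print(segments_to_check("--_Ep_-M-n"))  # ['Ep']
```

An ID with no - or _ at all falls back to the full 45-segment check.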
Other suggestions:
CONSONANT-ONLY URLs: I believe WindOctahedron came up with this idea, and I found it quite interesting. If you look at EVERY single word in this post, each one has at least one vowel in it (if you count Y). What if we just removed vowels? That means no words… in English. I haven't done any research yet, but I hypothesize that out of so many languages in the world, at least one has a word made entirely of consonants (English consonants: every letter except a, e, i, o, and u, and maybe y). As Sheep_maker pointed out, there is slang in other languages too. Who knows if some slang is spelled entirely with consonants? If that was the case, it would need to be added to the dictionary in the same way as any other English word. Still, this is quite appealing: there are an estimated 140,000 words in English, and despite that relatively low number, it would be an extremely tedious effort for the Scratch Team to get rid of all of them.
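To get a feel for what this costs, here is a small sketch of the consonant-only alphabet, treating Y as a vowel as WindOctahedron suggested; the names are illustrative. Dropping a, e, i, o, u, and y in both cases leaves 40 symbols, so 10-character IDs would still offer 40^10, about 10.5 quadrillion possibilities (roughly 17.1 quadrillion if Y counted as a consonant), down from about 144.6 quadrillion with all 52 letters.

```python
import string

# Y is treated as a vowel, following WindOctahedron's suggestion.
VOWELS = set("aeiouyAEIOUY")
CONSONANTS = "".join(c for c in string.ascii_uppercase + string.ascii_lowercase
                     if c not in VOWELS)

def is_consonant_only(url_id: str) -> bool:
    """True if every character of the ID is an allowed consonant."""
    return all(c in CONSONANTS for c in url_id)

print(len(CONSONANTS))        # 40
print(len(CONSONANTS) ** 10)  # 10485760000000000, about 10.5 quadrillion
print(is_consonant_only("BcDfGhJkLm"), is_consonant_only("BcDfGhJkLa"))  # True False
```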
Last edited by GC123456 (Jan. 28, 2020 00:12:03)
- LBormi
1000+ posts
This is great.
- --Explosion--
1000+ posts
Interesting idea. What would happen to existing project IDs? Also, no support for case sensitivity, then it would be really hard to type in a link.
- Flowermanvista
1000+ posts
Where did you get “360,000,000 projects” from? The Statistics page says we're only at about 50,000,000.
And to think, we celebrated big at 2,000,000 a long while ago… Then 3M, then 4M, then 5M, then the milestones kept coming so fast that they stopped being special.
- CatsUnited
1000+ posts
I bet if Scratch ever surpasses their billionth project, all they're going to do is throw another digit on, like what they've been doing for over a decade now. That's not to say that utilising URLs with both letters and numbers is a bad thing, since YouTube does that to keep their video URLs relatively short despite the vast amount of content uploaded to their site.
Last edited by CatsUnited (Jan. 26, 2020 15:06:45)
- Za-Chary
1000+ posts
I merged some of the posts made on the “Questions about Scratch version” of your suggestion, moving them to the “Suggestions version” of your suggestion. The “Questions about Scratch version” of your suggestion was deleted.
- Arceu
500+ posts
Flowermanvista wrote:
  Where did you get “360,000,000 projects” from? The Statistics page says we're only at about 50,000,000.
There are 50,000,000 shared projects, while 360,000,000 have been created in total.
Last edited by Arceu (Jan. 26, 2020 15:44:33)
- WindOctahedron
1000+ posts
Arceu wrote:
  Flowermanvista wrote:
    Where did you get “360,000,000 projects” from? The Statistics page says we're only at about 50,000,000.
  There are 50,000,000 shared projects, while 360,000,000 have been created in total.
(And every project has a unique URL, and it doesn't matter if it's shared or not.)
The problem with URLs that contain words can be easily solved by only allowing consonants in them (with Y treated as a vowel).
Last edited by WindOctahedron (Jan. 26, 2020 16:06:18)
- GC123456
100+ posts
CatsUnited wrote:
  I bet if Scratch ever surpasses their billionth project, all they're going to do is throw another digit on, like what they've been doing for over a decade now. That's not to say that utilising URLs with both letters and numbers is a bad thing, since YouTube does that to keep their video URLs relatively short despite the vast amount of content uploaded to their site.
But then providing links to these projects would be impossible, as project 1,000,000,103, for instance, would be replaced with XXXXXXXXXX.
- GC123456
100+ posts
WindOctahedron wrote:
  Arceu wrote:
    Flowermanvista wrote:
      Where did you get “360,000,000 projects” from? The Statistics page says we're only at about 50,000,000.
    There are 50,000,000 shared projects, while 360,000,000 have been created in total.
  (And every project has a unique URL, and it doesn't matter if it's shared or not.)
  The problem with URLs that contain words can be easily solved by only allowing consonants in them (with Y treated as a vowel).
Great idea!
- coder2045
1000+ posts
GC123456 wrote:
  CatsUnited wrote:
    I bet if Scratch ever surpasses their billionth project, all they're going to do is throw another digit on, like what they've been doing for over a decade now. That's not to say that utilising URLs with both letters and numbers is a bad thing, since YouTube does that to keep their video URLs relatively short despite the vast amount of content uploaded to their site.
  But then providing links to these projects would be impossible, as project 1,000,000,103, for instance, would be replaced with XXXXXXXXXX.
I think 1234567890 would be replaced with XXXXX67890.
- coder2045
1000+ posts
I can't understand your suggestion. Also, why base 52?
- Sheep_maker
1000+ posts
GC123456 wrote:
  CatsUnited wrote:
    I bet if Scratch ever surpasses their billionth project, all they're going to do is throw another digit on, like what they've been doing for over a decade now. That's not to say that utilising URLs with both letters and numbers is a bad thing, since YouTube does that to keep their video URLs relatively short despite the vast amount of content uploaded to their site.
  But then providing links to these projects would be impossible, as project 1,000,000,103, for instance, would be replaced with XXXXXXXXXX.
They could fix the phone number detection to not detect long numbers in URLs, which is probably easier than overhauling the ID system.
coder2045 wrote:
  I can't understand your suggestion. Also, why base 52?
52 is the number of uppercase and lowercase letters in the English alphabet (26 + 26).
--Explosion-- wrote:
  Interesting idea. What would happen to existing project IDs? Also, no support for case sensitivity, then it would be really hard to type in a link.
The proposed ID system doesn't use digits, so that could be a way to distinguish between the two systems.
GC123456 wrote:
  Now, you might be saying, aren't some of those project IDs gonna spell out bad words? Yes, they will. And because there are SO many possibilities, what the ST should do is take the time to go through EVERY SINGLE WORD in English and eradicate any URLs that spell out ANY of these words.
Scratch is a multilingual site; they'll have to filter out words from other languages too, and there is a lot of regional slang in every language, so collecting them all would be quite difficult. Languages can also evolve over time; for example, in American English, “boomer” used to be a fairly neutral term, but it's now considered ageist by some.
- GC123456
100+ posts
Sheep_maker wrote:
  GC123456 wrote:
    CatsUnited wrote:
      I bet if Scratch ever surpasses their billionth project, all they're going to do is throw another digit on, like what they've been doing for over a decade now. That's not to say that utilising URLs with both letters and numbers is a bad thing, since YouTube does that to keep their video URLs relatively short despite the vast amount of content uploaded to their site.
    But then providing links to these projects would be impossible, as project 1,000,000,103, for instance, would be replaced with XXXXXXXXXX.
  They could fix the phone number detection to not detect long numbers in URLs, which is probably easier than overhauling the ID system.
  {…}
Wait a minute, if that was the case, then people could write URLs of THEIR phone numbers to sneak past the censor.
- GC123456
100+ posts
Sheep_maker wrote:
  {…}
  GC123456 wrote:
    Now, you might be saying, aren't some of those project IDs gonna spell out bad words? Yes, they will. And because there are SO many possibilities, what the ST should do is take the time to go through EVERY SINGLE WORD in English and eradicate any URLs that spell out ANY of these words.
  Scratch is a multilingual site; they'll have to filter out words from other languages too, and there is a lot of regional slang in every language, so collecting them all would be quite difficult. Languages can also evolve over time; for example, in American English, “boomer” used to be a fairly neutral term, but it's now considered ageist by some.
WindOctahedron wrote:
  The problem with URLs that contain words can be easily solved by only allowing consonants in them (with Y treated as a vowel).
- GC123456
100+ posts
--Explosion-- wrote:
  Interesting idea. What would happen to existing project IDs? Also, no support for case sensitivity, then it would be really hard to type in a link.
Last edited by kaj (Tomorrow 00:00:00)???
That aside, YouTube links are hard to type.
Why not learn something from them? Copying and pasting is commonplace. Sorry if I sounded mean; I thought it was pretty obvious.
Last edited by GC123456 (Jan. 26, 2020 21:55:25)
- Za-Chary
1000+ posts
Sheep_maker wrote:
  Scratch is a multilingual site; they'll have to filter out words from other languages too, and there is a lot of regional slang in every language, so collecting them all would be quite difficult. Languages can also evolve over time; for example, in American English, “boomer” used to be a fairly neutral term, but it's now considered ageist by some.
WindOctahedron wrote:
  The problem with URLs that contain words can be easily solved by only allowing consonants in them (with Y treated as a vowel).
This doesn't solve the issue, though. There exist inappropriate abbreviations that could be put into the project URL, including ones that don't use vowels. One example is something similar to the abbreviation of “what the heck”, but the “h” in the abbreviation is replaced with another letter that stands for something more vulgar…
Last edited by Za-Chary (Jan. 26, 2020 21:49:31)
- GC123456
100+ posts
Za-Chary wrote:
  Sheep_maker wrote:
    Scratch is a multilingual site; they'll have to filter out words from other languages too, and there is a lot of regional slang in every language, so collecting them all would be quite difficult. Languages can also evolve over time; for example, in American English, “boomer” used to be a fairly neutral term, but it's now considered ageist by some.
  WindOctahedron wrote:
    The problem with URLs that contain words can be easily solved by only allowing consonants in them (with Y treated as a vowel).
  This doesn't solve the issue, though. There exist inappropriate abbreviations that could be put into the project URL, including ones that don't use vowels. One example is something similar to the abbreviation of “what the heck”, but the “h” in the abbreviation is replaced with another letter that stands for something more vulgar…
Y'know, I am VERY tempted to just tell you all to grab all the dictionaries in all the languages ever, but of course that would be way too much work.
If you could actually do it, I would advocate for it, but when I first wrote the post, I (naturally) didn't want to overwhelm the Scratch Team with the daunting task of preventing ANY vulgarity (that was the ultimate goal) of ANY type from being spelled out in any of its base-52 URLs.
Last edited by GC123456 (Jan. 26, 2020 21:54:04)
- Za-Chary
1000+ posts
GC123456 wrote:
  Y'know, I am VERY tempted to just tell you all to grab all the dictionaries in all the languages ever, but of course that would be way too much work.
  If you could actually do it, I would advocate for it, but when I first wrote the post, I (naturally) didn't want to overwhelm the Scratch Team with the daunting task of preventing ANY vulgarity (that was the ultimate goal) of ANY type from being spelled out in any of its base-52 URLs.
That's understandable, but if we're not careful, people could use URLs to sneak past the filterbot, similarly to what you pointed out with phone numbers.
- GC123456
100+ posts
Za-Chary wrote:
  GC123456 wrote:
    Y'know, I am VERY tempted to just tell you all to grab all the dictionaries in all the languages ever, but of course that would be way too much work.
    If you could actually do it, I would advocate for it, but when I first wrote the post, I (naturally) didn't want to overwhelm the Scratch Team with the daunting task of preventing ANY vulgarity (that was the ultimate goal) of ANY type from being spelled out in any of its base-52 URLs.
  That's understandable, but if we're not careful, people could use URLs to sneak past the filterbot, similarly to what you pointed out with phone numbers.
That's the whole point, and this post would have been a lot shorter if I hadn't mentioned any of it.