Discuss Scratch
- 46009361
-
1000+ posts
Automatic project copy detector
(To clarify, this isn't exactly a “duplicate.”)
We want to be able to check whether a project is an exact copy of another project before the “Share” button executes its normal command. These checks may put load on the servers, but that's okay. Only the code of the project matters, not the title, instructions, Notes and Credits, comments, thumbnail, remixes, studios, or cloud variable data. The code must be in the same place: a project with cleaned-up blocks should not be detected as a copy of one whose blocks are not cleaned up.
Here's how I want to check for copies automatically on Scratch's servers (for new shared projects):
First, it checks if the project exactly matches the default project that you start with. If so, it shares it right away.
Otherwise, it checks if it's a remix. If so, it follows each remix box (the thanks box) and compares the MD5 checksum of the SB3 file of each of those projects to that of the one you want to share. If any of them match, it presents a warning to the user, customized for that project.
Lastly, if no warning was presented or it's an original project, it checks every shared project, from the first one to the last. If it finds any exact copy that was published earlier and is not by the same user or email address, it presents the warning.
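The three checks above can be sketched roughly as follows. This is only an illustration under assumptions, not how Scratch's servers work: `Project`, `md5_hex`, and `check_before_share` are hypothetical names, and each project is represented simply by the raw bytes of its SB3 file.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record; a real server would store far more than this.
    project_id: int
    owner: str
    sb3_bytes: bytes  # raw contents of the project's SB3 file


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def check_before_share(new, default_sb3, remix_ancestors, shared_projects):
    """Return None to share right away, or the Project that `new` copies."""
    digest = md5_hex(new.sb3_bytes)

    # Step 1: an unmodified default project is shared immediately.
    if digest == md5_hex(default_sb3):
        return None

    # Step 2: compare against each project in the remix chain.
    for other in remix_ancestors:
        if md5_hex(other.sb3_bytes) == digest:
            return other

    # Step 3: scan every shared project in order, skipping the same user.
    for other in shared_projects:
        if other.owner != new.owner and md5_hex(other.sb3_bytes) == digest:
            return other

    return None
```

A caller would show the warning whenever the function returns a project, and share normally when it returns `None`. Step 3 is a plain linear scan, as described in the post, so its cost grows with the number of shared projects.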
Copies that were already shared before this feature is released should be taken down using the same method. The project that was shared first, counting the first time it was ever shared if it was later unshared and reshared (according to each person's What's Happening? section), should stay up, as its creator owns the rights to it. This also applies to reshared projects. This is to handle the case where someone shares a project, one of their real-life enemies downloads the SB3 file or remixes it, the original person unshares it, and then the new person (uploads and) shares the copy.
Edit: My 100th post!
Last edited by 46009361 (March 23, 2021 04:39:47)
- DarthVader4Life
I'm not sure what this is saying, but, correct me if I'm wrong, this would be hard to code and possibly take quite a bit of server space. Off topic: Congrats on 100 posts!
Last edited by DarthVader4Life (Aug. 16, 2021 22:21:26)
- 46009361
It would be easy to code, and it would be great: existing copies would automatically be detected by bots and removed (not just reported), so moderators wouldn't have to spend too much time handling reports. They could also set up a second server to take over the job if the first one crashes. Once all the existing copies have been found, it checks new projects going forward.
- DarthVader4Life
Actually, no, it wouldn't be easy to code. While I may not be an expert, I can grasp the complexity of coding it. And we don't want the servers to crash… ever…
Last edited by DarthVader4Life (Aug. 16, 2021 22:22:10)
- 46009361
Yeah, it would be easy to code. First, it checks project 1, 2, 3, etc. until it finds a shared project. It stores that number so it can continue the count. Then it finds any exact copies. Once they are taken down, it retrieves the stored number (let's say 10) and then goes to 11, 12, 13, etc. until it finds another shared (not private) project. It continues doing this. The project IDs go up by one each time a new project is uploaded or created, so a number must be stored on the server indicating the highest project ID at the time it starts looking. It could also gather the data (SB3 file contents) of all shared projects (which only takes a few seconds for millions of them because of the way computers work), sort them, find the exact copies once everything is in alphabetical order, then go back to the original projects (because it records which project each file belongs to), and take copies down based on sharing date. Medium, but not too difficult.
There is only a limited number of projects that Scratch will ever have. The servers won't be guaranteed to crash unless it's infinite. It's not an infinite loop.
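The "sort, then find the exact copies" idea can be sketched like this. It is a minimal sketch under assumptions: `find_later_copies` and the tuple layout `(project_id, owner, first_shared, sb3_bytes)` are hypothetical, and it sorts MD5 digests rather than full file contents so that identical files end up next to each other. Within each group of identical files it keeps the earliest-shared project and lists later ones by other users.

```python
import hashlib
from itertools import groupby


def find_later_copies(projects):
    """projects: iterable of (project_id, owner, first_shared, sb3_bytes).

    `first_shared` can be any sortable value (for example a timestamp).
    Returns a list of (copy_id, original_id) pairs.
    """
    keyed = [
        (hashlib.md5(data).hexdigest(), first_shared, pid, owner)
        for pid, owner, first_shared, data in projects
    ]
    # Identical digests become adjacent; within a digest, earliest share first.
    keyed.sort()

    copies = []
    for _, group in groupby(keyed, key=lambda row: row[0]):
        rows = list(group)
        _, _, original_id, original_owner = rows[0]
        for _, _, pid, owner in rows[1:]:
            if owner != original_owner:
                copies.append((pid, original_id))
    return copies
```

Sorting itself is cheap compared with reading every SB3 file from storage, which this sketch does not model, so it says nothing about whether a real run would finish in "a few seconds".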
Last edited by 46009361 (March 23, 2021 15:17:17)
- WolfCat67
This suggestion seems very similar to this one from 2013, with the only key difference being that the older one is for remixes specifically. Another post from mid-2014 held a suggestion that is effectively yours with less detail, but is now closed. A neat thing is that Scratch Team member @cheddargirl had responded to it, stating that a similar feature was available in Scratch 1.4 but that adding it into later versions (specifically 2.0 at the time) was not a priority.
As for the suggestion itself, I feel like it may take up too many resources on servers, especially if multiple people are checking it at once. Now, I'll admit, I'm not experienced with computer science, but comparing one thing with a million other things that have so many differences may take a little bit more time than the “seconds” you expect (again, I am not fully sure). Though I am aware Scratch projects don't have a very large file size at all, so maybe? A simpler version could only check remixes for any changes from the original, but that would be the same suggestion as the suggestion from 2013.
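One common way to avoid comparing a new project against every other project one by one is to keep an index from checksum to project IDs, so that a lookup costs about the same no matter how many projects are stored. The sketch below is an assumption added for illustration; `ChecksumIndex` is a hypothetical name, and nothing here reflects how Scratch actually stores projects.

```python
import hashlib
from collections import defaultdict


class ChecksumIndex:
    """Maps the MD5 digest of an SB3 file to the IDs of projects with that digest."""

    def __init__(self):
        self._by_digest = defaultdict(list)

    def add(self, project_id: int, sb3_bytes: bytes) -> None:
        digest = hashlib.md5(sb3_bytes).hexdigest()
        self._by_digest[digest].append(project_id)

    def matches(self, sb3_bytes: bytes) -> list:
        digest = hashlib.md5(sb3_bytes).hexdigest()
        # .get avoids creating empty entries for digests never seen before.
        return list(self._by_digest.get(digest, []))
```

With such an index, the work at share time is one hash computation and one dictionary lookup; the cost of building the index is paid once per project when it is added.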
Last edited by WolfCat67 (Oct. 19, 2019 20:27:25)
- 46009361
There didn't seem to be any duplicates when I used the instructions from the stickied thread called Guide to Finding Duplicates by Wahsp.
- WolfCat67
I used the same strategy to find all of the ones listed. Perhaps we just used different key words?
Last edited by Paddle2See (March 30, 2021 09:29:09)
- 46009361
Yeah, I just used “automatic project copy”.
Last edited by 46009361 (March 23, 2021 15:19:32)
- Starstriker3000
If you see a project that contains no noticeable changes, please use the Report button on it so the Scratch Team can take a look at it.
- DarthVader4Life
“Is this proposing the same thing as this topic, Remixed Project Duplicate Check?”
Pretty much.
Last edited by DarthVader4Life (Aug. 16, 2021 22:22:23)
- 46009361
Replying again to WolfCat67, as my previous reply is now moderately irrelevant to me. The thing is, the servers wouldn't normally crash, as it checks remixes FIRST, if there are any. If any copies are found, it stops. Otherwise, it stores those project IDs and checks every other project (skipping the stored project IDs).
Last edited by 46009361 (Oct. 22, 2019 00:23:56)
- Starstriker3000
What about remixes that are changed but the change isn't noticeable to other Scratchers? The servers would still count those as changed remixes.
- 46009361
That's not really part of this topic. The change may be to an Easter egg in the game, but the project is still not a copy.
- Dragonlord767
Couldn't they just change the position of one block? Or add this script?
when green flag clicked
broadcast [start v]
when I receive [start v]
do rest of script
Would it detect an art change? Couldn't you change one pixel?
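The concern here can be shown with a small sketch: an exact-checksum comparison treats two files as different if even a single byte differs. The byte strings below are made-up stand-ins for project data, not real SB3 contents, and `is_exact_copy` is a hypothetical helper.

```python
import hashlib

# Made-up stand-in for a project's code.
original = b'{"blocks": ["when green flag clicked", "move 10 steps"]}'

# The same project with an extra broadcast/receive pair inserted.
tweaked = (
    b'{"blocks": ["when green flag clicked", "broadcast start", '
    b'"when I receive start", "move 10 steps"]}'
)


def is_exact_copy(a: bytes, b: bytes) -> bool:
    """True only when both inputs have the same MD5 digest."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()
```

Here `is_exact_copy(original, original)` is true while `is_exact_copy(original, tweaked)` is false, so a detector based only on exact checksums would not flag the tweaked version.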
- 46009361
I reported the last post by Paddle2See and it got deleted (and the topic got reopened). To clarify, this includes both remixes and non-remixes, whereas the original suggestion linked only contained remixes.
Last edited by 46009361 (March 23, 2021 04:40:19)
- fdreerf
So it would scan through all 73,000,000 projects on the site before a project can get shared? Even if it took a millisecond, which is extremely optimistic and completely unfeasible, this would take around 20 hours to complete. If it took 500 milliseconds, which is more reasonable but still unlikely, a project shared today, Tuesday, will finish on Thursday…
…May 19th, 2022.
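The estimate can be checked with a few lines of arithmetic. The sketch assumes 73,000,000 projects scanned strictly one after another and a start on March 23, 2021 (the date on this post's edit timestamp); `scan_time` is a hypothetical helper. Under those assumptions the slower case works out to about 422 days, which lands on Thursday, May 19, 2022.

```python
from datetime import datetime, timedelta

PROJECTS = 73_000_000


def scan_time(ms_per_project: float) -> timedelta:
    """Total time to scan every project sequentially."""
    return timedelta(milliseconds=PROJECTS * ms_per_project)


fast = scan_time(1)    # 20:16:40, i.e. about 20.3 hours
slow = scan_time(500)  # 422 days and roughly 11 hours
finish = datetime(2021, 3, 23) + slow

print(fast, slow.days, finish.date())  # 20:16:40 422 2022-05-19
```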
Last edited by fdreerf (March 23, 2021 04:45:03)
- 46009361
You didn't specify (but I had to figure out) that a millisecond was for just one project. It also takes time to switch from one project ID to the next and check if it's shared (i.e. switching tasks). And then imagine if you unshare and reshare: it would take even more work. However, imagine if a project became super popular while also being a copy of an unpopular shared project. That is what happened to WO997's project: someone copied it and got featured, and then years later another user got curated by LlamaGodLuke for copying the same plane project (apart from the thumbnail) and saying it was 100% original.
Because of timezones, it's still Monday for me right now.
Last edited by 46009361 (March 23, 2021 05:14:53)
- the2000
Uh, what's your point? We all know that stealing is bad, but that doesn't warrant waiting more than a year for a project to be shared.