46009361
Scratcher
1000+ posts

Automatic project copy detector

(To clarify, this isn't exactly a “duplicate.”)
We want to be able to check whether a project is an exact copy of another project before the “share” button executes its normal command. This may add load on the servers, but that's okay. Only the code of the project matters, not the title, instructions, Notes and Credits, comments, thumbnail, remixes, studios, or data in cloud variables. The code must also be in the same place: a project with cleaned-up blocks should not be matched against one whose blocks are not cleaned up.
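One way to compare only the code is to hash the scripts rather than the whole file. The sketch below assumes an SB3 file is a zip archive containing a project.json whose "targets" each hold a "blocks" mapping; `code_fingerprint` and `make_sb3` are hypothetical names made up for this illustration, not anything Scratch provides. Note that a checksum of the entire SB3 file would also change whenever a costume or sound changes, so it is stricter than "only the code matters".

```python
import hashlib
import io
import json
import zipfile


def code_fingerprint(sb3_bytes: bytes) -> str:
    """Hash only the scripts of a project, ignoring assets and metadata."""
    with zipfile.ZipFile(io.BytesIO(sb3_bytes)) as archive:
        project = json.loads(archive.read("project.json"))
    # Keep each target's blocks. Top-level blocks carry x/y positions, so
    # moving or cleaning up scripts changes the fingerprint.
    scripts = [target.get("blocks", {}) for target in project.get("targets", [])]
    canonical = json.dumps(scripts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_sb3(project: dict) -> bytes:
    """Build a tiny in-memory archive for the demo (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("project.json", json.dumps(project))
    return buffer.getvalue()
```

Two projects with the same blocks but different sprite names or costumes get the same fingerprint here, while nudging a script by one pixel produces a different one.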
Here's how I want to check for copies automatically on Scratch's servers (for new shared projects):
First, it checks if the project exactly matches the default project that you start with. If so, it shares it right away.
Otherwise, it checks if it's a remix. If so, it goes through each project credited in the remix box (the thanks box) and compares the MD5 checksum of that project's SB3 file to the one you want to share. If any of them match, it presents a warning to the user customized for that project.
Lastly, if a warning isn't presented or it's an original project, it checks every shared project, from the first one to the last. If any exact copies are found that were published earlier and are not by the same user or email address, it presents the warning.
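The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SharedProject`, `find_copy`, and the in-memory `shared` dictionary are hypothetical stand-ins for whatever the servers actually store, and the email address check is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SharedProject:
    project_id: int
    owner: str
    digest: str  # MD5 checksum of the project's SB3 bytes


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def find_copy(
    sb3: bytes,
    owner: str,
    remix_parent_ids: List[int],
    shared: Dict[int, SharedProject],
    default_digest: str,
) -> Optional[int]:
    """Return the ID of the project this one copies, or None if it may be shared."""
    digest = md5_hex(sb3)
    # Step 1: the untouched default project is shared right away.
    if digest == default_digest:
        return None
    # Step 2: for a remix, compare against the projects in the remix box first.
    for pid in remix_parent_ids:
        parent = shared.get(pid)
        if parent is not None and parent.digest == digest:
            return pid
    # Step 3: otherwise walk every shared project, from the first to the last.
    for pid in sorted(shared):
        candidate = shared[pid]
        if candidate.digest == digest and candidate.owner != owner:
            return pid
    return None
```

Step 3 is a full scan on every share, so its cost grows with the number of shared projects.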
Projects already shared before this feature is released should be taken down by the same method. The project that was shared first, counting the first time it was shared (if it was unshared and reshared) according to each person's What's Happening? section, should stay up, as its creator owns the rights to it. This applies to reshared projects too. This is to prevent a situation where someone shares a project, one of their real-life enemies downloads the SB3 file or remixes it, the original person unshares it, and the other person then (uploads and) shares the copy.
Edit: My 100th post!

Last edited by 46009361 (March 23, 2021 04:39:47)

DarthVader4Life
Scratcher
1000+ posts

I'm not sure what this is saying, but, correct me if I'm wrong, this would be hard to code and possibly take quite a bit of server space. Off topic: Congrats on 100 posts!

Last edited by DarthVader4Life (Aug. 16, 2021 22:21:26)

46009361
Scratcher
1000+ posts

DarthVader4Life wrote:

I'm not sure what this is saying, but, correct me if I'm wrong, this would be hard to code and possibly take quite a bit of server space. Off topic: Congrats on 100 posts!
It would be easy to code, and it would be great because existing copies would be detected automatically, with bots removing (not just reporting) them so that moderators don't have to spend time handling reports. They could also program a second server to take over the job if the first one crashes. Once all the existing copies have been found, it checks new projects going forward.
DarthVader4Life
Scratcher
1000+ posts

46009361 wrote:

DarthVader4Life wrote:

-snip-
It would be easy to code, and it would be great because existing copies would be detected automatically, with bots removing (not just reporting) them so that moderators don't have to spend time handling reports. They could also program a second server to take over the job if the first one crashes. Once all the existing copies have been found, it checks new projects going forward.
Actually, no, it wouldn't be easy to code. While I may not be an expert, I can grasp the complexity of coding it. And we don't want the servers to crash… ever…

Last edited by DarthVader4Life (Aug. 16, 2021 22:22:10)

46009361
Scratcher
1000+ posts

DarthVader4Life wrote:

46009361 wrote:

DarthVader4Life wrote:

-snip-
It would be easy to code, and it would be great because existing copies would be detected automatically, with bots removing (not just reporting) them so that moderators don't have to spend time handling reports. They could also program a second server to take over the job if the first one crashes. Once all the existing copies have been found, it checks new projects going forward.
Actually, no, it wouldn't be easy to code. While I may not be an expert, I can grasp the complexity of coding it. And we don't want the servers to crash… ever…
Yeah, it would be easy to code. First, it checks projects 1, 2, 3, etc. until it finds a shared project. It stores that number so it can continue the count later. Then it finds any exact copies of that project. Once those are taken down, it retrieves the stored number (let's say 10) and goes on to 11, 12, 13, etc. until it finds another shared (not private) project, and keeps going like that. The project IDs go up by one each time a new project is uploaded or created, so a number must be stored on the server indicating the highest project ID at the time the scan starts. It could also gather the data (SB3 file contents) of all shared projects (which only takes a few seconds for millions of them because of the way computers work), sort them, find the exact copies once everything is in alphabetical order, then go back to the original projects (because it records which project each file belongs to), and take down copies based on sharing date. Medium, but not too difficult.
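The second idea (gather everything, bring identical files together, keep the earliest) can be sketched as below. It is an illustration, not how Scratch works: it groups files with a dictionary keyed by a SHA-256 digest instead of the alphabetical sort described above (both put identical files next to each other), `find_later_copies` is a hypothetical name, and the first-shared timestamps are assumed to be available as plain numbers. The same-user exemption is left out.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def find_later_copies(projects: Dict[int, Tuple[float, bytes]]) -> List[int]:
    """Map of project ID -> (first-shared time, SB3 bytes); returns IDs to take down."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for project_id, (first_shared, sb3) in projects.items():
        digest = hashlib.sha256(sb3).hexdigest()
        groups[digest].append((first_shared, project_id))

    to_remove: List[int] = []
    for members in groups.values():
        members.sort()  # earliest first-shared time comes first
        # Keep the earliest project in each group, flag the rest.
        to_remove.extend(project_id for _, project_id in members[1:])
    return sorted(to_remove)
```

This does one pass over the data instead of comparing every project with every other project, but it still has to read and hash every shared file once.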

There is only a finite number of projects that Scratch will ever have. The servers aren't guaranteed to crash unless that number is infinite. It's not an infinite loop.

Last edited by 46009361 (March 23, 2021 15:17:17)

WolfCat67
Scratcher
1000+ posts

This suggestion seems very similar to this one from 2013, with the only key difference being that the older one is for remixes specifically. Another post from mid-2014 held a suggestion that is effectively yours with less detail, but is now closed. A neat thing is that Scratch Team member @cheddargirl had responded to it, stating that a similar feature was available in Scratch 1.4 but that adding it into later versions (specifically 2.0 at the time) was not a priority.

As for the suggestion itself, I feel like it may take up too many resources on servers, especially if multiple people are checking it at once. Now, I'll admit, I'm not experienced with computer science, but comparing one thing with a million other things that have so many differences may take a little bit more time than the “seconds” you expect (again, I am not fully sure). Though I am aware Scratch projects don't have a very large file size at all, so maybe? A simpler version could only check remixes for any changes from the original, but that would be the same suggestion as the suggestion from 2013.

Last edited by WolfCat67 (Oct. 19, 2019 20:27:25)

46009361
Scratcher
1000+ posts

WolfCat67 wrote:

This suggestion seems very similar to this one from 2013, with the only key difference being that the older one is for remixes specifically. Another post from mid-2014 held a suggestion that is effectively yours with less detail, but is now closed. A neat thing is that Scratch Team member @cheddargirl had responded to it, stating that a similar feature was available in Scratch 1.4 but that adding it into later versions (specifically 2.0 at the time) was not a priority.

As for the suggestion itself, I feel like it would be too much strain on the servers. I'm not very experienced with computer science, but I feel like sifting through over a million projects and checking so many different things over and over again can't be fast. It'd either take too long or take up too many resources to be useful. A simpler version could only check remixes for any changes from the original, but that would be the same suggestion as the suggestion from 2013.
There didn't seem to be any duplicates when I used the instructions from the stickied thread called Guide to Finding Duplicates by Wahsp.
WolfCat67
Scratcher
1000+ posts

46009361 wrote:

WolfCat67 wrote:

- snip -
There didn't seem to be any duplicates when I used the instructions from the stickied thread called Guide to Finding Duplicates by Wahsp.
I used the same strategy to find all of the ones listed. Perhaps we just used different keywords?

Last edited by Paddle2See (March 30, 2021 09:29:09)

46009361
Scratcher
1000+ posts

WolfCat67 wrote:

46009361 wrote:

WolfCat67 wrote:

- snip -
There didn't seem to be any duplicates when I used the instructions from the stickied thread called Guide to Finding Duplicates by Wahsp.
I used the same strategy to find all of the ones listed. Perhaps we just used different keywords?
Yeah, I just used “automatic project copy”.

Last edited by 46009361 (March 23, 2021 15:19:32)

Paddle2See
Scratch Team
1000+ posts

Is this proposing the same thing as this topic: Remixed Project Duplicate Check?

Starstriker3000
Scratcher
1000+ posts

Za-Chary wrote:

If you see a project that contains no noticeable changes, please use the Report button on it so the Scratch Team can take a look at it.
DarthVader4Life
Scratcher
1000+ posts

Paddle2See wrote:

Is this proposing the same thing as this topic: Remixed Project Duplicate Check?

Pretty much.

Last edited by DarthVader4Life (Aug. 16, 2021 22:22:23)

46009361
Scratcher
1000+ posts

WolfCat67 wrote:

This suggestion seems very similar to this one from 2013, with the only key difference being that the older one is for remixes specifically. Another post from mid-2014 held a suggestion that is effectively yours with less detail, but is now closed. A neat thing is that Scratch Team member @cheddargirl had responded to it, stating that a similar feature was available in Scratch 1.4 but that adding it into later versions (specifically 2.0 at the time) was not a priority.

As for the suggestion itself, I feel like it may take up too many resources on servers, especially if multiple people are checking it at once. Now, I'll admit, I'm not experienced with computer science, but comparing one thing with a million other things that have so many differences may take a little bit more time than the “seconds” you expect (again, I am not fully sure). Though I am aware Scratch projects don't have a very large file size at all, so maybe? A simpler version could only check remixes for any changes from the original, but that would be the same suggestion as the suggestion from 2013.
Replying again, as my previous reply is now mostly irrelevant. The thing is, the servers wouldn't normally crash, because it checks remixes FIRST, if there are any. If any copies are found, it stops there. Otherwise, it stores those project IDs and checks every other project (skipping the stored project IDs).

Last edited by 46009361 (Oct. 22, 2019 00:23:56)

Starstriker3000
Scratcher
1000+ posts

What about remixes that are changed but the change isn't noticeable to other Scratchers? The servers would still count those as changed remixes.
46009361
Scratcher
1000+ posts

Starstriker3000 wrote:

What about remixes that are changed but the change isn't noticeable to other Scratchers? The servers would still count those as changed remixes.
Really not part of this topic. The change may be a change in an easter egg in the game, but that is still not a copy.
Dragonlord767
Scratcher
1000+ posts

Couldn't they just change the position of one block? Or add this script?
when green flag clicked
broadcast [start v]
when I receive [start v]
do rest of script
Would it detect an art change? Couldn't you change one pixel?
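To make the concern concrete: an exact-match checksum treats any change at all as a different project. The sketch below uses two made-up byte strings as stand-ins for project data (they are not real SB3 contents), and `same_checksum` is a hypothetical helper.

```python
import hashlib


def same_checksum(a: bytes, b: bytes) -> bool:
    """True only when the two inputs hash to the same MD5 digest."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()


# Stand-in data: the second string moves one block by a single pixel.
original = b'{"opcode": "motion_movesteps", "x": 100, "y": 200}'
nudged = b'{"opcode": "motion_movesteps", "x": 101, "y": 200}'

print(same_checksum(original, original))
print(same_checksum(original, nudged))
```

The first comparison matches and the second does not, so moving one block, adding one script, or changing one pixel would be enough to get past a detector that only looks for exact copies.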
46009361
Scratcher
1000+ posts

I reported the last post by Paddle2See and it got deleted (and the topic got reopened). To clarify, this includes both remixes and non-remixes, whereas the original suggestion linked only contained remixes.

Last edited by 46009361 (March 23, 2021 04:40:19)

fdreerf
Scratcher
1000+ posts

46009361 wrote:

I reported the last post and it got deleted (and the topic got reopened). To clarify, this includes both remixes and non-remixes, whereas the original suggestion linked only contained remixes.
So it would scan through all 73,000,000 projects on the site before a project can get shared? Even if it took a millisecond, which is extremely optimistic and completely unfeasible, this would take around 20 hours to complete. If it took 500 milliseconds which is more reasonable but still unlikely, a project shared today, Tuesday, will finish on Thursday…

…May 19th, 2022.
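These figures can be checked with a short calculation. The project count and the per-project times are the ones quoted above; the start date is an assumption taken from the date on this post (Tuesday, March 23, 2021), and `scan_duration` is just an illustrative name.

```python
from datetime import datetime, timedelta

PROJECT_COUNT = 73_000_000  # figure quoted in the post


def scan_duration(ms_per_project: float) -> timedelta:
    """Total time to check every project one after another."""
    return timedelta(milliseconds=PROJECT_COUNT * ms_per_project)


fast = scan_duration(1)    # 1 ms per project
slow = scan_duration(500)  # 500 ms per project

start = datetime(2021, 3, 23)  # assumed start: the Tuesday this post is dated
finish = start + slow

print(round(fast.total_seconds() / 3600, 1))  # hours for the 1 ms case
print(slow.days)                              # whole days for the 500 ms case
print(finish.date(), finish.weekday())        # weekday(): Monday is 0
```

With these inputs the first case comes to about 20.3 hours and the second to about 422 days, which puts the finish on a Thursday in May 2022.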

Last edited by fdreerf (March 23, 2021 04:45:03)

46009361
Scratcher
1000+ posts

fdreerf wrote:

46009361 wrote:

I reported the last post and it got deleted (and the topic got reopened). To clarify, this includes both remixes and non-remixes, whereas the original suggestion linked only contained remixes.
So it would scan through all 73,000,000 projects on the site before a project can get shared? Even if it took a millisecond, which is extremely optimistic and completely unfeasible, this would take around 20 hours to complete. If it took 500 milliseconds which is more reasonable but still unlikely, a project shared today, Tuesday, will finish on Thursday…

…May 19th, 2022.
You didn't specify (I had to figure it out) that a millisecond was for just one project. It would also take time to switch from one project ID to the next and check whether it's shared (i.e. switching tasks). And then imagine if you unshare and reshare: it would take even more work. However, imagine if a project became super popular while also being a copy of an unpopular shared project. That is what happened to WO997's project: someone copied it and got featured, and then years later another user got curated by LlamaGodLuke for copying the same plane project (apart from the thumbnail) and saying it was 100% original.

Because of timezones, it's still Monday for me right now.

Last edited by 46009361 (March 23, 2021 05:14:53)

the2000
Scratcher
1000+ posts

46009361 wrote:

However, imagine if a project became super popular while also being a copy of an unpopular shared project. That is what happened to WO997's project: someone copied it and got featured, and then years later another user got curated by LlamaGodLuke for copying the same plane project (apart from the thumbnail) and saying it was 100% original.
Uh, what's your point? We all know that stealing is bad, but that doesn't warrant waiting over a year for a project to be shared.
