Discuss Scratch
- ajskateboarder
-
1000+ posts
Text similarity
I did a little looking for duplicates and couldn't find anything similar except for this, but it was quite ambiguous and is closed anyways. Let me know if there is one
This was edited quite a bit so any posts in the first page are basically irrelevant
The blocks
Finding same/different letters
([same v] letters in [text] and [t3xt]::operators) // txt
Suggested by cookieclickerer here (idk why I was insistent on keeping my old blocks lol these are far more useful (thanks))
This should report the letters that are the same between both texts by looking character-by-character. This would account for this case as well:
([same v] letters in [act] and [cat]::operators) // t
There should also be an alternative block that looks for different characters:
([different v] letters in [text] and [t3xt]::operators) // 3
([different v] letters in [act] and [cat]::operators) // ca
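For anyone curious, the same/different reporters above could be sketched in Python roughly like this. The function names `same_letters` and `different_letters` are made up for illustration, and two behaviours are assumptions read off the examples rather than anything stated in the post: texts are compared position by position up to the shorter length, and the "different" reporter returns the characters from the second text.

```python
def same_letters(a: str, b: str) -> str:
    """Characters that match at the same position in both texts."""
    # zip stops at the shorter text, so trailing characters are ignored (assumption).
    return "".join(x for x, y in zip(a, b) if x == y)


def different_letters(a: str, b: str) -> str:
    """Characters of the second text that differ from the first at the same position."""
    # Reporting the second text's characters is an assumption based on the examples.
    return "".join(y for x, y in zip(a, b) if x != y)


print(same_letters("text", "t3xt"))       # txt
print(same_letters("act", "cat"))         # t
print(different_letters("text", "t3xt"))  # 3
print(different_letters("act", "cat"))    # ca
```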
Finding nth most similar item from a list
(item of [list v] that is (2) closest to (input text)::list) // the 2nd most similar item
This finds the edit distance of every item in the list given the input text, which would look like this under the hood (pseudocode):
add ((length of (item (n) of [input list v])) - (length of (same letters in (item (n) of [input list v]) and (input text) ::operators))) to [distances v] // a number; the lower the number, the more similar it is
And it would pick the nth smallest item from that list. You don't need to know any of that, I was just adding it as a technical detail
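As a rough Python sketch of the pseudocode above: each item gets a distance equal to its length minus the number of letters it shares (position by position) with the input text, and the reporter picks the nth smallest. Note this is a simpler measure than a full Levenshtein edit distance. The names `nth_closest` and `same_letters`, the 1-based `n`, and the tie-breaking by list order are all assumptions for illustration.

```python
def same_letters(a: str, b: str) -> str:
    """Characters that match at the same position in both texts."""
    return "".join(x for x, y in zip(a, b) if x == y)


def nth_closest(items: list[str], text: str, n: int) -> str:
    """Return the item with the nth smallest distance to text (n starts at 1)."""
    def distance(item: str) -> int:
        # Lower means more similar, as in the pseudocode's [distances] list.
        return len(item) - len(same_letters(item, text))

    # sorted() is stable, so ties keep their original list order (assumption).
    ranked = sorted(items, key=distance)
    return ranked[n - 1]


fruits = ["apple", "appel", "bananana", "cherry"]
print(nth_closest(fruits, "banana", 1))  # bananana
print(nth_closest(fruits, "banana", 2))  # apple
```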
Last edited by ajskateboarder (Dec. 11, 2023 15:52:39)
- cookieclickerer33
-
1000+ posts
Text similarity
Seems a little too complex and vague for scratch
- ajskateboarder
-
1000+ posts
Text similarity
The tldr is: these reporters basically tell you how similar one string is to another.
You don't really have to understand the background knowledge
- IndexErrorException
-
500+ posts
Text similarity
I did a little looking for duplicates and couldn't find anything similar except for this, but it was quite ambiguous and is closed anyways. Let me know if there is one
Here's some context: https://en.wikipedia.org/wiki/Approximate_string_matching and https://en.wikipedia.org/wiki/Levenshtein_distance
The blocks
Similarity between two texts
(similarity between [text] and [t3xt] ::operators) // value from 0-100
This would use Levenshtein distance as mentioned in the context
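A minimal Python sketch of how a 0-100 similarity value could be derived from Levenshtein distance. The scaling (one minus distance divided by the longer text's length) is only one common choice and is an assumption here, as are the function names; the post does not say how the block would map distance to a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity from 0 to 100, scaled by the longer text (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("text", "t3xt"))        # 75.0
```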
Find nth similar item from list with text
(most similar item in (list ::list) with [text] ::operators)
So if I have a list with “apple”, “appel”, “bananana” and “cherry”, and I look for the item that is the most similar to “banana”, then this block would report “bananana”
(item (2) of most similar in (list ::list) with [text] ::operators)
Basically the previous block but it could report the 2nd most similar, the 3rd, and so on
–
You could make:
- a Q&A system
- a spell-checker
- any real world use of fuzzy string matching
This is cool, but I would rather it be similar to string.compareTo() in Java, which compares two strings' lexicographic distance from each other: a fancy way of saying how close they are, alphabetically speaking.
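For comparison, here is a rough Python imitation of what Java's compareTo does: it returns the difference between the first pair of characters that differ, or the difference in lengths if one text is a prefix of the other. The name `compare_to` is hypothetical, and using `ord()` (code points) instead of Java's UTF-16 code units is a simplification that only matters for characters outside the Basic Multilingual Plane.

```python
def compare_to(a: str, b: str) -> int:
    """Negative if a sorts before b, zero if equal, positive if a sorts after b."""
    for x, y in zip(a, b):
        if x != y:
            # Difference between the first mismatching characters.
            return ord(x) - ord(y)
    # No mismatch in the shared prefix: the shorter text sorts first.
    return len(a) - len(b)


print(compare_to("apple", "apply"))  # -20
print(compare_to("app", "apple"))    # -2
print(compare_to("cat", "cat"))      # 0
```

Note that this measures alphabetical order, not how many edits apart two texts are, so "apple" and "apply" score -20 even though they differ by one letter.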
- MagicCoder330
-
1000+ posts
Text similarity
How would it tell how “similar” something is to another thing?
- Gamer_Logan819
-
1000+ posts
Text similarity
It is stated in the post. Check the top two links.
- qwerty_wasd_gone
-
1000+ posts
Text similarity
(most similar item in (list ::list) with [text] ::operators)
That is not what it should be, here's a better one.
(most similar item in [list v] with [text] ::list)
(item (2) of most similar in (list ::list) with [text] ::operators)
Basically the previous block but it could report the 2nd most similar, the 3rd, and so on
(item (2) of most similar item in [list v] with [text] ::list)
But this one is confusing: the text on the block says nothing about what you're saying it's supposed to do. It just looks like it's made for 2D lists (which are rejected).
- ajskateboarder
-
1000+ posts
Text similarity
It would be the nth most similar item from a list, not just the most similar. This reporter doesn't relate to 2D lists, although I don't know how to describe it better. Would it be like this?
(item [] nd of most similar item in [list v] with [text] :: list)
But that wouldn't always be correct, grammatically :/
Last edited by ajskateboarder (Nov. 7, 2023 19:45:41)
- qwerty_wasd_gone
-
1000+ posts
Text similarity
It still has the same issue: it's confusing to see what it does without having to view this topic, and it still looks related to 2D lists even though it isn't.
I think this one would be better:
(item of [list v] that is ()nd closest to [text] :: list)
I just came up with it.
Alternative:
(item of [list v] that is () closest to [text] :: list)
Last edited by qwerty_wasd_gone (Nov. 7, 2023 19:53:56)
- SSScratcherSSS
-
78 posts
Text similarity
Support. This could be interesting in some complex projects and could make Scratch’s possibilities greater. qwerty_wasd_gone’s ideas should be used if this were to be implemented, to make it clearer what the block would do.
Last edited by SSScratcherSSS (Nov. 8, 2023 00:18:38)
- cookieclickerer33
-
1000+ posts
Text similarity
i know what they do and (now) how they work, but this just seems to be a little vague. you can't really tell what it would do at a glance, and it's too big and cumbersome.
- yadayadayadagoodbye
-
1000+ posts
Text similarity
I was going to say that an issue was ambiguity, but after looking at the Wikipedia article, I found out that the process was extremely straightforward.
- qwerty_wasd_gone
-
1000+ posts
Text similarity
Well, I support, because this would be helpful in “spelling-bee” games, and others.
- mcsquaggle
-
500+ posts
Text similarity
does case matter?
<[OMG] is similar to [OmG]?::operators> // false?
<[OMG] is similar to [OMG]?::operators> // true?
support.
- ajskateboarder
-
1000+ posts
Text similarity
This isn't the block I'm suggesting; see the OP.
- mcsquaggle
-
500+ posts
Text similarity
i saw, i just made it a boolean because that could work + i don't know why you would need something to report a number that should be true/false
- ajskateboarder
-
1000+ posts
Text similarity
It reports how similar texts are from 0% to 100%. That's far less confusing than reporting “false?” or “true?”. Plus you can do this:
if <(similarity between [OMG] and [OmG] ::operators) > [75]> then
say [It's similar enough for me]
end
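On the case question, here is a small Python sketch of the threshold check above, assuming a Levenshtein-based percentage. Whether the block would ignore case is not settled in this thread, so the `ignore_case` flag, the function names, and the scaling by the longer text are all assumptions. It shows that the answer changes the result for "OMG" vs "OmG".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def similar_enough(a: str, b: str, threshold: float = 75.0,
                   ignore_case: bool = False) -> bool:
    """True when the similarity percentage is strictly above the threshold."""
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    percent = 100.0 if longest == 0 else 100.0 * (1 - levenshtein(a, b) / longest)
    return percent > threshold


print(similar_enough("OMG", "OmG"))                    # False (about 66.7%)
print(similar_enough("OMG", "OmG", ignore_case=True))  # True (100%)
```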
- mcsquaggle
-
500+ posts
Text similarity
oooohhhhhhh
- KingRat_1
-
100+ posts
Text similarity
no support
not sure how this would work or where it'd be necessary.