Discuss Scratch

ajskateboarder
Scratcher
1000+ posts

Text similarity

I looked for duplicates and couldn't find anything similar except for this, but it was quite ambiguous and is closed anyway. Let me know if there is one.

This post has been edited quite a bit, so replies on the first page may refer to older versions of the blocks.

The blocks

Finding same/different letters

([same v] letters in [text] and [t3xt]::operators) // txt

Suggested by cookieclickerer33 here (idk why I was insistent on keeping my old blocks lol, these are far more useful (thanks))

This should report the letters that match at the same position in both texts, comparing character by character. Because position matters, it also handles this case:

([same v] letters in [act] and [cat]::operators) // t

There should also be an alternative option that reports the characters of the second text that differ:

([different v] letters in [text] and [t3xt]::operators) // 3
([different v] letters in [act] and [cat]::operators) // ca
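
For anyone who wants to see the idea outside of Scratch, here is a minimal Python sketch of how the two options could behave. It is only one reading of the examples above: the function names are made up, it assumes the comparison is position-by-position and case-sensitive, and it assumes leftover letters of a longer second text count as different.

```python
def same_letters(a: str, b: str) -> str:
    """Letters that match at the same position in both texts."""
    return "".join(x for x, y in zip(a, b) if x == y)


def different_letters(a: str, b: str) -> str:
    """Letters of the second text that do not match the first text."""
    mismatched = "".join(y for x, y in zip(a, b) if x != y)
    # Assumption: letters of b beyond the end of a count as different.
    return mismatched + b[len(a):]


print(same_letters("text", "t3xt"))       # txt
print(same_letters("act", "cat"))         # t
print(different_letters("text", "t3xt"))  # 3
print(different_letters("act", "cat"))    # ca
```

The printed values match the comments on the example blocks above.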

Finding nth most similar item from a list

(item of [list v] that is (2) closest to (input text)::list) // the 2nd most similar item

This computes a rough distance between the input text and every item in the list (the number of letters in the item that don't match), which would look like this under the hood (pseudocode):

add ((length of (item (n) of [input list v])) - (length of ([same v] letters in (item (n) of [input list v]) and (input text) ::operators))) to [distances v] // a number; the lower it is, the more similar the item

It would then report the item with the nth smallest distance. You don't need to know any of that; I'm just adding it as a technical detail.
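
The pseudocode above can be sketched in Python like this. Treat it as an illustration, not the definitive behavior: the function names are hypothetical, n is assumed to start at 1 like Scratch list indexes, and ties are assumed to keep their original list order.

```python
def same_letters(a: str, b: str) -> str:
    """Letters that match at the same position in both texts."""
    return "".join(x for x, y in zip(a, b) if x == y)


def distance(item: str, text: str) -> int:
    """Length of the item minus the number of matching letters."""
    return len(item) - len(same_letters(item, text))


def nth_closest(items: list[str], text: str, n: int) -> str:
    """Item with the nth smallest distance to text (n starts at 1)."""
    # sorted() is stable, so items with equal distance keep list order.
    ranked = sorted(items, key=lambda item: distance(item, text))
    return ranked[n - 1]


fruits = ["apple", "appel", "bananana", "cherry"]
print(nth_closest(fruits, "banana", 1))  # bananana
print(nth_closest(fruits, "banana", 2))  # apple
```

Note that this distance only counts letters at matching positions, so a text that is shifted by one letter would score badly even if it looks similar to a person.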

Last edited by ajskateboarder (Dec. 11, 2023 15:52:39)

cookieclickerer33
Scratcher
1000+ posts

Seems a little too complex and vague for scratch
ajskateboarder
Scratcher
1000+ posts

cookieclickerer33 wrote:

Seems a little too complex and vague for scratch
The tldr is: these reporters basically tell you how similar one string is to another

You don't really have to understand the background knowledge
ajskateboarder
Scratcher
1000+ posts

Bump
IndexErrorException
Scratcher
500+ posts

ajskateboarder wrote:

I did a little looking for duplicates and couldn't find anything similar except for this, but it was quite ambiguous and is closed anyways. Let me know if there is one

Here's some context: https://en.wikipedia.org/wiki/Approximate_string_matching and https://en.wikipedia.org/wiki/Levenshtein_distance

The blocks

Similarity between two texts

(similarity between [text] and [t3xt] ::operators) // value from 0-100

This would use Levenshtein distance as mentioned in the context

Find nth similar item from list with text

(most similar item in (list ::list) with [text] ::operators)

So if I have a list with “apple”, “appel”, “bananana” and “cherry”, and I look for the item that is the most similar to “banana”, then this block would report “bananana”

(item (2) of most similar in (list ::list) with [text] ::operators)

Basically the previous block but it could report the 2nd most similar, the 3rd, and so on



You could make:

- a Q&A system
- a spell-checker
- any real world use of fuzzy string matching

This is cool, but I would rather it be similar to String.compareTo() in Java, which compares two strings lexicographically. That's a fancy way of saying how close they are, alphabetically speaking.
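
For comparison, here is a rough Python sketch of what a compareTo-style reporter might return. It assumes the behavior of Java's String.compareTo (difference of the first pair of characters that differ, otherwise the difference in length); the function name is made up, and characters outside the Basic Multilingual Plane could give different numbers than Java would.

```python
def compare_to(a: str, b: str) -> int:
    """Negative if a sorts before b, positive if after, 0 if equal."""
    for x, y in zip(a, b):
        if x != y:
            # First position where the texts differ decides the result.
            return ord(x) - ord(y)
    # One text is a prefix of the other: the shorter one sorts first.
    return len(a) - len(b)


print(compare_to("act", "cat"))          # -2
print(compare_to("apple", "appel"))      # 7
print(compare_to("banana", "bananana"))  # -2
print(compare_to("text", "text"))        # 0
```

This measures alphabetical order rather than how alike two texts look, so "act" and "cat" end up close while "apple" and "appel" end up further apart.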
MagicCoder330
Scratcher
1000+ posts

How would it tell how “similar” something is to another thing?
Gamer_Logan819
Scratcher
1000+ posts

MagicCoder330 wrote:

How would it tell how “similar” something is to another thing?
It is stated in the post. Check the top two links
qwerty_wasd_gone
Scratcher
1000+ posts

ajskateboarder wrote:

(most similar item in (list ::list) with [text] ::operators)
That's not how it should look; here's a better one:
(most similar item in [list v] with [text] ::list)

ajskateboarder wrote:

(item (2) of most similar in (list ::list) with [text] ::operators)

Basically the previous block but it could report the 2nd most similar, the 3rd, and so on
(item (2) of most similar item in [list v] with [text] ::list)
But this one is confusing: the text on the block says nothing about what you're saying it's supposed to do. It just looks like it's made for 2D lists (which are rejected).
ajskateboarder
Scratcher
1000+ posts

qwerty_wasd_gone wrote:

ajskateboarder wrote:

(item (2) of most similar in (list ::list) with [text] ::operators)

Basically the previous block but it could report the 2nd most similar, the 3rd, and so on
(item (2) of most similar item in [list v] with [text] ::list)
But this one is confusing: the text on the block says nothing about what you're saying it's supposed to do. It just looks like it's made for 2D lists (which are rejected).
It would be the nth most similar item from a list, not just the most similar. This reporter doesn't relate to 2D lists, although I don't know how to describe it better. Would it be like this?

(item [] nd of most similar item in [list v] with [text] :: list)

But that wouldn't always be correct, grammatically :/

Last edited by ajskateboarder (Nov. 7, 2023 19:45:41)

qwerty_wasd_gone
Scratcher
1000+ posts

ajskateboarder wrote:

-snip-
It would be the nth most similar item from a list, not just the most similar. This reporter doesn't relate to 2D lists, although I don't know how to describe it better. Would it be like this?

(item [] nd of most similar item in [list v] with [text] :: list)

But that wouldn't always be correct, grammatically :/
It still has the same issue: you can't tell what it does without reading this topic, and it still looks related to 2D lists even though it isn't.

I think this one would be better:
(item of [list v] that is ()nd closest to [text] :: list)
I just came up with it.

Alternative:
(item of [list v] that is () closest to [text] :: list)

Last edited by qwerty_wasd_gone (Nov. 7, 2023 19:53:56)

SSScratcherSSS
Scratcher
78 posts

Support. This could be interesting in some complex projects and could expand what's possible in Scratch. If this were implemented, qwerty_wasd_gone’s ideas should be used to make it clearer what the block would do.

Last edited by SSScratcherSSS (Nov. 8, 2023 00:18:38)

cookieclickerer33
Scratcher
1000+ posts

ajskateboarder wrote:

cookieclickerer33 wrote:

Seems a little too complex and vague for scratch
The tldr is: these reporters basically tell you how similar one string is to another

You don't really have to understand the background knowledge
i know what they do and (now) how they work, but this just seems a little vague. you can't really tell what it would do at a glance, and it's too big and cumbersome
yadayadayadagoodbye
Scratcher
1000+ posts

I was going to say that an issue was ambiguity, but after looking at the Wikipedia article, I found out that the process is extremely straightforward
qwerty_wasd_gone
Scratcher
1000+ posts

Well, I support, because this would be helpful in “spelling-bee” games, and others.
mcsquaggle
Scratcher
500+ posts

does case matter?
<[OMG] is similar to [OmG]?::operators> // false?

<[OMG] is similar to [OMG]?::operators> // true?
support.
ajskateboarder
Scratcher
1000+ posts

mcsquaggle wrote:

does case matter?
<[OMG] is similar to [OmG]?::operators> // false?

<[OMG] is similar to [OMG]?::operators> // true?
support.
This isn't the block I'm suggesting, see the OP
mcsquaggle
Scratcher
500+ posts

ajskateboarder wrote:

mcsquaggle wrote:

does case matter?
<[OMG] is similar to [OmG]?::operators> // false?

<[OMG] is similar to [OMG]?::operators> // true?
support.
This isn't the block I'm suggesting, see the OP
i saw, i just made it a boolean because that could work + i don't know why you would need something to report a number when it should be true/false
ajskateboarder
Scratcher
1000+ posts

mcsquaggle wrote:

ajskateboarder wrote:

mcsquaggle wrote:

does case matter?
<[OMG] is similar to [OmG]?::operators> // false?

<[OMG] is similar to [OMG]?::operators> // true?
support.
This isn't the block I'm suggesting, see the OP
i saw, i just made it a boolean because that could work + i don't know why you would need something to report a number when it should be true/false
It reports how similar texts are from 0% to 100%. That's far less confusing than reporting “false?” or “true?”. Plus you can do this

if <(similarity between [OMG] and [OmG] ::operators) > [75]> then
say [It's similar enough for me]
end
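
Here is a minimal Python sketch of how a 0 to 100 similarity reporter could be computed, assuming it is based on Levenshtein distance as mentioned in the earlier version of the OP. The function names and the way the distance is scaled to a percentage are assumptions for illustration, not something the suggestion pins down.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x with y
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scaling: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are identical
    return 100 * (1 - levenshtein(a, b) / longest)


print(similarity("text", "t3xt"))                # 75.0
print(similarity("OMG".lower(), "OmG".lower()))  # 100.0
```

With this scaling, a case-sensitive comparison of "OMG" and "OmG" scores about 67, so whether the check against 75 above passes depends on whether the block ignores case.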
mcsquaggle
Scratcher
500+ posts

ajskateboarder wrote:

mcsquaggle wrote:

ajskateboarder wrote:

mcsquaggle wrote:

does case matter?
<[OMG] is similar to [OmG]?::operators> // false?

<[OMG] is similar to [OMG]?::operators> // true?
support.
This isn't the block I'm suggesting, see the OP
i saw, i just made it a boolean because that could work + i don't know why you would need something to report a number when it should be true/false
It reports how similar texts are from 0% to 100%. That's far less confusing than reporting “false?” or “true?”. Plus you can do this

if <(similarity between [OMG] and [OmG] ::operators) > [75]> then
say [It's similar enough for me]
end
oooohhhhhhh
KingRat_1
Scratcher
100+ posts

no support
not sure how this would work or where it'd be necessary.
