Discuss Scratch

gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

(No, this is not me asking how to get posts on Scratch. This is about how you implement getting them. Also, I'm not encouraging you to make another library for automating Scratch. We had too much of that already.)
tl;dr: Of these five methods of getting posts, which one would you choose for your library?

Say you want to make an API library for Scratch or something else. You want to make your library multi-user, so that you can set up multiple users at once. You do this by implementing the Session class, which you use like this:
/* This was initially written in Python, but I translated it into JS because:
1. We're seriously lacking JS libraries on Scratch (not that we need another one)
2. We're getting too many Python libraries for Scratch
*/
import {Session} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
Now say you want to make a post or something else using one of the sessions. The problem is you don't know how to specify which session to use.
As I see it, there are five ways to achieve this:

• Each object stores their creator
On each object, there is a variable holding the session that created it. The object uses this variable for functions that require a session. This approach is the most common I've seen implemented here; I know scratchclient and scratchattach use it.
import {Session} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
session1.getPost(123).edit("Edit 1"); // uses session1
session2.getPost(456).edit("Edit 2"); // uses session2
The problem with this solution is that objects that are supposed to store only data also have to store this Session thing. If you construct a post object yourself, you have to specify which session its functions will use. This could be a problem if, say, you're saving these objects in a database.
import {Session, Post} from "library";
let session = new Session().login("username", "password");
let post = new Post({content: "This is a post.", session});
Also, there's a concern that having something important stored opaquely could bring unexpected behavior, and is perhaps a bug waiting to happen. In fact, having a state at all is undesirable to some people, allegedly because it's harder to test that way (maybe the object-oriented brains can't justify that proposition).
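To make the database concern concrete, here is a minimal sketch of this approach. The `Session` and `Post` classes below are made-up stand-ins (no real library is imported and nothing is sent over the network), and the `log` array is only there to show which session was used. One possible way to keep the stored session out of saved data is a `toJSON` method, which `JSON.stringify` calls automatically.

```javascript
// Hypothetical stand-ins for the library's classes; no network calls are made.
class Session {
  constructor(username) {
    this.username = username;
    this.log = []; // records what would have been sent to the server
  }
  getPost(id) {
    // The session hands itself to every object it creates.
    return new Post({ id, content: "", session: this });
  }
}

class Post {
  constructor({ id, content, session }) {
    this.id = id;
    this.content = content;
    this.session = session; // the stored creator
  }
  edit(content) {
    this.content = content;
    this.session.log.push(`${this.session.username} edited post ${this.id}`);
    return this;
  }
  // JSON.stringify uses toJSON, so only the plain data is serialized.
  toJSON() {
    return { id: this.id, content: this.content };
  }
}

const session1 = new Session("username1");
const post = session1.getPost(123).edit("Edit 1");
const saved = JSON.stringify(post); // the session is left out of the saved data
```

Loading the post back from the database would still require deciding which session to attach, so this only addresses the saving half of the problem.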

• Each object requires a session as an input
For all the object functions that use a session, the session to use must be passed as an argument.
import {Session} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
session1.getPost(123).edit(session2, "Edit 1"); // fetched with session1, edited with session2
session2.getPost(456).edit(session1, "Edit 2"); // fetched with session2, edited with session1
This works, but the problem is really obvious: it gets very tedious to type out what session to use every time you want to do something using them. This is especially a problem if you only want to use just one session, instead of multiple.
Now, there is an alternative way to use this solution in a more elegant manner; see Make the session do the work below.
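A rough sketch of what the object side of this could look like, again with made-up `Session` and `Post` classes and a `log` array standing in for real requests. The point it tries to show is that the post never holds a reference to a session, so it stays a plain data object.

```javascript
// Hypothetical stand-ins; the log array replaces real requests.
class Session {
  constructor(username) {
    this.username = username;
    this.log = [];
  }
}

class Post {
  constructor({ id, content = "" }) {
    this.id = id;
    this.content = content;
  }
  // The session is always the first argument; the post never stores it.
  edit(session, content) {
    session.log.push(`${session.username} edited post ${this.id}`);
    return new Post({ id: this.id, content });
  }
}

const session1 = new Session("username1");
const session2 = new Session("username2");

const post = new Post({ id: 123 });
const edited = post.edit(session2, "Edit 1"); // the caller picks the session each time
```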

• Each object uses some default session
There's a global variable containing the default session, set by the `make_default` function. Objects that need a session will use this variable.
import {Session} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
session1.make_default();
session1.getPost(123).edit("Edit 1"); // uses session1
session2.getPost(456).edit("Edit 2"); // also uses session1 to edit the post
An alternative, fancier method is to create a session context, which works even better if your programming language of choice supports it natively:
import {Session, Post} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
let postID;
session1.use(() => {
    postID = new Post({content: "Hello, world!"}).submit(8).id;
});
session2.use(() => {
    new Post({content: "This is a reply."}).reply(postID);
});
The Python equivalent, using the with statement:
from library import Session, Post
session1 = Session().login("username1", "password1")
session2 = Session().login("username2", "password2")
with session1:
    postID = Post(content="Hello, world!").submit(8).id
with session2:
    Post(content="This is a reply.").reply(postID)
There's another way to implement session contexts; see Each object is wrapped on a context.

While this is an acceptable solution, it could create code repetition on the library, if we want to allow programmers to pick an explicit session like the first example. Nothing that function call overrides couldn't fix though, but it is still a point to consider.
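For illustration, here is one way `Session.use` could be sketched in JavaScript, using a stack of active sessions. Everything here is hypothetical: the `current` getter, the `log` array, and the optional `session` parameter on `submit` are not from any real library. The optional parameter is one possible answer to the repetition concern, since the same function serves both the implicit and the explicit case.

```javascript
// Hypothetical sketch; the log array stands in for real requests.
class Session {
  static #stack = []; // the innermost active session is last

  constructor(username) {
    this.username = username;
    this.log = [];
  }

  static get current() {
    const stack = Session.#stack;
    if (stack.length === 0) throw new Error("no active session");
    return stack[stack.length - 1];
  }

  use(callback) {
    Session.#stack.push(this);
    try {
      return callback();
    } finally {
      Session.#stack.pop(); // restore the previous session even if callback throws
    }
  }
}

class Post {
  constructor({ content }) {
    this.content = content;
  }
  // Default parameters are evaluated at call time, so this picks up
  // whichever session is active when submit() runs.
  submit(topicId, session = Session.current) {
    session.log.push(`${session.username} posted in topic ${topicId}`);
    return this;
  }
}

const session1 = new Session("username1");
const session2 = new Session("username2");
session1.use(() => {
  new Post({ content: "Hello, world!" }).submit(8);           // implicit: session1
  new Post({ content: "From session2" }).submit(8, session2); // explicit override
});
```

A plain stack like this only works for synchronous callbacks; code that awaits inside `use` would need a different mechanism to track the active session.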

• Make the session do the work
Instead of calling the objects, call the session instead. The session takes an object as the input, rather than the inverse.
import {Session, Post} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
session1.edit(new Post({content: "Edit 1", id: 123}));
session2.edit(new Post({content: "Edit 2", id: 456}));
The problem with this solution is that the definition of Session would be massive, potentially the biggest in your library. This would be considered an anti-pattern by those who think different types of actions should be written in different parts of a program (see SOLID, specifically the single-responsibility principle).
One possible solution to this is to override the attribute getter of Session to call the functions of the input, if the language supports it. Internally, this is just the Each object requires a session as an input solution, but it is written in a much more elegant way.
Another problem is that function chaining/composition is now impossible with this method. I mean, you could, but…
import {Session} from "library";
let session = new Session().login("username", "password");
session.edit(session.getPost(123).update({content: "Edit 1"}));
…don't.
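As for overriding the attribute getter of Session, here is a hedged sketch of how that might look in JavaScript with a `Proxy`. The classes are made-up stand-ins, and the forwarding rule (any unknown method name is looked up on the first argument) is just one possible design, not something an existing library is known to do.

```javascript
// Hypothetical stand-ins; the log array replaces real requests.
class Post {
  constructor({ id, content = "" }) {
    this.id = id;
    this.content = content;
  }
  // Internally this is still "each object requires a session as an input".
  edit(session, content) {
    session.log.push(`${session.username} edited post ${this.id}`);
    this.content = content;
  }
}

class Session {
  constructor(username) {
    this.username = username;
    this.log = [];
    // Returning an object from a constructor replaces `this`,
    // so `new Session(...)` yields the proxy.
    return new Proxy(this, {
      get(target, prop, receiver) {
        if (typeof prop === "symbol" || prop in target) {
          return Reflect.get(target, prop, receiver);
        }
        // Unknown method: forward it to the object, passing the session first.
        return (object, ...args) => object[prop](receiver, ...args);
      },
    });
  }
}

const session1 = new Session("username1");
const post = new Post({ id: 123 });
session1.edit(post, "Edit 1"); // forwarded to post.edit(session1, "Edit 1")
```

With this, Session itself stays small, because the actual `edit` logic lives on `Post`.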

• Each object is wrapped on a context (new!)
When a session creates an object, instead of returning it directly, it wraps the object in a session context. The wrapper provides the object with the session to use. Session contexts can also be created from the objects themselves.
import {Session, Post} from "library";
let session1 = new Session().login("username1", "password1");
let session2 = new Session().login("username2", "password2");
// Implicit wrapping: Session.getPost wraps the result in a session context
session1.getPost(123).edit("Edit 1"); // uses session1
// Explicit wrapping: Post.using wraps itself in a session context
new Post({id: 456}).using(session2).edit("Edit 2"); // uses session2
This provides a compromise between the Each object stores their creator solution and the other solutions, and a more straightforward take on Each object requires a session as an input. The session context gives some shared state to an otherwise data-only object. In a way, the context “extends” the capabilities of the object it wraps.

However, the context somewhat distances the written call (for lack of a better term) from the actual call (ditto). Since you're calling the context, not the object, the context needs to know which of the object's functions it can use. That's not a problem for languages that can intercept function calls (I heard JavaScript has Proxy) or simulate getting and setting properties of any name. But for those that can't, you either need to define what calls objects should have, or I suppose have a single “do this” function, a conglomerate of every single action the objects can do, with the method passed as an argument.
Correct me if I'm wrong, but I also think that this solution is teetering into the functional paradigm. Not that it matters, since contrary to popular belief, OOP and FP aren't polar opposites (in fact they can be orthogonal IIRC). But it's still something to keep in mind.
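Since Proxy came up, here is a small sketch of how the wrapping could be done with it. The `withSession` helper, the classes, and the `log` array are all hypothetical. The wrapper intercepts every method lookup and prepends the session to the arguments, so the wrapped object itself stays data-only.

```javascript
// Wrap an object so that every method call receives the session as its first argument.
function withSession(object, session) {
  return new Proxy(object, {
    get(target, prop) {
      // Re-wrapping replaces the session instead of receiving the old one.
      if (prop === "using") return (other) => withSession(target, other);
      const value = target[prop];
      if (typeof value !== "function") return value;
      return (...args) => value.call(target, session, ...args);
    },
  });
}

// Hypothetical stand-ins; the log array replaces real requests.
class Post {
  constructor({ id, content = "" }) {
    this.id = id;
    this.content = content;
  }
  edit(session, content) {
    session.log.push(`${session.username} edited post ${this.id}`);
    this.content = content;
  }
  using(session) {
    return withSession(this, session); // explicit wrapping
  }
}

class Session {
  constructor(username) {
    this.username = username;
    this.log = [];
  }
  getPost(id) {
    return new Post({ id }).using(this); // implicit wrapping
  }
}

const session1 = new Session("username1");
const session2 = new Session("username2");
session1.getPost(123).edit("Edit 1"); // uses session1

const post = new Post({ id: 456 });
post.using(session2).edit("Edit 2"); // uses session2
```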


Of course, these are just my solutions, using a language I know, conforming to the object-oriented paradigm (or at least my perception of it). There might be some other solutions that you might devise from other paradigms, or just from your own ingenuity. You may bring them up as a sixth+ option, but the question I'm asking here is “from these methods, which one would you use if you want to make a library like this?”

For me, I've used the Each object uses some default session solution, without the explicit session feature, instead encouraging programmers to use session contexts (since Python has them). That really makes my library look more like NumPy and less like scratchattach, though, so I've decided to implement the Each object is wrapped on a context solution instead. How about yours?

Last edited by gilbert_given_189 (Jan. 15, 2025 18:58:16)

Redstone1080
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

fyi, do not use the with statement in javascript, mdn web docs has some good reasons why

as for my solution, i think have the session do it is the best one, code principles be darned

Last edited by Redstone1080 (Aug. 23, 2024 01:18:56)

gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

bump
BigNate469
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

I would just do
curl https://scratch-mit-edu.ezproxyberklee.flo.org/discuss/feeds/topic/[1-800000] -o file#1.xml
(which gives an XML representation of each topic, including each post and the person who posted it), and watch the HTTP 429 errors come in.
gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

BigNate469 wrote:

I would just do
curl https://scratch-mit-edu.ezproxyberklee.flo.org/discuss/feeds/topic/[1-800000] -o file#1.xml
(which gives an XML representation of each topic, including each post and the person who posted it), and watch the HTTP 429 errors come in.
But what if you need to provide a nonce to do something? You can't easily do that using curl, right?
(I was trying to be general, you see.)

Last edited by gilbert_given_189 (Aug. 30, 2024 23:46:45)

TheCreatorOfUnTV
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

NOOOOOOOOOOOOOOOOO, not another forum scraper!
Edit: I didn't read the original post completely.

Last edited by TheCreatorOfUnTV (Aug. 27, 2024 01:10:46)

gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

bump

(1 left)
gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

bump
(last one)
ajskateboarder
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

I feel like the “Each object stores their creator” approach is the most straightforward to me
gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

Initially I wanted to close this topic, but considering that I've only got 2 responses so far, I think I'll break my promise of the last post and make this the final bump.

Edit: and no one responded. Oh well.

Last edited by gilbert_given_189 (Oct. 13, 2024 03:26:21)

gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

I've decided to reopen this topic. I got a new solution whilst developing my library, and I thought I'd include it here as well.

I won't close the topic this time. I'd rather have somebody constructively necropost here rather than losing feedback from a closed topic.
imfh
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

gilbert_given_189 wrote:

BigNate469 wrote:

I would just do
curl https://scratch-mit-edu.ezproxyberklee.flo.org/discuss/feeds/topic/[1-800000] -o file#1.xml
(which gives an XML representation of each topic, including each post and the person who posted it), and watch the HTTP 429 errors come in.
But what if you need to provide a nonce to do something? You can't easily do that using curl, right?
(I was trying to be general, you see.)
A nonce? You mean you want to prove that you scraped the page and that it is legit?
gilbert_given_189
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

imfh wrote:

gilbert_given_189 wrote:

BigNate469 wrote:

I would just do
curl https://scratch-mit-edu.ezproxyberklee.flo.org/discuss/feeds/topic/[1-800000] -o file#1.xml
(which gives an XML representation of each topic, including each post and the person who posted it), and watch the HTTP 429 errors come in.
But what if you need to provide a nonce to do something? You can't easily do that using curl, right?
(I was trying to be general, you see.)
A nonce? You mean you want to prove that you scraped the page and that it is legit?
Yep. Sites that lack an actual API do that sometimes. (I know, because that's what I have to do for my library.)
imfh
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

Based on some quick research I did, it's not possible to get a nonce like that unless the page you are scraping provides some extra information. It would be neat if you could prove the origin of a page to others through TLS or something, but it doesn't seem like that is possible.

If whatever you are scraping cryptographically signs pages, then it would be possible to get a nonce. The vast majority of websites don't do that though.

The main other alternative is to build trust with your users so they don't feel the need to have proof or to use some other service which does have trust like archive.org.
TheSecondGilbert
Scratcher
100+ posts

How would you implement getting posts (&/ using an API)?

imfh wrote:

Based on some quick research I did, it's not possible to get a nonce like that unless the page you are scraping provides some extra information. It would be neat if you could prove the origin of a page to others through TLS or something, but it doesn't seem like that is possible.

If whatever you are scraping cryptographically signs pages, then it would be possible to get a nonce. The vast majority of websites don't do that though.

The main other alternative is to build trust with your users so they don't feel the need to have proof or to use some other service which does have trust like archive.org.
That's definitely not what I'm trying to describe here. Let me describe it more clearly.

The form that you use to send a message, or edit your profile, or anything really, has hidden inputs in it, with random names and values. You have to include them in your request; the server rejects it if you don't. You also can't send the same inputs multiple times; the server will think that you're sending the same message, and reject the request.

I call these hidden inputs a “nonce”, for two reasons:
  • The process of getting this “nonce” first before doing anything resembles nonce-based cryptographic requests. I think you're confusing what I meant by nonce with this.
  • It's a simple word that adequately describes its purpose: something that's only used once, for that single occasion. At the time of writing, I realized “idempotency keys” could be a better term, but that's quite a mouthful compared to the monosyllabic “nonce”.
To be clear, I'm not describing:
  • A nonce for the purpose of cryptography. Payloads are still sent in plaintext over HTTPS, not encrypted or signed or hashed with the nonce.
  • An “everything” nonce. Nonce values are only used for their requests. You can't use a nonce for sending messages to edit your profile, or vice versa.
Either way, this just proves that there are cases where a simple curl invocation isn't enough, as convenient as that could be. Especially for my library, you sometimes have to scrape the page in order to get the information you need. (Sometimes, because the service I'm making my library for does have URLs that respond with sane XML, but not for the most basic tasks like retrieving message info.)
alwayspaytaxes
Scratcher
500+ posts

How would you implement getting posts (&/ using an API)?

in languages like go or c which i am a diehard fan for it would definitely have to be Each object requires a session as an input
imfh
Scratcher
1000+ posts

How would you implement getting posts (&/ using an API)?

TheSecondGilbert wrote:

imfh wrote:

Based on some quick research I did, it's not possible to get a nonce like that unless the page you are scraping provides some extra information. It would be neat if you could prove the origin of a page to others through TLS or something, but it doesn't seem like that is possible.

If whatever you are scraping cryptographically signs pages, then it would be possible to get a nonce. The vast majority of websites don't do that though.

The main other alternative is to build trust with your users so they don't feel the need to have proof or to use some other service which does have trust like archive.org.
That's definitely not what I'm trying to describe here. Let me describe it more clearly.

The form that you use to send a message, or edit your profile, or anything really has hidden inputs on them, with random names and values. You have to include them on your request; the server rejects it if you don't. You can't also send the same inputs multiple times; the server will think that you're sending the same message, and rejects the request.

I call these hidden inputs a “nonce”, for two reasons:
  • The process of getting these “nonce” first before doing anything bears resemblance of any nonce-based cryptographic requests. I think you're confusing what I meant by nonce with this.
  • It's a simple word that adequately describes its purpose: something that's only used once, for that single occasion. At the time of writing, I realized “idempotency keys” could be a better terminology, but that's quite a mouthful over the monosyllabic “nonce”.
To be clear, I'm not describing:
  • A nonce for the purpose of cryptography. Payloads are still sent in plaintext over HTTPS, not encrypted or signed or hashed with the nonce.
  • An “everything” nonce. Nonce values are only used for their requests. You can't use a nonce for sending messages to edit your profile, or vice versa.
Either way, this is just proving that there are cases where a simple curl invocation isn't enough, as convenient as that could be. Especially for my library, you sometimes have to scrape the page in order to get the information you need. (Sometimes, because the service I'm making my library on does have URLs that responds with sane XMLs, but not for the most basic tasks like retrieving message info)
Oh, by nonce, do you mean CSRF tokens? For example, in this current page, I have an element that looks like this:
<input type="hidden" name="csrfmiddlewaretoken" value="secretvalueiprobablyshouldavoidsharing">
At least for this page, it seems like the tokens are still present when I request with curl. If you plan to submit any forms to a website, you will need to extract the tokens from the page though. Some websites or pages might be different and not give the csrf token if you are not logged in or something.
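As a rough sketch of that extraction step, the function below pulls the hidden inputs out of a page's HTML so they can be sent back with the form. It is an illustration only: `extractHiddenInputs`, the sample page, and the `body` field name are made up, and the regular expressions only handle double-quoted attributes, so a real HTML parser would be the safer choice.

```javascript
// Collect name/value pairs from <input type="hidden"> tags in an HTML string.
function extractHiddenInputs(html) {
  const fields = {};
  for (const [tag] of html.matchAll(/<input\b[^>]*>/gi)) {
    // Read one double-quoted attribute from the tag, or undefined if absent.
    const attr = (name) => {
      const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*"([^"]*)"`, "i"));
      return m ? m[1] : undefined;
    };
    const type = attr("type");
    const name = attr("name");
    if (type !== undefined && type.toLowerCase() === "hidden" && name !== undefined) {
      const value = attr("value");
      fields[name] = value === undefined ? "" : value;
    }
  }
  return fields;
}

// A made-up page; a real one would be fetched first, with the login cookies if needed.
const page = `<form method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="text" name="body">
</form>`;

const fields = extractHiddenInputs(page);
// The hidden fields are sent back together with the user's own input.
const formBody = new URLSearchParams({ ...fields, body: "Hello" }).toString();
```

If the tokens are single-use, as described earlier in the thread, the page has to be fetched and parsed again before every submission.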
