:tyrellmanic::elliotmanic: Big scraper! :ed::terrylol2:

Looks like they are trying to get all of fedi, including dumping following/followers lists. It also looks like (more on this later) they already have following/followers lists for at least a few hundred servers, as well as a few tens of gigs of timeline.

Hostnames: node$x.testsmall10.mastodonmeasure-pg0.utah.cloudlab.us


CNAMEs for those hostnames are hp126, hp142, hp133, hp143, hp141, hp158, hp125, hp130, and hp146.utah.cloudlab.us, respectively.

Sample requests:
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
- [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -

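To put a number on "several times a second", here is a minimal Python sketch that tallies log lines like the ones above by timestamp and by status code. The regex assumes the field layout visible in these samples (client, bracketed timestamp, quoted request line, status, bytes, referer, user agent, then extra fields); it is not an official nginx format definition, and `scraper_rate` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines only:
# <client> [<ISO-8601 time>] "<method> <path> <protocol>" <status> <bytes> "<referer>" "<user-agent>" ...
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "[^"]*" "(?P<agent>[^"]*)"'
)


def scraper_rate(lines, agent_prefix="python-requests/"):
    """Count requests from one user agent per second and per HTTP status."""
    per_second = Counter()
    per_status = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or not match.group("agent").startswith(agent_prefix):
            continue  # unparseable line, or a different client
        per_second[match.group("time")] += 1
        per_status[match.group("status")] += 1
    return per_second, per_status
```

It takes any iterable of lines, such as an open log file. On the samples above every request is a 429, so the per-status tally would contain nothing else.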
Apparently, these are the statuses of the accounts they are scraping:

406425 "completed"
1533427 "pending"
53541 "read"

Here's a 7MB gzip'd JSON file that indicates the current status of the *instances*. The account metadata is too big to attach. Anyway, this is their view of fedi:
This is their view of fsebugoutzone.org. There's a lot of interesting information in there. They've got, for example, `"processing_status": "completed"`, but as far as I can tell they never made any successful requests. (I will have to look through the logs again.) They have `"connections": "33435"`, which appears to be the peer count. They are using instances.social for the `thumbnail_proxy`, so maybe that's where they got the list of instances. They have 16,795 instances on the list, and all of them have `"processing_status": "completed"`, so the queue is probably stored somewhere else.

{
  "_id": {
    "$oid": "673f10ac2aa5fd2484163e28"
  },
  "id": "6628f572b42b8088be06b2fb",
  "name": "fsebugoutzone.org",
  "added_at": "2024-04-24T12:05:06.281Z",
  "updated_at": "2024-11-21T10:02:14.842Z",
  "checked_at": "2024-11-21T10:02:14.842Z",
  "uptime": {
    "$numberInt": "1"
  },
  "up": true,
  "dead": false,
  "version": null,
  "ipv6": false,
  "https_score": {
    "$numberInt": "10"
  },
  "https_rank": "E ",
  "obs_score": {
    "$numberInt": "70"
  },
  "obs_rank": "B ",
  "users": "1733",
  "statuses": "2072037",
  "connections": "33435",
  "open_registrations": false,
  "info": {
    "short_description": null,
    "full_description": "FSE temporary bugout zone until the new code is ready, but mostly a development instance, so it will not run like regular FSE: closed registrations (but if you had an account, your old username/password will work), I will shoot anything that annoys me while I am hacking (and if you are hoping for a warning shot, please do not annoy me until ammo prices drop), etc. Lots of things will be broken periodically because this is a development instance. Direct complaints to your friendly local anarchist @p, or if you are a total dick, just #fediblock us based on rumors/assumptions without trying to reach out, just like you did with FSE! Full explanation here: https://blog.freespeechextremist.com/blog/update-and-roadmap.html",
    "topic": "Free Speech Extremist",
    "languages": [],
    "other_languages_accepted": true,
    "federates_with": "all",
    "prohibited_content": [],
    "categories": []
  },
  "thumbnail": "/instance/thumbnail.jpeg",
  "thumbnail_proxy": "https://camo.instances.social/6079b328d69223bfa89de4e89dbfa2a691fbefdc/2f696e7374616e63652f7468756d626e61696c2e6a706567",
  "active_users": null,
  "email": "fedi@freespeechextremist.com",
  "admin": null,
  "loadtime": {
    "$date": {
      "$numberLong": "1732161080203"
    }
  },
  "processing_status": "completed"
}
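The record wraps some values in MongoDB Extended JSON markers (`$oid`, `$numberInt`, `$numberLong`, `$date`), which suggests, but does not prove, that the scraper keeps its state in MongoDB. Below is a minimal stdlib-only Python sketch for poking at the 7MB dump. It assumes the file is gzip'd canonical Extended JSON, either one document per line or a single JSON array; `load_dump` and `tally_processing_status` are hypothetical helper names. If pymongo is installed, `bson.json_util.loads` is the more complete decoder.

```python
import gzip
import json
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _unwrap_extended_json(obj):
    """json object_hook for the few Extended JSON wrappers seen in the record.

    json calls the hook on inner objects first, so {"$date": {"$numberLong": "..."}}
    arrives here as {"$date": <int>} by the time the outer object is decoded.
    """
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        if key in ("$numberInt", "$numberLong"):
            return int(value)
        if key == "$oid":
            return value  # keep the ObjectId as its hex string
        if key == "$date" and isinstance(value, int):
            # Canonical form: milliseconds since the Unix epoch (exact integer math).
            return EPOCH + timedelta(milliseconds=value)
    return obj


def load_dump(path_or_fileobj):
    """Read a gzip'd dump of instance records (assumed NDJSON or a JSON array)."""
    with gzip.open(path_or_fileobj, "rt", encoding="utf-8") as f:
        text = f.read()
    if text.lstrip().startswith("["):
        return json.loads(text, object_hook=_unwrap_extended_json)
    return [
        json.loads(line, object_hook=_unwrap_extended_json)
        for line in text.splitlines()
        if line.strip()
    ]


def tally_processing_status(records):
    """Count records by their processing_status field."""
    return Counter(r.get("processing_status") for r in records)
```

Running `tally_processing_status(load_dump(path))` over the whole file would be a quick way to double-check the "16,795 instances, all completed" observation. For reference, the `loadtime` value in the record above decodes to 2024-11-21 03:51:20.203 UTC.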
On the "Cloudlab Team" ( https://cloudlab.us/#team ) from the University of Utah:

Jacobus (Kobus) Van der Merwe (no fedi account, interesting background, https://users.cs.utah.edu/~kobus/ )
Jason Wiese (link goes to a department page)

From that same page:

> CloudLab is part of the National Science Foundation's NSFCloud program.

You may remember that @drand got caught using address space allocated to him specifically, and later that he was part of the teams presenting to the NSF on how to remove misinformation from the internet and get away with censorship.
@p @ricci @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28 @Drand I wonder if he's got us blocked or if I can call him a nigger for acting like a nigger (gibs me dat data in dis case)
@p @ricci @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28 @Drand Our timelines return 401 for all of their requests from that address space, so if they got anything to/about poast, it wasn't directly from us. There were only two instances in that JSON you posted, and both of them are tied to us by being in the bio of @silverpill as his backup account
@graf @Drand @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28 @ricci @silverpill Oh, shit, okay, yeah, see this? local_instance_name vs. remote_instance_name? They're scraping through other instances, so blocking their IP range isn't going to stop them from getting Poast accounts unless everyone Poast federates with has blocked that range.
@p @ricci @silverpill @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28 @Drand Yeah, I assumed. Same idea as blocking Google et al.: Google and other search engines index poast users and accounts through other instances. C'est la vie. Thankfully, we are blocked by a lot.
yah, clew.lol doesn't index shit, but a bunch of other instances have left some things searchable on google.

So much for community commitment to anonymity smh
@Tony @graf @ricci @silverpill @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28 @Drand Well, there's search engines, and then there are people hammering the shit out of my site several times a second, impersonating a legitimate user by shoving a bearer token in, *badly* impersonating a legitimate user by naming themselves "python-requests", and doing this to scrape 40GB of user relationships and posts.
@SilverDeth @Doll @Drand @Tony @graf @silverpill

> Fucking feds.

Pic related.

But give that report about the NSF a read if you get a chance.

@p @Tony @Doll @SilverDeth @silverpill @graf
> cia-chan
Who dared humanise the CIA? That's repugnant and absolutely unacceptable! 😤

Sure did! It was more of a rhetorical question though 😅
Anthropomorphisation of the FBI as a cute girl makes what they do more… acceptable.

@Tony @Doll @SilverDeth @silverpill @graf

@m0xee @Tony @Doll @SilverDeth @silverpill @graf That is the CIA and I don't think making the CIA a cute anime girl made me pleased with the CIA. View these images and then tell me if they make you more or less likely to run Windows XP.

Hey, Nanami Madobe made me switch to Windows 8 from Mac OS X — best mascot ever! 😍

inb4: no I do not eat at McDonald's 🤭

@Tony @Doll @SilverDeth @silverpill @graf

@m0xee @Tony @Doll @SilverDeth @silverpill @graf I don't recommend installing Windows at any point, but Cirno is 9front's mascot, so I think you're compelled to switch:
Librem Social

Librem Social is an opt-in public network. Messages are shared under Creative Commons BY-SA 4.0 license terms.
