[ai-control] prevent robots.txt entries from becoming law | Brewster Kahle of the Internet Archive, weighs in re: legally enforceable statements in robots.txt
https://alecmuffett.com/article/113737
#InternetArchive #ai #eula #llm #privacy #scraping

Damages for scraping: still no sure thing
After the BGH ruling (VI ZR 10/24), under which a loss of data constitutes non-material damage under the GDPR, there were attempts to settle the question quickly for the sake of legal certainty. However, the higher-court rulings issued since that BGH decision suggest instead that the dispute (...)
https://www.dr-datenschutz.de/schadensersatz-beim-scraping-weiterhin-kein-selbstlaeufer/
New Mango Proxy types just dropped:
Rotating DC — 1M+ IPs, instant rotation, API
Rotating ISP — real IPs, high reputation
Perfect for Ads, scraping, logins
From $0.6/GB @mangoproxy_bot
#proxy #ads #scraping #webtools #datacollection #automation #growthhacking
Ok, time to deploy Anubis in front of Gitea, I'm done with those FAANG oligarchs scraping my repos 24/7 to check if anything changed...
F*ck off.
But that also means Gitea might get unstable for some time, whoops
If you are curious: https://git.halis.io
If you see the cute furry, it worked
Turn on the lights for internet... if you can afford cloudflare:
https://ugpl.net/blog/post/turn-on-the-lights-for-internet-if-you-can-afford-cloudflare.html
Paid #scraping: toward a radical change in the business model of generative #AI? #payant #IA #générative
Civil Society: Cloudflare’s latest change {blocks, unblocks} network use by {people, software} that we {hate, love} – {yay, boo} this is {great, terrible}!
https://alecmuffett.com/article/113629
#ai #censorship #cloudflare #scraping
@akamran @davidtoddmccarty If you search Google for #Mastodon hashtag scraping, you find software and programs that help AI do exactly that. It exists.
The fact is that, as of today, the main instances mastodon.social and mastodon.online officially prohibit #scraping: https://techcrunch.com/2025/06/17/mastodon-updates-its-terms-to-prohibit-ai-model-training/
The problem with decentralisation: admins and users of other instances must become aware of the problem and change their terms, too.
It may be funny but it's no joke.
#Note on the #usability of #Data #Analytics / #Data #Science #methods such as #Scraping, #Pattern #Recognition, #Machine #Learning, or #Text #Mining for #sociological #research.
#Sutter / #Maasen - #Neuerfindung #Soziologie, p. 76 f., 2020, DOI: 10.5771/9783845295008-73
Turn out the lights, the internet is over:
https://ugpl.net/blog/post/turn-out-the-lights-the-internet-is-over.html
@anirvan @404mediaco the only way to deal with this is the same as with any other #malware and #DDoS:
I do maintain a #blocklist of those and will happily accept suggestions and pull requests...
https://github.com/greyhat-academy/lists.d/blob/main/scrapers.ipv4.block.list.tsv
My anti society
collision course
I charted to address
my many anxieties
is rapidly
approaching the end of the line
The impending
gravity induced crash
should be quite the event
given the acceleration
of my descent
The "I told you so"
chorus of the status quo
will be very pleased
#scraping up my bloody carcass
to mount on their Warning Wall
I sure would hate
to give these smug bastards
the confirmation validation
they so desperately need
Deceased me
playing the lead
in their future forewarning
history stories
to unborn rebellious
non conformist generations
about the folly
of living life brazenly
outside the rigid boundaries
constructed with bricks of bullshit
I guess it's time to confess
my internal trepidation
Those pointless
could and should ofs
as if somehow
we are the captains
of our destinies
For you see my soon to be
grieving comrades
the die was cast
for ill fated us
the day we were born
Playing the roles
fate precisely defines
scribed in the unwavering stars
#vss365
3/
For more on scraping (as in web-scraping) see here:
https://mastodon.social/@reiver/114353728684249608
CC: @404mediaco
2/
Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.
If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.
...
And, getting data in JSON (key-value pairs) is definitely NOT scraping — as JSON's purpose is to communicate data in a machine-legible manner.
CC: @404mediaco
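To make the distinction in this thread concrete, here is a minimal Python sketch. It is only an illustration: the sample HTML, the sample JSON, the class names `message`, `author` and `text`, and the helper names `MessageScraper`, `scrape_html` and `read_json` are all invented here and do not come from the researchers' work or from any real API. The first path has to recover structure from presentation markup (scraping in the sense used above); the second just decodes data that was already sent in a machine-legible form.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample inputs, invented for illustration only.
HTML_PAGE = """
<html><body>
  <div class="message"><span class="author">alice</span><p class="text">hello</p></div>
  <div class="message"><span class="author">bob</span><p class="text">hi there</p></div>
</body></html>
"""

JSON_BODY = '[{"author": "alice", "text": "hello"}, {"author": "bob", "text": "hi there"}]'


class MessageScraper(HTMLParser):
    """Recovers author/text pairs from presentation markup (scraping)."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._field = None  # class name of the element currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "message":
            # A new message container starts: open a fresh record.
            self.messages.append({})
        elif cls in ("author", "text") and self.messages:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside an author or text element.
        if self._field:
            self.messages[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def scrape_html(page):
    """Scraping: guess the data layout from markup meant for human readers."""
    parser = MessageScraper()
    parser.feed(page)
    return parser.messages


def read_json(body):
    """Not scraping (in the thread's sense): the data is already structured."""
    return json.loads(body)


if __name__ == "__main__":
    print(scrape_html(HTML_PAGE))
    print(read_json(JSON_BODY))
```

Both paths end up with the same records, but the HTML path depends on guessed class names and breaks whenever the page layout changes, while the JSON path depends only on the documented keys.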
1/
If these researchers used a typical HTTP-based API that returns JSON, then —
What these researchers did is NOT scraping.
CC: @404mediaco
RE: https://www.404media.co/researchers-scrape-2-billion-discord-messages-and-publish-them-online/
Attention F.c.book users: class action
In 2019, a large database containing details on 400,000 F.c.book users came to light. Two years later, the database had grown to 533 million entries and was freely accessible to the public. The increase
https://www.pc-fluesterer.info/wordpress/2025/05/06/achtung-f-c-book-nutzer-innen-sammelklage/