Scalable, Efficient Processing and Analysis of Large Audio Datasets – Pawel Cyrta – ADCx Gather 2024
https://www.youtube.com/watch?v=lHME1l9cEPk
#coding #Datasets #programming #softwareengineering
From the Data Rescue Project: the Data Rescue Tracker. “The Data Rescue Tracker is a collaborative tool built to catalog existing public data rescue efforts so that we can coordinate better across initiatives. At this stage, you can use the tool to help reduce duplication of rescue efforts. The Data Rescue Tracker aims to provide a consolidated overview of who is backing up which dataset from […]
#Reddit #AI #ContentModeration #datasets
'Researchers at Cornell Tech have released a dataset extracted from more than 300,000 public Reddit communities, and a report detailing how Reddit communities are changing their policies to address a surge in AI-generated content.'
https://news.cornell.edu/stories/2025/04/dataset-reveals-how-reddit-communities-are-adapting-ai
"Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data."
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
Massive, Unarchivable #Datasets of #Cancer, #Covid, #HIV and #Alzheimer's Research Could Be Lost Forever
Days before RFK announced 10,000 #HHS staffers would lose their jobs, a message appeared on #NIH research repository sites saying they were "under review." Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher institution and the agency.
https://www.404media.co/nih-archives-repositories-marked-for-review-for-potential-modification/
https://archive.ph/Y8asq
#ListenBrainz / #MetaBrainz I'm confused. Aren't sponsors the true customer? Why use this?
On one hand #Music: "Listen together", "Ethical forever"
On the other: #DATASETS
"Some of the world’s biggest platforms such as Google and Amazon, use our data"
"We ask commercial supporters to support us in order to help fund the creation and maintenance of these datasets."
"The following organizations make use of the data-sets published by MetaBrainz"
New Map Of Landscape Beneath Antarctica Unveiled
--
https://phys.org/news/2025-03-landscape-beneath-antarctica-unveiled.html <-- shared technical article
--
https://doi.org/10.1038/s41597-025-04672-y <-- shared paper
--
#GIS #spatial #mapping #Bedmap3 #icebed #surface #thickness #gridded #datasets #Antarctica #raster #model #modeling #landscape #elevation #icesheet #survey #remotesensing #earthobservation #climatechange #warming #climate #melt #melting #seafloor #subglacial #geophysical #survey #topography #geology #bathymetry #topobathy #BritishAntarcticSurvey
@BritishAntarcticSurvey
Academic Torrents is one way to find academic #datasets with BitTorrent: https://academictorrents.com/ (I guess their indexing website is US-hosted, but it's not governmental so less likely to vanish this month.) #torrenting #science
This data may vanish under Trump, so we charted it
Some of the most valuable #datasets in human history vanished from #US #government websites; it felt like watching the Library of Alexandria go up in smoke
Many have gone on record describing the #Census Bureau’s #American Community Survey as a wonder of the modern world
Another loss? The #HouseholdPulse survey, an online survey that provided week-by-week data on income losses, economic struggles and precarious mental health
https://www.washingtonpost.com/business/2025/02/14/this-data-may-vanish-under-trump-so-we-charted-it/
https://archive.ph/mB512