Roni "Lupin" Carta shared their hacking journey targeting Google’s latest AI, Gemini, as part of the bugSWAT program. In the write-up, he detailed their approach, the techniques they used and how they ultimately discovered a vulnerability.
https://www.landh.tech/blog/20250327-we-hacked-gemini-source-code/
How not to cry when AWS breaks?
(Hint: it's not "hope and pray".)
AWS promises infinite scalability, resilience, low latency... but only if you know how to play the game.
In my latest blog post, I break down the AWS Lego blocks and explain how regions and availability zones really work: https://luminousmen.com/post/understanding-aws-regions-and-availability-zones-a-guide-for-beginners
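If you want to poke at those building blocks yourself, here's a minimal sketch using boto3 (assuming the library is installed and AWS credentials are configured; the region name is just an example):

```python
# List every region, then the availability zones inside one of them.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Regions are isolated geographic areas...
for region in ec2.describe_regions()["Regions"]:
    print(region["RegionName"])

# ...and each region contains multiple isolated availability zones.
for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], az["State"])
```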
[FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: https://luminousmen.substack.com/welcome
If you frequently create new Python projects, you've probably used cookiecutter templates. But if you're looking for an alternative with more flexibility, the copier package is a great option. In this article, Tucker Beck shows how to build and use project templates with copier.
https://blog.dusktreader.dev/2025/04/06/bootstrapping-python-projects-with-copier/
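As a taste, a minimal sketch of driving copier from Python; the template URL and answers below are placeholders, and the exact entry point (run_copy vs. the older copier.copy) depends on your copier version:

```python
# Minimal sketch: generate a project from a copier template.
# "gh:me/my-python-template" and the answers are placeholders.
from copier import run_copy  # copier >= 9; older releases used copier.copy

run_copy(
    "gh:me/my-python-template",  # any git URL or local path
    "my-new-project",            # destination directory
    data={                       # pre-filled answers to template questions
        "project_name": "my-new-project",
        "python_version": "3.12",
    },
)
```

The CLI equivalent is `copier copy gh:me/my-python-template my-new-project`.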
Tired of writing complex SQL queries?
This video course section shows how to connect Vanna.AI to PostgreSQL, enabling natural language queries through Retrieval-Augmented Generation.
Ask your database questions in plain English and get instant insights - complete with visualizations. The future of database interaction is here!
https://link.illustris.org/connectingpgtovannaai
#PostgreSQL #VannaAI #RAG #DataEngineering http://vanna.ai/
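If you'd rather skim code than watch, the flow looks roughly like this; the model name, credentials, and training DDL are placeholders, and the Vanna API may have moved since this was written (check their docs):

```python
# Rough sketch of the Vanna + PostgreSQL flow; all names and
# credentials below are placeholders.
from vanna.remote import VannaDefault

vn = VannaDefault(model="my-model", api_key="my-vanna-api-key")

# Point Vanna at the database it should query.
vn.connect_to_postgres(
    host="localhost", dbname="shop", user="readonly",
    password="secret", port=5432,
)

# Teach it the schema so the RAG step has context to retrieve.
vn.train(ddl="CREATE TABLE orders (id int, total numeric, created_at date)")

# Ask in plain English; Vanna generates and runs the SQL.
print(vn.ask("What was the total revenue last month?"))
```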
The Lies We Tell Ourselves About HA
"AWS is reliable by default"
Nope. AWS gives you tools to build reliability. It doesn't promise you won't screw it up.
"Multi-AZ is expensive, we'll just scale vertically"
Nothing is more expensive than downtime at scale.
"Our SLA says 99.9% uptime, that's good enough"
Yeah, until you realize that 99.9% = almost 9 hours of downtime a year. Now imagine explaining that to a finance team watching live transactions fail.
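The arithmetic is worth doing once:

```python
# Translate an SLA's "nines" into allowed downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8760

for sla in (0.999, 0.9995, 0.9999):
    downtime_h = HOURS_PER_YEAR * (1 - sla)
    print(f"{sla:.2%} uptime -> {downtime_h:.2f} hours of downtime/year")
```

99.90% works out to 8.76 hours a year, and the SLA doesn't care whether those hours land at 3 a.m. or mid-checkout.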
Data quality (#Datenqualität) comes up again and again in business contexts, often together with "authoritativeness", the context in which data originates, #Governance models, and so on. But in my view it's worth first clarifying the terminology and what data quality actually means. The beginning of an attempt: https://digital.ebp.ch/2025/04/29/datenqualitaet #DataManagement #DataEngineering #DataScience
My weekly newsletter is out - Airflow 3.0 review, Think Stats book, and new tutorials
https://ramikrispin.substack.com/p/review-of-airflow-30-think-stats
In my recent blog post, I said CDC solves all the performance problems because "You're not hammering the source database...": https://luminousmen.com/post/change-data-capture
That's... a bit misleading.
- Log-based CDC still creates overhead, especially in high-throughput environments.
- Reading the transaction log (WAL/redo logs) can add I/O contention if not tuned.
- If poorly configured, CDC tools like Debezium, Maxwell, you name it, can still impact DB performance - see the sketch below.
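For instance, Debezium's Postgres connector exposes knobs that decide how hard you hit the source. A hedged sketch of registering one via the Kafka Connect REST API (hostnames, credentials, and values are placeholders; option defaults vary by Debezium version):

```python
# Sketch: register a Debezium Postgres connector with throughput
# knobs dialed down. Hostnames, credentials, and values are
# placeholders; defaults and option names vary by version.
import json
import urllib.request

connector = {
    "name": "inventory-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "secret",
        "database.dbname": "inventory",
        "plugin.name": "pgoutput",   # logical decoding plugin on the source
        "slot.name": "cdc_slot",     # replication slot Debezium reads from
        "max.batch.size": "1024",    # events pulled per batch
        "poll.interval.ms": "1000",  # how often to poll for new WAL entries
    },
}

req = urllib.request.Request(
    "http://connect.internal:8083/connectors",  # Kafka Connect REST API
    data=json.dumps(connector).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

One more hidden cost: the replication slot. If the connector stalls, Postgres retains WAL for the inactive slot and disk usage on the source climbs.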
This week's newsletter will be out in 2 days. Interesting stuff by @carlk, Vuk Rosić, Federico Trotta, @treyhunner & Stephen Diehl is covered
https://newsletter.piptrends.com/p/optimize-your-python-program-for
This course by Vuk Rosić gives a complete deep dive into DeepSeek V3, a state-of-the-art deep learning model. He pairs theoretical explanations with step-by-step coding instructions, making it easier to understand the model and implement it from scratch.
Vim Language, Motions, and Modes Explained
https://www.ssp.sh/blog/why-using-neovim-data-engineer-and-writer-2023/
Recce 1.0 is now live on Product Hunt!
https://www.producthunt.com/posts/recce-4
Upvote and leave a comment to help us grow the Recce community and bring better data review processes to more data teams.
Thanks for your support!
In a microservices setup, data drift is the silent killer - services diverge, caches desync, and suddenly your customer data is a mess. I once saw a team lose days chasing a bug caused by a 15-minute replication lag. CDC steps in with a log-based lifeline, propagating changes across services with minimal latency and no tight coupling. It's not a silver bullet, but it’s damn close for distributed consistency.
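To make "log-based lifeline" concrete: downstream services just consume the change stream and update their own state. A sketch using confluent-kafka and Debezium's event envelope (the topic name, field layout, and dict-as-cache are illustrative stand-ins):

```python
# Sketch: keep a local cache in sync by consuming Debezium change
# events. Topic name, field layout, and the dict "cache" are
# illustrative stand-ins for your own services.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka.internal:9092",
    "group.id": "customer-cache-sync",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dbserver.public.customers"])

cache = {}  # stand-in for Redis, a local read model, etc.

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue  # no message, transport error, or tombstone
    payload = json.loads(msg.value())["payload"]
    if payload["op"] in ("c", "u", "r"):   # create, update, snapshot read
        row = payload["after"]
        cache[row["id"]] = row
    elif payload["op"] == "d":             # delete
        cache.pop(payload["before"]["id"], None)
```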
Lessons learned operating petabyte-scale ClickHouse clusters: Part II
https://www.tinybird.co/blog-posts/what-i-learned-operating-clickhouse-part-ii
A great example of how a lack of software/data engineering skills can cost you tens of thousands of dollars with a few clicks in the #Cloud. A Senior #Data Engineer at an undisclosed company changed a Cloud Storage lifecycle policy, moving 600 million files from Standard Storage to the Tier 2 Archive class, which cost about $60k. Those fancy hidden costs in the cloud. BTW, this charge is tucked away in the Google cost calculator under the advanced section. #dataengineering #Software #softwaredevelopment
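The trap: lifecycle transitions are billed as per-object Class A operations, so the bill scales with file count, not data size. Back-of-the-envelope (the rate below is back-solved from the post's numbers, not current GCS pricing):

```python
# Lifecycle transitions are per-object Class A operations.
# The rate below is implied by the post's numbers; check current
# GCS pricing for the real figure per storage class.
files = 600_000_000
price_per_10k_ops = 1.00  # assumed $ per 10,000 archive-tier writes

print(f"${files / 10_000 * price_per_10k_ops:,.0f}")  # $60,000
```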
Batch processing has its fans - simple, predictable, cheap. But when your ETL jobs take hours and the business needs answers now, the math doesn't add up.
CDC changes the story with incremental updates, slashing latency. See why CDC outpaces batch in my new article: https://luminousmen.com/post/change-data-capture
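The contrast in one sketch (schema is made up; the watermark query is the simplest incremental pattern, and log-based CDC goes further by tailing the transaction log instead of querying at all):

```python
# Illustrative only: batch re-scans everything, incremental reads
# only rows past the last high-water mark. Schema is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 9.99, '2025-04-02T10:00:00')")

# Batch: full re-scan every run, however little actually changed.
batch_rows = conn.execute("SELECT * FROM orders").fetchall()

# Incremental: only what changed since the last run.
last_watermark = "2025-04-01T00:00:00"
new_rows = conn.execute(
    "SELECT * FROM orders WHERE updated_at > ?", (last_watermark,)
).fetchall()

print(len(batch_rows), len(new_rows))
```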
[FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: https://luminousmen.substack.com/welcome
We all aim to make our programs faster, but have you ever tried doing the opposite? In this article, @carlk demonstrated how a simple nested loop can create a program that runs longer than the universe's lifetime. He dove into concepts like tetration (yes, it goes beyond exponentiation) and 5-state Turing machines.
https://towardsdatascience.com/how-to-optimize-your-python-program-for-slowness/
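If you just want a feel for how fast tetration blows up, here's a toy sketch (not the article's code, which builds the whole thing from increments alone):

```python
# Toy sketch: tetration is a tower of exponents, and even tiny
# inputs produce numbers no loop could ever count to.
def tetrate(base: int, height: int) -> int:
    """base ^^ height: a power tower of `height` copies of `base`."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

print(tetrate(2, 3))  # 2 ** (2 ** 2) = 16
print(tetrate(2, 4))  # 2 ** 16 = 65536
# tetrate(10, 10) has more digits than there are atoms in the
# observable universe; counting to it one increment at a time
# outlives the cosmos.
```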
I created a curated list of AI, data science, and data engineering newsletters:
Enjoy!