Roni "Lupin" Carta shared their hacking journey targeting Google’s latest AI, Gemini, as part of the bugSWAT program. In the write-up, he detailed their approach, the techniques they used and how they ultimately discovered a vulnerability.
https://www.landh.tech/blog/20250327-we-hacked-gemini-source-code/
How not to cry when AWS breaks?
(Hint: it's not "hope and pray".)
AWS promises infinite scalability, resilience, low latency... but only if you know how to play the game.
In my latest blog post, I break down the AWS Lego blocks and explain how regions and availability zones really work: https://luminousmen.com/post/understanding-aws-regions-and-availability-zones-a-guide-for-beginners
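If you want to poke at those building blocks yourself, here's a minimal sketch using boto3 (assuming the library is installed and AWS credentials are configured; the region name is just an example):

```python
# List every region, then the availability zones inside one of them.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Regions are isolated geographic areas...
for region in ec2.describe_regions()["Regions"]:
    print(region["RegionName"])

# ...and each region contains multiple isolated availability zones.
for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], az["State"])
```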
[FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: https://luminousmen.substack.com/welcome
If you frequently create new Python projects, you've probably used cookiecutter templates. But if you're looking for an alternative with more flexibility, the copier package is a great option. In this article, Tucker Beck shows how to build and use project templates with copier.
https://blog.dusktreader.dev/2025/04/06/bootstrapping-python-projects-with-copier/
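As a taste, a minimal sketch of driving copier from Python; the template URL and answers below are placeholders, and the exact entry point (run_copy vs. the older copier.copy) depends on your copier version:

```python
# Minimal sketch: generate a project from a copier template.
# "gh:me/my-python-template" and the answers are placeholders.
from copier import run_copy  # copier >= 9; older releases used copier.copy

run_copy(
    "gh:me/my-python-template",  # any git URL or local path
    "my-new-project",            # destination directory
    data={                       # pre-filled answers to template questions
        "project_name": "my-new-project",
        "python_version": "3.12",
    },
)
```

The CLI equivalent is `copier copy gh:me/my-python-template my-new-project`.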
Tired of writing complex SQL queries?
This video course section shows how to connect Vanna.AI to PostgreSQL, enabling natural language queries through Retrieval-Augmented Generation.
Ask your database questions in plain English and get instant insights - complete with visualizations. The future of database interaction is here!
https://link.illustris.org/connectingpgtovannaai
#PostgreSQL #VannaAI #RAG #DataEngineering http://vanna.ai/
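If you'd rather skim code than watch, the flow looks roughly like this; the model name, credentials, and training DDL are placeholders, and the Vanna API may have moved since this was written (check their docs):

```python
# Rough sketch of the Vanna + PostgreSQL flow; all names and
# credentials below are placeholders.
from vanna.remote import VannaDefault

vn = VannaDefault(model="my-model", api_key="my-vanna-api-key")

# Point Vanna at the database it should query.
vn.connect_to_postgres(
    host="localhost", dbname="shop", user="readonly",
    password="secret", port=5432,
)

# Teach it the schema so the RAG step has context to retrieve.
vn.train(ddl="CREATE TABLE orders (id int, total numeric, created_at date)")

# Ask in plain English; Vanna generates and runs the SQL.
print(vn.ask("What was the total revenue last month?"))
```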
The Lies We Tell Ourselves About HA
"AWS is reliable by default"
Nope. AWS gives you tools to build reliability. It doesn't promise you won't screw it up.
"Multi-AZ is expensive, we'll just scale vertically"
Nothing is more expensive than downtime at scale.
"Our SLA says 99.9% uptime, that's good enough"
Yeah, until you realize that 99.9% = almost 9 hours of downtime a year. Now imagine explaining that to a finance team watching live transactions fail.
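The arithmetic is worth doing once:

```python
# Translate an SLA's "nines" into allowed downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8760

for sla in (0.999, 0.9995, 0.9999):
    downtime_h = HOURS_PER_YEAR * (1 - sla)
    print(f"{sla:.2%} uptime -> {downtime_h:.2f} hours of downtime/year")
```

99.90% works out to 8.76 hours a year, and the SLA doesn't care whether those hours land at 3 a.m. or mid-checkout.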
Data quality (#Datenqualität) comes up again and again in business contexts, often together with "authoritativeness", the context in which data originates, #Governance models, and so on. But in my view it's worth first clarifying the terminology and what data quality actually means. The beginning of an attempt: https://digital.ebp.ch/2025/04/29/datenqualitaet #DataManagement #DataEngineering #DataScience
My weekly newsletter is out - Airflow 3.0 review, Think Stats book, and new tutorials
https://ramikrispin.substack.com/p/review-of-airflow-30-think-stats
In my recent blog post, I said CDC solves all the performance problems because "You're not hammering the source database...": https://luminousmen.com/post/change-data-capture
That's... a bit misleading.
- Log-based CDC still creates overhead, especially in high-throughput environments.
- Reading the transaction log (WAL/redo logs) can add I/O contention if not tuned.
- If poorly configured, CDC tools like Debezium, Maxwell, you name it, can still impact DB performance - see the sketch below.
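For instance, Debezium's Postgres connector exposes knobs that decide how hard you hit the source. A hedged sketch of registering one via the Kafka Connect REST API (hostnames, credentials, and values are placeholders; option defaults vary by Debezium version):

```python
# Sketch: register a Debezium Postgres connector with throughput
# knobs dialed down. Hostnames, credentials, and values are
# placeholders; defaults and option names vary by version.
import json
import urllib.request

connector = {
    "name": "inventory-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "secret",
        "database.dbname": "inventory",
        "plugin.name": "pgoutput",   # logical decoding plugin on the source
        "slot.name": "cdc_slot",     # replication slot Debezium reads from
        "max.batch.size": "1024",    # events pulled per batch
        "poll.interval.ms": "1000",  # how often to poll for new WAL entries
    },
}

req = urllib.request.Request(
    "http://connect.internal:8083/connectors",  # Kafka Connect REST API
    data=json.dumps(connector).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

One more hidden cost: the replication slot. If the connector stalls, Postgres retains WAL for the inactive slot and disk usage on the source climbs.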
This week's newsletter will be out in 2 days. Interesting stuff by @carlk, Vuk Rosić, Federico Trotta, @treyhunner & Stephen Diehl is covered
https://newsletter.piptrends.com/p/optimize-your-python-program-for
This course by Vuk Rosić gives a complete deep dive into DeepSeek V3, a state-of-the-art deep learning model. He pairs theoretical explanations with step-by-step coding instructions, making it easier to understand the model and implement it from scratch.
Vim Language, Motions, and Modes Explained
https://www.ssp.sh/blog/why-using-neovim-data-engineer-and-writer-2023/
Recce 1.0 is now live on Product Hunt!
https://www.producthunt.com/posts/recce-4
Upvote and leave a comment to help us grow the Recce community and bring better data review processes to more data teams.
Thanks for your support!
In a microservices setup, data drift is the silent killer - services diverge, caches desync, and suddenly your customer data is a mess. I once saw a team lose days chasing a bug caused by a 15-minute replication lag. CDC steps in with a log-based lifeline, propagating changes across services with minimal latency and no tight coupling. It's not a silver bullet, but it’s damn close for distributed consistency.
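To make "log-based lifeline" concrete: downstream services just consume the change stream and update their own state. A sketch using confluent-kafka and Debezium's event envelope (the topic name, field layout, and dict-as-cache are illustrative stand-ins):

```python
# Sketch: keep a local cache in sync by consuming Debezium change
# events. Topic name, field layout, and the dict "cache" are
# illustrative stand-ins for your own services.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka.internal:9092",
    "group.id": "customer-cache-sync",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dbserver.public.customers"])

cache = {}  # stand-in for Redis, a local read model, etc.

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue  # no message, transport error, or tombstone
    payload = json.loads(msg.value())["payload"]
    if payload["op"] in ("c", "u", "r"):   # create, update, snapshot read
        row = payload["after"]
        cache[row["id"]] = row
    elif payload["op"] == "d":             # delete
        cache.pop(payload["before"]["id"], None)
```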
Lessons learned operating petabyte-scale ClickHouse clusters: Part II
https://www.tinybird.co/blog-posts/what-i-learned-operating-clickhouse-part-ii
A great example of how a lack of software/data engineering skills can cost you tens of thousands of dollars with a few clicks in the #Cloud. A Senior #Data Engineer at an undisclosed company changed a Cloud Storage lifecycle policy, moving 600 million files from Standard Storage to the Tier 2 Archive class, which cost about $60k. Those fancy hidden costs in the cloud. BTW, this charge is tucked away in the Google cost calculator under the advanced section. #dataengineering #Software #softwaredevelopment
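The trap: lifecycle transitions are billed as per-object Class A operations, so the bill scales with file count, not data size. Back-of-the-envelope (the rate below is back-solved from the post's numbers, not current GCS pricing):

```python
# Lifecycle transitions are per-object Class A operations.
# The rate below is implied by the post's numbers; check current
# GCS pricing for the real figure per storage class.
files = 600_000_000
price_per_10k_ops = 1.00  # assumed $ per 10,000 archive-tier writes

print(f"${files / 10_000 * price_per_10k_ops:,.0f}")  # $60,000
```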
Batch processing has its fans - simple, predictable, cheap. But when your ETL jobs take hours and the business needs answers now, the math doesn't add up.
CDC changes the story with incremental updates, slashing latency. See why CDC outpaces batch in my new article: https://luminousmen.com/post/change-data-capture
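The contrast in one sketch (schema is made up; the watermark query is the simplest incremental pattern, and log-based CDC goes further by tailing the transaction log instead of querying at all):

```python
# Illustrative only: batch re-scans everything, incremental reads
# only rows past the last high-water mark. Schema is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 9.99, '2025-04-02T10:00:00')")

# Batch: full re-scan every run, however little actually changed.
batch_rows = conn.execute("SELECT * FROM orders").fetchall()

# Incremental: only what changed since the last run.
last_watermark = "2025-04-01T00:00:00"
new_rows = conn.execute(
    "SELECT * FROM orders WHERE updated_at > ?", (last_watermark,)
).fetchall()

print(len(batch_rows), len(new_rows))
```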
[FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: https://luminousmen.substack.com/welcome
We all aim to make our programs faster, but have you ever tried doing the opposite? In this article, @carlk demonstrated how a simple nested loop can create a program that runs longer than the universe's lifetime. He dove into concepts like tetration (yes, it goes beyond exponentiation) and 5-state Turing machines.
https://towardsdatascience.com/how-to-optimize-your-python-program-for-slowness/
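If you just want a feel for how fast tetration blows up, here's a toy sketch (not the article's code, which builds the whole thing from increments alone):

```python
# Toy sketch: tetration is a tower of exponents, and even tiny
# inputs produce numbers no loop could ever count to.
def tetrate(base: int, height: int) -> int:
    """base ^^ height: a power tower of `height` copies of `base`."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

print(tetrate(2, 3))  # 2 ** (2 ** 2) = 16
print(tetrate(2, 4))  # 2 ** 16 = 65536
# tetrate(10, 10) has more digits than there are atoms in the
# observable universe; counting to it one increment at a time
# outlives the cosmos.
```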
I created a curated list of AI, data science, and data engineering newsletters:
Enjoy!