#DataEngineering


Roni "Lupin" Carta shared their hacking journey targeting Google’s latest AI, Gemini, as part of the bugSWAT program. In the write-up, he detailed their approach, the techniques they used and how they ultimately discovered a vulnerability.

landh.tech/blog/20250327-we-ha

www.landh.tech · We hacked Google’s A.I Gemini and leaked its source code (at least some part) - Lupin & Holmes

How not to cry when AWS breaks?

(Hint: it's not "hope and pray".)

AWS promises infinite scalability, resilience, low latency... but only if you know how to play the game.

In my latest blog post, I break down the AWS Lego blocks and explain how regions and availability zones really work: luminousmen.com/post/understan
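To get a feel for the layout the post describes, here is a minimal sketch (not from the article) that lists each region's availability zones with boto3; it assumes AWS credentials are already configured in your environment.

```python
import boto3

# List every region visible to the account, then the AZs inside each one.
# Assumes credentials are configured (env vars, ~/.aws, or an instance role).
ec2 = boto3.client("ec2", region_name="us-east-1")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    regional_ec2 = boto3.client("ec2", region_name=region)
    zones = regional_ec2.describe_availability_zones(
        Filters=[{"Name": "state", "Values": ["available"]}]
    )["AvailabilityZones"]
    print(region, [z["ZoneName"] for z in zones])
```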

👉 [FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: luminousmen.substack.com/welco

luminousmen · Understanding AWS Regions and Availability Zones: A Guide for Beginners. High Availability in the cloud: why us-east-1 alone is not a strategy (it's a gamble)

If you frequently create new Python projects, you’ve probably used cookiecutter templates. But if you're looking for an alternative with more flexibility, the copier package is a great option. In this article, Tucker Beck showed how to build and use project templates with copier.

blog.dusktreader.dev/2025/04/0

blog.dusktreader.dev · Bootstrapping Python projects with copier - the.dusktreader blog
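If you want to try copier without reading the whole post first, here is a minimal sketch of generating a project programmatically; the template URL and the answers dict are placeholders, and it assumes a recent copier release that exposes run_copy.

```python
from copier import run_copy  # assumes copier >= 9, which exposes run_copy

# Generate a new project from a template repository.
# The template repo and answers below are placeholders - swap in your own.
run_copy(
    "gh:your-org/python-project-template",  # hypothetical template repo
    "my-new-project",                       # destination directory
    data={"project_name": "my-new-project", "python_version": "3.12"},
)
```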

💡 Tired of writing complex SQL queries?

This video course section shows how to connect Vanna.AI to PostgreSQL, enabling natural language queries through Retrieval-Augmented Generation.

Ask your database questions in plain English and get instant insights - complete with visualizations. The future of database interaction is here!

link.illustris.org/connectingp
#PostgreSQL #VannaAI #RAG #DataEngineering vanna.ai/

link.illustris.org · Connecting Vanna.ai to PostgreSQL | LinkedIn Learning, formerly Lynda.com. Learn the process of integrating Vanna.ai with a PostgreSQL database, and understand configuration requirements for optimal performance.
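As a rough idea of what the course section covers, here is a minimal sketch of wiring Vanna to Postgres and asking a question in plain English. The model name, API key, connection details, and DDL are placeholders, and the class/method names assume the hosted VannaDefault client from the vanna package.

```python
from vanna.remote import VannaDefault  # hosted Vanna client from the vanna package

# Placeholders - replace with your own model name, API key, and connection details.
vn = VannaDefault(model="my-model", api_key="YOUR_VANNA_API_KEY")
vn.connect_to_postgres(
    host="localhost", dbname="analytics", user="readonly",
    password="secret", port=5432,
)

# Teach the model some schema, then ask a question in plain English.
vn.train(ddl="CREATE TABLE orders (id int, amount numeric, created_at date)")
print(vn.ask("What was the total order amount last month?"))
```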

The Lies We Tell Ourselves About HA

❌ "AWS is reliable by default"
Nope. AWS gives you tools to build reliability. It doesn't promise you won't screw it up.

❌ "Multi-AZ is expensive, we'll just scale vertically"
Nothing is more expensive than downtime at scale.

❌ "Our SLA says 99.9% uptime, that's good enough"
Yeah, until you realize that 99.9% = almost 9 hours of downtime a year. Now imagine explaining that to a finance team watching live transactions fail.
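The arithmetic behind that claim, as a quick sketch:

```python
# Downtime budget implied by common availability targets.
HOURS_PER_YEAR = 24 * 365

for availability in (0.999, 0.9995, 0.9999, 0.99999):
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.3%} uptime -> {downtime_hours:.2f} hours of downtime per year")
```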

In my recent blog post, I said CDC solves all the performance problems because "You're not hammering the source database...": luminousmen.com/post/change-da

That's... a bit misleading.

- Log-based CDC still creates overhead, especially in high-throughput environments.
- Reading the transaction log (WAL/redo logs) can add I/O contention if not tuned, and a slot whose consumer has stalled will keep retaining WAL on disk (a quick check is sketched after this list).
- If poorly configured, CDC tools like Debezium, Maxwell, and the like can still impact DB performance.
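A minimal sketch, assuming a PostgreSQL source and the psycopg2 driver, of checking how much WAL each replication slot is holding back - a cheap way to notice when a CDC consumer (e.g. a Debezium connector) has fallen behind or stopped:

```python
import psycopg2  # assumes a PostgreSQL source and the psycopg2 driver

# How far behind is each replication slot? A growing value means the slot is
# retaining WAL on disk because its consumer is slow or stopped.
QUERY = """
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
"""

with psycopg2.connect("dbname=analytics user=postgres") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for slot_name, active, retained_wal in cur.fetchall():
            print(slot_name, "active" if active else "INACTIVE", retained_wal)
```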

luminousmen · Change Data Capture (CDC): what is it and why it's important?

In a microservices setup, data drift is the silent killer - services diverge, caches desync, and suddenly your customer data is a mess. I once saw a team lose days chasing a bug caused by a 15-minute replication lag. CDC steps in with a log-based lifeline, propagating changes across services with minimal latency and no tight coupling. It's not a silver bullet, but it’s damn close for distributed consistency.

A great example of how gaps in software/data engineering skills can cost you tens of thousands of dollars with a few clicks in the #Cloud. A senior #Data Engineer at an undisclosed company changed a Cloud Storage lifecycle policy, moving 600 million files from Standard Storage to Tier 2 Archive, which resulted in a cost of about $60k. Those are the fancy hidden costs in the cloud. BTW, this is hidden in the Google cost calculator under the advanced section. #dataengineering #Software #softwaredevelopment
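Back-of-the-envelope version of that incident, using only the numbers from the post (the per-file fee is just implied by them, not an official price):

```python
# Numbers from the post: ~600 million objects transitioned, ~$60k total bill.
files_transitioned = 600_000_000
total_cost_usd = 60_000

implied_cost_per_file = total_cost_usd / files_transitioned
print(f"Implied fee per object transition: ${implied_cost_per_file:.6f}")  # $0.000100

# The same tiny per-object fee, scaled to other bucket sizes.
for n in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} objects -> ${n * implied_cost_per_file:,.0f}")
```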

Batch processing has its fans - simple, predictable, cheap. But when your ETL jobs take hours and the business needs answers now, the math doesn't add up.

CDC changes the story with incremental updates, slashing latency. See why CDC outpaces batch in my new article: luminousmen.com/post/change-da

👉 [FREE] Join the community of data engineers to receive practical lessons from the trenches straight to your inbox! Subscribe here: luminousmen.substack.com/welco

We all aim to make our programs faster, but have you ever tried doing the opposite? In this article, @carlk demonstrated how a simple nested loop can create a program that runs longer than the universe's lifetime. He dove into concepts like tetration (yes, it goes beyond exponentiation) and 5-state Turing machines.

towardsdatascience.com/how-to-

Towards Data Science · How to Optimize your Python Program for Slowness. Write a short program that finishes after the universe dies
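For anyone curious about the tetration mentioned above, here is a minimal sketch (not the article's code) showing how fast it explodes; only very small inputs are computable at all.

```python
def tetration(base: int, height: int) -> int:
    """Iterated exponentiation: a power tower of `height` copies of `base`.
    tetration(2, 4) == 2 ** 2 ** 2 ** 2 == 65536."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

print(tetration(2, 3))  # 16
print(tetration(2, 4))  # 65536
print(tetration(3, 3))  # 7625597484987
# tetration(3, 4) already has trillions of digits - don't try to compute it.
```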