The Hardest Part of Engineering Isn’t Programming

It’s alignment, collaboration, and making systems work inside organizations

Sep 06, 2025

A few months back, I found myself staring at a problem that felt deceptively simple:
“We need a unified view of how our APIs are doing across the company.”

Sounds easy, right? Just throw together some dashboards, grab a few metrics, and boom, job done.
Except, no. This was one of those problems where the devil wasn’t in the details; the devil was the details.

Background

Our company had APIs everywhere - some powering core products, others quietly serving backend needs, and many more that just… existed. But no matter how important or obscure, they all had one thing in common:

Nobody had a single view of how they were doing.

Were they following governance standards?
Did they meet latency SLAs (P95, P99)?
Were they failing silently somewhere?
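
For anyone who hasn’t had to answer the P95/P99 question by hand, here is a minimal sketch of what it means for a single API. It uses the nearest-rank percentile method; the sample latencies and the SLA limits are made up for illustration and are not from our systems.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_sla(latencies_ms, p95_limit_ms, p99_limit_ms):
    """True only if both tail-latency limits hold."""
    return (percentile(latencies_ms, 95) <= p95_limit_ms
            and percentile(latencies_ms, 99) <= p99_limit_ms)

# Hypothetical latencies for one API, in milliseconds.
latencies = [12, 15, 14, 13, 240, 16, 18, 11, 17, 900]

print(percentile(latencies, 50))       # the typical request looks healthy
print(meets_sla(latencies, 200, 500))  # the tail says otherwise
```

The median here looks fine while the tail blows through the limits, which is exactly the kind of thing a per-team dashboard can hide.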

Each team had their own dashboards, their own logs, their own way of tracking. Imagine trying to run a city where every neighborhood has its own maps, traffic lights, and bylaws but nobody talks to each other. That’s where we were.

Our mission: build a “control tower” for all APIs in one place.

The Naive First Answer: “Just Read the Logs”

The obvious first thought was: well, the data already exists. APIs log everything - successes, failures, latencies, requests. Just scoop it up.

Then we asked what our edge gateways were logging, and the answer came back like a punch:

20–25 terabytes of logs per day. Billions of events.

Let that sink in. Every. Single. Day.

That meant:

Gigantic ingestion pipelines.
Massive storage costs (cold + hot tiers).
Complex transformation logic before we could even query anything useful.

What sounded like “just read the logs” turned into “build a data warehouse project the size of a small startup.”

Talking to the Gatekeepers

The next step was obvious but painful: talk to the teams running these edge gateways.

This is where the non-technical pain starts:

They spoke a different “language” than us - security and infra-first, while we cared about API-level insights.
Their logs had dozens of fields we didn’t care about, and were missing fields we desperately needed.
They already had their workflows; our requests felt like interruptions.

We had to set up meeting after meeting:

“What exactly are you logging?”
“Can we get field X?”
“What does this cryptic field mean?”
“Oh, you’re logging to Splunk - can we tap into it?”

And yes, Splunk had all the data. But it wasn’t designed for the kind of structured, API-centric queries we wanted.

The Pain of Building From Scratch

At this point, the obvious engineering option was:

Ingest logs from Splunk.
Store them in S3 or Databricks.
Build structured tables.
Layer insights pipelines on top.

It sounds like progress… but think about the effort:

Time: months just to get ingestion stable.
Cost: at 20–25 TB a day, a petabyte of raw logs piles up in roughly six to seven weeks.
Ops Overhead: keeping pipelines alive against daily terabytes.
Redundancy: building yet another giant data system when the company already had dozens.
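
A quick back-of-the-envelope check on the storage side, using only the 20–25 TB per day figure from the gateways. The sketch assumes decimal units (1 PB = 1,000 TB), no compression, and no retention policy, so treat it as an upper bound on raw volume rather than a real capacity plan.

```python
TB_PER_PB = 1000  # decimal units, an assumption for simplicity

def days_to_petabyte(daily_tb):
    """Days of ingestion needed to accumulate one petabyte of raw logs."""
    return TB_PER_PB / daily_tb

def yearly_petabytes(daily_tb):
    """Raw volume after a year if nothing is compressed or expired."""
    return daily_tb * 365 / TB_PER_PB

for daily_tb in (20, 25):
    print(
        f"{daily_tb} TB/day: 1 PB after {days_to_petabyte(daily_tb):.0f} days, "
        f"{yearly_petabytes(daily_tb):.1f} PB per year"
    )
```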

It felt wrong. Painfully wrong.

The Twist: Someone Already Did the Hard Part

Here’s where luck and persistence collide.

While poking around, we discovered another team in the company had already built a massive ingestion pipeline - for security events.

They were reading from 60+ different log sources.
Normalizing data into OCSF (the Open Cybersecurity Schema Framework), a standard schema for security events.
Storing everything in structured tables.
Running Kafka, databases, pipelines - the whole machinery.
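
To make “normalizing” concrete, here is a rough sketch of the idea: every source gets a small adapter that maps its own field names and units onto one shared record shape. The two source formats, the field names, and the common shape below are all hypothetical. They are not the other team’s code and have not been checked against the actual OCSF schema.

```python
from datetime import datetime

def from_gateway(raw):
    """Hypothetical edge-gateway record: ISO timestamp, string status, latency in ms."""
    return {
        "time": int(datetime.fromisoformat(raw["ts"]).timestamp() * 1000),
        "source": "edge-gateway",
        "api": raw["route"],
        "status_code": int(raw["status"]),
        "duration_ms": float(raw["latency_ms"]),
    }

def from_app_server(raw):
    """Hypothetical app-server record: epoch milliseconds, latency in seconds."""
    return {
        "time": raw["epoch_ms"],
        "source": "app-server",
        "api": raw["endpoint"],
        "status_code": raw["http_status"],
        "duration_ms": raw["elapsed_s"] * 1000.0,
    }

# One adapter per log source; a real pipeline would have dozens of these.
NORMALIZERS = {"edge-gateway": from_gateway, "app-server": from_app_server}

def normalize(source, raw):
    return NORMALIZERS[source](raw)

a = normalize("edge-gateway", {
    "ts": "2025-01-01T00:00:00+00:00", "route": "/orders",
    "status": "200", "latency_ms": "42",
})
b = normalize("app-server", {
    "epoch_ms": 1735689600000, "endpoint": "/orders",
    "http_status": 503, "elapsed_s": 0.25,
})
print(a)
print(b)
```

Once every source lands in the same shape, one query can cover all of them, which is why an existing normalization layer was worth so much to us.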

They weren’t solving our problem, but the foundation was eerily similar.

So we had a choice:

Option A: Keep building our own “parallel empire,” doubling costs and effort.
Option B: Swallow our ego, walk over, and ask to collaborate.

We chose Option B.

Collaboration ≠ Easy

Now, collaboration sounds noble in principle. But in practice, it’s one of the hardest things in engineering.

Their priorities were security, ours were API observability.
Their timelines were different.
Their formats weren’t built with APIs in mind.
And, let’s be honest, not everyone likes outsiders coming in with “we want to use your system for our stuff.”

It took weeks - literally three to four weeks - of discussions, clarifications, requirement docs, and alignment calls.

Some conversations went in circles:

“Can we add these fields?”
“Will your pipeline support our query scale?”
“How much additional load will this put on you?”

At times, it felt like more diplomacy than engineering.

The 10% Coding Myth

Here’s the part that might shock a lot of junior engineers: the actual coding part of this project was maybe 10–15% of the total effort.

The rest?

Negotiating requirements.
Understanding existing systems.
Aligning with teams who didn’t always share our goals.
Making trade-offs with cost, scale, and compliance.
Documenting and explaining why we weren’t reinventing the wheel.

And this is the pain nobody prepares you for: the pain of alignment.

Where AI Still Fails

People often ask: “Why won’t AI replace engineers soon?”

Here’s why:

AI can generate SQL queries, code snippets, maybe even pipeline scripts. But it can’t:

Sit in a room with five stakeholders and resolve conflicting goals.
Decide whether to build or reuse based on organizational context.
Balance storage costs against real-time needs.
Handle the messy human side of building systems inside large organizations.

That’s the stuff that eats up months, and it’s the stuff AI can’t touch (yet).

The Endgame

Eventually, we piggybacked on the existing ingestion pipeline.

We stored billions of events in cold storage, moved slices into hot storage for fast queries, and built insights pipelines on top.
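
As a rough illustration of that hot/cold split, the sketch below keeps every event in “cold” storage, copies only a recent window into the “hot” slice, and runs one simple insight (error rate per API) over that slice. The seven-day window, the event fields, and the sample data are assumptions for the example, not our production settings.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # assumed retention for the hot tier

def select_hot_slice(events, now, window=HOT_WINDOW):
    """Return the recent events worth copying from cold storage into the hot tier."""
    cutoff = now - window
    return [e for e in events if e["time"] >= cutoff]

def error_rate_by_api(events):
    """Share of 5xx responses per API, over whatever slice is passed in."""
    totals, errors = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e["api"]] += 1
        if e["status_code"] >= 500:
            errors[e["api"]] += 1
    return {api: errors[api] / totals[api] for api in totals}

now = datetime(2025, 9, 6, tzinfo=timezone.utc)
cold_events = [  # hypothetical sample; the real store holds billions of rows
    {"api": "/orders", "status_code": 200, "time": now - timedelta(days=1)},
    {"api": "/orders", "status_code": 503, "time": now - timedelta(days=2)},
    {"api": "/orders", "status_code": 500, "time": now - timedelta(days=30)},
    {"api": "/users", "status_code": 200, "time": now - timedelta(hours=3)},
]

hot = select_hot_slice(cold_events, now)
rates = error_rate_by_api(hot)
print(rates)
```

The 30-day-old failure stays in cold storage and never touches the fast path, which is the whole point of keeping the hot slice small.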

The result: a working API observability layer without rebuilding the entire ingestion empire from scratch.

But the lesson wasn’t just about logs, APIs, or storage tiers.

Engineering is not just:

Knowing the best data structure.
Writing the cleanest code.
Or deploying the fastest service.

It’s about solving problems inside the real-world mess:

Systems already exist, often imperfectly.
People have competing goals.
Costs and trade-offs matter.
Reinventing the wheel is easy but dangerous.

Engineering = 20% tech, 80% alignment.

That’s the truth no coding tutorial tells you.

When I look back, the story wasn’t about APIs at all. It was about realizing that as you grow in your career, the bottleneck isn’t your coding speed. It’s your ability to navigate systems - technical, organizational, and human.

And if you ever find yourself drowning in 25 terabytes of logs a day, remember: the solution might not be another line of code. It might be a conversation.

Skilled Coder
