Hacker News
I fixed a bug the other day (javiergonzalez.io)
169 points by javier123454321 on Aug 11, 2023 | 114 comments


> Deny that there is a problem.

> Deny that it is your problem.

> Ask for more information.

> Complain.

Oh my, does this ever bring back traumatic memories!

This entire article reminds me of a previous company I used to work for. There was a sufficiently large number of engineers that had a similar, if not more elaborate, deliberate, and sinister way of responding to bugs:

* Deny {bug} is a problem - "I don't see how this affects you/anyone?"

* Deny {bug} can even be fixed - "Ok, fair, but I don't see how this can be fixed?"

* Deny {bug} is their problem - "Ok, fair, but have you tried asking {other employee}? They are responsible for this." (they are obviously not)

* Be rude - "If you think this is a problem, then go fix it"

All this was just obvious defensive walls of lies to protect their reputation and hide their lack of skill and their laziness.

Around 5% of the engineers were like this and I've always analogized it to the proportion of U238 in a sample of Uranium - just enough enriched dogshit to cause a critical mass of misery.

I left (for greener pastures) shortly after I had a couple of tasks requiring interaction with these "engineers", which made working around them impossible, and after realizing that leadership had totally lost control of the zoo and didn't care one iota.


Unfortunately this same attitude bleeds into consumer-facing customer support so often.

I'll find a bug, and message support about it.

Yes - I've cleared cookies, Yes - I've tried incognito, Yes - I've rebooted, Yes - I've tried a separate browser, Yes - I've tried a VPN, Yes - I've tried different DNT, Yes - I've tried incognito, Yes - I've tried rebooting, Yes - I've tried an entirely separate device.

Infuriating. I'm sure it catches plenty of tech illiterate people's garbage bug reports that are just their issue - but so often I'm left banging my head on a desk when I know it's an actual problem.


It catches tech literate people's bugs so often too.

I maintain my team's VPN, and I have a script that I ask people to run to generate logs when there's an issue. It checks connectivity, flushes DNS, restarts the VPN and toggles the network device off and on again, then checks connectivity again.

The only person so far who this didn't fix the issue for (on an engineering team) was someone who had another VPN client installed and running, despite their insistence that they had followed the setup instructions clearly. That person was an engineer working on infrastructure and online services.
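A script like the one described above could be sketched roughly as follows. This is only an illustration: the step names mirror the comment, but every command in `STEPS` is a placeholder assumption, since the real commands depend on the OS and the VPN client, and `run_diagnostics` is a made-up name.

```python
import subprocess
from typing import Callable, Sequence

# Placeholder commands (assumptions): swap in whatever your OS and VPN
# client actually need. `echo` is used so the sketch never changes state.
STEPS = [
    ("check connectivity (before)", ["ping", "-c", "1", "example.com"]),
    ("flush DNS cache", ["echo", "placeholder: flush DNS"]),
    ("restart VPN", ["echo", "placeholder: restart VPN client"]),
    ("toggle network device", ["echo", "placeholder: interface down/up"]),
    ("check connectivity (after)", ["ping", "-c", "1", "example.com"]),
]


def run_command(cmd: Sequence[str]) -> tuple[int, str]:
    """Run one command and return (exit code, combined stdout/stderr)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 1, str(exc)
    return proc.returncode, proc.stdout + proc.stderr


def run_diagnostics(
    steps=STEPS,
    runner: Callable[[Sequence[str]], tuple[int, str]] = run_command,
) -> list[str]:
    """Run every step in order and collect a log that can be attached to a ticket."""
    log = []
    for name, cmd in steps:
        code, output = runner(cmd)
        status = "ok" if code == 0 else f"FAILED (exit {code})"
        log.append(f"[{status}] {name}: {' '.join(cmd)}")
        if output.strip():
            log.append(output.strip())
    return log
```

Keeping the command runner injectable means the log format can be checked without touching a real network, and the full log is produced even when a step fails, which is the part support usually needs.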


Apple wouldn’t even tell me they refuse to ship a replacement AirPod (I have to make a repair appointment) until I went through their troubleshooting process.

The worst part is if they can fix my airpod, they will charge me $69 for the fix. If they can't, they will charge me $69 to replace the airpod. If I say it’s “lost”, they’ll make me go through “find my” and then charge $69 for the replacement. I’m considering putting it in a blender to speed things up.


I think you misunderstood this whole process. Airpods are impractical to repair so when you get it "repaired" at the apple store or mail it in, they are always giving you brand new airpods. If you don't have applecare, "repairing" costs the same as buying a replacement because that's exactly what you're doing.


I don't think this is the same problem.


My goodness, are you me? This mirrors my experience to a T, almost exactly, when I’ve tried to change process for the better. By far the one that hits the most is be rude: “well, just start fixing things then.” It was a daily struggle to have to work around the nay-saying, laziness, and equivocation, and it was so demoralizing knowing that there was a simple fix but people didn’t want to do it because it would out their incompetence and start to reveal the tremendous pile of tech debt they’ve built everything else on.

The funniest thing was that this was not a huge organization at all. It was small for the scale they were working at, but certain individuals had ossified the culture to the point where you would have thought it was hundreds of others. So frustrating, day in and day out.

> leadership had totally lost control of the zoo

This kills me

So glad to see I’m not alone!


>people didn’t want to do it because it would out their incompetence and start to reveal the tremendous pile of tech debt they’ve built everything else on

All part of the great circle of (software) life. Once they pass on from the company, you'll rip out their old, ossified way of doing things, and then when your new purer way hits the hard reality of business requirements and other considerations that don't fit in the nice world of software abstraction, you end up adding patches and edge cases and caveats until your software ends up looking a lot like the old one. And then a new dev comes in and complains that your process is terrible and they could do it so much better...


Regarding tech leadership:

Perhaps I've just been unlucky, but a rather depressing proportion of the tech leadership that I have been under in the past appears to have sought the role in order to:

1. Hide lack of engineering aptitude.

2. Put less work in.

If I'm on the right track with that, then it well explains why they don't care about cultivating a good engineering culture and let everything devolve into a zoo - i.e. everybody feels like everyone is out to get each other, no teamwork, rep farming, and all that nonsense.

Regarding getting work done:

I agree, I actually believe that it's the cause of the slow death of many companies: a gradual acceptance of nay-sayers, laziness, and "losers", that is to say an acceptance of people who just think that everything is lost and there's no point in fixing anything, and who instead choose to spend their time rationalizing why nothing can be done about anything.

It's depressing, and it was traumatic when I was working in those environments in the past. (I like to think) I'm a get shit done and deliver awesome products kind of guy, so these "loser" engineers and leaders are a true PITA.


It can be tough because a lot of the work I do that gets me or keeps me in a technical leadership position is building lieutenants. That necessitates me transferring ownership of things to them, so they can grow or at least demonstrate to the organization that they have adulting skills and need a raise.

That can look a lot like me trying to do less work. Most of the time what I’m doing instead is looking at problems we “didn’t have time for” but are about to bury us. I can only juggle so many balls at once. What’s your favorite color? <hands you one>


I had level 5’s telling me this shit when I was a (mis-leveled) 3, and they were doing the management bullshit thing of “bring me solutions, not problems.” Bitch, what do you think a staff/principal engineer’s job is if it isn’t finding problems before they find us? In code packed to the gills with tribal knowledge, people don’t even know where the problems live. They just see the problem.

They stopped saying this when I reminded them that if I knew the answer I’d have their job. They still never actually led anything.


“well, just start fixing things then.”

Also a real problem when management and leadership discourage working outside the "standard process" and incentives punish working around it. The person responsible won't work on it, and anyone else who could resolve the issue won't because they know they'll get dinged for not following the process.


I very recently had to push back on a bug that wasn’t a bug in my part of the stack. If they wanted it “fixed” in my part, it would be a feature. (Enum values from an API being localized — these were hardcoded on the backend). IMHO, these should be localized in the client, which has the current locale (I do not have the current locale and no strings are translated from this API by this API).

That was pretty fun, but I don’t think generalizing “denying a bug” as saying we are being “lazy” is a correct assumption.
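To make the "localize in the client" position concrete, here is a minimal sketch, assuming the API keeps returning stable enum codes and the client owns a translation table. The codes, the labels and the `localize_enum` helper are all hypothetical, not anything from the stack described above.

```python
# Hypothetical enum codes as the API would return them, untranslated.
TRANSLATIONS = {
    "en": {"ORDER_PENDING": "Pending", "ORDER_SHIPPED": "Shipped"},
    "fr": {"ORDER_PENDING": "En attente", "ORDER_SHIPPED": "Expédiée"},
}


def localize_enum(code: str, locale: str, fallback_locale: str = "en") -> str:
    """Map an API enum code to a label for the client's current locale.

    Falls back to the default locale, then to the raw code, so a missing
    translation degrades visibly instead of breaking rendering.
    """
    for loc in (locale, fallback_locale):
        label = TRANSLATIONS.get(loc, {}).get(code)
        if label is not None:
            return label
    return code
```

The API contract stays locale-free this way, and only the layer that already knows the user's locale has to carry translated strings.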


I had a similar experience adding new functionality to a web application. It was a simple enough feature that I was able to supply the English myself, but I knew enough not to provide the international copy, even though the only other language we supported was a language I studied in school. I asked the product manager a few times, then created a ticket assigned to them and sent them the link. Meanwhile, my code got reviewed and deployed, behind a feature flag of course.

As soon as they found out the code was deployed, the product manager asked if we could turn on the feature. I reminded them that the reason we supported the other language was that it was a legal requirement for our customers in a certain country. They said turn it on, nobody was going to notice except a big US customer who desperately needed it. So I did.

Within a few hours we had an enraged customer on the phone. Credit to the product manager, they took full responsibility, but the initial response from the founder/head of customer success who took the call was to walk straight to my desk and ask me what the hell I did.


I can only think of a few countries where there are legal requirements for two languages. Was this French Canadian by chance?


Yep, that was it.


I agree, "denying a bug" does not equal "lazy", but rather "denying a bug one has created" does at best equal "lazy", at worst malicious (e.g. rep protection).


Sounds like the four stage strategy of the Standard Foreign Office response in a time of crisis in the BBC show Yes Minister:

In stage one we say nothing is going to happen.

Stage two, we say something may be about to happen, but we should do nothing about it.

In stage three, we say that maybe we should do something about it, but there's nothing we can do.

Stage four, we say maybe there was something we could have done, but it's too late now.


Heh I am glad I have not had many colleagues like this, but for 1-2 of them this is 100% their modus operandi. Even worse though is when management understands this and passively rewards it by not intervening. In one case it got so bad the employee was promoted because they finally shipped something new. A very short time after feature delivery (and being promoted), serious customer bugs started trickling in. Well you know what happened next, more gaslighting.

I'd like to say there were finally some consequences, but no. The office was thrown a curve ball with large layoffs. Now that I think about it, maybe my ex colleague masterfully planned the whole thing.


The problem is that they get defensive because the people at the top don't stop the Blame Game from starting when a bug or issue is found. Bad managers (or founders) are responsible for not stopping or not caring to stop this game. As business owners they are being quite silly, as such a culture makes costs and turnover of talent go up (and margins go down).


I am in the same boat, and the people responsible for the issues are always busy denying. Some of these people have also got into management, so they have got management cover. The only way things get solved is if some other team takes over the codebase maintained by them, but the actual culprits are failing upwards and moving into management.


With U238 enriched to 5.4% U235, it requires infinite mass to reach criticality. You will need at least 20% U235 to reach criticality and you will need hundreds of kilograms of the stuff.


Really? Most nuclear reactors require enriched uranium, which is uranium with higher concentrations of 235U ranging between 3.5% and 4.5%. Did you mean prompt criticality?


Prompt criticality is the best criticality!


In my company we have had a bug sitting in the to-do list for about a year.

The root cause was a previous change in requirements that was not well defined and explored a couple of years ago. None of the original developers are working for the company anymore.

We offered ourselves to fix that bug, but it would take at least a couple of weeks.

This bug prevents us from rolling out other features deemed important. But, it is just a bug. No product owner wants to tell his boss he spent 2 weeks of an engineer's time solving an issue instead of building another feature.

The boss of the boss called. He says that this bug is not tolerable. He asks us to fix it. We say we agree and that we have already investigated it and have a solution, but it will take 2 weeks.

He says he wants that earlier and we say no because no one wants to rush it as we don't want to cause a possible side effect if we fail in something.

The solution, according to him, is to not fail. Everybody goes back to planning and no one picks up that issue, because the condition is to do it in half the time, without failing, while working on a new feature.

Go back to line 1.


Ah yes, the classic low-skill management maneuver of setting an unrealistic deadline in the hopes that people will then work faster. Naturally, Joel has written about this [0], see point 12.

[0] https://www.joelonsoftware.com/2000/03/29/painless-software-...


Not related to this case. Once, when I was super young, someone put me in a project that clearly would take 3x more people and 2x more time.

They said it was challenging because they couldn't lose the project bla bla and if everything went ok we would all be promoted.

People worked almost 24x7 for months. But the project got done with no major delays.

When the time for credit came, a friend of the boss who was actually on another team got the promotion, and we all heard that there were not enough vacancies to promote the team, but they wanted us to know that they appreciated the effort and it wouldn't be forgotten.

Months later, a layoff came and half of the team got laid off.

Since then, I only work extra time with OT and I demand that to be written as a plus in my formal evaluations.

Does it help? No. But


Yep, companies don't care about your overtime, and neither should you care about their overneeds.


> 8) Put in line items for Vacations, Holidays, etc. If your schedule is going to take about a year, each programmer will probably take 10 to 15 days of vacation.

That's so little vacation. America is wild.


Wait until you hear about the fact that your sick leave time is also covered in those 15 days, so every day that you rest at home when sick is another day less that you get of that vacation you had planned. Definitely a system that encourages people to spread illnesses through the office.

There was also this weird flex I've had older coworkers do where they brag about never taking their vacation time and just cashing it out instead at the end of the year. It happens a lot less nowadays but it was definitely part of the "grind culture" to show you were a hard worker before, maybe it still happens in certain fields.


Wild is a nice way of saying crushing my soul. I've worked at companies that give you 10 days, that they choose, for your first five years. I have no clue why we do this to ourselves other than the person in charge is not bound by it, and one day we all hope to be in charge.


> “I've worked at companies that give you 10 days, _that they choose_, for your first five years.”

They choose? I’ve worked in a lot of companies in a lot of parts of the US, and have never heard of employers choosing a worker’s vacation days. That’s awful.


I think they're talking about holidays. Like Thanksgiving, Christmas, etc. Pretty much every US company does this.


Wait, basically every single company in the world gives you their local bank holidays off, here in the US we have 10-15 of those depending on what your particular company decides to give, and then additionally I have yet to hear of a white collar company that doesn’t also give you at a min 12, but lately I’ve seen 15-20 be the norm.


Most countries in the world don't do the 'depending on what your particular company decides to give'. California recognizes 18 holidays. Average holidays given is 6 (https://money.com/columbus-day-who-has-to-work-why/)

Lately I've seen a lot of 'unlimited' - read as less than your boss takes

In Europe 20 is the legal minimum


Unlimited isn't the worst possible vacation situation, but it can definitely be bad depending largely on your boss.

I took a job with "unlimited vacation" at one company and then AFTER I started my boss "explained" to me that "Unlimited means 4 weeks here." About two months after I started I was moved to a different team with a new boss. My new boss (same company) had clear deliverables and said, "Unlimited means Unlimited so long as you hit your deliverables." Glad I was moved to a new team.


>He says he wants that earlier and we say no because no one wants to rush it as we don't want to cause a possible side effect if we fail in something.

My style is to say sure then take 2 weeks anyway.


And generally this works very well! The boss may bluster about "We can't spend that much time!" but ultimately every project takes longer than is estimated. And that is business as usual.

The boss often wants to feel like they've negotiated the "price" of an issue and won, regardless of whether they pay full price in the end.


Just happened to me with MongoDB. I don't mean the open source project, I mean the public company that sells premium SaaS hosted MongoDB (Atlas) instances along with their charts product.

I found a very serious bug that was trivially reproducible in my charts account (valid Mongo data being corrupted and displaying nonsensical data in charts). There was no way to report a bug, so I reached out through chat. Customer Support started playing the telephone game, asking me for various things every 2-3 weeks or so. If I didn't get back to them in 24 hours, I then got a warning that the bug would be closed out in another day.

This went on for several months. I sent them screenshot after screenshot along with the original mongoDB documents which showed very clearly they had a bug. Perhaps the fourth or fifth time that they asked me for some heavy lifting type deliverables, I didn't have time. They simply just closed the ticket.

Oh - near the end of that time, someone on their sales team cold-called me to try and upsell me on annual support contract. Yeah, right.

Anyway, the story has a happy ending. After getting the runaround for a few weeks, I created a Jupyter notebook that displayed the necessary chart I needed. It ended up being a lot more useful than the generic graphs I could pull from Atlas anyway.


It’s not just bug reports, companies won’t even tell you basic information about their services. Yesterday I hopped into online chat because I needed to find out which servers to add to our firewall exception list so that the firewall would stop preventing Spotify from playing. (We saw at least one web socket being blocked, but we needed to know what servers to unblock; we aren’t going to play whack-a-mole constantly.)

They wouldn’t tell me. Eventually I asked Spotify to escalate to a supervisor, at least I did when they told me the only solution was to turn off our firewall!

The supervisor started asking what firewall we were using, whether we checked the firewall vendor’s community forums, etc.

I eventually said “look, to me this is not great. I work at a school with over 1,100 students, many of whom are paying for a Spotify subscription. We are happy to allow them to use your service and we are happy to add Spotify servers to our exception list, we just need to know what they are. If you decide not to tell us, we don’t mind - from my POV I’ve done my best to reach out to you and you have decided not to give me the information we need to allow your service.

I will therefore just tell all 1,100+ students that we reached out to Spotify but you guys refuse to tell us how to unblock your service. I’ll even show them this chat log. We aren’t going to disable our firewall and so I will just advise students to use a different paid service and that - until Spotify give us the basic info we need - Spotify won’t work.”

If a company like Spotify won’t even tell you what servers they use so we can allow access to them, then it’s no wonder stupid bugs like the one OP describes linger for years. I mean, Spotify is likely about to lose hundreds of subscriptions, even though we would prefer they didn’t, all because of some stupid policy that says that the servers they use to serve content to end users can’t be provided to network admins who want to allow their end users unfettered access to a service they pay for!

Truly, big companies are stupid.


That's an unreasonable request for L1 support and an extremely uncommon request in 2023. I'm sure if you asked an SRE they could get you a list of IPs, but that list would change every day as new load balancers are deployed with new IPs and the old ones are shut down. IP+port firewalls are ancient tech that doesn't work with modern apps. Also what country is this? I've never seen a school (or any place with free wifi for that matter) whitelist apps like this in the US. They either allow everything or they only use a blacklist.


Also, I'm clearly not the only one asking about this:

https://community.sophos.com/sophos-xg-firewall/f/discussion...


Australia has some extremely strict filtering requirements. How does one get to an SRE at Spotify, out of interest?


>Australia has some extremely strict filtering requirements

That's in line with all the horrible complaints I've heard about Australian internet.

>How does one get to an SRE at Spotify, out of interest?

noc@ email addresses are often unpublished, but silently open a ticket assigned to a network engineer. If they have any sense, they'll ignore your request for a list of IP addresses.


At the end of the day, Spotify isn’t going to work well at the school I work for. We filter for things like suicide attempts and to prevent access to adult content. We use an industry standard firewall and aren’t going to stop just because some random non-Australian thinks we don’t have a duty of care to our students.

We have had a lot of abuse in Australian schools. Almost every school has been affected by suicide and bullying. None of us want to go back to the way things were.


Did you expect the frontline chat support staff to reply with a list of IP addresses?


Yes. Ingress/egress IP ranges should be something support can easily provide.

Most places I've worked have had public-facing documentation that was a tiny bit dynamic (or at least had the one page updated by a cron) for the page containing the gateway IP lists or equivalent.


How does that work with third party CDNs, ipv6, dynamically allocated IPs from services like API gateway/ALBs, edge locations, geographic differences in content serving and the ability to evolve all of those without breaking an explicitly defined “these IPs are ours” contract for your consumer facing product that 99.9% of your users interact with through a standard connection and no weird firewall?


It worked quite well when I worked for a place that did this. I imagine the experience is quite similar elsewhere.

For CDNs, one of two things will be true: the traffic/access will be primarily API-related and not involve a CDN, or a CDN will be selected by the business which itself publishes its origin IP addresses, and the business publishing its IPs will either embed or link to that information. Cloudflare, for example, does this: https://www.cloudflare.com/ips/. Most good CDNs are popular enough that their IPs will already be allowed by most restrictive network environments.

IPv6 can be handled the same way. Just publish a list.

Dynamically allocated IPs for cloud provider infrastructure should be avoided by mature companies. Edge load balancers which do not support this should be avoided. All cloud providers make this easy: AWS, for example, provides a range of products that can enable this, from GA to NLB-with-ENI+EIP-in-front-of-ALB to NAT gateways to good old EIP'd EC2 instances routing traffic.

Handling multiple geos is as simple as publishing multiple lists of IPs. In the worst-case scenario, a customer uses the wrong geo's IP to allowlist traffic, things don't work, so they select a different one. If it's a frequent problem you can spend some time/money to provide alternative DNS records which always resolve to the same IPs and customers who need custom network configurations will endure a little extra latency.
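As a rough illustration of the "just publish a list" approach, this is how a network admin's tooling might check an address against per-geo published ranges. The ranges below are reserved documentation prefixes (RFC 5737 and RFC 3849) standing in for a real vendor list, and `PUBLISHED_RANGES` and `is_allowed` are made-up names.

```python
import ipaddress

# Stand-in for a vendor's published list, keyed by geo. These are
# documentation-only prefixes, not real service addresses.
PUBLISHED_RANGES = {
    "us": ["192.0.2.0/24", "2001:db8:1::/48"],
    "eu": ["198.51.100.0/24", "2001:db8:2::/48"],
}


def is_allowed(address: str, geo: str, published=PUBLISHED_RANGES) -> bool:
    """Return True if the address falls inside any range published for the geo."""
    ip = ipaddress.ip_address(address)
    for cidr in published.get(geo, []):
        network = ipaddress.ip_network(cidr)
        # IPv4 and IPv6 are handled the same way; only compare like with like.
        if ip.version == network.version and ip in network:
            return True
    return False
```

Picking the wrong geo's list fails closed here (`is_allowed("192.0.2.10", "eu")` is False), which matches the worst case described above: things don't work until the right list is selected.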

> 99.9% of your users interact with through a standard connection and no weird firewall

That's doubtful, incomplete, and highly depends on the business you're in. Many companies which host user content, especially when they're new enough that they may not have grown content moderation/compliance procedures, end up on blocklists (which, like firewalls, are easiest to poke holes through on a per-IP basis). Several other things I've observed that complicate the situation further:

1. Some residential internet service providers firewall and/or block traffic, which can affect significant numbers of customers. This is an awful and borderline-unethical practice, but it happens.

2. Some very large corporations put all of their employees behind a firewall (via LAN or VPN). If you sell software to businesses, it is a big problem when all of a business's users can't access your product while at work. Additionally, business infrastructure components reaching out to internet services to do API-API interactions will often need their traffic allowlisted by destination IP when leaving corporate infrastructure, and your gear will definitely need stable origin IPs if it ever initiates communication with business infrastructure.

3. Even if you're right and a statistically insignificant number of your users are thus affected, if you use dynamic IP addresses, that can change at any time. IPs associated with malicious traffic can be re-used by cloud providers. Even with non-reputationally-compromised IPs, for certain (usually non-HTTP) types of traffic, residential or corporate firewalls can make highly irrational/inconsistent decisions when deciding whether to allow traffic to an IP they haven't previously seen, even if the traffic in question is already common on the network. While such failures due to IP switching are rare, they're also very expensive: it really sucks to have to apologize to customers because you rotated out a load balancer or whatever.


I’m not sure why you have been so badly downvoted, the points you have made are all valid.


Because it completely misses the point that it’s a lot of work that constricts future paths by forming a public contract that offers effectively no benefit to Spotify at all.

Those kids are going to pay for Spotify anyway regardless of if their weird school firewall comes up with a “this site is blocked” message or not. Phones and 5g exist, plus the school gets the ire for blocking it and not Spotify. And no residential ISP is ever going to block Spotify.


A list of websites it connects to, sure. If you can tell me who I can speak to at Spotify otherwise, please enlighten me.


> near the end of that time, someone on their sales team cold-called me to try and upsell me on annual support contract. Yeah, right.

Did you tell the sales person why you were not interested in the support contract?


I don't understand the problem. Or rather the design of their system.

> The issue was that our system for weekly ad was having an issue where the ads needed to 'clip' a coupon (read hit an api) for the discounted price to reach our cart calculations. However, the api that we used for managing the weekly ads was managed by an external vendor, and often when we queried their endpoint, the coupon field was returning empty when it shouldn't.

What's the authoritative source if I want to know that a product has a discount? The external vendor? If the coupon field is empty, doesn't it mean there's no discount? Unless it means there are two different fields: one for is_there_a_discount and another for coupon_id. Would there ever be an expected situation where there is a discount but not a coupon?

Or maybe the source is the internal system. In which case, can't the coupon be generated there?

Surely the source is not the frontend, right?

> Products already had the associated coupons.

From where? The source aside, does that mean the coupon_id never changes or expires? If the same product has a different discount a few months from now, will it still use the same coupon?

Also, the apparent solution was hardcoding:

> so I completely hacked it, and replaced the api call service with a dumb function that returned hardcoded data copied from a production.

but a couple paragraphs before, the author mentioned it changes often:

> Because this data was missing a field, it required manual data entry to update it. However, this data changed in real time, so it was OFTEN out of date.

I don't understand how this was a showstopper before but isn't anymore. Or maybe these paragraphs talk about different things.

I don't question the bug fix or the author. I just can't grok it.


Hey, Author here. The source was our internal system, but the vendor was taking our id, converting it to their own id, and then serving us an ad with the coupon label formatted for their system. We were then using the vendor id to translate it back to ours.

The coupons associated with the products were already from our system, so I could skip the entire translation back to our id.

The manual overrides bit can be omitted, it's just how people in product were fighting a losing battle of manually overriding the bad data.

The hardcoding was in order to replicate the behavior, as it wasn't showing up in lower envs. It wasn't the fix; the fix was to just render data we already had. I didn't go too much into details because 1. I can't due to company policy, and 2. it's actually not the interesting bit. I was more making an observation about the process.
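For readers trying to picture "just render data we already had", here is a loose sketch of that shape. It is not the actual fix: the field names (`product_id`, `coupon_id`) and the `fill_ad_coupons` helper are invented for illustration, and the real data model is not public.

```python
def fill_ad_coupons(ads: list[dict], products: list[dict]) -> list[dict]:
    """Populate each ad's coupon from product data we already hold.

    Prefers the coupon attached to the product in our own system, so the
    often-empty coupon field in the vendor's response no longer matters.
    """
    own_coupons = {p["id"]: p["coupon_id"] for p in products if p.get("coupon_id")}
    return [
        {**ad, "coupon_id": own_coupons.get(ad["product_id"], ad.get("coupon_id"))}
        for ad in ads
    ]
```

Under these assumptions an ad whose vendor-supplied coupon came back empty still renders with the product's own coupon, and ads for products with no coupon are left as they arrived.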


Thanks, I understand it a bit better now.

I realized later I focused on a side part of your post. My curiosity got the better of me.

Hope the observed process is changed to something better.


I believe the part about hard coding production data was referring to recreation of the bug, not implementing the solution which the author describes subsequently as "a couple of array functions".

There's a fair bit of hand waving going on with technical details throughout but I don't think it's badly intended, more aimed at keeping the focus on the organisational issues described.


> What I did is come in as an outsider [and] let my naivety guide my investigation...

I call this being "a useful idiot", which I sometimes use to describe my role. A large part of my job is floating between teams & applications, identifying gaps and problems and knowing just enough to ask meaningful questions whilst not being deeply invested in the existing code & processes. I can see people asking themselves "why are they wasting my time even asking these questions?!" but in explaining the answer they get to review the process and sometimes work out a new solution. We all get comfortable with how things are and someone prodding us occasionally helps creativity.


Nice, but nit: this is not what a "useful idiot" usually means.


I live in a country where English is not the native language and the idiom is not in common usage, so the literal translation works.

I often wonder if it would be useful for more internet forums to allow users to add a flair to denote they are non US, as the assumption on so many sites is that everyone is US and living on the east or west coast.


I think I agree in general, but in this case the "idiom" (in fact, more of a metaphor, I guess) is actually Russian and just translated literally into English.


It’s an ironic use of the term.


So, people have blamed the vendor's API... but have they opened a ticket with them, raised the issue, escalated it?

I can empathize with not wanting to work around the bugs of an upstream provider, but just sitting on it when it causes real pain to customers is also shitty.

> That's right, for any developer, the answer here is simple: use the coupons that we are ALREADY receiving on the frontend to populate the fields that users expect so they can get their discounts.

Hard to tell without knowing more details, but that might run into the risk that people can inject made-up discounts in the browser and send them to the backend, which then charges less.


I'm not privy to the extent of the conversations with the vendor, I came in with fresh eyes and found a solution. The writeup was just musing on the process that let it get to that point from my perspective.

Your final issue is not a risk in this case, because we still are validating the coupons. This was a frontend fix, and we obviously don't trust any data coming from the frontend.
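A minimal sketch of that split, under stated assumptions: the frontend may display whatever coupons it has, but the server recomputes the discount from its own records and ignores any amount the browser claims. The names here (`SERVER_COUPONS`, `CheckoutRequest`, `compute_total`) and the data shapes are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass

# Hypothetical server-side record of valid coupons (assumption: a real system
# would read these from a database or the vendor, not an in-memory dict).
SERVER_COUPONS = {"SAVE5": 500, "MILK1": 100}  # code -> discount in cents


@dataclass
class CheckoutRequest:
    subtotal_cents: int
    coupon_codes: list           # codes the client asks to apply
    claimed_discount_cents: int  # what the browser displayed; never trusted


def compute_total(request: CheckoutRequest) -> int:
    """Recompute the discount from server data, ignoring the client's claim."""
    discount = sum(
        SERVER_COUPONS[code]
        for code in set(request.coupon_codes)  # de-duplicate repeated codes
        if code in SERVER_COUPONS              # unknown codes are dropped
    )
    return max(request.subtotal_cents - discount, 0)
```

With this shape, a forged request that lists a made-up code or an inflated `claimed_discount_cents` is charged the same as an honest one, which is why a display-only change on the frontend adds no new risk.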


Here’s the deal, at two years and for a problem that is by all accounts serious, any solution is better than no solution.

Now, if this is code that runs ventilators for sick people, runs the movement of money for a hedge fund, or balances the US budget, then no, you have to be absolutely, positively sure that you are not causing another problem. But when you have a real problem that is affecting people every day for years, then a little creativity is certainly called for. Even at the expense of hypothetically causing another problem.


> Hard to tell without knowing more details, but that might run into the risk that people can inject made-up discounts in the browser and send them to the backend, which then charges less.

That was my first thought, but thinking about it more, it probably makes no difference. If it's susceptible to injected made-up discounts now, it would have been susceptible before as well. It doesn't matter if the coupon code comes from the vendor API or their own API if you're going to inject fake data anyway.


Sounds like that to me too. I'm sure there is more going on there though


Congrats on the bug fix! I know how this kind of bug can be really hard. For a company. Less so for "an outsider", just like the author suggests.

I can easily see this happening in literally any company. We humans are very adaptable creatures, we get accustomed to things and just accept them as they are. In my company I preach raising flags and looking for things that just don't look right, and we generally do so indeed, but I am pretty sure that for an outsider some acceptable "features" will look very questionable. Rule of thumb I use sometimes: does it feel awkward to answer some of the newcomers’ questions? Independently of their experience, newcomers always ask very good questions.

--

Serenity Prayer comes to mind:

> God, give me the serenity to accept the things I cannot change,

> Courage to change the things I can,

> and Wisdom to know the difference.


I have the full version of the serenity prayer as a cross stitch framed in my office.

>God, grant me the serenity to accept the things I cannot change,

>The courage to change the things I am able,

>The wisdom to know the difference,

>And the ability to destroy all of the evidence of the inevitable murder, should the first three fail.


> I know how this kind of bug can be really hard.

It sounds more like it was embarrassingly easy, but the author is surrounded by 0.1x engineers.


> Harvard and others have written about it: "Why Big Companies Can't Innovate". I am interested in this, because I am interested in doing work, at scale, within a company that has a culture of solving problems, not of being a behemoth with enough inertia to not need to correct when there is a need.

You might like "Loonshots" by Safi Bahcall. The book proposes to solve this by nurturing an innovation unit inside the behemoth, and carefully maintaining the balance between the two.


Oh my god, I absolutely hate the torture system this person works for. Pretty sure they are talking about the thing at Ralph's where you have to "clip" a coupon by scanning a QR code in the store, except:

- the QR code is often deep in a refrigerated display case, so you're holding the door open with one hand while you stick your phone in there and the camera fogs up

- there is rarely good enough cell signal in the store, even more rarely in the fridge

- the web app for clipping is buggy as hell (yesterday I used it and clipping via the QR code failed on 4 out of 4 coupons, and it was the worst kind of failure - the "clipping" animation just played forever while I was standing there blocking the supermarket aisle)

This damn thing has been around for a few years and has changed shopping from fun and easy to frustrating. I might just go back to Amazon fresh or whatever.

Better, I think I'll just write a script to hit all the coupon endpoints every day before I go shopping.

Coupons were always a stupid, user-hostile game, but this system is taking it several steps too far. Also, it's discriminatory against seniors and people who don't have smartphones (or don't want to have to have them out the whole time while shopping).


Did the author go behind his manager’s back to pick up the task, or was it assigned?

Because if it’s the former, it’s possible that the issue was not addressed because it wasn’t deemed high-priority compared to other tasks, regardless of the number of user complaints.

If it’s the latter, well, I don’t know. Congratulations on completing an assigned task at your job?


Seemed like it was a case of "cannot reproduce" and any attempt at reproducing it failed. Since it couldn't be reproduced, it was never prioritized because it never made it past triage.

At least that's the workflow I imagine when reading this and from working at similar places.

Also, I advise leaving a place that requires your manager to dictate which tickets you work on. Sounds like a crappy place to work.


> I advise leaving a place that requires your manager to dictate which tickets you work on

As someone who recently left a place like that: I agree. I'm a highly-paid professional; it was really frustrating to feel like my manager didn't trust me enough to figure out what I needed to work on.


Manager/Tech Lead here who had someone who would choose their tickets themselves.

It's a giant pain and makes my job 10x more difficult when I tell another team that something that's low priority to us but high priority to them will be done, and instead find it sitting in in-progress with a tech debt bugbear of theirs in review.

The priority on the ticket was set for a reason, it was assigned and scheduled for a reason. There's only so many hours in the day to explain why some things are prioritised. If you think that's wrong, be a professional and talk to me about it.

I promise you, I want to have the conversation that you need to actually work with your team far less than you want to fix that JSON output.


You’re a manager, not a dictator. Manage that shit.

If it’s a low priority for you but not for someone else, you are prioritizing wrong. Don’t complain because someone doesn’t want to play politics with you. Learn your team, play to their strengths.


There's a wide berth between being a manager who prioritises work and a dictator that assigns out stack ranked tickets with no end in sight.

My job is to make sure the team's priorities are straight, and I can't do that if you or someone else are undermining my efforts to do so.

Unless otherwise demonstrated, I trust that you (my team member in this case) are a smart, well intentioned person, and I ask that you assume the same of me. Going behind my back because you don't like what I'm asking you to do is "otherwise demonstrated".


A team member and manager don’t operate in a vacuum. It’s your job to communicate priorities and ensure the team buys in to the priorities. If they don’t and you didn’t get that buy in, then you can’t blame them for having their own priorities. You can say “these are the priorities” all day long, but so can anyone. The team, as a team, should agree on the priorities. Your job is to communicate the business priorities, the team’s job is to turn them into technical priorities which may or may not match the business’s exactly. If you’re not getting the last part and communicating it back upstream, you’re not doing your job.


It really depends on what you are shipping:

1. does what you are shipping have lots of international regulations and compliance requirements per country it will be available in with new countries being rolled out on a planned basis -> your manager will determine what you should work on

2. Are there things being released on a set date due to some sort of legal requirement, business deal? -> your manager will determine what you should work on

3. Is it basically the same product in all markets / countries other than internationalized text, with no pending contracts or business deals requiring specific functionalities impacting the company? You should be able to determine what you should be working on.


I disagree. Though (1) and (2) are good points, your manager shouldn't be telling you what to work on at the individual ticket level. Even then, they should prioritize and set a healthy deadline (if appropriate); and that is it. Implementing a high priority ticket may still require addressing technical debt to implement, or dealing with other lower priority tickets first -- because they block the new, higher priority ticket.

If your manager needs to know all these details and can tell you "no, you can't fix X to build Y" then I really think that is a crappy place to work; not just due to the politics involved, but also the code must be absolute crap.


> Also, I advise leaving a place that requires your manager to dictate which tickets you work on. Sounds like a crappy place to work.

Would that not depend greatly on the type of ticket and the type of developer? In my experience some developers are happily working on tickets with minor impact while there are high priority tickets, where a customer is really losing money, which get no attention from them.


It's easy to understand why, I think. People will tend to pick up the tasks they think they can easily complete and avoid the ones they will be stuck on for multiple sprints while management breathes down their neck because it's a really important task. If you're an inexperienced dev picking up that kind of task can be intimidating, especially if it's a work environment where you're expected to operate fairly autonomously.


Why do you have tasks that take more than a single sprint? Those should be epics made up of smaller tasks. No dev should be working on a single task like that. That’s probably why they aren’t picking them up…


I guess that largely depends on hiring practices. If you are hiring people who prioritize “incorrectly” then you’ll have people doing the wrong thing. I usually see this filtered out by having a list of issues in the “code test” and then seeing which ones they do first and asking why.


I feel this has nothing to do with hiring - people will change focus over time, sometimes multiple times every day.

The manager is there to prioritise, to make sure everybody has the same understanding of the priorities, and to unblock progress, among other things. As a manager, I am not telling people how to do their work, I am telling them what I think it is important to deliver. I don't even make the feature list, that is the job of a product designer.

Absolutely nothing in the "code test" provides any insight into how a developer would prioritise work within a product - it only tells if and how the developer can develop code up to the standards.

So I think in this regard the manager assigning the task is the correct choice - the product has a deficiency, and the manager is directing a team member to fix the deficiency. If some developer wants to make their own product prioritisation decisions, they are more than welcome to develop their own product, possibly inside the same company.


Completely agree. As someone in the same role (well a small company so I'm also the "designer" in many cases), the appropriate thing to do is to tell me you disagree with the priority, and we'll talk about it.

But if you're questioning every priority, and ignoring the priority, then you're not doing your job. You're doing my job without the information that I have.


Your job is to tell the devs the business’s priority. My job is to make it so. If that means I need to build a foundation before putting a house on it, you’re getting a foundation. Any idiot can build a shack, but I build houses. You hired me to build houses.

Granted, I’m not going to go behind your back unless you tell me to build a shack that looks like a house because you don’t want to put down a foundation. (It’s like asking an accountant to cook the books because you don’t want to pay taxes)

POC, MVP, and prototype code excepted.


No, your job is to build the house I told you to build, not the one you think you should build.

If I'm telling you to prioritise a particular room in the house, and you do another room instead, you're not doing your job. I hired you because you're a professional who I can expect to follow instructions and deliver.

I was trying to avoid going into all the edge cases where this might not wholly apply: that we build these priorities out together, you have autonomy in a range of things, you're part of a team not an individual, etc.

It's my job to share the priorities with you, it's your job to accept those priorities sometimes. I don't like being told by _my_ boss that we have a surprise deadline any more than you do, and by the time I come to you saying "hey, you need to do Y instead of X", it's because I've spoken to the other leads/POs and we've figured out that this is what's best for the team and project. You might not agree because the crash you want to fix is high priority, or the thing hosting the widget needs a refactor rather than another special edge case, but at that point I'm not asking, I'm telling. I promise (and I've shown this with my team) that if you do this when I do ask, we can do your pet stuff next, and I'll pull you off the critical path for a little bit.

It's the prisoner's dilemma, and works when everyone works together but the minute someone stops trusting me and the team, it wrecks it for everyone.


I go back-and-forth in my career between being a manager and being an engineer. I find both sides rather fun for different reasons. There's a lot to unpack in your reply:

1. You're not acting in the role of an engineer. You don't know (intrinsically) what needs to be done. You just know what you want done. Engineers are probably doing shit behind your back and you'll never know except that your velocity seems slower than normal occasionally. You probably won't even have a slight clue unless you are an engineer yourself. Some hints might be when you prioritize a bug and your engineer tells you 'oh, that got fixed when we did X', where X seems totally unrelated to the bug, or just barely related. POs understand this when the customer wants X but really needs Y. The same thing applies to you: engineers know you want X, but really, you need Y. When you pressure them for X, and you're not listening when they tell you you really need Y ... think about that for a bit. Have a conversation with the POs.

2. You should never, ever, ever, agree to anything without discussing with the team first. There should never be any surprises, because when you get surprised your response should be "let me discuss this with the team and get back to you before I give any commitment on that, but I'll get back to you before the end of the day." A key question on any kind of deadline is "is this a soft- or hard-deadline and why? How much wiggle room do we have?"

3. If you are getting surprises on any kind of annual basis or more, it's because you've become a "yes-man" and your team is paying the price. You're allowed to say "no" or "we're dealing with too much, can you help reduce the load" or "we can do it, but it will be two weeks later than you need it, here's what we came up with to get most of it done by X, and complete it by Y."

4. Don't be a dick. Software Engineering is a 24 hour job. You dream about the problems you are trying to solve sometimes. It really sucks to come in to work, with a solution you've been thinking about all night and all morning, only to be told you can't do it because you've got a dick boss who agreed to some bullshit without even talking to you about it.

> It's the prisoner's dilemma, and works when everyone works together but the minute someone stops trusting me and the team, it wrecks it for everyone.

It goes both ways, you have to work with the team as well. And regardless, nobody is a prisoner... hopefully.


The code test we gave was a completed project with security issues, half implemented features, completed features, and a few very badly written classes, with no tests. It was something you could wrap your head around after half an hour. Then we would provide a list of features, bugs, and “known technical debt” and ask them to tackle at least one issue. If they discovered any security issues, they were asked to fix them or at least document how they would fix them.

It would take the average engineer a Saturday afternoon. During the technical interview, one of the topics was what they thought about the order of the tasks. Thus we got a window into how they thought through what was important or not important and why. There was no wrong answer, we just wanted a balance between “feature first”, “bug first”, “security first”, and “debt first” on teams.


I mean, up until which point do you have full autonomy of the work you do in a bigger org? I might be misunderstanding your point, but I've only ever seen a developer that gets to choose whatever they want to work on in solo, early stage, or bootstrapped startups.


I worked in 5k+ size companies. You didn’t choose your product per se (though you did agree to work for the company on that team), but you had influence over the roadmap and could choose your tickets (even create new ones and create entire — gated — features if you wanted). The team had autonomy and the leads were fantastic at fostering it.


The author says "I had never looked at this code, but was told to take a look," so I assume they weren't doing this behind their manager's back.


The manager told me something along the lines of "hey this has been causing issues, can you take a look? Others haven't been able to figure this out." It's the latter part of the statement that I was musing over in the article. I'll take the sarcastic congrats, though, better than internet name calling.


Must have missed that part. But my intention wasn’t to offend. It felt like the part about solving a problem other developers couldn’t was the main crux of the article, and the insight about inertia in big companies was tacked on. But I’m sure others will disagree.


I read it as saying everybody else gave excuses to get out of dealing with it while the author went ahead and did it when assigned, even though all the previous people had said it couldn't be fixed.


I don't work in a software company so I found it interesting to see how things can become dysfunctional there.


At a meta level in most companies it is important to not become the quick 'bug fixing' specialist. Don't be the developer product managers use/abuse to get around processes and fix what they want fixing today.

Work with your manager to plan your career path. 'I fixed a bug the other day' is good once a year as a bullet point on 'why you should get that promotion', nothing more. 'I built and led a team to fix long standing bugs' is more what you want to be doing.


I don't think I agree. In my experience, being the guy or gal that managers know they can turn to in order to quickly solve problems is a great way to get noticed and get involved in projects that normally you wouldn't be able to.


This isn't about fixing a bug, but about working for the customer when the organization can't.

For theoretical background on why companies grow in this manner, read the transaction cost economics literature on bureaucratic cost. That's what drove the auto industry to abandon the hierarchy for the M-form, with some internal contracting replacing hierarchy. It's why US companies are relatively flat.

In this case, everyone actually wanted the bug fixed. Problems really arise when some do and some don't, and when it's in the customer's interest but not the company's.

The solution here is not to fix the bug but fix the problem with test infrastructure. It's causing a whole class of hard-to-reproduce bugs to be ignored.

In a bureaucracy, the bug will get fixed but the problem will persist. In a healthy company, middle management will work together to fix the problem.


The article discusses how, when companies reach a certain size, tension arises "between owning your work and standing out, vs just doing the minimum". Plenty of folks on HN abhor doing the minimum and rightly so, but rarely is there discussion of the forces pushing programmers towards doing the minimum when they'd prefer to own their work.

At one place I consulted, there was an engineering manager who firmly pushed team members to own their work, at least verbally. Conflictingly, her idea of owning the work was to do it exactly as the organization and leadership directed, without complaint, and never raising any issues about how and what was being done.

How does a programmer own the work and stand out when the incentives reward conformity but punish going outside the lines?


The satisfaction of finding the root cause of something like this and then fixing it is great, and one of the reasons I’m still in this industry.

The great write up reminds me of a few similar cases I’ve dealt with over my career, I still think back to them and smile to myself.


This sounds like a monitoring problem. If it's happening in production there isn't a requirement to figure out how to reproduce it locally.


You need to reproduce the bug to know if your fix works. Not always but that’s how most teams operate. Particularly because you typically won’t pass review without some tangible improvement.


Yeah, we actually needed to 'test' in production. I wrote tests for how it was supposed to work, and made the best effort to reduce the risk, but the only way to get a reproducible environment for this particular issue was to deploy the change. That definitely contributed to, though was not exclusively the cause of, the risk aversion.


> Ask for more information.

Very often this is part of the solution; asking for info isn't deflection in most cases. Usually the worst bugs are a result of some unknown unknown or excessive complexity in code. Discussion is often the best way out imo


Escalating this issue to a senior director and having them call up the CEO of the marketing partner and threaten to switch providers if they don't resolve the coupon bug within 48 hours would have solved this problem a long time ago.


I’ve worked quite a bit in enterprise software development, and later in teams that are transitioning from startups/mid-size to enterprise and in my very anecdotal experience it’s always down to the internal processes.

Software developers like solving issues. Most people, myself included, prefer building new things, but a lot of us actually like solving bugs as well. At least that is my experience. What I’ve seen get in the way is when the processes around reporting, examining, describing and prioritising the “tasks” get so convoluted that developers get either overwhelmed or detached from the issue by too much bureaucracy. Which happens very often in non-tech organisations in my experience. Maybe it also happens in tech organisations; I’ve never worked in one, but from buying software from around two hundred of them it didn’t seem like it was a problem in quite the same way it can be in regular enterprise.

I realise this may be a little more clear in my head than for you as the reader. So I’ll try to explain by an example I encountered recently with a parking app. My wife and I were visiting a midwife for a comfort scan, and they have you register your license plate in an app so that their patrolmen/parking-guards/whatever they are called don’t give you a ticket when they scan your car. Anyway, the app wasn’t working and the midwife was like: “oh, that happens from time to time, don’t worry, that means they can’t check if parking is legal or not”. As though it was the most normal thing in the world. Because it is… IT failing is the most normal thing in the world to a lot of people, but think about it. That app being down, at least on a city scale, maybe even globally, meant that all those parking-guards were paid to do nothing for however long it took to fix it. It also meant anyone could essentially park for free… and that was a semi-regular thing?

When you work in enterprise organisations, you see that they often become accustomed to IT not working. Part of this is because the process people they put in between IT professionals and their organisation don’t actually know IT. They are very good at defining processes, making things lean and what not, but that’s not how you actually make things work in my experience. I’m guessing this is true outside of IT as well, but once you have more process managers, scrum masters, product owners and what not than actual technicians, then you’ve probably ducked it up. This is because programmers are often the best people at determining what’s important, not always, but often, and despite an entire industry’s attempts at “streamlining” the development process I suspect it’s also why sometimes a team of 4 developers can do more quality work in a shorter amount of time than 10 teams of 10 people combined.

It sounds like the author has encountered some of this. They have an important issue. It loses some of its importance in the processes, and then once it finally gets to the technicians it can’t be reproduced… Of course it doesn’t get solved. Especially if the organisation has set up systems to reward not solving seemingly unimportant reproducible issues, which many enterprise organisations end up doing.

Corporate culture is hard.


A lot of fancy links referenced, yet as usual it is really simple

Desire to get shit done and motivation is more important than anything else


Sure, but the issue was entrenched in the system. Encouraging autonomy and ownership seems a difficult task in a large scale org.


on the cybernetic systems thing: take a look at a book named "brain of the firm" by stafford beer


This is great, though seemingly difficult to come by (most books online seem to be anywhere from $100 to $400 USD). Added Stafford Beer to my people to learn from list, thank you.


Crazy that other engineers are so ignorant/bad.



