
I've learned over the years to never put profanity or anything even mildly offensive in test data. Too many times have I had to give an impromptu presentation off my test database, only to see user names like "Asshat Joe" and "Jack Off" show up big and bold on the projector screen.

Always, always use plain, non-offensive vanilla boilerplate text in everything you do.



I've had it much worse than that. 12 years ago we were writing a very ambitious media asset management platform as a service (yes, in the cloud, basically), and I needed tons of pictures and videos to test the system, fill the database and filesystems, exercise the image pattern recognition engine, and so on. Guess what I did? Yeah, I loaded a humongous heap of porn into the software: pictures, videos, everything.

Then the very important finance guy and the very important communication lady from whatever big important investor came unexpectedly and demanded the mother of all demos RIGHT NOW.

Boys, I know what it is to be embarrassed to death and to wish I were an earthworm. Preferably deep underground. Like in New Zealand.


You actually demo'd the product like that? You couldn't have bought an hour or two to empty out the porn and replace it with kitties? I am curious to know how this played out. Thanks for sharing. :)


Remember that it was in 2000 or 2001, several years before YouTube and most other sources... Actually there were hardly any downloadable videos except porn :)

In fact there was quite a lot of harmless material too, however I really hoped that they wouldn't ask for a keyword that would bring forbidden pictures, or worse, ask to use the mouse themselves! I think we managed to avoid any incident this time; however a little later the boss discovered what was in there and he wasn't happy.


Recently I was this close to uploading an "edgy" picture to an imaging application I help develop to test a new deployment under the test user. I was so sick of seeing that same innocuous test image, and after all, it was under the test user and no one would ever see it, right?

I resisted and uploaded the test image again. Later that day, I was called and informed that a recent change to one of the server components was mixing up sessions and the client was seeing the test image instead of the image they expected. Needless to say, I was grateful for the restraint earlier that day. I probably would have been fired if the client had seen what I had almost put up instead of the test image.

I have also accidentally overwritten a whole column in a development database with a comment about skateboards. This was obviously not a big deal since it was a dev db, but it's just more reinforcement that as boring as it gets to write completely innocuous, repetitive stuff in fields while testing, it is much better than letting something slip through in frustration and getting fired and/or losing customers over it.

Like hard drive failure and other inevitabilities, it's not a question of if your dummy data will eventually be exposed to people not meant to see it, but when.


I learned that a "cute" message can be the best thing. I am a dev on a top iPhone app and while testing our push notifications, I sent a notification out to millions of people. Luckily I wrote "Have a nice day!". It cost thousands of dollars but our vendor was nice enough to refund it. In my case, people thought we were just trying to cheer them up. Had I written "this is a test", people would have been annoyed and lost confidence in us.


Yeah, I've been trying to bang into the heads of the students that I TA that "there is no test data - only fake data." Test data always seems, to me, to mean meaningless junk. Names and addresses like Dicksville, IL do nothing but get accidentally leaked to someone who shouldn't see them, and they can't be validated, which creates a whole nest of testing problems.

Fake data, like John Doe, San Francisco, CA, doesn't share these problems.


I read a story a couple years ago about a couple who had their home raided over 50 times over a 4-5 year period by NYPD. Turns out their address was used as test data in the system, and it was leaking into the police's forms/database. So, if using fake data, make sure it can't be mistaken for an actual address/phone number/DNA sequence.


There are already defined sets of telephone numbers that can be used for fake data http://en.wikipedia.org/wiki/Fictitious_telephone_number

Also of course example.com. Not sure if there are non routeable physical addresses.
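For what it's worth, here is a minimal sketch of how those reserved ranges could be used when generating fixtures. The `fake_contact` helper, the name lists, and the default area code are all made up for illustration; the only things taken from the thread are the fictitious phone range (555-0100 through 555-0199 in North America) and example.com.

```python
import random

# Hypothetical name lists, chosen only to be bland and inoffensive.
FIRST_NAMES = ["John", "Jane", "Alex", "Sam"]
LAST_NAMES = ["Doe", "Smith", "Roe"]


def fake_contact(rng: random.Random, area_code: str = "212") -> dict:
    """Build one fake contact that cannot collide with a real person."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # 555-0100 through 555-0199 is the block set aside for fictional use.
    line = rng.randint(100, 199)
    return {
        "name": f"{first} {last}",
        "phone": f"({area_code}) 555-0{line}",
        # example.com is reserved for documentation and examples.
        "email": f"{first}.{last}@example.com".lower(),
    }


if __name__ == "__main__":
    # A seeded generator keeps fixtures reproducible between test runs.
    print(fake_contact(random.Random(42)))
```

Passing in a seeded `random.Random` is a deliberate choice here: the same seed gives the same fixtures, so a failing test can be reproduced.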


The TV show Breaking Bad uses phone numbers like 1 (575) 147-8092. Since the local part of a phone number can never start with a 0 or a 1, it avoids the "555 problem" of obviously fake phone numbers catching the viewer's attention, while still being non-dialable.
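A rough sketch of that trick, assuming the North American rule described above (the three-digit exchange code never starts with 0 or 1). The function name `undialable_number` and the default area code are just illustrative choices, not anything the show or a library defines.

```python
import random


def undialable_number(rng: random.Random, area_code: str = "575") -> str:
    """Return a realistic-looking phone number that cannot be dialed."""
    # Forcing the exchange code (middle three digits) to begin with 0 or 1
    # puts the number outside the assignable range.
    exchange = f"{rng.choice('01')}{rng.randint(0, 99):02d}"
    line = f"{rng.randint(0, 9999):04d}"
    return f"1 ({area_code}) {exchange}-{line}"


if __name__ == "__main__":
    print(undialable_number(random.Random(7)))
```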


I'm shocked this isn't done more often. Even better though is when the show owns the phone number and can use it to communicate with fans in-the-know.


> Not sure if there are non routeable physical addresses.

Santa Claus, North Pole, Snow Street 1

Sherlock Holmes, London, 221B Baker Street (is a small museum afaik)


The Sherlock Holmes Museum is not located at 221b Baker Street.

That address does not exist, but any mail addressed to it is redirected to the Museum, which is technically 239 Baker Street (since it sits between 237 and 241).

They do have a special dispensation from the City of Westminster to display their address as 221b Baker Street though.


It's actually the address of the Sherlock Holmes Museum, so not wholly inaccurate.

I'm a fan of "Bilbo Baggins, 1 Bagshot Row, Bag End", personally.


Oh, not small then? :)


Very small, but worth a visit.


We had an auction website project in college as freshmen, and the person teaching the class, who was playing the role of client, made this very clear. One of the groups was demoing their app and some bad language and comments made it onto the screen. Their review/feedback/demo ended right there... Painful for them, but it taught me and the rest of the class a lesson.


During a demo for a group bittorrent client project in college, the person running the demo chose to use the link on the piratebay with the most seeders (because we had some bugs that made it not work reliably with non-seeders). The file? A windows keygen program. Fortunately, the prof didn't notice what we were downloading.


The Fake Name Generator is useful for these kinds of things:

http://www.fakenamegenerator.com/index.php

They even have an API to automatically fetch a bunch of data.


This is actually very good advice, and you can extend it to comments in source code.

When I was young, I sometimes couldn't resist the urge to put strong wording in comments, like:

    // This sucks, use a better data structure for it!

Invariably, when I revisited the code months or years later, I thought to myself: why the swearing? You can express disagreement in a more neutral way.

So, nowadays I use a better commenting style for bad code and tech debt, always with the reasoning alongside it:

    // TODO: This data structure is probably inadequate
    // here because of O(N^2) lookup


Once I left a comment in a source file of the following form.

    # This comment included for the benefit of those grepping for swear words: shit.

I don't think I'd do it today, but I was young and it was for a place that had demotivational posters hanging in their lobby.


I once worked somewhere that had a poster on the wall: "Meetings - none of us is as dumb as all of us."

I think it was a good reminder.


I think it's not just "strong wording", but the inarticulateness of the first example comment. You can express strong feelings while still being articulate.

Example 1: //FIXME: This code is shit, someone make it better.

Example 2: //FIXME: This code is a truly atrocious hack to work around $issue. I don't know of a better way, but suggestions welcome.


What's the difference between example two and this:

    //FIXME: This code is a shitty hack to work around $issue. I don't know of a better way, but suggestions welcome.

This contains the same information but uses a swear word. I think everyone would agree that clearer comments are better, but does the presence of obscenities inhibit clarity, or are they merely offensive and superfluous?


> but does the presence of obscenities inhibit clarity

That entirely depends on your work environment. Will adding obscenities stir up some big HR kerfuffle? Then don't do it, you're just wasting everyone's time. Otherwise, do whatever you want.


Annoyed that I couldn't tell which part of the test data was failing when a test failure spewed out "foo", I wrote http://canonical.org/~kragen/2words.cgi several years ago. This produces randomly selected, short, memorable strings from several hundred million possibilities, so when you see one you can easily grep the tests to see where it came from. It picks two random words from the dictionary.
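For anyone who wants the general idea without the CGI script, here is a minimal sketch of the approach described above. It is not the original code: the `two_words` name and the tiny built-in word list are stand-ins, and a real version would draw from a full dictionary file to get anywhere near hundreds of millions of combinations.

```python
import random

# Stand-in word list for illustration; swap in a real dictionary for
# enough combinations to make each string effectively unique.
WORDS = [
    "amber", "basket", "cedar", "dolphin",
    "ember", "falcon", "garden", "harbor",
]


def two_words(rng=random, words=WORDS) -> str:
    """Return a short, greppable marker such as 'cedar-falcon'."""
    # sample() picks two distinct entries, so 'amber-amber' cannot occur.
    return "-".join(rng.sample(words, 2))


if __name__ == "__main__":
    print(two_words())
```

Dropping one of these into each test case means a failure message containing the string points at exactly one place in the test suite.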

After I left, people stopped using it. They decided that strings like "jounce-visit", "crotch-surges", and "rubble-rump" (all real examples from a single run of it just now) were just a little too memorable, to the point of distracting one's mind from the code during development.

For more or less the same reason, some mail system (maybe Andrew?) used to generate Message-IDs without using vowels, and Google Chrome generates extension IDs using random strings from only the first half of the alphabet.
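A minimal sketch of the vowel-free approach, to show how little it takes. This is not how any particular mail system or Chrome actually does it; `random_id` and its parameters are hypothetical, and the alphabet can be swapped for whatever restricted set you prefer.

```python
import secrets
import string

# Lowercase letters with the vowels removed, so a random ID cannot
# accidentally spell most English words. Note that 'y' is still included.
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in "aeiou")


def random_id(length: int = 16, alphabet: str = CONSONANTS) -> str:
    """Return a random identifier drawn only from the given alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(random_id())
    # The same helper restricted to the first half of the alphabet.
    print(random_id(32, string.ascii_lowercase[:13]))
```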


When I was younger, I worked at a navy base, and learned how to swear professionally. But it's a turn-off, and more importantly, it's just not that funny.

I've switched to animals. Yeah, it's a bit ridiculous coming from some giant leaps in profanity and the implied sexual/violent imagery. But, animals are funny, cute, and have a very large base of context to work with.

I can shove a panda image as a test anywhere, and if I screw up, hey, it's not so bad, it's a cute panda. Some people will ping you if you screw up, no matter how small; that can't be helped. But there's a large range of screw-ups that are recoverable, and a cute panda/kitten/monkey image really helps.


I think there is a big psychological difference between hackers and "normals": the former see the data as just a placeholder for the code to grab on to, whereas the latter see the data as important and the code as the ancillary thing.

I think it comes down to coders being much more able to traverse levels of abstraction quickly. For us, "Asscock Asscocksson" is just a tag for <loremipsum>, whereas non-coders may get tripped up on that and not be able to just skip past it.


I love that it was "too many times" - how many is too many? After the 3rd time seeing Asshat on a large screen you decided to go clean? ;)


It's easy to get frustrated at 1:00am and type a field in as "This software is a huge pile of shit" and forget that it's there. It will always come back to haunt you one day.


I've been trying to beat this into the skulls of my employees. I run a contract based QA team so there's a good chance our paying client(s) are going to see everything we have entered into their databases.


That is a good idea, but most people can find one or more offensive words in enough data, even if it is just the dump of names in the local community.


I usually check my git commit messages very carefully for this reason. But again, I usually restrict myself to nothing much worse than "urmom".



