Ep 9 – This. Is. Dataaaa

The pain I went through for this ep. GarageBand has a grudge. It lost half the ep after I’d edited it all, then I had to record the new outro (finally) about three times. But I fought, and I prevailed to bring you this episode. This. Is. Dataaaa (or Dataaagh). After listening to this/reading the notes, data will no longer feel like a real word. Be warned.

Firstly, I need to talk about episode 7 and the title, which comes from Starship Troopers. If you’ve not watched it (and you’re not squeamish), you should definitely go watch it.

Secondly: I’m going to have another guest on the show! BBC Dev and co-founder of Manchester Tech Nights, Chris Northwood, will be joining me on ep 11 😀

Gathering and managing test data can be part of the planning process. Test data can come from multiple parts of a project. When making a website there can be transactional data, both from orders coming in and from order data being exported to a third party stock system. There could also be user data, subscription data, blog data: all sorts of different kinds of data from various sources, all interacting with each other in multiple ways.

Test data can also refer to data produced whilst testing – outputs of the system in response to various inputs for example.

Understanding the data the system uses, how it uses it, and what data is produced is crucial to understanding the system, and it’s only when you understand the system that you can test (and build!) it.

Sometimes you can use junk, or not real, data. I have a folder on my computer of placeholder images that are just used to test how systems handle images (does including them in a content item mess up the layout, for example). I also have some placeholder documents of various types to test their placement. And everyone’s seen the tweet: QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv[1].
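The bar joke maps quite neatly onto a table of junk inputs. Here is a minimal sketch in Python of what that could look like. The `parse_beer_order` function and its "1 to 99 beers" rule are made up purely for illustration; your own system will have its own rules about what counts as a sensible order.

```python
def parse_beer_order(raw):
    """Hypothetical validator: accept whole numbers from 1 to 99, reject everything else."""
    try:
        quantity = int(raw)
    except (TypeError, ValueError):
        # Not a number at all (a lizard, keyboard mashing, nothing)
        return None
    if 1 <= quantity <= 99:
        return quantity
    # A number, but not a sensible one (0, -1, 999999999)
    return None


# The orders from the tweet, written as strings because that is how
# they would usually arrive from a form field.
junk_orders = ["1", "0", "999999999", "a lizard", "-1", "sfdeljknesv"]

# Map each raw order to the parsed quantity, or None if it was rejected.
results = {raw: parse_beer_order(raw) for raw in junk_orders}

for raw, quantity in results.items():
    outcome = "accepted" if quantity is not None else "rejected"
    print(f"{raw!r}: {outcome}")
```

Keeping the junk in a list like this means adding the next weird order is a one-line change.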

And that’s fine for the basics, right? Easy stuff. It’s when you’re testing the more complex stuff, how the parts of the system interact with each other, how data travels through the system, that you need to consider sensible, real data.

A fair few of the projects I’ve worked on involve third party integration, which introduces another level of complexity to the whole project, but especially any sort of testing. You’ve essentially got another set of test data there to produce and check.

For something like this, live data, either for processing or for comparing results, is the quickest and easiest way to get the best data possible.

Things to consider when getting live data (assuming you can get live data):

Age and Relevance: You need data to be useful for the new functionality you’re building, or for the functionality you’re replicating if you’re just copying the current functionality to a new site. Old, irrelevant data is bad. Relatedly: will you need to edit the live data? If so, can you just change the parts that need to be changed without affecting any other data? Will changing this data affect the tests in any way apart from the way you want it to?
Going off on a tangent, sometimes running live data that you know should fail is useful, because you know what the fail should be, so you can make sure that it does fail in that way.

Security: Will you need to mask any sensitive data? If you do get and store live data, what data protection and security will you need to take into account? Will you have to ensure it’s masked, used, and deleted within a certain time period? That may affect when, where, and how you get live data, so you can be efficient when doing your due diligence.

Size: How much live data do you need? How easy is it to extract parts of the data? How long does processing take? How about storage and access to people who need it (see also: Security)?

Ease of access: Is it easy to get a hold of or generate? Does it require a third party to give that access or data? How about refreshing or updating it if needed?

Is it worth it?: Making sure that all of the above is worth it, weighed against how useful the live data will actually be.
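To make the Security point a bit more concrete, here is a rough sketch in Python of masking sensitive fields in a copy of live data before it goes anywhere near a test environment. Everything specific here is an assumption for illustration: the field names (`name`, `email`), the sample records, and the salt value are all made up. Note too that a salted hash like this is pseudonymisation rather than true anonymisation, so whether it is enough depends on the data protection rules you are working under.

```python
import hashlib

# Assumed names of the sensitive columns; adjust to match the real export.
SENSITIVE_FIELDS = ("name", "email")


def mask_value(value, salt):
    """Replace a value with a stable, unreadable token.

    The same input and salt always give the same token, so records that
    belonged to the same customer still line up after masking.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked-" + digest[:12]


def mask_record(record, salt):
    """Return a copy of the record with the sensitive fields masked."""
    masked = dict(record)  # copy, so the original record is left untouched
    for field in SENSITIVE_FIELDS:
        if field in masked:
            masked[field] = mask_value(masked[field], salt)
    return masked


# Made-up records standing in for a live export.
customers = [
    {"order_id": "1001", "name": "Example Person", "email": "person@example.com", "total": "19.99"},
    {"order_id": "1002", "name": "Example Person", "email": "person@example.com", "total": "5.00"},
]

masked_customers = [mask_record(c, salt="per-project-secret") for c in customers]

for row in masked_customers:
    print(row)
```

Because the masking is consistent, both orders still point at the same (masked) customer, which matters when you are testing how data travels through the system rather than just whether a single record loads.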

I am all for using systems as they will be used in the wild – I think it’s an efficient way of bug hunting and testing, so I prefer live data for complex testing whenever possible. Sometimes you don’t really have to consider any of the above – we recently got a csv of product details and prices from one of our clients, which was simple, easy, and not really subject to any security measures. But customer details, anything that’s confidential, or sensitive in any way really needs a test management plan.

There are plenty of programs and methods to manage and even generate your test data, and I’m definitely interested in hearing your experiences with them! I’ve never used them, as I mostly work on small projects that don’t require huge amounts of data and data management, so this is an area I’ve enjoyed looking into, and will definitely be checking out more!

Footnotes

[1] https://twitter.com/sempf/status/514473420277694465?lang=en

Other reading:
http://www.cmcrossroads.com/sites/default/files/article/file/2012/XDD3202filelistfilename1_0.pdf
