Deep Data Exploration
Deep Space Exploration.
Sometimes being an analyst can feel like travelling through space (not that I would know! I can only imagine it must be like that). You have the technology (sort of) and you have the astronauts but where the f**k do you go?
As a massive Star Wars fan, it will come as no surprise that I was eventually going to write a post linking to space, but it was only yesterday, whilst in the middle of a piece of proactive analysis, that it occurred to me that this is kind of like deep space exploration. I stress the importance of the word proactive. If someone had just given me something to analyse, it would have been no problem; it would have been the space equivalent of “going to the moon”. What I’m talking about is, “I know an area I want to analyse, but where do I start and, most importantly, what am I looking for?”
What to analyse?
The first thing you need to do is know what to analyse. There were some opinion-based changes made recently where I work and, being the d**k that I am, I decided to see if these changes were effective. To some extent I agreed with the changes, but no test approach had been taken, so I couldn’t help but do some analysis on it.
The difficulty was that I couldn’t compare the changes against a control group in real time; all I had was the data from before the change and the data from after it.
When you’re comparing data between two different time periods, you have to be very careful. This is so important that I’m going to say it another way.
The present and the future may not be the same as the past!
The reason is that no two time periods are the same. Between them there could have been:
- external influences
- changes to the site
- promotional changes
- start of the month vs. end of the month (pay day)
The sporting calendar has a heavy influence in the industry that I work in. This could be the Olympics, the Grand National, Wimbledon, the US Open etc. Therefore, to keep the integrity of my analysis intact, I know I need to scrutinise whatever I find.
Where to start?
This is where the space analogy comes into play. Admittedly, my example above does have some sense of direction, but that’s not to say that I knew exactly how to start or where I wanted to go. As a result, exploration was still needed.
I find the first part of any exploratory analysis is to have a question or hypothesis in mind. You might even want to focus on a particular aspect of your site/data.
For two companies that I have not worked for, below are a couple of questions I might ask:
- If I was working for Spotify:
  - What type of music genre attracts freemium customers and what type attracts subscription members?
  - Which genre has the best freemium-to-subscription conversion rate?
- If I was working for lynda.com:
  - Is there a relationship between users who finish X chapters within a Y time frame and the likelihood of them completing a course?
  - What do students who don’t finish a course have in common?
The second thing is to download an initial set of data. Start by selecting a range of metrics and dimensions along with an appropriate date range and shove it all into Excel.
At this point, using a pivot table and graph, I chop and change the metrics and dimensions until something strikes me as odd or different or I start seeing relationships.
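If you would rather do the chopping and changing in code than in a pivot table, the idea can be sketched in a few lines of Python. This is only an illustration: the rows, the field names and the `pivot` helper are all made up, and a real export would have far more rows and columns.

```python
from collections import defaultdict

# Hypothetical export: one row per date and product. All values are made up.
rows = [
    {"date": "2024-01-01", "product": "A", "orders": 10, "revenue": 250.0},
    {"date": "2024-01-01", "product": "B", "orders": 4, "revenue": 180.0},
    {"date": "2024-01-02", "product": "A", "orders": 12, "revenue": 310.0},
    {"date": "2024-01-02", "product": "B", "orders": 5, "revenue": 200.0},
]


def pivot(rows, dimension, metric):
    """Sum one metric grouped by one dimension, like a one-way pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


# Swap the dimension or the metric to look at the data from another angle.
revenue_by_product = pivot(rows, "product", "revenue")
revenue_by_date = pivot(rows, "date", "revenue")
orders_by_product = pivot(rows, "product", "orders")

# Derived metric: average order value per product.
aov_by_product = {
    product: revenue_by_product[product] / orders_by_product[product]
    for product in revenue_by_product
}

for product, aov in sorted(aov_by_product.items()):
    print(f"{product}: average order value {aov:.2f}")
```

Changing the `dimension` and `metric` arguments is the code equivalent of dragging a different field into the pivot table.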
What am I looking for?
Using the previous example, what I was looking for was a change in behaviour.
If you are unsure what to look for then think about the metrics and consider the dimensions that could influence them. You should be looking for a relationship between metrics like:
- revenue
- orders
- clicks
- average order value
- time on site
- bounce rate
- LTV of a product
and the dimensions that could drive a change in them, such as:
- date
- time
- product
- promotion
- competitor
- legal etc…
Below are some graphs from my exploratory analysis and the patterns I was able to spot.
The red line in the first graph shows the addition of some customer behaviour which didn’t exist before, or the increase in some customer behaviour, as indicated by the red arrows.
Using the statistical tool in Excel, I was able to test my observed data for statistical significance. We can see that the p-value is less than 0.05, which means the change in behaviour is statistically significant at the 5% level (often loosely described as being over 95% confident that the change is real).
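For anyone who wants to run a similar check outside of Excel, here is a minimal sketch in Python. It is not the Excel tool used above; it swaps in a permutation test on the difference in means, which needs nothing beyond the standard library. The `before` and `after` values are made up for illustration, and `permutation_p_value` is a hypothetical helper name.

```python
import random
from statistics import mean


def permutation_p_value(before, after, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled values and counts how often a random
    split gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_before:]) - mean(pooled[:n_before]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up daily values for the periods before and after a change.
before = [102, 98, 105, 99, 101, 97, 103]
after = [110, 108, 112, 109, 111, 107, 113]

p = permutation_p_value(before, after)
print(f"p-value: {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "not significant at the 5% level")
```

A small p-value only says the two periods differ by more than chance would suggest. It does not say why, so the external influences, site changes, promotions and sporting events mentioned earlier still need to be ruled out.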
Final Thoughts
I used Excel here to do my data mining and exploratory analysis. I could have just as easily used Tableau.
Analysis (be it exploratory or non-exploratory) is only beneficial if someone has the ability to, or is expected to, act on it and make some decisions. If your analysis doesn’t drive change then it’s not worth doing.