Accelerating data exploration through automation

Team & Role

I'm leading this project under the mentorship of the Data Science group director.

What I did

Semi-structured interviews, brand design, contextual inquiry, market research, competitive analysis, literature reviews, wireframing, usability testing and development.

What I delivered

Wireframes, research findings, long-term and short-term goals

Project Duration

June 2022 - May 2023

Augmenting the data scientist through automation

By integrating automation into the workflow of a data scientist in a controlled manner, Plato aims to assist and accelerate their knowledge discovery process while making sure the human remains in the driver's seat.

Keeping in mind the curiosity and creativity that this exploration requires, Plato steers away from complete automation and instead looks to augment human capabilities through data visualization and pattern discovery.

Problem space

About 50-80% of a data scientist's time is spent on making sense of large datasets

Despite the growing trend of automating different parts of the DS lifecycle, and the increasing use of autoML tools such as VertexAI and AzureML, data exploration continues to be a painstakingly time-consuming and manual process. Because this phase requires so much creativity, tools for it must be carefully designed to place control in the hands of the human. At what points can automation be introduced to augment human creativity and curiosity?

User Research I

My research goals for Round 1 were to understand how users would explore an unfamiliar tabular dataset for a classification task, and to learn about their experience with existing autoML tools. To explore this, I conducted 45-minute semi-structured interviews with 6 data scientists within a research team.

Key Finding #1

Users had different perspectives on data exploration

Some users relied heavily on modeling to understand the data and were looking for tools that would give them a stronger grasp of it.

Key Finding #2

Users faced a high cognitive load while studying datasets

Users talked about feeling lost while studying large datasets, not knowing what to do with the data, and the danger of overlooking patterns.

Key Finding #3

Users felt a lack of control in existing autoML tools

Users described existing autoML tools as a “no-brainer” and as “black boxes”.


Usability Testing

My research goals for the usability tests were to understand how quickly users could navigate through the dataset at different levels of detail, and how well the system supported their data exploration workflow.

I conducted 45-minute task-based usability tests with 5 data scientists.

Key Finding #1

Users found the guided exploration helpful

2/5 users thought the system architecture provided a simple entry point and helped them know where to look. All users found the generated insights and reports useful to their analysis.

Key Finding #2

Users would like to see less qualitative language

4/5 users had doubts whenever qualitative terms were used to describe the data. They found it easier to trust the graphs and numbers.

Key Finding #3

Users would prefer a different global data visualization

4/5 users wanted to start with visualizations that describe the characteristics of the columns before looking at data quality.

Key Finding #4

Users would prefer to understand relationships better

2/5 users wanted to visualize relationships at the column level while another user wanted to visualize the schema in a more efficient manner.

Coming Soon!

I'm currently working on acquiring licenses to share some of my designs and test results. Please check back at the end of March for a more comprehensive case study.