MASSMINE:~$
Developing a graphical experience for a command-line data scraping tool
UX Design | 6 months
Massmine is a command-line tool designed for researchers to simplify the collection of data from various online sources.
I became a part of an initiative to design a graphical user interface for Massmine during my master's program at Arizona State University. Our team comprised three designers and was led by our professor, Dr. Claire Lauer, with Massmine as our client.
I had the privilege of contributing to various aspects, including UX research and visual design, to shape the product's look.
Our efforts led Massmine to successfully secure funding from the National Endowment for the Humanities (NEH).
The Problem to Solve
Massmine was initially developed as a command-line software without a graphical user interface (GUI). Command-line software provides a terminal interface where users must input complex commands to perform actions. For non-technical users, this requires completing a tutorial and familiarizing themselves with command usage before effectively using the software.
The primary issue with Massmine was its lack of a user-friendly interface that could cater to a diverse range of users.
Originally, the tool was created by a team of developers based on requirements provided by scientists who collected web data. However, the team did not have a clear understanding of the tool's end users.
Goal
The goal was to design a GUI that would simplify online data scraping for tech-novice users while also providing a good user experience (UX) for tech-expert users.
Design Process
My team and I followed the traditional Design Thinking process.
EMPATHIZE
# A Twitter search query for "love"...
massmine --task=twitter-search --query=love --count=300
THE NATURE OF SCRAPING FROM MASSMINE'S COMMAND-LINE INTERFACE
One of my challenges was to conceptualize how to translate the execution of terminal commands into a graphical format that could aid non-technical users in using the tool. Therefore, each team member had to understand the execution of every terminal command provided by Massmine.
Massmine allows users to scrape data from four data sources:
• Tumblr
• Twitter
• Wikipedia
• Web (URL)
Each data source has a list of tasks that perform specific functions, and each task is associated with a set of task parameters.
For example, the functionality to pull tweets from Twitter would have a task called twitter-search. This task is then coupled with its associated parameters like query (search query string) and count (maximum number of tweets to return) to complete the command for execution.
Task
Parameters
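The way a task and its parameters combine into one command can be sketched in Python as a small function that a GUI form might call. This is only an illustration: build_command is a hypothetical helper, not part of Massmine, and it assumes every parameter follows the --name=value pattern seen in the twitter-search example.

```python
import shlex


def build_command(task, params):
    """Assemble a Massmine-style argument list from a task and its parameters.

    Hypothetical helper: assumes each parameter maps to one --name=value flag,
    as in the twitter-search example from the case study.
    """
    args = ["massmine", f"--task={task}"]
    # Each GUI input field becomes one flag, in the order the fields are given.
    for name, value in params.items():
        args.append(f"--{name}={value}")
    return args


# Values a user might type into the form fields for a Twitter search.
command = build_command("twitter-search", {"query": "love", "count": 300})

# shlex.join quotes any argument that contains spaces or shell metacharacters.
print(shlex.join(command))
```

Returning a list of arguments rather than a single string keeps user input (for example a multi-word query) from being split or misread by the shell when the command is eventually run.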
UNDERSTANDING TARGET AUDIENCE
To better understand Massmine's target audience, my team and I conducted an initial round of interviews with the developers and stakeholders of Massmine. We learned that web data scraping is mostly done by data analysts and researchers in the Humanities, Psychology, and Communication. These users may or may not have the technical skills needed to scrape data through a command-line interface.
Since our goal was to design a system that could accommodate all types of user skill levels, we narrowed down our end users into two main types:
• Tech-novice researchers
Who struggle to learn appropriate programming languages for data collection.
• Tech-expert researchers
Who want a better data collection experience than their current tools provide.
INTERVIEWING TARGET USERS
My team and I interviewed two tech-novice humanities researchers and two data analysts who were experts in using technologies like Python, R, and MATLAB for pulling data from the web.
I moderated the sessions with the tech-novices and asked them about their frustrations with online data scraping, their vision for an ideal data scraping tool, the features they needed most, and how they wanted to retrieve data locally.
The tech-experts were interviewed similarly, focusing more on the drawbacks with existing technologies they use for data collection.
DEFINE
CONSOLIDATING THE FINDINGS
Expert users expressed a need to scrape vast amounts of data as quickly as possible, with portability and preservability.
Capability of downloading data in multiple formats.
Implementation of a “plain language” command.
Tech-novices stated that they hold limited tolerance for learning a new tool.
Ability to run keyword searches or simple text-based searches.
Strong data visualizations.
PERSONAS
After conducting insightful interview sessions and consolidating the needs and pain points of the users, I created personas representing each of our targeted user types. This helped my team and me to understand the mental models of users when scraping data online and to develop empathy for them. This empathy was crucial for us as we moved forward with ideating and designing solutions that would effectively meet their needs.
Tech-Experts
Tech-Novices
IDEATE
CONCEPTUALIZING
Based on our findings, each designer was required to explore a range of ideas that we would later converge.
My team and I realized that Twitter had the most complex set of commands for data scraping, so we initiated our ideation with Twitter in mind.
I began by putting everything I had been envisioning on paper.
I could visualize that a user's portfolio would contain multiple studies from various data sources. It was essential to encapsulate all studies within their respective data source groups, and each data source group within the user's portfolio.
I started envisioning the application's information architecture at this point.
A study within a group would have a list of tasks specific to that data source, with input fields for users to enter parameters. Users could collect and view data in a tabular format on a window located on the same page.
My focus was to use plain and simple terminologies that were self-explanatory, so users could easily understand the tasks and the parameters to be entered.
A view of the ongoing data extraction process with a preview of the raw data extracted in a tabular form.
A separate page for visualizing the extracted data in a graphical form and exporting them.
The idea was to limit access to this page until the data had been extracted completely, because visualizing data dynamically in real time was a technical challenge for the developers.
My team and I also started working on the overall flow of the application in parallel. Here's the initial version of the user flow we created:
With the initial design draft created, I began prototyping our sketches into wireframes. This process helped the team experience the overall interaction of the application.
CHALLENGES
The entire ideation phase resulted in an initial design that we could showcase and refine. However, this also brought several challenges and complexities. Addressing these issues turned out to be the most enjoyable part of the process.
Here are a few of them:
One of the challenges our team encountered while wireframing was related to the active data collection from sources like Twitter. Users needed to obtain API access from Twitter to scrape data through Massmine.
This limitation inspired us to make Massmine an account-based software. We introduced an interface within the application to guide users in obtaining their API access and an interface to connect Massmine with Twitter once their API access was granted.
Another challenge was user retention. Since obtaining API access typically takes 2-3 days (especially for Twitter), we were concerned that users might lose interest during this waiting period, making it difficult to retain them.
To address this, we implemented a sandbox environment within Massmine that allows users to experience the application without needing API access, using dummy data instead. This feature also removed the immediate obligation for users to sign up, allowing them to explore the tool freely and become familiar with its functionalities.
Another challenge we faced was the complexity in the information architecture and navigation of the software. Initially, we focused on designing with Twitter in mind. However, when we considered the inclusion of other data sources, we realized the navigation could become complex.
Questions arose, such as:
- What if users accessed multiple studies from multiple data sources simultaneously?
- What if they closed the browser in between?
- How would we organize multiple studies from different data sources?
- How would we display a summary of all studies conducted?
To address these challenges, we started referring to individual sessions of data scraping as "studies" and grouped all studies from the same data source under "collections."
After extensive brainstorming, we decided to implement tabs for each study, allowing them to run as parallel threads. However, due to technical limitations expressed by the developers, we limited active data scraping to one study at a time. I also proposed adding an extra page to summarize all the studies, which we called the "dashboard."
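The resulting structure (studies grouped into collections, one active scrape at a time, and a dashboard summary) can be sketched as a small Python data model. This is a rough illustration under assumptions, not the team's or the developers' implementation: the class names, fields, and the "example-task" value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    name: str
    task: str
    active: bool = False  # True while this study is scraping data


@dataclass
class Collection:
    source: str  # data source name, e.g. "Twitter"
    studies: list = field(default_factory=list)


@dataclass
class Portfolio:
    collections: dict = field(default_factory=dict)  # source name -> Collection

    def add_study(self, source, study):
        # Group the study under the collection for its data source.
        self.collections.setdefault(source, Collection(source)).studies.append(study)

    def all_studies(self):
        return [s for c in self.collections.values() for s in c.studies]

    def start(self, study):
        # Enforce the "one active scrape at a time" limit.
        if any(s.active for s in self.all_studies() if s is not study):
            raise RuntimeError("another study is already collecting data")
        study.active = True

    def stop(self, study):
        study.active = False

    def dashboard(self):
        # Summary of every study, grouped by data source.
        return {src: [s.name for s in c.studies] for src, c in self.collections.items()}


portfolio = Portfolio()
tweets = Study("Love tweets", "twitter-search")
pages = Study("Page history", "example-task")  # placeholder, not a real task name
portfolio.add_study("Twitter", tweets)
portfolio.add_study("Wikipedia", pages)
portfolio.start(tweets)
print(portfolio.dashboard())
```

Each study can still live in its own tab; only the start call is gated, which mirrors the decision to keep parallel tabs while limiting active scraping to a single study.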
PROTOTYPE
FIRST LOOK AT THE FINAL DESIGN
TEST
USABILITY TESTING
My team and I adopted the methodology of task-based usability testing.
We interviewed two researchers (one tech-expert and one tech-novice) and observed them while they performed the following tasks:
● Navigate to the home screen and explain their first impressions
● Start a new task in an existing study
● Attempt to analyze the data through the command-line/coding page
● Navigate through the data analysis page while giving their thoughts
● Attempt to export the data
Takeaways
- Collaborate with Professional Developers Early: Engaging with professional developers from the outset is crucial. Their technical expertise can help identify limitations and possibilities early in the design process, ensuring that the proposed features are feasible and aligned with technical realities.
- Direct Interaction with Stakeholders: Direct interaction with Massmine stakeholders provided valuable insights into user needs and priorities. This exposure to managerial decisions, such as optimizing software releases based on funding, was instrumental in shaping the design process and setting realistic project goals.
- Designing with Constraints in Mind: Understanding the constraints, such as funding and technical limitations, allowed us to design more effectively. This approach ensured that the design was practical and aligned with the project's scope and resources.
- Synchronous Teamwork: Working synchronously with cross-functional teams, including developers and stakeholders, facilitated better communication and collaboration. This alignment was essential for maintaining a cohesive vision and ensuring that the design met both user needs and technical requirements.
Next Steps
After completing the exciting work on Massmine, my journey with the project came to an end.
However, the project had clear next steps for further development:
- Usability Testing and Design Refinement: The next phase involved incorporating usability test feedback and refining the designs. This process aimed to enhance the user experience by addressing the issues identified during testing.
- Expanding the GUI for Other Data Sources: Further development of the GUI was necessary, especially in designing pages for data scraping from sources beyond Twitter. This expansion would provide a consistent and user-friendly experience for interacting with various data sources.
- Enhancing the Analytics Page: Another priority was to add more detail to the Analytics page. This included offering an in-depth look at various types of graphs and ensuring that data was displayed in a user-friendly manner. The goal was to make complex data analysis accessible and intuitive for users.
- Integrating a Command Line Interface: An additional feature to be integrated was a traditional command-line interface within the application. This would give expert users who prefer to work in code the flexibility to do so, serving a wider range of user preferences.