Redesigning the alert system (IMS)

Revamped the entire product by redesigning the workflow of the incident management system, an integral core feature, and introducing a new onboarding process.

Role

Founding Product Designer & Researcher

Timeline

15 weeks

(Sep '21 - Dec '21)

Project type

A live product of Cliff.ai (now Quantive)

Team

1 Product Designer & 1 Product Design Intern

Product Managers

Engineers & Developers

Overview

A brief intro to Cliff.ai and the project

Cliff.ai (B2B SaaS) is a business reliability tool that helps companies actively monitor their important business metrics, automatically catch unexpected spikes or dips in the metrics and alert the users. However, users encountered challenges with this core feature.

Recognizing the complexity, I took on the responsibility of simplifying this intricate process so that our users would find the alert system effortless to use.

Led and mentored the team through the end-to-end design process, from conception to launch.

Goal & Impact

What impact did we aim for?

Our Goal:

Saving Users’ Time

Boosting Business Growth for both Users and Cliff.ai

The Impact:

64%

We reduced task completion time by 64%, which had a significant positive impact on business growth for both Cliff.ai and its users.

Problem & solution in a nutshell

A synopsis of the problem and solution

Users were not getting alert notifications even after a metric crossed its threshold
During a festive season, an alert that a client had set for 65K visitors on their e-commerce platform failed to notify them, causing frustration. After the client reported the issue, two more complaints followed, which prompted closer scrutiny.

Introducing Monitors, Incidents, and Escalation Policies as the Solution
The old alert system involved setting threshold values and KPIs. Based on our subsequent research, we introduced Monitors, Incidents, and Escalation Policies:

  • Streamlining Monitors: Automated and customizable ‘Alert Rule Setup’

Old

New

  • Efficient Incident Management: Centralizing Insights with Date-Wise Incident Page

Old

New

  • Efficient Notification Workflow: Introducing Customizable Escalation Policies at Cliff.ai

New

Research & Analysis

Understanding the problem

We began our research by evaluating the existing feature's feasibility, diving into its fundamental aspects to understand the 'whats,' 'whys,' and 'hows.' We used the following research techniques:

Think-aloud sessions with two participant groups, one familiar with the product and the other not
To gather diverse perspectives and valuable insights, we collaborated with the technical team and asked participants to create a new account, set an alert threshold value, and analyze the root cause of anomalies. Here are the outcomes:

85%

Well-acquainted participants were able to complete the task.

70%

Unfamiliar participants failed to complete the task.

User interviews with 7 participants including business owners and newer platform users
The key insights from the interviews:

  • The most common problem was that, even after setting up monitoring criteria, clients did not get timely alerts.

  • Identifying whom to notify, and when, during incidents was a user pain point.

  • The intricate process of setting "Alert Rules" was unclear to users.

  • Users highlighted challenges in extracting crucial insights and conducting root cause analysis from the metrics.

Competitive analysis to learn from our competitors' successful features and identify opportunities to enhance our platform.

Changes we made after competitive analysis:

  1. Nomenclature: Naming conventions are crucial for accessibility. Our team understood the existing terms, but we realized they needed to be more user-friendly.

  2. Hierarchy: We improved the system's overall hierarchy, restructuring information after brainstorming feature feasibility, and created a new user flow with the key steps (included later in this case study).

  3. Incident List: We compiled all incidents into a single list to make them easier to manage.

HMW statement

How might we optimize our platform for efficient anomaly analysis, streamlined alert setup, and timely notifications, fostering business growth for both users and Cliff.ai?

Target users

The users we are aiming for

Our platform is designed for business and operations teams within organizations of all sizes. It caters to a wide range of users, including data analysts, site reliability engineers, product managers, executive leaders, and customer success managers.

Ideation

Creating the User Flow

The old user flow lacked a clear process for setting up thresholds and the alert system:

In the new user flow, users set the threshold value and then configure the alert system, one step after the other:

Mid-fi & User-testing

Mid-fi prototypes for user testing and evaluation

Based on research and competitive analysis outcomes, we divided our alert system into three parts: Monitors, Incidents & Escalation Policies. Here’s the first iteration we prototyped to share with the participants for user testing:

Monitor list & Monitor detail page

Incident list & Incident detail page

Escalation policy list & adding new policy page

User test results:
Primary feedback on our initial iterations included:

  • The incident list we designed would result in endless scrolling, because incidents occur continuously.

  • The list did not show enough detail for users to understand an incident.

  • The monitor list should include details about the type of monitor.

  • Users wish to access details of incidents attributed to a specific monitor.

  • Users want to view the details of an escalation policy, including the assigned users or teams, chosen platforms, and specifics about streams and measures.

Solution in detail

The solution - Introducing ‘Incident Management System’

Based on the user-testing feedback, we started working on the screens. Let’s go deeper and view incidents, monitors and escalation policies in detail.

Monitors

We automated the Alert Rule setup process (Monitors) to ensure users receive notifications for anomalies, even without setting up a monitor themselves. Users can select rules, add streams, measures, and dimensions with thresholds for metrics.
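
For readers who think in code, here is a rough sketch of what a threshold-based monitor might look like. It is only an illustration under assumptions, not Cliff.ai's actual implementation: the names `Monitor`, `upper`, `lower`, and `breached` are hypothetical, and the automated anomaly detection is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Monitor:
    """Hypothetical alert rule: one measure on one stream, with optional thresholds."""
    stream: str                      # data source the metric comes from
    measure: str                     # metric being watched
    upper: Optional[float] = None    # raise an incident above this value
    lower: Optional[float] = None    # raise an incident below this value

    def breached(self, value: float) -> bool:
        """Return True when the value crosses either configured threshold."""
        if self.upper is not None and value > self.upper:
            return True
        if self.lower is not None and value < self.lower:
            return True
        return False


# Example loosely based on the 65K visitor alert mentioned earlier.
visitors = Monitor(stream="web_traffic", measure="visitors", upper=65_000)
print(visitors.breached(70_000))  # True
print(visitors.breached(40_000))  # False
```

In this sketch, a breach is what would create an incident and hand it over to an escalation policy.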

Monitor list page - list of all the monitors created

Now, opening a monitor displays comprehensive details, aiding analysis of top measures and dimensions with assigned responders. The heatmap visualizes past incidents on this monitor.

Monitor detail page

Incidents

At Cliff.ai, we lacked a central incident repository for users to quickly gather insights from metric incidents. Thus, we created a date-wise incident page to streamline this process.

Incident list page

The 'Incidents' screen displays a list of incidents triggered by monitor thresholds, organized by date. It serves as the central ‘hub’ where users can manage alerts and access insights, making incident information easier to reach. The heatmap illustrates resolved incidents over time:

Incident detail page

Escalation policy

Cliff.ai lacked prompt anomaly notifications, so we introduced an escalation policy for users to select recipients and specify details.


An escalation policy describes the following three things:

1. Who to deliver the notifications to.

2. In what order or interval notifications should be delivered.

3. On which platform to deliver the notifications.

Escalation policies list page

Now comes the fun part: when an incident is generated, notifications are automatically sent to the assigned responders through the Default Escalation Policy. Users get a hassle-free experience with no manual setup needed.


Users can customize the Escalation Policy for efficient incident management, defining rules for intervals and delivery platforms. If no response is received in the escalation queue, the actions can be repeated a number of times chosen by the user.
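
To make the who, when, and where of an escalation policy more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Step`, `EscalationPolicy`, `wait_minutes`, `repeat`, and `schedule`, the example responders, and the platform names are hypothetical and do not reflect how Cliff.ai built the feature. It only shows the notification timeline that would result if nobody responded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    responder: str     # who to notify
    platform: str      # where to notify them
    wait_minutes: int  # how long to wait for a response before moving on


@dataclass
class EscalationPolicy:
    steps: List[Step]  # ordered escalation queue
    repeat: int = 0    # extra passes through the queue if nobody responds

    def schedule(self) -> List[Tuple[int, str, str]]:
        """List (minute offset, responder, platform) for each notification,
        assuming no responder ever acknowledges the incident."""
        plan = []
        minute = 0
        for _ in range(1 + self.repeat):
            for step in self.steps:
                plan.append((minute, step.responder, step.platform))
                minute += step.wait_minutes
        return plan


policy = EscalationPolicy(
    steps=[Step("on-call analyst", "email", 10), Step("team lead", "slack", 15)],
    repeat=1,
)
for minute, responder, platform in policy.schedule():
    print(f"t+{minute} min: notify {responder} via {platform}")
```

With `repeat=1`, the queue above runs twice: the analyst is notified at 0 and 25 minutes, and the team lead at 10 and 35 minutes.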

Adding/Editing an escalation policy

Key learnings

My learnings after leading the project

  • Planning the roadmap: Running the design process efficiently was a big challenge, especially because I was given ownership of such a large project. I made sure that everyone involved, including project managers, stakeholders, interns, and engineers, was looped into each design decision I intended to make.

  • Mentoring: Bringing the juniors into every design decision to maximize their participation and learning was one of my top priorities on this project. It sharpened my leadership skills.

  • Giving feedback: Taking feedback is easier than giving feedback on someone else’s designs. This project taught me to give feedback without overwhelming the other person's thought process.

  • Audits: Just as we iterate while designing, developers should iterate after audits, since nothing can be built perfectly in one go.

That's a wrap! I hope you enjoyed it ✨

Like what you see? Let’s chat!

Find me on:
