Automation 25 May 2026

Process Mining Before RPA: Why Skipping This Step Is the Most Expensive Automation Mistake

Key takeaways

  • Process documentation describes how a process should work. Process mining shows how it actually runs, including every exception, variant, and undocumented path.
  • A basic process mining setup requires four data fields: case ID, activity name, timestamp, and user ID. Real data matters more than perfect data.
  • Initial findings are typically available within two to three weeks. The main variable is analyst time, not tool complexity.
  • Not every automatable process delivers business value. Prioritization requires scoring volume, variant complexity, decision logic clarity, and business context together.

Most RPA implementations start with a workshop. Business analysts map processes on whiteboards, document the happy path, note a few exceptions, and hand off a process description to the automation team. The team builds to spec. The bot goes live. And then, somewhere between user acceptance testing and production, the edge cases start appearing.

The ones nobody documented. The approval flows that only happen for invoices above a certain threshold. The cases that get rerouted between three departments before anyone touches them. The exceptions that account for 40% of processing time but 5% of volume.

This is not a failure of the automation team. It is a failure of the discovery method. Manual process mapping captures what people remember and what they think is important. Process mining captures what actually happens.

What process mining reveals that manual mapping does not

Process documentation describes how a process should work in theory. In practice, theory and reality frequently diverge. Process mining reads event logs directly from the systems where work happens, reconstructing every actual execution path with all its variants and exceptions.

The difference in output is significant. A manual mapping exercise produces a process diagram that reflects the consensus view of the people in the room. Process mining produces a statistical model of every path the process has actually taken, with frequencies, durations, and deviations included.
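As a rough illustration of what "reconstructing every path" means, the sketch below groups events by case, orders them by timestamp, and counts how often each distinct path occurs. The event rows, activity names, and the `discover_variants` helper are hypothetical and exist only for this example; commercial process mining tools do considerably more than this.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, timestamp). Illustrative data only.
EVENTS = [
    ("INV-1", "Receive invoice", "2026-01-05T09:00"),
    ("INV-1", "Match PO", "2026-01-05T09:30"),
    ("INV-1", "Approve", "2026-01-05T11:00"),
    ("INV-1", "Pay", "2026-01-06T08:00"),
    ("INV-2", "Receive invoice", "2026-01-05T10:00"),
    ("INV-2", "Match PO", "2026-01-05T10:20"),
    ("INV-2", "Escalate", "2026-01-07T14:00"),
    ("INV-2", "Approve", "2026-01-09T16:00"),
    ("INV-2", "Pay", "2026-01-10T08:00"),
    ("INV-3", "Receive invoice", "2026-01-06T09:00"),
    ("INV-3", "Match PO", "2026-01-06T09:15"),
    ("INV-3", "Approve", "2026-01-06T12:00"),
    ("INV-3", "Pay", "2026-01-07T08:00"),
]


def discover_variants(events):
    """Group events by case, order them by timestamp, and count each distinct path."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    variants = Counter()
    for steps in by_case.values():
        steps.sort(key=lambda step: step[0])  # chronological order within the case
        variants[tuple(activity for _, activity in steps)] += 1
    return variants


variants = discover_variants(EVENTS)
for path, count in variants.most_common():
    print(count, " -> ".join(path))
```

In this toy log the happy path appears twice and the escalation path once. On real data the same counting is what exposes the long tail of variants that a workshop rarely captures.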

The most common surprises are not dramatic. They are structural. A process that appears straightforward in documentation turns out to have a secondary approval chain that activates under specific conditions. A team that appears to work independently turns out to share a step with another department, creating a hidden handoff point that nobody had formally documented.

Invoice processing is a useful example. The main approval path is typically well understood: invoice arrives, matches purchase order, gets approved, gets paid. Process mining confirms that this path works efficiently. What it also reveals is that a significant share of invoices do not follow this path. They require additional approvals, cross-departmental escalation, or manual intervention at steps that were not visible in the original documentation. In a real engagement, these secondary paths turned out to be the primary source of processing delays. They were generating the bottlenecks the team had been trying to address for months; the team had simply been looking for the cause in the wrong place.

This is the core value of process mining before automation: it prevents teams from automating a version of the process that does not reflect how the work actually gets done.

What data process mining requires

A common concern about process mining is data readiness. In practice, the minimum requirements are lower than most teams expect.

A basic process mining configuration requires four fields: a case identifier, an activity name, a timestamp, and a user identifier. This is enough to run an initial analysis and identify the main process variants, frequency distributions, and deviation patterns.
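A quick readiness check on an extract can be sketched as follows. The column names (`case_id`, `activity`, `timestamp`, `user_id`) and the ISO 8601 timestamp format are assumptions made for this example, not a required schema; map them to whatever your source system exports.

```python
from datetime import datetime

# Assumed column names for the four fields described above.
REQUIRED_FIELDS = ("case_id", "activity", "timestamp", "user_id")


def validate_event(row):
    """Return a list of problems found in one event-log row (empty list means usable)."""
    problems = [
        f"missing {field}"
        for field in REQUIRED_FIELDS
        if not str(row.get(field) or "").strip()
    ]
    ts = row.get("timestamp")
    if ts:
        try:
            datetime.fromisoformat(ts)  # assumes ISO 8601 timestamps
        except (TypeError, ValueError):
            problems.append("timestamp is not ISO 8601")
    return problems


good = {"case_id": "INV-1", "activity": "Approve",
        "timestamp": "2026-01-05T11:00:00", "user_id": "u17"}
bad = {"case_id": "INV-1", "activity": "",
       "timestamp": "05/01/2026", "user_id": "u17"}

print(validate_event(good))
print(validate_event(bad))
```

Running a check like this over a sample export is usually enough to tell whether the data can support a first analysis or needs cleanup first.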

Data quality matters more than data completeness. The critical requirement is consistency: standardized activity names, unified nomenclature, and descriptions that mean the same thing across systems and teams. Inconsistent naming conventions are the most common data quality issue encountered in process mining projects, and they are usually fixable without significant infrastructure work.
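Naming cleanup of this kind is often a small mapping exercise. The sketch below assumes a hand-maintained lookup table (`CANONICAL`, hypothetical) that maps the label variants found in different systems to one agreed activity name; labels that are not in the table are passed through unchanged so they can be reviewed.

```python
import re

# Hypothetical mapping from normalized raw labels to one canonical activity name.
CANONICAL = {
    "approve invoice": "Approve invoice",
    "invoice approval": "Approve invoice",
    "inv approved": "Approve invoice",
    "match po": "Match purchase order",
    "po matching": "Match purchase order",
}


def normalize_activity(raw):
    """Collapse separators and case, then look up the canonical activity name."""
    key = re.sub(r"[\s_\-]+", " ", raw).strip().lower()
    return CANONICAL.get(key, raw.strip())


for label in ["  Invoice_Approval ", "PO-Matching", "Pay"]:
    print(repr(label), "->", normalize_activity(label))
```

The design choice here is deliberate: an explicit table keeps the business in control of which labels count as the same activity, which fuzzy matching would not.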

The scope of historical data required depends on process characteristics. A process that runs in daily cycles and does not have significant seasonal variation needs a minimum of several months of archived data to produce reliable results. Processes with seasonal patterns or lower frequency require a proportionally longer observation window. Where system retention policies do not support historical analysis, the project timeline extends to accommodate a live data collection period.

The practical implication is that most organizations already have sufficient data to run an initial process mining analysis. The question is rarely whether the data exists, but whether it can be accessed and standardized.

How to prioritize automation candidates from process mining output

The most common prioritization mistake is automating what is technically automatable rather than what actually generates return. A high-volume process that runs without friction does not become more valuable when automated. It just runs faster. Process mining data makes this distinction visible before any build work starts.

The scoring criteria that matter are volume, variant complexity, decision logic clarity, and business context. Volume drives the economic case: high-frequency processes generate larger returns from automation. Variant complexity affects implementation effort: a process with twelve distinct variants is significantly harder and more expensive to automate than one with three. Decision logic clarity determines whether a standard RPA approach is sufficient or whether additional components are needed: processes that require judgment calls or individual interpretation may be candidates for AI-assisted decision support, which increases solution complexity and delivery time. Business context covers criticality, stability over time, and seasonal patterns.

The scoring works in both directions. Some processes that score lower on business factors are simple enough to automate that they are worth implementing early as quick wins. Getting early wins into production builds organizational confidence in the automation program, which matters for the harder implementations that follow. The sequencing decision is as important as the selection decision.

The practical output of this prioritization is a sequenced backlog: a list of automation candidates ranked by expected value, with implementation complexity estimated for each. Process mining data makes this backlog more defensible than one produced from manual assessment, because the underlying statistics are observable rather than estimated.
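One way to turn the four criteria into a ranked backlog is a simple weighted score. Everything numeric below is an assumption made for illustration: the weights, the caps, the 0 to 1 scales for decision clarity and criticality, and the three sample candidates. A real program would calibrate these with the process owners.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    monthly_cases: int       # volume
    variant_count: int       # variant complexity
    rule_based_share: float  # 0..1, share of decisions covered by explicit rules
    criticality: float       # 0..1, business context as judged by the process owner


# Hypothetical weights and caps; tune them to your own program.
WEIGHTS = {"volume": 0.40, "simplicity": 0.25, "clarity": 0.20, "context": 0.15}
VOLUME_CAP = 10_000
VARIANT_CAP = 20


def score(c):
    """Weighted score in the range 0..1; higher means a stronger candidate."""
    volume = min(c.monthly_cases, VOLUME_CAP) / VOLUME_CAP
    simplicity = 1 - min(c.variant_count, VARIANT_CAP) / VARIANT_CAP
    return (
        WEIGHTS["volume"] * volume
        + WEIGHTS["simplicity"] * simplicity
        + WEIGHTS["clarity"] * c.rule_based_share
        + WEIGHTS["context"] * c.criticality
    )


candidates = [
    Candidate("Invoice matching", 8000, 3, 0.9, 0.7),
    Candidate("Credit note handling", 400, 12, 0.5, 0.8),
    Candidate("Address change", 1500, 2, 1.0, 0.2),
]

backlog = sorted(candidates, key=score, reverse=True)
for c in backlog:
    print(f"{score(c):.2f}  {c.name}")
```

With these sample numbers, the low-criticality but very simple "Address change" process ranks above the more critical "Credit note handling", which is the quick-win effect described above.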

How long process mining takes before producing actionable recommendations

Initial findings from a process mining engagement are typically available within two to three weeks of analysis.

The main variable affecting timeline is not the tool. It is the analyst work required to verify the output, document process variants and exceptions, and translate statistical findings into actionable process descriptions. This documentation work takes time because it requires human judgment: identifying which variants are genuine business exceptions versus data artifacts, which bottlenecks are structural versus situational, and which findings are significant enough to affect automation sequencing.

The secondary variable is data availability at project start. When systems provide sufficient historical data immediately, analysis begins promptly. When data retention policies require a live collection period, the timeline extends accordingly. Clarifying data availability before project kickoff is one of the most effective ways to manage schedule risk in process mining engagements.

Most common process mining findings by environment

Across manufacturing, financial services, and shared services environments, process mining tends to surface the same categories of inefficiency regardless of industry.

Approval chains that require coordination across multiple teams and rely on email-based handoffs rather than structured queues generate a consistent pattern of delay. The root cause is usually not the number of approvals required but the absence of a structured handoff mechanism: work sits in inboxes rather than queues, visibility is low, and follow-up is manual.
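Handoff delay of this kind can be estimated directly from the four basic fields. The sketch below is a simplified illustration: it treats any change of user between two consecutive steps of a case as a handoff and averages the waiting time per activity pair. The log rows and the `handoff_waits` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical rows: (case_id, activity, timestamp, user_id)
LOG = [
    ("C1", "Submit", "2026-02-02T09:00", "anna"),
    ("C1", "Review", "2026-02-02T09:30", "anna"),
    ("C1", "Approve", "2026-02-04T09:30", "ben"),
    ("C2", "Submit", "2026-02-03T10:00", "carl"),
    ("C2", "Review", "2026-02-03T11:00", "carl"),
    ("C2", "Approve", "2026-02-04T11:00", "ben"),
]


def handoff_waits(log):
    """Average hours between consecutive steps of a case when the user changes."""
    cases = defaultdict(list)
    for case_id, activity, ts, user in log:
        cases[case_id].append((datetime.fromisoformat(ts), activity, user))

    gaps = defaultdict(list)
    for steps in cases.values():
        steps.sort(key=lambda step: step[0])
        for (t1, a1, u1), (t2, a2, u2) in zip(steps, steps[1:]):
            if u1 != u2:  # work changed hands between these two steps
                gaps[(a1, a2)].append((t2 - t1).total_seconds() / 3600)
    return {pair: mean(hours) for pair, hours in gaps.items()}


waits = handoff_waits(LOG)
for (src, dst), hours in waits.items():
    print(f"{src} -> {dst}: {hours:.1f} h average wait")
```

Elapsed time between steps includes nights and weekends in this sketch; a real analysis would normally apply a working calendar before drawing conclusions.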

Duplicate processing, where two separate teams perform the same step on the same data, appears regularly in organizations where process ownership is distributed across departments. Neither team is aware the other is performing the same work, because there is no cross-functional visibility into the end-to-end process.
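A first pass at spotting possible duplicate work is to look for the same activity recorded on the same case by more than one user. This is only a sketch with hypothetical data and a hypothetical `find_duplicate_steps` helper, and the hits are candidates for review: legitimate rework or a four-eyes check would show up the same way.

```python
from collections import defaultdict

# Hypothetical rows: (case_id, activity, timestamp, user_id)
LOG = [
    ("INV-7", "Validate tax ID", "2026-03-02T09:00", "ap_team_1"),
    ("INV-7", "Validate tax ID", "2026-03-03T14:00", "procurement_4"),
    ("INV-7", "Approve", "2026-03-04T10:00", "ap_team_1"),
    ("INV-8", "Validate tax ID", "2026-03-02T11:00", "ap_team_2"),
]


def find_duplicate_steps(log):
    """Return {(case_id, activity): users} where more than one user did the same step."""
    performers = defaultdict(set)
    for case_id, activity, _ts, user in log:
        performers[(case_id, activity)].add(user)
    return {key: sorted(users) for key, users in performers.items() if len(users) > 1}


duplicates = find_duplicate_steps(LOG)
for (case_id, activity), users in duplicates.items():
    print(case_id, activity, "performed by", ", ".join(users))
```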

Poor prioritization of incoming work is another frequent finding. Teams tend to process the easiest cases first, deferring complex or non-standard cases. This creates a growing backlog of the cases that are hardest to manage, and downstream SLA risk becomes concentrated in those deferred exceptions.

Process mining also consistently reveals that process complexity is substantially underestimated before analysis. Processes that appear to have three or four variants in documentation turn out to have fifteen or twenty when event log data is analyzed. This gap between documented and actual complexity has direct implications for automation planning: projects scoped against the documented version of a process frequently encounter scope expansion once the full variant set is visible.
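The gap between documented and observed behavior can be quantified with a very simple comparison. The sketch below assumes you already have observed variant counts (path tuples mapped to case counts) and a set of documented paths; the activity labels, counts, and the `conformance_summary` helper are all hypothetical.

```python
from collections import Counter


def conformance_summary(observed, documented):
    """Compare observed variants (path -> case count) with the documented paths."""
    total = sum(observed.values())
    conforming = sum(n for path, n in observed.items() if path in documented)
    undocumented = [path for path in observed if path not in documented]
    return {
        "conforming_share": conforming / total if total else 0.0,
        "undocumented_variants": len(undocumented),
    }


# Hypothetical numbers: one documented path, three observed variants.
observed = Counter({
    ("Receive", "Match", "Approve"): 6,
    ("Receive", "Approve"): 3,
    ("Receive", "Match", "Match", "Approve"): 1,
})
documented = {("Receive", "Match", "Approve")}

summary = conformance_summary(observed, documented)
print(summary)
```

Reporting both numbers matters for scoping: the share of conforming cases indicates how much volume the documented design covers, while the count of undocumented variants indicates how much extra build and exception handling to expect.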

Lack of standardization is the fifth common finding. The same process, performed by different individuals, produces measurably different execution patterns. Without process mining, this variation is invisible. With it, it becomes a quantified input to both process redesign and automation scope decisions.

The consistent pattern across all five findings is the same: organizations do not have accurate visibility into how their own processes work. Process mining does not reveal that processes are broken. It reveals that organizations are managing processes they do not fully understand.

How process mining output transfers to the RPA build team

The output of a process mining engagement is the starting point for RPA implementation, not a separate artifact that has to be translated.

The documentation generated by process mining analysis reduces the pre-project work that RPA teams normally perform from scratch. Process flow documents, which are typically produced manually through workshop facilitation and analytical work before any automation project, can be generated with significantly less effort when process mining data is available. The statistical basis is already there. The analyst work shifts from reconstruction to interpretation.

The specific deliverables that transfer to the RPA build team are: the process variant list, with a flow diagram for each variant; statistical data covering volumes, SLA parameters, and seasonal patterns; documented business exception lists with validation rules; and a list of the areas with the highest potential for process simplification before automation begins.

That last item is worth noting separately. Process mining frequently identifies process redesign opportunities that are distinct from automation opportunities. Simplifying a process before automating it produces better automation outcomes and is sometimes more valuable than the automation itself. The handoff to the RPA team should include a clear separation between what will be automated as-is and what should be redesigned first.

Process mining is not just another step before RPA. It is the condition under which RPA has a reasonable chance of delivering what was promised.

FAQ

What is process mining and how does it differ from manual process mapping?

Process mining extracts event log data from the systems where work is performed and reconstructs the actual execution paths of a process, including all variants, exceptions, and deviations from the documented procedure. Manual process mapping captures how process participants describe and remember the process, which typically reflects the intended design rather than actual execution. The practical difference is that process mining produces a statistically accurate model of process behavior, while manual mapping produces a consensus description that may omit significant variation.

What minimum data is required to run process mining?

A basic process mining analysis requires four data fields: a case identifier, an activity name, a timestamp, and a user identifier. Data quality requirements center on consistency rather than completeness: standardized activity names and unified nomenclature are more important than having every possible data field available. Most organizations can run an initial analysis with data that already exists in their core systems.

How long does a process mining project take before producing recommendations?

Initial findings are typically available within two to three weeks of analysis. The primary variable affecting timeline is the analyst work required to verify output and document findings, not the technical analysis itself. Projects where historical data is immediately available move fastest. Projects where system retention policies force a live data collection period need additional time before analysis can begin.

How do you prioritize automation candidates from process mining results?

Prioritization uses a scoring framework that evaluates four dimensions: case volume (the primary driver of economic return), process variant count (the primary driver of implementation complexity), decision logic clarity (which determines whether standard RPA or AI-assisted components are required), and business context including process criticality and stability over time. High-volume, low-variant, deterministic processes are typically the strongest automation candidates. Some lower-scoring processes are worth implementing early as quick wins to build organizational confidence in the automation program.

What are the most common process mining findings?

The most frequent findings across environments are: approval chains relying on email handoffs rather than structured queues, duplicate processing steps performed by separate teams without cross-functional visibility, poor prioritization of incoming work that defers complex cases and concentrates SLA risk, significant underestimation of process variant complexity compared to documented procedures, and lack of execution standardization across team members performing the same process.

How does process mining output connect to RPA implementation?

Process mining output directly reduces the pre-project documentation work that RPA teams normally perform from scratch. The deliverables that transfer to the build team include process variant lists with flow diagrams, volume and SLA data, business exception documentation with validation rules, and a prioritized list of simplification opportunities. This handoff allows the RPA team to begin design work with a statistically grounded process model rather than a workshop-based approximation.

Joanna Maciejewska Marketing Specialist


© Copyright 2026 by Onwelo