Rons Data Cleaner Concepts - Rons Place Blog

    Aaron Stewart

Rons Data Cleaner Concepts

  • Introduction
  • Data Cleaner
  • Process Job
  • Quick Job

Introduction

Rons Data Cleaner is built around two core concepts, Cleaners and Jobs. 

Cleaners contain a list of instructions that tell the application what actions to perform on a table of data (for example, a CSV file), and can be shared between Jobs.

Jobs contain information about the actual data tables themselves, such as where they are and how to read them. Job files can be located relative to the data files they process, so processing hundreds of files is as simple as double-clicking the Job and hitting Process.

If data needs to be processed ad hoc, there is a Quick Job option to process a file or directory quickly.

Jobs and Cleaners can be saved, renamed and used again later, which saves a great deal of time on future data processing jobs.

Data Cleaners

Rons Data Cleaner - Cleaner

Cleaners are the core of Rons Data Cleaner (and its namesake). They contain a list of actions, or rules, that describe what the Cleaner is going to do to any particular data source.

In order to start cleaning or amending files, a Cleaner must be created, and cleaning rules added to it. The categories of rules are as follows: 

  1. Column Selectors

Each Column Selector rule defines the criteria used to select one or more columns. Column Selectors are used throughout a Cleaner to determine which columns a rule operates on.

  2. Row Selectors

Each Row Selector rule defines the criteria used to select one or more rows. Row Selectors are also used throughout a Cleaner to determine which rows a rule operates on.

  3. Column Processors

Column operations define what actions are to be performed on the columns, and most require Column Selectors. For example, delete or merge columns.

  4. Row Processors

Row operations define what actions are to be performed on the rows. All require Row Selectors, and some also require Column Selectors. For example, delete or duplicate rows.

  5. Cell Processors

Cell operations define what actions are to be performed on each cell, and all require both Row Selectors and Column Selectors. For example, adding the row number to a cell.
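To make the relationship between selectors and processors more concrete, here is a minimal Python sketch. Rons Data Cleaner is configured through its user interface, so every name below (Table, ColumnSelector, RowSelector, delete_columns, delete_rows, write_row_number) is hypothetical and only models the idea that processors receive selectors telling them where to act. It also assumes the first row of a table is the header row.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model only: none of these names come from Rons Data Cleaner.
Table = List[List[str]]  # assumption: the first row is the header row


@dataclass
class ColumnSelector:
    """Selects columns whose header name satisfies a criterion."""
    criterion: Callable[[str], bool]

    def select(self, header: List[str]) -> List[int]:
        return [i for i, name in enumerate(header) if self.criterion(name)]


@dataclass
class RowSelector:
    """Selects body rows (index 1 onward) that satisfy a criterion."""
    criterion: Callable[[List[str]], bool]

    def select(self, table: Table) -> List[int]:
        return [i for i, row in enumerate(table) if i > 0 and self.criterion(row)]


def delete_columns(table: Table, columns: ColumnSelector) -> Table:
    """Column processor: drop every selected column."""
    drop = set(columns.select(table[0]))
    return [[cell for i, cell in enumerate(row) if i not in drop] for row in table]


def delete_rows(table: Table, rows: RowSelector) -> Table:
    """Row processor: drop every selected body row."""
    drop = set(rows.select(table))
    return [row for i, row in enumerate(table) if i not in drop]


def write_row_number(table: Table, rows: RowSelector, columns: ColumnSelector) -> Table:
    """Cell processor: write the row number into each selected cell."""
    row_ids = set(rows.select(table))
    col_ids = set(columns.select(table[0]))
    return [
        [str(r) if r in row_ids and c in col_ids else cell for c, cell in enumerate(row)]
        for r, row in enumerate(table)
    ]


table = [["id", "name", "tmp"], ["", "Ann", "x"], ["", "", "y"], ["", "Bob", "z"]]
table = delete_columns(table, ColumnSelector(lambda name: name == "tmp"))
table = delete_rows(table, RowSelector(lambda row: row[1] == ""))
table = write_row_number(table, RowSelector(lambda row: True), ColumnSelector(lambda name: name == "id"))
```

In this sketch the column processor needs only a Column Selector, the row processor only a Row Selector, and the cell processor needs both, mirroring the requirements listed above.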

Cleaner Rules Processing

Cleaner rules are processed in the following order:

  • Cell rules that apply to column names

This phase covers any cell rules whose Row Selector is of type 'Header Row'. Subsequent rules that use column names will use the newly cleaned column names.

  • Column Operations

Columns are processed once, before the body data, to establish the shape of the output data.

  • Row Operations

Rows are added or removed from the processing pipeline and passed to the cell operations.

  • Cell Operations

All cell rules that do not have the Row Selector of type 'Header Row' are applied.
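The ordering above can be sketched as a simple sort of rules into four phases. This is an illustration only: Rule, PHASES, phase_of and processing_order are hypothetical names, and the sketch assumes that rules within the same phase keep the order in which they were listed, which the description above does not state.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical names throughout; this only illustrates the four phases.
PHASES = ["header_cell", "column", "row", "cell"]


@dataclass
class Rule:
    name: str
    kind: str                 # "column", "row" or "cell"
    header_row: bool = False  # True when a cell rule's Row Selector is 'Header Row'


def phase_of(rule: Rule) -> str:
    """Cell rules on the header row run first; everything else runs by kind."""
    if rule.kind == "cell" and rule.header_row:
        return "header_cell"
    return rule.kind


def processing_order(rules: List[Rule]) -> List[Rule]:
    # sorted() is stable, so rules in the same phase keep their listed order
    # (an assumption of this sketch, not a documented behaviour).
    return sorted(rules, key=lambda rule: PHASES.index(phase_of(rule)))


rules = [
    Rule("trim cells", "cell"),
    Rule("delete empty rows", "row"),
    Rule("tidy column names", "cell", header_row=True),
    Rule("merge name columns", "column"),
]
ordered_names = [rule.name for rule in processing_order(rules)]
```

Here the header-row cell rule moves to the front, so the column rule that follows can refer to the cleaned column names.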

Data Jobs

Rons Data Cleaner - Job

The aim of Rons Data Cleaner is to shave hours off our customers' day by allowing one-click data processing, whilst drinking coffee. The Cleaners describe what happens to the data, so a way of describing which data to apply the Cleaner to was necessary. The Job does that.

Jobs contain a list of data sources, a Cleaner, and a list of outputs, which is all the information needed to process data with a button press. Multiple sources or outputs can be selected, allowing multiple files to be processed at the same time.

Five sections need to be set up with the information necessary to run the Job:

  1. Source Containers

Source Containers do exactly as the name implies: contain data sources. Currently there is only one type of container, a directory containing files, but in the future Source Containers will include Azure storage and various types of database.

  2. Source Profiles

Source Profiles contain information about how files in the Source Containers are to be processed. Typically they contain a file pattern (like '*.csv') to select files in the container, and information about how to read them. For example, CSV files need the type of delimiter specified in order to be read.

  3. Output Containers

Similar to Source Containers, Output Containers do exactly as the name implies: contain the result of file processing.

  4. Output Loggers

When processing large data sources it can be difficult to spot errors, so loggers can be configured and associated with Output Formatters. Output Loggers require an Output Container, which can be separate from the data Output Container(s).

  5. Output Formatters

Output Formatters determine the format of the data that is written after it has been processed. Output Formatters require an Output Container.
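As a rough illustration of what a Source Profile holds, the Python sketch below pairs a file pattern with a CSV delimiter and groups the five sections into one Job record. SourceProfile and Job are hypothetical names, the fields are plain strings for simplicity, and the matching uses Python's standard fnmatch module rather than anything taken from the application.

```python
import csv
import io
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List


@dataclass
class SourceProfile:
    """Hypothetical Source Profile: which files to pick up and how to read them."""
    pattern: str           # file pattern, for example "*.csv"
    delimiter: str = ","   # delimiter used when reading matching CSV files

    def matches(self, file_name: str) -> bool:
        return fnmatch(file_name, self.pattern)

    def read(self, text: str) -> List[List[str]]:
        return list(csv.reader(io.StringIO(text), delimiter=self.delimiter))


@dataclass
class Job:
    """Hypothetical grouping of the five sections described above."""
    source_containers: List[str]   # directories holding the input files
    source_profiles: List[SourceProfile]
    output_containers: List[str]   # directories that receive results and logs
    output_loggers: List[str]
    output_formatters: List[str]

    def select_files(self, file_names: List[str]) -> List[str]:
        """Keep the files that match at least one Source Profile."""
        return [n for n in file_names if any(p.matches(n) for p in self.source_profiles)]


profile = SourceProfile(pattern="*.csv", delimiter=";")
job = Job(["data"], [profile], ["out"], ["errors.log"], ["csv"])
selected = job.select_files(["sales.csv", "notes.txt", "stock.csv"])
rows = profile.read("a;b\n1;2\n")
```

The point of the sketch is the separation of concerns: containers say where the data lives, profiles say which files to pick and how to parse them.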

Processing Jobs and Preview

Rons Data Cleaner - Preview

After a Job has been configured, it can be run at the click of a button, or previewed to see the result of the rules before execution. Processing runs in parallel.
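The post does not say how the parallel processing or the preview is implemented. As one possible reading, the Python sketch below assumes one task per source file, run on a thread pool, and a preview that applies the same cleaning function to only the header and first few rows without writing anything. process_all, preview and the clean callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Table = List[List[str]]  # assumption: the first row is the header row


def process_all(
    sources: Dict[str, Table],
    clean: Callable[[Table], Table],
    workers: int = 4,
) -> Dict[str, Table]:
    """Clean every source table, one task per file (an assumption of this sketch)."""
    names = list(sources)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as its inputs.
        cleaned = list(pool.map(clean, (sources[name] for name in names)))
    return dict(zip(names, cleaned))


def preview(table: Table, clean: Callable[[Table], Table], limit: int = 5) -> Table:
    """Clean only the header and the first `limit` body rows; nothing is written."""
    return clean(table[: limit + 1])


def upper_case(table: Table) -> Table:
    """Stand-in for a Cleaner: upper-case every cell."""
    return [[cell.upper() for cell in row] for row in table]


results = process_all({"a.csv": [["x"], ["y"]], "b.csv": [["z"]]}, upper_case)
sample = preview([["h"], ["a"], ["b"], ["c"]], upper_case, limit=2)
```

Because each file is cleaned independently in this sketch, adding more files adds more tasks without changing the cleaning logic.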

Quick Job

Rons Data Cleaner - Quick Process

For less complicated processing, where the creation of a Job seems a little excessive, Quick Job can be used to simply specify the elements needed to clean some files:

  • Source directory or file
  • Source format
  • Data cleaner
  • Destination directory
  • Logger (optional)
  • Output format

