fg-data-profiling: a one-line exploratory data analysis tool for comprehensive dataset profiling and quality alerts

What it solves

fg-data-profiling provides a fast, one-line solution for Exploratory Data Analysis (EDA). It extends the basic functionality of pandas df.describe() to deliver a comprehensive analysis of a dataset, which can be exported as HTML or JSON reports.

How it works

The tool takes a pandas DataFrame (or Spark DataFrame via PySpark support) and automatically generates a detailed profiling report. It performs type inference to detect data types and calculates descriptive statistics, correlations, and data quality warnings.
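To make the idea of type inference followed by per-column statistics concrete, here is a minimal sketch in plain Python. It is not the tool's implementation and does not use its API: the helpers `infer_type`, `describe_column`, and `profile` are hypothetical names, and a dict of lists stands in for a DataFrame.

```python
from statistics import mean, median, stdev


def infer_type(values):
    """Guess a column type from its non-missing values (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int, so exclude it explicitly
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


def describe_column(values):
    """Summarize one column: counts for every type, extra statistics for numeric ones."""
    present = [v for v in values if v is not None]
    summary = {
        "type": infer_type(values),
        "count": len(present),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if summary["type"] == "numeric":
        summary.update(
            mean=mean(present),
            median=median(present),
            std=stdev(present) if len(present) > 1 else 0.0,
            min=min(present),
            max=max(present),
        )
    return summary


def profile(table):
    """Profile every column of a dict-of-lists table (a stand-in for a DataFrame)."""
    return {name: describe_column(column) for name, column in table.items()}


# Illustrative data, invented for this example
table = {
    "age": [34, 29, None, 41],
    "city": ["Oslo", "Lima", "Oslo", None],
    "active": [True, False, True, True],
}
report = profile(table)
for name, summary in report.items():
    print(name, summary)
```

A real profiler goes much further (histograms, correlations, alerts), but the overall shape is the same: infer a type per column, then compute the statistics that make sense for that type.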

Who it’s for

Data scientists and analysts who need to quickly understand the structure, quality, and characteristics of new datasets, including time-series and text data, without writing extensive manual analysis code.

Highlights

Comprehensive Analysis: Includes univariate analysis (descriptive statistics, histograms), multivariate analysis (correlations, missing data), and global dataset overviews.
Data Quality Alerts: Automatically flags issues like high correlation, skewness, missing values, and constant values.
Specialized Profiling: Dedicated support for time-series analysis (auto-correlation, seasonality) and text analysis (character scripts, most common character categories).
Versatile Output: Reports can be exported as HTML files, JSON strings, or rendered as interactive widgets directly within Jupyter Notebooks.
Scalability: Includes support for PySpark to handle larger datasets.
Integration: Connects with tools like Great Expectations, Streamlit, Dash, and workflow orchestrators like Airflow and Kedro.
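
The data quality alerts listed above can be illustrated with a small standalone sketch. It is an assumption-laden illustration, not the tool's code: the function `quality_alerts`, the alert labels, and the thresholds (`missing_limit`, `corr_limit`, `skew_limit`) are all hypothetical, and correlation is only checked between numeric columns that have no missing values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def skewness(xs):
    """Population skewness: third central moment over the variance to the power 1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def quality_alerts(table, missing_limit=0.2, corr_limit=0.9, skew_limit=1.0):
    """Return (column, alert) pairs; thresholds are illustrative defaults only."""
    alerts = []
    numeric = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if values and (len(values) - len(present)) / len(values) > missing_limit:
            alerts.append((name, "MISSING"))
        if present and len(set(present)) == 1:
            alerts.append((name, "CONSTANT"))
        # Simplification: only fully populated numeric columns are analysed further
        if present and len(present) == len(values) and all(is_number(v) for v in present):
            numeric[name] = present
            if abs(skewness(present)) > skew_limit:
                alerts.append((name, "SKEWED"))
    names = sorted(numeric)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(numeric[a], numeric[b])) > corr_limit:
                alerts.append((f"{a}~{b}", "HIGH_CORRELATION"))
    return alerts


# Illustrative data, invented for this example
table = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "country": ["NO", "NO", "NO", "NO", "NO"],
    "income": [20, 22, 21, 23, 500],
    "notes": [None, None, "ok", None, "late"],
}
alerts = quality_alerts(table)
for column, alert in alerts:
    print(column, alert)
```

Each check is a simple threshold on a statistic, so the thresholds decide how noisy the alerts are. The values chosen here are arbitrary and would need tuning for real data.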
