fg-data-profiling: a one-line exploratory data analysis tool for comprehensive dataset profiling and quality alerts
What it solves
fg-data-profiling provides a one-line solution for Exploratory Data Analysis (EDA). It goes beyond the summary statistics of pandas df.describe() to produce a comprehensive analysis of a dataset, and the resulting report can be exported as HTML or JSON.
How it works
The tool takes a pandas DataFrame (or a Spark DataFrame, through its PySpark support) and automatically generates a detailed profiling report. It infers the data type of each column, then calculates descriptive statistics and correlations and raises data quality warnings.
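To make the "infer types, then summarize" flow concrete, here is a minimal sketch in plain Python. It is not fg-data-profiling's implementation or API: the function names (`infer_type`, `profile_column`, `profile_table`), the type labels, and the dict-of-lists table are all assumptions chosen for illustration.

```python
import statistics


def infer_type(values):
    """Guess a column type from its non-missing values (illustrative rule only)."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int, so exclude it explicitly from "numeric".
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


def profile_column(values):
    """Build a small summary dict for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "type": infer_type(values),
        "count": len(present),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Numeric columns get extra descriptive statistics.
    if summary["type"] == "numeric":
        summary["mean"] = statistics.mean(present)
        summary["min"] = min(present)
        summary["max"] = max(present)
    return summary


def profile_table(table):
    """Profile every column of a table given as {column_name: list_of_values}."""
    return {name: profile_column(column) for name, column in table.items()}


# Hypothetical toy dataset standing in for a DataFrame.
table = {
    "age": [34, 29, None, 41],
    "city": ["Oslo", "Lima", "Oslo", None],
}
report = profile_table(table)
print(report["age"])
print(report["city"])
```

A real profiler covers many more types (dates, URLs, text) and statistics, but the shape is the same: one pass to decide what each column is, then a type-specific summary.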
Who it’s for
Data scientists and analysts who need to quickly understand the structure, quality, and characteristics of new datasets, including time-series and text data, without writing extensive manual analysis code.
Highlights
- Comprehensive Analysis: Includes univariate analysis (descriptive statistics, histograms), multivariate analysis (correlations, missing data), and global dataset overviews.
- Data Quality Alerts: Automatically flags issues like high correlation, skewness, missing values, and constant values.
- Specialized Profiling: Dedicated support for time-series analysis (auto-correlation, seasonality) and text analysis (character scripts and most common character categories).
- Versatile Output: Reports can be exported as HTML files, JSON strings, or rendered as interactive widgets directly within Jupyter Notebooks.
- Scalability: Includes support for PySpark to handle larger datasets.
- Integration: Connects with tools such as Great Expectations, Streamlit, and Dash, and with workflow orchestrators such as Airflow and Kedro.
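
The data quality alerts listed above (missing values, constant values, high correlation) can be sketched as simple rules over columns. The example below is only an illustration in plain Python, not the tool's own logic: the function `find_alerts`, the alert labels, and both thresholds (`missing_threshold`, `corr_threshold`) are hypothetical.

```python
import math


def is_number(value):
    """True for int/float values, excluding bool and None."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_alerts(table, missing_threshold=0.2, corr_threshold=0.9):
    """Return (alert_label, subject) tuples; thresholds are illustrative assumptions."""
    alerts = []
    for name, column in table.items():
        present = [v for v in column if v is not None]
        # Flag columns where too large a share of values is missing.
        if column and (len(column) - len(present)) / len(column) > missing_threshold:
            alerts.append(("MISSING", name))
        # Flag columns that hold a single distinct value.
        if present and len(set(present)) == 1:
            alerts.append(("CONSTANT", name))
    names = list(table)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Keep only rows where both columns hold a number.
            pairs = [(x, y) for x, y in zip(table[a], table[b])
                     if is_number(x) and is_number(y)]
            if len(pairs) < 3:
                continue  # too few points for a meaningful correlation
            xs, ys = zip(*pairs)
            if abs(pearson(xs, ys)) > corr_threshold:
                alerts.append(("HIGH_CORRELATION", f"{a}~{b}"))
    return alerts


# Hypothetical toy dataset: two redundant columns, one constant, one sparse.
table = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "country": ["NO", "NO", "NO", "NO"],
    "income": [None, None, 52000, 48000],
}
alerts = find_alerts(table)
for label, subject in alerts:
    print(label, subject)
```

Keeping each rule as an independent check makes it easy to add further ones, such as a skewness check, without touching the others.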