📖 FileScope User Guide

โ† Back to FileScope

A step-by-step guide to profiling your Parquet, Arrow, Feather, and Lance files, entirely in the browser.

1 Open FileScope

Go to filescope.app in any modern browser (Chrome, Firefox, Safari, or Edge). No installation or sign-up is needed.

2 Load Your File

Drag and drop a .parquet, .arrow, .feather, or .lance file onto the upload area, or click to browse your file system. You can also load the built-in sample dataset to explore the features first.

Your file stays on your machine: it is processed locally by DuckDB-WASM, a build of DuckDB compiled to WebAssembly. Nothing is uploaded.
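If you are not sure which format a file is actually in, its first and last bytes usually tell you. Per the format specifications, a Parquet file begins and ends with the 4-byte magic `PAR1`, and an Arrow IPC file (which is what Feather v2 is) begins with `ARROW1`. The sketch below is a hypothetical helper, not part of FileScope, that checks those signatures while reading only the head and tail of the file; Lance is left out of it.

```python
import os

PARQUET_MAGIC = b"PAR1"   # first and last 4 bytes of every Parquet file
ARROW_MAGIC = b"ARROW1"   # first 6 bytes of an Arrow IPC file (Feather v2)


def classify(head: bytes, tail: bytes, size: int) -> str:
    """Classify a file from its first 6 bytes, last 4 bytes, and total size."""
    # Smallest plausible Parquet file: two magics plus a 4-byte footer length.
    if size >= 12 and head[:4] == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[:6] == ARROW_MAGIC:
        return "arrow-ipc"
    return "unknown"


def sniff_format(path: str) -> str:
    """Read only the head and tail of the file, then classify it."""
    with open(path, "rb") as f:
        head = f.read(6)
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute offset
        f.seek(max(size - 4, 0))
        tail = f.read(4)
    return classify(head, tail, size)
```

For example, `sniff_format("data.parquet")` should return `"parquet"` for a well-formed Parquet file. A CSV or JSON file falls through to `"unknown"`, since neither format has a magic number.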

3 Explore the Overview

The Overview tab gives you a bird's-eye view of your file.

4 Inspect the Structure

The Structure tab reveals the internal Parquet file layout.
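As background on what that layout is, the Parquet specification places everything a reader needs to navigate the file in a footer: the leading `PAR1` magic, the column chunks grouped into row groups, the Thrift-encoded file metadata (schema, row groups, column-chunk offsets and statistics), a 4-byte little-endian length of that metadata, and a trailing `PAR1`. The hypothetical `footer_span` function below is a sketch, not FileScope's code, that locates the metadata block; decoding it would additionally require a Thrift compact-protocol reader, which is out of scope here.

```python
import struct

MAGIC = b"PAR1"


def footer_span(data: bytes) -> tuple[int, int]:
    """Return (offset, length) of the serialized FileMetaData in a Parquet file.

    Layout per the Parquet spec:
      "PAR1" | column chunks ... | FileMetaData | footer length (uint32 LE) | "PAR1"
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    # The 4 bytes just before the trailing magic hold the metadata length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    offset = len(data) - 8 - footer_len
    if offset < 4:  # metadata would overlap the leading magic
        raise ValueError("footer length is larger than the file")
    return offset, footer_len
```

This sketch takes the whole file as bytes for simplicity. A real reader only needs to fetch the last 8 bytes first and then the metadata block, which is why tools can inspect the structure of a large Parquet file without scanning its data pages.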

5 Analyze Individual Columns

The Columns tab provides a detailed look at each individual column.

Use the filter buttons to focus on specific column types.
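The exact statistics the Columns tab displays are not listed in this guide. Purely as an illustration, a typical per-column profile (row count, nulls, distinct values, min/max, most frequent value, and a mean for numeric columns) can be sketched as follows. The function name `profile_column` and the choice of statistics are assumptions, not FileScope's documented output.

```python
from collections import Counter
from typing import Any, Optional


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Summarize one column; None stands for a missing (null) value."""
    non_null = [v for v in values if v is not None]
    profile: dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if non_null:
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
        # Most frequent value; ties go to the value seen first.
        profile["top"] = Counter(non_null).most_common(1)[0][0]
        # Numeric columns (bool excluded) also get a mean.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
            profile["mean"] = sum(non_null) / len(non_null)
    return profile
```

Splitting columns by the type of their non-null values, as the numeric check above does, is also a natural basis for the kind of type filtering the filter buttons offer.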

6 Check Correlations

The Correlations tab displays a Pearson correlation heatmap for all numeric columns. This helps identify linear relationships between variables, which is useful for feature selection in machine learning or for spotting data quality issues.
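Each heatmap cell holds a Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which ranges from −1 (perfect negative linear relationship) to +1 (perfect positive one). A minimal sketch of the full matrix in plain Python follows, assuming the numeric columns are equal-length lists with no missing values; `pearson` and `correlation_matrix` are hypothetical names, not FileScope's implementation.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-deviation and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when a column is constant
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Every pairwise coefficient, i.e. one value per heatmap cell."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

The matrix is symmetric with 1.0 on the diagonal, so a heatmap only really needs one triangle. Handling nulls (for example, dropping incomplete rows for each pair) is left out of this sketch. DuckDB also provides Pearson correlation directly as the `corr(y, x)` aggregate.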

7 Preview Raw Data

The Data Preview tab shows the first 100 rows in a sortable table. Click any column header to sort.

8 Export Your Results

Two export options are available from the header.

Supported File Formats

Apache Parquet (.parquet)

A widely used columnar storage format for big data, supported by Spark, Hive, Presto, DuckDB, pandas, Polars, and most data engineering tools. FileScope supports Parquet files compressed with SNAPPY, GZIP, ZSTD, or LZ4, as well as uncompressed files.

Lance (.lance)

A modern columnar format designed for ML workflows and vector search, used by LanceDB. FileScope provides basic profiling for Lance files, including schema and statistics.

Apache Arrow IPC (.arrow) & Feather (.feather)

Arrow IPC is the native serialization format for Apache Arrow, used by Polars, DuckDB, pandas, and many other modern data tools for zero-copy data exchange. Feather (v2) is another name for the Arrow IPC file format. FileScope provides full profiling, including schema, column statistics, distributions, and correlations, powered by DuckDB-WASM's native Arrow support.

Performance Tips

Frequently Asked Questions

Is my data safe?
Yes. FileScope runs 100% in your browser. Your files are never uploaded to any server. All processing happens locally using DuckDB compiled to WebAssembly.
Do I need to install anything?
No. FileScope works in any modern browser. No downloads, plugins, or sign-ups required.
Can I use FileScope offline?
Not on your first visit: a network connection is needed for the initial download of DuckDB-WASM (~10 MB) and ECharts. After that first load, your browser caches these core libraries.
What's the maximum file size?
There's no hard limit; it depends on your browser's available memory. Files up to 500 MB typically work fine. For very large files (1 GB or more), performance depends on your device.
Can I profile CSV or JSON files?
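Currently FileScope supports Parquet, Arrow IPC/Feather, and Lance files only. For CSV, convert to Parquet first, for example with the DuckDB CLI:
COPY (SELECT * FROM 'file.csv') TO 'file.parquet' (FORMAT parquet);
or with pandas (which needs pyarrow or fastparquet installed to write Parquet):
pd.read_csv('file.csv').to_parquet('file.parquet')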
How is this different from pandas-profiling?
FileScope runs in the browser with zero setup (no Python, pip, or Jupyter needed), which makes it faster for quick file inspection. For deeper analysis with custom code, pandas-profiling (now ydata-profiling) is still excellent.

โ† Try FileScope now ยท About ยท Privacy