FileScope User Guide
A step-by-step guide to profiling your Parquet, Arrow, Feather, and Lance files, entirely in the browser.
1 Open FileScope
Go to filescope.app in any modern browser (Chrome, Firefox, Safari, or Edge). No installation or sign-up needed.
2 Load Your File
Drag and drop a .parquet, .arrow, .feather, or .lance file onto the upload area, or click to browse your file system. You can also try the built-in sample dataset to explore the features first.
Your file stays on your machine. It is processed locally by DuckDB-WASM, a WebAssembly build of DuckDB that runs inside your browser. Nothing is uploaded.
3 Explore the Overview
The Overview tab gives you a bird's-eye view:
- Row count: total number of records
- Column count: number of fields
- File size: compressed size on disk
- Column type distribution: pie chart showing numeric, categorical, date, and boolean columns
- Missing values: bar chart highlighting columns with nulls
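To make these overview metrics concrete, here is a minimal Python sketch that computes the same kinds of numbers for a small in-memory table. It is not FileScope's implementation (which runs on DuckDB-WASM); the `classify` and `overview` helpers, the type rules, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, datetime


def classify(values):
    """Assign a coarse type to a column based on its non-null values."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    # bool is tested first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, (date, datetime)) for v in present):
        return "date"
    return "categorical"


def overview(rows):
    """Summarize a list of dict records (a hypothetical in-memory table)."""
    columns = list(rows[0].keys()) if rows else []
    by_column = {c: [r.get(c) for r in rows] for c in columns}
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "type_distribution": dict(Counter(classify(v) for v in by_column.values())),
        "missing_values": {c: sum(v is None for v in vals) for c, vals in by_column.items()},
    }


# Illustrative data only; None stands in for a null value.
rows = [
    {"id": 1, "city": "Oslo", "active": True, "joined": date(2024, 1, 5)},
    {"id": 2, "city": None, "active": False, "joined": date(2024, 2, 9)},
    {"id": 3, "city": "Lima", "active": None, "joined": None},
]
summary = overview(rows)
print(summary)
```

The `missing_values` mapping corresponds to what the bar chart plots, and `type_distribution` to the pie chart.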
4 Inspect the Structure
The Structure tab reveals the internal Parquet file layout:
- File tree: visualize row groups and column chunks
- Row groups: size and row count per group
- Compression & encoding: which compression codecs (SNAPPY, ZSTD, etc.) and encodings (PLAIN, etc.) are used
- Optimization recommendations: suggestions to reduce file size or improve read performance
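For background on where this structural information comes from: a Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic store the length of the footer metadata, which describes the row groups and column chunks. The sketch below only locates that footer in a byte string; it does not decode the Thrift-encoded metadata and is not how FileScope reads files. The `parquet_footer_length` helper and the synthetic bytes are illustrative assumptions.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(data: bytes) -> int:
    """Return the byte length of the footer metadata in a Parquet file image."""
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the footer length
    # as a little-endian unsigned 32-bit integer.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


# Synthetic bytes shaped like a Parquet file, for illustration only.
# The "footer" is placeholder bytes, not real Thrift metadata.
fake_footer = b"\x00" * 37
fake_file = (
    PARQUET_MAGIC
    + b"column-chunk-bytes"
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + PARQUET_MAGIC
)
print(parquet_footer_length(fake_file))
```

On a real file you would read only the last 8 bytes first, then seek back by the reported length to fetch the footer, rather than loading the whole file.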
5 Analyze Individual Columns
The Columns tab provides per-column deep dives:
- Numeric columns: min, max, mean, median, standard deviation, distribution histogram
- Categorical columns: top values, cardinality, frequency chart
- Date columns: date range, distribution over time
- Boolean columns: true/false/null counts
Use the filter buttons to focus on specific column types.
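As a rough illustration of the per-column statistics listed above, the following Python sketch computes them with the standard library. The helper names and sample values are hypothetical, and the choice of the sample standard deviation is an assumption; the guide does not say which variant FileScope reports.

```python
import statistics
from collections import Counter


def numeric_profile(values):
    """Summary statistics for a numeric column; None marks a null."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # Sample standard deviation; needs at least two non-null values.
        "stdev": statistics.stdev(present),
        "nulls": len(values) - len(present),
    }


def categorical_profile(values, top_n=3):
    """Cardinality and most frequent values for a categorical column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "cardinality": len(counts),
        "top_values": counts.most_common(top_n),
        "nulls": len(values) - len(present),
    }


# Illustrative columns only.
prices = [10.0, 12.0, None, 14.0, 20.0]
cities = ["Oslo", "Lima", "Oslo", None, "Oslo", "Lima", "Pune"]

price_stats = numeric_profile(prices)
city_stats = categorical_profile(cities)
print(price_stats)
print(city_stats)
```

Nulls are excluded before computing statistics and reported separately, which mirrors how the tab shows null counts alongside the other figures.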
6 Check Correlations
The Correlations tab displays a Pearson correlation heatmap for all numeric columns. This helps identify relationships between variables, which is useful for feature selection in machine learning or for spotting data quality issues.
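For readers who want to see what a heatmap cell represents, here is a small Python sketch of the Pearson coefficient and a pairwise matrix over numeric columns. It is an illustration under stated assumptions, not FileScope's code: the `pearson` and `correlation_matrix` helpers are hypothetical, and dropping row pairs that contain a null is one common convention that FileScope may or may not follow.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns; None marks a null."""
    # Assumption: rows where either value is null are skipped.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # undefined when a column is constant
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one entry per heatmap cell."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative columns only.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "discount": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
print(matrix["height"]["weight"], matrix["height"]["discount"])
```

Values near 1 or -1 indicate a strong linear relationship; a pair of columns with a coefficient of exactly 1 is often a duplicated or derived field, which is the kind of data quality issue the heatmap can surface.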
7 Preview Raw Data
The Data Preview tab shows the first 100 rows in a sortable table. Click any column header to sort.
8 Export Your Results
Two export options are available from the header:
- Export HTML: downloads a standalone HTML report you can share with colleagues
- Export CSV: exports column statistics as a CSV file for further analysis in Excel or pandas
Supported File Formats
Apache Parquet (.parquet)
The most popular columnar storage format for big data. Used by Spark, Hive, Presto, DuckDB, pandas, Polars, and most data engineering tools. FileScope supports Parquet files compressed with SNAPPY, GZIP, ZSTD, or LZ4, as well as uncompressed files.
Lance (.lance)
A modern columnar format designed for ML workflows and vector search. Used by LanceDB. FileScope provides basic profiling for Lance files including schema and statistics.
Apache Arrow IPC (.arrow) & Feather (.feather)
Arrow IPC is the native serialization format for Apache Arrow, used by Polars, DuckDB, pandas, and many modern data tools for zero-copy data exchange. Feather (v2) is an alias for Arrow IPC. FileScope provides full profiling, including schema, column statistics, distributions, and correlations, powered by DuckDB-WASM's native Arrow support.
Performance Tips
- Files up to ~500MB work well in most browsers
- Chrome and Edge tend to handle larger files better due to higher WASM memory limits
- If a file is very large, consider sampling it first with DuckDB or pandas before profiling
- Close other memory-heavy tabs for best performance
Frequently Asked Questions
- Is my data safe?
- Yes. FileScope runs 100% in your browser. Your files are never uploaded to any server. All processing happens locally using DuckDB compiled to WebAssembly.
- Do I need to install anything?
- No. FileScope works in any modern browser. No downloads, plugins, or sign-ups required.
- Can I use FileScope offline?
- After the first load, the core libraries are cached by your browser. However, a network connection is needed for the initial load of DuckDB-WASM (~10MB) and ECharts.
- What's the maximum file size?
- There's no hard limit; it depends on your browser's available memory. Files up to 500MB typically work fine. For very large files (1GB+), performance depends on your device.
- Can I profile CSV or JSON files?
- Currently FileScope supports Parquet, Arrow IPC, Feather, and Lance formats only. For CSV, consider converting to Parquet first, for example with the DuckDB CLI: COPY (SELECT * FROM 'file.csv') TO 'file.parquet' (FORMAT parquet); or with pandas: df.to_parquet('file.parquet').
- How is this different from pandas-profiling?
- FileScope runs in the browser with zero setup: no Python, pip, or Jupyter needed. It's faster for quick file inspection. For deeper analysis with custom code, pandas-profiling (now ydata-profiling) is still excellent.