spreadsheets are the default, but they fall over quickly. excel and google sheets slow down, cap rows, and push you into sampling on large files. there is no clean pipeline to replay, and sharing a reviewable workflow is awkward.
sql fixes the scale but makes exploration clunky. you have to manage scripts and query history, and non-technical analysts get blocked. eda is tedious: every drill-down is another query, pivot summaries are verbose, and you hop between text and results.
bi tools like tableau help, but they are heavy. they take time to set up, expect curated data sources, and are overkill for a quick “what’s going on here” pass.
there is a gap between “open in excel” and “set up a proper data pipeline”. most of the time you just want to filter columns, join tables, or run a quick aggregation without the overhead.
the approach
what if the browser itself was the database
modern browsers can run webassembly at near-native speed. duckdb is an embedded analytical database that compiles to wasm. combine the two and you get sql analytics that run entirely client-side, with no server dependencies.
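roughly what that looks like with the @duckdb/duckdb-wasm package, as a sketch (standard library setup, not repere's internal code):

```ts
import * as duckdb from "@duckdb/duckdb-wasm";

// bootstrap duckdb-wasm in the browser: pick a bundle, spin up a worker,
// and instantiate the database. everything stays client-side.
async function initDuckDB(): Promise<duckdb.AsyncDuckDB> {
  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  // the database runs in a web worker so queries don't block the ui thread
  const worker = new Worker(bundle.mainWorker!);
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  return db;
}
```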
what repere does
repere is an in-browser data explorer for datasets too large for spreadsheets. it’s fully open source under the mit license.
drop a csv, parquet, or excel file into the browser and start querying. no uploads, no accounts, no waiting for a server to process your data.
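loading a dropped file is a couple of calls against duckdb's virtual filesystem. a hedged sketch that assumes the `db` handle from the setup above and invents the view name `sales`:

```ts
import type { AsyncDuckDB } from "@duckdb/duckdb-wasm";

// register a dropped File with duckdb and query it in place.
// assumes `db` comes from a setup like the sketch above.
async function loadAndPreview(db: AsyncDuckDB, file: File) {
  const bytes = new Uint8Array(await file.arrayBuffer());
  await db.registerFileBuffer(file.name, bytes);

  const conn = await db.connect();
  // expose the csv as a named view; parquet would use read_parquet instead
  await conn.query(
    `CREATE VIEW sales AS SELECT * FROM read_csv_auto('${file.name}')`
  );
  const preview = await conn.query(`SELECT * FROM sales LIMIT 100`);
  console.log(preview.toArray());
  await conn.close();
}
```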
the core idea is a visual pipeline where each transformation creates a view. filter some rows, that’s a view. join two tables, another view. group and aggregate, another view. these chain together as a directed acyclic graph that you can explore and modify.
the pipeline
the pipeline is the central abstraction. every dataset you load becomes a node, and every operation you apply creates a new node connected to its parent.
this forms a directed acyclic graph where data flows from sources through transformations. you can branch the pipeline, apply different operations to the same source, and join branches back together.
importantly, nothing is materialized until you need it. each view is just a sql query definition. when you scroll the grid or run an export, duckdb executes the full chain on demand. this means you can build complex multi-step transformations without copying data at each step.
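in sql terms a three-step pipeline is just three stacked view definitions, and nothing runs until something selects from the last one. a sketch with invented table and column names:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// each node is a view defined on top of its parent; duckdb only does real
// work when something reads from the final view. names are invented.
async function buildPipeline(conn: AsyncDuckDBConnection) {
  const steps = [
    `CREATE VIEW v_filtered AS
       SELECT * FROM sales WHERE amount > 100`,
    `CREATE VIEW v_joined AS
       SELECT f.*, c.segment
       FROM v_filtered f JOIN customers c USING (customer_id)`,
    `CREATE VIEW v_summary AS
       SELECT segment, sum(amount) AS total
       FROM v_joined GROUP BY segment`,
  ];
  for (const sql of steps) await conn.query(sql);

  // only this call touches the data; the views above are just definitions
  return conn.query(`SELECT * FROM v_summary ORDER BY total DESC`);
}
```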
the visual canvas shows the full graph. you can click any node to see its data, delete nodes to remove transformations, or branch off in new directions. deleting a node cascades to its descendants, since they depend on it.
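the cascade itself is a small graph walk. a sketch of the idea, not repere's actual data model:

```ts
// a minimal node model for the pipeline graph: every node knows its parents,
// so deleting one means removing it plus everything reachable from it.
interface PipelineNode {
  id: string;
  parentIds: string[]; // empty for source datasets
  sql: string;         // the view definition this node contributes
}

function deleteWithDescendants(nodes: Map<string, PipelineNode>, rootId: string) {
  const doomed = new Set([rootId]);
  let changed = true;
  // keep sweeping until no new descendants are found (the graph is a small dag)
  while (changed) {
    changed = false;
    for (const node of nodes.values()) {
      if (doomed.has(node.id)) continue;
      if (node.parentIds.some((p) => doomed.has(p))) {
        doomed.add(node.id);
        changed = true;
      }
    }
  }
  for (const id of doomed) nodes.delete(id);
  return doomed;
}
```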
operations
repere supports 17 transformations organized into categories
querying: filter rows with conditions, sort by columns, limit results
column manipulation: select columns, add computed columns with sql expressions, rename, reorder, cast types
aggregation: group by with sum/avg/count/min/max/median, pivot tables with subtotals, unpivot columns to rows
combining data: inner/left/right/full/cross joins, union tables, deduplicate with distinct
data cleaning: fill null values (forward fill, backward fill, mean/median/mode), find and replace, edit individual cells
every operation is just sql under the hood. duckdb-wasm creates views that compose together without materializing intermediate results, which means you can build deep pipelines without running out of memory.
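you can think of each operation as a small config object that compiles into a view definition. a sketch of that idea, with invented op shapes and names rather than repere's actual schema:

```ts
// turn a declarative operation into the sql view it stands for.
type Op =
  | { kind: "filter"; condition: string }
  | { kind: "select"; columns: string[] }
  | { kind: "groupBy"; keys: string[]; aggregates: string[] };

function opToSql(viewName: string, parent: string, op: Op): string {
  switch (op.kind) {
    case "filter":
      return `CREATE VIEW ${viewName} AS SELECT * FROM ${parent} WHERE ${op.condition}`;
    case "select":
      return `CREATE VIEW ${viewName} AS SELECT ${op.columns.join(", ")} FROM ${parent}`;
    case "groupBy":
      return `CREATE VIEW ${viewName} AS SELECT ${[...op.keys, ...op.aggregates].join(", ")} FROM ${parent} GROUP BY ${op.keys.join(", ")}`;
  }
}

// e.g. opToSql("v2", "v1", { kind: "filter", condition: "amount > 100" })
```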
pivot tables
pivot tables deserve special mention because they’re more than just an aggregation
you can configure row fields, column fields, and multiple value aggregations, with subtotals and grand totals. the result is an interactive view where you can expand and collapse groups.
clicking any cell drills down to the underlying data. repere creates a new filtered view showing exactly the rows that contributed to that cell, which makes it easy to investigate outliers or verify aggregations.
if you want to continue transforming the pivot output, there’s a “flatten to table” button. this materializes the pivot into a regular table that becomes a new node in the pipeline. from there you can apply any other operation: filter, join, export.
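duckdb itself ships a pivot statement, so flattening can be pictured as materializing one into a table. a sketch with made-up table and column names; repere's own pivot engine adds subtotals and interactivity on top and may not use this exact syntax:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// "flatten to table" boils down to materializing a pivot result as a table.
async function flattenPivot(conn: AsyncDuckDBConnection) {
  await conn.query(`
    CREATE TABLE monthly_by_region AS
    SELECT * FROM (
      PIVOT sales ON region USING sum(amount) GROUP BY month
    )
  `);
  // the new table is a regular node: filter it, join it, export it
  return conn.query(`SELECT * FROM monthly_by_region LIMIT 20`);
}
```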
sql editor
for users who prefer writing queries directly, there’s a sql panel
the editor has autocomplete that knows your schema. type a table name and it suggests columns. type part of a column name and it shows matches across all tables, with their types.
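the schema the autocomplete needs is queryable from duckdb itself. a sketch of feeding it, not repere's actual completion code:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// pull every table/column/type triple out of duckdb so the editor can
// suggest them, grouped by table name.
async function loadSchema(conn: AsyncDuckDBConnection) {
  const result = await conn.query(`
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    ORDER BY table_name, ordinal_position
  `);
  const schema = new Map<string, { column: string; type: string }[]>();
  for (const row of result.toArray() as any[]) {
    const cols = schema.get(row.table_name) ?? [];
    cols.push({ column: row.column_name, type: row.data_type });
    schema.set(row.table_name, cols);
  }
  return schema;
}
```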
there’s a format button that cleans up messy sql, and queries run instantly with results shown alongside the editor. if you like the result, one click creates a new node from that query.
you can also edit existing sql nodes: open a node’s sql, modify it, and update it in place. descendants automatically reflect the change.
the grid
the data grid uses virtualization to render only the visible rows, which means you can scroll through millions of rows without the browser choking.
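the grid only ever needs the window of rows on screen, so each fetch can be a limit/offset query against the node's view. a sketch; repere's actual paging strategy may differ:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// fetch only the rows currently visible in the grid viewport.
async function fetchRowWindow(
  conn: AsyncDuckDBConnection,
  viewName: string,
  firstRow: number,
  rowCount: number
) {
  const result = await conn.query(
    `SELECT * FROM ${viewName} LIMIT ${rowCount} OFFSET ${firstRow}`
  );
  return result.toArray();
}
```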
columns show sparkline histograms for quick distribution checks. you can pin columns, resize them, drag to reorder, and hide the ones you don’t need.
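a sparkline only needs bucket counts, which is one aggregation away. a sketch with invented view and column names:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// bucket a numeric column into ~20 bins for a sparkline histogram,
// using plain sql arithmetic so it works on any numeric column.
async function columnHistogram(conn: AsyncDuckDBConnection, view: string, col: string) {
  const result = await conn.query(`
    WITH bounds AS (SELECT min(${col}) AS lo, max(${col}) AS hi FROM ${view})
    SELECT
      least(floor((${col} - lo) / nullif((hi - lo) / 20.0, 0)), 19) AS bucket,
      count(*) AS n
    FROM ${view}, bounds
    WHERE ${col} IS NOT NULL
    GROUP BY bucket
    ORDER BY bucket
  `);
  return result.toArray();
}
```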
selection works like a spreadsheet. click and drag across cells to see quick stats: sum, average, count, median, unique values.
privacy first
nothing ever leaves your browser
the files you load stay on your machine, and queries run locally in webassembly. there’s no server logging what you’re doing, and no data being sent anywhere.
this matters when you’re exploring sensitive data: financial records, customer data, anything you wouldn’t want to upload to a random web service.
sharing sessions
sessions can be exported and shared in two ways
as a file: export a .repere file that contains the full pipeline structure, all view definitions, and optionally the source data itself
you choose which datasets to embed
small files get included directly, large files are referenced by name and schema
as a url: for smaller sessions, generate a shareable link. the pipeline is compressed and encoded into the url hash. no server involved, the entire session lives in the link.
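the url route can work entirely client-side by compressing the session json into the hash. a sketch using the lz-string library; repere's actual encoding and size limits aren't specified here:

```ts
import LZString from "lz-string";

// encode a session object into a shareable link and read it back out.
function sessionToUrl(session: object): string {
  const packed = LZString.compressToEncodedURIComponent(JSON.stringify(session));
  return `${location.origin}${location.pathname}#s=${packed}`;
}

function sessionFromUrl(): object | null {
  const match = location.hash.match(/#s=(.+)/);
  if (!match) return null;
  const json = LZString.decompressFromEncodedURIComponent(match[1]);
  return json ? JSON.parse(json) : null;
}
```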
when someone opens a shared session, they see the full pipeline on a visual canvas. if the data was embedded, everything loads immediately. if not, they’re prompted to provide the missing files.
replaying with different data
the pipeline isn’t tied to the exact bytes of your original files
when you share a session without embedding data, recipients provide their own copies. repere validates that the schema matches (column names and compatible types) but doesn’t require identical data.
this means you can build a pipeline once and replay it on updated data. create your transformations on january’s sales data, share the session, and your colleague runs the same pipeline on february’s data.
schema validation is lenient on purpose. numeric types are interchangeable (int, bigint, and float all work together), and string types are interchangeable. the system blocks only genuinely incompatible changes, like expecting a number but getting text.
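that rule reduces to type families. a sketch of the check as described above, not repere's actual validator:

```ts
// lenient schema check: numeric types match numeric types, string types
// match string types, anything else must match exactly.
const NUMERIC = new Set(["TINYINT", "SMALLINT", "INTEGER", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"]);
const TEXTUAL = new Set(["VARCHAR", "TEXT", "STRING"]);

function typesCompatible(expected: string, actual: string): boolean {
  const e = expected.toUpperCase();
  const a = actual.toUpperCase();
  if (NUMERIC.has(e) && NUMERIC.has(a)) return true;
  if (TEXTUAL.has(e) && TEXTUAL.has(a)) return true;
  return e === a;
}

// e.g. typesCompatible("BIGINT", "DOUBLE") -> true
//      typesCompatible("BIGINT", "VARCHAR") -> false
```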
if a file is missing and you don’t have it, you can skip it. repere creates a placeholder table with the right schema but no rows. the pipeline structure stays intact, and views depending on that table just show empty results. you can provide the file later to fill things in.
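a placeholder is just an empty table with the expected columns. a short sketch, with the schema shape invented:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// create an empty table matching a missing file's expected schema so that
// downstream views keep working until the real file shows up.
async function createPlaceholder(
  conn: AsyncDuckDBConnection,
  table: string,
  columns: { name: string; type: string }[]
) {
  const cols = columns.map((c) => `"${c.name}" ${c.type}`).join(", ");
  await conn.query(`CREATE TABLE "${table}" (${cols})`);
}
```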
where this is going
repere starts as a fast local explorer, but the goal is a real bi tool that keeps the same local-first core.
the roadmap includes charts and dashboards, live connectors (postgres, bigquery, s3, google sheets), and time intelligence for rolling windows and cohort analysis. there’s also a desktop app planned for larger files and native performance.
for teams, the long-term plan is collaboration and enterprise features: cloud sync, shareable dashboards, access control, audit logs, sso, and on-prem or vpc deployment.
there will be a paid tier similar to excalidraw, but the local-first experience will always be free and serverless
summary
repere fills the gap between spreadsheets and proper database tooling
it handles files that are too big for excel but too small to justify infrastructure. everything runs in your browser, so there’s no setup and no privacy concerns.
the pipeline model means your work is reproducible. share a session with embedded data for a self-contained analysis, or share just the pipeline and let others bring their own data.
useful for quick data exploration when you want sql power without the overhead
check it out at repere.ai