spreadsheets are the default, but they fall over quickly. excel and google sheets slow down, cap rows, and push you into sampling on large files. there is no clean pipeline to replay, and sharing a reviewable workflow is awkward.
sql fixes the scale but makes exploration clunky. you have to manage scripts and query history, and non-technical analysts get blocked. eda is tedious: every drill-down is another query, pivot summaries are verbose, and you hop between text and results.
bi tools like tableau help, but they are heavy. they take time to set up, expect curated data sources, and are overkill for a quick “what’s going on here” pass.
there is a gap between “open in excel” and “set up a proper data pipeline”. most of the time you just want to filter columns, join tables, or run a quick aggregation without the overhead.
the approach
what if the browser itself was the database
modern browsers can run webassembly at near-native speed. duckdb is an embedded analytical database that compiles to wasm. combine the two and you get sql analytics that run entirely client-side, with no server dependencies.
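roughly what that looks like with the @duckdb/duckdb-wasm package, as a sketch (standard library setup, not repere's internal code):

```ts
import * as duckdb from "@duckdb/duckdb-wasm";

// bootstrap duckdb-wasm in the browser: pick a bundle, spin up a worker,
// and instantiate the database. everything stays client-side.
async function initDuckDB(): Promise<duckdb.AsyncDuckDB> {
  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  // the database runs in a web worker so queries don't block the ui thread
  const worker = new Worker(bundle.mainWorker!);
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  return db;
}
```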
what repere does
repere is an in-browser data explorer for datasets too large for spreadsheets. it’s fully open source under the mit license.
drop a csv, parquet, or excel file into the browser and start querying. no uploads, no accounts, no waiting for a server to process your data.
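loading a dropped file is a couple of calls against duckdb's virtual filesystem. a hedged sketch that assumes the `db` handle from the setup above and invents the view name `sales`:

```ts
import type { AsyncDuckDB } from "@duckdb/duckdb-wasm";

// register a dropped File with duckdb and query it in place.
// assumes `db` comes from a setup like the sketch above.
async function loadAndPreview(db: AsyncDuckDB, file: File) {
  const bytes = new Uint8Array(await file.arrayBuffer());
  await db.registerFileBuffer(file.name, bytes);

  const conn = await db.connect();
  // expose the csv as a named view; parquet would use read_parquet instead
  await conn.query(
    `CREATE VIEW sales AS SELECT * FROM read_csv_auto('${file.name}')`
  );
  const preview = await conn.query(`SELECT * FROM sales LIMIT 100`);
  console.log(preview.toArray());
  await conn.close();
}
```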
the core idea is a visual pipeline where each transformation creates a view. filter some rows, that’s a view. join two tables, another view. group and aggregate, another view. these chain together as a directed acyclic graph that you can explore and modify.
the pipeline
the pipeline is the central abstraction. every dataset you load becomes a node, and every operation you apply creates a new node connected to its parent.
this forms a directed acyclic graph where data flows from sources through transformations. you can branch the pipeline, apply different operations to the same source, and join branches back together.
importantly, nothing is materialized until you need it. each view is just a sql query definition. when you scroll the grid or run an export, duckdb executes the full chain on demand. this means you can build complex multi-step transformations without copying data at each step.
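in sql terms a three-step pipeline is just three stacked view definitions, and nothing runs until something selects from the last one. a sketch with invented table and column names:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// each node is a view defined on top of its parent; duckdb only does real
// work when something reads from the final view. names are invented.
async function buildPipeline(conn: AsyncDuckDBConnection) {
  const steps = [
    `CREATE VIEW v_filtered AS
       SELECT * FROM sales WHERE amount > 100`,
    `CREATE VIEW v_joined AS
       SELECT f.*, c.segment
       FROM v_filtered f JOIN customers c USING (customer_id)`,
    `CREATE VIEW v_summary AS
       SELECT segment, sum(amount) AS total
       FROM v_joined GROUP BY segment`,
  ];
  for (const sql of steps) await conn.query(sql);

  // only this call touches the data; the views above are just definitions
  return conn.query(`SELECT * FROM v_summary ORDER BY total DESC`);
}
```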
the visual canvas shows the full graph. you can click any node to see its data, delete nodes to remove transformations, or branch off in new directions. deleting a node cascades to its descendants, since they depend on it.
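the cascade itself is a small graph walk. a sketch of the idea, not repere's actual data model:

```ts
// a minimal node model for the pipeline graph: every node knows its parents,
// so deleting one means removing it plus everything reachable from it.
interface PipelineNode {
  id: string;
  parentIds: string[]; // empty for source datasets
  sql: string;         // the view definition this node contributes
}

function deleteWithDescendants(nodes: Map<string, PipelineNode>, rootId: string) {
  const doomed = new Set([rootId]);
  let changed = true;
  // keep sweeping until no new descendants are found (the graph is a small dag)
  while (changed) {
    changed = false;
    for (const node of nodes.values()) {
      if (doomed.has(node.id)) continue;
      if (node.parentIds.some((p) => doomed.has(p))) {
        doomed.add(node.id);
        changed = true;
      }
    }
  }
  for (const id of doomed) nodes.delete(id);
  return doomed;
}
```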
operations
repere supports 17 transformations organized into categories
querying: filter rows with conditions, sort by columns, limit results
column manipulation: select columns, add computed columns with sql expressions, rename, reorder, cast types
aggregation: group by with sum/avg/count/min/max/median, pivot tables with subtotals, unpivot columns to rows
combining data: inner/left/right/full/cross joins, union tables, deduplicate with distinct
data cleaning: fill null values (forward fill, backward fill, mean/median/mode), find and replace, edit individual cells
every operation is just sql under the hood. duckdb-wasm creates views that compose together without materializing intermediate results, which means you can build deep pipelines without running out of memory.
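you can think of each operation as a small config object that compiles into a view definition. a sketch of that idea, with invented op shapes and names rather than repere's actual schema:

```ts
// turn a declarative operation into the sql view it stands for.
type Op =
  | { kind: "filter"; condition: string }
  | { kind: "select"; columns: string[] }
  | { kind: "groupBy"; keys: string[]; aggregates: string[] };

function opToSql(viewName: string, parent: string, op: Op): string {
  switch (op.kind) {
    case "filter":
      return `CREATE VIEW ${viewName} AS SELECT * FROM ${parent} WHERE ${op.condition}`;
    case "select":
      return `CREATE VIEW ${viewName} AS SELECT ${op.columns.join(", ")} FROM ${parent}`;
    case "groupBy":
      return `CREATE VIEW ${viewName} AS SELECT ${[...op.keys, ...op.aggregates].join(", ")} FROM ${parent} GROUP BY ${op.keys.join(", ")}`;
  }
}

// e.g. opToSql("v2", "v1", { kind: "filter", condition: "amount > 100" })
```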
pivot tables
pivot tables deserve special mention because they’re more than just an aggregation
you can configure row fields, column fields, and multiple value aggregations, with subtotals and grand totals. the result is an interactive view where you can expand and collapse groups.
clicking any cell drills down to the underlying data. repere creates a new filtered view showing exactly the rows that contributed to that cell, which makes it easy to investigate outliers or verify aggregations.
if you want to continue transforming the pivot output, there’s a “flatten to table” button. this materializes the pivot into a regular table that becomes a new node in the pipeline. from there you can apply any other operation: filter, join, export.
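duckdb itself ships a pivot statement, so flattening can be pictured as materializing one into a table. a sketch with made-up table and column names; repere's own pivot engine adds subtotals and interactivity on top and may not use this exact syntax:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// "flatten to table" boils down to materializing a pivot result as a table.
async function flattenPivot(conn: AsyncDuckDBConnection) {
  await conn.query(`
    CREATE TABLE monthly_by_region AS
    SELECT * FROM (
      PIVOT sales ON region USING sum(amount) GROUP BY month
    )
  `);
  // the new table is a regular node: filter it, join it, export it
  return conn.query(`SELECT * FROM monthly_by_region LIMIT 20`);
}
```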
sql editor
for users who prefer writing queries directly, there’s a sql panel
the editor has autocomplete that knows your schema. type a table name and it suggests columns. type part of a column name and it shows matches across all tables, with their types.
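the schema the autocomplete needs is queryable from duckdb itself. a sketch of feeding it, not repere's actual completion code:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// pull every table/column/type triple out of duckdb so the editor can
// suggest them, grouped by table name.
async function loadSchema(conn: AsyncDuckDBConnection) {
  const result = await conn.query(`
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    ORDER BY table_name, ordinal_position
  `);
  const schema = new Map<string, { column: string; type: string }[]>();
  for (const row of result.toArray() as any[]) {
    const cols = schema.get(row.table_name) ?? [];
    cols.push({ column: row.column_name, type: row.data_type });
    schema.set(row.table_name, cols);
  }
  return schema;
}
```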
there’s a format button that cleans up messy sql, and queries run instantly with results shown alongside the editor. if you like the result, one click creates a new node from that query.
you can also edit existing sql nodes: open a node’s sql, modify it, and update it in place. descendants automatically reflect the change.
the grid
the data grid uses virtualization to render only the visible rows, which means you can scroll through millions of rows without the browser choking.
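the grid only ever needs the window of rows on screen, so each fetch can be a limit/offset query against the node's view. a sketch; repere's actual paging strategy may differ:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// fetch only the rows currently visible in the grid viewport.
async function fetchRowWindow(
  conn: AsyncDuckDBConnection,
  viewName: string,
  firstRow: number,
  rowCount: number
) {
  const result = await conn.query(
    `SELECT * FROM ${viewName} LIMIT ${rowCount} OFFSET ${firstRow}`
  );
  return result.toArray();
}
```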
columns show sparkline histograms for quick distribution checks. you can pin columns, resize them, drag to reorder, and hide the ones you don’t need.
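a sparkline only needs bucket counts, which is one aggregation away. a sketch with invented view and column names:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// bucket a numeric column into ~20 bins for a sparkline histogram,
// using plain sql arithmetic so it works on any numeric column.
async function columnHistogram(conn: AsyncDuckDBConnection, view: string, col: string) {
  const result = await conn.query(`
    WITH bounds AS (SELECT min(${col}) AS lo, max(${col}) AS hi FROM ${view})
    SELECT
      least(floor((${col} - lo) / nullif((hi - lo) / 20.0, 0)), 19) AS bucket,
      count(*) AS n
    FROM ${view}, bounds
    WHERE ${col} IS NOT NULL
    GROUP BY bucket
    ORDER BY bucket
  `);
  return result.toArray();
}
```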
selection works like a spreadsheet. click and drag across cells to see quick stats: sum, average, count, median, unique values.
privacy first
nothing ever leaves your browser
the files you load stay on your machine, and queries run locally in webassembly. there’s no server logging what you’re doing, and no data being sent anywhere.
this matters when you’re exploring sensitive data: financial records, customer data, anything you wouldn’t want to upload to a random web service.
sharing sessions
sessions can be exported and shared in two ways
as a file: export a .repere file that contains the full pipeline structure, all view definitions, and optionally the source data itself
you choose which datasets to embed
small files get included directly, large files are referenced by name and schema
as a url: for smaller sessions, generate a shareable link. the pipeline is compressed and encoded into the url hash. no server involved, the entire session lives in the link.
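the url route can work entirely client-side by compressing the session json into the hash. a sketch using the lz-string library; repere's actual encoding and size limits aren't specified here:

```ts
import LZString from "lz-string";

// encode a session object into a shareable link and read it back out.
function sessionToUrl(session: object): string {
  const packed = LZString.compressToEncodedURIComponent(JSON.stringify(session));
  return `${location.origin}${location.pathname}#s=${packed}`;
}

function sessionFromUrl(): object | null {
  const match = location.hash.match(/#s=(.+)/);
  if (!match) return null;
  const json = LZString.decompressFromEncodedURIComponent(match[1]);
  return json ? JSON.parse(json) : null;
}
```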
when someone opens a shared session, they see the full pipeline on a visual canvas. if the data was embedded, everything loads immediately. if not, they’re prompted to provide the missing files.
replaying with different data
the pipeline isn’t tied to the exact bytes of your original files
when you share a session without embedding data, recipients provide their own copies. repere validates that the schema matches (column names and compatible types) but doesn’t require identical data.
this means you can build a pipeline once and replay it on updated data. create your transformations on january’s sales data, share the session, and your colleague runs the same pipeline on february’s data.
schema validation is lenient on purpose. numeric types are interchangeable (int, bigint, and float all work together), and string types are interchangeable. the system blocks only genuinely incompatible changes, like expecting a number but getting text.
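that rule reduces to type families. a sketch of the check as described above, not repere's actual validator:

```ts
// lenient schema check: numeric types match numeric types, string types
// match string types, anything else must match exactly.
const NUMERIC = new Set(["TINYINT", "SMALLINT", "INTEGER", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"]);
const TEXTUAL = new Set(["VARCHAR", "TEXT", "STRING"]);

function typesCompatible(expected: string, actual: string): boolean {
  const e = expected.toUpperCase();
  const a = actual.toUpperCase();
  if (NUMERIC.has(e) && NUMERIC.has(a)) return true;
  if (TEXTUAL.has(e) && TEXTUAL.has(a)) return true;
  return e === a;
}

// e.g. typesCompatible("BIGINT", "DOUBLE") -> true
//      typesCompatible("BIGINT", "VARCHAR") -> false
```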
if a file is missing and you don’t have it, you can skip it. repere creates a placeholder table with the right schema but no rows. the pipeline structure stays intact, and views depending on that table just show empty results. you can provide the file later to fill things in.
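a placeholder is just an empty table with the expected columns. a short sketch, with the schema shape invented:

```ts
import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

// create an empty table matching a missing file's expected schema so that
// downstream views keep working until the real file shows up.
async function createPlaceholder(
  conn: AsyncDuckDBConnection,
  table: string,
  columns: { name: string; type: string }[]
) {
  const cols = columns.map((c) => `"${c.name}" ${c.type}`).join(", ");
  await conn.query(`CREATE TABLE "${table}" (${cols})`);
}
```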
where this is going
repere starts as a fast local explorer, but the goal is a real bi tool that keeps the same local-first core.
the roadmap includes charts and dashboards, live connectors (postgres, bigquery, s3, google sheets), and time intelligence for rolling windows and cohort analysis. there’s also a desktop app planned for larger files and native performance.
for teams, the long-term plan is collaboration and enterprise features: cloud sync, shareable dashboards, access control, audit logs, sso, and on-prem or vpc deployment.
there will be a paid tier similar to excalidraw, but the local-first experience will always be free and serverless
summary
repere fills the gap between spreadsheets and proper database tooling
it handles files that are too big for excel but too small to justify infrastructure. everything runs in your browser, so there’s no setup and no privacy concerns.
the pipeline model means your work is reproducible. share a session with embedded data for a self-contained analysis, or share just the pipeline and let others bring their own data.
useful for quick data exploration when you want sql power without the overhead
check it out at repere.ai