How to convert HTML to JSON
Sheetize HtmlConverter for .NET parses an HTML file and emits a structured JSON representation of the spreadsheet data it contains. The converter handles inline styles, embedded images, and complex tables, producing JSON that can be consumed by web APIs, data pipelines, or front‑end applications.
Why Convert HTML → JSON?
- JSON is language‑agnostic and perfect for transmitting tabular data over HTTP.
- Allows you to reuse HTML‑based reports as data sources for dashboards, machine‑learning models, or mobile apps.
- Keeps the original visual layout in the HTML while exposing the underlying cell values, formulas, and metadata in a programmatic form.
Core Feature Set
- Full table extraction – rows, columns, merged cells, and styles are captured.
- Asset handling – images and media are either base64‑encoded or stored as separate files referenced in the JSON.
- Customizable output – choose between a compact flat structure or a hierarchical workbook model.
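To make the base64 asset handling concrete, here is a minimal sketch of how a consumer might decode an embedded image from the JSON output. It uses only standard .NET calls and no Sheetize API. It assumes the image arrives as a data URI, as in the "image" field of the simplified JSON sample later in this article. The sample value below is hypothetical and carries only the 8-byte PNG signature.

```csharp
using System;

// Hypothetical cell value: a data URI like the "image" field in the simplified sample.
// The payload is only the 8-byte PNG signature, not a complete image.
string dataUri = "data:image/png;base64,iVBORw0KGgo=";

// Split "data:<media type>;base64,<payload>" at the first comma.
int comma = dataUri.IndexOf(',');
if (!dataUri.StartsWith("data:", StringComparison.Ordinal) || comma < 0)
    throw new FormatException("Not a data URI.");

string header = dataUri.Substring(5, comma - 5);   // "image/png;base64"
string payload = dataUri.Substring(comma + 1);

if (!header.EndsWith(";base64", StringComparison.Ordinal))
    throw new FormatException("Only base64-encoded data URIs are handled here.");

string mediaType = header.Substring(0, header.Length - ";base64".Length);
byte[] bytes = Convert.FromBase64String(payload);

Console.WriteLine($"{mediaType}: {bytes.Length} bytes");
```

The decoded bytes could then be written to disk or passed to an image library. If the converter is configured to store assets as separate files instead, the JSON would presumably hold a path rather than a data URI, and this step would not apply.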
Conversion Workflow (HTML → JSON)
- Create the Converter – instantiate HtmlConverter.
- Set Load Options – point to the source HTML file and optionally define the base URI for linked resources.
- Configure Save Options – use HtmlSaveOptions to select the JSON schema, embed resources, and set the output path.
- Run the Process – call HtmlConverter.Process(loadOptions, saveOptions).
Code Example – HTML to JSON with Embedded Images
using Sheetize;

// Point the converter at the source HTML file.
var loadOptions = new LoadOptions
{
    InputFile = @"C:\Docs\Report.html"
};

// Choose where the JSON output is written.
var saveOptions = new HtmlSaveOptions
{
    OutputFile = @"C:\Output\Report.json"
};

// Run the conversion.
HtmlConverter.Process(loadOptions, saveOptions);
How the JSON Looks (simplified)
{
  "sheets": [{
    "name": "Sheet1",
    "rows": [{
      "cells": [{
        "address": "A1",
        "value": "Title",
        "style": { "fontWeight": "bold" }
      }, {
        "address": "B1",
        "value": "Image",
        "image": "data:image/png;base64,iVBORw0KG..."
      }]
    }]
  }]
}
Advanced Tips
- Selective Extraction – set HtmlLoadOptions.IncludeElements = new[] { "table", "img" } to ignore unrelated markup.
- Performance – for large HTML files, enable ParallelProcessing = true in JsonSaveOptions.
- Custom Serialization – implement IJsonConverter to transform cell values (e.g., dates to ISO‑8601).
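The date example in the Custom Serialization tip can also be illustrated without the library. The sketch below walks JSON in the simplified shape shown earlier using System.Text.Json and rewrites date-like cell values as ISO‑8601. It does not use IJsonConverter, whose members are not described here. The sample JSON, the MM/dd/yyyy input format, and the ToIso8601 helper are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical input in the simplified shape shown earlier; real output may differ.
string json = @"{
  ""sheets"": [{
    ""name"": ""Sheet1"",
    ""rows"": [{
      ""cells"": [
        { ""address"": ""A1"", ""value"": ""Title"" },
        { ""address"": ""B1"", ""value"": ""03/15/2024"" }
      ]
    }]
  }]
}";

// Maps each cell address to its (possibly normalized) string value.
var normalized = new Dictionary<string, string>();

using (JsonDocument doc = JsonDocument.Parse(json))
{
    foreach (JsonElement sheet in doc.RootElement.GetProperty("sheets").EnumerateArray())
    foreach (JsonElement row in sheet.GetProperty("rows").EnumerateArray())
    foreach (JsonElement cell in row.GetProperty("cells").EnumerateArray())
    {
        // Skip cells that have no string value (numbers, nulls, missing fields).
        if (!cell.TryGetProperty("value", out JsonElement value) ||
            value.ValueKind != JsonValueKind.String)
            continue;

        string address = cell.GetProperty("address").GetString() ?? "";
        normalized[address] = ToIso8601(value.GetString() ?? "");
    }
}

foreach (KeyValuePair<string, string> pair in normalized)
    Console.WriteLine($"{pair.Key}: {pair.Value}");

// Hypothetical helper: converts MM/dd/yyyy text to yyyy-MM-dd and
// returns anything else unchanged.
static string ToIso8601(string text) =>
    DateTime.TryParseExact(text, "MM/dd/yyyy", CultureInfo.InvariantCulture,
                           DateTimeStyles.None, out DateTime date)
        ? date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
        : text;
```

Parsing with an explicit format and the invariant culture keeps the result independent of the machine's regional settings. A real pipeline would need to know which input formats the source HTML actually uses.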
Expanded Format Support
Aside from JSON, the same HtmlConverter can also target MHTML, CSV, EPUB, AZW3, and XLSX, so one converter covers data exchange between web, e‑book, and spreadsheet formats.
With these steps you can reliably turn any HTML report into clean, consumable JSON using Sheetize’s HtmlConverter.