How to convert HTML to JSON

Sheetize HtmlConverter for .NET makes it easy to parse an HTML file and emit a structured JSON representation of the spreadsheet data it contains. The converter handles inline styles, embedded images, and complex tables, producing clean JSON that can be consumed by web APIs, data‑pipelines, or front‑end applications.

Why Convert HTML → JSON?

  • JSON is language‑agnostic and perfect for transmitting tabular data over HTTP.
  • Allows you to reuse HTML‑based reports as data sources for dashboards, machine‑learning models, or mobile apps.
  • Keeps the original visual layout in the HTML while exposing the underlying cell values, formulas, and metadata in a programmatic form.

Core Feature Set

  • Full table extraction – rows, columns, merged cells, and styles are captured.
  • Asset handling – images and media are either base64‑encoded or stored as separate files referenced in the JSON.
  • Customizable output – choose between a compact flat structure or a hierarchical workbook model.

Conversion Workflow (HTML → JSON)

  1. Create the Converter – instantiate HtmlConverter.
  2. Set Load Options – point to the source HTML file and optionally define the base URI for linked resources.
  3. Configure Save Options – use HtmlSaveOptions to select the JSON schema, embed resources, and set the output path.
  4. Run the Process – call HtmlConverter.Process(loadOptions, saveOptions).

Code Example – HTML to JSON with Embedded Images

using Sheetize;

var loadOptions = new LoadOptions
{
InputFile = @"C:\Docs\Report.html"
};

var saveOptions = new HtmlSaveOptions
{
OutputFile = @"C:\Output\Report.json"
};

HtmlConverter.Process(loadOptions, saveOptions);

How the JSON Looks (simplified)

{
"sheets": [{
"name": "Sheet1",
"rows": [{
"cells": [{
"address": "A1",
"value": "Title",
"style": { "fontWeight": "bold" }
}, {
"address": "B1",
"value": "Image",
"image": "data:image/png;base64,iVBORw0KG..."
}]
}]
}]
}

Advanced Tips

  • Selective Extraction – set HtmlLoadOptions.IncludeElements = new[] { "table", "img" } to ignore unrelated markup.
  • Performance – for large HTML files, enable ParallelProcessing = true in JsonSaveOptions.
  • Custom Serialization – implement IJsonConverter to transform cell values (e.g., dates to ISO‑8601).

Expanded Format Support

Aside from JSON, the same HtmlConverter can target MHTML, CSV, EPUB, AZW3, and even back to XLSX. This makes it a universal bridge for data exchange between web, e‑book, and spreadsheet ecosystems.

With these steps you can reliably turn any HTML report into clean, consumable JSON using Sheetize’s HtmlConverter.

 English