- Download the firmware package: voicebox-g2-qual-firmware.zip
- Unzip it. Connect the device via USB-C.
- Run ./flash.sh (Mac/Linux) or flash.bat (Windows). The script installs esptool if needed, finds the device, and flashes automatically.
- Repeat for each device.
- If the device already has qual firmware and is on the network, go to the Flash & Serial tab.
- Use the OTA Update section to push new firmware over WiFi — no USB needed.
- Connect the device via USB-C.
- Go to the Flash & Serial tab and click Connect USB.
- In the Serial Console, type: WIFI_SET <ssid> <password>
- Wait ~5 seconds, then type STATUS. Note the IP address shown.
- Repeat for each device. You can disconnect USB after WiFi is configured.

Use STATUS to find the IP, or use Scan.

No USB available? Connect your computer to the device's AP network (VoiceBoxG2-Qual) at http://192.168.4.1, then use the WiFi Network section on the Flash & Serial tab to join your lab network.
- Click 🔌 Manage Connections in the sidebar (left panel).
- Click 🔍 Scan to auto-discover devices on your network, or click + WiFi Device to add by IP address.
- Each device appears in the list with a green dot when connected. Use the checkbox to include/exclude devices from captures.
- Select the mic (Left/Right), sample rate, and MCLK for each device.
- Close the modal. Your devices appear in the sidebar — all checked devices will be captured simultaneously.
- Ensure the test environment is quiet (no signal playing).
- Set the Duration in the sidebar (10s is standard).
- Click 🎤 Capture in the sidebar. All active devices capture simultaneously.
- Results appear in the Analysis tab — FFT spectrum, noise floor, tonal spurs.
- Select captures in the sidebar to overlay and compare across devices.
- Plug your USB reference mic (e.g. UMIK-2) into this computer.
- Go to the SNR tab and click 🔄 Refresh Devices. Allow microphone access when prompted.
- Select your Ref Mic and Speaker from the dropdowns.
- Position the speaker and all mics at the test distance.
- Click ▶▶ Run Full SNR Test. The tool captures noise floor (signal off), then signal (signal on) from all devices + ref mic.
- Results show per-device and per-band SNR with a direct comparison chart.
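The noise-then-signal procedure in the steps above comes down to comparing two levels. The following is a minimal sketch, assuming SNR is computed as 20·log10(RMS_signal / RMS_noise) on mean-removed samples. The tool's exact formula and its per-band method are not stated here, and the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of the samples after removing their mean."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

def snr_db(signal_capture, noise_capture):
    """SNR in dB from a signal-on and a signal-off capture (assumed formula)."""
    return 20.0 * math.log10(rms(signal_capture) / rms(noise_capture))

# Synthetic stand-ins for real captures: a 1 kHz tone and a low-level
# alternating sequence playing the role of the noise floor.
rate = 16000
tone = [1000.0 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
noise = [10.0 if n % 2 else -10.0 for n in range(rate)]
print(round(snr_db(tone, noise), 1))
```

A completely silent noise capture would make the ratio undefined, so real code would need to guard against a zero noise RMS.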
🔌 USB Serial Connection
🔌 Serial Console
📶 WiFi Network
Join a WiFi network so the device is reachable from your LAN. Requires USB serial connection. Saved to device NVS.
⚡ Flash Firmware
Flashes pre-built firmware to the connected device via USB. Uses encrypted flash (esptool).
Flash Log
📡 OTA Update (WiFi)
Push firmware to a connected device over WiFi — no USB cable needed. Device must be reachable on the network.
System Log
SNR results from captures run via the sidebar. Configure signal, ref mic, and speaker in the Capture panel.
📊 SNR Results
Spectrum Comparison ⓘ
Run History
▶ ⚙ Processing
| Filter | Description |
|---|---|
| Gain | Digital gain applied after all filters. Amplifies the signal, which is useful for quiet captures. Clips at 0 dBFS. |
| Subtract mean | Subtracts the mean value from the entire capture, centering the waveform at zero. Removes DC offset introduced by the ADC or analog front-end. |
| High-pass | Single-pole IIR high-pass filter. Attenuates low-frequency content below the cutoff: mechanical rumble, HVAC hum, wind noise. Set to 100 Hz for typical voice-band analysis. |
| Low-pass | Single-pole IIR low-pass filter. Attenuates high-frequency content above the cutoff. Useful for isolating low-frequency noise sources or focusing on the voice band (set to 4 kHz). |
| Notch (Hz) | Biquad band-reject filter that removes a single frequency. Use for mains hum (50/60 Hz), clock spurs, or any known tonal interference. Q controls width: Tight removes only the exact frequency, Narrow takes out nearby content too. |
| A-weighting | IEC 61672 perceptual curve, applied in the frequency domain. Shapes the spectrum to match human hearing perception: attenuates sub-bass and ultra-high frequencies, slightly boosts 2–5 kHz. Use when comparing to spec sheets that quote dBA values. |
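To illustrate three of the stages described above, here is a minimal sketch of mean subtraction, a single-pole high-pass, and gain with clipping. It assumes samples are floats normalized to ±1.0 (so 0 dBFS corresponds to 1.0) and uses the common RC-style single-pole form. The tool's actual coefficients are not documented here, and all function names are hypothetical.

```python
import math

def subtract_mean(x):
    """Center the waveform at zero by removing the DC offset."""
    mean = sum(x) / len(x)
    return [s - mean for s in x]

def high_pass(x, cutoff_hz, sample_rate):
    """Single-pole IIR high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    if not x:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

def apply_gain(x, gain_db):
    """Apply gain in dB and hard-clip at full scale (assumed to be +/-1.0)."""
    g = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * g)) for s in x]

def process(x, sample_rate, cutoff_hz=100.0, gain_db=0.0):
    """Chain the stages, with gain last as the Gain description states."""
    return apply_gain(high_pass(subtract_mean(x), cutoff_hz, sample_rate), gain_db)
```

Because each stage returns a new list, the raw capture stays untouched and the chain can be re-run with different settings.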
FFT Spectrum
▶ Spectrogram
Summary
| RMS | — |
| Peak | — |
| DC Offset | — |
| Noise Floor | — |
| Spectral Slope | — |
| Tonal Spurs | — |
Tonal Peaks
| Freq | Level | Prominence | Harmonic? |
|---|---|---|---|
Band Energy
| Band | Range | Energy | Δ vs Ref |
|---|---|---|---|
Export, import, and manage your capture database. All captures are stored in your browser's IndexedDB.
📦 Database
Export the full database (all captures, analysis, and metadata) as a JSON file. Import to restore on another machine or browser.
📂 Import WAV Files
Import WAV audio files as captures. Files with embedded metadata (exported from this tool) will restore all settings and session info.
🗑 Clear Data
Permanently delete all captures, groups, and analysis data from this browser. This cannot be undone.
📊 Statistics
Device API Specification
# Microphone Analysis Tool — Device API Specification
Implement these HTTP endpoints on any WiFi-capable microcontroller (ESP32, etc.)
to make it compatible with the Microphone Analysis Tool web UI.
All endpoints must include CORS headers:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type
All POST endpoints need a matching OPTIONS handler returning 204.
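For bench-testing the web UI without hardware, the CORS rules above can be sketched with Python's standard http.server module. This is a hypothetical mock, not device firmware: the class name, the helper, and the placeholder JSON body are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

class MockDeviceHandler(BaseHTTPRequestHandler):
    def _send(self, status, body=b"", content_type=None):
        """Send a response that always carries the CORS headers."""
        self.send_response(status)
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)
        if content_type:
            self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Preflight for POST endpoints: 204 No Content plus CORS headers.
        self._send(204)

    def do_GET(self):
        # Placeholder body; a real device returns the endpoint's own payload.
        self._send(200, json.dumps({"ok": True}).encode(), "application/json")

    def log_message(self, fmt, *args):
        pass  # keep the console quiet

def make_server(port=8080):
    """Create (but do not start) the mock server on all interfaces."""
    return ThreadingHTTPServer(("0.0.0.0", port), MockDeviceHandler)

# To run the mock: make_server(8080).serve_forever()
```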
---
## REQUIRED: GET /api/discover
Device identification and capabilities. Called once on connect.
Response (application/json):
```json
{
"type": "my-device-type",
"mac": "AA:BB:CC:DD:EE:FF",
"hostname": "my-device-eeff",
"fw": "1.0.0",
"board": "esp32s3-custom",
"sta_ip": "192.168.0.100",
"running": false,
"mics": [
{"id": "main", "label": "Built-in MEMS Mic", "channel": 0}
],
"capabilities": {
"sample_rates": [16000, 48000],
"gains": [0, 6, 12, 18, 24, 30, 36, 40],
"hp_cutoffs": [80, 100, 120, 150, 200],
"dc_remove": true,
"hp_filter": true,
"subsystems": [
{"id": "wifi", "label": "WiFi", "toggle": true, "default": true},
{"id": "amp", "label": "Amplifier", "toggle": true, "default": false}
]
}
}
```
Fields:
- type: string identifier for your device
- mac: device MAC address
- hostname: mDNS hostname (without .local)
- fw: firmware version string
- board: hardware revision
- sta_ip: WiFi station IP (empty string if AP-only)
- running: true if a capture is in progress
- mics: array of available microphones
  - id: string passed as ?mic= parameter to /api/mic_test
  - label: human-readable name shown in the UI
  - channel: numeric channel index
- capabilities: (optional) what settings the device supports
  - sample_rates: array of supported sample rates in Hz
  - gains: array of supported gain values in dB (UI shows these as processing options)
  - hp_cutoffs: array of HP filter cutoff frequencies in Hz
  - dc_remove: boolean, show DC removal option in UI
  - hp_filter: boolean, show HP filter option in UI (also implied by hp_cutoffs)
  - dsr: array of {value, label} for PDM downsample ratio (PDM mics only)
  - mclk: array of MCLK multiples (PDM mics only)
  - subsystems: array of toggleable hardware subsystems
    - id: used in /api/toggle/{id} endpoint
    - label: shown in UI
    - toggle: true if the UI can toggle it, false for observe-only
    - default: initial checkbox state (true = on at boot)
If capabilities is omitted, the UI shows default controls.
Only include fields your device actually supports.
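As a quick way to sanity-check a device's response during bring-up, the field list above can be turned into a small checker. This is a sketch only: it treats type and mics as the required fields (the ones the minimal checklist names) and everything else as optional. The function name and the returned summary shape are hypothetical.

```python
import json

def summarize_discover(payload):
    """Validate the minimal fields and summarize what the device offers."""
    for key in ("type", "mics"):
        if key not in payload:
            raise ValueError(f"/api/discover response is missing '{key}'")
    caps = payload.get("capabilities") or {}
    return {
        "type": payload["type"],
        "mic_ids": [m["id"] for m in payload["mics"]],
        "sample_rates": caps.get("sample_rates", []),
        # Only subsystems marked toggle=true can be switched from the UI.
        "toggleable": [s["id"] for s in caps.get("subsystems", []) if s.get("toggle")],
    }

example = json.loads("""
{
  "type": "my-device-type",
  "mics": [{"id": "main", "label": "Built-in MEMS Mic", "channel": 0}],
  "capabilities": {
    "sample_rates": [16000, 48000],
    "subsystems": [
      {"id": "wifi", "label": "WiFi", "toggle": true, "default": true},
      {"id": "amp", "label": "Amplifier", "toggle": false, "default": false}
    ]
  }
}
""")
print(summarize_discover(example))
```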
---
## REQUIRED: GET /api/mic_test
Capture audio from the microphone and return a WAV file.
Query parameters:
- mic: (string) mic id from the mics array in /api/discover
- duration: (int) capture duration in seconds, 1-30, default 10
Response: audio/wav
- Standard 44-byte RIFF WAV header
- Mono (1 channel)
- 16-bit signed PCM, little-endian
- Sample rate matching the device's current setting
IMPORTANT: Return RAW audio. Do NOT apply gain, DC removal, or filtering.
All signal processing is done client-side in the web UI. This enables
non-destructive analysis where users can change processing after capture.
Implementation pattern:
1. Parse ?mic= and ?duration= from query string
2. Initialize audio peripheral (I2S, PDM, etc.)
3. Discard the first 500-1000 ms of samples (filter settling)
4. Capture (sample_rate * duration) samples into a buffer
5. Deinitialize audio peripheral
6. Send WAV header (44 bytes) with correct sample rate and data size
7. Send raw PCM data in chunks
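For devices without PSRAM, use streaming: send the WAV header first (the data
size is known up front as sample_rate * duration * 2 bytes), then capture a chunk,
send it, and capture the next chunk. This needs only a ~4KB buffer regardless of duration.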
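The header layout and the streaming pattern can be illustrated in a few lines of Python; firmware would do the same thing in its own language. This is a sketch under stated assumptions: read_chunk is a hypothetical callback standing in for the audio driver and must return n 16-bit little-endian samples as bytes, and the 2048-sample chunk size is chosen only to match the ~4KB figure.

```python
import struct

def wav_header(sample_rate, num_samples, channels=1, bits=16):
    """Build the standard 44-byte RIFF/WAVE header for PCM audio."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    data_size = num_samples * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16,          # fmt chunk id and size
        1, channels,          # format 1 = PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )

def stream_wav(read_chunk, sample_rate, duration_s, chunk_samples=2048):
    """Yield the header, then PCM chunks, holding one chunk in memory at a time."""
    total = sample_rate * duration_s
    yield wav_header(sample_rate, total)
    sent = 0
    while sent < total:
        n = min(chunk_samples, total - sent)
        yield read_chunk(n)  # hypothetical driver call: n samples -> 2*n bytes
        sent += n
```

Writing the data size into the header before any audio is captured works because the sample count is fixed by the requested duration.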
---
## OPTIONAL: GET /api/state
Device status, polled periodically.
Response (application/json):
```json
{
"mac": "AA:BB:CC:DD:EE:FF",
"running": false,
"current_test": -1,
"current_run": -1,
"overall": "PENDING",
"results": [],
"metrics": [],
"log": "",
"syslog": ""
}
```
The key field is "running" — the UI disables capture while true.
All other fields can be empty/default if not implementing test suites.
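From a client's point of view, using this endpoint means polling until "running" goes false. Below is a minimal sketch using only the Python standard library; the function names, polling interval, and poll limit are assumptions, and the fetch parameter exists so the loop can be exercised without a device.

```python
import json
import time
import urllib.request

def fetch_state(base_url, timeout=3.0):
    """GET /api/state and return the decoded JSON object."""
    with urllib.request.urlopen(f"{base_url}/api/state", timeout=timeout) as resp:
        return json.load(resp)

def wait_until_idle(base_url, fetch=fetch_state, interval_s=1.0, max_polls=30):
    """Poll until "running" is false. Returns True if idle, False if still busy."""
    for _ in range(max_polls):
        if not fetch(base_url).get("running", False):
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_until_idle("http://192.168.0.100")` could be called before starting a capture, using the sample sta_ip from the /api/discover example as a placeholder address.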
---
## OPTIONAL: GET/POST /api/pdm_tuning
Read and write device settings.
GET response (application/json):
```json
{"sample_rate": 16000, "mic_sel": 0}
```
POST: settings via query parameters. Only include what your device supports.
- rate: sample rate in Hz
- mic: mic channel index (0, 1, ...)
- dsr: downsample ratio (0 or 1)
- mclk: MCLK multiple (128, 256, 384, 512)
Example: POST /api/pdm_tuning?rate=48000&mic=0
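A client can build the POST URL from whichever settings the device supports. The sketch below uses only the Python standard library; the function names are hypothetical, and since the POST response body is not specified here, the helper returns just the HTTP status.

```python
import urllib.request
from urllib.parse import urlencode

def pdm_tuning_url(base_url, rate=None, mic=None, dsr=None, mclk=None):
    """Build the /api/pdm_tuning URL, omitting settings that were not given."""
    pairs = (("rate", rate), ("mic", mic), ("dsr", dsr), ("mclk", mclk))
    params = {name: value for name, value in pairs if value is not None}
    return f"{base_url}/api/pdm_tuning?{urlencode(params)}"

def push_settings(base_url, **settings):
    """POST the settings as query parameters and return the HTTP status code."""
    req = urllib.request.Request(pdm_tuning_url(base_url, **settings), method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status

print(pdm_tuning_url("http://192.168.0.100", rate=48000, mic=0))
```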
---
## OPTIONAL: POST /api/toggle/{id}?on=0|1
Toggle a hardware subsystem. The {id} matches a subsystem id from capabilities in /api/discover.
Response (application/json):
```json
{"wifi": "off"}
```
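On the client side, the request URL and the response check might look like the sketch below. The function names are hypothetical, and "on" as the enabled value is an assumption: the example response above only shows "off".

```python
import json

def toggle_url(base_url, subsystem_id, on):
    """Build the POST URL for switching a subsystem on (on=1) or off (on=0)."""
    return f"{base_url}/api/toggle/{subsystem_id}?on={1 if on else 0}"

def subsystem_is_on(body, subsystem_id):
    """Return True if the response reports the subsystem as "on" (assumed value)."""
    return json.loads(body).get(subsystem_id) == "on"

print(toggle_url("http://192.168.0.100", "wifi", False))
print(subsystem_is_on('{"wifi": "off"}', "wifi"))
```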
---
## MINIMAL IMPLEMENTATION CHECKLIST
For a basic compatible device, implement:
1. GET /api/discover — return type, mics array, optional capabilities
2. GET /api/mic_test?mic={id}&duration={sec} — capture raw audio, return WAV
3. CORS headers on all responses
4. OPTIONS handlers for POST endpoints
The web UI reads the sample rate from the WAV header automatically.
No capabilities = default UI controls shown.
No /api/state = UI works but won't show device status.
No /api/pdm_tuning = settings can't be pushed to device.
No /api/toggle = subsystem toggles won't work.