Tuesday, July 25, 2023

Live microphone visualization (waveform) with sounddevice

 *** This post was made by ChatGPT ***


Are you interested in visualizing live audio data from your microphone? Do you want to see the waveform of your voice or any other sound in real time? In this blog post, we’ll explore a Python script that uses sounddevice to capture audio from the microphone and Matplotlib to plot the waveform as it arrives. The script is a handy tool for understanding and analyzing audio input in real time.

Before we begin, make sure you have sounddevice, Matplotlib, and NumPy installed. If not, you can install them using the following command:

pip install matplotlib numpy sounddevice
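
If you want to check that sounddevice can see your microphone before going further, a quick one-liner lists the available devices (the same listing the script's -l flag prints later):

$ python3 -c "import sounddevice as sd; print(sd.query_devices())"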

Now, let’s dive into the code and see how it works.

The Code

#!/usr/bin/env python3
"""Plot the live microphone signal(s) with matplotlib.

Matplotlib and NumPy have to be installed.

"""
import argparse
import queue
import sys

from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd

The script starts with the usual shebang (#!/usr/bin/env python3) and a brief docstring explaining the purpose of the code. It also imports the necessary modules: argparse, queue, sys, FuncAnimation from matplotlib.animation, plt (alias for matplotlib.pyplot), numpy, and sounddevice.

Next, the code defines a small helper function for argument parsing and the two functions that do the real work: the audio callback and the plot-update function.

def int_or_str(text):
    """Helper function for argument parsing."""
    try:
        return int(text)
    except ValueError:
        return text


def audio_callback(indata, frames, time, status):
    """This is called (from a separate thread) for each audio block."""
    if status:
        print(status, file=sys.stderr)
    # Fancy indexing with mapping creates a (necessary!) copy:
    q.put(indata[::args.downsample, mapping])

The int_or_str function is a helper used for parsing command-line arguments. It tries to convert the input text to an integer and returns it if successful; otherwise, it returns the text unchanged. This lets the -d/--device option accept either a numeric device ID or a substring of a device name.
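
For example, with the function above in scope, a -d/--device value is interpreted like this:

>>> int_or_str('2')          # numeric device ID
2
>>> int_or_str('USB Audio')  # device-name substring
'USB Audio'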

The audio_callback function is called (from a separate thread) for each audio block received from the microphone. It receives indata (the audio data), frames (the number of frames), time (a timestamp structure for the block), and status (the status of the audio stream). It prints any status messages to standard error and puts a copy of the audio data (downsampled and reduced to the selected channels via mapping) into a queue (q) for later processing. The copy is necessary because the indata buffer is reused by the audio backend after the callback returns.
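
To see the callback-and-queue pattern in isolation, here is a minimal, self-contained sketch (the 16 kHz sample rate and the one-second capture are arbitrary values chosen only for illustration):

import queue
import sys

import sounddevice as sd

q = queue.Queue()

def callback(indata, frames, time, status):
    """Runs in a separate thread for every captured block."""
    if status:
        print(status, file=sys.stderr)
    q.put(indata.copy())  # copy, because the indata buffer is reused

# Capture roughly one second of audio and count the queued blocks.
with sd.InputStream(channels=1, samplerate=16000, callback=callback):
    sd.sleep(1000)
print(f'received {q.qsize()} blocks')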

def update_plot(frame):
    """This is called by matplotlib for each plot update.

    Typically, audio callbacks happen more frequently than plot updates,
    therefore the queue tends to contain multiple blocks of audio data.

    """
    global plotdata
    while True:
        try:
            data = q.get_nowait()
        except queue.Empty:
            break
        shift = len(data)
        plotdata = np.roll(plotdata, -shift, axis=0)
        plotdata[-shift:, :] = data
    for column, line in enumerate(lines):
        line.set_ydata(plotdata[:, column])
    return lines

The update_plot function is called by Matplotlib for each plot update. It drains the queue (q), shifting the existing plot buffer towards the start with np.roll and writing each new block into the freed space at the end. It then updates the y-data of each plotted line with the new buffer contents and returns the lines, which FuncAnimation needs for blitting.
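
The shifting logic is easiest to see with a tiny, made-up buffer (the numbers below are purely illustrative):

import numpy as np

plotdata = np.arange(8).reshape(8, 1)    # pretend: current plot buffer of 8 samples
data = np.array([[100], [101], [102]])   # pretend: a freshly received block

shift = len(data)
plotdata = np.roll(plotdata, -shift, axis=0)  # slide the old samples towards the start
plotdata[-shift:, :] = data                   # write the new block into the freed tail
print(plotdata.ravel())                       # [  3   4   5   6   7 100 101 102]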

if __name__ == "__main__":
    # ... (continued in the next code block)

The script uses the standard Python if __name__ == "__main__": guard to ensure that the following code is only executed when the script is run directly, not when it’s imported as a module.

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        '-l', '--list-devices', action='store_true',
        help='show list of audio devices and exit')
    args, remaining = parser.parse_known_args()
    if args.list_devices:
        print(sd.query_devices())
        parser.exit(0)

The code first sets up a minimal argument parser (with add_help=False) that only knows the -l/--list-devices flag. parse_known_args handles that flag and collects all other arguments into remaining for the full parser created next. If the user specifies --list-devices, the script prints the available audio devices using sd.query_devices() and exits without running the main functionality.
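
If the two-parser dance looks unusual, here is a small self-contained sketch of the parse_known_args/parents pattern (the argument values are made up for illustration):

import argparse

# First pass: a bare parser that only knows --list-devices; add_help=False
# keeps -h/--help free for the full parser defined afterwards.
pre = argparse.ArgumentParser(add_help=False)
pre.add_argument('-l', '--list-devices', action='store_true')

args, remaining = pre.parse_known_args(['--list-devices', '-d', '3'])
print(args.list_devices)  # True
print(remaining)          # ['-d', '3']  -- left over for the second parser

# Second pass: the full parser inherits the first one via parents=[...]
full = argparse.ArgumentParser(parents=[pre])
full.add_argument('-d', '--device')
print(full.parse_args(remaining).device)  # prints: 3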

    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[parser])
    parser.add_argument(
        'channels', type=int, default=[1], nargs='*', metavar='CHANNEL',
        help='input channels to plot (default: the first)')
    parser.add_argument(
        '-d', '--device', type=int_or_str,
        help='input device (numeric ID or substring)')
    parser.add_argument(
        '-w', '--window', type=float, default=200, metavar='DURATION',
        help='visible time slot (default: %(default)s ms)')
    parser.add_argument(
        '-i', '--interval', type=float, default=30,
        help='minimum time between plot updates (default: %(default)s ms)')
    parser.add_argument(
        '-b', '--blocksize', type=int, help='block size (in samples)')
    parser.add_argument(
        '-sr', '--samplerate', type=float, default=16000,
        help='sampling rate of audio device (default: %(default)s Hz)')
    parser.add_argument(
        '-n', '--downsample', type=int, default=1, metavar='N',
        help='display every Nth sample (default: %(default)s)')
    args = parser.parse_args(remaining)

The script creates another argument parser, this time with a description based on the script’s docstring. It defines several command-line arguments:

  • channels: The channels to plot. If not specified, it will default to the first channel.
  • device: The input audio device to use. It can be specified either by a numeric ID or a substring of the device name.
  • window: The visible time slot in milliseconds. This controls how much of the audio history is displayed on the plot.
  • interval: The minimum time between plot updates in milliseconds.
  • blocksize: The block size (number of samples) for audio processing. If not specified, the default block size of the audio stream will be used.
  • samplerate: The sampling rate of the audio device. If not specified, it will default to 16000 Hz.
  • downsample: The factor by which the audio data is downsampled. By default, no downsampling is applied.

The parse_args method is called to parse the remaining command-line arguments (remaining) after handling the --list-devices option.

    if any(c < 1 for c in args.channels):
        parser.error('argument CHANNEL: must be >= 1')
    mapping = [c - 1 for c in args.channels]  # Channel numbers start with 1
    q = queue.Queue()

The code checks whether any of the specified channels is less than 1 and, if so, raises a parser error. It then builds a mapping list of zero-based channel indices, since the channel numbers in args.channels start at 1, and creates the queue q that carries audio blocks from the callback thread to the plotting code.
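
As a quick aside, this tiny sketch (with made-up numbers) shows what the fancy indexing in audio_callback produces: selecting rows with ::downsample and columns with mapping returns a new array, so the queued block stays valid even after the original buffer is overwritten:

import numpy as np

indata = np.arange(12).reshape(6, 2)   # pretend: 6 frames, 2 channels
mapping = [0]                          # user asked for channel 1 -> column index 0
downsample = 2                         # keep every 2nd frame

block = indata[::downsample, mapping]  # fancy indexing returns a copy
print(block.ravel())                   # [0 4 8]

indata[:] = -1                         # simulate the buffer being reused
print(block.ravel())                   # still [0 4 8] -- block is independent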

The remaining part of the script (inside the try block in the full listing) queries the device's default sample rate if args.samplerate is unset, allocates the plotdata buffer sized to the visible window, sets up the Matplotlib figure and lines, and finally opens an InputStream while FuncAnimation calls update_plot at the requested interval. The full code, which is based on an example from the sounddevice documentation [1], is listed below.

#!/usr/bin/env python3
"""Plot the live microphone signal(s) with matplotlib.

Matplotlib and NumPy have to be installed.

"""
import argparse
import queue
import sys

from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd


def int_or_str(text):
    """Helper function for argument parsing."""
    try:
        return int(text)
    except ValueError:
        return text


def audio_callback(indata, frames, time, status):
    """This is called (from a separate thread) for each audio block."""
    if status:
        print(status, file=sys.stderr)
    # Fancy indexing with mapping creates a (necessary!) copy:
    q.put(indata[::args.downsample, mapping])


def update_plot(frame):
    """This is called by matplotlib for each plot update.

    Typically, audio callbacks happen more frequently than plot updates,
    therefore the queue tends to contain multiple blocks of audio data.

    """
    global plotdata
    while True:
        try:
            data = q.get_nowait()
        except queue.Empty:
            break
        shift = len(data)
        plotdata = np.roll(plotdata, -shift, axis=0)
        plotdata[-shift:, :] = data
    for column, line in enumerate(lines):
        line.set_ydata(plotdata[:, column])
    return lines


if __name__ == "__main__":

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        '-l', '--list-devices', action='store_true',
        help='show list of audio devices and exit')
    args, remaining = parser.parse_known_args()
    if args.list_devices:
        print(sd.query_devices())
        parser.exit(0)
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[parser])
    parser.add_argument(
        'channels', type=int, default=[1], nargs='*', metavar='CHANNEL',
        help='input channels to plot (default: the first)')
    parser.add_argument(
        '-d', '--device', type=int_or_str,
        help='input device (numeric ID or substring)')
    parser.add_argument(
        '-w', '--window', type=float, default=200, metavar='DURATION',
        help='visible time slot (default: %(default)s ms)')
    parser.add_argument(
        '-i', '--interval', type=float, default=30,
        help='minimum time between plot updates (default: %(default)s ms)')
    parser.add_argument(
        '-b', '--blocksize', type=int, help='block size (in samples)')
    parser.add_argument(
        '-sr', '--samplerate', type=float, default=16000,
        help='sampling rate of audio device (default: %(default)s Hz)')
    parser.add_argument(
        '-n', '--downsample', type=int, default=1, metavar='N',
        help='display every Nth sample (default: %(default)s)')
    args = parser.parse_args(remaining)
    if any(c < 1 for c in args.channels):
        parser.error('argument CHANNEL: must be >= 1')
    mapping = [c - 1 for c in args.channels]  # Channel numbers start with 1
    q = queue.Queue()

    try:
        if args.samplerate is None:
            device_info = sd.query_devices(args.device, 'input')
            args.samplerate = device_info['default_samplerate']

        length = int(args.window * args.samplerate / (1000 * args.downsample))
        plotdata = np.zeros((length, len(args.channels)))

        fig, ax = plt.subplots()
        lines = ax.plot(plotdata)
        if len(args.channels) > 1:
            ax.legend([f'channel {c}' for c in args.channels],
                      loc='lower left', ncol=len(args.channels))
        ax.axis((0, len(plotdata), -1, 1))
        ax.set_yticks([0])
        ax.yaxis.grid(True)
        ax.tick_params(bottom=False, top=False, labelbottom=False,
                       right=False, left=False, labelleft=False)
        ax.text(0.01, 0.99,
                f'Sample rate: {args.samplerate / args.downsample} Hz',
                transform=ax.transAxes, va='top', ha='left')

        fig.tight_layout(pad=0)

        stream = sd.InputStream(
            device=args.device, channels=max(args.channels),
            samplerate=args.samplerate, callback=audio_callback)
        ani = FuncAnimation(fig, update_plot, interval=args.interval, blit=True)
        with stream:
            plt.show()
            
    except Exception as e:
        parser.exit(type(e).__name__ + ': ' + str(e))

Save it as sd_plot_input.py (or any other name) and run it with the following command. See the video above for sample output.
$ python3 sd_plot_input.py
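
A few more example invocations using the options defined above (the device index 1 is a placeholder; run with -l first to see your own device IDs):

$ python3 sd_plot_input.py -l              # list audio devices and exit
$ python3 sd_plot_input.py -d 1 -w 500     # device 1, 500 ms visible window
$ python3 sd_plot_input.py -sr 44100 1 2   # 44.1 kHz, plot channels 1 and 2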


Reference: 

  1. https://python-sounddevice.readthedocs.io/en/0.4.6/examples.html#plot-microphone-signal-s-in-real-time