Audio::FindChunks v0.03 available
From: Ilya Zakharevich (nospam-abuse_at_ilyaz.org)
Date: 04/29/04
Date: Thu, 29 Apr 2004 02:56:05 +0000 (UTC)
NAME
Audio::FindChunks - breaks audio files into sound/silence parts.
SYNOPSIS
use Audio::FindChunks;
# Duplicate input to output, caching RMS values to a file (as a side effect)
Audio::FindChunks->new(rms_filename => 'x.rms', filter => 1)->get('rms_data');
# Output human-readable info, using RMS cache file 'xxx.rms' if present:
Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
stem_strip_extension => 1)->output_blocks();
# Remove start/end silence (if longer than 0.2sec):
Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
min_actual_silence_sec => 1e100)->split_file();
# Split a multiple-sides tape recording
Audio::FindChunks->new(filename => 'xxx.mp3', min_actual_silence_sec => 11
)->split_file({verbose => 1});
DESCRIPTION
Audio sequence is broken into parts which contain only noise ("gaps"),
and parts with usable signal ("tracks").
The following configuration settings (and defaults) are supported:
# For getting PCM flow (and if averaging data is read from cache)
frequency => 44100, # If 'raw_pcm' or 'override_header_info' only
bytes_per_sample => 4, # likewise
channels => 2, # likewise
sizedata => MY_INF, # likewise (how many bytes of PCM to read)
out_fh => \*STDOUT, # mirror WAV/PCM to this FH if 'filter'
# Process non-WAV data:
preprocess => {mp3 => [[qw(lame --silent --decode)], [], ['-']]}, # Second contains extra args to read stdin
# RMS cache (used if 'valid_rms')
rms_extension => '.rms', # Appended to the 'filestem'
# Averaging to RMS info
sec_per_chunk => 0.1, # The window for taking mean square
# thresholds picking from the list of sorted 3-medians of RMS data
threshold_in_sorted_min_rel => 0, # relative position of 'threshold_min'
threshold_in_sorted_min_sec => 1, # shifted by this amount in the list
threshold_factor_min => 1, # the list elt is multiplied by this
threshold_in_sorted_max_rel => 0.5, # likewise
threshold_in_sorted_max_sec => 0, # likewise
threshold_factor_max => 1, # likewise
threshold_ratio => 0.15, # relative position between min/max
# Chunkification: smoothification
above_thres_window => 11, # in units of chunks
above_thres_window_rel => 0.25, # fractions of chunks above threshold
# in a window to make chunk signal
# Splitting into runs of signal/noise
max_tracks => 9999, # fail if more signal/noise runs
min_signal_sec => 5, # such runs of signal are forced
min_silence_sec => 2, # likewise
ignore_signal_sec => 1, # short runs of signal are ignored
min_silence_chunks_merge (see below) # and long resulting runs of silence
# are forced
# Calculate average signal in an interval "deeply inside" silence runs
local_level_ignore_pre_sec => 0.3, # offset the start of this interval
local_level_ignore_pre_rel => 0.02, # additional relative offset
local_level_ignore_post_sec => 0.3, # likewise for end of the interval
local_level_ignore_post_rel => 0.02, # likewise
# Enlargement of signal runs: attach consecutive chunks with signal this much
# above this average over the neighbouring silence run
local_threshold_factor => 1.05,
# Final enlargement of runs of signal
extend_track_end_sec => 0.5, # Unconditional enlargement
extend_track_begin_sec => 0.3, # likewise
min_boundary_silence_sec => 0.2, # Ignore short silence at start/end
Note that "above_thres_window" is the only value specified directly in
units of chunks; the other *_sec values may optionally be specified in
units of chunks by setting the corresponding *_chunks value. Note also
that this window should be decreased if the minimal allowed silence
length parameters are decreased.
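As a rough guide to how the two kinds of units relate, a duration in seconds corresponds to that duration divided by "sec_per_chunk" chunks. The sketch below assumes rounding to the nearest whole chunk, which may not match the rounding the module itself uses; the helper name sec_to_chunks is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a *_sec setting to a whole number of chunks.
# Rounding to the nearest chunk is an assumption.
sub sec_to_chunks {
    my ($sec, $sec_per_chunk) = @_;
    return int($sec / $sec_per_chunk + 0.5);
}

my $sec_per_chunk = 0.1;                       # the documented default
for my $setting ([min_signal_sec => 5], [min_silence_sec => 2],
                 [ignore_signal_sec => 1]) {
    my ($name, $sec) = @$setting;
    (my $chunks_name = $name) =~ s/_sec$/_chunks/;
    printf "%-18s %4g sec -> %-21s %d\n", $name, $sec, $chunks_name,
        sec_to_chunks($sec, $sec_per_chunk);
}
```

With the default "sec_per_chunk" of 0.1, the default "above_thres_window" of 11 chunks spans roughly 1.1 seconds.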
These values are mirrored from other values if not explicitly specified:
min_actual_silence_sec << min_silence_sec # Ignore short gaps
min_start_silence_sec << min_boundary_silence_sec # Same at start
min_end_silence_sec << min_boundary_silence_sec # Same at end
min_silence_chunks_merge << min_silence_chunks # See above
cache_rms_write <<< cache_rms # Boolean: write RMS cache
cache_rms_read <<< cache_rms # Boolean: read RMS cache (unless 'filter')
The following values default to "undef":
filename # if undef, read data from STDIN
stem_strip_extension # Boolean: 'filestem' has no extension
filter # If true, PCM data is mirrored to out_fh
rms_filename # Specify cache file explicitly
raw_pcm # The input has no WAV header
override_header_info # The user specified values override WAV header
cache_rms # Use cache file (see *_write, *_read above)
skip_medians # Boolean: do not calculate 3-medians
subchunk_size # Optimization of calculation of RMS; the
# best value depends on the processor cache
METHODS
"new(key1 => value1, key2 => value2, ....)"
The arguments form a hash of configuration parameters.
"set(key => value)"
set a configuration parameter.
"get(key)"
get a configuration parameter, or a value which may be calculated
from the configuration parameters.
"output_levels([key])"
prints a human-readable display of RMS (or similar) values. Defaults
to "rms_data"; additional possible values are "medians" and
"sorted".
"output_blocks([option_hashref], [key])"
prints a human-readable display of obtained audio chunks. "key"
defaults to "b"; additional possible values are "b0" to "b4".
The recognized option key is "format"; it defaults to "long", which
results in verbose output; the value "short" results in shorter output
and no preamble. Preamble lines are all "#"-commented; any output
line is in the form
START_SEC =END_SEC # COMMENT
With "short" format there is no preamble, and (currently) "COMMENT"
is of the form "PIECE_NUMBER len=PIECE_DURATION_SEC". These formats
are recognized, e.g., by MP3::Split::mp3split_read().
"split_file([options], [key])"
Splits the file (only MP3 via MP3::Splitter is supported now). The
meaning of options is the same as for MP3::Splitter. Defaults to
blocks of type "b"; additional possible values are "b0" to "b4".
@vals = get_rmsinfo(); set_rmsinfo(@vals)
Duplicate RMS info between two different "Audio::FindChunks"
objects. The exchanged info is the following:
chunks rms_data medians sorted channels min max
frequency bytes_per_sample sec_per_chunk bytes_per_chunk
set_rmsinfo() returns the object itself.
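For illustration, a line in the "START_SEC =END_SEC # COMMENT" form printed by output_blocks() can be picked apart with a few lines of plain Perl. This is only a sketch: the helper name parse_block_line is hypothetical, it assumes the times are plain decimal numbers, and it is not the code MP3::Split uses to read such files.

```perl
use strict;
use warnings;

# Parse one line of the form "START_SEC =END_SEC # COMMENT".
# Returns (start, end, comment); returns nothing for "#"-commented
# preamble lines, blank lines, and lines that do not match.
sub parse_block_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;
    my ($start, $end, $comment) =
        $line =~ /^\s*([\d.]+)\s+=([\d.]+)\s*(?:#\s*(.*?))?\s*$/
        or return;
    return ($start, $end, defined $comment ? $comment : '');
}

my @fields = parse_block_line("12.3 =45.6 # 1 len=33.3");
print "start=$fields[0] end=$fields[1] comment=$fields[2]\n";
```

The comment part is kept as an opaque string, since its contents are described above as subject to change.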
set() and get()
In and Out
The functionality of the module is modelled on the architecture of
Data::Flow: the two principal methods are "set(key => value)" and
"get(key)"; the module knows how to calculate keys based on the values of
other keys.
The results of calculation are cached; in particular, if one needs to
calculate some value for different values of a configuration parameter,
one should create a separate "Audio::FindChunks" object for each value, as in
my @info = Audio::FindChunks->new(filename => $f)->get_rmsinfo;
for my $ratio (0..100) {
  Audio::FindChunks->new(threshold_ratio => $ratio/100)
    ->set_rmsinfo(@info)->output_blocks();
}
The internally used format of intermediate data is designed for quick
shallow copying even for enormous audio files.
Dependencies
The current dependencies for values which are not explicitly set():
filestem <<< filename stem_strip_extension
input_type <<< filename
preprocess_a <<< input_type preprocess
preprocess_input <<< preprocess_a filename
fh AND close_fh <<< preprocess_input filename
fh_bin <<< fh
out_fh_bin <<< filter out_fh
rms_filename_default <<< filestem rms_extension
read_from_rms_file <<< filter cache_rms_read rms_filename
write_to_rms_file <<< cache_rms_write rms_filename
rms_filename_actual <<< rms_filename rms_filename_default
samples_per_chunk <<< sec_per_chunk frequency
bytes_per_chunk <<< samples_per_chunk bytes_per_sample
rms_data_arr_f <<< read_from_rms_file rms_filename_actual
samples_per_chunk
rms_data AND chunks <<< rms_data_arr_f OR A LOT OF OTHER PARAMETERS
medians <<< rms_data skip_medians chunks
sorted <<< medians chunks
threshold_in_sorted_* <<< chunks threshold_in_sorted_*_*
threshold_min/max <<< threshold_factor_* sorted threshold_in_sorted_min/max
threshold <<< threshold_min threshold_ratio threshold_max
above_thres <<< chunks rms_data threshold
above_thres_in_window <<< above_thres chunks above_thres_window
above_thres_window_abs <<< above_thres_window_rel above_thres_window
maybe_signal <<< above_thres_in_window chunks above_thres_window_abs
maybe_trk_pk <<< max_tracks maybe_signal chunks
b0 <<< maybe_trk_pk
b1 <<< b0 min_signal_chunks min_silence_chunks
b2 <<< b1 ignore_signal_chunks
b3 <<< b2 min_silence_chunks_merge
b4 <<< b3
b <<< b4 local_level_ignore_*
medians local_threshold_factor
extend_track_begin_chunks
extend_track_end_chunks
min_actual_silence_chunks
min_start_silence_chunks min_end_silence_chunks
If "rms_data" is not read from a cached source, a lot of other fields may
also be set from the WAV header (unless "raw_pcm").
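The "calculate on demand, then cache" behaviour behind such a dependency table can be sketched in miniature. The code below is not the module's implementation and does not use Data::Flow: the function get_value, the %rules table, and the formulas (plain products) are assumptions made only to illustrate two of the dependencies listed above.

```perl
use strict;
use warnings;

# Hypothetical rule table: key => [ [dependencies], code to combine them ].
# The products used here are assumed formulas, for illustration only.
my %rules = (
    samples_per_chunk => [[qw(sec_per_chunk frequency)],
                          sub { $_[0] * $_[1] }],
    bytes_per_chunk   => [[qw(samples_per_chunk bytes_per_sample)],
                          sub { $_[0] * $_[1] }],
);

# Return a value that was set explicitly or already calculated;
# otherwise calculate it from its dependencies and cache the result.
sub get_value {
    my ($cfg, $key) = @_;
    return $cfg->{$key} if exists $cfg->{$key};
    my $rule = $rules{$key} or die "Do not know how to calculate '$key'";
    my ($deps, $code) = @$rule;
    return $cfg->{$key} = $code->(map { get_value($cfg, $_) } @$deps);
}

my %cfg = (sec_per_chunk => 0.1, frequency => 44100, bytes_per_sample => 4);
print get_value(\%cfg, 'bytes_per_chunk'), "\n";
print join(' ', sort keys %cfg), "\n";    # intermediate keys are now cached
```

Because results are stored alongside the settings, changing a setting afterwards would not refresh values already calculated from it, which is why a fresh object per parameter value is the simpler route.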
Formats
Potentially large internally-cached values are stored as array
references to decrease the overhead of shallow copying.
The data which relates to the initial chunks (of size "sec_per_chunk")
is stored as length 1 arrays with packed (either by "l*" or "d*",
depending on the semantics) data; this keeps the memory footprint small
when working with huge audio files, and allows an easy implementation of
the most computationally intensive work in C.
The blocks of audio/signal/noise/silence are stored as Perl arrays; each
element is a reference to an array of length 3: type (-1 for silence, 0
for noise, 1 for signal, and 2 for audio), start in chunks, duration in
chunks.
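The two storage shapes can be illustrated with core Perl only. The sample values, the hash layout, and the variable names below are made up for the example; only the "d*" packing, the length 1 array, and the three-element block layout come from the description above.

```perl
use strict;
use warnings;

# Per-chunk data: a length-1 array holding one packed string ("d*" here).
my %info  = (rms_data => [ pack 'd*', 0.5, 12.25, 300, 7 ]);
my %clone = %info;                      # copies the reference, not the data
my @rms   = unpack 'd*', $clone{rms_data}[0];
print scalar(@rms), " chunks, last value $rms[-1]\n";

# Blocks: [type, start, duration], with positions counted in chunks.
my %type_name = (-1 => 'silence', 0 => 'noise', 1 => 'signal', 2 => 'audio');
my $sec_per_chunk = 0.1;
my @blocks = ([-1, 0, 30], [2, 30, 1800], [-1, 1830, 25]);
for my $blk (@blocks) {
    my ($type, $start, $len) = @$blk;
    printf "%-7s %7.1f .. %7.1f sec\n", $type_name{$type},
        $start * $sec_per_chunk, ($start + $len) * $sec_per_chunk;
}
```

Copying the hash copies one reference per key regardless of how many chunks the packed string holds.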
ALGORITHM
The algorithm for finding boundaries of parts follows closely the
algorithm used by GramoFile v1.7 (however, *this* version is *fully*
customizable, fully documented, and has some significant bugs fixed).
The keywords in the discussion below refer to customization parameters;
keywords of the form ">>>key" refer to "get()"able values.
Smooth the input
This is done in 2 distinct steps:
Break the input into chunks of equal duration (governed by
"sec_per_chunk"); find the acoustic energy of each channel per chunk
(no customization); energy is the quadratic average of signal level;
calculate maximal energy among channels per chunk (no customization;
">>>rms_data").
Trim "extremal" chunks by replacing the energy level of each chunk
by the median of it and its two neighbors (switched off if
"skip_medians"; ">>>medians").
Calculate the signal/noise threshold
based on the distribution (">>>sorted") of smoothed values.
Governed by "threshold_*" parameters. ">>>threshold_min",
">>>threshold_max", ">>>threshold".
Smooth it again
Separate into *signal* and *noise* chunks based on the number of
above-threshold chunks in a small window about the given chunk.
Governed by "above_thres_window", "above_thres_window_rel".
">>>maybe_signal", ">>>b0".
Find certain intervals of sound and silence
Long enough runs of signal chunks are proclaimed to carry sound;
likewise for noise chunks and silence. Governed by "max_tracks",
"min_signal_chunks", "min_silence_chunks". ">>>b1".
Long enough "unproclaimed" runs of chunks with only short bursts of
signal are proclaimed silence. Governed by "ignore_signal_chunks",
">>>b2"; and "min_silence_chunks_merge", ">>>b3".
Merge undecided into sound/silence
A run of chunks (signal or noise) "yet unproclaimed" to be sound or
silence is proclaimed sound if it is adjacent to a run of sound on
at least one side. The rest of unproclaimed runs are proclaimed
silence. No customization.
Runs of sound/silence are audio/gap candidates (no customization;
">>>b4").
Calculate average signal level in each gap candidate
ignoring short intervals near ends of gaps. Governed by
"local_level_*".
Allow for slow attack/decay or fade in/out
Extend runs of audio: join the consecutive runs of chunks of adjacent
gaps where the energy level remains significantly larger than the
average level in this gap. Additionally, unconditionally extend the
tracks by a small amount. Governed by "local_threshold_factor",
"extend_track_end_chunks", "extend_track_begin_chunks".
Long enough gap candidates are gaps
Gaps which became too short are considered audio and are merged into
neighbors. Governed by "min_actual_silence_chunks",
"min_start_silence_chunks", "min_end_silence_chunks"; ">>>b".
Functions implemented in C
long bool_find_runs(int *input, array_run_t *output, long cnt, long out_cnt)
void double_find_above(double *input, int *output, long cnt, double threshold)
void double_median3(double *rmsarray, double *medarray, long total_blocks)
void double_sort(double *input, double *output, long cnt)
void int_find_above(int *input, int *output, long cnt, int threshold)
void int_sum_window(int *input, int *output, long cnt, int window_size)
void le_short_sample_stats(char *buf, int stride, long samples, array_stats_t *stat)
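Several of these C helpers correspond to the steps described under ALGORITHM. As a rough illustration only, the pure-Perl sketch below mimics what such helpers might compute, judging by their names and by the step descriptions. The Perl function names, the treatment of the first and last chunks, the centred window, the strict comparisons, and the simplified rule for picking the threshold are all assumptions and not the module's actual code.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Median of each value and its two neighbours; end values are kept as-is.
sub median3 {
    my @v = @_;
    my @out = @v;
    for my $i (1 .. $#v - 1) {
        $out[$i] = (sort { $a <=> $b } @v[$i - 1 .. $i + 1])[1];
    }
    return @out;
}

# Flag (1/0) the values strictly above a threshold.
sub find_above { my ($thr, @v) = @_; return map { $_ > $thr ? 1 : 0 } @v }

# Sum of the values in a window centred on each position (clipped at ends).
sub sum_window {
    my ($window, @v) = @_;
    my $half = int($window / 2);
    return map {
        my $lo = $_ - $half < 0   ? 0   : $_ - $half;
        my $hi = $_ + $half > $#v ? $#v : $_ + $half;
        sum(@v[$lo .. $hi]);
    } 0 .. $#v;
}

# Collapse flags into runs: [flag, start_chunk, length_in_chunks].
sub find_runs {
    my @runs;
    for my $i (0 .. $#_) {
        if (@runs and $runs[-1][0] == $_[$i]) { $runs[-1][2]++ }
        else                                  { push @runs, [$_[$i], $i, 1] }
    }
    return @runs;
}

my @rms    = (1, 9, 1, 1, 8, 9, 7, 9, 8, 9, 8, 1);   # made-up chunk energies
my @med    = median3(@rms);
my @sorted = sort { $a <=> $b } @med;
# Assumed rule: 15% of the way from the lowest value to the middle value.
my ($min, $max) = ($sorted[0], $sorted[int(0.5 * @sorted)]);
my $thr    = $min + 0.15 * ($max - $min);
my ($window, $rel) = (3, 0.5);
my @signal = find_above($rel * $window,
                        sum_window($window, find_above($thr, @med)));
my @runs   = find_runs(@signal);
print join(' ', map { "[@$_]" } @runs), "\n";
```

In this made-up input the isolated spike in the second chunk is removed by the median step, so it does not start a run of signal.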
SEE ALSO
"Data::Flow", "MP3::Split"
AUTHOR
Ilya Zakharevich, <cpan@ilyaz.org>
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Ilya Zakharevich
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.8.2 or, at
your option, any later version of Perl 5 you may have available.