Audio::FindChunks v0.03 available

From: Ilya Zakharevich (nospam-abuse_at_ilyaz.org)
Date: 04/29/04

    Date: Thu, 29 Apr 2004 02:56:05 +0000 (UTC)
    
    

    NAME
        Audio::FindChunks - breaks audio files into sound/silence parts.

    SYNOPSIS
          use Audio::FindChunks;

          # Duplicate input to output, caching RMS values to a file (as a side effect)
          Audio::FindChunks->new(rms_filename => 'x.rms', filter => 1)->get('rms_data');

          # Output human-readable info, using RMS cache file 'xxx.rms' if present:
          Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
                                 stem_strip_extension => 1)->output_blocks();

          # Remove start/end silence (if longer than 0.2sec):
          Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
                                 min_actual_silence_sec => 1e100)->split_file();

          # Split a multi-sided tape recording
          Audio::FindChunks->new(filename => 'xxx.mp3', min_actual_silence_sec => 11
                                )->split_file({verbose => 1});

    DESCRIPTION
        Audio sequence is broken into parts which contain only noise ("gaps"),
        and parts with usable signal ("tracks").

        The following configuration settings (and defaults) are supported:

          # For getting PCM flow (and if averaging data is read from cache)
            frequency => 44100, # If 'raw_pcm' or 'override_header_info' only
            bytes_per_sample => 4, # likewise
            channels => 2, # likewise
            sizedata => MY_INF, # likewise (how many bytes of PCM to read)
            out_fh => \*STDOUT, # mirror WAV/PCM to this FH if 'filter'
          # Process non-WAV data:
            preprocess => {mp3 => [[qw(lame --silent --decode)], [], ['-']]}, # The second list holds extra args for reading from stdin
          # RMS cache (used if 'valid_rms')
            rms_extension => '.rms', # Appended to the 'filestem'
          # Averaging to RMS info
            sec_per_chunk => 0.1, # The window for taking mean square
          # Picking thresholds from the list of sorted 3-medians of RMS data
            threshold_in_sorted_min_rel => 0, # relative position of 'threshold_min'
            threshold_in_sorted_min_sec => 1, # shifted by this amount in the list
            threshold_factor_min => 1, # the list elt is multiplied by this
            threshold_in_sorted_max_rel => 0.5, # likewise
            threshold_in_sorted_max_sec => 0, # likewise
            threshold_factor_max => 1, # likewise
            threshold_ratio => 0.15, # relative position between min/max
          # Chunkification: smoothification
            above_thres_window => 11, # in units of chunks
            above_thres_window_rel => 0.25, # fraction of chunks in a window that must be
                                                 # above threshold to make the chunk signal
          # Splitting into runs of signal/noise
            max_tracks => 9999, # fail if more signal/noise runs
            min_signal_sec => 5, # such runs of signal are forced
            min_silence_sec => 2, # likewise
            ignore_signal_sec => 1, # short runs of signal are ignored
            min_silence_chunks_merge (see below) # and long resulting runs of silence
                                                 # are forced
          # Calculate average signal in an interval "deeply inside" silence runs
            local_level_ignore_pre_sec => 0.3, # offset the start of this interval
            local_level_ignore_pre_rel => 0.02, # additional relative offset
            local_level_ignore_post_sec => 0.3, # likewise for end of the interval
            local_level_ignore_post_rel => 0.02, # likewise
          # Enlargement of signal runs: attach consecutive chunks whose signal exceeds,
          # by this factor, the average over the neighbouring silence run
            local_threshold_factor => 1.05,
          # Final enlargement of runs of signal
            extend_track_end_sec => 0.5, # Unconditional enlargement
            extend_track_begin_sec => 0.3, # likewise
            min_boundary_silence_sec => 0.2, # Ignore short silence at start/end

        Note that "above_thres_window" is the only value specified directly in
        units of chunks; the other *_sec values may optionally be specified in
        units of chunks by setting the corresponding *_chunks value. Note also
        that this window should be decreased if the minimal allowed silence
        length parameters are decreased.

        These values are mirrored from other values if not explicitly specified:

         min_actual_silence_sec << min_silence_sec # Ignore short gaps
         min_start_silence_sec << min_boundary_silence_sec # Same at start
         min_end_silence_sec << min_boundary_silence_sec # Same at end
         min_silence_chunks_merge << min_silence_chunks # See above

         cache_rms_write <<< cache_rms # Boolean: write RMS cache
         cache_rms_read <<< cache_rms # Boolean: read RMS cache (unless 'filter')

        The following values default to "undef":

            filename # if undef, read data from STDIN
            stem_strip_extension # Boolean: 'filestem' has no extension
            filter # If true, PCM data is mirrored to out_fh
            rms_filename # Specify cache file explicitly
            raw_pcm # The input has no WAV header
            override_header_info # The user specified values override WAV header
            cache_rms # Use cache file (see *_write, *_read above)
            skip_medians # Boolean: do not calculate 3-medians
            subchunk_size # Optimization of calculation of RMS; the
                                        # best value depends on the processor cache

    METHODS
        "new(key1 => value1, key2 => value2, ....)"
            The arguments form a hash of configuration parameters.

        "set(key => value)"
            set a configuration parameter.

        "get(key)"
            get a configuration parameter or a value which may be calculated
            from the configuration parameters.

        "output_levels([key])"
            prints a human-readable display of RMS (or similar) values. Defaults
            to "rms_data"; additional possible values are "medians" and
            "sorted".

        "output_blocks([option_hashref], [key])"
            prints a human-readable display of obtained audio chunks. "key"
            defaults to "b"; additional possible values are "b0" to "b4".
            The recognized option key is "format". It defaults to "long", which
            results in verbose output; the value "short" results in shorter
            output and no preamble. Preamble lines are all "#"-commented; any
            output line has the form

              START_SEC =END_SEC # COMMENT

            With "short" format there is no preamble, and (currently) "COMMENT"
            is of the form "PIECE_NUMBER len=PIECE_DURATION_SEC". These formats
            are recognized, e.g., by MP3::Splitter::mp3split_read().
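
            As a rough illustration of consuming this output, here is a
            minimal Perl sketch that parses lines of the form shown above.
            The helper name parse_blocks() and the sample lines are made up
            for this example; they are not part of Audio::FindChunks, and the
            exact number formatting of the real output is an assumption.

```perl
use strict;
use warnings;

# Hypothetical parser for lines of the form "START_SEC =END_SEC # COMMENT".
# Lines that start with '#' (the preamble) and blank lines are skipped.
sub parse_blocks {
    my @pieces;
    for my $line (@_) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;
        my ($start, $end, $comment) =
            $line =~ /^\s*([\d.]+)\s+=([\d.]+)\s*(?:#\s*(.*))?$/
            or die "Unrecognized line: $line";
        push @pieces, { start => $start, end => $end, comment => $comment // '' };
    }
    return @pieces;
}

# Invented sample input, not output captured from the module.
my @pieces = parse_blocks(
    "# preamble line",
    "0.3 =187.5 # 1 len=187.2",
    "195.1 =402 # 2 len=206.9",
);
printf "%d: %.1f sec\n", $_ + 1, $pieces[$_]{end} - $pieces[$_]{start}
    for 0 .. $#pieces;
```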

        "split_file([options], [key])"
            Splits the file (only MP3 via MP3::Splitter is supported now). The
            meaning of options is the same as for MP3::Splitter. Defaults to
            blocks of type "b"; additional possible values are "b0" to "b4".

        @vals = get_rmsinfo(); set_rmsinfo(@vals)
            Duplicate RMS info between two different "Audio::FindChunks"
            objects. The exchanged info is the following:

                chunks rms_data medians sorted channels min max
                frequency bytes_per_sample sec_per_chunk bytes_per_chunk

            set_rmsinfo() returns the object itself.

    set() and get()
      In and Out
        The functionality of the module is modelled on the architecture of
        Data::Flow: the two principal methods are "set(key => value)" and
        "get(key)"; the module knows how to calculate keys from the values of
        other keys.

        The results of calculation are cached; in particular, if one needs to
        calculate some value for different values of a configuration parameter,
        one should create a separate "Audio::FindChunks" object for each, as in

          my @info = Audio::FindChunks->new(filename => $f)->get_rmsinfo;
          for my $ratio (0..100) {
            Audio::FindChunks->new(threshold_ratio => $ratio/100)
                ->set_rmsinfo(@info)->output_blocks();
          }

        The internally used format of intermediate data is designed for quick
        shallow copying even for enormous audio files.

      Dependencies
        The current dependencies for values which are not explicitly set():

          filestem <<< filename stem_strip_extension
          input_type <<< filename
          preprocess_a <<< input_type preprocess
          preprocess_input <<< preprocess_a filename
          fh AND close_fh <<< preprocess_input filename
          fh_bin <<< fh
          out_fh_bin <<< filter out_fh
          rms_filename_default <<< filestem rms_extension
          read_from_rms_file <<< filter cache_rms_read rms_filename
          write_to_rms_file <<< cache_rms_write rms_filename
          rms_filename_actual <<< rms_filename rms_filename_default
          samples_per_chunk <<< sec_per_chunk frequency
          bytes_per_chunk <<< samples_per_chunk bytes_per_sample
          rms_data_arr_f <<< read_from_rms_file rms_filename_actual
                                        samples_per_chunk
          rms_data AND chunks <<< rms_data_arr_f OR A LOT OF OTHER PARAMETERS
          medians <<< rms_data skip_medians chunks
          sorted <<< medians chunks
          threshold_in_sorted_* <<< chunks threshold_in_sorted_*_*
          threshold_min/max <<< threshold_factor_* sorted threshold_in_sorted_min/max
          threshold <<< threshold_min threshold_ratio threshold_max
          above_thres <<< chunks rms_data threshold
          above_thres_in_window <<< above_thres chunks above_thres_window
          above_thres_window_abs <<< above_thres_window_rel above_thres_window
          maybe_signal <<< above_thres_in_window chunks above_thres_window_abs
          maybe_trk_pk <<< max_tracks maybe_signal chunks
          b0 <<< maybe_trk_pk
          b1 <<< b0 min_signal_chunks min_silence_chunks
          b2 <<< b1 ignore_signal_chunks
          b3 <<< b2 min_silence_chunks_merge
          b4 <<< b3
          b <<< b4 local_level_ignore_*
                                        medians local_threshold_factor
                                        extend_track_begin_chunks
                                        extend_track_end_chunks
                                        min_actual_silence_chunks
                                        min_start_silence_chunks min_end_silence_chunks

        If "rms_data" is not read from cached source, a lot of other fields may
        be also set from the WAV header (unless "raw_pcm").

      Formats
        Potentially large internally-cached values are stored as array
        references to decrease the overhead of shallow copying.

        The data which relates to the initial chunks (of size "sec_per_chunk")
        is stored as length 1 arrays with packed (either by "l*" or "d*",
        depending on the semantics) data; this keeps the memory footprint small
        when working with huge audio files, and allows an easy implementation
        of the most computationally intensive work in C.

        The blocks of audio/signal/noise/silence are stored as Perl arrays; each
        element is a reference to an array of length 3: type (-1 for silence, 0
        for noise, 1 for signal, and 2 for audio), start chunk, and duration in
        chunks.
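
        The two layouts can be pictured with a small Perl sketch. The variable
        names and the sample numbers are invented for illustration; only the
        "d*" packing, the length 1 array, and the three-element block layout
        come from the description above.

```perl
use strict;
use warnings;

# Per-chunk values packed into one string, held in a length-1 array so that
# copying the reference stays cheap however long the recording is.
my @rms    = (0.5, 12.25, 3);
my $packed = [ pack 'd*', @rms ];

my @back  = unpack 'd*', $packed->[0];   # the same three numbers again
my $bytes = length $packed->[0];         # total size of the packed doubles

# Blocks as [type, start_chunk, duration_chunks].
my %name   = (-1 => 'silence', 0 => 'noise', 1 => 'signal', 2 => 'audio');
my @blocks = ([-1, 0, 25], [2, 25, 1800], [-1, 1825, 40]);   # made-up data
my $sec_per_chunk = 0.1;
for my $blk (@blocks) {
    my ($type, $start, $len) = @$blk;
    printf "%-7s %7.1f =%7.1f\n", $name{$type},
        $start * $sec_per_chunk, ($start + $len) * $sec_per_chunk;
}
```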

    ALGORITHM
        The algorithm for finding boundaries of parts follows closely the
        algorithm used by GramoFile v1.7 (however, *this* version is *fully*
        customizable, fully documented, and has some significant bugs fixed).
        The keywords in the discussion below refer to customization parameters;
        keywords of the form ">>>key" refer to "get()"able values.

        Smooth the input
            This is done in 2 distinct steps:

            Break the input into chunks of equal duration (governed by
            "sec_per_chunk"); find the acoustic energy of each channel per chunk
            (no customization); energy is the quadratic average of signal level;
            calculate maximal energy among channels per chunk (no customization;
            ">>>rms_data").

            Trim "extremal" chunks by replacing the energy level of each chunk
            by the median of it and its two neighbors (switched off if
            "skip_medians"; ">>>medians").

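            The two smoothing steps above can be sketched in plain Perl. This
            is an illustration only, not the module's C implementation: the
            helper names rms(), chunk_levels() and median3() are hypothetical,
            and leaving the first and last chunk unchanged in the median step
            is an assumption, since the text does not say how the ends are
            handled.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical helper: RMS ("quadratic average") of a list of sample values.
sub rms { my @s = @_; return sqrt(sum(map { $_ * $_ } @s) / @s) }

# Hypothetical helper: per-chunk level = max over channels of per-channel RMS.
# $samples is an array ref of frames, each frame an array ref with one value
# per channel; $per_chunk is the number of frames per chunk.
sub chunk_levels {
    my ($samples, $per_chunk) = @_;
    my @levels;
    for (my $i = 0; $i + $per_chunk <= @$samples; $i += $per_chunk) {
        my @frames   = @{$samples}[$i .. $i + $per_chunk - 1];
        my $channels = @{ $frames[0] };
        push @levels, max(map { my $c = $_; rms(map { $_->[$c] } @frames) }
                              0 .. $channels - 1);
    }
    return @levels;
}

# Median of each chunk and its two neighbours; the end chunks are kept as-is
# (an assumption, see above).
sub median3 {
    my @in  = @_;
    my @out = @in;
    for my $i (1 .. $#in - 1) {
        $out[$i] = (sort { $a <=> $b } @in[$i - 1 .. $i + 1])[1];
    }
    return @out;
}

my @frames = map { [$_, -$_] } (3, 4, 3, 4, 0, 0, 0, 0);   # 2 channels
my @levels = chunk_levels(\@frames, 4);
my @smooth = median3(1, 9, 2, 2, 0, 2);   # the lone spike 9 is trimmed
print "levels:  @levels\n";
print "medians: @smooth\n";
```
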
        Calculate the signal/noise threshold
            based on the distribution (">>>sorted") of smoothed values.
            Governed by "threshold_*" parameters. ">>>threshold_min",
            ">>>threshold_max", ">>>threshold".

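            One plausible reading of the "threshold_*" settings, as a Perl
            sketch: pick two elements of the sorted list, scale them, and
            interpolate between them. The helper pick_sorted(), the index
            arithmetic, the clamping, and the linear interpolation are all
            assumptions made for this example; the module's exact formula is
            not given in the text above. The chunk length of 0.5 sec is chosen
            only to keep the sample list short.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one element of the sorted list.
# $rel is a relative position (0 = smallest, 1 = largest), $shift a further
# offset in chunks; the index is clamped to the valid range.
sub pick_sorted {
    my ($sorted, $rel, $shift, $factor) = @_;
    my $i = int($rel * $#$sorted + $shift);
    $i = 0         if $i < 0;
    $i = $#$sorted if $i > $#$sorted;
    return $factor * $sorted->[$i];
}

my @medians = (5, 1, 40, 2, 38, 1, 42, 3, 2, 41, 39);   # made-up levels
my @sorted  = sort { $a <=> $b } @medians;

my $sec_per_chunk = 0.5;   # not the default; see the note above
# Settings as in the defaults table: min at position 0 shifted by 1 sec,
# max at the middle of the list, both factors 1, ratio 0.15.
my $min = pick_sorted(\@sorted, 0,   1 / $sec_per_chunk, 1);
my $max = pick_sorted(\@sorted, 0.5, 0 / $sec_per_chunk, 1);
my $threshold = $min + 0.15 * ($max - $min);
print "min=$min max=$max threshold=$threshold\n";
```
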
        Smooth it again
            Separate into *signal* and *noise* chunks based on the number of
            above-threshold chunks in a small window about the given chunk.
            Governed by "above_thres_window", "above_thres_window_rel".
            ">>>maybe_signal", ">>>b0".

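            A minimal Perl sketch of this windowed vote, assuming the window is
            centred on the chunk and simply truncated at the ends of the
            recording (neither detail is stated above). The helper name
            maybe_signal() mirrors the ">>>maybe_signal" key but is otherwise
            hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: mark chunk $i as "maybe signal" when at least
# $rel * $window chunks in the window centred on $i are above $threshold.
sub maybe_signal {
    my ($levels, $threshold, $window, $rel) = @_;
    my @above = map { $_ > $threshold ? 1 : 0 } @$levels;
    my $half  = int($window / 2);
    my @out;
    for my $i (0 .. $#above) {
        my $lo = $i - $half < 0       ? 0       : $i - $half;
        my $hi = $i + $half > $#above ? $#above : $i + $half;
        my $count = 0;
        $count += $_ for @above[$lo .. $hi];
        push @out, $count >= $rel * $window ? 1 : 0;
    }
    return @out;
}

# Made-up levels: one isolated spike, then a sustained burst.
my @levels = (0, 0, 9, 0, 0, 0, 9, 9, 9, 9, 0, 0);
my @signal = maybe_signal(\@levels, 1, 3, 0.5);   # window of 3 chunks
print "@signal\n";
```

            With these inputs the isolated spike is voted down while the
            sustained burst survives, which is the point of the second
            smoothing pass.
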
        Find certain intervals of sound and silence
            Long enough runs of signal chunks are proclaimed carrying sound;
            likewise for noise chunks and silence. Governed by "max_tracks",
            "min_signal_chunks", "min_silence_chunks". ">>>b1".

            Long enough "unproclaimed" runs of chunks with only short bursts of
            signal are proclaimed silence. Governed by "ignore_signal_chunks",
            ">>>b2"; and "min_silence_chunks_merge", ">>>b3".

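            The first of these steps (">>>b1") can be sketched in Perl as a
            run-length encoding followed by a length test. The helper names
            find_runs() and proclaim() and the string labels are hypothetical;
            the module itself uses numeric types and does this work in C. The
            handling of short bursts (">>>b2", ">>>b3") is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper: run-length encode a 0/1 list into [value, start, length].
sub find_runs {
    my @flags = @_;
    my @runs;
    for my $i (0 .. $#flags) {
        if (@runs && $runs[-1][0] == $flags[$i]) { $runs[-1][2]++ }
        else                                     { push @runs, [$flags[$i], $i, 1] }
    }
    return @runs;
}

# Label long runs: 'sound' for long signal runs, 'silence' for long noise
# runs, 'undecided' otherwise. The string labels are for illustration only.
sub proclaim {
    my ($runs, $min_signal, $min_silence) = @_;
    return map {
        my ($val, $start, $len) = @$_;
        my $label = $val  && $len >= $min_signal  ? 'sound'
                  : !$val && $len >= $min_silence ? 'silence'
                  :                                 'undecided';
        [$label, $start, $len];
    } @$runs;
}

my @runs = find_runs(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0);
print join(' ', map { "$_->[0]\@$_->[1]+$_->[2]" } proclaim(\@runs, 4, 3)), "\n";
```
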
        Merge undecided into sound/silence
            A run of chunks (signal or noise) "yet unproclaimed" to be sound or
            silence is proclaimed sound if it is adjacent to a run of sound on
            at least one side. The rest of unproclaimed runs are proclaimed
            silence. No customization.
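
            A Perl sketch of this rule, under one assumption the text leaves
            open: adjacency is judged against the labels as they were before
            this step, so "sound" does not propagate through a chain of
            undecided runs. The helper name resolve_undecided() and the string
            labels are hypothetical.

```perl
use strict;
use warnings;

# Runs are [label, start, length] with label 'sound', 'silence' or
# 'undecided' (string labels are an illustration, not the module's encoding).
sub resolve_undecided {
    my @runs  = map { [@$_] } @_;          # copy, leave the input untouched
    my @label = map { $_->[0] } @runs;     # labels before resolution
    for my $i (0 .. $#runs) {
        next unless $label[$i] eq 'undecided';
        my $near_sound = ($i > 0      && $label[$i - 1] eq 'sound')
                      || ($i < $#runs && $label[$i + 1] eq 'sound');
        $runs[$i][0] = $near_sound ? 'sound' : 'silence';
    }
    # Merge neighbouring runs that now carry the same label.
    my @merged;
    for my $r (@runs) {
        if (@merged && $merged[-1][0] eq $r->[0]) { $merged[-1][2] += $r->[2] }
        else                                      { push @merged, $r }
    }
    return @merged;
}

my @resolved = resolve_undecided(
    ['silence', 0, 3], ['sound', 3, 5], ['undecided', 8, 1],
    ['undecided', 9, 1], ['silence', 10, 3],
);
print join(' ', map { "$_->[0]\@$_->[1]+$_->[2]" } @resolved), "\n";
```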

            Runs of sound/silence are audio/gap candidates (no customization;
            ">>>b4").

        Calculate average signal level in each gap candidate
            ignoring short intervals near ends of gaps. Governed by
            "local_level_*".

        Allow for slow attack/decay or fade in/out
            Extend runs of audio: attach the consecutive chunks of an adjacent
            gap in which the energy level remains significantly larger than the
            average level in this gap. Additionally, unconditionally extend the
            tracks by a small amount. Governed by "local_threshold_factor",
            "extend_track_end_chunks", "extend_track_begin_chunks".

        Long enough gap candidates are gaps
            Gaps which became too short are considered audio and are merged into
            neighbors. Governed by "min_actual_silence_chunks",
            "min_start_silence_chunks", "min_end_silence_chunks"; ">>>b".

      Functions implemented in C
          long bool_find_runs(int *input, array_run_t *output, long cnt, long out_cnt)
          void double_find_above(double *input, int *output, long cnt, double threshold)
          void double_median3(double *rmsarray, double *medarray, long total_blocks)
          void double_sort(double *input, double *output, long cnt)
          void int_find_above(int *input, int *output, long cnt, int threshold)
          void int_sum_window(int *input, int *output, long cnt, int window_size)
          void le_short_sample_stats(char *buf, int stride, long samples, array_stats_t *stat)

    SEE ALSO
        "Data::Flow", "MP3::Splitter"

    AUTHOR
        Ilya Zakharevich, <cpan@ilyaz.org>

    COPYRIGHT AND LICENSE
        Copyright (C) 2004 by Ilya Zakharevich

        This library is free software; you can redistribute it and/or modify it
        under the same terms as Perl itself, either Perl version 5.8.2 or, at
        your option, any later version of Perl 5 you may have available.

