Image Processing Libraries: OpenCV, OpenMAX

OpenCV (Open Source Computer Vision) is a cross-platform library of programming functions aimed mainly at real-time computer vision and image processing. If Intel's Integrated Performance Primitives (IPP) are present on the system, the library uses these proprietary optimized routines to accelerate itself.

OpenVX (Open Vision Acceleration) is an open, royalty-free standard for cross-platform acceleration of computer vision applications. It is designed by the Khronos Group to enable portable, optimized, and power-efficient processing of vision algorithms. It is aimed at embedded and real-time programs in computer vision and related scenarios. Operations are represented as a connected graph.
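To make the idea of a connected graph of operations concrete, here is a conceptual sketch in plain Python. It is not the OpenVX API (which is a C API); the `Node` class, the `execute` function, and the toy 1-D "image" are all assumptions made up for illustration. It only shows the general pattern: operations are declared as nodes with explicit dependencies first, and the whole graph is evaluated afterwards.

```python
class Node:
    """One operation in the graph; `inputs` are the upstream nodes it consumes."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)


def execute(outputs):
    """Evaluate each node once, after the nodes it depends on; return results by name."""
    results = {}

    def visit(node):
        if node.name not in results:
            args = [visit(dep) for dep in node.inputs]
            results[node.name] = node.func(*args)
        return results[node.name]

    for node in outputs:
        visit(node)
    return results


# Hypothetical pipeline on a 1-D row of pixels: source -> blur -> threshold.
source = Node("source", lambda: [10, 200, 30, 220, 40])
blur = Node(
    "blur",
    # Average each pixel with its right neighbour (the last pixel is paired with itself).
    lambda px: [(a + b) // 2 for a, b in zip(px, px[1:] + px[-1:])],
    [source],
)
threshold = Node("threshold", lambda px: [255 if p > 100 else 0 for p in px], [blur])

results = execute([threshold])
print(results["blur"])       # [105, 115, 125, 130, 40]
print(results["threshold"])  # [255, 255, 255, 255, 0]
```

Because the full graph is known before anything runs, an implementation is free to reorder, fuse, or offload nodes. This toy version does none of that and simply walks the dependencies.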


OpenMAX Integration Layer (IL) is a standard API for accessing multimedia components on mobile platforms. It focuses mainly on audio, video, and still-image processing, and it is defined by the Khronos Group. Through the OpenMAX IL API, multimedia frameworks can access hardware accelerators on platforms that provide them. Bellagio is an open-source implementation of the OpenMAX IL API that runs on Linux. It is intended to show how the IL API is used and to let people start developing components. The Bellagio package provides the OpenMAX IL core shared library with a "reference" component.


OpenMAX DL (Development Layer) is a set of APIs containing a comprehensive collection of audio, video, and imaging functions that can be implemented and optimized on new CPUs, hardware engines, and DSPs, and then used for a wide range of accelerated codec functionality such as MPEG-4, H.264, MP3, AAC, and JPEG.


Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. It currently covers the following two feature sets, and more will come in the future: dense matrix and array manipulation, and sparse linear algebra. Eigenfaces, despite the similar name, is not part of Eigen: it is a popular face recognition technique based on principal component analysis rather than a library.

Festival (together with the Edinburgh Speech Tools library) is a C++ text-to-speech library.

Digital Audio Programming

Digital audio is the most commonly used method of representing sound inside a computer. In this method, sound is stored as a sequence of samples taken from the audio signal at constant time intervals. The length of this interval determines the sampling rate. Each sample represents the amplitude (volume) of the signal at the moment it was measured. The quality of a digital audio signal depends on its time resolution (the recording rate) and its voltage resolution (usually a linear integer representation whose basic unit is one bit). In uncompressed digital audio, each sample requires one or more bytes of storage. The number of bytes required depends on the number of channels (mono, stereo) and the sample format (8 or 16 bits, mu-Law, etc.). Commonly used sampling rates range from 8 kHz (telephone quality) to 48 kHz (DAT tapes).
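As a rough illustration of the sampling described above, the following sketch samples a sine tone at a constant interval, quantizes each sample to a signed 16-bit integer, and computes the storage that uncompressed audio needs. The function names (`sample_sine`, `storage_bytes`) and the 440 Hz test tone are assumptions chosen for illustration; they do not come from any audio library.

```python
import math
import struct


def sample_sine(freq_hz, sample_rate_hz, duration_s, bits=16):
    """Sample a sine tone at constant intervals and quantize to signed integers."""
    peak = 2 ** (bits - 1) - 1        # largest positive value, 32767 for 16 bits
    count = int(sample_rate_hz * duration_s)
    interval = 1.0 / sample_rate_hz   # time between two consecutive samples
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n * interval))
        for n in range(count)
    ]


def storage_bytes(sample_rate_hz, duration_s, channels, bytes_per_sample):
    """Size of uncompressed audio: one sample per channel per sampling interval."""
    return int(sample_rate_hz * duration_s) * channels * bytes_per_sample


# One second of a 440 Hz tone at telephone-quality 8 kHz, mono, 16-bit.
samples = sample_sine(freq_hz=440, sample_rate_hz=8000, duration_s=1.0)
raw = struct.pack("<%dh" % len(samples), *samples)  # little-endian 16-bit values

print(len(samples), len(raw))  # 8000 16000
print(storage_bytes(48000, 1.0, channels=2, bytes_per_sample=2))  # 192000
```

One second of 16-bit stereo audio at 48 kHz therefore takes 192,000 bytes before any compression, which is why the channel count and sample format matter as much as the sampling rate.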

Digital audio can be stored in a wide range of formats. Generally speaking, audio comes in two flavors: compressed and uncompressed. Compressed audio can be further subdivided by kind of compression: lossless, which preserves the original content exactly, and lossy, which achieves more compression at the expense of degrading the audio. Uncompressed PCM audio, on the other hand, is defined by two parameters: the sample rate and the bit-depth. Loosely speaking, the sample rate limits the maximum frequency that can be represented by the format, and the bit-depth determines the maximum dynamic range that can be represented by the format. You can think of bit-depth as determining how much noise there is compared to signal.
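The two PCM parameters can be turned into numbers with a short sketch. It assumes the usual textbook approximations: the highest representable frequency is half the sample rate (the Nyquist limit), and the dynamic range of linear integer PCM is about 20·log10(2^bits) dB. The helper names `nyquist_hz` and `dynamic_range_db` are made up for this example, and real converters will not reach the theoretical figures exactly.

```python
import math


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate (half the rate)."""
    return sample_rate_hz / 2


def dynamic_range_db(bit_depth):
    """Approximate dynamic range of linear integer PCM: 20*log10(2**bits) dB."""
    return 20 * math.log10(2 ** bit_depth)


for rate, bits in [(8000, 8), (44100, 16), (48000, 16)]:
    print(f"{rate} Hz, {bits}-bit: up to {nyquist_hz(rate):.0f} Hz, "
          f"about {dynamic_range_db(bits):.1f} dB")
```

Each extra bit adds roughly 6 dB of dynamic range, so going from 8-bit to 16-bit samples about doubles the figure in decibels.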


In the Linux kernel, two uniform sound APIs have historically been used: OSS (Open Sound System) and ALSA (Advanced Linux Sound Architecture). ALSA is available for Linux only, and because there is only one implementation of the ALSA interface, "ALSA" refers equally to that implementation and to the interface itself.


Android native code uses OpenSL ES (Open Sound Library for Embedded Systems) for audio processing.

ref:

The ABCs of PCM (Uncompressed) digital audio - http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html


WAVE PCM soundfile format - https://ccrma.stanford.edu/courses/422/projects/WaveFormat/


An introduction to Linux sound systems and APIs - http://archive09.linux.com/articles/113775


A Tutorial on Using the ALSA Audio API - http://equalarea.com/paul/alsa-audio.html


How to write ALSA driver - http://c-qs.blogspot.com/2014/05/writing-of-alsa-driver.html


alsa vs tinyalsa - http://blog.csdn.net/myzhzygh/article/details/8468210


Open Sound System (OSS) -
  1. http://en.wikipedia.org/wiki/Open_Sound_System
  2. https://wiki.archlinux.org/index.php/Open_Sound_System
  3. http://www.4front-tech.com/pguide/audio.html
Advanced Linux Sound Architecture (ALSA) -
  1. ALSA PCM - http://www.alsa-project.org/alsa-doc/alsa-lib/pcm.html
  2. ALSA - alsa-lib, libasound.so - http://www.alsa-project.org/alsa-doc/alsa-lib/pcm.html
  3. ALSA PulseAudio - http://freedesktop.org/software/pulseaudio/doxygen/simple.html
  4. PulseAudio (libpulse.so) - http://freedesktop.org/software/pulseaudio/doxygen/simple.html
  5. Linux sound drivers -  http://www.tldp.org/HOWTO/Sound-HOWTO/
Introduction to Audio programming -
  1. How Audio Data is Represented - http://blogs.msdn.com/b/dawate/archive/2009/06/22/intro-to-audio-programming-part-1-how-audio-data-is-represented.aspx
  2. Demystifying the WAV Format - http://blogs.msdn.com/b/dawate/archive/2009/06/23/intro-to-audio-programming-part-2-demystifying-the-wav-format.aspx
  3. An Introduction to Audio Processing Objects - http://msdn.microsoft.com/en-us/magazine/dn201755.aspx
  4. Synthesizing Simple Wave Audio using C# - http://blogs.msdn.com/b/dawate/archive/2009/06/24/intro-to-audio-programming-part-3-synthesizing-simple-wave-audio-using-c.aspx
  5. Algorithms for Different Sound Waves in C# - http://blogs.msdn.com/b/dawate/archive/2009/06/25/intro-to-audio-programming-part-4-algorithms-for-different-sound-waves-in-c.aspx