Big Data

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data.

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.


MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
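The student example above can be sketched in plain Python. This is a single-process illustration of the programming model only, not Hadoop code: the function names map_phase, shuffle and reduce_phase and the sample names are made up for the illustration, and a real framework would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(students):
    """Map: emit one (first_name, 1) pair per student record."""
    return [(name.split()[0], 1) for name in students]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarize each group, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


students = ["Ann Lee", "Bob Ray", "Ann Kim"]  # hypothetical sample data
name_counts = reduce_phase(shuffle(map_phase(students)))
print(name_counts)  # {'Ann': 2, 'Bob': 1}
```

The shuffle step is what lets each reduce call see all values for one key, which is why reduce always runs after map.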


Linux Kernel Development (miscellaneous)

A kernel module (.ko) is a bit of compiled code that can be inserted into the kernel at run-time, such as with insmod or modprobe.

A driver is a bit of code that runs in the kernel to talk to some hardware device. It "drives" the hardware. Nearly every bit of hardware in your computer has an associated driver. A large part of a running kernel is driver code; the rest of the code provides generic services like memory management, IPC, scheduling, etc.

A driver may be built statically into the kernel file on disk. (The one in /boot, loaded into RAM at boot time by the boot loader early in the boot process.) A driver may also be built as a kernel module so that it can be dynamically loaded later. (And then maybe unloaded.)


Standard practice is to build drivers as kernel modules where possible, rather than link them statically to the kernel, since that gives more flexibility. There are good reasons not to, however:

  • Sometimes a given driver is absolutely necessary to help the system boot up. That doesn't happen as often as you might imagine, due to the initrd feature.
  • Statically built drivers may be exactly what you want in a system that is statically scoped, such as an embedded system. That is to say, if you know in advance exactly which drivers will always be needed and that this will never change, you have a good reason not to bother with dynamic kernel modules.

Not all kernel modules are drivers. For example, a relatively recent feature in the Linux kernel is that you can load a different process scheduler.

Conversely, not every piece of hardware has a driver. The CPU chip is one exception, since it has no "driver" per se. Your computer may also contain hardware for which you have no driver.

Linux header packages contain only the header part of the kernel source tree (and not all of that, only the "exported" headers), plus some of the build infrastructure. They do not contain C source code (except for some stubs and build infrastructure code). The whole point of this type of package is to save space (and bandwidth): the whole Linux kernel source tree is rather large, and completely unnecessary if you don't intend to compile the kernel yourself. Distributions build and ship header packages to provide just what is necessary to build modules, and no more. (They certainly do not contain the compiled kernel.)


Linux kernel binaries are usually installed in the /boot directory, along with bootloader binaries and configuration files. (This is sometimes an independent filesystem, not mounted by default.) The exact names of the files depend on the kernel and distribution. (So does the bootloader.)


Installed Linux kernel modules reside in subdirectories of /lib/modules/$(uname -r)/
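As a rough illustration of that layout, the sketch below walks a module directory and collects the module file names. The helper name list_modules is made up for this example, and the check for compressed modules (such as .ko.xz) is an assumption about how some distributions package them.

```python
import os
import platform


def list_modules(root):
    """Return the sorted names of kernel module files found under root."""
    found = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Assumption: modules may be compressed, e.g. foo.ko.xz.
            if filename.endswith(".ko") or ".ko." in filename:
                found.append(filename)
    return sorted(found)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    module_root = os.path.join("/lib/modules", platform.release())
    for name in list_modules(module_root):
        print(name)
```

On a system without that directory the walk simply finds nothing and the script prints no names.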

Full kernel source code: /usr/src/linux is a traditional place to put kernel sources, but nothing prevents you from putting them elsewhere. This path is also often just a symbolic link to a directory; the symlink is there to simplify building applications that depend on the kernel source. Kconfig files describe the kernel configuration options (and their dependencies) that are available for a given directory or module. Apart from that, the tree is mostly C source code, header files and Makefiles, with a few helper scripts here and there, and some assembly source too.

Linux device driver commands:
  1. Find the Linux kernel version from the kernel source code =>
    1. make kernelversion
    2. Check the top-level Makefile contents
  2. Miscellaneous commands =>
    1. List all modules available on a given Linux system => ls -R /lib/modules/$(uname -r)
    2. Display module information => modinfo /path/to/module.ko
    3. Insert a module into a running kernel (NOTE: this command takes the path to the module file and does not resolve module dependencies) => insmod /path/to/module.ko
    4. Insert a module into a running kernel, including its dependencies => modprobe kernel-module-name
    5. Rebuild the module dependency database, /lib/modules/$(uname -r)/modules.dep => depmod -a
    6. Force a module to load even if it was built for a different kernel version => modprobe --force kernel-module-name
    7. Display the insmod commands needed to load a module and its dependencies without running them (useful when modprobe gives up due to a dependency problem) => modprobe -n -v kernel-module-name
    8. Display all modules currently loaded into the kernel => lsmod
    9. Remove a module from a running kernel => rmmod kernel-module-name

General Linux commands:
  1. List all the dependent libraries of a binary => ldd $(NAME_OF_BINARY)
  2. List all API symbols exposed by a shared library =>
    1. nm -D --defined-only $(NAME_OF_BINARY) (exported symbols are indicated by a T; run nm -D without --defined-only to also see the required symbols that must be loaded from other shared objects, which are marked with a U)
    2. objdump -T $(NAME_OF_BINARY)
  3. Find the bitness of a file =>
    1. readelf -h $(NAME_OF_BINARY)
    2. objdump -a $(NAME_OF_BINARY)
  4. Print the CRC checksum and byte count of each file => cksum
  5. Print the MD5 hash sum of a file => md5sum
  6. Print the SHA1 hash sum of a file => sha1sum
  7. Estimate file space usage => du -h $(NAME_OF_DIRECTORY)
  8. Display the amount of free and used memory in the system (in megabytes) => free -m
  9. Find the process ID of a running program => pidof
  10. Output file status => stat
  11. Print the strings of printable characters in files => strings
  12. Locate the binary, source, and manual page files for a command => whereis
  13. Display a tree of processes => pstree
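
For scripting, the md5sum and sha1sum entries in the list above can be approximated with Python's standard hashlib module. This is a minimal sketch: the helper name file_digest and the chunk size are choices made for this example, and the function returns only the hex digest, without the file name that the command-line tools also print.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_digest(path, "sha1") should give the same hex string that sha1sum prints for the same file. Reading in chunks keeps memory use flat for large files.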

ref:

Kernel newbies - http://kernelnewbies.org/

Kernel coverage at LWN.net - http://lwn.net/Kernel/

Linux kernel documentation(all in one) - https://code.google.com/p/kernel-all-in-one/source/browse/trunk/Docs/?r=70


Unreliable Guide To Hacking The Linux Kernel - http://kernelbook.sourceforge.net/kernel-hacking.pdf

Linux kernel development 3rd edition by Robert Love - https://archive.org/details/pdfy-PjVB7QjMXCW8xzZj,  http://reiber.org/nxt/pub/Linux/LinuxKernelDevelopment/Linux.Kernel.Development.3rd.Edition.pdf

Linux Device drivers 3rd edition - http://lwn.net/Kernel/LDD3/


Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet, Marco Cesati - http://gauss.ececs.uc.edu/Courses/c4029/code/memory/understanding.pdf

The Linux Kernel Module Programming Guide - http://www.tldp.org/LDP/lkmpg/2.6/lkmpg.pdf


Linux kernel in a Nutshell - https://aligunduz.org/random/LinuxKernelInANutshell.pdf, http://www.kroah.com/lkn/


Linux Device Driver Dos and Don'ts - http://kernel-janitor.sourceforge.net/kernel-janitor/docs/driver-howto.html

Kernel APIs, Part 1: Invoking user-space applications from the kernel - http://www.ibm.com/developerworks/library/l-user-space-apps/

Chrome Development

Chrome OS is an operating system based on the Linux kernel and designed by Google to work with web applications and installed applications. Initially, Chrome OS was almost a pure web thin client operating system with only a handful of "native" applications, but Google gradually began encouraging developers to create "packaged applications", some of which can work offline.

Chrome OS is built upon the open source project called Chromium OS which, unlike Chrome OS, can be compiled from the downloaded source code.


ref:

Chrome Operating System - http://en.wikipedia.org/wiki/Chrome_OS


Chrome development - https://developer.chrome.com

Google Chromium project - http://www.chromium.org/Home


Google developers - https://developers.google.com/


Developing applications on Chromebook - http://www.chromium.org/chromium-os/developing-apps-on-your-chromium-os-device


Google Chrome platform API - https://developer.chrome.com/home/platform-pillar,  https://developer.chrome.com/extensions/api_index


Google Chrome Development guide - https://developer.chrome.com/extensions/devguide


Chromium embedded framework(CEF) - https://code.google.com/p/chromiumembedded/

Reflections on Developing our First Chrome App - https://software.intel.com/en-us/articles/reflections-on-developing-our-first-chrome-app

Chromebooks*: Developing the user experience of the future - https://software.intel.com/en-us/articles/chromebooks-developing-the-user-experience-of-the-future

Essential Resources Guide  - https://docs.google.com/document/d/16pGWXaoxC6CtVV1kZ0I9PgtSntZP80_2gWmaxKSLB18/edit#

Android Power Management

Power/energy consumption comes from hardware component utilization in the device. Energy efficiency comes from limiting hardware resource usage. Also, look carefully at your methods for utilizing hardware components, and make use of the most efficient methods.

The Android framework is designed to be energy efficient. When apps need hardware data or content, Android provides passive, event-driven mechanisms instead of requiring applications to poll actively for the data. Active polling causes more CPU and hardware utilization than event-driven methods, so apps should use the event-driven mechanisms to catch events. Android provides events such as intents, notifications and content observers to notify apps of changes in hardware components or in content. Each kind of event is managed by internal services. Once the services detect a change, they notify the apps.
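
The difference between polling and event-driven delivery can be shown with a toy model in Python. This is not Android code: BatteryService, register and set_level are hypothetical names invented for the sketch, standing in for a system service that notifies registered apps only when a value changes.

```python
class BatteryService:
    """Toy stand-in for a system service that owns some hardware state."""

    def __init__(self, level):
        self.level = level
        self._listeners = []

    def register(self, callback):
        """Let an app subscribe to change notifications."""
        self._listeners.append(callback)

    def set_level(self, level):
        # Notify listeners only when the value actually changes.
        if level != self.level:
            self.level = level
            for callback in self._listeners:
                callback(level)


service = BatteryService(level=80)
seen = []
service.register(seen.append)

# Six hardware readings arrive, but only two of them are real changes.
for reading in [80, 80, 79, 79, 79, 78]:
    service.set_level(reading)

print(seen)  # [79, 78]
```

A polling app would have run its code six times here, while the registered callback runs only twice.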

Suppose we design an app that needs location data. Android provides a location service that supplies location data to multiple applications. The service allows the app developer to specify the location provider: GPS, network or passive. Deciding which provider to use is a trade-off between accuracy, speed and energy efficiency. Once an app registers itself with the location service using specified criteria, the location service sends location change notifications to the app according to those criteria.

Android Alarms (based on the AlarmManager class) give you a way to perform time-based operations outside the lifetime of your application. For example, you could use an alarm to initiate a long-running operation, such as starting a service once a day to download a weather forecast.

Android Suspend:

In an Android-patched kernel, a suspend request goes to request_suspend_state() in kernel/power/earlysuspend.c (since Android adds the early suspend and wakelock features to the kernel). To understand this in detail, let us first introduce several new features that Android added.

Files:
  • linux_source/kernel/power/main.c
  • linux_source/kernel/power/earlysuspend.c
  • linux_source/kernel/power/wakelock.c

Early Suspend:
Early suspend is a mechanism that Android introduced into the Linux kernel. It is a state between turning off the screen and a real suspend. After the screen is off, several devices, such as the LCD backlight, the g-sensor and the touchscreen, are stopped to save battery life and to meet functional requirements.

Late Resume:
Late resume is the mechanism paired with early suspend. It is executed after the kernel and system resume has finished, and it resumes the devices that were suspended during early suspend.

Wake Lock:
The wake lock is a core part of the Android power management system. A wake lock is a lock that can be held by kernel space, system servers and applications, with or without a timeout. The Android-patched Linux kernel (referred to as the Android kernel below) tracks how many wake locks are held and for how long. If no wake lock of type WAKE_LOCK_SUSPEND is preventing suspend, the Android kernel calls the Linux suspend routine (pm_suspend()) to put the entire system into suspend.
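
The rule that suspend may proceed only when no wake lock is held can be modelled in a few lines of Python. This is a toy model of the idea, not the kernel implementation: WakeLockRegistry and its methods are hypothetical names, and timeouts and lock types are left out.

```python
class WakeLockRegistry:
    """Toy model: the system may suspend only when no wake lock is held."""

    def __init__(self):
        self._held = set()

    def acquire(self, name):
        self._held.add(name)

    def release(self, name):
        # discard() ignores names that are not currently held.
        self._held.discard(name)

    def can_suspend(self):
        return not self._held


registry = WakeLockRegistry()
registry.acquire("download")
registry.acquire("audio")
registry.release("download")
blocked = not registry.can_suspend()  # "audio" is still held
registry.release("audio")
allowed = registry.can_suspend()      # nothing held, suspend may proceed
print(blocked, allowed)  # True True
```

One forgotten release is enough to keep the whole system awake, which is why wake locks are often taken with a timeout.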

Suspend vs Hibernate:
Suspend does not turn off your computer. It puts the computer and all peripherals on a low power consumption mode. Suspend saves the state of your computer to RAM. If the battery runs out or the computer turns off for some reason, the current session and unsaved changes will be lost.
Hibernate saves the state of your computer to the hard disk and completely powers off. When resuming, the saved state is restored to RAM.
The Linux kernel currently offers three methods of suspending: suspend to RAM (usually called just suspend), suspend to disk (usually known as hibernate), and hybrid suspend (sometimes aptly called suspend to both):
  1. Suspend to RAM method cuts power to most parts of the machine aside from the RAM, which is required to restore the machine's state. Because of the large power savings, it is advisable for laptops to automatically enter this mode when the computer is running on batteries and the lid is closed (or the user is inactive for some time).
  2. Suspend to disk method saves the machine's state into swap space and completely powers off the machine. When the machine is powered on, the state is restored. Until then, there is zero power consumption.
  3. Suspend to both method saves the machine's state into swap space, but does not power off the machine. Instead, it invokes usual suspend to RAM. Therefore, if the battery is not depleted, the system can resume from RAM. If the battery is depleted, the system can be resumed from disk, which is much slower than resuming from RAM, but the machine's state has not been lost.
There are multiple low level interfaces (backends) providing basic functionality, and some high level interfaces providing tweaks to handle problematic hardware drivers/kernel modules (e.g. video card re-initialization).
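
One of the low-level interfaces is the kernel's /sys/power/state file, which lists the sleep states a machine supports as space-separated tokens such as "mem" and "disk". The sketch below reads and labels them. The function names and the wording of the labels are choices made for this example, and the file may be missing or unreadable, in which case an empty list is returned.

```python
STATE_LABELS = {
    "freeze": "suspend to idle",
    "standby": "standby",
    "mem": "suspend to RAM",
    "disk": "suspend to disk (hibernate)",
}


def describe_states(text):
    """Map the space-separated tokens of /sys/power/state to descriptions."""
    return [STATE_LABELS.get(token, token) for token in text.split()]


def read_supported_states(path="/sys/power/state"):
    """Read the state file, returning [] when it is not available."""
    try:
        with open(path) as handle:
            return describe_states(handle.read())
    except OSError:
        return []  # not Linux, or the file is missing or unreadable


print(read_supported_states())
```

Unknown tokens are passed through unchanged, so the sketch does not hide states it has no label for.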

ref:

Android Power Management -
Linux Kernel Power Management -

Advanced Android Power Management and Implementation of Wakelocks - http://www.ktm2m.net/~suhopark/apm2.pdf






Android Device File System Structure

The Android device directory structure is not as fixed as in standard Linux, but the following locations are typical:

 /system/bin – native utilities
 /system/xbin – more native utilities
 /system/etc – configuration files
 /system/app – location of apps as apk or odex files
 /system/framework – application framework files (jar)
 /storage/sdcard – accessible outside of the phone (rw)
 /data – applications data (rw)
 /proc – proc file system
 /sys – sys file system
 /proc/crypto – encryption/signing schemes
 

ref:

Android Kernel vs Linux Kernel - http://www.all-things-android.com/content/android-kernel-versus-linux-kernel

Understanding the Android File Hierarchy - http://www.all-things-android.com/content/understanding-android-file-hierarchy

http://techblogon.com/android-file-system-structure-architecture-layout-details/