Tech Kaizen: Big Data

Big data is an all-encompassing term for any collection of data sets so large and complex that they become difficult to process using on-hand data management tools or traditional data processing applications. The trend toward larger data sets is driven by the additional information that can be derived from analyzing a single large set of related data, compared with analyzing separate smaller sets that contain the same total amount of data.

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
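The student-counting example above can be sketched in a single Python process. This is a minimal illustration, not Hadoop's or Google's API: the function names `map_student`, `shuffle`, and `reduce_count` and the sample names are hypothetical. The `shuffle` step stands in for the grouping that a real MapReduce framework performs between the two phases, across many machines.

```python
from collections import defaultdict

def map_student(student):
    """Map(): emit (first_name, student) so each student lands in a per-name queue."""
    first_name = student.split()[0]
    return (first_name, student)

def shuffle(pairs):
    """Group mapped pairs by key: one queue per first name (done by the framework)."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues

def reduce_count(name, queue):
    """Reduce(): summarize one queue by counting the students in it."""
    return (name, len(queue))

def name_frequencies(students):
    mapped = [map_student(s) for s in students]
    queues = shuffle(mapped)
    # One Reduce() call per distinct name, processed in name order.
    return dict(reduce_count(name, q) for name, q in sorted(queues.items()))

students = ["Alice Smith", "Bob Jones", "Alice Brown", "Carol White", "Bob Green", "Alice Black"]
print(name_frequencies(students))  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```

Map() only ever looks at one student at a time, and Reduce() only ever looks at one queue, which is what lets a framework spread both phases across many servers.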

In Hadoop, Apache's open-source implementation of the MapReduce model, the term refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). The second is the reduce job, which takes the output of the map job as input and combines those tuples into a smaller set of tuples. As the order of the name MapReduce implies, the reduce job is always performed after the map job.
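A second hypothetical, in-process sketch shows the ordering between the two jobs on the classic word-count task. Threads from Python's standard `concurrent.futures` module stand in for cluster nodes running map tasks over separate input splits, and reduce work begins only after every map task has returned. The word-count task, function names, and sample input are illustrative assumptions; in Hadoop the two jobs run as separate tasks distributed across a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_task(split):
    """Map job: convert one input split (a line of text) into (word, 1) tuples."""
    return [(word.lower(), 1) for word in split.split()]

def reduce_task(word, counts):
    """Reduce job: combine all tuples for one key into a single (word, total) tuple."""
    return (word, sum(counts))

def word_count(splits, workers=4):
    # Run map tasks in parallel; list() blocks until every map result is available,
    # so no reduce work starts before the whole map phase is complete.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))

    # Shuffle: group intermediate values by key.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)

    # Reduce phase: one call per distinct key yields a smaller set of tuples.
    return sorted(reduce_task(w, c) for w, c in grouped.items())

splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(splits))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Here the map phase produces nine intermediate tuples and the reduce phase combines them into six, matching the description of reduce output as a smaller set of tuples.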

ref:

Big Data (Wikipedia) - http://en.wikipedia.org/wiki/Big_data

MapReduce (Wikipedia) - http://en.wikipedia.org/wiki/MapReduce

Hadoop tutorial - http://www.coreservlets.com/hadoop-tutorial/

What is Hadoop - http://www-01.ibm.com/software/data/infosphere/hadoop/

MapReduce: Simplified Data Processing on Large Clusters - http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf

Google's MapReduce Programming Model (Revisited) - http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf

MapReduce: Simplified Data Processing on Large Clusters - http://www.cs.utexas.edu/~pingali/CS395T/2012sp/lectures/MR-nikhil-panpalia.pdf

Hadoop/MapReduce - http://www.cs.colorado.edu/~kena/classes/5448/s11/presentations/hadoop.pdf

Apache's implementation of Google's MapReduce framework - https://www.defcon.org/images/defcon-17/dc-17-presentations/defcon-17-calca-anguiano-hadoop.pdf

Intel big data - http://www.intel.com/bigdata

Apache Hadoop Framework Spotlights - http://www.intel.com/content/www/us/en/big-data/big-data-apache-hadoop-framework-spotlights-landing.html
