Tech Kaizen

passion + usefulness = success .. change is the only constant in life

Search this Blog:

Apache Hadoop vs Apache Spark

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are commonplace and thus should be automatically handled in software by the framework.

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms.


Hadoop vs Spark


Hadoop is parallel data processing framework that has traditionally been used to run map/reduce jobs. These are long running jobs that take minutes or hours to complete. Spark has designed to run on top of Hadoop and it is an alternative to the traditional batch map/reduce model that can be used for real-time stream data processing and fast interactive queries that finish within seconds. So, Hadoop supports both traditional map/reduce and Spark. We should look at Hadoop as a general purpose Framework that supports multiple models and We should look at Spark as an alternative to Hadoop MapReduce rather than a replacement to Hadoop.


Spark uses more RAM instead of network and disk I/O its relatively fast as compared to hadoop. But as it uses large RAM it needs a dedicated high end physical machine for producing effective results. It all depends and the variables on which this decision depends keep on changing dynamically with time.


MapReduce Hadoop is designed to run batch jobs that address every file in the system. Since that process takes time, MapReduce is well suited for large distributed data processing where fast performance is not an issue, such as running end-of day transactional reports. MapReduce is also ideal for scanning historical data and performing analytics where a short time-to-insight isn’t vital. Spark was purposely designed to support in-memory processing. The net benefit of keeping everything in memory is the ability to perform iterative computations at blazing fast speeds—something MapReduce is not designed to do. 


Unlike MapReduce, Spark is designed for advanced, real-time analytics and has the framework and tools to deliver when shorter time-to-insight is critical. Included in Spark’s integrated framework are the Machine Learning Library (MLlib), the graph engine GraphX, the Spark Streaming analytics engine, and the real-time analytics tool, Shark. With this all-in-one platform, Spark is said to deliver greater consistency in product results across various types of analysis.


Over the years, MapReduce Hadoop has enjoyed widespread adoption in the enterprise, and that will continue to be the case. Going forward, as the need for advanced real-time analytics tools escalates, Spark is positioned to meet that challenge.


ref:


http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/

http://www.qubole.com/blog/big-data/spark-vs-mapreduce/


http://www.computerworld.com/article/2856063/enterprise-software/hadoop-successor-sparks-a-data-analysis-evolution.html


Apache Hadoop videos - https://www.youtube.com/watch?v=A02SRdyoshM&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD


Apache Spark videos - https://www.youtube.com/watch?v=7k_9sdTOdX4&list=PL9ooVrP1hQOGyFc60sExNX1qBWJyV5IMb


Hadoop Training Videos -

  1. Hortonworks Apache Hadoop popular videos - https://www.youtube.com/watch?v=OoEpfb6yga8&list=PLoEDV8GCixRe-JIs4rEUIkG0aTe3FZgXV
  2. Cloudera Apache Hadoop popular videos - https://www.youtube.com/watch?v=eo1PwSfCXTI&list=PLoEDV8GCixRddiUuJzEESimo1qP5tZ1Kn
  3. Stanford Hadoop Training material - https://www.youtube.com/watch?v=d2xeNpfzsYI&list=PLxRwCyObqFr3OZeYsI7X5Mq6GNjzH10e1
  4. Edureka Hadoop Training material - https://www.youtube.com/watch?v=A02SRdyoshM&list=PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD
  5. Durga Solutions Hadoop Training material - https://www.youtube.com/watch?v=Pq3OyQO-l3E&list=PLpc4L8tPSURCdIXH5FspLDUesmTGRQ39I

Labels: BIG DATA ANALYTICS, CLOUD COMPUTING, DATA SCIENCE
Newer Post Older Post Home

The Verge - YOUTUBE

Loading...

Google - YOUTUBE

Loading...

Microsoft - YOUTUBE

Loading...

MIT OpenCourseWare - YOUTUBE

Loading...

FREE CODE CAMP - YOUTUBE

Loading...

NEET CODE - YOUTUBE

Loading...

GAURAV SEN INTERVIEWS - YOUTUBE

Loading...

Y Combinator Discussions

Loading...

SUCCESS IN TECH INTERVIEWS - YOUTUBE

Loading...

IGotAnOffer: Engineering YOUTUBE

Loading...

Tanay Pratap YOUTUBE

Loading...

Ashish Pratap Singh YOUTUBE

Loading...

Questpond YOUTUBE

Loading...

Kantan Coding YOUTUBE

Loading...

CYBER SECURITY - YOUTUBE

Loading...

CYBER SECURITY FUNDAMENTALS PROF MESSER - YOUTUBE

Loading...

DEEPLEARNING AI - YOUTUBE

Loading...

STANFORD UNIVERSITY - YOUTUBE

Loading...

NPTEL IISC BANGALORE - YOUTUBE

Loading...

NPTEL IIT MADRAS - YOUTUBE

Loading...

NPTEL HYDERABAD - YOUTUBE

Loading...

MIT News

Loading...

MIT News - Artificial intelligence

Loading...

The Berkeley Artificial Intelligence Research Blog

Loading...

Microsoft Research

Loading...

MachineLearningMastery.com

Loading...

Harward Business Review(HBR)

Loading...

Wharton Magazine

Loading...
My photo
Krishna Kishore Koney
View my complete profile
" It is not the strongest of the species that survives nor the most intelligent that survives, It is the one that is the most adaptable to change "

View krishna kishore koney's profile on LinkedIn

Monthly Blog Archives

  • ►  2025 (2)
    • ►  May (1)
    • ►  April (1)
  • ►  2024 (18)
    • ►  December (1)
    • ►  October (2)
    • ►  September (5)
    • ►  August (10)
  • ►  2022 (2)
    • ►  December (2)
  • ►  2021 (2)
    • ►  April (2)
  • ►  2020 (17)
    • ►  November (1)
    • ►  September (7)
    • ►  August (1)
    • ►  June (8)
  • ►  2019 (18)
    • ►  December (1)
    • ►  November (2)
    • ►  September (3)
    • ►  May (8)
    • ►  February (1)
    • ►  January (3)
  • ►  2018 (3)
    • ►  November (1)
    • ►  October (1)
    • ►  January (1)
  • ►  2017 (2)
    • ►  November (1)
    • ►  March (1)
  • ►  2016 (5)
    • ►  December (1)
    • ►  April (3)
    • ►  February (1)
  • ▼  2015 (15)
    • ►  December (1)
    • ►  October (1)
    • ►  August (2)
    • ▼  July (4)
      • Apache Hadoop Info Dump ..
      • Windows 10 Universal Windows Platform (UWP) ..
      • Apache Hadoop vs Apache Spark
      • Secure coding guidelines ...
    • ►  June (2)
    • ►  May (3)
    • ►  January (2)
  • ►  2014 (13)
    • ►  December (1)
    • ►  November (2)
    • ►  October (4)
    • ►  August (5)
    • ►  January (1)
  • ►  2013 (5)
    • ►  September (2)
    • ►  May (1)
    • ►  February (1)
    • ►  January (1)
  • ►  2012 (19)
    • ►  November (1)
    • ►  October (2)
    • ►  September (1)
    • ►  July (1)
    • ►  June (6)
    • ►  May (1)
    • ►  April (2)
    • ►  February (3)
    • ►  January (2)
  • ►  2011 (20)
    • ►  December (5)
    • ►  August (2)
    • ►  June (6)
    • ►  May (4)
    • ►  April (2)
    • ►  January (1)
  • ►  2010 (41)
    • ►  December (2)
    • ►  November (1)
    • ►  September (5)
    • ►  August (2)
    • ►  July (1)
    • ►  June (1)
    • ►  May (8)
    • ►  April (2)
    • ►  March (3)
    • ►  February (5)
    • ►  January (11)
  • ►  2009 (113)
    • ►  December (2)
    • ►  November (5)
    • ►  October (11)
    • ►  September (1)
    • ►  August (14)
    • ►  July (5)
    • ►  June (10)
    • ►  May (4)
    • ►  April (7)
    • ►  March (11)
    • ►  February (15)
    • ►  January (28)
  • ►  2008 (61)
    • ►  December (7)
    • ►  September (6)
    • ►  August (1)
    • ►  July (17)
    • ►  June (6)
    • ►  May (24)
  • ►  2006 (7)
    • ►  October (7)

Blog Archives Categories

  • .NET DEVELOPMENT (38)
  • 5G (5)
  • AI (Artificial Intelligence) (9)
  • AI/ML (4)
  • ANDROID DEVELOPMENT (7)
  • BIG DATA ANALYTICS (6)
  • C PROGRAMMING (7)
  • C++ PROGRAMMING (24)
  • CAREER MANAGEMENT (6)
  • CHROME DEVELOPMENT (2)
  • CLOUD COMPUTING (45)
  • CODE REVIEWS (3)
  • CYBERSECURITY (12)
  • DATA SCIENCE (4)
  • DATABASE (14)
  • DESIGN PATTERNS (9)
  • DEVICE DRIVERS (5)
  • DOMAIN KNOWLEDGE (14)
  • EDGE COMPUTING (4)
  • EMBEDDED SYSTEMS (9)
  • ENTERPRISE ARCHITECTURE (10)
  • IMAGE PROCESSING (3)
  • INTERNET OF THINGS (2)
  • J2EE PROGRAMMING (10)
  • KERNEL DEVELOPMENT (6)
  • KUBERNETES (19)
  • LATEST TECHNOLOGY (18)
  • LINUX (9)
  • MAC OPERATING SYSTEM (2)
  • MOBILE APPLICATION DEVELOPMENT (14)
  • PORTING (4)
  • PYTHON PROGRAMMING (6)
  • RESEARCH AND DEVELOPMENT (1)
  • SCRIPTING LANGUAGES (8)
  • SERVICE ORIENTED ARCHITECTURE (SOA) (10)
  • SOFTWARE DESIGN (13)
  • SOFTWARE QUALITY (5)
  • SOFTWARE SECURITY (23)
  • SYSTEM and NETWORK ADMINISTRATION (3)
  • SYSTEM PROGRAMMING (4)
  • TECHNICAL MISCELLANEOUS (31)
  • TECHNOLOGY INTEGRATION (5)
  • TEST AUTOMATION (5)
  • UNIX OPERATING SYSTEM (4)
  • VC++ PROGRAMMING (44)
  • VIRTUALIZATION (8)
  • WEB PROGRAMMING (8)
  • WINDOWS OPERATING SYSTEM (13)
  • WIRELESS DEVELOPMENT (5)
  • XML (3)

Popular Posts

  • Observer Pattern - Push vs Pull Model
  • AI Agent vs AI Workflow
  • Microservices Architecture ..
  • SSCLI(Shared Source Common Language Infrastructure)

My Other Blogs

  • Career Management: Invest in Yourself
  • Color your Career
  • Attitude is everything(in Telugu language)
WINNING vs LOSING

Hanging on, persevering, WINNING
Letting go, giving up easily, LOSING

Accepting responsibility for your actions, WINNING
Always having an excuse for your actions, LOSING

Taking the initiative, WINNING
Waiting to be told what to do, LOSING

Knowing what you want and setting goals to achieve it, WINNING
Wishing for things, but taking no action, LOSING

Seeing the big picture, and setting your goals accordingly, WINNING
Seeing only where you are today, LOSING

Being determined, unwilling to give up WINNING
Gives up easily, LOSING

Having focus, staying on track, WINNING
Allowing minor distractions to side track them, LOSING

Having a positive attitude, WINNING
having a "poor me" attitude, LOSING

Adopt a WINNING attitude!

Total Pageviews

who am i

My photo
Krishna Kishore Koney

Blogging is about ideas, self-discovery, and growth. This is a small effort to grow outside my comfort zone.

Most important , A Special Thanks to my parents(Sri Ramachandra Rao & Srimathi Nagamani), my wife(Roja), my lovely daughter (Hansini) and son (Harshil) for their inspiration and continuous support in developing this Blog.

... "Things will never be the same again. An old dream is dead and a new one is being born, as a flower that pushes through the solid earth. A new vision is coming into being and a greater consciousness is being unfolded" ... from Jiddu Krishnamurti's Teachings.

Now on disclaimer :
1. Please note that my blog posts reflect my perception of the subject matter and do not reflect the perception of my Employer.

2. Most of the times the content of the blog post is aggregated from Internet articles and other blogs which inspired me. Due respect is given by mentioning the referenced URLs below each post.

Have a great time

My LinkedIn Profile
View my complete profile

Failure is not falling down, it is not getting up again. Success is the ability to go from failure to failure without losing your enthusiasm.

Where there's a Will, there's a Way. Keep on doing what fear you, that is the quickest and surest way to to conquer it.

Vision is the art of seeing what is invisible to others. For success, attitude is equally as important as ability.

Favourite RSS Syndications ...

Google Developers Blog

Loading...

Blogs@Google

Loading...

Berklee Blogs » Technology

Loading...

Martin Fowler's Bliki

Loading...

TED Blog

Loading...

TEDTalks (video)

Loading...

Psychology Today Blogs

Loading...

Aryaka Insights

Loading...

The Pragmatic Engineer

Loading...

Stanford Online

Loading...

MIT Corporate Relations

Loading...

AI at Wharton

Loading...

OpenAI

Loading...

AI Workshop

Loading...

Hugging Face - Blog

Loading...

BYTE BYTE GO - YOUTBUE

Loading...

Google Cloud Tech

Loading...

3Blue1Brown

Loading...

Bloomberg Originals

Loading...

Dwarkesh Patel Youtube Channel

Loading...

Reid Hoffman

Loading...

Aswath Damodaran

Loading...