Tech Kaizen: Machine Learning (ML) Overview

Machine learning is an umbrella term. It is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. AI means making computers act intelligently; it is one of the major fields of study in computer science and encompasses sub-fields such as robotics, machine learning, expert systems, general intelligence and natural language processing. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959). Having evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Rather than following strictly static program instructions, such algorithms make data-driven predictions or decisions by building a model from sample inputs.

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, although the latter focuses more on exploratory data analysis. Statistical analysis is a component of data analytics. In the context of business intelligence (BI), statistical analysis involves collecting and scrutinizing every data sample in a set of items from which samples can be drawn.

Deep learning is a form of machine learning that uses a model of computing inspired by the structure of the brain; hence the model is called a neural network. The basic building block of a neural network is the neuron, which is conceptually quite simple.
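To make the idea of a neuron concrete, here is a minimal sketch in plain Python. It assumes the common textbook formulation (a weighted sum of the inputs plus a bias, passed through a sigmoid activation); the function names and the hand-picked weights are hypothetical and only serve the illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical weights chosen by hand; a real network learns them from data.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # 0.5, because the weighted sum is exactly 0 and sigmoid(0) = 0.5
```

A neural network is built by wiring many such units together in layers, so that the outputs of one layer become the inputs of the next.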

Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are:

1. Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.

2. Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

3. Reinforcement learning: A computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
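Of the three categories, reinforcement learning is the hardest to picture, so here is a rough sketch of the feedback loop. It assumes a deliberately tiny environment (a "bandit" with one fixed reward per action) and an epsilon-greedy strategy; the function name, the reward values and the parameters are all made up for illustration. The agent is never told which action is correct; it only sees the reward of the action it just tried.

```python
import random

def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit with one fixed reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # the agent's current guess of each action's value
    counts = [0] * len(rewards)        # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]       # the only feedback the agent receives
        counts[action] += 1
        # Update the running mean of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical rewards: action 1 is the best one, but the agent does not know that.
estimates, counts = run_bandit(rewards=[0.2, 1.0, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

Real reinforcement learning problems add state and delayed rewards on top of this loop, but the trial-and-error structure is the same.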

Generalization refers to how well the concepts learned by a machine learning model apply to specific examples not seen by the model when it was learning. The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain. This allows us to make predictions in the future on data the model has never seen. Two terms describe how well a machine learning model learns and generalizes to new data: overfitting and underfitting.

Overfitting and underfitting are the two biggest causes of poor performance of machine learning algorithms. Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize. Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, and this will be obvious because it performs poorly even on the training data. Underfitting is often not discussed as it is easy to detect given a good performance metric. The remedy is to move on and try alternative machine learning algorithms.
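The difference between the two failure modes can be seen on a small synthetic dataset. The sketch below is only an illustration under stated assumptions: the data is made up (true rule y = 2x, training targets with alternating +1 / -1 noise, noise-free test targets), and the three "models" (a constant mean, a least-squares line, and a nearest-point memorizer) are hypothetical stand-ins for an underfit, a reasonable, and an overfit model.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): the true rule is y = 2x.
# Training targets carry alternating +1 / -1 noise; test targets are noise-free.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Reasonable fit: least-squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_memorizer(xs, ys):
    """Overfit: return the target of the nearest training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE = {results[name][0]:6.2f}   test MSE = {results[name][1]:6.2f}")
```

On this data the memorizer reaches zero training error but does worse than the line on the test points, because it has learned the noise. The mean model does badly on both sets, which is the signature of underfitting.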

Supervised Learning:

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). The majority of practical machine learning uses supervised learning. Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.

Supervised learning problems fall into two main categories:

1. Classification - Target variable is categorical (for example yes/no). A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
2. Regression - Target variable is continuous. A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
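The two categories differ only in the kind of output that is predicted, which a small sketch can show. The code below assumes a k-nearest-neighbours learner purely as an illustration (it is not one of the algorithms discussed in this post), and both toy datasets are made up: the same neighbour lookup produces a category by majority vote for classification and a real value by averaging for regression.

```python
from collections import Counter

def nearest_targets(train, x, k):
    """Targets of the k training pairs (input, target) whose input is closest to x."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]

def knn_classify(train, x, k=3):
    """Classification: the output is a category, chosen by majority vote."""
    return Counter(nearest_targets(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: the output is a real value, here the neighbours' average."""
    targets = nearest_targets(train, x, k)
    return sum(targets) / len(targets)

# Made-up toy data: input is a hypothetical test reading, target is a category.
labelled = [(1.0, "no disease"), (1.5, "no disease"), (2.0, "no disease"),
            (4.0, "disease"), (4.5, "disease"), (5.0, "disease")]
# Made-up toy data: input is a height in metres, target is a weight in kg.
measured = [(1.5, 50.0), (1.6, 56.0), (1.7, 64.0), (1.8, 72.0), (1.9, 80.0)]

print(knn_classify(labelled, 4.2))   # a category
print(knn_regress(measured, 1.72))   # a real number
```

In both cases the learner approximates Y = f(X) from example pairs and is then asked about an input it has not seen before.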

Some popular examples of supervised machine learning algorithms are:
1. Linear regression for regression problems.
2. Random forest for classification and regression problems.
3. Support vector machines for classification problems.

Unsupervised Learning:

Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution - this distinguishes unsupervised learning from supervised learning and reinforcement learning. Unsupervised learning is closely related to the problem of density estimation in statistics. However, unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data.

Unsupervised learning problems can be further grouped into clustering and association problems:

1. Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
2. Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
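As an illustration of the clustering case, grouping customers by purchasing behavior could be sketched with a bare-bones k-means on one-dimensional data. The spend figures are made up, the function name is hypothetical, and starting the centroids at the smallest and largest value is a simplifying assumption; library implementations typically use more careful initialization.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest centroid, then recompute."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up monthly spend per customer (hypothetical numbers).
spend = [12.0, 15.0, 11.0, 14.0, 95.0, 102.0, 99.0, 104.0]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)  # [13.0, 100.0]
print(clusters)   # low spenders and high spenders
```

No labels were supplied: the two groups emerge from the data alone, which is what makes this an unsupervised task.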

Some popular examples of unsupervised learning algorithms are:
1. K-means for clustering problems.
2. Apriori algorithm for association rule learning problems.
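Rules of the form "people who buy X also tend to buy Y" are usually scored with two numbers, support and confidence, and Apriori works by counting which item sets meet a minimum support. The sketch below only computes those two measures on made-up shopping baskets; it is not an implementation of Apriori itself, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets (hypothetical data).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))       # 2 of the 4 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 2 of the 3 baskets with bread
```

A rule such as "bread implies butter" is considered interesting only if both its support and its confidence clear thresholds chosen by the analyst.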

ref:

Wiki -
1. Artificial Intelligence - https://en.wikipedia.org/wiki/Artificial_intelligence
2. Machine Learning - https://en.wikipedia.org/wiki/Machine_learning
3. Unsupervised Learning - https://en.wikipedia.org/wiki/Unsupervised_learning
4. Supervised Learning - https://en.wikipedia.org/wiki/Supervised_learning
5. Neural Networks - https://en.wikipedia.org/wiki/Artificial_neural_network

Deep Neural Networks - https://www.technologyreview.com/s/602344/the-extraordinary-link-between-deep-neural-networks-and-the-nature-of-the-universe/

Supervised and Unsupervised learning - http://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/

Machine Learning Algorithms - http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/

Machine Learning using Python - http://scikit-learn.org/

Misc -
1. http://www.kdnuggets.com/2015/01/deep-learning-explanation-what-how-why.html
2. http://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/
3. http://math.stackexchange.com/questions/141381/regression-vs-classification
