What do your Facebook news-feed, Amazon product recommendations and Siri voice recognition all have in common? They all use some form of machine learning to automate data correlation, identify patterns, and make changes based on newly learned data. Continuing our pursuit of Trending Technologies, let’s discover what exactly machine learning is.
Machine learning is a buzzword in the technology world right now, and for good reason: It represents a major step forward in how computers can learn. Whether you realize it or not, machine learning is one of the most important technology trends – it underlies so many things we use today without even thinking about them. Speech recognition, Amazon and Netflix recommendations, fraud detection, and financial trading are just a few examples of machine learning commonly in use in today’s data-driven world.
- 2006: Netflix held the first “Netflix Prize” competition to find a program to better predict user preferences and improve the accuracy on its existing Cinematch movie recommendation algorithm by at least 10%.
- 2010: The Wall Street Journal wrote about the firm Rebellion Research and their use of Machine Learning to predict the financial crisis.
- 2012: Co-founder of Sun Microsystems Vinod Khosla predicted that 80% of medical doctors’ jobs would be lost in the next two decades to automated machine learning medical diagnostic software.
- 2014: A machine learning algorithm was applied in art history to study fine art paintings, and it may have revealed previously unrecognized influences between artists.
Machine learning is starting to reshape how we live, and it’s time we understood what it is and why it matters.
What is Machine Learning?
Machine learning is when algorithms and methodologies give computers the ability to automatically learn and improve from experience without human intervention and without being explicitly programmed. Machine learning automatically finds patterns and structures in data that humans cannot process easily in order to make predictions and decisions. The key emphasis is on “automatic”. No specific human guidance or expert knowledge is required. Machine learning algorithms can automatically adapt to evidence from data, which allows them to learn new concepts.
The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically without human intervention or assistance and adjust their actions accordingly. Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
So if you want your program to predict traffic patterns at a busy intersection, the definition maps onto it like this:
- Task T: Predict traffic patterns at a busy intersection.
- Experience E: Run it through a machine learning algorithm with data about past traffic patterns.
- Performance measure P: If it has successfully “learned”, it will then do better at predicting future traffic patterns.
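To make Mitchell's definition concrete, here is a minimal Python sketch of the traffic example. Everything in it is an assumption made for illustration: the `true_traffic` pattern, the noise level, and the choice of "average the past counts per hour" as the learning method. It is not how a real traffic system works, only a way to watch P improve as E grows.

```python
import random

def true_traffic(hour):
    """Hypothetical ground truth: cars per hour, with two rush-hour peaks."""
    if 7 <= hour <= 9 or 16 <= hour <= 18:
        return 400
    return 100

def observe_days(num_days, rng):
    """Experience E: noisy (hour, count) observations for num_days days."""
    return [(hour, true_traffic(hour) + rng.gauss(0, 20))
            for _ in range(num_days) for hour in range(24)]

def train(observations):
    """Learn the average observed count for each hour of the day."""
    totals, counts = {}, {}
    for hour, count in observations:
        totals[hour] = totals.get(hour, 0.0) + count
        counts[hour] = counts.get(hour, 0) + 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

def mean_absolute_error(model):
    """Performance measure P: average error against the true pattern."""
    return sum(abs(model[h] - true_traffic(h)) for h in range(24)) / 24

rng = random.Random(0)  # fixed seed so the run is repeatable
error_little = mean_absolute_error(train(observe_days(1, rng)))
error_lots = mean_absolute_error(train(observe_days(100, rng)))
print(f"MAE after 1 day of data:    {error_little:.1f}")
print(f"MAE after 100 days of data: {error_lots:.1f}")
```

The task T (predicting the count for each hour) never changes. Only the amount of experience does, and the error measured by P should come out noticeably lower for the model that saw 100 days.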
Machine learning enables computers to get into a mode of self-learning without being explicitly programmed. When exposed to new data, these computer programs are enabled to learn, grow, change, and develop by themselves. This is possible as programs learn from previous computations and use “pattern recognition” to produce reliable results. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new – but one that has gained fresh momentum.
1959: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term “Machine Learning” while at IBM. Having evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Rather than following strictly static program instructions, such algorithms make data-driven predictions or decisions by building a model from sample inputs.
As a scientific endeavor, machine learning grew out of the quest for artificial intelligence. Already in the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning.
1990s: Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory. It also benefited from the increasing availability of digitized information, and the possibility to distribute that via the Internet.
Machine Learning and Data
Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to “produce reliable, repeatable decisions and results” and uncover “hidden insights” through learning from historical relationships and trends in the data.
Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data. Machine learning reproduces known patterns and knowledge and further automatically applies that information to data, decision-making, and actions.
Artificial Intelligence and Machine Learning
Artificial Intelligence (AI) and Machine Learning (ML) are two very hot buzzwords right now, and often seem to be used interchangeably. They are not quite the same thing, but the perception that they are can sometimes lead to confusion. Historically, machine learning was something of a reaction within artificial intelligence research. AI focused heavily on logic rather than probability or statistics. It was also a fairly open-ended research program in which it was relatively hard to judge progress. Machine learning is, in comparison, an extremely well-defined area focusing on concrete algorithmic and mathematical problems.
Artificial Intelligence is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”.
Machine Learning is a current application of AI based around the idea that we should really just be able to give machines access to data and let them learn for themselves.
As technology, and, importantly, our understanding of how our minds work, has progressed, our concept of what constitutes AI has changed. Rather than increasingly complex calculations, work in the field of AI concentrated on mimicking human decision making processes and carrying out tasks in ever more human ways.
Basic steps to Machine Learning
There are 5 basic steps used to perform a machine learning task:
- Collecting data: Whether the raw data comes from Excel, Access, text files, or elsewhere, this step (gathering past data) forms the foundation of future learning. The greater the variety, density, and volume of relevant data, the better the machine’s learning prospects.
- Preparing the data: Any analytical process thrives on the quality of the data used. One needs to spend time assessing the quality of the data and then fixing issues such as missing values and outliers. Exploratory analysis is one way to study the nuances of the data in detail and thereby get more value out of it.
- Training a model: This step involves choosing the appropriate algorithm and a representation of the data in the form of a model. The cleaned data is split into two parts, train and test (in a proportion that depends on the problem’s requirements); the first part (training data) is used for developing the model. The second part (test data) is held back as a reference for evaluation.
- Evaluating the model: To test the accuracy, the second part of the data (holdout / test data) is used. The outcome shows how well the chosen algorithm suits the problem. A better test of a model’s accuracy is its performance on data that was not used at all while building the model.
- Improving the performance: This step might involve choosing a different model altogether or introducing more variables to improve the results. That’s why a significant amount of time needs to be spent on data collection and preparation.
Whatever the model, these five steps can be used to structure the work, and when we discuss the algorithms you will see how these five steps appear in every model!
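As a rough illustration, the five steps can be sketched in plain Python. The data set is made up (y is roughly 2x + 1 with one missing value), the hold-out rule of "every fifth row" is an arbitrary choice, and the two candidate models (predict the mean, or fit a straight line by least squares) are picked only because they fit in a few lines.

```python
def collect():
    """Step 1: gather raw (x, y) pairs. Made-up data where y is roughly 2x + 1."""
    rows = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
    rows[3] = (3, None)          # simulate a missing measurement
    return rows

def prepare(rows):
    """Step 2: drop rows with missing values."""
    return [(x, y) for x, y in rows if y is not None]

def split(rows):
    """Hold out every fifth row as test data; train on the rest."""
    test = rows[::5]
    train = [r for i, r in enumerate(rows) if i % 5 != 0]
    return train, test

def train_line(rows):
    """Step 3: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def evaluate(predict, rows):
    """Step 4: mean absolute error on data the model has not seen."""
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)

train_rows, test_rows = split(prepare(collect()))

# Step 5: compare a naive baseline with the fitted line and keep the better one.
mean_y = sum(y for _, y in train_rows) / len(train_rows)
slope, intercept = train_line(train_rows)
baseline_error = evaluate(lambda x: mean_y, test_rows)
line_error = evaluate(lambda x: slope * x + intercept, test_rows)
print(f"baseline MAE: {baseline_error:.2f}, line MAE: {line_error:.2f}")
```

Both models are scored on the held-out rows only, so the comparison in step 5 reflects how each would do on data it was not trained on.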
Types of Problems and Tasks
Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning “signal” or “feedback” available to a learning system.
- Supervised learning: The computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
- Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- Reinforcement learning: A computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle or playing a game against an opponent). The program is provided feedback in terms of rewards and punishments as it navigates its problem space.
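Reinforcement learning is the hardest of the three to picture, so here is a small sketch using tabular Q-learning, one common reinforcement learning method. The environment is a toy invented for this example: a corridor of five positions where the program starts at position 0 and is rewarded for reaching position 4. The reward values, the learning rate `ALPHA`, and the discount factor `GAMMA` are arbitrary assumptions.

```python
import random

N_STATES = 5               # positions 0..4 in a corridor; position 4 is the goal
GOAL = N_STATES - 1
ACTIONS = [-1, +1]         # step left or step right
ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor (arbitrary choices)

def step(state, action):
    """Environment: move, stay inside the corridor, and hand back feedback."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01   # reward vs. small punishment
    return next_state, reward

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]

for _ in range(500):                 # 500 practice episodes
    state = 0
    while state != GOAL:
        a = rng.randrange(2)         # explore by acting at random
        next_state, reward = step(state, ACTIONS[a])
        # Q-learning update: nudge the estimate toward reward + discounted future value
        target = reward + GAMMA * max(q[next_state])
        q[state][a] += ALPHA * (target - q[state][a])
        state = next_state

# The learned policy: in each non-goal position, pick the action with the higher value.
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(GOAL)]
print(policy)
```

Nobody tells the program that "right" is the correct move. It only ever sees rewards and punishments, and the preference for stepping right emerges from them.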
Machine learning can only be as good as the data you use to train it. The phrase “garbage in, garbage out” predates machine learning, but it aptly characterizes a key limitation of machine learning. Machine learning can only discover patterns that are present in your training data. For supervised machine learning tasks like classification, you’ll need a robust collection of correctly labeled, richly featured training data.
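A tiny supervised classifier makes the point about labels tangible. The sketch below uses a nearest-neighbour rule (one simple supervised method among many) on invented data: each message is described by two made-up features, number of links and number of exclamation marks, and labeled "ham" or "spam".

```python
def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda example: dist2(example[0], point))
    return label

# Hypothetical labeled training data: ((links, exclamation marks), label)
labeled = [((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
           ((7, 5), "spam"), ((8, 6), "spam"), ((6, 7), "spam")]

print(nearest_neighbor(labeled, (1, 1)))   # a message that looks like the ham examples
print(nearest_neighbor(labeled, (7, 6)))   # a message that looks like the spam examples

# Garbage in, garbage out: the same algorithm trained on wrong labels
mislabeled = [(features, "spam" if label == "ham" else "ham")
              for features, label in labeled]
print(nearest_neighbor(mislabeled, (1, 1)))
```

The algorithm is identical in both cases. With the labels flipped, it faithfully learns the wrong answer, which is exactly the limitation described above.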
Why Machine Learning? | The Applications
To better understand the uses of machine learning, consider some of the instances where machine learning is applied: the self-driving Google car, cyber fraud detection, online recommendation engines—like friend suggestions on Facebook, Netflix showcasing the movies and shows you might like, and “more items to consider” and “get yourself a little something” on Amazon—are all examples of applied machine learning.
Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), learning to rank, and computer vision. All these are by-products of applying machine learning to analyze huge volumes of data. By developing fast and efficient algorithms and data-driven models for real-time processing of data, machine learning is able to produce accurate results and analysis.
- Data Security: ML can help predict which files are malware with great accuracy. In other situations, machine learning algorithms can look for patterns in how data in the cloud is accessed, and report anomalies that could predict security breaches.
- Financial Trading: Many people are eager to predict what the stock markets will do on any given day, for obvious reasons, and machine learning algorithms are getting closer all the time. Many prestigious trading firms use proprietary systems to predict and execute trades at high speeds and high volume. And humans can’t possibly compete with machines when it comes to consuming vast quantities of data or the speed with which they can execute a trade.
- Healthcare: Machine learning algorithms can process more information and spot more patterns than their human counterparts. Machine learning can be used to understand risk factors for disease in large populations.
- Marketing Personalization: The more you can understand about your customers, the better you can serve them, and the more you will sell. That’s the foundation behind marketing personalization. Companies can personalize which emails a customer receives, which direct mailings or coupons, which offers they see, which products show up as “recommended” and so on, all designed to lead the consumer more reliably towards a sale.
- Fraud Detection: Machine learning is getting better and better at spotting potential cases of fraud across many different fields. PayPal, for example, is using machine learning to fight money laundering. The company has tools that compare millions of transactions and can precisely distinguish between legitimate and fraudulent transactions between buyers and sellers.
- Recommendations: You’re probably familiar with this use if you use services like Amazon or Netflix. Intelligent machine learning algorithms analyze your activity and compare it to the millions of other users to determine what you might like to buy or binge watch next.
- Online Search: Perhaps the most famous use of machine learning, Google and its competitors are constantly improving what the search engine understands. Every time you execute a search on Google, the program watches how you respond to the results and can then learn from user behavior to deliver a better result in the future.
- Natural Language Processing (NLP): NLP is being used in all sorts of exciting applications across disciplines. Machine learning algorithms that work with natural language can stand in for customer service agents and route customers more quickly to the information they need.
- Smart Cars: A smart car would not only integrate into the Internet of Things, but also learn about its owner and its environment. It might adjust the internal settings — temperature, audio, seat position, etc. — automatically based on the driver, report and even fix problems itself, drive itself, and offer real time advice about traffic and road conditions.
All these examples echo the vital role machine learning has begun to take in today’s data-rich world. Machines can aid in filtering useful pieces of information that help in major advancements, and we are already seeing how this technology is being implemented in a wide variety of industries.
Real Examples of Machine Learning
Want to see some real examples of machine learning in action? Here are some companies that are using the power of machine learning in new and exciting ways…
- Yelp: Yelp’s machine learning algorithms help the company’s human staff to compile, categorize, and label images more efficiently – no small feat when you’re dealing with tens of millions of photos.
- Pinterest: Machine learning touches virtually every aspect of Pinterest’s business operations, from spam moderation and content discovery to advertising monetization and reducing churn of email newsletter subscribers.
- Facebook: Some chat-bots are virtually indistinguishable from humans when conversing via text. ML applications are being used at Facebook to filter out spam and poor-quality content, and the company is also researching computer vision algorithms that can “read” images to visually impaired people.
- Twitter: Twitter’s ML evaluates each tweet in real time and “scores” it according to various metrics. Ultimately, Twitter’s algorithms then display the tweets that are likely to drive the most engagement.
- Google: According to Google, the company is researching “virtually all aspects of machine learning,” which will lead to exciting developments in what Google calls “classical algorithms” as well as other applications including natural language processing, speech translation, and search ranking and prediction systems.
- Baidu: Google isn’t the only search giant that’s branching out into machine learning. Chinese search engine Baidu is also investing heavily in the applications of ML. One of the most interesting (and disconcerting) developments at Baidu’s R&D lab is what the company calls Deep Voice, a deep neural network that can generate entirely synthetic human voices that are very difficult to distinguish from genuine human speech. The network can “learn” the unique subtleties in the cadence, accent, pronunciation and pitch to create eerily accurate recreations of speakers’ voices.
- Salesforce: Salesforce Einstein allows businesses that use Salesforce’s CRM software to analyze every aspect of a customer’s relationship – from initial contact to ongoing engagement touch points – to build much more detailed profiles of customers and identify crucial moments in the sales process. This means much more comprehensive lead scoring, more effective customer service (and happier customers), and more opportunities.
Some Approaches to Machine Learning
- Decision tree learning: Uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item’s target value.
- Association rule learning: A method for discovering interesting relations between variables in large databases.
- Artificial neural networks: An artificial neural network (ANN) learning algorithm, usually called “neural network” (NN), is a learning algorithm that is inspired by the structure and functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation.
- Deep learning: Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of the concept of deep learning which consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
- Inductive logic programming: ILP is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples.
- Support vector machines: SVMs are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
- Clustering: Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar.
- Reinforcement learning: Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states.
- Representation learning: Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions, allowing reconstruction of the inputs coming from the unknown data generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution.
- Similarity and metric learning: In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict if new objects are similar. It is sometimes used in Recommendation systems.
- Genetic algorithms: A search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
- Rule-based machine learning: Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves “rules” to store, manipulate, or apply knowledge.
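To give a feel for one of these approaches, here is a minimal sketch of clustering using k-means, a popular clustering algorithm. The points are invented, squared Euclidean distance plays the role of the "predesignated criterion", and starting from the first k points is a naive initialization chosen for brevity (real implementations usually pick starting centroids more carefully).

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning and re-centering."""
    centroids = points[:k]            # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:              # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the middle of its cluster (keep it if the cluster is empty)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical observations: two loose groups, with no labels attached
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere, so this is unsupervised learning: the algorithm finds the two groups purely from how close the points are to each other.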
Machine learning poses a host of ethical questions. Systems which are trained on data-sets collected with biases may exhibit these biases upon use, thus digitizing cultural prejudices. Responsible collection of data thus is a critical part of machine learning.
Machine learning can inadvertently create a self-fulfilling prophecy. In many applications of machine learning, the decisions you make today affect the training data you collect tomorrow. Once your machine learning system embeds biases into its model, it can continue generating new training data that reinforces those biases. And some biases can ruin people’s lives. Be responsible: don’t create self-fulfilling prophecies.
Resurging interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever: growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage. All of these things mean it’s possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.
With the constant evolution of the field, there has been a subsequent rise in the uses, demands, and importance of machine learning. The highly complex nature of many real-world problems, though, often means that inventing specialized algorithms that will solve them perfectly every time is impractical, if not impossible. The goal of ML is never to make “perfect” guesses, because ML deals in domains where there is no such thing. The goal is to make guesses that are good enough to be useful. Machine Learning provides potential solutions in different domains, and is set to be a pillar of our future civilization. The demand for ML engineers is only going to continue to grow, offering incredible chances to be a part of something big.
I hope this article helped you get acquainted with the basics of machine learning. We would love to hear from you. Did you find it useful? What do you think of machine learning? Where would you like to see it implemented? Let us know in the comments!