## Overview

Machine learning is the use of computers to uncover the meaning hidden in data, converting unorganized data into useful information. It is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, so that their performance keeps improving. It is at the core of artificial intelligence and a fundamental way of making computers intelligent, with applications across every area of AI. It relies mainly on induction and synthesis rather than deduction.

## Significance of machine learning research

Several definitions of machine learning are in common use:

- “Machine learning is a science of artificial intelligence. The main object of study in this field is artificial intelligence, especially how to improve the performance of specific algorithms through learning from experience.”
- “Machine learning is the study of computer algorithms that improve automatically through experience.”
- “Machine learning is the use of data or past experience to optimize a performance criterion of computer programs.”
- “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Machine learning is widely used in areas such as data mining, computer vision, natural language processing, biometrics, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, strategy games, and robotics.

## Machine learning scenario

- Example: recognizing a cat
  - Pattern recognition (predefined standards): people draw on a great deal of experience to reach the conclusion that the animal is a cat.
  - Machine learning (learning from data): people learn by reading and observing that a cat meows and has small eyes, two ears, four legs, and one tail, and use those features to judge that the animal is a cat.
  - Deep learning (learning from data in depth): people come to understand cats by getting to know them, and judge that the animal is a cat because it resembles the cats they already know. (Common application areas of deep learning: speech recognition, image recognition.)

- Pattern recognition: the oldest of the three terms (as a term, it can be considered rather dated).
  - We refer to the environment and the objects in it as “patterns”. Recognition means identifying those patterns, and the goal is to make a computer program do something that looks “smart”.
  - Human expertise and intuition are built into a program so that the program, rather than a person, identifies something, for example recognizing digits.

- Machine learning: the most fundamental of the three (and one of the hot spots for today's startups and research laboratories).
  - In the early 1990s, people began to realize that there was a more efficient way to build pattern recognition algorithms: replace the experts (people with extensive knowledge of images) with data (which can be collected with cheap labor).
  - “Machine learning” emphasizes that once data has been fed to a computer program (or machine), the program must do something with it, namely learn from the data, and the steps of that learning are well defined.
  - Machine learning is the discipline that studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so that their performance keeps improving.

- Deep learning: a very new and influential frontier; we are not yet even thinking about a “post-deep-learning era”.
  - Deep learning is a new field within machine learning research. Its motivation is to build neural networks that simulate how the human brain analyzes and learns. It mimics the mechanisms of the human brain to interpret data such as images, sound, and text.

- Search engines: your clicks on search results are used to improve your next search. Machine learning helps the search engine decide which results suit you best (and which ads to show you).
- Spam filtering: spam emails are automatically moved to the trash.
- Supermarket coupons: when you buy diapers for a child, the cashier hands you a coupon for six cans of beer.
- Postal mail: handwriting recognition software automatically reads the address a greeting card is being sent to.
- Loan applications: your recent financial activity is assessed as a whole to determine whether you are eligible.

## Machine learning composition

### Main tasks

- Classification: assign each data instance to the appropriate category.
  - Application examples: deciding whether a website has been hacked (binary classification), automatic recognition of handwritten digits (multi-class classification)

- Regression: mainly used to predict numerical data.
  - Application examples: forecasting stock price movements, predicting housing prices, etc.
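
The difference between the two tasks shows up in the type of the output. The following is only a minimal sketch on made-up numbers: `classify_nearest` and `fit_line` are hypothetical helper names, and a nearest-neighbour lookup and a least-squares line are just two simple stand-ins for a classifier and a regressor.

```python
# Toy illustration (hypothetical data): classification returns a category,
# regression returns a number.

def classify_nearest(train, query):
    """1-nearest-neighbour classifier: return the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]

def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Classification: requests per second -> "normal" or "attack" (made-up values).
labelled = [(5, "normal"), (8, "normal"), (120, "attack"), (150, "attack")]
print(classify_nearest(labelled, 100))   # a discrete label

# Regression: floor area -> price (made-up values, price is exactly 3 * area).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
a, b = fit_line(areas, prices)
print(a * 90 + b)                        # a continuous value
```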

### Supervised learning

- The value of the target variable must be known so that the machine learning algorithm can discover the relationship between the features and the target variable. In supervised learning we are given a data set for which we already know what the correct output looks like, and we assume there is a specific relationship between input and output. (Supervised learning includes classification and regression.)
- Sample set: training data + test data
- Training sample = features + target variable (label: a discrete value for classification, a continuous value for regression)
- Features are usually the columns of the training sample set, each obtained by an independent measurement.
- Target variable: the result that the machine learning algorithm predicts.
- In classification algorithms the target variable is usually nominal (such as true or false), while in regression algorithms it is usually continuous (e.g., 1 to 100).
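
To make the vocabulary concrete, here is a small sketch under stated assumptions: the data values are invented, and the learner is a decision stump (a single learned threshold), chosen only because it is short. Each sample pairs a feature tuple with a known target; the model is learned from the training data and then checked on the test data.

```python
# Each sample is (features, target). The target here is nominal (True/False),
# so this is a classification task. All values are made up for illustration.
training_data = [
    ((2.0,), False), ((3.5,), False), ((4.0,), False),
    ((6.5,), True), ((7.0,), True), ((9.0,), True),
]
test_data = [((1.0,), False), ((5.0,), False), ((8.0,), True)]

def train_stump(samples):
    """Learn a threshold on feature 0: predict True when the feature exceeds it."""
    values = sorted(features[0] for features, _ in samples)
    # Candidate thresholds lie midway between neighbouring feature values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

    def errors(threshold):
        return sum((features[0] > threshold) != target for features, target in samples)

    # Keep the candidate that misclassifies the fewest training samples.
    return min(candidates, key=errors)

def accuracy(threshold, samples):
    """Fraction of samples whose predicted target matches the known target."""
    hits = sum((features[0] > threshold) == target for features, target in samples)
    return hits / len(samples)

threshold = train_stump(training_data)
print("learned threshold:", threshold)
print("test accuracy:", accuracy(threshold, test_data))
```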

- Issues to watch for in supervised learning:
  - The bias-variance tradeoff
  - The complexity of the function and the amount of training data
  - The dimensionality of the input space
  - Noise in the output values

**Knowledge representation:**

- As a rule set [for example: a mathematics score above 90 is excellent]
- As a probability distribution [for example: the score distribution shows that 90% of students score below 70 in mathematics, so a score above 70 is considered excellent]
- As instances from the training sample set [for example: from the sample set we learn that someone who is young, scores high in mathematics, and speaks elegantly is considered excellent]
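
The first two representations can be contrasted in a few lines. This is only an illustrative sketch: the scores are invented, and `excellent_by_rule` and `cutoff_from_distribution` are hypothetical names. The rule is written by hand, while the cutoff is derived from the observed distribution of scores.

```python
import math

# Made-up mathematics scores for ten students.
scores = [45, 52, 58, 60, 61, 63, 65, 66, 68, 95]

def excellent_by_rule(score, cutoff=90):
    """Rule-set form: a fixed, hand-written rule."""
    return score > cutoff

def cutoff_from_distribution(all_scores, percent=90):
    """Distribution form: the smallest score that `percent`% of students do not exceed."""
    ordered = sorted(all_scores)
    index = math.ceil(percent * len(ordered) / 100) - 1
    return ordered[index]

cutoff = cutoff_from_distribution(scores)
print("rule says 95 is excellent:", excellent_by_rule(95))
print("cutoff learned from the data:", cutoff)
print("excellent by distribution:", [s for s in scores if s > cutoff])
```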

### Unsupervised learning

- In machine learning, unsupervised learning tries to find hidden structure in unlabeled data. Because the examples given to the learner are unlabeled, there is no error or reward signal with which to evaluate a candidate solution.
- Unsupervised learning is closely related to the problem of density estimation in statistics. However, it also covers techniques that seek to summarize and explain the main features of the data. Many unsupervised methods are based on data mining methods used to process data.
- The data has no category information, and no target value is given.
- Types of unsupervised learning:
  - Clustering: the process of dividing a data set into multiple classes of similar objects.
  - Density estimation: estimate how similar a sample is to each group from how tightly the samples are distributed.
- In addition, unsupervised learning can reduce the dimensionality of the data's features so that the data can be displayed more intuitively in 2D or 3D plots.
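
Clustering can be sketched with k-means, one common algorithm that the text above does not name; it is used here only as an example. The points are invented, and seeding the centroids with the first `k` points is a simplifying assumption, not a recommended initialization.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Unlabeled, made-up points that visibly form two groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No labels are supplied anywhere: the grouping comes entirely from the distances between the points.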

### Reinforcement learning

This kind of algorithm trains a program to make decisions. In a given situation the program tries the possible actions, records the result of each, and looks for the best one on which to base its decision. Problems of this type are usually modeled as a Markov decision process.
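
The try-record-choose loop can be sketched as follows. This is a deliberately simplified, one-step setting: actions do not change the situation, so it does not show a full Markov decision process. The environment, its reward table, and the names `take_action` and `learn_policy` are all hypothetical.

```python
import random

# Hypothetical environment: the reward depends on the situation and the action.
TRUE_REWARD = {
    ("red_light", "stop"): 1.0, ("red_light", "go"): -1.0,
    ("green_light", "stop"): -0.2, ("green_light", "go"): 1.0,
}
ACTIONS = ["stop", "go"]

def take_action(state, action, rng):
    """Return a noisy reward for performing `action` in `state`."""
    return TRUE_REWARD[(state, action)] + rng.gauss(0, 0.1)

def learn_policy(states, trials=50, seed=0):
    """Try every action in every state, record the rewards, keep the best action."""
    rng = random.Random(seed)
    policy = {}
    for state in states:
        average = {}
        for action in ACTIONS:
            rewards = [take_action(state, action, rng) for _ in range(trials)]
            average[action] = sum(rewards) / trials
        policy[state] = max(average, key=average.get)
    return policy

policy = learn_policy(["red_light", "green_light"])
print(policy)
```

The program is never told which action is correct; it only sees the rewards that follow its own attempts.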

## Machine learning basics

### Division of data sets

- Training set – The sample data used for learning: a model is built by fitting its parameters, so this set is mainly used to train the model. By analogy, it is the practice problems you work through before the postgraduate entrance exam.
- Validation set – Used to tune the learned model, for example to choose the number of hidden units in a neural network. The validation set is also used to settle the network structure or the parameters that control model complexity. By analogy, it is the mock exam before the real one.
- Test set – Used to test how well the trained model performs. By analogy, it is the real exam, the one that counts.
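
A split like this can be sketched in a few lines. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions made for illustration; the text does not prescribe any particular ratio.

```python
import random

def split_dataset(samples, train=0.6, validation=0.2, seed=42):
    """Shuffle the samples, then cut them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    training_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]   # whatever remains
    return training_set, validation_set, test_set

training_set, validation_set, test_set = split_dataset(range(100))
print(len(training_set), len(validation_set), len(test_set))
```

Shuffling before cutting matters when the original data is ordered, and the three sets never share a sample.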

### Model fit

- Underfitting: the model fails to capture the features of the data and does not fit the data well; the general properties of the training samples have not been learned. By analogy: you skimmed the book without working any problems and felt you knew everything, only to discover in the exam room that you did not.
- Overfitting: the model learns the training samples “too well”. It may mistake quirks of the training samples for general properties of all potential samples, which reduces its ability to generalize. By analogy: you got every end-of-chapter exercise right and even treated out-of-syllabus questions as likely exam questions, yet you still could not cope with the actual exam.

In short, each can be summed up in one phrase. Underfitting: “You are too naive!” Overfitting: “You think too much!”
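
The two failure modes can be seen by comparing training error with test error. The sketch below uses invented data (`y = 2x` plus noise) and three stand-in models chosen for brevity: a constant predictor that underfits, a memorizer that repeats the nearest training answer and so overfits, and a least-squares line as a reasonable middle ground.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

train, test = make_data(20), make_data(20)

def mse(predict, data):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: ignore x and always predict the mean training target.
mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)

def too_simple(x):
    return mean_y

# Overfitting: memorize the training set and repeat the nearest stored answer.
def memoriser(x):
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# A reasonable fit: least-squares straight line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))

def line(x):
    return mean_y + slope * (x - mean_x)

for name, model in [("too simple", too_simple), ("memoriser", memoriser), ("line", line)]:
    print(f"{name:10s} train MSE={mse(model, train):.2f}  test MSE={mse(model, test):.2f}")
```

The memorizer reaches zero error on the training set because it reproduces the noise exactly, which is the signature of overfitting; the constant predictor has a large error even on the data it was trained on, which is the signature of underfitting.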

### Common model indicators

- Precision (called the “correct rate” in some translations) – number of correct items extracted / total number of items extracted
- Recall – number of correct items extracted / number of relevant items in the sample
- F value – precision * recall * 2 / (precision + recall) (the F value is the harmonic mean of precision and recall)

Here is an example: a pond contains 1,400 squid, 300 shrimp, and 300 turtles, and the goal is to catch squid. One cast of the net catches 700 squid, 200 shrimp, and 100 turtles. The indicators are then:

- Precision (correct rate) = 700 / (700 + 200 + 100) = 70%
- Recall = 700 / 1400 = 50%
- F value = 70% * 50% * 2 / (70% + 50%) = 58.3%
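
The same arithmetic can be checked in code. This is a minimal sketch; the function name `precision_recall_f` and its parameter names are chosen here for illustration.

```python
def precision_recall_f(correct_extracted, total_extracted, relevant_in_sample):
    """Compute precision, recall, and their harmonic mean (the F value)."""
    precision = correct_extracted / total_extracted
    recall = correct_extracted / relevant_in_sample
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Numbers from the pond example: the target is squid.
caught_squid = 700
total_caught = 700 + 200 + 100
squid_in_pond = 1400

p, r, f = precision_recall_f(caught_squid, total_caught, squid_in_pond)
print(f"precision={p:.1%} recall={r:.1%} F={f:.1%}")
# prints: precision=70.0% recall=50.0% F=58.3%
```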

### Models

- Classification problem – Put simply, it assigns data of unknown category to one of the currently known categories, for example judging from information about you whether you are rich and handsome or poor. Classification quality is judged by the three indicators described above: precision (correct rate), recall, and F value.
- Regression problem – A supervised learning task that predicts and models continuous numerical variables. A regression model is usually evaluated by computing its error.
- Clustering problem – Clustering is an unsupervised learning task that finds the natural groups (i.e., clusters) among the observed samples based on the internal structure of the data. Clustering is generally evaluated by distance: intra-cluster distance and inter-cluster distance. The smaller the intra-cluster distance the better, meaning the elements within a cluster are more similar; the larger the inter-cluster distance the better, meaning the elements of different clusters are more different. In general, a clustering metric is a formula that combines the intra-cluster and inter-cluster distances.
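
The two clustering distances can be computed directly. The sketch below makes several assumptions: the clusters are invented, intra-cluster distance is taken as the mean pairwise distance, inter-cluster distance is taken between centroids, and the final ratio is just one simple way of combining the two, not a standard named metric.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(cluster):
    """Average Euclidean distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def inter_cluster_distance(cluster_a, cluster_b):
    """Euclidean distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

# Two made-up clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra = (mean_intra_cluster_distance(cluster_a) + mean_intra_cluster_distance(cluster_b)) / 2
inter = inter_cluster_distance(cluster_a, cluster_b)
print("intra-cluster distance:", round(intra, 3))
print("inter-cluster distance:", round(inter, 3))
print("inter / intra (larger is better):", round(inter / intra, 3))
```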

### A few notes on feature engineering

- Feature selection – Also called feature subset selection (FSS). It means selecting N features from the existing M features so as to optimize a specific indicator of the system. It picks the most effective of the original features in order to reduce the dimensionality of the data set, is an important means of improving algorithm performance, and is a key data preprocessing step in pattern recognition.
- Feature extraction – A concept from computer vision and image processing. It means using a computer to extract image information and decide whether each point of the image belongs to an image feature. The result is a partition of the image's points into subsets, which typically form isolated points, continuous curves, or continuous regions.
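
Selecting N of M features can be sketched with a simple filter: score each feature against the target and keep the top N. Ranking by absolute Pearson correlation is an assumption made here for illustration (many other scoring rules exist), and the housing data and the names `pearson` and `select_features` are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

def select_features(columns, target, n):
    """Keep the n feature names whose absolute correlation with the target is highest."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:n]

# M = 3 made-up features for five houses; the target is the price.
columns = {
    "area": [50, 80, 100, 120, 150],
    "rooms": [2, 3, 3, 4, 5],
    "door_number": [7, 3, 9, 1, 5],
}
price = [150, 240, 310, 350, 460]

print(select_features(columns, price, 2))   # keep N = 2 features
```

The door number carries almost no information about the price, so it is the feature that gets dropped.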

**Machine learning terminology**

- Model: what the computer has learned from the data (its “cognition” of the data)
- Learning algorithm: a method for producing a model from data
- Data set: a collection of records
- Instance: a description of a single object
- Sample: another name for an instance
- Attribute: a property that reflects some aspect of an object's behavior or characteristics
- Feature: same as attribute
- Attribute value: the value an attribute takes
- Attribute space: the space spanned by the attributes
- Sample space: same as attribute space
- Feature vector: each instance corresponds to a point in the attribute space, and the coordinate vector of that point is called the instance's feature vector.
- Dimensionality: the number of attributes that describe a sample (that is, the number of dimensions of the space)
- Learning/training: learning from data
- Training data: data used during training
- Training sample: each sample used for training
- Training set: A collection of training samples
- Hypothesis: the learned model corresponds to some underlying rule about the data; that rule is called a hypothesis.
- Ground truth: the underlying rule that actually holds
- Learner: another name for a model; an instantiation of a learning algorithm on given data and a given parameter space.
- Prediction: inferring a property of something with the learned model
- Label: information about the outcome of an instance, such as “a good guy”
- Labeled example: an instance that comes with a label
- Label space: the set of all labels
- Classification: a learning task whose prediction is a discrete value, such as dividing people into good people and bad people.
- Regression: a learning task whose prediction is a continuous value, for example rating how good a person is as 0.9 or 0.6.
- Binary classification: a classification task involving only two categories
- Positive class: one of the two categories
- Negative class: the other of the two categories
- Multi-class classification: classification involving more than two categories
- Testing: the process of using the learned model to make predictions on samples
- Test sample: the sample being predicted
- Clustering: dividing the objects in the training set into groups
- Cluster: each such group is called a cluster
- Supervised learning: typical paradigms are classification and regression
- Unsupervised learning: the typical paradigm is clustering
- Unseen instance: a “new sample” that was not used in training
- Generalization ability: the ability of the learned model to apply to new samples
- Distribution: the rule that all samples in the sample space follow
- Independent and identically distributed (i.i.d.): every sample is drawn independently from this same distribution.

### Machine learning mathematics foundation

- Calculus
- Statistics/probability theory
- Linear algebra

### Python language

- Executable pseudocode
- Why Python is popular: it is widely used, has plenty of code examples and a rich set of libraries, and keeps development cycles short.
- Strengths of the Python language: clear, concise, and easy to understand
- Weaknesses of the Python language: the main drawback is performance
- Python-related libraries
  - Scientific computing libraries: `SciPy`, `NumPy` (underlying languages: C and Fortran)
  - Plotting library: `Matplotlib`
  - Data analysis library: `Pandas`

**Mathematical tools**

- MATLAB