
Hung-yi Lee (李宏毅) ML notes

Posted 2025-04-02 | Updated 2025-04-03 | Study

Word count: 388 | Reading time: 1 min

Intro

Training

Function

Model: $y = b + w x_{1}$
where $x_1$ is the input feature, $w$ is the weight, and $b$ is the bias.

Loss function

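Written $L(b,w)$: a function of the parameters that measures how well the model with those values of $b$ and $w$ fits the training data (smaller is better).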
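A minimal sketch of the model and one possible $L(b,w)$, assuming the loss is the mean absolute error $|y - \hat{y}|$ between prediction $y$ and label $\hat{y}$, averaged over a tiny toy dataset. The functions `predict` and `loss` and the data are hypothetical, chosen only for illustration.

```python
def predict(b: float, w: float, x1: float) -> float:
    """Linear model y = b + w * x1."""
    return b + w * x1


def loss(b: float, w: float, xs: list[float], labels: list[float]) -> float:
    """Mean absolute error of the linear model over a dataset (one common choice of L)."""
    errors = [abs(predict(b, w, x) - y_hat) for x, y_hat in zip(xs, labels)]
    return sum(errors) / len(errors)


xs = [1.0, 2.0, 3.0]
labels = [3.0, 5.0, 7.0]  # toy labels generated by y = 1 + 2 * x1
print(loss(1.0, 2.0, xs, labels))  # → 0.0 (perfect fit)
print(loss(0.0, 2.0, xs, labels))  # → 1.0 (every prediction is off by 1)
```

Different $(b, w)$ pairs give different losses; training means searching for the pair with the smallest $L$.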

Optimization

General method: gradient descent

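Pick an initial value $w^0$.
Compute the gradient $\left.\frac{\partial L}{\partial w}\right|_{w = w^0}$.
Update $w$: $w^1 \leftarrow w^0 - \eta\left.\frac{\partial L}{\partial w}\right|_{w = w^0}$, where $\eta$ is the learning rate.
Do the same for $b$, and repeat the updates ($w^1 \to w^2 \to \dots$) so that $L$ keeps decreasing.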
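The steps above, as a minimal sketch for $y = b + wx_1$. It assumes $L$ is the mean squared error (the note does not fix $L$; squared error is used here because its gradient is simple), so $\frac{\partial L}{\partial w} = \frac{2}{N}\sum_n (b + wx_n - \hat{y}_n)x_n$ and $\frac{\partial L}{\partial b} = \frac{2}{N}\sum_n (b + wx_n - \hat{y}_n)$. The learning rate and step count are arbitrary toy values.

```python
def gradient_descent(xs, labels, eta=0.05, steps=2000, w0=0.0, b0=0.0):
    """Fit y = b + w * x1 by gradient descent on the mean squared error."""
    w, b = w0, b0  # starting point (w^0, b^0)
    n = len(xs)
    for _ in range(steps):
        residuals = [b + w * x - y for x, y in zip(xs, labels)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))  # dL/dw at the current w
        grad_b = 2.0 / n * sum(residuals)                             # dL/db at the current b
        # Update both parameters using gradients computed at the old (w, b)
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b


w, b = gradient_descent([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(round(w, 3), round(b, 3))  # → 2.0 1.0
```

If $\eta$ is too large the updates overshoot and diverge; if it is too small convergence is slow, which is why $\eta$ is treated as a hyperparameter.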

More sophisticated models

Combining multiple models

Summing linear pieces

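Adding a constant and several piecewise-linear pieces (each flat, then a linear ramp, then flat again) produces a more complex piecewise-linear curve that can fit the data.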
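A minimal sketch of the idea: one ramp-shaped piece (often called a hard sigmoid) and a curve built as a constant plus two such pieces. The breakpoints and heights are hypothetical toy values; with more pieces the same construction can follow any piecewise-linear curve.

```python
def hard_sigmoid(x: float, start: float, end: float, height: float) -> float:
    """0 before `start`, rises linearly to `height` between `start` and `end`, flat afterward."""
    t = (x - start) / (end - start)
    return height * min(max(t, 0.0), 1.0)


def piecewise_linear(x: float) -> float:
    """Constant plus a sum of hard-sigmoid pieces (toy breakpoints and heights)."""
    return 1.0 + hard_sigmoid(x, 0.0, 1.0, 2.0) + hard_sigmoid(x, 2.0, 3.0, -1.5)


for x in [-1.0, 0.5, 1.5, 2.5, 4.0]:
    print(x, piecewise_linear(x))
```

The resulting curve is flat at 1, rises to 3, stays flat, then falls to 1.5: a shape no single linear model can produce.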

Sigmoid func

$\begin{aligned} y &= c\,\frac{1}{1+e^{-(b+wx_{1})}} \\ &= c\,\mathrm{sigmoid}(b+wx_{1}) \end{aligned}$

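Replacing each piecewise-linear piece with a sigmoid (a smooth version of the same shape) and summing enough of them can approximate essentially any continuous curve. With multiple features this gives the final model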

$y = b+\sum_{i} c_{i}\,\mathrm{sigmoid}\left( b_{i}+\sum_{j}w_{ij}x_{j} \right)$

In vector form:

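$y = b+\mathbf{c}^{T} \sigma(\mathbf{b}+W\mathbf{x})$
where $b$ is a scalar, $\mathbf{b}$ and $\mathbf{c}$ are vectors with entries $b_i$ and $c_i$, and $W$ is the matrix of weights $w_{ij}$.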

$j$: index over the input features $x_j$
$i$: index over the sigmoids
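A minimal pure-Python sketch of the forward pass $y = b + \mathbf{c}^T\sigma(\mathbf{b} + W\mathbf{x})$, with $W$ stored as one row per sigmoid $i$ and one column per feature $j$. The names `model` and `b_vec` (for the inner vector $\mathbf{b}$) and all parameter values are hypothetical; a real implementation would typically use a tensor library instead of lists.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: list[float], W: list[list[float]], b_vec: list[float],
          c: list[float], b: float) -> float:
    """y = b + c^T sigmoid(b_vec + W x); W has shape (num_sigmoids, num_features)."""
    # r_i = b_i + sum_j w_ij * x_j, one entry per sigmoid i
    r = [b_i + sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row, b_i in zip(W, b_vec)]
    a = [sigmoid(r_i) for r_i in r]               # a = sigma(r), element-wise
    return b + sum(c_i * a_i for c_i, a_i in zip(c, a))  # b + c^T a


# Toy sizes: 3 sigmoids (i), 2 features (j); the values are arbitrary
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
b_vec = [0.0, -1.0, 1.0]
c = [1.0, 2.0, -1.0]
print(model([1.0, 1.0], W, b_vec, c, b=0.5))
```

Here $\mathbf{r} = (0, 0, 3)$, so $y = 0.5 + 1\cdot 0.5 + 2 \cdot 0.5 - \sigma(3) = 2 - \sigma(3)$.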

ReLU func

$y = \max\{0,x\}$
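Two ReLUs can reproduce one piecewise-linear ramp: $\max\{0,x\} - \max\{0,x-1\}$ is 0 for $x \le 0$, equals $x$ on $[0,1]$, and is 1 for $x \ge 1$. So the sigmoids in the model above can be swapped for (pairs of) ReLUs. A minimal sketch; the function names are hypothetical.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def hard_sigmoid_from_relus(x: float) -> float:
    """Ramp from 0 at x = 0 up to 1 at x = 1, built as the difference of two ReLUs."""
    return relu(x) - relu(x - 1.0)


print([hard_sigmoid_from_relus(x) for x in [-1.0, 0.25, 0.5, 2.0]])  # → [0.0, 0.25, 0.5, 1.0]
```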

Loss

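$L = \frac{1}{N} \sum_{n} e_{n}$
where $N$ is the number of training examples and $e_n$ is the error on the $n$-th one, e.g. the absolute error $|y_n - \hat{y}_n|$ between the prediction $y_n$ and the label $\hat{y}_n$.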

Optimization

$\theta^* = \arg \min_{\theta}L$, where $\theta$ collects all unknown parameters ($W$, $\mathbf{b}$, $\mathbf{c}$, $b$).
Gradient vector

$\mathbf{g} = \nabla L(\theta^0)$

$\theta^{1} \leftarrow \theta^{0} - \eta\,\mathbf{g}$, then repeat from $\theta^1$ to get $\theta^2$, and so on.
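The update rule works for any differentiable $L(\theta)$. A minimal sketch that estimates $\mathbf{g} = \nabla L(\theta)$ with central finite differences, chosen here only for simplicity (deep-learning frameworks compute the gradient with backpropagation instead). `toy_loss`, with its minimum at $\theta^* = (3, -1)$, and the helper names are hypothetical.

```python
def numerical_gradient(loss_fn, theta, h=1e-6):
    """Central-difference estimate of the gradient of loss_fn at theta (a list of floats)."""
    g = []
    for k in range(len(theta)):
        plus = theta.copy()
        plus[k] += h
        minus = theta.copy()
        minus[k] -= h
        g.append((loss_fn(plus) - loss_fn(minus)) / (2 * h))
    return g


def gd_step(loss_fn, theta, eta):
    """One update theta^{t+1} = theta^t - eta * g, with g = grad L(theta^t)."""
    g = numerical_gradient(loss_fn, theta)
    return [t - eta * g_k for t, g_k in zip(theta, g)]


def toy_loss(theta):
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2


theta = [0.0, 0.0]  # theta^0
for _ in range(100):
    theta = gd_step(toy_loss, theta, eta=0.1)
print([round(t, 4) for t in theta])  # → [3.0, -1.0]
```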

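In practice, the training set is split into batches: each update computes $L$ and $\mathbf{g}$ on a single batch only, and one pass through all batches is called an epoch.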
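A minimal sketch of mini-batch training for $y = b + wx_1$: the data are reshuffled each epoch (one pass over all batches), cut into batches, and one gradient step is taken per batch using that batch's mean squared error. Shuffling per epoch, the squared-error loss, and all hyperparameters and helper names are assumptions for illustration.

```python
import random


def make_batches(data, batch_size, seed=0):
    """Shuffle a copy of the data, then cut it into consecutive batches (the last may be smaller)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def train(data, batch_size, epochs, eta):
    """Mini-batch gradient descent for y = b + w * x1; data is a list of (x1, label) pairs."""
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        for batch in make_batches(data, batch_size, seed=epoch):  # one update per batch
            n = len(batch)
            residuals = [b + w * x - y for x, y in batch]
            grad_w = 2.0 / n * sum(r * x for r, (x, _) in zip(residuals, batch))
            grad_b = 2.0 / n * sum(residuals)
            w, b = w - eta * grad_w, b - eta * grad_b
    return w, b


data = [(x / 10, 1.0 + 2.0 * (x / 10)) for x in range(20)]  # noiseless toy data from y = 1 + 2 * x1
w, b = train(data, batch_size=5, epochs=500, eta=0.1)
print(w, b)
```

With 20 examples and `batch_size=5`, each epoch performs 4 updates instead of the single update full-batch gradient descent would make.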


Author: Leserein

Post link: http://liuyang2005.github.io/2025/04/02/李宏毅/

Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit Lesereinの小木屋 as the source when reposting!
