Applications(4)

2023-11-12

Other Applications

  • In this section we cover a few other types of applications of deep learning that are different from the standard object recognition, speech recognition and natural language processing tasks discussed above. Part III of this book will expand that scope even further to tasks that remain primarily research areas.

Recommender Systems

  • One of the major families of applications of machine learning in the information technology sector is the ability to make recommendations of items to potential users or customers. Two major types of applications can be distinguished: online advertising and item recommendations (often these recommendations are still for the purpose of selling a product). Both rely on predicting the association between a user and an item, either to predict the probability of some action (the user buying the product, or some proxy for this action) or the expected gain (which may depend on the value of the product) if an ad is shown or a recommendation is made regarding that product to that user.

  • Often, this association problem is handled like a supervised learning problem: given some information about the item and about the user, predict the proxy of interest (user clicks on ad, user enters a rating, user clicks on a “like” button, user buys product, user spends some amount of money on the product, user spends time visiting a page for the product, etc). This often ends up being either a regression problem (predicting some conditional expected value) or a probabilistic classification problem (predicting the conditional probability of some discrete event).

  • The early work on recommender systems relied on minimal information as inputs for these predictions: the user ID and the item ID. In this context, the only way to generalize is to rely on the similarity between the patterns of values of the target variable for different users or for different items. Suppose that user 1 and user 2 both like items A, B and C. From this, we may infer that user 1 and user 2 have similar tastes. If user 1 likes item D, then this should be a strong cue that user 2 will also like D. Algorithms based on this principle come under the name of collaborative filtering.
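The collaborative-filtering intuition above can be sketched with a toy nearest-neighbor recommender. The preference data, the Jaccard similarity measure, and the helper names are all illustrative assumptions, not anything prescribed by the text:

```python
# Toy nearest-neighbor collaborative filtering over sets of liked items.
likes = {
    "user1": {"A", "B", "C", "D"},
    "user2": {"A", "B", "C"},
    "user3": {"E", "F"},
}

def jaccard(s, t):
    """Similarity between two sets of liked items."""
    return len(s & t) / len(s | t)

def recommend(user):
    """Suggest items liked by the most similar other user but not yet by `user`."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return likes[nearest] - likes[user]
```

Here user2's preferences overlap strongly with user1's, so recommending to user2 surfaces item D, exactly the inference described above.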

  • Both non-parametric approaches (such as nearest-neighbor methods based on the estimated similarity between patterns of preferences) and parametric methods are possible. Parametric methods often rely on learning a distributed representation (also called an embedding) for each user and for each item.

  • Bilinear prediction of the target variable (such as a rating) is a simple parametric method that is highly successful and often found as a component of state-of-the-art systems. The prediction is obtained by the dot product between the user embedding and the item embedding (possibly corrected by constants that depend only on either the user ID or the item ID). Let $\hat{\boldsymbol{R}}$ be the matrix containing our predictions, $\boldsymbol{A}$ a matrix with user embeddings in its rows and $\boldsymbol{B}$ a matrix with item embeddings in its columns. Let $\boldsymbol{b}$ and $\boldsymbol{c}$ be vectors that contain respectively a kind of bias for each user (representing how grumpy or positive that user is in general) and for each item (representing its general popularity). The bilinear prediction is thus obtained as follows:
    $$\hat{R}_{u,i} = b_u + c_i + \sum_j A_{u,j} B_{j,i}$$

  • Typically one wants to minimize the squared error between predicted ratings $\hat{R}_{u,i}$ and actual ratings $R_{u,i}$. User embeddings and item embeddings can then be conveniently visualized when they are first reduced to a low dimension (two or three), or they can be used to compare users or items against each other, just like word embeddings.
    One way to obtain these embeddings is by performing a singular value decomposition of the matrix $\boldsymbol{R}$ of actual targets (such as ratings). This corresponds to factorizing $\boldsymbol{R} = \boldsymbol{U}\boldsymbol{D}\boldsymbol{V}'$ (or a normalized variant) into the product of two lower-rank factors, $\boldsymbol{A} = \boldsymbol{U}\boldsymbol{D}$ and $\boldsymbol{B} = \boldsymbol{V}'$. One problem with the SVD is that it treats the missing entries in an arbitrary way, as if they corresponded to a target value of 0. Instead we would like to avoid paying any cost for the predictions made on missing entries. Fortunately, the sum of squared errors on the observed ratings can also be easily minimized by gradient-based optimization. The SVD and the bilinear prediction above both performed very well in the competition for the Netflix prize (Bennett and Lanning, 2007), which aimed at predicting ratings for films based only on previous ratings by a large set of anonymous users. Many machine learning experts participated in this competition, which took place between 2006 and 2009. It raised the level of research in recommender systems using advanced machine learning and yielded improvements in recommender systems. Even though it did not win by itself, the simple bilinear prediction or SVD was a component of the ensemble models presented by most of the competitors, including the winners (Töscher et al., 2009; Koren, 2009).
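The gradient-based alternative mentioned above, minimizing squared error only on the observed entries, can be sketched as follows. The toy ratings matrix, learning rate, and iteration count are assumptions for illustration:

```python
import numpy as np

# Factorize a ratings matrix with missing entries (NaN) by gradient descent on
# the squared error of the OBSERVED entries only, so that, unlike a plain SVD,
# missing entries incur no cost.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0]])
mask = ~np.isnan(R)                     # True where a rating was observed
R0 = np.where(mask, R, 0.0)             # zeros are ignored thanks to the mask

k, lr = 2, 0.01
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(R.shape[0], k))   # user embeddings (rows)
B = 0.1 * rng.normal(size=(k, R.shape[1]))   # item embeddings (columns)

for _ in range(5000):
    E = mask * (R0 - A @ B)             # residual on observed entries only
    A += lr * E @ B.T                   # gradient step for user embeddings
    B += lr * A.T @ E                   # gradient step for item embeddings

# Maximum absolute error on the observed ratings after training.
err = np.abs(mask * (R0 - A @ B)).max()
```

With a rank-2 factorization and only six observed entries, the fit on the observed ratings becomes tight while the missing entries are simply predicted by the learned factors.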

  • Beyond these bilinear models with distributed representations, one of the first uses of neural networks for collaborative filtering was based on the restricted Boltzmann machine (RBM), an undirected probabilistic model (Salakhutdinov et al., 2007). RBMs were an important element of the ensemble of methods that won the Netflix competition (Töscher et al., 2009; Koren, 2009). More advanced variants on the idea of factorizing the ratings matrix have also been explored in the neural networks community (Salakhutdinov and Mnih, 2008).

  • However, there is a basic limitation of collaborative filtering systems: when a new item or a new user is introduced, its lack of rating history means that there is no way to evaluate its similarity to other items or users (respectively), or the degree of association between, say, that new user and existing items. This is called the problem of cold-start recommendations. A general way of solving the cold-start recommendation problem is to introduce extra information about the individual users and items. For example, this extra information could be user profile information or features of each item. Systems that use such information are called content-based recommender systems. The mapping from a rich set of user features or item features to an embedding can be learned through a deep learning architecture (Huang et al., 2013; Elkahky et al., 2015).

  • Specialized deep learning architectures such as convolutional networks have also been applied to learn to extract features from rich content such as musical audio tracks, for music recommendation (van den Oord et al., 2013). In that work, the convolutional net takes acoustic features as input and computes an embedding for the associated song. The dot product between this song embedding and the embedding for a user is then used to predict whether a user will listen to the song.

Exploration Versus Exploitation

  • When making recommendations to users, an issue arises that goes beyond ordinary supervised learning and into the realm of reinforcement learning. Many recommendation problems are most accurately described theoretically as contextual bandits (Langford and Zhang, 2008; Lu et al., 2010). The issue is that when we use the recommendation system to collect data, we get a biased and incomplete view of the preferences of users: we only see the responses of users to the items they were recommended and not to the other items. In addition, in some cases we may not get any information on users for whom no recommendation has been made (for example, with ad auctions, it may be that the price proposed for an ad was below a minimum price threshold, or does not win the auction, so the ad is not shown at all). More importantly, we get no information about what outcome would have resulted from recommending any of the other items. This would be like training a classifier by picking one class $\hat{y}$ for each training example $\boldsymbol{x}$ (typically the class with the highest probability according to the model) and then only getting as feedback whether this was the correct class or not. Clearly, each example conveys less information than in the supervised case, where the true label $y$ is directly accessible, so more examples are necessary.
    Worse, if we are not careful, we could end up with a system that continues picking the wrong decisions even as more and more data is collected, because the correct decision initially had a very low probability: until the learner picks that correct decision, it does not learn about the correct decision. This is similar to the situation in reinforcement learning where only the reward for the selected action is observed.

  • In general, reinforcement learning can involve a sequence of many actions and many rewards. The bandits scenario is a special case of reinforcement learning, in which the learner takes only a single action and receives a single reward. The bandit problem is easier in the sense that the learner knows which reward is associated with which action. In the general reinforcement learning scenario, a high reward or a low reward might have been caused by a recent action or by an action in the distant past. The term contextual bandits refers to the case where the action is taken in the context of some input variable that can inform the decision. For example, we at least know the user identity, and we want to pick an item. The mapping from context to action is also called a policy. The feedback loop between the learner and the data distribution (which now depends on the actions of the learner) is a central research issue in the reinforcement learning and bandits literature.

  • Reinforcement learning requires choosing a tradeoff between exploration and exploitation. Exploitation refers to taking actions that come from the current, best version of the learned policy: actions that we know will achieve a high reward.
    Exploration refers to taking actions specifically in order to obtain more training data. If we know that given context $\boldsymbol{x}$, action $a$ gives us a reward of 1, we do not know whether that is the best possible reward. We may want to exploit our current policy and continue taking action $a$ in order to be relatively sure of obtaining a reward of 1.
    However, we may also want to explore by trying action $a'$. We do not know what will happen if we try action $a'$. We hope to get a reward of 2, but we run the risk of getting a reward of 0. Either way, we at least gain some knowledge.
    Exploration can be implemented in many ways, ranging from occasionally taking random actions intended to cover the entire space of possible actions, to model-based approaches that compute a choice of action based on its expected reward and the model’s amount of uncertainty about that reward.
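The simplest of the strategies just mentioned, occasionally taking a random action, is commonly called epsilon-greedy. The sketch below invents a two-action bandit whose reward probabilities are hidden from the learner; all names and numbers are illustrative assumptions:

```python
import random

# Epsilon-greedy on a two-armed bandit: with probability epsilon, explore by
# picking a random action; otherwise exploit the action with the best running
# mean reward estimate. Only the reward of the chosen action is ever observed.
def run_bandit(epsilon, steps=10000, seed=0):
    rng = random.Random(seed)
    reward_prob = {"a": 0.3, "a_prime": 0.7}          # hidden from the learner
    counts = {action: 0 for action in reward_prob}
    values = {action: 0.0 for action in reward_prob}  # running mean rewards
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(sorted(reward_prob))  # explore
        else:
            action = max(values, key=values.get)      # exploit
        reward = 1 if rng.random() < reward_prob[action] else 0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total += reward
    return values, total

values, total = run_bandit(epsilon=0.1)
```

With enough exploration, the estimate for a_prime approaches its true mean of 0.7, and the greedy choice settles on the better action; with epsilon = 0, the learner can lock onto the inferior action forever, which is exactly the failure mode described above.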

  • Many factors determine the extent to which we prefer exploration or exploitation.
    One of the most prominent factors is the time scale we are interested in.
    If the agent has only a short amount of time to accrue reward, then we prefer more exploitation.
    If the agent has a long time to accrue reward, then we begin with more exploration so that future actions can be planned more effectively with more knowledge.
    As time progresses and our learned policy improves, we move toward more exploitation.

  • Supervised learning has no tradeoff between exploration and exploitation, because the supervision signal always specifies which output is correct for each input. There is no need to try out different outputs to determine if one is better than the model’s current output; we always know that the label is the best output.
    Another difficulty arising in the context of reinforcement learning, besides the exploration-exploitation trade-off, is the difficulty of evaluating and comparing different policies. Reinforcement learning involves interaction between the learner and the environment. This feedback loop means that it is not straightforward to evaluate the learner’s performance using a fixed set of test set input values. The policy itself determines which inputs will be seen. Dudik et al. (2011) present techniques for evaluating contextual bandits.

Knowledge Representation, Reasoning and Question Answering

  • Deep learning approaches have been very successful in language modeling, machine translation and natural language processing due to the use of embeddings for symbols (Rumelhart et al., 1986a) and words (Deerwester et al., 1990; Bengio et al., 2001). These embeddings represent semantic knowledge about individual words and concepts. A research frontier is to develop embeddings for phrases and for relations between words and facts. Search engines already use machine learning for this purpose but much more remains to be done to improve these more advanced representations.

Knowledge, Relations and Question Answering

  • One interesting research direction is determining how distributed representations can be trained to capture the relations between two entities. These relations allow us to formalize facts about objects and how objects interact with each other.

  • In mathematics, a binary relation is a set of ordered pairs of objects. Pairs that are in the set are said to have the relation, while those that are not in the set do not.
    For example, we can define the relation “is less than” on the set of entities $\{1, 2, 3\}$ by defining the set of ordered pairs $\mathbb{S} = \{(1,2), (1,3), (2,3)\}$. Once this relation is defined, we can use it like a verb. Because $(1,2) \in \mathbb{S}$, we say that 1 is less than 2. Because $(2,1) \notin \mathbb{S}$, we can not say that 2 is less than 1. Of course, the entities that are related to one another need not be numbers. We could define a relation is_a_type_of containing tuples like (dog, mammal).
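The set-of-ordered-pairs definition translates directly into code; only the `related` helper name is an invention here:

```python
# A binary relation is literally a set of ordered pairs.
less_than = {(1, 2), (1, 3), (2, 3)}

def related(pair, relation):
    """Use the relation 'like a verb': True iff the pair is in the set."""
    return pair in relation

# (1, 2) is in the set, so 1 is less than 2; (2, 1) is not, so 2 is not less than 1.
# The entities need not be numbers:
is_a_type_of = {("dog", "mammal"), ("cat", "mammal")}
```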

  • In the context of AI, we think of a relation as a sentence in a syntactically simple and highly structured language. The relation plays the role of a verb, while two arguments to the relation play the role of its subject and object. These sentences take the form of a triplet of tokens (subject, verb, object),
    with values (entity$_i$, relation$_j$, entity$_k$).

  • We can also define an attribute, a concept analogous to a relation but taking only one argument: (entity$_i$, attribute$_j$).
    For example, we could define the has_fur attribute, and apply it to entities like dog.

  • Many applications require representing relations and reasoning about them. How should we best do this within the context of neural networks?

  • Machine learning models of course require training data. We can infer relations between entities from training datasets consisting of unstructured natural language. There are also structured databases that identify relations explicitly. A common structure for these databases is the relational database, which stores this same kind of information, albeit not formatted as three-token sentences. When a database is intended to convey commonsense knowledge about everyday life or expert knowledge about an application area to an artificial intelligence system, we call the database a knowledge base. Knowledge bases range from general ones like Freebase, OpenCyc, WordNet, or Wikibase to more specialized knowledge bases like GeneOntology. Representations for entities and relations can be learned by considering each triplet in a knowledge base as a training example and maximizing a training objective that captures their joint distribution (Bordes et al., 2013a).

  • In addition to training data, we also need to define a model family to train. A common approach is to extend neural language models to model entities and relations. Neural language models learn a vector that provides a distributed representation of each word. They also learn about interactions between words, such as which word is likely to come after a sequence of words, by learning functions of these vectors. We can extend this approach to entities and relations by learning an embedding vector for each relation. In fact, the parallel between modeling language and modeling knowledge encoded as relations is so close that researchers have trained representations of such entities by using both knowledge bases and natural language sentences (Bordes et al., 2011, 2012; Wang et al., 2014a) or combining data from multiple relational databases (Bordes et al., 2013b). Many possibilities exist for the particular parametrization associated with such a model. Early work on learning about relations between entities (Paccanaro and Hinton, 2000) posited highly constrained parametric forms (“linear relational embeddings”), often using a different form of representation for the relation than for the entities. For example, Paccanaro and Hinton (2000) and Bordes et al. (2011) used vectors for entities and matrices for relations, with the idea that a relation acts like an operator on entities. Alternatively, relations can be considered as any other entity (Bordes et al., 2012), allowing us to make statements about relations, but more flexibility is put in the machinery that combines them in order to model their joint distribution.
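One hypothetical parametrization, loosely in the spirit of the translation-based embedding models of Bordes et al. (2013a), treats a relation vector as a translation from the subject embedding toward the object embedding. All names and embedding values below are toy stand-ins, not a reproduction of any published model:

```python
import numpy as np

# Score a (subject, relation, object) triplet by how well the relation vector
# translates the subject embedding onto the object embedding.
rng = np.random.default_rng(0)
k = 4
entities = {"dog": rng.normal(size=k), "mammal": rng.normal(size=k)}
relations = {"is_a_type_of": rng.normal(size=k)}

def score(subject, relation, obj):
    """Higher (closer to zero) means the triplet is judged more plausible."""
    diff = entities[subject] + relations[relation] - entities[obj]
    return -float(np.linalg.norm(diff))
```

In a real system these embeddings would be trained so that observed triplets score higher than unobserved ones; here they are random, so the scorer only illustrates the shape of the computation.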

  • A practical short-term application of such models is link prediction: predicting missing arcs in the knowledge graph. This is a form of generalization to new facts based on old facts. Most of the knowledge bases that currently exist have been constructed through manual labor, which tends to leave many, probably even the majority, of true relations absent from the knowledge base. See Wang et al. (2014b), Lin et al. (2015) and Garcia-Duran et al. (2015) for examples of such an application.

  • Evaluating the performance of a model on a link prediction task is difficult because we have only a dataset of positive examples (facts that are known to be true). If the model proposes a fact that is not in the dataset, we are unsure whether the model has made a mistake or discovered a new, previously unknown fact. The metrics are thus somewhat imprecise and are based on testing how the model ranks a held-out set of known true facts compared to other facts that are less likely to be true.

  • A common way to construct interesting examples that are probably negative (facts that are probably false) is to begin with a true fact and create corrupted versions of that fact, for example by replacing one entity in the relation with a different entity selected at random. The popular precision at 10% metric counts how many times the model ranks a “correct” fact among the top 10% of all corrupted versions of that fact.
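The corruption-based evaluation can be sketched as follows. Everything here (the entities, the random embeddings, and the translation-style scorer) is a toy stand-in; the relation vector is deliberately rigged so that one triplet plays the role of the known true fact:

```python
import numpy as np

# Start from a "true" fact, replace its object entity with every other entity,
# score all versions with the same model, and record where the true fact ranks.
rng = np.random.default_rng(1)
k = 8
names = ["dog", "cat", "mammal", "stone", "tree"]
E = {name: rng.normal(size=k) for name in names}
r = E["mammal"] - E["dog"]          # makes (dog, r, mammal) the "true" fact

def score(subj, obj):
    return -float(np.linalg.norm(E[subj] + r - E[obj]))

true_score = score("dog", "mammal")
corrupted = [score("dog", obj) for obj in names if obj != "mammal"]
rank = 1 + sum(c > true_score for c in corrupted)   # 1 = ranked first
```

A precision-at-10% style metric would then count, across many held-out facts, how often `rank` falls within the top 10% of each fact's corruption pool.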

  • Another application of knowledge bases and distributed representations for them is word-sense disambiguation (Navigli and Velardi, 2005; Bordes et al., 2012), which is the task of deciding which of the senses of a word is the appropriate one in some context.

  • Eventually, knowledge of relations combined with a reasoning process and understanding of natural language could allow us to build a general question answering system. A general question answering system must be able to process input information and remember important facts, organized in a way that enables it to retrieve and reason about them later. This remains a difficult open problem which can only be solved in restricted “toy” environments. Currently, the best approach to remembering and retrieving specific declarative facts is to use an explicit memory mechanism, as described in section 10.12. Memory networks were first proposed to solve a toy question answering task (Weston et al., 2014). Kumar et al. (2015) have proposed an extension that uses GRU recurrent nets to read the input into the memory and to produce the answer given the contents of the memory.

  • This concludes part II, which has described modern practices involving deep networks, comprising all of the most successful methods. Generally speaking, these methods involve using the gradient of a cost function to find the parameters of a model that approximates some desired function. With enough training data, this approach is extremely powerful. We now turn to part III, in which we step into the territory of research: methods that are designed to work with less training data or to perform a greater variety of tasks, where the challenges are more difficult and not as close to being solved as the situations we have described so far.
