Reinforcement Learning

What Is Reinforcement Learning?

Reinforcement learning (RL) is an area of machine learning, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming, or neuro-dynamic programming. The problems of interest in reinforcement learning have also been studied in the theory of optimal control, which is concerned mostly with the existence and characterization of optimal solutions, and algorithms for their exact computation, and less with learning or approximation, particularly in the absence of a mathematical model of the environment. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

Source: Wikipedia

In machine learning, the environment is typically formulated as a Markov Decision Process (MDP).
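To make the MDP formulation concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two states ("low", "high"), the actions, the transition probabilities, the rewards, and the discount factor of 0.9. It solves the toy problem with value iteration, a planning method that requires the model to be known, which is different from the model-free setting the text above mentions.

```python
# Toy MDP. All states, actions, probabilities, and rewards are hypothetical.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
    },
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(values, state, action):
    """Expected reward plus discounted value of the next state."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality update until values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    while True:
        new_values = {
            state: max(q_value(values, state, action) for action in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each state, the action with the highest Q-value."""
    return {
        state: max(TRANSITIONS[state], key=lambda a: q_value(values, state, a))
        for state in TRANSITIONS
    }


if __name__ == "__main__":
    optimal_values = value_iteration()
    print(optimal_values)
    print(greedy_policy(optimal_values))
```

The agent's goal of maximizing cumulative reward shows up here as the `max` over actions inside the update; a learning algorithm would have to estimate the same quantities from sampled experience instead of reading them from `TRANSITIONS`.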

Applications

Source: https://github.com/reiserwang/data-science-ipython-notebooks/blob/master/overview.md#reinforcement-learning

Big Data Syndrome

"Big data" is a term that has emerged over the past decade alongside the rise of the internet.

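But once it reached Taiwan's government, business owners, and media, it became yet another abused buzzword. As with the so-called "cloud," existing database and server architectures are simply renamed, traditional database analysis is given a new label, and the result becomes verbal Viagra for industry, the economy, and revenue. As the saying goes, "if all you have is a hammer, everything looks like a nail." Worse than swinging a hammer at everything is neither thinking nor understanding, yet confidently saying the wrong things and even setting company or national strategy on that basis.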
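The key to big data lies in finding correlation, not causality; "knowing that something is so without knowing why" is precisely the hallmark of big data analysis. Big data analysis without a goal is therefore like panning for gold in a garbage heap: a large volume of data from which no correlation or other value can be extracted is little different from garbage. No one can say clearly where big data is headed in the long run. Everyone is telling their own story, whether it is a true account, hearsay, or their own wet dream.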
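
The gap between correlation and causality can be shown with a small simulation. The sketch below is purely illustrative: the variables (temperature, ice cream sales, sunburn cases) and all numbers are invented, not real data. Two quantities that never influence each other still correlate strongly because both depend on a hidden third factor, and the correlation vanishes once that factor is subtracted out.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Generate synthetic data in which temperature drives both other series."""
    rng = random.Random(seed)
    temperature = [rng.gauss(25, 5) for _ in range(n)]  # hidden common cause
    ice_cream = [t + rng.gauss(0, 3) for t in temperature]  # depends only on temperature
    sunburn = [t + rng.gauss(0, 3) for t in temperature]  # depends only on temperature
    return temperature, ice_cream, sunburn


if __name__ == "__main__":
    temperature, ice_cream, sunburn = simulate()
    # Strong correlation, although neither series causes the other.
    print(pearson(ice_cream, sunburn))
    # Remove the common cause and the relationship disappears.
    ice_resid = [i - t for i, t in zip(ice_cream, temperature)]
    burn_resid = [s - t for s, t in zip(sunburn, temperature)]
    print(pearson(ice_resid, burn_resid))
```

An analysis that only sees the first number "knows that" the two series move together but not why, which is exactly the limitation described above.
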
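When we have too much information (or noise), the instinctive shortcut is to process it selectively: pick out the parts we like, ignore the rest, ally with people who make the same choices, and treat everyone else as the enemy ("The Signal and the Noise – Why So Many Predictions Fail – but Some Don’t", by Nate Silver). For this reason, big data will not, as many people imagine, break down the world's barriers and bring greater equality. On the contrary, because of its complexity, often no one can see the whole picture, and the interpretation of data becomes easier to manipulate. For example, a study recently published in Science (conducted by Facebook's own research team) indicates that Facebook's algorithms may affect the content people see, giving rise to the so-called filter bubble and echo chamber effects. Based on a user's record of likes and clicks, Facebook's mechanism computes and serves content the user is likely to find interesting, which may leave users in a highly homogeneous environment of opinion (the filter bubble); users also come to believe that "most people share my opinion" (the echo chamber effect).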