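This post walks through the ML.NET 0.3 Release Notes to follow how machine learning is developing in the .NET ecosystem. If you spot any translation errors, please point them out. Thanks!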
ML.NET 0.3 Release Notes
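Today we are releasing ML.NET 0.3. This release focuses on the following:
- Adding components to ML.NET from the internal codebase, such as Factorization Machines, LightGBM, Ensembles, and LightLDA
- Enabling export of models to the ONNX format
- Bug fixes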
Installation
ML.NET supports Windows, macOS, and Linux. For details, see the operating system versions supported by .NET Core 2.0.
You can install the ML.NET NuGet package with the .NET Core CLI:
dotnet add package Microsoft.ML
Or install it from the NuGet Package Manager:
Install-Package Microsoft.ML
Release Notes
Below are some of the highlights from this release.
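- Added Field-Aware Factorization Machines (FFM) as a learner for binary classification (#383)
  - FFM is useful for large, sparse datasets, especially in areas such as recommendations and click prediction. It has been used to win various click prediction competitions, such as the Criteo Display Advertising Challenge on Kaggle. You can learn more about the winning solution here.
  - FFM is a streaming learner, so it does not require the entire dataset to fit in memory.
  - You can learn more about FFM here, and about some of the speedup approaches used in ML.NET here.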
- Added the LightGBM framework as a learner for binary classification, multiclass classification, and regression (#392)
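- Added Ensemble learners for binary classification, multiclass classification, and regression (#379)
  - Ensemble learners enable using multiple learners in one model. For example, an Ensemble learner could train both FastTree and AveragedPerceptron and average their predictions to get the final prediction.
  - Combining multiple models of similar statistical performance may lead to better performance than each model separately.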
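- Added a LightLDA transform for topic modeling (#377)
  - LightLDA is an implementation of Latent Dirichlet Allocation, which infers topical structure from text data.
  - The implementation of LightLDA in ML.NET is based on this paper. There is a distributed implementation of LightLDA here.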
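- Added a One-Versus-All (OVA) learner for multiclass classification (#363)
  - OVA (sometimes known as One-Versus-Rest) is an approach to using binary classifiers in multiclass classification problems.
  - While some binary classification learners in ML.NET natively support multiclass classification (e.g. Logistic Regression), others do not (e.g. Averaged Perceptron). OVA enables using the latter group for multiclass classification as well.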
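- Enabled export of ML.NET models to the ONNX format (#248)
  - ONNX is a common format for representing deep learning models, which lets developers move the same model between different ML toolkits.
  - ONNX models can be used in Windows ML, for example to evaluate models on Windows 10 devices and take advantage of capabilities such as hardware acceleration.
  - Currently, only a subset of ML.NET components can be used in a model that is converted to ONNX.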
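To make the FFM item in the highlights more concrete, here is a minimal Python sketch of the pairwise interaction score used by field-aware factorization machines: every feature keeps one latent vector per field, and the interaction between two features uses each feature's vector for the other feature's field. The function name `ffm_score` and the data layout are hypothetical choices for illustration, not ML.NET's API or internal format, and the linear and bias terms are left out.

```python
from itertools import combinations


def ffm_score(features, latent):
    """Pairwise FFM interaction score for one sparse example.

    features: list of (field, feature_index, value) for the non-zero entries.
    latent:   dict mapping (feature_index, field) -> latent vector (list of floats).
    Both layouts are illustrative assumptions, not ML.NET's internal format.
    """
    score = 0.0
    for (f1, j1, x1), (f2, j2, x2) in combinations(features, 2):
        # Feature j1 uses its vector for the other feature's field, and vice versa.
        w1 = latent[(j1, f2)]
        w2 = latent[(j2, f1)]
        dot = sum(a * b for a, b in zip(w1, w2))
        score += dot * x1 * x2
    return score


# Toy example: two fields (0 and 1) with one active feature each.
example = [(0, 0, 1.0), (1, 1, 1.0)]
vectors = {
    (0, 1): [0.5, 1.0],  # feature 0's vector when paired with field 1
    (1, 0): [2.0, 1.0],  # feature 1's vector when paired with field 0
}
print(ffm_score(example, vectors))
```

For binary classification, a score like this would typically be passed through a sigmoid to obtain a probability. Because only the non-zero features of one example are touched at a time, the computation fits the streaming behavior described above.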
Additional issues closed in this milestone can be found here.
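The Ensemble item in the highlights describes training several learners and averaging their predictions. The idea can be sketched in a few lines of Python. The two scoring functions below are hypothetical stand-ins, not real FastTree or AveragedPerceptron models, and `average_ensemble` is a made-up helper rather than an ML.NET call.

```python
def average_ensemble(models):
    """Return a predictor that averages the outputs of several models."""
    def predict(x):
        outputs = [model(x) for model in models]
        return sum(outputs) / len(outputs)
    return predict


# Two stand-in scoring functions (hypothetical; not real FastTree or
# AveragedPerceptron models), each mapping a feature vector to a score.
def tree_like(x):
    return 1.0 if x[0] > 0.5 else 0.0


def linear_like(x):
    return 0.25 * x[0] + 0.25 * x[1]


ensemble = average_ensemble([tree_like, linear_like])
print(ensemble([1.0, 1.0]))
```

Simple averaging is only one way to combine models. It tends to help most when the individual models are of similar quality but make different mistakes.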
Field-Aware Factorization Machines: automatic feature classification. One-Versus-All (OVA): builds a separate classifier for each class.
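As a rough illustration of the One-Versus-All idea, the Python sketch below trains one binary model per class ("this class vs. the rest") and predicts the class whose model scores highest. A plain perceptron stands in for the binary learner. It is not ML.NET's Averaged Perceptron, and the function names and toy data are assumptions made for this example.

```python
def train_perceptron(points, targets, max_epochs=1000):
    """Train a plain binary perceptron; targets are +1 or -1."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, targets):
            score = bias + sum(w * v for w, v in zip(weights, x))
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # no errors left on the training data
            break
    return weights, bias


def train_ova(points, labels):
    """Train one 'this class vs. the rest' perceptron per distinct label."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if label == cls else -1 for label in labels]
        models[cls] = train_perceptron(points, targets)
    return models


def predict_ova(models, x):
    """Pick the class whose binary model gives the highest raw score."""
    def score(cls):
        weights, bias = models[cls]
        return bias + sum(w * v for w, v in zip(weights, x))
    return max(models, key=score)


# Three small, well-separated clusters (made-up data for illustration).
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (0, 6), (1, 5)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_ova(points, labels)
print([predict_ova(models, p) for p in points])
```

Comparing raw scores across independently trained models is a simplification. In practice the per-class scores are often calibrated before the highest one is picked.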
The original text follows:
ML.NET 0.3 Release Notes
Today we are releasing ML.NET 0.3. This release focuses on adding components to ML.NET from the internal codebase (such as Factorization Machines, LightGBM, Ensembles, and LightLDA), enabling export to the ONNX model format, and bug fixes.
Installation
ML.NET supports Windows, MacOS, and Linux. See supported OS versions of .NET Core 2.0 for more details.
You can install ML.NET NuGet from the CLI using:
dotnet add package Microsoft.ML
From package manager:
Install-Package Microsoft.ML
Release Notes
Below are some of the highlights from this release.
- Added Field-Aware Factorization Machines (FFM) as a learner for binary classification (#383)
  - FFM is useful for various large sparse datasets, especially in areas such as recommendations and click prediction. It has been used to win various click prediction competitions such as the Criteo Display Advertising Challenge on Kaggle. You can learn more about the winning solution here.
  - FFM is a streaming learner so it does not require the entire dataset to fit in memory.
  - You can learn more about FFM here and some of the speedup approaches that are used in ML.NET here.
- Added LightGBM as a learner for binary classification, multiclass classification, and regression (#392)
  - LightGBM is a tree based gradient boosting machine. It is under the umbrella of the DMTK project at Microsoft.
  - The LightGBM repository shows various comparison experiments that show good accuracy and speed, so it is a great learner to try out. It has also been used in winning solutions in various ML challenges.
  - This addition wraps LightGBM and exposes it in ML.NET.
  - Note that LightGBM can also be used for ranking, but the ranking evaluator is not yet exposed in ML.NET.
- Added Ensemble learners for binary classification, multiclass classification, and regression (#379)
  - Ensemble learners enable using multiple learners in one model. As an example, the Ensemble learner could train both FastTree and AveragedPerceptron and average their predictions to get the final prediction.
  - Combining multiple models of similar statistical performance may lead to better performance than each model separately.
- Added LightLDA transform for topic modeling (#377)
  - LightLDA is an implementation of Latent Dirichlet Allocation which infers topical structure from text data.
  - The implementation of LightLDA in ML.NET is based on this paper. There is a distributed implementation of LightLDA here.
- Added One-Versus-All (OVA) learner for multiclass classification (#363)
  - OVA (sometimes known as One-Versus-Rest) is an approach to using binary classifiers in multiclass classification problems.
  - While some binary classification learners in ML.NET natively support multiclass classification (e.g. Logistic Regression), there are others that do not (e.g. Averaged Perceptron). OVA enables using the latter group for multiclass classification as well.
- Enabled export of ML.NET models to the ONNX format (#248)
  - ONNX is a common format for representing deep learning models (also supporting certain other types of models) which enables developers to move models between different ML toolkits.
  - ONNX models can be used in Windows ML which enables evaluating models on Windows 10 devices and taking advantage of capabilities like hardware acceleration.
  - Currently, only a subset of ML.NET components can be used in a model that is converted to ONNX.
Additional issues closed in this milestone can be found here.
Acknowledgements
Shoutout to pkulikov, veikkoeeva, ross-p-smith, jwood803, Nepomuceno, and the ML.NET team for their contributions as part of this release!