Learn how machine learning is developing in the .NET ecosystem through the ML.NET 0.2 Release Notes. If you spot any translation errors, please let me know. Thanks!
ML.NET 0.2 Release Notes
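Thanks to the community for its contributions and for helping us build a better ML.NET.
Today we are releasing ML.NET 0.2. This release focuses on the following:
- Addressing known problems and issues
- Supporting clustering as a machine learning task
- Allowing models to be trained on in-memory data
- Easier model validation
ML.NET supports Windows, macOS, and Linux. For details, see the OS versions supported by .NET Core 2.0.
You can install the ML.NET NuGet package with the .NET Core CLI: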
dotnet add package Microsoft.ML
Or install it through the NuGet Package Manager:
Install-Package Microsoft.ML
Release Notes
ML.NET 0.2 provides a clustering task, implemented with K-Means++ clustering and Yinyang K-means acceleration. See this test to learn how to use it (from #222).
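ML.NET 0.1 supported loading data from text files. Starting with ML.NET 0.2, you can also train directly on in-memory data as the input source, in addition to loading data from text files. See this sample (from #106).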
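Cross-validation is a statistical technique for evaluating how well your model performs. It splits the samples into several smaller subsets for training and testing. It does not require a separate test dataset; instead, it uses your training data to test the model. It partitions the data into portions for training and for testing, and it repeats this several times. See this cross-validation example (from #212).
Faster predictions: predictions are significantly faster because ML.NET no longer creates a parallel cursor for data views that contain only one element (see #179 for measurements).
API: this API is now code-generated and has been updated to take explicit declarations for the columns in the data. See #142.
Daily NuGet builds of the ML.NET project are available here.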
ML.NET 0.2 Release Notes
We would like to thank the community for the engagement so far and helping us shape ML.NET.
Today we are releasing ML.NET 0.2. This release focuses on addressing questions/issues, adding clustering to the list of supported machine learning tasks, enabling using data from memory to train models, easier model validation, and more.
ML.NET supports Windows, MacOS, and Linux. See supported OS versions of .NET Core 2.0 for more details.
You can install the ML.NET NuGet package from the CLI using:
dotnet add package Microsoft.ML
Or, from the Package Manager Console:
Install-Package Microsoft.ML
Release Notes
Below are some of the highlights from this release.
Added clustering to the list of supported machine learning tasks
Clustering is an unsupervised learning task that groups sets of items based on their features. It identifies which items are more similar to each other than other items. This might be useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.
ML.NET 0.2 exposes a clustering trainer that implements K-Means++ clustering with Yinyang K-means acceleration. This test shows how to use it (from #222).
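The sketch below is for readers new to the technique. It shows plain K-Means++ seeding followed by Lloyd iterations, in Python. It assumes dense numeric points, Euclidean distance and a fixed random seed. It is not ML.NET's implementation, and it deliberately omits the Yinyang acceleration, which uses distance bounds to skip many point-to-center comparisons during assignment. The names sq_dist, kmeans_pp_seed and kmeans are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng):
    """K-Means++ seeding: the first center is chosen uniformly at random; each
    later center is sampled with probability proportional to its squared
    distance to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with an existing center; pick uniformly.
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        # Fallback guards against float rounding leaving acc just below r.
        chosen = next(p for p, w in zip(reversed(points), reversed(d2)) if w > 0)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r and w > 0:
                chosen = p
                break
        centers.append(chosen)
    return centers


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm started from K-Means++ seeds (no Yinyang bounds).
    Returns (centers, labels), where labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in kmeans_pp_seed(points, k, rng)]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels
```

On two well-separated groups such as [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)], the first three points should share one label and the last three another.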
Train using data objects in addition to loading data from a file
ML.NET 0.1 enabled loading data from a delimited text file. CollectionDataSource in ML.NET 0.2 adds the ability to use a collection of objects as the input to a LearningPipeline. See sample usage here (from #106).
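The point of this change is that a trainer should not care whether its rows came from a delimited file or from objects already in memory. The Python sketch below illustrates that idea by analogy only. It does not use ML.NET's CollectionDataSource or TextLoader, and HousePrice, load_from_text and fit_mean_ratio are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HousePrice:
    """Hypothetical record type standing in for a user's data class."""
    size: float
    price: float


def load_from_text(text, sep=","):
    """Parse delimited text (first line is a header) into records."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        size, price = line.split(sep)
        rows.append(HousePrice(float(size), float(price)))
    return rows


def fit_mean_ratio(rows):
    """Toy 'trainer': accepts any iterable of records, whatever their origin."""
    rows = list(rows)
    return sum(r.price / r.size for r in rows) / len(rows)
```

Records parsed from "size,price\n50,100\n100,220\n" and the in-memory list [HousePrice(50.0, 100.0), HousePrice(100.0, 220.0)] compare equal, so the toy trainer returns the same value for both sources.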
Easier model validation with cross-validation and train-test
Cross-validation is an approach to validating how well your model statistically performs. It does not require a separate test dataset, but rather uses your training data to test your model (it partitions the data so different data is used for training and testing, and it does this multiple times). Here is an example for doing cross-validation (from #212).
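Here is a minimal k-fold cross-validation sketch in Python. It assumes the caller supplies train and evaluate functions, which are hypothetical stand-ins for a model pipeline and a metric. It is not ML.NET's cross-validation API. For clarity the folds are contiguous and unshuffled; in practice the data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, k, train, evaluate):
    """Use each fold once as the test set and the remaining rows as the
    training set; return one score per fold."""
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train_rows = [row for i, row in enumerate(data) if i not in held_out]
        test_rows = [data[i] for i in test_idx]
        model = train(train_rows)
        scores.append(evaluate(model, test_rows))
    return scores
```

Consider data = [(x, float(x)) for x in range(6)], a model that predicts the mean training label, and mean absolute error as the metric. With k = 3, the three folds score 3.0, 0.5 and 3.0.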
Train-test is a shortcut to testing your model on a separate dataset. See example usage here.
Note that the … is prepared the same way in both cases.
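Train-test, by contrast, trains once and scores once on a dataset that was kept separate. The Python sketch below assumes a toy model that predicts the mean training label and uses mean absolute error as the metric; both are hypothetical. Because train and evaluate are parameters, the same pair could be passed unchanged to a k-fold routine. That mirrors the note that the model setup is identical in both approaches.

```python
def train_mean(rows):
    """Toy model: predicts the mean label seen during training."""
    return sum(y for _, y in rows) / len(rows)


def mean_absolute_error(model, rows):
    """Average absolute gap between each label and the model's prediction."""
    return sum(abs(y - model) for _, y in rows) / len(rows)


def train_test(train_rows, test_rows, train=train_mean, evaluate=mean_absolute_error):
    """Train once on train_rows, then score once on the separate test_rows."""
    model = train(train_rows)
    return evaluate(model, test_rows)
```

Training on [(1, 2.0), (2, 4.0)] gives a mean prediction of 3.0, so scoring on [(3, 6.0)] yields an error of 3.0.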
Speed improvement for predictions: by not creating a parallel cursor for dataviews that only have one element, we get a significant speed-up for predictions (see #179 for a few measurements).
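The sketch below shows the general idea in Python: parallel machinery is set up only when there is more than one row to score, so a single prediction runs as a plain call. A thread pool stands in for ML.NET's parallel cursor. score_rows and predict are hypothetical names, and this is not the code from #179.

```python
from concurrent.futures import ThreadPoolExecutor


def score_rows(rows, predict, max_workers=4):
    """Score every row with predict, preserving input order."""
    rows = list(rows)
    if len(rows) <= 1:
        # Zero or one row: a plain loop avoids the cost of starting workers.
        return [predict(r) for r in rows]
    # Several rows: spread the work across a pool; map keeps result order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, rows))
```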
API: the TextLoader API is now code-generated and was updated to take explicit declarations for the columns in the data, which is required in some scenarios. See #142.
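The small Python loader below illustrates explicit column declarations. The caller states each column's name, its position in the delimited line and a parser, so nothing has to be inferred from the data. Column and load_delimited are hypothetical names; this is a sketch of the concept, not the generated TextLoader API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Column:
    """Hypothetical explicit column declaration: output name, field index, parser."""
    name: str
    index: int
    parse: Callable[[str], object]


def load_delimited(lines, columns, sep=",", has_header=False):
    """Parse each line, keeping only the declared columns by position."""
    rows = []
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header row
        fields = line.rstrip("\n").split(sep)
        rows.append({c.name: c.parse(fields[c.index]) for c in columns})
    return rows
```

With columns [Column("Label", 0, float), Column("Text", 2, str)], the line "1,ignored,great movie" becomes {"Label": 1.0, "Text": "great movie"}, and the undeclared middle field is dropped.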
Added daily NuGet builds of the project: daily NuGet builds of ML.NET are now available here.
Additional issues closed in this milestone can be found here.
Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m, forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions as part of this release!