Big Data Machine Learning with H2O
A year ago, SriSatish Ambati gave a talk at the Chicago Big Data meetup about using H2O to do machine learning on big data.
SriSatish is founder and CEO of a startup called 0xdata (pronounced "hexadata").
0xdata developed H2O, an open-source machine learning tool. (The source code is on GitHub.)
H2O can run a number of powerful machine learning algorithms. What sets H2O apart from R and similar tools is speed and scale: it is fast, and it can handle huge amounts of data. It splits both the algorithm and the data into small chunks and processes those chunks in parallel across the machines in a cluster.
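To make the chunk-and-combine idea concrete, here is a minimal single-machine sketch in plain R. It is not H2O's actual code: the function name chunked_mean_var and the choice of statistics are illustrative assumptions, and lapply stands in for the worker nodes that would each process one chunk in a real cluster.

```r
# Conceptual sketch only: NOT how H2O is implemented internally.
# Split a numeric vector into chunks, summarize each chunk independently
# (the "map" step), then combine the partial summaries (the "reduce" step).
chunked_mean_var <- function(x, n_chunks = 4) {
  # Assign each element to one of n_chunks contiguous chunks.
  chunk_id <- cut(seq_along(x), breaks = n_chunks, labels = FALSE)
  chunks <- split(x, chunk_id)

  # Map: each chunk yields its count, sum, and sum of squares.
  # On a cluster, each chunk would live on, and be processed by, its own node.
  partials <- lapply(chunks, function(ch) {
    c(n = length(ch), s = sum(ch), ss = sum(ch^2))
  })

  # Reduce: partial summaries can simply be added together, in any order.
  total <- Reduce(`+`, partials)
  n <- total[["n"]]
  m <- total[["s"]] / n
  v <- (total[["ss"]] - n * m^2) / (n - 1)  # sample variance
  c(mean = m, var = v)
}

set.seed(42)
x <- rnorm(1e5)
chunked_mean_var(x)
```

The point of the design is that each chunk only needs its own rows to produce its partial result, and the partial results combine by simple addition, which is what allows the work to be spread over many machines.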
Netflix is using H2O for some of its machine learning. Netflix is one of the pioneers of big data machine learning, and I look forward to hearing more about how it is using H2O.
You can run H2O against data stored in the Hadoop Distributed File System (HDFS). You can also upload data directly to the H2O cluster.
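In H2O's R package these two routes correspond to two functions: h2o.importFile has the cluster read a path it can reach itself (such as an hdfs:// URI), while h2o.uploadFile pushes a file from the machine running R up to the cluster. The sketch below is a hypothetical convenience wrapper, assuming that a path starting with hdfs:// should be imported and anything else uploaded; choose_h2o_loader and load_into_h2o are made-up names, not part of the h2o package.

```r
# Hypothetical helper (not part of the h2o package): pick a loader by path.
# h2o.importFile asks the H2O cluster to read the path itself (e.g. from HDFS);
# h2o.uploadFile sends a file from the client machine running R to the cluster.
choose_h2o_loader <- function(path) {
  if (grepl("^hdfs://", path)) "h2o.importFile" else "h2o.uploadFile"
}

# Hypothetical wrapper: look up the chosen function in the h2o package and call it.
# Calling it requires the h2o package and a running cluster (e.g. via h2o.init()).
load_into_h2o <- function(path, ...) {
  loader <- getExportedValue("h2o", choose_h2o_loader(path))
  loader(path, ...)
}
```

With a made-up HDFS location, a call would look like load_into_h2o("hdfs://namenode:8020/data/trips.csv") once h2o.init() has connected to the cluster.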
H2O is written in Java, but because it works at the data and analysis layer, it should play nicely with most systems.
H2O currently supports the following algorithms:
- Random Forests
- Generalized Linear Modeling (GLM)
- Logistic regression
- K-Means
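For R users, it may help to see the familiar in-memory counterparts of three of these methods. The sketch below uses base R's glm and kmeans on the built-in mtcars and iris datasets, not H2O; it only shows what each model family is, including the fact that logistic regression is a GLM with a binomial family. H2O's versions fit the same kinds of models, distributed across a cluster.

```r
# Base R (not H2O) counterparts of three of the listed methods,
# fitted on small built-in datasets that fit comfortably in memory.

# Generalized linear model with a Gaussian family (ordinary linear regression).
lin_fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())

# Logistic regression: a GLM with a binomial family (logit link by default).
# am is 0 for automatic and 1 for manual transmission.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial())

# K-Means: cluster the four numeric iris measurements into 3 groups.
set.seed(1)
km_fit <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

coef(lin_fit)
coef(logit_fit)
table(km_fit$cluster, iris$Species)
```

Random forests are not in base R, so they are left out of this comparison.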
SriSatish said the team is working on adding more algorithms. In particular, a key area of focus is a set of statistics and math algorithms for unbalanced datasets, for use in fraud detection.
There is also an H2O R package, so R users can run distributed H2O algorithms from the R command line.
Below are the R commands I run to initialize H2O from R:
# Load the h2o package and start (or connect to) an H2O server on this machine.
library(h2o)
localH2o <- h2o.init(ip = "127.0.0.1", port = 54321)
# Pick a CSV file interactively, then upload and parse it into an H2O frame.
filepath <- file.choose()
test_data <- h2o.uploadFile(filepath, destination_frame = "", parse = TRUE, header = TRUE, sep = ",",
                            na.strings = c("unknown"), progressBar = FALSE,
                            parse_type = "CSV")
The algorithms can also be run via a self-hosted web interface.
SriSatish said that the company's goal is to let developers use H2O as a real-time machine learning service, so that a smartphone app or web app can keep learning as it obtains more data.
"Our real goal is to make algorithms embeddable in your application," said SriSatish Ambati, CEO of 0xdata.