Hey everyone, we're coming back with Data Exposed, and for the first episode here
we're going to talk about Machine Learning Services in Azure HDInsight.
Katherine is going to tell us more about what we've introduced in this new release and what
the various components are that you can start leveraging right away with your
HDInsight clusters. So join us for this video, and we'll have a lot of fun talking
about and demoing the capabilities. Welcome, everybody!
We're here reviving the 'Data Exposed' show and the first topic that we have today
on the Data Exposed show here is ML Services in Azure HDInsight.
We're doing a new announcement with Machine Learning Services 9.3. I'm Nishant Thacker. I'm a
Technical Product Manager for Analytics and AI at Microsoft and today as a guest
we have Katherine Kampf. Katherine, why don't you introduce yourself? Hello! As
Nishant said, my name is Katherine. I'm a Program Manager on the Azure HDInsight
team and I'm excited to be here introducing our latest release of ML
Services. Awesome! Thanks Katherine for joining us today.
So, we're talking about ML Services in HDInsight. Let's do a quick refresher
first on what HDInsight is for our audience. Yeah, so HDInsight, as some of you
may know, is a fully managed analytics service hosted on Azure. We
let you spin up clusters in just a few minutes, and then it's got management
capabilities on our end, a 99.9 percent availability SLA, and a bunch of
tooling built around it, whether you want enhancements in monitoring or
different development environments. We want to make it as easy as possible. We
have a bunch of different cluster types, whether you want to use Spark,
Hadoop, Hive, HBase, or what we're talking about today, which was previously named R
Server and is now our newly introduced ML Services cluster type in HDI. Perfect!
Thanks, Katherine. So, just to understand HDInsight better and move on
to the ML Services offering: HDInsight is our, kind of, open
source offering for all the various cluster types. If you want to do
streaming with Kafka and Storm, you want to do NoSQL with HBase, you want to
bring in Hadoop or Spark, all of that is available as a hosted cluster environment within the
Azure infrastructure, and you can use any of these open source
toolkits to your advantage.
Now, all this while with HDInsight we've had something called R Server, and now
we're expanding this R Server capability to be Machine Learning Services, which is
really exciting. So now we're bringing Python in addition
to R... Exactly!
Tell me a little more about this, Katherine. Yeah, so with R Server we started
getting a lot of popularity in our community. R users really loved it, but
we know data scientists don't just use R anymore. Data scientists love Python
as well, and we didn't want to alienate that audience. We wanted to expand as
many of our capabilities as possible to cater to the Python community
as well and let you use R, Python, whatever you desire, or both if you want
to, all within the same cluster type on HDInsight. Now, this is really
important. As Katherine mentioned, Machine Learning Services brings the best of R
and Python over here, trying to bring it closer to what your comfort zone is as a
data scientist. You can play around with any of the tools, any of the
frameworks, any of the libraries in these languages and distribute
them across the Spark cluster in HDInsight. So tell me a little more about what the
differences between ML Services and R Server in HDInsight are.
Yes, of course. Spark is a very exciting technology. We see a lot of
usage for it in the open source community, and we as Microsoft wanted to
be able to combine the power of that open source and the community around it
with some of our proprietary investments in the ML and AI space. So with that,
we've got a bunch of strong functions for parallelization and pleasingly
parallel workloads, as well as some of our pre-trained models that we've built
out that you can now use and take advantage of within the ML Services cluster.
This is great. So in addition to just the Revolution R that
is embedded very natively inside of all of our assets, now we're
taking Python and bringing that closer over here as well, with the help of
creating some of these algorithms and pre-training them so that you don't have
to start from scratch. Exactly. Right, let's dig deeper into ML Services and
understand the individual components there, Katherine. Yeah, so these are a few of
our major features. Of course, we talked about the Python support, which is very
exciting. And as I was mentioning, with this being a Microsoft investment, we
have really easy, simple operationalization. So if you want
to deploy in SQL Server, if that's where your data lives, we want to make
that easy for you to do, or through a RESTful API. We want to enhance all of those and
communicate with a bunch of different Azure data sources or on-premises data
sources to make sure that we can communicate with wherever your data
lives. We have a bunch of different parallel algorithms, or you can
write your own. We know that data scientists want to use the
best algorithm for their current situation, so if you want to try out
H2O, or if you want to use pure Spark with sparklyr or PySpark, we want to let you
easily interoperate so you can try out a bunch of different things. As a data
scientist, we know it's a difficult process. You want to try as many things as
possible, and we want to make it easy for you to do that.
Perfect. So if you want to start with SQL Server, you have Machine Learning
Services in SQL Server; if you want to move on to big data, you have Machine
Learning Services with HDInsight; and then if you want to use some third-party
libraries, you can go ahead and bring in sparklyr and H2O and all of that
goodness inside of this environment over here. Yeah, so we want to
make it as flexible as possible for you to use your preferred libraries,
languages, frameworks, etc.
Awesome. Enough talking; let's actually dive into some
show-and-tell. So let's see a demo on Machine Learning Services
in HDInsight. So, I have an HDInsight ML Services 9.3 cluster,
and this is my JupyterHub. I'm a big fan of Jupyter, and if you're an R
user, of course, we ship RStudio as well. So RStudio,
in addition, is available on the HDI edge node, so it's easy for you to take advantage
of that as an R user. But I like Jupyter, so here's a simple example of training a
model. We can start off with rxSparkConnect, which will start
our Spark session, and we can immediately start taking advantage of some of the
pure Spark capabilities. So we can use the Spark read functionality and pull some
data from a CSV. This is a standard flight and weather data set. I'm sure
you've seen it before. We're going to be predicting airline delays: whether or
not a flight will be delayed by 15 minutes. You can see here we go
through some standard Spark data transformations and split into our test
and training data sets. And this is where we get into some of the ML
Services-specific functionality, with our rxLogit function. With this we can
train a logistic regression on those data frames we just built out. So right
now we have a test data frame and a training data frame, built together from
that flights data set and the weather data set, to pull together some
predictions on delays. Now, this is important to notice here because ideally
you would think that all of this is just part of Spark, but this is actually not part
of Spark. This is in addition to what the Spark functionality offers as part of
its Python capabilities, and we are extending that with the
Machine Learning capabilities to take it even a step further by distributing
algorithms that Spark natively doesn't understand. We pre-built them, we brought
them to a stage where Spark can now distribute them natively, and we're
using specialized functions like rxLogit to go ahead and distribute them.
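
For readers following along without a cluster, here is a minimal, dependency-free sketch of what a logistic regression fit on a train/test split looks like. It is not the ML Services API and does not use Spark: the function names (`train_logit`, `predict_proba`), the single made-up feature, and the synthetic "delayed" labels are all assumptions for illustration, and the fit is plain single-machine gradient descent rather than the distributed training described in the demo.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    # weights[0] is the bias; the rest line up with the features in x.
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def train_logit(rows, labels, lr=0.5, epochs=300):
    # Batch gradient descent on the average log loss.
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: one scaled feature; label 1 means "delayed".
rng = random.Random(0)
rows = [[rng.uniform(-1.0, 1.0)] for _ in range(200)]
labels = [1 if x[0] + rng.gauss(0.0, 0.3) > 0 else 0 for x in rows]

# Simple split into training and test sets, as in the demo.
train_rows, test_rows = rows[:150], rows[150:]
train_labels, test_labels = labels[:150], labels[150:]

weights = train_logit(train_rows, train_labels)
correct = sum(
    (predict_proba(weights, x) >= 0.5) == (y == 1)
    for x, y in zip(test_rows, test_labels)
)
accuracy = correct / len(test_rows)
print("weights:", weights, "test accuracy:", accuracy)
```

The point of the sketch is only the shape of the workflow: build features, split, fit on the training rows, score the held-out rows.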
Exactly. Yeah, so we can train this model and pull out some of those key features
we want to use, and then here we can see what our model looks like. Then we
can pull in the help of another rx function, rxPredict, to see how our model is
performing. We just trained on our training data set; now we want to
see what our testing looks like. So we use our predict function to put together
our predictions based on our test data frame, and then we can even pull in
scikit-learn to do some accuracy analysis. So we can
look at the area under the curve and see how our model is performing,
and it's about 64%. So it's a solid starting point, and from there we could
play around with different algorithms or different features to try to build that
up. Another good function that ML Services comes with is rxExecBy.
What this lets you do is, say you only care about certain carriers in this
flight scenario. Maybe you're looking for a new credit card and want to see who has
the most delays, where you should invest in getting your miles. So here you
can actually use that same logistic regression function you just set up
previously and split this up. You can use this keys equals carrier, and what this
will do is divide up your data and build a model for each of those individual
airlines, which is really powerful and can be applicable in a lot of situations.
So here, obviously, we get a bit more variability in what accuracies we're
seeing. We've seen some a little lower, at 62, but you can see we get some as high
up as 68%, which is really great for individual carriers. Yeah, so that's an
introduction to some of the new exciting functions we're bringing to Python, and
of course all of these are using the power of Spark, so they parallelize well and
they run incredibly fast. All right, so this also comes with all the goodness of
the open source engine that Spark is itself, so you don't have to learn a
new engine altogether, you can bring all of your knowledge from the Spark
perspective and extend it with newer algorithms, newer models, newer
capabilities inside of Spark. Now Katherine, tell us a little more about
the difference between these two. Spark has some Python and R
capabilities, with SparkR and native Python support inside of Spark with PySpark.
Yeah. So what's the difference between what Machine Learning Services
offers and what Spark has natively? Yeah, so it's a lot of what I said: the
integration with the Microsoft ecosystem and our investments in
ML. One important thing, I think, to notice is the
pre-trained models. So if you want to do image featurization or you want to do
sentiment analysis, but you're a smaller company or startup and you don't have
access to the massive amounts of data needed to train those models, we ship them with ML
Services. You can easily start either using those directly or doing some
transfer learning. So that's an exciting functionality, I
think, that it brings. That's awesome. All right, let's dig a little deeper
into what capabilities Machine Learning Services actually has.
So we know it has JupyterHub, and we know we host RStudio on
an edge node. That's important to understand: ML Services in HDInsight
actually comes with the cluster, and then it comes with an edge node, which gets
the client tools installed and made available for users to tap into. Yeah.
So tell us a little more about what the structure is and what the components
are. Yeah, so standard HDInsight clusters are driven
by a head node, but with ML Services we utilize the edge node as well to drive
our Spark workloads, and that makes it easier for us to continue to
ship different developer tools for you, in particular RStudio. When we were
first releasing R Server, we said, we know users like this and we want to make
it as easy as possible for them. We don't want them to have to do any additional
installation. So we ship it right there: once your HDI cluster's ready to go, you
can immediately log in to R Server and start training your models or
experimenting with your data science workloads. And of course,
for people who prefer VS Code or Visual Studio, we're integrated with that
ecosystem as well. This is awesome. So with ML Services, now you're able to
bring in all the goodness of the tools of your choice, all the goodness of the
frameworks and platforms of your choice, and still leverage the native
integration and the work that Microsoft has put in to extend those capabilities
even further and integrate them natively within the Azure ecosystem. I think that
was a wonderful overview, Katherine. I invite all our users to actually go
ahead and try this out. You can go to Azure,
sign up for a free trial, spin up an HDInsight cluster, and choose ML Services,
and with the 9.3 version you get both R services and Python services
built natively into the cluster itself. Please let us know
by tweeting @AzureHDInsight what you think of this new release, and we'd
be happy to take all the feedback that you may have to share with us. Thank you,
Katherine. Thanks.
And thank you, everybody! Thanks!
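
As a companion to the demo's evaluation steps, here is a small sketch of the two ideas Katherine describes: measuring area under the ROC curve, and splitting the data by a key such as carrier so each group is handled on its own. It is a plain-Python illustration under stated assumptions, not the ML Services or scikit-learn API: `roc_auc` and `auc_by_key` are hypothetical helper names, the carrier codes and scores are made-up toy values, and the per-key step here only scores each group rather than training a separate model per carrier as the demo does.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    # AUC as the probability that a random positive example
    # gets a higher score than a random negative one (ties count half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_key(records, key):
    # Split the records by the value of `key`, then evaluate each group separately.
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        value: roc_auc(
            [r["delayed"] for r in group],
            [r["score"] for r in group],
        )
        for value, group in groups.items()
    }


# Hypothetical predictions: "score" is a model's estimated delay probability.
records = [
    {"carrier": "AA", "delayed": 0, "score": 0.10},
    {"carrier": "AA", "delayed": 0, "score": 0.40},
    {"carrier": "AA", "delayed": 1, "score": 0.35},
    {"carrier": "AA", "delayed": 1, "score": 0.80},
    {"carrier": "DL", "delayed": 0, "score": 0.20},
    {"carrier": "DL", "delayed": 1, "score": 0.90},
]

per_carrier = auc_by_key(records, "carrier")
print(per_carrier)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the demo's figures of roughly 62% to 68% should be read.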