How is Blue Label Labs different from its competitors?

First, we live and breathe cutting-edge/bleeding-edge technology, so your app will similarly be cutting-edge/bleeding edge with a long “shelf-life”. Second, with a perfect mix of local, domestic and international talent you are guaranteed to get the optimal mix of a high-quality product and a fair price. Third, you will be assigned a dedicated Program Manager (PM) after we’ve established a contract who will be your single point-person for all of your needs and questions. This PM will stay with you for the duration of your relationship with Blue Label Labs and is your conduit to all of the resources we have to offer, thus streamlining your communication and preventing you from having to repeat your message to multiple resources or play air traffic control. Our PMs are the best in the business and we’d be happy to introduce you to a projected PM (and project team) once we’ve got an estimate on the table to discuss. Fourth, the size of Blue Label Labs team (now 64 people strong) means you don’t have to worry about the “single point of failure” issue that exists at smaller shops of just one or two developers, nor the burden paying for the high overhead cost associated with larger shops. Fifth, we have a stronger focus on design than most other shops. We split our design team into two specific functions—Product/User Experience and User Interface. It’s a rare unicorn designer that is good at both thinking through flow and product decisions, as well as branding, colors, logos and overall User Interface. To that end, each project team has both a UX Designer and UI Designer staffed to ensure top notch design. Sixth, with over 7 years of operation and over 250 apps completed, we have both the experience and extensive re-usable code base to ensure your app is built as efficiently and as inexpensively as possible. Seventh, our QA and Testing methodology is rigorous and as important to us as the actual development—meaning your app will be as bug free as is possible. We have a dedicated QA Engineer staffed on every project. This engineer is not the original author of the code ensuring the highest quality results from the QA process. Please see the below sections for more details.

How does Blue Label Labs work with clients?

Blue Label Labs Labs begins most projects with a Discovery, Planning & Design Phase (Phase One). First, we determine the proper technologies and development schedule. Second, the Design teams inherit the User Stories that were drafted during the sales process. User Stories are simple one-liners that outline the activities your users will need to be able to accomplish within the app. These User Stories then get grouped together into meaningful features and feature sets. The Design teams work with the client to finalize the User Stories. Third, our design team sets out creating greyscale wireframes to illustrate how a user will move through the app (i.e., the user experience, UX). Fourth, we transition to full color illustrations/designs that have the final look and feel of the app (i.e., the user interface, UI). Fifth, the app designs are then uploaded into a prototyping platform called Invision that allows our team to create tappable and swipeable “hotspots” so that you can show the app on your phone or via a mocked-up phone in a web browser. At the same time, we draft a Functional Specification document that details the technical hows and whys of the project for the development team. Once that work is complete, we them move on to the Development Phase (Phase Two). The Agile development/engineering team works hand-in-hand with the Program Manager and follows the wireframes and the Functional Specifications to build the app. Our first development goal is the First Deliverable milestone—we call this D1; this is a highly functional build of the app that contains a subset of features agreed upon by Blue Label Labs and the client. We then begin an iterative process of finalizing the remaining features. At the same time, a QA Team builds a full test plan for the app. The QA Team continually performs functional and regression testing on the app, reporting issues to the Project Manager. Learn more about our process here.

How long does it take to complete the average app?

Typically, for our average 4 to 5-month project, the Phase One: Discovery & Planning and Design lasts for approximately 4 to 6 weeks, with Phase Two: Development and Testing Phase occupying the remaining 3 to 4 months. At the end of the Phase Two, the final app is prepared for Deployment. Deployment involves moving any web services and server side components to a production environment and submitting the app to the relevant app store(s).

How much does the average app cost?

If we were in full-swing on a new project, a client can estimate a cost of between $10,000 and $30,000 per two-week sprint ($20,000 to $60,000 per month) for the effort—typically billed bi-weekly. If Blue Label Labs is handling everything from Discovery, Planning & Design through to Deployment in an app store, total app costs average about $120,000 for one “platform”–iOS, Android, Web, etc. That said, it very much depends on the complexity of the design and development required for each individual app, we’ve completed full apps at costs of between $50,000 (a bare-bones, simple MVP/prototype) and $2,000,000 (a full featured, enterprise apps). We provide free, no commitment estimates to clients. After the app is submitted to an app store(s) or pushed live on the web, the Blue Label Labs team monitors the app store and customer reviews and deals with any issues, questions, delays or approvals that arise. We have a great track record with app submissions and make sure that the app submitted is highly likely to be approved without issue. Post approval, a maintenance agreement can be put into place to cover and small issues, minor app edits (e.g., color, image or text changes) and any bugs that might linger. At the end of the project, we will transfer all source code and project files to you and you will remain the sole owner of all IP in the app. After that time, we provide our services at an hourly rate of $120/hour for ad-hoc updates or at an hourly rate of $100/hour for a mutually agreed upon set number of hours in a 6-month retainer. The monthly maintenance cost for the average app is $2,000/month: in other words, a retainer of 20 hours per month at $100 per hour equals $2,000/month and guarantees expedited service vs. ad-hoc requests.

What is the difference between a Native App, a Hybrid App and an HTML5 App?

Native App: A Native App is an app written for a specific platform like iOS or Android—in other words, the app is written in the coding language used for development on that specific platform (Objective-C or Swift for iOS and Java for Android). When you build a Native App, you get access to all the hardware features exposed by the native code APIs and SDKs. For example, Apple iOS Push Notifications are one example of a feature that cannot be supported by an HTML5 App. The graphics and user interface of a Native App are far superior to any Hybrid or HTML5 app. You would choose this option, if you want to use the user interface components from Apple’s proprietary user interface libraries (i.e., the usability is better); you want full device hardware access—cameras, microphone, and geo-location; you want peak performance with no lag; you want users to be able to access the app off-line (i.e., without Internet access); you want access to push notifications and in-app purchase; and you want your app to de discoverable in the app store. On the downside, your code is less portable across operating systems and it is often the most expensive development option. It is our recommendation that if you plan to start a true, revenue generating, app-based business, this is the only viable option. Hybrid App: A Hybrid App is a form of web app that is deployed to a native platform like an iPhone or Android phone inside a native shell. This shell is usually little more than a web browser view (to display the HTML5 content) and some other hooks to allow your web app to potentially access some more hardware features that are not available inside a typical mobile web browser—e.g., push notification and in-app purchases. It improves code portability because if you want to port your app to another platform, you only need a native shell on that other platform to run it, the HTML5 content of the app is directly portable. These shells can be hand written, but can also be generated automatically by tools in the marketplace like React Native, PhoneGap, Cordova or Xamarin. These tools allow a developer to push the app out to multiple platforms (e.g., iOS, Android, Windows Mobile) quickly by generating the shell and packages required to do so automatically. This sounds powerful, and it is, but clearly creators of these third-party platforms have a lot of configurations and hardware to support, and thus, the solutions do not always work the same across all devices, and in practice, testing and hacking of special cases will be required by the developer to get proper functionality. You would choose this option if you want a balance between cost and usability; you want access to some (but not all) of the phone’s native APIs/SDKs and hardware; you want to be able to quickly deploy across multiple device types; and you’re not worried about high-end graphics. On the downside, because these third-party hybrid platforms (e.g., React Native, PhoneGap and Xamarin) need to support so many device types there are always implementation and usability issues to workout. In addition, you will only have access to the features of a new operating system 6+ months AFTER it is released by Apple, Google or Microsoft. In this way, your app will never be able to cutting-edge technology. While we are fully capable of building Hybrid Apps, it is our recommendation that this option is only viable for simple, lo-fi apps that are heavy on text and images, but don’t need much in the way of functionality or user interface. An HTML5 Web App: An HTML5 app is essentially a website with JavaScript code written to allow the app to perform dynamically, as opposed to a static website that you need to refresh to see the changes. It works interactively to make it feel like an app, but it is actually running in a web browser. It is essentially a website, which will support multiple viewports or screen sizes so that it works well on mobile. Because it is running in a browser, it is confined to what is possible within a mobile web browser. You would choose this option if you want to support multiple platforms with the same app/codebase; you want to deploy it on the web; you want the widest possible support across multiple device operating systems and manufacturers; and you want to be able to update your source files on your server and have them immediately deployed to all devices (i.e., removing the need for the app approval process). On the downside, your app is not discoverable in the app stores; it will never look and feel like a true mobile app; and you will have limited access to some of the software and hardware functionality of the phone (e.g., push notifications and cameras). It is our recommendation, that this option is only viable for view-only content where users are not expected to interact in any meaningful way.

Has Blue Label Labs ever transitioned an app the firm has built to a client’s internal development team? What is the typical process and how long does it usually take? Is there a fee for this?

Quite often Blue Label Labs transitions an app to a client’s in-house development team. As part of our regular transition for any client, we transfer all functional specs, wireframes, UI design resources, screen flow maps, passwords/certificates/credentials and source code to the client. We also have extensive experience working in tandem with a client’s existing engineering team. For example, Blue Label Labs will build the front-end experience of an app on top of a client’s existing backend web service and databases. These transitions are the quickest, averaging only a few days, since the Blue Label Labs team and the client team have already been working hand-in-hand for months. If we are handing over a complete project with no prior engagement with the client’s development team, then the transition can take a little longer depending on the support needed. This can range from a few days to a couple weeks. The transition is typically free of charge unless engineering resources are needed to facilitate the transition or the Program Manager is required to spend more than a week on the transition and training. We’ve even helped companies to interview potential new technical team members—e.g., developers and CTOs.

How does Blue Label Labs provide training to a client’s development team so that they could make app enhancements and how much would that training cost?

Along with all the project documentation and support outlined in the answer to the question above, we are happy to provide hands-on training to in-house developers/team members as part of the transition. There is typically no extra charge for this if only a few hours are required. If ongoing training is necessary, for example an in-house developer is not familiar with app development and requires a crash course, then we can offer our services at our hourly rate of $120/hour for training. We’ve even aided our partners in their hiring of their own technical/development team if/when the time is right.

How many designers and developers does Blue Label Labs have on staff? How many work on each client project? Where is the team located? Can the client talk directly to whomever will work on the app?

The Blue Label Labs team is comprised of 64 individuals across its UX Design/Product, UI Design, Development, Program Management, Quality Assurance, Sales, Marketing and Leadership teams. We have 6 Designers, 10 Program Managers, 35 Developers, 3 App Marketers—the remainder of the team is comprised of the Sales, Marketing, Operations & Leadership Teams. Depending on the size of the project, a team of at least 7 Blue Label Labs staffers will work on your project: 1 Program Manager, 1 UX Designer, 1 UI Designer, 1 Technical Architect, 1 Front-End Developer, 1 Back-End Developer, and 1 Quality Assurance Engineer—in addition to the technical and strategic oversight provided by Blue Label Labs leadership. The majority of our Design and Program Manager teams is based in New York City, Seattle and San Francisco areas. The Developers and Quality Assurance Testers are also based in the aforementioned cities, in addition to India-based team members. The Development, Design and QA Teams are managed directly by the Program Manager. The Program Manager is your primary point of contact and the liaison across all other Blue Label Labs team members associated with your project. That said, clients are welcome to speak directly with any individual team member as is needed.

Are projects priced based on “time-and-material” (i.e., hourly) estimate or a “fixed-price” quote?

Projects are priced on a “time-and-material” basis. With 7 years and 250+ apps of experience to our credit, we are able to produce very accurate estimates. We do our best to stay within budget or provide you ample warning if we think we may exceed our initial estimate so we can coordinate how to move forward together with you. Prior client references can vouch for the accuracy of our estimates and can be provided to you upon request.

Is Blue Label Labs able to guarantee that our app will be published in an app store?

Unfortunately, no application can be guaranteed for app store approval. However, Blue Label Labs has successfully built and deployed over 250 apps for the Google Play and Apple iTunes stores. We have a solid understanding and insight into what these stores will typically approve or disapprove of. We will warn you well in advance if any particular aspect of you project poses an approval risk. We have had a handful of instances where an app store has rejected an app for publication; not due to technical issues, but due to legal/business/terms of use issues; however, we successfully appealed ALL of these rejections and these very same apps are now in the app store for download today. We led these appeals on behalf of our clients at no extra charge. If your app is rejected due to technical reasons, we will make all necessary corrections needed to get approval at no extra charge. We are happy to connect you with some of our clients who did face some app store pushback to tell you about the rejection and our successful appeal.

What is Blue Label Labs’ policy regarding correcting defects and publishing an update in the app store?

We provide our services at a blended hourly rate of $120/hour for ad-hoc updates or at blended hourly rate of $100/hour at a mutually agreed upon set number of hours over a 6-month retainer. For example, a retainer of 20 hours a month for $2,000/month guarantees expedited service vs. ad-hoc requests. Ad-hoc requests are handled on a first come, first serve basis across all of our clients and development is based on our team’s earliest availability. The monthly retainer of hours functions much like an extension of the warranty period; issues are addressed within 24-48 hours and development will start as soon as the scope of the issue is defined. For those clients wanting a guaranteed 24 to 48-hour turnaround time on maintenance issues, a 6-month retainer is suggested.

What level of Quality Assurance is included and provided by you in your proposal?

A Quality Assurance (QA) Engineer is assigned to every project and builds a custom test plan for every project during the Discovery, Planning & Design Phase. During the Development & Testing Phase the QA Engineer continually tests the app using functional and regression testing. We test on a wide array of physical devices as needed for a given project. If we are building a backend web service and databases as part of the project, we will also have a set of automated unit tests that are run regularly to verify functionality.

What is Blue Label Labs’ quality assurance (QA) and testing methodology?

We believe that Blue Label Labs’ QA and testing methodology is part of our competitive differentiation. The fact that we dedicate resources to solely this function is something that sets us apart from other development shops for whom testing is simply an afterthought. At Blue Label Labs, QA is handled at all levels of development. There are generally 3 layers of QA/testing that happens for every project: 1.) Internal Development Team Testing Each Development Team has an internal QA Tester along with the Developer Lead who perform basic validation on bug fixes prior to builds being released to the broader Project Manager and QA team. Their validation is to ensure that items within JIRA, which we use alongside Trello, to coordinate team efforts and communicate with clients—are properly remedied according to their descriptions prior to “checking-in” or committing code. When items are resolved by the Development Team they are pulled into the “Testing Done by Dev Team” stack of Trello cards and/or within JIRA. When developing new projects, the Blue Label Labs Development Team will also be responsible for creating “unit tests” for our code so that we can quickly run a basic set of verifications. The dev team does the majority of their testing using iOS and Android app simulators. 2.) Dedicated QA Engineer Each project has a dedicated QA Engineer who sits independently from the Development Team who is responsible for: a.) Drafting a milestone based test plan which outlines a list of black box scenarios to verify and the expected results; b.) Performing smoke and full test pass runs against weekly and milestone builds; c.) Verifying and closing bugs that the Development Team has marked as complete; d.) Working with the Development Team to report issues and regressions; and d.) The QA Engineer takes items from the “Testing Done by Dev Team” Trello stack and verifies whether it is actually fixed. If it is, the Trello card is moved to the “Done (Verified)” pile, or sent back to the “Weekly Priorities” or “Product Backlog” stacks, as appropriate. The QA Engineer will generally perform their tests on 2-3 devices, depending on project requirements. 3.) Project Manager (PM) Testing The final QA gate is the PM who is responsible for quality of the entire product. They will also work with the QA Engineer to verify items have been fixed properly in addition to triaging new issues coming in from the client and the QA Engineer. The PM will appropriately schedule issues to be resolved based on their severity and importance. The PM’s testing is done on actual devices.

Can Blue Label Labs help me with Marketing and PR?

Absolutely. We aim to be your full-service app partner. Our Marketing & PR Offering is designed to help you launch your app and gain marketing exposure. Learn more about our Marketing & PR Offering here.

Artificial Intelligence Development

How to Deploy an H2O.ai Machine Learning Model to AWS SageMaker

In the rapidly evolving world of machine learning (ML), the ability to quickly move from model training to deployment is crucial to being able to build software on top of it. H2O.ai’s AutoML provides a powerful tool for automated model training, enabling data scientists and developers to efficiently create high-quality predictive models. At BlueLabel, in our work training ML models, I’ve come to love the power and flexibility of the H2O.ai product suite, especially its AutoML capabilities. However, the real challenge often begins when it’s time to deploy these models into a production environment, which is where AWS’s SageMaker service comes into play. AWS SageMaker offers a serverless platform to host and manage these models as endpoints that can expose these models to downstream apps, web sites and other software tools. However, the process to go from H2O.ai to SageMaker is not obvious, nor documented well.

My name is Bobby Gill, and I am the co-founder of the BlueLabel AI consultancy. In this blog post, I’ll hope to bridge the gap between H2O.ai and SageMaker by sharing the hard-earned knowledge I’ve gained operationalizing H2O.ai models on AWS SageMaker. Specifically, I will walk through the steps involved in deploying an H2O.ai AutoML model to AWS SageMaker in the form of an ‘inference’ endpoint that can then be used in production workloads such as an app or website.

A Little Bit About Our Prediction Model

Before I get into, as the famous H.E. Pennypacker might say, ‘the real gritty-gritty’, let me spend a few words describing what my ML model does. In this example, I am using a binary classification ML model that I’ve trained using H2O.ai AutoML function to help predict the probability of a female producing at least 1 genetically normal (Euploid) embryo based on two key factors: the age of a female along with her AMH measurement. This blog post and the exercise contained within was driven by my desire to create a Fertility Calculator tool for my wife’s fertility blog (called ‘The Lucky Egg’) that leveraged machine learning to make it’s predictions. In order for me to build a ML-powered Fertility Calculator tool on her blog, I first needed to operationalize the model behind a HTTP-based API that could be called by my front end Javascript, which is what the rest of this tutorial will cover.

Pre-requisites

An AWS account, with a local install of the AWS CLI setup and configured with the proper IAM access keys.
A pre-trained H2O.ai AutoML model that is ready to be exported in MOJO file format.
A local Java development environment running JDE 17+, with Gradle configured (For this project, I’ve used VS Code running in WSL 2).
Access to the source code used in this project via this GitHub project.
A locally running instance of the Docker command line tools.
A very rudimentary knowledge of Java .
Within your AWS account, you will need to create an IAM role configured for SageMaker with the following settings:

- Service: SageMaker
- Permissions:
    - AmazonSageMakerFullAccess
    - AmazonS3ReadOnlyAccess
    - AmazonEC2ContainerRegistryReadOnly
    - CloudWatchLogsFullAccess

High-Level Overview of the Approach

At a high level, the steps to go from a model trained locally within an H2O.ai deployment to an HTTP endpoint hosted on SageMaker that can make predictions using this model requires the following:

Training and exporting a AutoML model from H2O.ai into a MOJO file format.
Creating an AWS SageMaker-compatible service,that will expose the inference endpoints SageMaker will call to make predictions using our model.
Packaging and Docker-izing the aforementioned app and deploying it to Amazon Elastic Container Registry (ECR).
Deploying a SageMaker inference endpoint that creates a hosted model based on the Docker image pushed to ECR.

Wait, Why Java and not Python?

The astute reader might be wonder as to why I am using Java rather than Python for this walk through and they would be right to ask the question. I trained the model using Python, I do all of my ML related development in Python and I haven’t written a lick of Java code since CS 134 at Waterloo. The reason for this choice is simply because working with MOJO files exported by H2O.ai is much more straightforward and streamlined for Java environments. While Python does offer a pathway to deploy MOJO files, it is not well-documented and relies on open source projects that looked to be dated and not maintained. I am sure you can achieve the same in Python, however, the Java approach is much simpler and requires little specialized Java knowledge beyond that one might have learned in an introductory CS class.

How SageMaker Hosts an Inference Endpoint

In order to deploy our model to SageMaker, we first need to understand how SageMaker expects to interact with a deployed model:

SageMaker expects a model to be deployed via a Docker container and launched via the ‘serve’ command. That is, any Docker container we create must listen for the ‘serve’ command and use that as a trigger for loading up its inference logic.
SageMaker expects the container to be listening to incoming HTTP post requests on port 8080.
SageMaker expects that the container listening on port 8080 implements the following HTTP methods:
- /invocations (which is the method called to perform an inference)
- /ping (a method used to health check the container)

For exact specifications and requirements for a model to be deployed to SageMaker please refer to the AWS documentation here.

1.) Export the Model From H2O.ai into a MOJO Model File

After running training a model in H2O.ai, the first step to hosting it in SageMaker is to export the model to file. H2O.ai natively supports exporting models to a MOJO (Model Object, Optimized) file. MOJOs are a representation of a ML model that is optimized for scoring and prediction in real time, which is what we intend to do by hosting it via n inference endpoint. Exporting a model in H2O.ai to MOJO is straight-forward. If you are using the H2O.ai Flow GUI tool, first load the model and then in the UI click on the button “Download Model Deployment Package (MOJO)”.

2.) Create a Java App to Serve the H20.ai MOJO Model

In order to deploy my ML model to SageMaker, we need to create a basic web service that implements the /invocations and /ping endpoints that SageMaker will communicate to it via. This web service will host the MOJO file representation of my trained model and perform inferences using it through the /invocations method. To host this API, I’ve decided to use the Java Spring Boot framework to create a very thin project (which I will refer to as the Fertility Calculator Service.) that wraps my prediction model and implements the two endpoints expected by SageMaker.

The Fertility Calculator Service is a boilerplate Spring Boot app that exposes an /invocations endpoint that is used to run predictions through the MOJO representation of our ML model.

The code for the Fertility Calculator App is largely boilerplate and is contained within the FertilityCalculatorMojoApplication.java file within the GitHub repo. The name of the MOJO file that contains my model is “StackedEnsemble_BestOfFamily_7_AutoML_1_20240418_75305-MOJO.mojo”.

Recommended Folder Structure for the MOJO File

For non-Java programmers, its important that you place the MOJO file in the correct location of the project folder structure so that it’s packaged up properly by Gradle and then accessible via the Java classpath at runtime. Of everything in this blog post, this part is the one thing that left me stumped for a few hours. To save you similar agony, below you will find the folder structure that I used for this project:

src/
└── main/
    ├── java/
    │   └── com/
    │       └── yourcompany/
    │           └── yourproject/
    │               └── YourApplication.java
    └── resources/
        └── static/
            └── your_model.mojo

Implement SageMaker Inference Endpoints

The endpoint definitions needed for SageMaker are relatively straight forward to implement:

// Health check endpoint used by SageMaker
    @GetMapping("/ping")
    public String ping() {
        return "Healthy";
    }

    // Inference endpoint used by SageMaker
    @PostMapping("/invocations")
    public String invocations(@RequestBody PredictionRequest request) throws Exception {
        RowData row = new RowData();
        row.put("age", request.getAge());
        row.put("amh", request.getAmh());

        BinomialModelPrediction prediction = model.predictBinomial(row);
        return Double.toString(prediction.classProbabilities[1]); // Assuming class "1" is the positive class
    }

In the above, you can see /invocations endpoint extracting the passed in AMH and Age parameters from the request POST body and then using a BinomialModelPrediction object created from my MOJO file to make the prediction and return the result.

3.) Dockerize and Deploy to AWS Elastic Container Registry

In the next series of steps, we need to take the Fertility Calculator Service and package it as a Docker image and push it to a Amazon ECR repository.

The Fertility Calculator Service Spring Boot app is packaged inside a Docker image and pushed to the AWS ECR repository.

Once I’ve written the logic to perform the inference, the next step is to compile the Java code into a self-contained JAR file via Gradle:

/gradlew build

The generated JAR file should then be located within the project’s build/libs folder, in the case of our example it is named ‘fertility-calculator-mojo-0.0.1-SNAPSHOT.jar’

Create a Script to Start the Fertility Calculator Service

Firstly, I create a script file (serve.sh) that will respond to the ‘serve’ command that SageMaker will send to the Docker container when its launched, this script will initialize and launch our Fertility Calculator Service.

#!/bin/bash

# Script to start the Spring Boot application

# Default to "serve" if no command is provided
if [ "$1" == "serve" ]; then
    echo "Starting Spring Boot Application"
    exec java -jar /app/fertility-calculator-mojo-0.0.1-SNAPSHOT.jar
else
    echo "Command not recognized"
    exec "$@"
fi

Create a Dockerfile

Now I setup a basic Dockerfile that will copy into the running Docker container the JAR file for my Fertility Calculator Service and the aforementioned serve.sh file and set the entry point for the Docker container to be the serve.sh script.

# Use an official Java runtime as a parent image
FROM openjdk:17-jdk-slim

# Set the working directory in the container
WORKDIR /app

# Copy the JAR file into the container at /app
COPY ./build/libs/fertility-calculator-mojo-0.0.1-SNAPSHOT.jar /app/fertility-calculator-mojo-0.0.1-SNAPSHOT.jar

# Copy the serve script into the container
COPY serve.sh /app/serve.sh

# Make port 8080 available to the world outside this container
EXPOSE 8080

# Make the script executable and set it as the entry point
RUN chmod +x /app/serve.sh
ENTRYPOINT ["/app/serve.sh"]

# Default command
CMD ["serve"]

Once we’ve setup our Dockerfile, we can test to make sure that the container image builds properly by executing:

docker build -t fertility-calculator-mojo .

Deploy the Fertility Calculator Service to Amazon Elastic Cloud Registry

With a locally configured instance of the AWS CLI, it is relatively straight forward to push the Docker image created in the last step to the Amazon Elastic Cloud Registry:

First, we create the ECR repository:

aws ecr create-repository --repository-name <respository name>

Then, we authenticate the locally running instance of docker to be able to access the ECR repository:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws.accountid>.dkr.ecr.<region>.amazonaws.com

Now, tag the Docker image:

docker tag <repository name>:latest <aws.accountid>.dkr.ecr.<region>.amazonaws.com/<repository name>:latest

Finally, push the Docker image up to ECR:

docker push <aws.accountid>.dkr.ecr.<region>.amazonaws.com/<repository name>:latest

4.) Create the AWS SageMaker Inference Endpoint

Once we’ve deployed the Fertility Calculator Service to ECR, we are almost done. The next set of steps is designed to create and configure AWS SageMaker to host the Docker image as a model that can be used for inference. We will first create a Model in AWS SageMaker that is based off the Fertility Calculator Service image that we deployed to ECR. Once done, the remaining steps are to create a SageMaker endpoint configuration and finally the SageMaker endpoint.

AWS SageMaker will launch our Docker container and pass inference requests sent to the SageMaker Endpoint to the /invocations endpoint in our Fertility Calculator Service running inside the Docker container.

To create the SageMaker Model execute the following command:

aws sagemaker create-model --model-name <model name> --primary-container Image="<aws.accountid>.dkr.ecr.us-east-1.amazonaws.com/<repository name>" --execution-role-arn arn:aws:iam::<aws.accountid>:role/<role you created in the pre-requisites>

Once the Model is created, execute the following command to create an Endpoint Configuration object:

aws sagemaker create-endpoint-config --endpoint-config-name <endpoint configuration name> --production-variants VariantName=variant-1,ModelName=<model name>,InitialInstanceCount=1,InstanceType=ml.t2.medium

Now the final step in this long, arduous journey is to create the SageMaker Endpoint based on this configuration:

aws sagemaker create-endpoint --endpoint-name <endpoint name> --endpoint-config-name <endpoint configuration name>

Once this command completes, if you login to the AWS SageMaker console, under Inference you will see the endpoint being created. In the SageMaker console, you will be able to copy the HTTP endpoint that hosts our app and which can be used to generate inferences against our model! If you can navigate the process of obtaining a bearer OAuth token against AWS within Postman, then you can directly run inferences against your endpoint from there!

5.) Test the Endpoint Using the AWS CLI

With the AWS SageMaker created and started successfully, the final step in this walk through is to test that the endpoint works and is accessible. To test the endpoint, we open up the AWS CLI on my development machine and execute the following:

aws sagemaker-runtime invoke-endpoint --endpoint-name <endpoint name> --body '{"age":29, "amh":3}' --content-type 'application/json' --region <region>

This executes the inference HTTP endpoint, passing in an age of ’29’ and an amh level of ‘3’, and voila, I am returned the probability of ‘0.83’ from my ML model now hosted in AWS SageMaker!

How to Expose AWS SageMaker Endpoint as a Public URL

The endpoint we’ve created and deployed as part of this tutorial is an authenticated endpoint that requires an IAM user context in order to call it. While this might work for production workloads that will be communicating with it from within AWS using an IAM role, what’s missing is being able to access the this SageMaker endpoint via a publicly accessible URL like what I will need to do to integrate this API into my front end Fertility Calculator tool. Luckily, this last step can be easily achieved by creating an AWS API Gateway and integrating it directly to the SageMaker endpoint so that it proxies request to and from the inference endpoint. Setting this up is rather straight forward and builds directly off of what we’ve achieved thus far. You can follow this excellent guide from AWS to learn how to do so.

We’ve journeyed through the meticulous process of deploying an H2O.ai AutoML model onto AWS I hope this guide helps to untangle the steps needed to operationalize an H2O.ai machine learning model within AWS SageMaker. This approach is just one way of operationalizing an machine learning model, but it is by no means the only way to skin the proverbial cat. By leveraging the robust capabilities of H2O.ai for model training and SageMaker for deployment, we’ve established a powerful and scalable machine learning solution that stands ready to tackle real-world applications.

To run the code used in this blog, please remember to checkout the GitHub repository here.

Bobby Gill

Co-Founder & Chief Architect at BlueLabel | + posts

Bobby Gill is the co-founder and Chief Architect of BlueLabel, an award winning digital product agency headquartered in New York. With over two decades of experience in software development, he is a seasoned full-stack engineer, software architect, and AI practitioner. Bobby currently leads the BlueLabel AI/ML practice, where he is leading a team of engineers operationalizing the transformational capabilities of generative AI within BlueLabel and for a number of enterprise clients.

How to Deploy an H2O.ai Machine Learning Model to AWS SageMaker

A Little Bit About Our Prediction Model

Pre-requisites

High-Level Overview of the Approach

Wait, Why Java and not Python?

How SageMaker Hosts an Inference Endpoint

1.) Export the Model From H2O.ai into a MOJO Model File

2.) Create a Java App to Serve the H20.ai MOJO Model

Recommended Folder Structure for the MOJO File

Implement SageMaker Inference Endpoints

3.) Dockerize and Deploy to AWS Elastic Container Registry

Create a Script to Start the Fertility Calculator Service

Create a Dockerfile

Deploy the Fertility Calculator Service to Amazon Elastic Cloud Registry

4.) Create the AWS SageMaker Inference Endpoint

5.) Test the Endpoint Using the AWS CLI

How to Expose AWS SageMaker Endpoint as a Public URL

Bobby Gill

Let’s get to Work

You might also like

Are You Using AI In Your Job?

How to Deploy an H2O.ai Machine Learning Model to AWS SageMaker

Bobby Gill | May 2, 2024

A Little Bit About Our Prediction Model

Pre-requisites

High-Level Overview of the Approach

Wait, Why Java and not Python?

How SageMaker Hosts an Inference Endpoint

1.) Export the Model From H2O.ai into a MOJO Model File

2.) Create a Java App to Serve the H20.ai MOJO Model

Recommended Folder Structure for the MOJO File

Implement SageMaker Inference Endpoints

3.) Dockerize and Deploy to AWS Elastic Container Registry

Create a Script to Start the Fertility Calculator Service

Create a Dockerfile

Deploy the Fertility Calculator Service to Amazon Elastic Cloud Registry

4.) Create the AWS SageMaker Inference Endpoint

5.) Test the Endpoint Using the AWS CLI

How to Expose AWS SageMaker Endpoint as a Public URL

Bobby Gill

Let’s get to Work

You might also like