Hospitality
In this section, we'll investigate online hotel reservation platforms and their relationship to the booking landscape and customer behavior using a Hotel Reservations dataset.
Question
A substantial number of online hotel bookings end in cancellations or no-shows. In the current market, the option to cancel for free or for a small fee has made it easier for customers to cancel their bookings, but it has also led to a loss of revenue for hotels. Which factors lead to cancellations, and can we predict whether a customer is likely to cancel their reservation?
Challenges
The growth of online booking platforms has led to a substantial amount of data on hotel reservations. However, despite the availability of this data, several challenges can make analyzing it complex:
- Technical barriers. Advanced data analytics tools, such as machine learning models and data visualizations, may be needed to effectively analyze large amounts of data.
- Interpretations and results. Multiple contextual factors may influence customer behavior, such as price, location, amenities, and customer service.
- Bias and error. Interpretation of results can be influenced by individual experience and perspectives.
Method
With DataChat, we can quickly and confidently address these challenges to find meaningful insights in our reservation data.
Load Data
Let's get an idea of the data we're working with. Load the provided demo dataset, "Hotel_Reservations", into your session. The dataset should look something like this:
Organize and Clean Data
From here, we can click Show Descriptive Statistics in the dataset header to provide summary statistics about our data, such as column types, counts, values, representation types, and more.
We can see that there are 19 total columns, each providing additional context to the reservations. It appears that "lead_time" and "avg_price_per_room" have the most unique values outside of the "Booking_ID". There are also columns "no_of_previous_cancellations" and "no_of_previous_bookings_not_canceled" that may be helpful in our analysis moving forward.
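As a rough illustration of what the unique-value counts in the descriptive statistics tell us, here is a minimal Python sketch. It is not how DataChat computes its statistics. The helper names `unique_counts` and `identifier_like` are hypothetical, and the sample rows are made up; only the column names come from the dataset.

```python
from collections import defaultdict


def unique_counts(rows):
    """Return {column: number of distinct values} for a list of dict rows."""
    seen = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            seen[column].add(value)
    return {column: len(values) for column, values in seen.items()}


def identifier_like(rows):
    """Return columns whose value differs on every row (for example, an ID)."""
    counts = unique_counts(rows)
    return [column for column, n in counts.items() if n == len(rows)]


# Made-up sample rows (assumption): real values in the dataset will differ.
rows = [
    {"Booking_ID": "A1", "lead_time": 224, "booking_status": "Not_Canceled"},
    {"Booking_ID": "A2", "lead_time": 5, "booking_status": "Not_Canceled"},
    {"Booking_ID": "A3", "lead_time": 224, "booking_status": "Canceled"},
]

print(unique_counts(rows))
print(identifier_like(rows))
```

A column that is different on every row, like "Booking_ID" here, carries no pattern a model could learn from, which is why such columns are usually removed before training.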
Now, let's clean up our data a bit. The "Booking_ID" column has a unique value for each booking, so it has no impact on whether a customer is likely to cancel. Let's remove the "Booking_ID" column from our dataset. Click the More menu in the "Booking_ID" column, then select Drop. Our dataset no longer has the "Booking_ID" column:
Create a Machine Learning Model
We can now use the Train Model skill to determine which factors most impact booking cancellations. Click Machine Learning > Train Model in the skill menu, select "booking_status" for the target column, and click Submit. Our impact chart looks something like this:
This impact chart shows us that "lead_time" has the highest impact on whether a reservation is cancelled, followed by "no_of_special_requests" and "market_segment_type", and that the model predicts cancellations with 87% accuracy. "avg_price_per_room" and "arrival_month" also seem to have some impact on cancellations.
Let's investigate this chart a bit further by exploring the Pipeline Report tab:
From here, we can see each stage in the model training process. This includes valuable information such as:
- Information on columns that had little to no impact.
- Features that had values imputed or labelled.
- The type of trained model.
- Information on class imbalances.
Expanding the class imbalance section, we see that some class imbalances were detected in the "booking_status" column and that the model automatically corrected this by adding weights to balance the distribution:
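The report does not say how the weights are derived, so the following is only a sketch of one common heuristic, not DataChat's implementation. It gives each class a weight of n_samples / (n_classes * class_count), the same formula scikit-learn uses for its "balanced" class weights. The function name and the label values are assumptions made for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up, imbalanced labels (assumption): twice as many kept bookings.
labels = ["Not_Canceled"] * 6 + ["Canceled"] * 3
weights = balanced_class_weights(labels)
print(weights)

# After weighting, both classes contribute the same total weight.
totals = {label: weights[label] * labels.count(label) for label in weights}
print(totals)
```

With weights like these, each example of the rarer class counts for more during training, so the model is not rewarded for simply predicting the majority class.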
Create Visualizations
Let's now generate some visualizations to display our findings. Click Visualize from the impact chart's header. This shows us several different visualization options. Let's view Chart1D by clicking on it:
This stacked bar chart shows how "market_segment_type" and "no_of_special_requests" relate to whether a booking is cancelled. Using the slider, we can see that across all market segments, cancellations increase with fewer special requests, meaning that the more special requests a customer adds to their stay, the more likely they are to keep that reservation. We can also see that personal bookings, "Online" and "Offline", tend to have higher cancellation rates than corporate and professional bookings.
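The comparison the chart makes can be sketched as a cancellation rate per group. This is a minimal Python illustration, not the chart's actual computation; the function name `cancellation_rates`, the "Canceled" label, and the sample rows are assumptions.

```python
from collections import defaultdict


def cancellation_rates(rows, group_columns,
                       status_column="booking_status",
                       canceled_value="Canceled"):
    """Return {group key: share of bookings in that group that were canceled}."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in group_columns)
        totals[key] += 1
        if row[status_column] == canceled_value:
            canceled[key] += 1
    return {key: canceled[key] / totals[key] for key in totals}


def make_row(segment, requests, status):
    """Build one made-up booking row using the dataset's column names."""
    return {
        "market_segment_type": segment,
        "no_of_special_requests": requests,
        "booking_status": status,
    }


# Made-up sample data (assumption), chosen only to exercise the function.
rows = (
    [make_row("Online", 0, "Canceled")] * 2
    + [make_row("Online", 0, "Not_Canceled")] * 2
    + [make_row("Online", 2, "Not_Canceled")]
    + [make_row("Corporate", 0, "Canceled")]
    + [make_row("Corporate", 0, "Not_Canceled")] * 3
)

rates = cancellation_rates(
    rows, ["market_segment_type", "no_of_special_requests"]
)
for key, rate in sorted(rates.items()):
    print(key, rate)
```

Comparing rates rather than raw counts matters here, because segments with many bookings would otherwise appear to cancel more simply because they are larger.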
Let's also create a second chart to visualize the impacts of "avg_price_per_room" and "arrival_month". Click New Chart in the top left of the Chart tab to open the Chart Builder, then:
- Select Heatmap for the chart type.
- Enter "booking_status" for the X-Axis, "arrival_month" for the Y-Axis, and "avg_price_per_room" for the Density. By default, the average is used for the Density Aggregate.
- Click Submit.
The resulting chart looks like this:
This heatmap chart shows us that expensive rooms are more likely to be cancelled when booked for April through October.
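Each cell of this heatmap holds an average of "avg_price_per_room" for one combination of "booking_status" and "arrival_month". A minimal sketch of that aggregation, assuming plain Python dictionaries as rows, might look like the following. The function name `mean_by_cell` and the sample values are hypothetical.

```python
from collections import defaultdict


def mean_by_cell(rows, x_column, y_column, value_column):
    """Return {(x value, y value): mean of value_column for that cell}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = (row[x_column], row[y_column])
        sums[cell] += row[value_column]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up sample rows (assumption); only the column names are from the dataset.
rows = [
    {"booking_status": "Canceled", "arrival_month": 7, "avg_price_per_room": 120.0},
    {"booking_status": "Canceled", "arrival_month": 7, "avg_price_per_room": 100.0},
    {"booking_status": "Not_Canceled", "arrival_month": 7, "avg_price_per_room": 90.0},
    {"booking_status": "Not_Canceled", "arrival_month": 1, "avg_price_per_room": 60.0},
]

cells = mean_by_cell(
    rows, "booking_status", "arrival_month", "avg_price_per_room"
)
for cell, mean_price in sorted(cells.items()):
    print(cell, mean_price)
```

Note that a cell value is an average price, not a cancellation rate. A higher average among cancelled bookings in a given month suggests that pricier rooms were cancelled more often, but confirming it would take a direct rate comparison.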
Results
Through our analysis, we have identified several factors that most influence whether a hotel booking is cancelled or not, including:
- The number of days the booking is made prior to check-in.
- The number of special requests for each booking.
- The market segmentation.
- The average price per room.
- The month for which the reservation is booked.
We have observed that cancellations are more frequent in late summer and fall, and customers are more likely to cancel if they booked more than six months in advance or reserved a room priced around $100. The analysis revealed that room type and service are not the primary factors that affect cancellations; instead, timing and scheduling have a more significant impact.
Based on these findings, we can recommend several actionable steps to improve the hotel's revenue and customer satisfaction:
- Focus on reducing the cancellation rate of online reservations, possibly by offering incentives or discounts to customers who book through this channel.
- Provide better deals to customers who make early reservations to prevent them from cancelling.
- Enhance promotions during the late summer and fall to reduce cancellations during this period.
- Invite customers to add more special requests to their bookings to increase engagement and reduce cancellations.
- Improve the quality of rooms priced around $100 to attract more bookings and reduce cancellations.