Online fashion marketplaces are growing in popularity. Sellers are drawn to websites where they can list products by providing a title, price, description, and pictures. This new model for buying and selling fashion products brings a new set of challenges. Focusing on the product titles that users provide, this paper applies natural language processing techniques and two machine learning algorithms to an online fashion marketplace, with the goal of predicting an item's category or subcategory. The paper begins with an overview of popular preprocessing techniques in the context of analyzing titles. These preprocessing techniques are vital to the next step, training the models. The paper then covers the development and performance of two prediction models: one that uses a Naive Bayesian learning approach and one that uses Support Vector Machines. The results from the two models are compared and discussed. They show that the Support Vector Machine model was more accurate, and that natural language processing techniques can be effectively applied to an online fashion marketplace to predict an item's category or subcategory.
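As a rough illustration of the title-based approach summarized above, the following minimal sketch preprocesses product titles and trains a multinomial Naive Bayes classifier with Laplace smoothing. It is not the paper's implementation: the stopword list, the example titles and category labels, and the names `preprocess` and `NaiveBayesTitleClassifier` are all assumptions made for illustration, and the Support Vector Machine model is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "and", "with", "for", "in", "of"}


def preprocess(title):
    """Lowercase the title, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes over title tokens with Laplace smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)       # titles per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = preprocess(title)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        tokens = preprocess(title)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior for the category.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for t in tokens:
                # Add-one smoothing so unseen tokens do not zero out the score.
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for this example.
titles = [
    "Red summer dress with floral print",
    "Black leather ankle boots",
    "Blue denim skinny jeans",
    "Floral maxi dress",
    "Brown leather boots for men",
    "Ripped denim jeans",
]
labels = ["dresses", "shoes", "jeans", "dresses", "shoes", "jeans"]

model = NaiveBayesTitleClassifier().fit(titles, labels)
print(model.predict("Vintage floral dress"))
```

In practice a library implementation and a much larger labeled set of titles would be used; the sketch only shows how preprocessing feeds token counts into the classifier.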
Demographic prediction is an important component of building mobile user profiles, which can help improve personalized services and targeted advertising. However, demographic information is often unavailable because of user privacy concerns. This paper presents technologies and algorithms for building demographic prediction classifiers based on mobile user data such as call logs, app usage, and Web data. To associate these data with demographic information, we implemented a system that consists of two parts: a mobile application for data collection, with a web infrastructure for administering user surveys (e.g., gender, age, and marital status), and classifiers that predict demographic information. For demographic prediction, we focus on user interest, which is semantically extracted from Web data, rather than on other mobile data. To capture user interest more precisely, an advanced topic model called ARTM (Additive Regularization of Topic Models) is used. Using user interest as features, the experimental results show that our system, using deep learning, achieves demographic prediction accuracies as high as 97%, 94%, and 76% for gender, marital status, and age, respectively.