Walking a Fine Line: Retaining Customers in Mobile App Targeting

Xinying Haoa, Zhuping Liub, Vijay Mahajana

aMcCombs School of Business, University of Texas at Austin, Austin, Texas 78712
bZicklin School of Business, Baruch College, City University of New York, New York, 10010

In this paper, we address a common pitfall largely neglected in mobile app marketing: “over-targeting.” We conceptualize three types of over-targeting commonly seen in mobile marketing and investigate how behavior-based push notifications (based on past behavior) and location-based push notifications (based on current location) affect mobile user churn, retention, and re-engagement. We propose a hidden Markov model to capture the effects of both types of push. Our results indicate strong nonlinear push effects, implying a “fine line” in targeting consumers. Specifically, we find that three push notifications per week are usually the most effective and that most customers will be “pushed away” if they receive one push per day. We also find that excessive behavior-based pushes may trigger privacy concerns, leading consumers to opt out of location-based pushes. Through simulations, we show that targeting the right consumers at the right time can help maximize customer retention and re-engagement. For instance, sending one more (fewer) behavior-based push than the optimal weekly frequency may cause a firm to lose 6.2% (4.1%) of its customers.

Keywords: Mobile App, Push Notifications, Behavioral and Contextual Targeting, Customer Relationship Management, Hidden Markov Model
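
As a rough illustration of the modeling idea (not the authors’ estimation code), the sketch below computes the likelihood of a weekly app-usage sequence under a three-state hidden Markov model in which the weekly push count shifts the transition probabilities through a softmax link; the state labels, the Bernoulli emission model, and every parameter value are illustrative assumptions.

import numpy as np

# Toy 3-state HMM (active, dormant, churned): weekly push counts shift the
# transition probabilities through a softmax link. All parameter values and
# the emission model are hypothetical, for illustration only.

def transition_matrix(pushes, base_logits, push_coef):
    """Row-stochastic transition matrix given this week's push count."""
    logits = base_logits + push_coef * pushes
    expl = np.exp(logits - logits.max(axis=1, keepdims=True))
    return expl / expl.sum(axis=1, keepdims=True)

def forward_loglik(opens, pushes, base_logits, push_coef, p_open, init):
    """Log-likelihood of a binary 'opened the app this week' sequence."""
    alpha = init * np.where(opens[0], p_open, 1 - p_open)
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(opens)):
        T = transition_matrix(pushes[t], base_logits, push_coef)
        alpha = (alpha @ T) * np.where(opens[t], p_open, 1 - p_open)
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

# Hypothetical parameters and data.
base_logits = np.array([[2.0, 0.0, -1.0],    # from "active"
                        [0.5, 1.5, 0.0],     # from "dormant"
                        [-5.0, -5.0, 5.0]])  # "churned" is nearly absorbing
push_coef = np.array([[0.3, 0.0, 0.1],
                      [0.4, 0.0, 0.2],
                      [0.0, 0.0, 0.0]])
p_open = np.array([0.9, 0.3, 0.01])          # P(open | state)
init = np.array([0.8, 0.15, 0.05])
opens = np.array([1, 1, 0, 1, 0, 0])         # weekly app-open indicator
pushes = np.array([3, 3, 7, 3, 7, 7])        # weekly push counts
print(forward_loglik(opens, pushes, base_logits, push_coef, p_open, init))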


Sales Assistance, Search and Purchase Decisions: An Analysis Using Retail Video Data
Aditya Jaina, Sanjog Misrab, Nils Rudc

aBaruch College, City University of New York
bBooth School of Business, University of Chicago
cINSEAD

We investigate the roles of sales assistance and in-store search in driving customers’ purchase decisions, using unique observational data collected with video cameras installed in retail stores. The data contain visual descriptors of customers and the time they spend interacting with salespeople (sales assistance) and browsing products (search), linked to their purchase decisions. Our empirical specification is rooted in the process by which the customer engages in sales assistance and search to acquire information for a utility-maximizing purchase decision. The observed values of sales assistance and search are thus treated as jointly determined endogenous constructs. Our estimation strategy employs a control function approach to correct for this endogeneity, using instruments pertaining to salesperson motivation to offer sales assistance. Our analysis reveals that sales assistance and search play substantial and complementary roles: the former dominates in purchase incidence, whereas the latter dominates in conditional expenditure.
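
As a rough sketch of the control function logic (with simulated data and hypothetical variable names, not the paper’s actual specification), the first stage regresses each endogenous regressor on the instruments, and the second stage adds the first-stage residuals to the outcome model:

import numpy as np
import statsmodels.api as sm

# Hypothetical setup: 'assist' (sales assistance time) and 'search'
# (in-store search time) are endogenous, z holds instruments related to
# salesperson motivation, and 'buy' is a purchase indicator.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)                       # unobserved shopper taste
assist = 0.5 * z[:, 0] + 0.5 * u + rng.normal(size=n)
search = 0.5 * z[:, 1] + 0.5 * u + rng.normal(size=n)
buy = (0.8 * assist + 0.4 * search + u + rng.normal(size=n) > 0).astype(int)

# Stage 1: regress each endogenous regressor on the instruments and keep
# the residuals as control functions.
X1 = sm.add_constant(z)
r_assist = sm.OLS(assist, X1).fit().resid
r_search = sm.OLS(search, X1).fit().resid

# Stage 2: include the residuals in the outcome model; their coefficients
# absorb the correlation between the regressors and the error term.
X2 = sm.add_constant(np.column_stack([assist, search, r_assist, r_search]))
print(sm.Probit(buy, X2).fit(disp=0).summary())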



What You Give Is What They Want: The Systematic Choice of Product Attributes in Word-of-Mouth

Mahima Hadaa, Ujwal Kayandeb, Arvind Rangaswamyc

aBaruch College, City University of New York
bMelbourne Business School
cSmeal College of Business, Penn State

Potential customers are known to depend on Word of Mouth (WOM) communications far more than on direct communication from marketers, and increasingly they rely on unprompted WOM, such as online reviews, in which they do not engage in a conversation with current customers but passively receive information from them. At the same time, marketers perceive a lack of control over the WOM message and over whether it is consistent with the firm’s overall communications strategy. One reason firms perceive this lack of control may be our lack of knowledge about whether the message is random or systematic in its emphasis on attributes. We know that potential customers typically want information from current customers about attributes that are difficult to observe prior to purchase. But on which attributes do customers choose to give information in WOM: less observable attributes (e.g., a car’s handling in snow), more observable attributes (e.g., its exterior design), or any and all attributes irrespective of their observability prior to purchase? By text-mining large-scale data from restaurant reviews, we first show that customers systematically prefer to give information on less observable attributes. Second, with an experiment in a business-to-business setting, we show that, even when unprompted, customers systematically prefer to give information about less observable attributes, which is also what potential customers want. Finally, with large-scale survey data on automobile customers, we show that this systematic influence of less observable attributes extends to customers’ likelihood of recommending the product. With the knowledge that less observable attributes, such as a car’s handling in snow, systematically influence WOM far more than observable attributes, such as its exterior design, firms can better manage their marketing communications.
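
A minimal sketch of the kind of attribute tagging such a text-mining analysis could start from; the keyword lists and example sentence are invented placeholders, and the study’s actual pipeline is not described at this level of detail in the abstract.

from collections import Counter

# Toy dictionaries of attribute cues; real work would use a much richer
# lexicon or a trained classifier. All terms here are illustrative.
LESS_OBSERVABLE = {"service", "wait", "freshness", "portion", "noise"}
MORE_OBSERVABLE = {"decor", "location", "menu", "price", "parking"}

def attribute_counts(review: str) -> Counter:
    """Count mentions of less- vs. more-observable attribute cues."""
    tokens = [t.strip(".,!?") for t in review.lower().split()]
    c = Counter()
    c["less_observable"] = sum(t in LESS_OBSERVABLE for t in tokens)
    c["more_observable"] = sum(t in MORE_OBSERVABLE for t in tokens)
    return c

print(attribute_counts("Great decor, but the wait was long and the service slow."))
# Counter({'less_observable': 2, 'more_observable': 1})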


“Intelligent” Collaborative Filtering: Toward a More Informative Online Reputation Mechanism

Mingfeng Lin1, Qiang Gao2, Yong Liu3

1 Scheller College of Business, Georgia Institute of Technology, Atlanta, GA 30308
2 Zicklin School of Business, Baruch College, City University of New York, New York, 10010
3 Eller College of Management, The University of Arizona, Tucson, AZ 85721

Traditional reputation systems generate one rating for each product or seller and show the same rating to all potential buyers. Yet buyers are known to have heterogeneous preferences and criteria, and a review is most useful when it comes from someone with similar needs and preferences as the potential buyer. We propose a new reputation mechanism that takes into account the relevance of each historical rating to each specific buyer. We examine the new rating mechanism using data from a large online labor market. Results show that sellers with high ratings on the new scale are indeed more likely to win contracts. This is especially the case when the number of competing sellers is small and buyers are likely to read all historical reviews. This new rating system can be easily automated and implemented on e-commerce websites. In addition to increasing the efficiency of matching sellers to buyers, it can further mitigate several growing concerns over current online reputation mechanisms.
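
A minimal sketch of one way to operationalize a relevance-weighted rating of the kind the abstract describes, weighting each historical rating by the cosine similarity between the past buyer’s project profile and the focal buyer’s profile; the profile encoding and weighting rule are assumptions, not the paper’s mechanism.

import numpy as np

def personalized_rating(past_ratings, past_profiles, buyer_profile):
    """Weight each historical rating by the cosine similarity between the
    past buyer's project profile and the focal buyer's profile."""
    P = np.asarray(past_profiles, dtype=float)
    b = np.asarray(buyer_profile, dtype=float)
    sims = P @ b / (np.linalg.norm(P, axis=1) * np.linalg.norm(b) + 1e-12)
    weights = np.clip(sims, 0, None)
    if weights.sum() == 0:
        return float(np.mean(past_ratings))      # fall back to the plain average
    return float(np.average(past_ratings, weights=weights))

# Toy example: profiles encode (budget, technical complexity, urgency).
ratings = [5.0, 3.0, 4.0]
profiles = [[1, 5, 2], [5, 1, 4], [1, 4, 3]]
print(personalized_rating(ratings, profiles, buyer_profile=[1, 5, 2]))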


Big Data and the Incidentalome: An Instructive Icelandic Saga

Donna M. Gitter, Professor of Law, Baruch College, New York, NY

Due to the consanguinity of Iceland’s approximately 330,000 citizens and their detailed genealogical records dating back centuries, Iceland presents a unique opportunity to study genetic mutations and the medical disorders associated with them.  Genetic research in Iceland also raises challenging ethical conundrums.  The CEO of deCODE Genetics, Inc., a pioneering Icelandic biotech firm, asserts that the company could, by decrypting data it has gathered, determine the identity of every Icelandic citizen with a mutation in the BRCA2 gene, which is associated with breast and ovarian cancer.  DeCODE has requested from the Icelandic government permission to inform these individuals, many of whom did not participate directly in genetic research and none of whom expected to receive information at the individual level.  The emergence of unanticipated and yet highly significant genetic findings, referred to as the “incidentalome,” raises vexing legal, ethical, and policy issues that necessitate international solutions.


ON DATA PHILANTHROPY

Yafit Lev-Aretz

 Zicklin School of Business, Baruch College, City University of New York

The term “data philanthropy” refers to the sharing of private sector data for socially beneficial purposes such as academic research and humanitarian aid.  Data philanthropy has received considerable scholarly attention in various academic disciplines but has, until now, been completely overlooked by legal commentary.  Because privacy has been cited as the greatest conceptual and practical challenge to data philanthropy, I disentangle the bundle of privacy concerns to refine the scope of data philanthropy’s informational risks and propose a framework for mitigating some of these risks through the Fair Information Practice Principles (FIPs).  The purpose specification and use limitation principles, which limit data collection to ex-ante specified purposes, are discordant with the unanticipated, ex-post quality of data philanthropy.  I therefore propose a “data philanthropy exception” to these principles, which acknowledges considerations such as the existence and nature of the privacy risk, the time frame for action, the social risks of using the data, and the allowed retention time after the socially beneficial reuse.  The proposed exception reinforces the values at the heart of the FIPs, provides guidance in a field that currently operates in a legal vacuum, and introduces the possibility of responsible sharing by and to smaller market participants.


How Conceptual Modeling Can Advance Machine Learning
Roman Lukyanenko1, Jeffrey Parsons2, Veda C. Storey3,
Arturo Castellanos4, Monica Chiarini Tremblay5
1 HEC Montréal
2 Memorial University of Newfoundland
3 J. Mack Robinson College of Business, Georgia State University
4 Zicklin School of Business, Baruch College, CUNY
5 Raymond A. Mason School of Business at the College of William and Mary

With the transformation of our society into a “digital world,” machine learning has emerged as an essential approach to extracting useful information from large collections of data. However, challenges remain for using machine learning effectively, some of which, we propose, can be overcome using conceptual modeling. We examine a popular cross-industry standard process for data mining, commonly known as CRISP-DM, and show the role of conceptual modeling at each stage of this process. The results are illustrated through an application to a management system for drug monitoring. This exposition demonstrates the broad potential of conceptual modeling to advance machine learning, specifically by: (1) supporting the application of machine learning within organizations, (2) improving the usability of machine learning models as decision tools (e.g., by making them more transparent), and (3) optimizing the performance of machine learning algorithms (e.g., by imbuing them with more nuanced domain knowledge). Future research directions are also proposed.

Keywords: Machine Learning, Complex Models, Conceptual Models, Conceptual Modeling, Applications, Data Mining, CRISP-DM


Algorithm Aversion or Adoption — When Do Decision Makers Choose to Use AI?
Anh Luong, Karl Lang

Zicklin School of Business, Baruch College, CUNY

Firms are increasingly investing in Artificial Intelligence and Machine Learning platforms to support organizational decision-making. However, research has shown that managers tend to favor humans’ forecasts over algorithms’ forecasts, despite algorithms’ established superiority in accuracy, for reasons related to trust and egoistic tendencies. The present research explores this phenomenon in greater depth, arguing that the reported algorithm aversion is an initial reaction caused by a lack of experience with algorithms and can thus be significantly reduced as decision makers gain more exposure to them. Further, we argue that algorithms’ price and predictive power are two additional important factors that affect decision makers’ adoption of algorithms. Lastly, we posit that the greater the adoption of algorithms in decision making, the better the outcomes of the decision-making process. Using the methodology of behavioral economics to test our hypotheses, we conduct a controlled lab experiment in which economically incentivized subjects engage in a multiple-round task of reviewing loan applications and are asked to choose between using algorithms’ predictions and their own. The research is expected to contribute to the extant algorithm aversion literature by adding important dynamic and economic perspectives to the investigation of decision makers’ attitudes toward using algorithms.


Applications of Fully Homomorphic Encryption for Private Analytics in Healthcare

Alexander Wood1, Vladimir Shpilrain5,6, Kayvan Najarian2,3,4, and Delaram Kahrobaei1,7

1. Department of Computer Science, The Graduate Center, CUNY, USA
2. Department of Computational Medicine and Bioinformatics, University of Michigan, USA
3. University of Michigan Center for Integrative Research in Critical Care, USA
4. Emergency Medicine Department, University of Michigan, USA
5. Department of Mathematics, The Graduate Center, CUNY, USA
6. Department of Mathematics, The City College of New York, USA
7. Department of Computer Science, Tandon School of Engineering, New York University,  USA

Fully homomorphic encryption enables private computation over sensitive data, such as medical data, via potentially quantum-safe primitives. In this extended abstract we provide an overview of an implementation of a private-key fully homomorphic encryption scheme in a protocol for private Naive Bayes classification. This protocol allows a data owner to privately classify her data point without direct access to the learned model. We implement this protocol by performing privacy-preserving classification of breast cancer data as benign or malignant.
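
The sketch below is a plaintext stand-in for the homomorphic step: Naive Bayes scoring reduces to additions of fixed-point-encoded log-probabilities, which is the kind of arithmetic a homomorphic scheme can perform on ciphertexts, with the final argmax left to the data owner after decryption. The model values, scaling factor, and feature encoding are illustrative assumptions, and no actual encryption is performed here.

import numpy as np

SCALE = 1000  # fixed-point scaling so log-probabilities become integers

def encode(logp):
    return int(round(logp * SCALE))

# Hypothetical learned model: two classes, three binary features.
log_prior = {"benign": np.log(0.6), "malignant": np.log(0.4)}
log_lik = {                                   # log P(feature = 1 | class)
    "benign":    np.log(np.array([0.2, 0.3, 0.1])),
    "malignant": np.log(np.array([0.7, 0.6, 0.8])),
}

def scores(features):
    """Per-class sums of encoded log-probabilities. In the real protocol
    these additions would act on ciphertexts, not plaintext integers."""
    out = {}
    for c in log_prior:
        terms = [encode(log_prior[c])]
        for f, lp in zip(features, log_lik[c]):
            lp0 = np.log1p(-np.exp(lp))       # log P(feature = 0 | class)
            terms.append(encode(lp if f else lp0))
        out[c] = sum(terms)                    # the "homomorphic" additions
    return out

# Data owner's step: decrypt the class scores and take the argmax locally.
s = scores([1, 0, 1])
print(max(s, key=s.get))                       # -> "malignant"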


Medical Diagnostics Based on Encrypted Medical Data

Alexei Gribov – The City University of New York
Jonathan Gryak – University of Michigan — Ann Arbor
Kelsey Horan – The City University of New York
Delaram Kahrobaei – The City University of New York
Kayvan Najarian – University of Michigan — Ann Arbor
Vladimir Shpilrain – The City University of New York
R. Soroushmehr – University of Michigan — Ann Arbor

The Health Insurance Portability and Accountability Act places firm constraints on the privacy and security practices surrounding all medical data. Hospitals have a wealth of sensitive medical data that cannot be shared, while researchers need data that are difficult to obtain. This data problem is one symptom of the larger trade-off between data utility and data privacy: a useful patient database is insecure, and a secure patient database is useless. Our work provides secure machine learning via fully homomorphic encryption, a relatively new cryptographic construction that allows computation over encrypted data. We perform two experiments, showing both the correctness and the efficiency of our encryption scheme. The implementation conducts efficient data mining while maintaining correctness. It is entirely feasible to consider this encryption scheme for highly sensitive, private, and federally regulated data, which would allow hospitals to share an abundance of data with research centers.


Appointment Scheduling Under Time-Dependent Patient No-Show Behavior

Qingxia Konga, Shan Lib, Nan Liuc, Chung-Piaw Teod, Zhenzhen Yand

a Erasmus University
b Baruch College, City University of New York
c Boston College
d National University of Singapore

Patient nonattendance (commonly known as “no-show”) frequently arises in clinical appointment scheduling. Our studies of independent datasets from countries on two continents identify a significant time-of-day effect on patient show-up probabilities.  That is, controlling for patient-level and provider-level factors, patients may be more likely to show up for their scheduled appointments in certain time windows of the day than in others.  Motivated by this phenomenon, we study an appointment scheduling problem under schedule-dependent patient no-show behavior.

This problem is difficult due to several technical challenges unaddressed in the literature.  For instance, the uncertainties in our model related to patient no-shows are endogenous to the schedule, i.e., to our decision variables.  To tackle these challenges, we deploy a distributionally robust model and develop new modeling and solution techniques.  In contrast to previous literature, our computational studies reveal new patterns for the optimal schedule when patient no-shows depend on the schedule itself.  We also show a significant reduction in total expected cost when the time-of-day variation in patient show-up probabilities is taken into account rather than ignored.
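
As a back-of-the-envelope illustration of why the time-of-day effect matters when evaluating a schedule (not the paper’s distributionally robust formulation), the simulation below prices the same booking pattern under flat versus time-dependent show-up probabilities; all costs, probabilities, and service times are invented.

import numpy as np

rng = np.random.default_rng(1)

def expected_cost(booking, show_prob, wait_cost=1.0, idle_cost=0.5,
                  overtime_cost=2.0, sims=20000):
    """Monte Carlo estimate of waiting + idle + overtime cost for a schedule.
    booking[i] = patients booked into slot i (unit-length slots, unit service
    time); show_prob[i] = probability that a slot-i patient shows up."""
    total = 0.0
    for _ in range(sims):
        backlog, wait, idle = 0.0, 0.0, 0.0
        for i in range(len(booking)):
            backlog += rng.binomial(booking[i], show_prob[i])
            served = min(backlog, 1.0)        # one unit of capacity per slot
            idle += 1.0 - served
            backlog -= served
            wait += backlog                    # leftover work waits one slot
        total += wait_cost * wait + idle_cost * idle + overtime_cost * backlog
    return total / sims

# Same booking pattern, evaluated with flat vs. hour-of-day show-up rates.
booking = [2, 1, 1, 1, 1, 1, 1, 2]
flat = [0.8] * 8
by_hour = [0.9, 0.9, 0.85, 0.75, 0.7, 0.75, 0.8, 0.85]  # hypothetical pattern
print(expected_cost(booking, flat), expected_cost(booking, by_hour))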


Does Collateral Value Affect Asset Prices? Evidence from a Natural Experiment in Texas

Albert Alex Zevelev

Zicklin School of Business, Baruch College, CUNY

This paper identifies the impact of collateral value on house prices, exploiting law changes in Texas which legalized home equity loans in 1998. The impact of this credit expansion was positive, heterogeneous and direct. The laws increased Texas house prices 3.6%; this is price-based evidence that households are credit constrained. Prices rose more in locations with inelastic supply, higher pre-law house prices, income and employment. These estimates reveal that richer households value the option to pledge their home as collateral more strongly. Further estimates indicate that the effect was direct, as variables related to house prices were unaffected.
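
A minimal sketch of the kind of two-way fixed-effects difference-in-differences regression such a natural experiment invites; the file name and column names are hypothetical, and the paper’s identification strategy is richer than this.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: log house prices by county and year, with Texas
# counties treated after the 1998 home-equity-loan legalization.
df = pd.read_csv("county_house_prices.csv")      # assumed columns below
df["post"] = (df["year"] >= 1998).astype(int)
df["treated"] = (df["state"] == "TX").astype(int)

# Two-way fixed effects difference-in-differences: the coefficient on
# treated:post is the estimated effect of the credit expansion on prices.
model = smf.ols(
    "log_price ~ treated:post + C(county) + C(year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["county"]})
print(model.params["treated:post"])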


Modeling Dynamics in Equity-based Crowdfunding

Chul Kima, P.K. Kannanb, Michael Trusovc, Andrea Ordaninid
aBaruch College, City University of New York
bUniversity of Maryland
cUniversity of Maryland
dBocconi University

We investigate various dynamics characterizing the equity-based crowdfunding process: stagnation after friend-funding, gradual increase through the crowd’s participation, and acceleration in the last phase. We propose forward-looking investment behavior and social interactions as the major sources of these dynamics. We develop a dynamic structural model that accommodates active social interactions among forward-looking investors and captures these contrasting dynamics within a unified framework. Methodologically, our approach can handle multiple-discrete/continuous investment decisions in a forward-looking manner with a closed-form likelihood function, making it applicable to high-dimensional data with a large choice set. Using Bayesian estimation methods, we analyze individual-level investment and network data from the crowdfunding platform Sellaband and find strong evidence of forward-looking behavior and social interactions. The proposed structural model shows very good predictive performance at the overall project level, even though it makes predictions at the individual level. We simulate counterfactuals to derive optimal crowdfunding policies for both fundraisers and platforms. For fundraisers, our approach allows us to infer the largest possible monetary goal and the smallest possible proportion of profit sharing that maximize both the chance of success and the outcome of fundraising. As quality measures for platforms, we suggest drop-out elasticities and estimate the increase in demand in response to a decrease in the fear of drop-out.


Efficient and scalable implementations of approximate leave-one-out cross validation

Kamiar Rahnama Rad
Zicklin School of Business, Baruch College, CUNY

Learning from large datasets has been the cornerstone of modern innovations and discoveries in science, medicine, and technology. The fast estimation of prediction performance on unseen events is a canonical goal in statistical learning. A classic approach to this end is leave-one-out cross-validation, a time-consuming routine of repeatedly leaving a datum out, fitting the model on the rest, and testing it on the left-out datum. The recent emergence of massive data has exacerbated the computational infeasibility of such approaches. Moreover, in many recent instances, the number of features per observation can be extremely large, adding another challenging facet to the fast estimation of prediction error. To overcome these problems we developed a novel methodology, approximate leave-one-out (ALO) cross-validation. The computational bottleneck of ALO is the inversion of a large matrix, which can make its application to high-dimensional problems computationally infeasible. In this work, we explore various ways in which the matrix inversion can be approached in a computationally efficient manner, especially when the design matrix has a structure that lends itself to efficient matrix-vector operations. We will see that the sparsity of the design matrix, the generalized cross-validation approach and its noisy variants, and iterative approaches such as the conjugate gradient method play major roles in making the problem computationally feasible. We illustrate these approaches using numerical experiments with synthetic data and real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
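
A minimal sketch, for ridge regression (where the leave-one-out formula r_i / (1 - h_ii) is exact), of replacing the explicit matrix inverse with matrix-free conjugate-gradient solves; the problem sizes and penalty are illustrative, and the general ALO machinery for other losses is not reproduced here.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, p, lam = 500, 200, 1.0
X = rng.normal(size=(n, p))
y = X @ (rng.normal(size=p) / np.sqrt(p)) + 0.5 * rng.normal(size=n)

# Matrix-free operator for (X^T X + lam * I): conjugate gradient only needs
# matrix-vector products with X and X^T, which is what makes it attractive
# for large or structured design matrices.
A = LinearOperator((p, p), matvec=lambda v: X.T @ (X @ v) + lam * v)

beta, _ = cg(A, X.T @ y)
resid = y - X @ beta

# Leave-one-out residuals r_i / (1 - h_ii), with each leverage h_ii obtained
# from one CG solve instead of forming an explicit inverse.
loo = np.empty(n)
for i in range(n):
    z, _ = cg(A, X[i])
    loo[i] = resid[i] / (1.0 - X[i] @ z)

print("leave-one-out mean squared error:", np.mean(loo ** 2))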