Deep Kalman Filters

By David Steinberg. Kalman Filters are a popular and influential approach for modeling time-varying phenomena. They admit an intuitive probabilistic interpretation, have a simple functional form, and have been successfully applied in a wide variety of disciplines. The classic Kalman filter is a generative dynamic model in which the state of the system evolves over time …

Articles about reproducible research and p-values

By David Steinberg. I want to devote this issue to some articles, both inside and outside the statistical literature, that discuss the use of p-values and related questions of selective inference. The p-value (and in fact much of the standard statistical paradigm for science) has been a focal point for controversy in recent years. The article that …

Using Forum and Search Data for Sales Prediction

By David Steinberg. MIS Quarterly is a management and information science journal that publishes many articles that make interesting and unusual use of data. A good example is the forthcoming article by Geva et al., which exploits internet data for sales prediction. Their work builds on many previous articles that use data from social media websites …

50 Years of Data Science

By David Steinberg. Data Science has become a rallying cry for universities, research organizations, and many commercial and industrial companies. We are surrounded by ever increasing amounts of data and by myriad methods and algorithms to take advantage of them. Rallying cry aside, no one seems to be very clear about just what IS data …

Accurate estimation of influenza epidemics using Google search data

By David Steinberg. A New York Times article of November 11, 2008 trumpeted the success of Google Flu Trends (GFT) in tracking the progress of annual influenza outbreaks in the United States. The GFT assessments used data on search words to estimate the number of individuals affected by influenza-like illness (ILI). By comparison to the official …

Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

By David Steinberg. A major challenge in de novo drug design is to identify molecules that are effective in attacking causes of disease. Computational strategies can be an effective tool to generate novel molecules with strong affinity to the biological target. This work explores the use of recurrent neural networks that are trained as generative …

In-Process Monitoring of Selective Laser Melting

By David Steinberg. Data for process monitoring now comes from diverse collection and sensor modalities. Often this results in rich data that presents interesting challenges for on-line monitoring. This article, by Grasso et al., looks at image data produced by a machine vision system. The particular focal area for this article is “additive manufacturing” (AM) …

Predictive analytics for targeted election campaigns

By David Steinberg. In the course of researching a paper on election surveys, I came across some fascinating material on the use of predictive analytics in election campaigns. There was extensive use by both the Trump and Clinton campaigns in the recent U.S. presidential election. An important catalyst has been the move from mass media …

Multivariate Industrial Time Series with Cyber-Attack Simulation

By David Steinberg. Industrial processes today increasingly merge physical parts with the internet-of-things. This exposure of processes makes them potentially vulnerable to cyber attack. Hence cyber security is essential to protect the processes. Such security systems typically involve analysis of the data streams that typify normal conditions, with the goal of developing algorithms that detect …