This section is organized into three main fields: computer science, science of science, and economics.
Computer Science
Predicting Airbnb review scores
We proposed a method to predict an Airbnb listing's score from its text reviews. The method exploits some unique characteristics of the Airbnb dataset to transform plain text into numerical vectors, which are used to train machine learning models, including neural networks, for predicting review scores. According to the results, our methods achieve up to 69% accuracy, which is fairly good compared with state-of-the-art work on customer reviews. We used 1.5M reviews of Airbnb listings in NY. In this project, I worked with Hai Nguyen.
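To give a feel for the "plain text into vectors" step, here is a minimal bag-of-words sketch. It is only an illustration: the Airbnb-specific features of the actual method are not shown, and the reviews and function names below are hypothetical.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(reviews):
    """Map every distinct token to a column index (sorted for determinism)."""
    tokens = sorted({tok for review in reviews for tok in tokenize(review)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(review, vocabulary):
    """Turn one review into a vector of token counts, one entry per vocabulary word."""
    counts = Counter(tokenize(review))
    # Dicts preserve insertion order, so columns follow the vocabulary indices.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical reviews, not taken from the Airbnb dataset.
reviews = ["Great place, great host", "Noisy place"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(r, vocabulary) for r in reviews]
```

Vectors like these could then be fed, together with the review scores, to any standard classifier.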
Predicting Commodities Prices: A Data-Mining Approach
Although this looks like an economics project, we took an operational approach: we focused on maximizing predictive power.
We developed a program to scrape web pages with information about temperature, snow, rain, inflation, economic indicators, and financial data. Then, we trained time series models to predict prices of agricultural products, such as corn, rice, wheat and other crops.
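As a rough illustration of comparing forecasting models by out-of-sample error, the sketch below scores two very simple forecasters on the last few points of a series and keeps the better one. The forecasters, the price series, and all names are hypothetical stand-ins, not the models used in the project.

```python
def naive_forecast(history):
    """Predict that the next value equals the last observed value."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Predict the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def holdout_mae(series, forecaster, n_test):
    """Mean absolute error of one-step-ahead forecasts over the last n_test points."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        prediction = forecaster(series[:t])  # only data before time t is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

def select_best(series, candidates, n_test):
    """Return the name of the candidate with the lowest holdout error, plus all scores."""
    scores = {name: holdout_mae(series, f, n_test) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical price series for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 112.0, 115.0]
candidates = {"naive": naive_forecast, "moving_average": moving_average_forecast}
best_name, scores = select_best(prices, candidates, n_test=3)
```

On this steadily rising toy series the naive forecaster wins, because the moving average lags behind the trend.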
My partners for this project were Erin Ochoa and Meiqing Zhang. I implemented web scraping tools, data transformations, and time series algorithms to select the best models. All the code is in Python. You can look at it on GitHub. An example of the output is here.
Songmasters
This project was made for my class on Big Data (Computer Science with Applications III). We utilized the OneMillionSongs database and our main tool was Google Cloud using 21 nodes. My partners for this project were Erin Ochoa and Wanlin Ji.
The primary objective was to compare every pair of songs. Based on their observable features, we could then identify the most eclectic songs, the most formulaic songs, and the most similar and dissimilar songs, among others. Please look at the code and our cool presentation on GitHub.
In our sample, the two most similar songs were:
Livio Minafra, “Campane”
Semprini, “Rhapsody In Blue (2003 Digital Remaster)”
Two piano songs, which are actually similar. Right?
The most eclectic song: Hinge, “Pray The I Miss”
The most formulaic song: Owsley, “Class Clown/Good Old Days (Reprise)”
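The pairwise comparison can be sketched on a single machine as follows. This is a toy version under loud assumptions: the feature vectors are made up, Euclidean distance stands in for whatever similarity measure the project used, and "eclectic" and "formulaic" are interpreted here as the highest and lowest average distance to all other songs.

```python
from itertools import combinations
from math import dist

# Hypothetical feature vectors (for example tempo, loudness, duration, rescaled to 0-1).
songs = {
    "song_a": (0.10, 0.20, 0.30),
    "song_b": (0.10, 0.25, 0.30),
    "song_c": (0.90, 0.80, 0.10),
    "song_d": (0.50, 0.50, 0.50),
}

def pairwise_distances(features):
    """Euclidean distance for every unordered pair of songs."""
    names = sorted(features)
    return {(a, b): dist(features[a], features[b]) for a, b in combinations(names, 2)}

def mean_distance_to_others(features, distances):
    """Average distance from each song to all the other songs."""
    totals = {name: 0.0 for name in features}
    for (a, b), d in distances.items():
        totals[a] += d
        totals[b] += d
    return {name: total / (len(features) - 1) for name, total in totals.items()}

distances = pairwise_distances(songs)
most_similar = min(distances, key=distances.get)
most_dissimilar = max(distances, key=distances.get)

spread = mean_distance_to_others(songs, distances)
most_eclectic = max(spread, key=spread.get)   # assumption: far from everything else
most_formulaic = min(spread, key=spread.get)  # assumption: close to everything else
```

The number of pairs grows quadratically with the number of songs, which is why a cluster was useful for the full dataset.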
Perfect destination!
Have you ever wanted to go to a place with specific characteristics, such as a paradise beach, little rain, and great vegetarian dishes?
This website, built with HTML and PHP and with data managed in MySQL, lets you select destinations according to specific features. For instance, which destinations have no cockroaches? Which places have beaches and the friendliest populations? Furthermore, the website has a friendly front-end for updating the database, which makes it possible for people without programming knowledge to use the database. Please visit the website here.
Knowledge of Knowledge
A Framework to Analyze the Evolution of Science and an Application for Broad Academic Fields
In my master's thesis, I develop a framework to analyze the evolution of science. I use about 1 billion citations from 1985 to 2014, which represent almost all the relevant scientific publications.
I create clusters of papers for each year and ties among them based on their overlapping publications. This process creates a multi-slice network. The methodology uses four main components: 1) a community detection algorithm (Infomap), 2) t-SNE, 3) KMeans, and 4) in-house algorithms to create and order science groups in a coherent way.
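The idea of tying clusters together by their overlap can be sketched as below. This is a simplified illustration, not the thesis code: it assumes each yearly cluster is a set of paper IDs, links only consecutive years, and uses Jaccard similarity as a hypothetical tie weight.

```python
def jaccard(a, b):
    """Share of papers two clusters have in common, relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def inter_year_ties(clusters_by_year, threshold=0.0):
    """Link clusters in consecutive years whose paper sets overlap."""
    ties = []
    years = sorted(clusters_by_year)
    for year, next_year in zip(years, years[1:]):
        for name_a, papers_a in clusters_by_year[year].items():
            for name_b, papers_b in clusters_by_year[next_year].items():
                weight = jaccard(papers_a, papers_b)
                if weight > threshold:
                    ties.append(((year, name_a), (next_year, name_b), weight))
    return ties

# Hypothetical clusters: year -> cluster label -> set of paper IDs.
clusters_by_year = {
    1985: {"c0": {"p1", "p2", "p3"}, "c1": {"p4", "p5"}},
    1986: {"c0": {"p2", "p3", "p6"}, "c1": {"p7"}},
}
ties = inter_year_ties(clusters_by_year)
```

Each tie connects a node in one yearly slice to a node in the next, which is what gives the network its multi-slice shape.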
I apply this framework to analyze broad academic fields and their evolution. According to the results, medicine is the most persistent cluster of the last thirty years, which might mean that its future depends on its past more than is the case for other disciplines. Meanwhile, academic fields that last only a couple of years are characterized by their specificity or applied nature. For instance, disciplines such as materials science might react to industry needs.
This application only scratches the surface of this framework. The methodology can be further applied to research on the creation, evolution, and decline of research areas, among other issues related to the evolution of science.
My advisor is Johan Chu at Chicago Booth. I also appreciate the ideas and comments from James Evans at The Knowledge Lab, The University of Chicago.
You can read the thesis here.
Economics
Determinants of International Tourism Flows into Mexico: Externalities of the Perception of Crime and Other Factors
In this project, I studied the factors that determine international tourism flows into Mexico.
The novelty was the introduction of perception of crime as a control factor, built from text analysis of all NYT articles from July 2007 to December 2016 that mention Mexico (and other similar words) at least twice. This project arose from curiosity about quantifying the negative externality that dangerous places impose on international tourism flows into safe places. In other words, do tourism flows decrease in safe places when crime increases in violent areas? As expected, the results show that the perception of crime, which follows crime in dangerous places, has a negative effect on tourism flows in safe places. This holds after controlling for actual crime in each place.
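The article filter can be sketched as follows. Only the "at least two mentions" rule comes from the project description; the term lists, the prefix matching, and the monthly share used as an index are hypothetical choices made for this illustration.

```python
import re
from collections import defaultdict

MEXICO_TERMS = ("mexico", "mexican")           # hypothetical stand-ins for "similar words"
CRIME_TERMS = ("crime", "cartel", "violence")  # hypothetical crime vocabulary

def count_terms(text, terms):
    """Count words that start with any of the given terms (so plurals also match)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words if w.startswith(terms))

def monthly_crime_share(articles, min_mentions=2):
    """Per month: share of Mexico-related articles that also use crime vocabulary."""
    relevant = defaultdict(int)
    about_crime = defaultdict(int)
    for month, text in articles:
        if count_terms(text, MEXICO_TERMS) < min_mentions:
            continue  # drop articles that mention Mexico fewer than two times
        relevant[month] += 1
        if count_terms(text, CRIME_TERMS) > 0:
            about_crime[month] += 1
    return {m: about_crime[m] / relevant[m] for m in sorted(relevant)}

# Hypothetical articles as (month, text) pairs; not real NYT content.
articles = [
    ("2007-07", "Mexico reports cartel violence; Mexican officials respond."),
    ("2007-07", "Mexico opens a new museum in Mexico City."),
    ("2007-07", "A recipe inspired by Mexico."),
    ("2007-08", "Crime falls in Mexico, Mexican police say."),
]
index = monthly_crime_share(articles)
```

A monthly series like this could then enter a time series model as an exogenous regressor.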
Another interesting finding is that the models suggest a substitution effect among destinations. The number of visitors to highly touristic places increases when the American economy is doing well. However, the number of visitors to non-popular destinations decreases. Thus, if the American economy is doing poorly, not all tourist destinations will be affected.
To sum up, this project uses text analysis and time series models (ARIMAX) to provide public policy insights. Here are the paper and a poster.