Juan Banda

Assistant Professor, Adjunct

B.S., Computer Science, Universidad Autonoma de Chihuahua, Mexico, 2005
M.A., Mathematics, Eastern New Mexico University, 2007
Ph.D., Computer Science, Montana State University, 2011


Astroinformatics, big data mining, information retrieval, biomedical informatics


Trained as a computer scientist and mathematician, my work bridges fields from astroinformatics to biomedical informatics. I am in charge of research and development for the Solar Dynamics Observatory Content-Based Image Retrieval System. This system is designed to find similar solar images within the vast image repository of NASA’s SDO mission. Current research on this system uses Lucene, deep learning, and other image retrieval and machine learning algorithms, paired with computer vision techniques, to find interesting regions within solar images.

I also have an appointment as a postdoctoral data science fellow at Stanford University’s Center for Biomedical Informatics Research, where I am a data-ninja/researcher at Shah Lab. In this lab, we look to uncover information hidden in the dark corners of the free-text sections of the EMR systems of Stanford’s STRIDE data warehouse. I am in charge of maintaining and upgrading our text-tagging pipeline (based on Unitex) and our terminology resources. I am also responsible for curating the multiple data sources and databases to which Shah Lab’s researchers and collaborators have convenient access. In addition, I write production-quality code for the lab’s publicly available tools and for all the internal tools and scripts needed for day-to-day data processing, loading, and mapping.

In my free time I am an avid mountain climber and music lover.

As a data scientist, I work with large volumes of image data, extracting and transforming computer vision image features into large content-based image retrieval systems for NASA’s Solar Dynamics Observatory mission. My interests are not limited to image data, however. I am also well-versed in extracting terms and clinical concepts from millions of unstructured electronic health records and using them to build predictive models (electronic phenotyping) and mine for potential multi-drug interactions (drug safety). My interest in phenotyping includes leading the development of Aphrodite, a tool that allows researchers to build electronic phenotypes using fuzzy labels.


J. Banda, T. Kuhn, N. Shah, and M. Dumontier, Provenance-centered dataset of drug-drug interactions, Proceedings of the 14th International Semantic Web Conference (ISWC 2015).

M. Schuh, J. Banda, T. Wylie, P. McInerney, K. Ganesan Pillai, and R. Angryk, On visualization techniques for solar data mining, Astronomy and Computing, vol. 10, April 2015, pp. 32–42.

J. Banda, M. Schuh, R. Angryk, K. Pillai, and P. McInerney, Big data new frontiers: mining, search and management of massive repositories of solar image data and solar events, Proceedings of the 17th East European Conference on Advances in Databases and Information Systems, published as New Trends in Databases and Information Systems, Advances in Intelligent Systems and Computing, vol. 241, Springer, 2014, pp. 151–158.

J. Banda, M. Schuh, R. Angryk, and P. Martens, Image retrieval on compressed images: can we tell the difference?, Proceedings of the 4th International Conference on Image Processing Theory, Tools and Applications (IPTA 2014), pp. 1–6.

J. Banda, M. Schuh, T. Wylie, P. McInerney, and R. Angryk, When too similar is bad: a practical example of the solar dynamics observatory content-based image-retrieval system, Proceedings of the 17th East European Conference on Advances in Databases and Information Systems, published as New Trends in Databases and Information Systems, Advances in Intelligent Systems and Computing, vol. 241, Springer, 2014, pp. 87–95.

J. Banda and R. Angryk, Large-scale region-based multimedia retrieval for solar images, Proceedings of the 13th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2014), published as Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, vol. 8467, Springer, pp. 649–661.

J. Banda and R. Angryk, Scalable solar image retrieval with Lucene, Proceedings of the 2014 IEEE International Conference on Big Data, pp. 11–17.

M. Schuh, J. Banda, P. Bernasconi, R. Angryk, and P. Martens, A comparative evaluation of automated solar filament detection, Solar Physics, vol. 289, no. 7, 2014, pp. 2503–2524.

J. Banda, R. Angryk, and P. Martens, imageFARMER – introducing a framework for the creation of large-scale content-based image retrieval systems, International Journal of Computer Applications, vol. 79, no. 13, 2013, pp. 8–13.

J. Banda, R. Angryk, and P. Martens, On dimensionality reduction for indexing and retrieval of large-scale solar image data, Solar Physics: Image Processing in the Petabyte Era, vol. 283, no. 1, 2013, pp. 113–141.

J. Banda, R. Angryk, and P. Martens, Steps toward a large-scale solar image data analysis to differentiate solar phenomena, Solar Physics: Image Processing in the Petabyte Era, vol. 288, no. 1, 2013, pp. 435–462.