Data Science, MS
Program Description
A&M-Corpus Christi’s Data Science program will prepare students to meet the growing state, national, and international need for highly qualified personnel in the field of data science. The program objectives underline the interdisciplinarity of data science and the importance of building a strong foundation in data science for our students.
- Provide strong core training so that graduates can adapt easily to changes and new demands from society and industry.
- Develop an in-depth understanding of the theory and methods in data science and develop students’ skills for problem analysis and decision-making.
- Integrate fields within computer science, optimization, engineering, and statistics to create adept and well-rounded data scientists.
- Teach students how to create new methodologies and application tools to solve interdisciplinary problems with Big Data.
- Enable students to communicate effectively how to solve problems arising from big, complex, and unstructured data.
- Provide students with insights into data science practice in interdisciplinary fields.
Admission Requirements
Persons seeking admission to the M.S. in Data Science should first contact the program faculty and identify a faculty member willing to serve as the graduate adviser. Applicants will not be admitted to the program without a graduate adviser.
Applicants for the M.S. in Data Science should have the equivalent of a bachelor’s degree in data science or another area of science, with the equivalent of at least a minor in Mathematics or Statistics. Specific leveling coursework is MATH 1442 Statistics for Life, MATH 3311 Linear Algebra, MATH 3315 Differential Equations, MATH 3342 Applied Probability and Statistics, MATH 2415 Calculus III, and MATH 4301 Introduction to Analysis. Students with no computer programming experience may find themselves at a disadvantage in certain courses without an introductory programming course.
Program Requirements
- Students may enter with a BS or an MS.
- Adequate preparation with coursework in mathematics, statistics, and computer science is required (or leveling courses):
  - Mathematics: discrete mathematics, calculus sequence, linear algebra, numerical methods, differential equations
  - Statistics: probability theory, advanced statistics beyond introductory courses, mathematical statistics
  - Computer science: high-level programming language (Python, SQL, C++, or equivalent), MATLAB, R statistical package, data structures
- All students take fundamental core courses on Data Science.
- Depending on individual interests and preferences, students select their own emphasis/track for future studies, research, and thesis.
- The diverse program faculty may offer tracks with emphases in Environmental and Marine Sciences, Biology, Engineering, Business, Health Care, etc.
- To graduate with the MS degree, students must complete coursework and defend a thesis, for a total of 30 SCH:
  - Core courses are a total of 12 SCH.
  - Prescribed elective courses are 12 SCH.
  - Thesis research and thesis are 6 SCH.
Code | Title | Hours |
---|---|---|
Core Courses | ||
DASC 5301 | Principles of Data Science | 3 |
DASC 5302 | Data Science and Predictive Analytics | 3 |
DASC 5307 | Machine Learning in Data Science | 3 |
or COSC 6338 | Machine Learning | |
DASC 5323 | Natural System Analysis and Multivariate Statistics | 3 |
or CMSS 6303 | Natural Systems Analysis | |
Electives | ||
Select 12 hours from the following: | | 12 |
Data Science Computing | ||
Bayesian Inference in Data Science | ||
or MATH 6318 | An Introduction to Bayesian Statistics | |
Applied Differential Equations in Data Science | ||
Numerical Methods for Data Science | ||
DASC 5306 | Dynamical System Analysis for Data Science | |
Geospatial Data Structure | ||
or GSCS 6321 | Geospatial Data Structures | |
Digital Image Processing | ||
or COSC 6324 | Digital Image Processing | |
Natural Systems Modeling | ||
or CMSS 6305 | Natural Systems Modeling | |
DASC 5327 | Introduction to Computer Graphics/COSC 6327 | |
DASC 5329 | Scientific Visualization/GSCS 6329/GSEN 6329 | |
Advanced Geospatial Computing | ||
or GSCS 6331 | Advanced Geospatial Computing | |
Database Management Systems | ||
or COSC 6336 | Database Management Systems | |
DASC 5337 | Data Mining/COSC 6337 | |
Genomics, Proteomics and Bioinformatics | ||
or BIOL 5340 | Genomics, Proteomics and Bioinformatics | |
Statistical Methods and Data Analysis | ||
or MATH 5341 | Statistical Methods and Data Analysis | |
Linear Statistical Models | ||
or MATH 5342 | Linear Statistical Models | |
Computational Methods for Statistics | ||
or MATH 5345 | Computational Methods for Statistics | |
Optimization | ||
or MATH 5348 | Optimization | |
Advanced Topics in DBMS | ||
or COSC 6350 | Advanced Topics in DBMS | |
Environmental Forecasting | ||
or CMSS 6352 | Environmental Forecasting | |
Artificial Intelligence | ||
or COSC 6354 | Artificial Intelligence | |
Data Communication and Networking | ||
or COSC 6355 | Data Communications and Networking | |
Computational Biology | ||
Spatial Database Design | ||
or COSC 6365 | Current Trends in Programming | |
Data Analytics | ||
or COSC 6380 | Data Analytics | |
Remote Sensing and Image Analysis | ||
or GSEN 6386 | Remote Sensing and Image Analysis | |
Thesis Option | ||
DASC 5994 | Proposal Research | 3 |
DASC 5995 | Thesis | 3 |
Total Hours | | 30 |
Courses
This course integrates a skill set spanning mathematics, statistics, machine learning, databases, and computer science along with a good understanding of the craft of problem formulation in STEM fields to find effective solutions. This course will introduce basic principles and tools in data science and will expose students to concepts and techniques to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modeling, descriptive modeling, data product creation, and evaluation. R and other statistical software will be used to make the learning contextual.
This course aims to build computational abilities, inferential thinking, and practical skills to solve complex problems in data science and make predictive models. It uses concepts from data management, statistical modeling, statistical computing, and visualization, and integrates programming in R or similar languages to analyze, model, and interpret large, multi-source heterogeneous data.
Programming and computing techniques for the requirements of data science: acquisition and organization of data; visualization, modeling, and inference for scientific applications; presentation and interactive communication of results. Emphasis on computing for substantial projects. Software development with an emphasis on R, plus other key software tools.
This course introduces the Bayesian approach to probability and data analysis, focusing on the principles of data analysis and computer-intensive, modern statistical modeling. Topics include Bayesian inference, prior and posterior distributions, regression modeling, hierarchical models, model checking and selection, missing data, and stochastic simulation by Markov chain Monte Carlo, including Gibbs sampling and Metropolis algorithms. The course will apply Bayesian methods to practical problems by building models from the prior probabilities to the posterior distribution with statistical packages.
The purpose of this course is to study real-world applications of differential equations to problems in Data Science. The course will cover how differential equations can be used in the development, solution, and analysis of mathematical models based on data. The course will include learning techniques for using data to estimate the parameters and structure of a model and learning about evaluation techniques to determine whether a particular model is a good one.
Machine learning is a highly interdisciplinary subject that draws on techniques from statistics, probability, linear algebra, optimization, and computer science. Machine learning techniques are used in areas such as face recognition, self-driving cars, and cybersecurity, and in other areas where important decisions must be made without human intervention. This course covers both theory and practical algorithms for machine learning for a variety of applications. We cover topics such as supervised learning (generative/discriminative learning, parametric/nonparametric learning, neural networks, and support vector machines), unsupervised learning (clustering, dimension reduction, kernel methods), learning theory, reinforcement learning, and adaptive control. This course will also discuss a variety of recent applications of machine learning, such as data mining, autonomous navigation, and web data processing (for example, data from Facebook or Twitter).
This course introduces a broad range of numerical techniques that are widely used in mathematics, science, and engineering. The topics covered in this course include basic direct and iterative methods for linear systems; classical root-finding methods; Newton’s method and related methods for nonlinear systems; fixed-point iteration; polynomial, piecewise polynomial, and spline interpolation methods; least-squares approximation; orthogonal functions and approximation; basic techniques for numerical differentiation; numerical integration, including adaptive quadrature; and methods for initial-value problems for ordinary differential equations. Python, or similar software, will be used to implement the methods covered in the class.
The representation of spatial data is an important issue in diverse areas including computer graphics, geographic information systems (GIS), robotics, and many others. Choosing an appropriate representation is key to facilitating operations such as spatial search. This course will focus on the representation of point data and object data, which are the important types of spatial data. Various fundamental data structures for spatial data, such as quadtrees, kd-trees, grid structures, and R-trees, will be explored. The use of these structures to address some important problems will also be covered.
Statistical analysis for data collected in several variables. Topics include sampling from a multivariate normal distribution, multivariate analysis of variance, discriminant analysis, principal components, and factor analysis.
This course introduces concepts and techniques for image processing. The objective of this course is to introduce the fundamental techniques and algorithms used for processing and extracting useful information from digital images. The students will learn how to apply the image processing methods to solve real-world problems.
Modeling and analysis of deterministic and stochastic dynamical systems, including investigation of model behavior and stability. Theory will be applied to the study of natural environmental and biological systems such as multi-species systems, epidemic models, carbon circulation in the biosphere, Nutrient-Phytoplankton-Zooplankton models, etc.
Seminar in reading and critical evaluation of academic literature in geospatial computing and related fields. Students will design, implement, and evaluate an advanced, contemporary geospatial computing technology to solve a geospatial problem.
A study of contemporary database management concepts. Performance (indexing, query optimization, update optimization), concurrency, security and recovery issues are discussed. Also includes the study of front-end environments that access the database.
Integrative biological study using genome-wide approaches and bioinformatics. The “-omics” technologies (Genomics, Proteomics, Metabolomics, etc.) will be reviewed. Applications to understanding biological function in various biological disciplines will be emphasized.
Introduction to the basic concepts of probability, common distributions, statistical methods, data analysis and a wide variety of statistical inference techniques. Demonstrations of the interplay between probability models and statistical inference. Data sets will be analyzed using the R software package.
Review of basic concepts in probability theory. Principles of estimation and model building. Linear models, especially ANOVA and regression. Non-parametric alternatives.
An introduction to computing tools needed by the modern statistician. Topics include floating-point numbers, reformatting large datasets, important statistical algorithms, and parallel processing.
Unconstrained optimization, necessary and sufficient conditions for solutions, basic algorithms. Constrained optimization, KKT conditions, linear programming, convex programming, algorithms.
The study of emerging database technologies. Topics are chosen from data warehousing, distributed databases, spatial databases and web-based applications.
Statistical techniques (classic and Bayesian) and new artificial intelligence-based techniques, such as neural networks, for the analysis of environmental systems with large datasets.
Fundamental concepts and techniques for the design of computer-based, intelligent systems. Topics include: a brief history, methods for knowledge representation, heuristic search techniques, programming in LISP or Prolog.
Areas studied include principles of computer-based communication systems, analysis and design of computer networks, and distributed data processing.
Introduces the powerful open-source computing tools that are used in biological research for the creation, organization, manipulation, processing, analysis, and archiving of “big data.” This course is designed to prepare and enable students to use computational tools for bioinformatic applications in advanced courses and independent research projects. The primary topics covered are data formats and repositories, command-line Linux computing and scripting, regular expressions, supercomputing, computer programming with Python and R, data visualization with R, version control and dissemination of scripts and programs with Git, and typesetting with markdown languages.
This course will focus on spatial database principles and the practical skills of design, implementation, and use of spatial databases. This course will first cover fundamentals of relational database design, and then focus on design and management of spatial databases utilizing geodatabase models. In addition, case studies of geodatabase design models in several applications will also be covered. This course is intended for students who want to design, create, maintain and manipulate data from a geospatial database.
This course will introduce state-of-the-art techniques to process and analyze different types of data, generate insights and knowledge from data, and make data-based decisions and predictions. Real-world examples will be used to familiarize students with the theory and applications. Main topics include data preprocessing, probability theory, hypothesis testing, and various data analysis techniques (e.g., clustering, classification, and prediction/forecasting) for different types of data, including static, time-series, spatial, and spatiotemporal data.
This course will focus on the theory, techniques, and applications of advanced geospatial analytics. Topics covered include spatial point patterns, network analysis, area objects and spatial autocorrelation, and spatial interpolation. New approaches to geospatial analytics will also be covered. This course emphasizes the methods and the applied side of geospatial analytics that can be useful in students' own theses or projects for their current or potential employers.
Addresses the interpretation, processing and analysis techniques of remotely sensed data acquired by orbital and sub-orbital platforms. Physical principles and imaging mechanisms, remote sensing systems, data characteristics, image processing, and information extraction methods will be covered. Topics include passive optical imaging with multispectral, hyperspectral, and thermal sensing; active imaging with radar sensing; image corrections and rectification; spatial/frequency transforms and image filtering; image classification and feature extraction; and image processing with machine learning techniques. Applications in the course will be focused on geomatics and monitoring of natural and built environments.