Introduction to Statistics and Probability - Importance of statistics in data science - Definitions: Population vs Sample - Types of data (Qualitative vs Quantitative) - Levels of measurement (Nominal, Ordinal, Interval, Ratio) - Introduction to Probability (Probability theory, axioms of probability) - Probability Distributions - Probability Mass Function (PMF) vs Probability Density Function (PDF) - Cumulative Distribution Function (CDF) - Common probability distributions - Binomial distribution - Poisson distribution - Uniform distribution - Exponential distribution - Normal distribution (Gaussian) - Basic Probability Concepts - Conditional probability - Bayes' theorem and applications - Independent and dependent events
Measures of Central Tendency - Measures of Dispersion - Standard Deviation - Calculating mean, median, mode using `numpy`, `scipy`, and `statistics` libraries - Standard deviation and variance calculation using `numpy` - Plotting and visualizing normal distributions using `matplotlib` and `seaborn` - Calculating probabilities for normal distribution using `scipy.stats.norm`
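As an illustration of the calculations named above, here is a minimal sketch on a small made-up sample. The variable names and the data are assumptions for demonstration only, not part of the course material.

```python
import statistics

import numpy as np
from scipy import stats

# Hypothetical sample, chosen only to make the arithmetic easy to follow.
data = [2, 4, 4, 5, 7, 9]

mean = np.mean(data)            # arithmetic mean
median = np.median(data)        # middle value (average of the two central values here)
mode = statistics.mode(data)    # most frequent value

pop_var = np.var(data)              # population variance (ddof=0 is NumPy's default)
sample_std = np.std(data, ddof=1)   # sample standard deviation (divides by n - 1)

# P(Z <= 1) for a standard normal variable, via the cumulative distribution function.
p_below_one = stats.norm.cdf(1.0, loc=0.0, scale=1.0)

print(mean, median, mode, pop_var, sample_std, p_below_one)
```

Note that `np.var` and `np.std` divide by n unless `ddof=1` is passed, so the choice between population and sample statistics has to be made explicitly.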
What is a Z-score - Z-scores for standard normal distribution - Bias-Variance Trade-off - Bias: definition and examples (underfitting) - Variance: definition and examples (overfitting) - Z-score calculations
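A Z-score expresses how many standard deviations a value lies from the mean, z = (x - mean) / std. The sketch below computes it by hand with `numpy` on made-up exam scores and compares the result with `scipy.stats.zscore`; the data and names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores.
scores = np.array([60.0, 70.0, 80.0, 90.0, 100.0])

mu = scores.mean()       # mean of the scores
sigma = scores.std()     # population standard deviation (ddof=0)

# Z-score of every value: distance from the mean in units of sigma.
z = (scores - mu) / sigma

# scipy.stats.zscore uses ddof=0 by default, so it should agree with the manual result.
z_scipy = stats.zscore(scores)

print(z)
```

After standardizing, the values have mean 0 and standard deviation 1, which is what allows them to be looked up against the standard normal distribution.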
Distance Metrics - Euclidean Distance - Manhattan Distance - Cosine Similarity - Minkowski Distance - Outlier Analysis - Causes and effects of outliers in data - Calculating distance metrics using `scipy`
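The metrics listed above are all available in `scipy.spatial.distance`. The following is a minimal sketch on two made-up vectors; note that SciPy's `cosine` returns a cosine *distance*, so the similarity is one minus that value.

```python
from scipy.spatial import distance

# Two hypothetical points; their difference is (3, 4, 0).
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

euclid = distance.euclidean(a, b)         # straight-line distance
manhattan = distance.cityblock(a, b)      # sum of absolute differences
minkowski3 = distance.minkowski(a, b, p=3)  # generalization; p=1 is Manhattan, p=2 is Euclidean
cosine_sim = 1.0 - distance.cosine(a, b)  # convert cosine distance to similarity

print(euclid, manhattan, minkowski3, cosine_sim)
```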
Types of missing data (Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR)) - Techniques for Handling Missing Values - Identifying missing values using `pandas` - Imputation techniques using `SimpleImputer` from `sklearn` - Visualizing missing data patterns
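A minimal sketch of the identify-then-impute workflow, assuming a tiny made-up table with hypothetical `age` and `income` columns. Mean imputation is used only as one example strategy; `SimpleImputer` also accepts `"median"`, `"most_frequent"`, and `"constant"`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0],
    "income": [50.0, 60.0, np.nan],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(missing_counts)
print(filled)
```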
Definition and types of correlation (Positive, Negative, No correlation) - Pearson correlation coefficient - Spearman's rank correlation - Covariance
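To make the difference between the measures concrete, here is a minimal sketch on made-up data where `y` grows monotonically but not linearly with `x`. Spearman's rank correlation is then exactly 1, while Pearson's coefficient is slightly below 1. All names and values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y = x squared, a monotonic but non-linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2

pearson_r, _ = stats.pearsonr(x, y)      # strength of the linear relationship
spearman_rho, _ = stats.spearmanr(x, y)  # strength of the monotonic (rank) relationship

# Sample covariance (np.cov divides by n - 1 by default); off-diagonal entry is cov(x, y).
cov_xy = np.cov(x, y)[0, 1]

print(pearson_r, spearman_rho, cov_xy)
```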
What are Python Libraries - Purpose and benefits of using libraries in data science - Overview of popular data science libraries (NumPy, Pandas, Matplotlib, Scikit-learn, etc.) - Installing Python Libraries - Introduction to `pip` - Installing libraries using `pip` - Popular IDEs and Tools for Data Science - Overview of Jupyter Notebook, VSCode, PyCharm
Introduction to NumPy - Importance of NumPy for numerical computing - Understanding NumPy Arrays (ndarrays) vs Python Lists - Installation of NumPy (`pip install numpy`) - Basic Operations in NumPy - Creating NumPy arrays (`array()`, `arange()`, `linspace()`) - Understanding shape, dimensions, and data types of arrays - Indexing and slicing NumPy arrays - Reshaping arrays, Transposing arrays, Flattening arrays - NumPy Mathematical Functions - Element-wise operations on arrays - Aggregate functions (`sum()`, `mean()`, `median()`, `std()`, `var()`) - Matrix operations
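A short sketch touching several of the NumPy topics above (creation, reshaping, slicing, element-wise and aggregate operations, and a matrix product). The array contents are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.arange(6)                   # [0 1 2 3 4 5]
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values from 0 to 1 inclusive

m = a.reshape(2, 3)                # 2 rows, 3 columns: [[0 1 2], [3 4 5]]
second_row = m[1, :]               # slicing: the whole second row
flat = m.T.flatten()               # transpose, then flatten to 1-D

doubled = m * 2                    # element-wise operation
col_sums = m.sum(axis=0)           # aggregate down each column
product = m @ m.T                  # matrix product, shape (2, 2)

print(m.shape, m.ndim, m.dtype)
print(product)
```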
Introduction to Pandas - Importance of Pandas for data manipulation - Pandas structures: Series and DataFrames - Installation of Pandas (`pip install pandas`) - Working with Pandas Series - Creating a Series from lists, NumPy arrays, and dictionaries - Working with Pandas DataFrames - Creating DataFrames from dictionaries, lists, NumPy arrays, CSV files, etc. - Reading and writing data to/from files (CSV, Excel, JSON) - Accessing data from DataFrames (`loc[]`, `iloc[]`) - Basic DataFrame operations - Renaming columns, Adding and removing columns/rows - Sorting data, Filtering data, Handling duplicates - Pandas Data Cleaning and Manipulation - Handling missing data (`isnull()`, `dropna()`, `fillna()`) - Changing data types of columns - Applying functions to columns (`apply()`, `map()`) - Grouping data and aggregation (`groupby()`, `agg()`) - Merging, joining, and concatenating DataFrames (`merge()`, `concat()`, `join()`) - Pivot tables and cross-tabulations - Reshaping DataFrames using `melt()`, `stack()`, `unstack()` - Handling large datasets (chunking, memory optimization)
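The sketch below strings together a few of the Pandas operations listed above: building a DataFrame from a dictionary, filling a missing value, filtering with `loc[]`, grouping with `groupby()` and `agg()`, and combining tables with `merge()`. The tables, column names, and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value.
sales = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "units": [10.0, np.nan, 30.0, 20.0],
})

# Treat the missing count as zero (one possible choice; dropna() is another).
sales["units"] = sales["units"].fillna(0)

# Boolean filtering with loc[].
north = sales.loc[sales["region"] == "N"]

# Total units per region using named aggregation.
totals = sales.groupby("region", as_index=False).agg(total_units=("units", "sum"))

# Hypothetical lookup table, joined on the shared "region" column.
managers = pd.DataFrame({"region": ["N", "S"], "manager": ["Ana", "Bo"]})
report = totals.merge(managers, on="region", how="left")

print(report)
```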
Introduction to Matplotlib - Importance of data visualization in data science - Basic architecture of Matplotlib (Figure, Axes, Subplots) - Installing Matplotlib - Basic Plots using Matplotlib - Line plots - Bar plots and horizontal bar plots - Scatter plots - Histograms - Pie charts - Customizing Plots - Adding titles, labels, legends, gridlines - Changing plot styles - Setting axis limits, ticks, and scales (log scale, etc.) - Subplots and multiple plots on the same figure - Creating stacked plots and bar charts - Contour plots and heatmaps - Working with dates on x-axis - 3D plotting using `mpl_toolkits.mplot3d`
Introduction to Databases - What is a Database? Importance of databases in applications - Introduction to Database Management Systems (DBMS) - Types of DBMS: Relational, NoSQL, Object-Oriented, etc. - Overview of SQL and its role in relational databases (RDBMS)
Introduction to SQL Server - What is SQL Server? Overview of RDBMS concepts - Overview of different SQL Server Editions (Express, Standard, Enterprise) - SQL Server Management Studio (SSMS): introduction and setup - Installing and using SSMS for database management
Creating and Modifying Databases - Creating and Modifying Tables - Table Constraints - Primary Keys (`PRIMARY KEY` constraint) - Foreign Keys (`FOREIGN KEY` constraint) - Unique Constraints (`UNIQUE` constraint) - Default Values (`DEFAULT` constraint) - Check Constraints (`CHECK` constraint) - Operators, Data Types, and Type Conversion - Arithmetic Operators - Comparison Operators - Logical Operators - `BETWEEN`, `IN`, and `LIKE` for pattern matching - `IS NULL` and `IS NOT NULL` operators - SQL Server Data Types - Type Conversion
Inserting Data - Updating Data - Deleting Data
Basic Querying with `SELECT` - Filtering Data with `WHERE` - Sorting Results: using `ORDER BY` to sort records in ascending (`ASC`) and descending (`DESC`) order - Conditional Logic with `CASE` - Using Aggregate Functions - `GROUP BY` and `HAVING` - Difference between `WHERE` and `HAVING` - Using `HAVING` with aggregate functions.
Common string functions: `LEN()`, `SUBSTRING()`, `CHARINDEX()`, `UPPER()`, `LOWER()`, `REPLACE()`, `LEFT()`, `RIGHT()` - Concatenating strings with `+` or `CONCAT()` - Date/Time Functions in SQL Server - Getting the current date/time (`GETDATE()`, `SYSDATETIME()`) - Adding/subtracting dates (`DATEADD()`) - Finding the difference between dates (`DATEDIFF()`) - Extracting parts of a date (`YEAR()`, `MONTH()`, `DAY()`)
Introduction to Joins - What are joins? Why do we use them? - Types of Joins - INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN, SELF JOIN
Understanding Transactions - `BEGIN TRANSACTION` - `COMMIT TRANSACTION` (committing the transaction) - `ROLLBACK TRANSACTION` - Savepoints with `SAVE TRANSACTION` - SQL Server Privileges and Subqueries - Granting Permissions - Revoking permissions - Role-based permissions in SQL Server.
Using subqueries in `SELECT`, `WHERE`, `FROM`, and `HAVING` clauses - Correlated Subqueries vs Non-Correlated Subqueries - Subqueries in `JOIN` conditions - Indexes - Creating indexes - Removing indexes - Clustered vs Non-Clustered Indexes - Impact of indexes on performance (index maintenance, over-indexing).
Views - Creating Views - Modifying Views - Removing Views - Using views for complex query simplification - Stored Procedures - Creating stored procedures - Executing stored procedures (`EXEC`) - Input/output parameters in stored procedures - Error handling in stored procedures with `TRY...CATCH` - Triggers - Creating triggers for `INSERT`, `UPDATE`, and `DELETE` operations - `AFTER` and `INSTEAD OF` triggers (SQL Server has no `BEFORE` triggers).
Introduction to Regular Expressions - Using `PATINDEX()` and `LIKE` for Pattern Matching - Simple pattern matching with `LIKE` - Finding patterns using `PATINDEX()` - Complex pattern matching using wildcards (`%`, `_`) - Combining Regular Expressions with String Functions
Overview of Machine Learning - Types of Machine Learning: Supervised, Unsupervised, Reinforcement Learning - Applications of Machine Learning