Machine Learning

Softmax Function 🔢

Overview 📝
This article explores the Softmax function, a crucial component in machine learning. The Softmax function transforms arbitrary real-valued vectors into probability distributions, making it essential for multi-class classification problems. We’ll dive into its fundamental mechanisms, mathematical definition, key properties, and practical applications.

Demystifying the Softmax Function 🧮
The softmax function is a crucial tool in machine learning, particularly for multi-class classification problems. It takes a vector of arbitrary real numbers (positive, negative, or zero) and transforms it into a probability distribution: the output is a vector of values between 0 and 1 that sum to 1, each representing the probability of one class. ...
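To make the transformation concrete, here is a minimal sketch using the standard definition, softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The function name `softmax` and the max-subtraction step are illustrative choices, not taken from the article; subtracting the maximum does not change the result and only guards against overflow.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to a probability distribution."""
    # Assumption: shift by the maximum for numerical stability.
    # The shift cancels in the ratio, so the output is unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, -1.0, 0.0])
print(probs)       # every entry lies strictly between 0 and 1
print(sum(probs))  # the entries sum to 1 (up to rounding)
```

The largest input always receives the largest probability, which is why the class with the highest score is the predicted class.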

Understanding Odds Ratio and Risk Ratio

🎯 What is Risk Ratio?
Risk Ratio is a measure that represents the ratio of risks (incidence rates) between two groups. Let’s explain using hypothetical data:

| Group | COVID-19 Positive | COVID-19 Negative | Total | Incidence Rate |
| --- | --- | --- | --- | --- |
| Not Wearing Mask | 90 | 10 | 100 | Pa = 90% |
| Wearing Mask | 40 | 160 | 200 | Pb = 20% |

💡 Calculating Risk Ratio
Risk Ratio is calculated using the following formula: ...
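The arithmetic on the hypothetical mask data can be sketched as follows. The article's own formula is cut off in this excerpt, so the sketch assumes the usual definitions: risk is positives divided by the group total, odds is positives divided by negatives, and the group not wearing a mask (Pa) is taken as the numerator of both ratios. The names `risk`, `odds`, `no_mask`, and `mask` are illustrative.

```python
# Counts copied from the hypothetical table in the article.
no_mask = {"positive": 90, "negative": 10}
mask = {"positive": 40, "negative": 160}

def risk(group):
    """Incidence rate: positives divided by the group total."""
    return group["positive"] / (group["positive"] + group["negative"])

def odds(group):
    """Odds: positives divided by negatives."""
    return group["positive"] / group["negative"]

# Assumption: the unmasked group (Pa) is the numerator in both ratios.
risk_ratio = risk(no_mask) / risk(mask)
odds_ratio = odds(no_mask) / odds(mask)

print(f"Pa = {risk(no_mask):.0%}, Pb = {risk(mask):.0%}")
print(f"risk ratio = {risk_ratio:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

Under these assumptions the risk ratio is 0.9 / 0.2 = 4.5, while the odds ratio is (90/10) / (40/160) = 36, which shows how far the two measures can diverge when the outcome is common.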

Complete Guide to Machine Learning Model Evaluation Methods

Core Data Concepts in Model Evaluation 📊
- Training Set: Dataset used to train machine learning models (parameter optimization)
- Validation Set: Dataset used for hyperparameter tuning and model selection during development
- Test Set: Dataset reserved exclusively for assessing generalization performance
  → Used for final model evaluation after development completion

Evaluation Methodologies

Holdout Method
Randomly splits the dataset into two mutually exclusive subsets:
- Typical split: 80% training / 20% testing (ratio varies by use case)
- Strengths: Computationally efficient, simple implementation
- Limitations: High variance in performance estimates with small datasets

k-Fold Cross-Validation
Systematic evaluation protocol:
1. Partition dataset into k equal-sized folds
2. Iteratively use each fold as validation set while training on remaining k-1 folds
3. Aggregate results (mean ± standard deviation) across all folds

Key Advantages:
- Reduces variance in performance estimates
- Maximizes data utilization (critical for small datasets)

Common Variants:
- Stratified k-fold (preserves class distribution)

Leave-One-Out Cross-Validation (LOOCV)
Extreme case of k-fold where k = n (number of samples)
- Use Case: Small-scale datasets with <100 samples
- Tradeoff: Computationally prohibitive for large n (requires n model fits)
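The k-fold protocol described above can be sketched with the standard library alone. Everything named here (`k_fold_indices`, `cross_validate`, the toy `labels`, and the majority-class baseline standing in for a real model) is hypothetical and only illustrates the partition, iterate, and aggregate steps; a real project would more likely use a library implementation.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(n_samples, k, fit_and_score):
    """Aggregate the per-fold scores as (mean, standard deviation)."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.stdev(scores)

# Toy data: the "model" predicts the majority class seen in the training folds.
labels = [0, 1, 1, 1, 0, 1, 1, 0, 1, 1]

def majority_baseline(train_idx, val_idx):
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in val_idx) / len(val_idx)

mean_acc, std_acc = cross_validate(len(labels), k=5, fit_and_score=majority_baseline)
print(f"accuracy: {mean_acc:.2f} ± {std_acc:.2f}")
```

Setting `k` equal to the number of samples in `k_fold_indices` gives the leave-one-out split, with one validation sample per fold.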

Understanding Entropy and Information Theory in Machine Learning

Introduction 📚
This article explores the fundamental concepts of information theory, which form the mathematical foundation for many machine learning algorithms. Understanding these concepts is crucial for grasping how models process and learn from data.

Information Quantity
When an event A occurs with probability P(A), the information quantity I(A) measures how much information we gain from observing this event:

$ I(A) = -\log P(A) $

Key insight: Rare events carry more information than common ones. This makes intuitive sense: learning that a rare event occurred tells us more than learning about a common event. ...
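A small sketch of the information quantity, assuming a base-2 logarithm so that the result is measured in bits (the formula above leaves the base unspecified). The function name `information` is illustrative.

```python
import math

def information(p):
    """Information quantity I(A) = -log2 P(A), in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # -log2(p) written as log2(1 / p), which is the same quantity.
    return math.log2(1.0 / p)

print(information(0.5))   # a fair coin flip carries 1 bit
print(information(0.01))  # a rare event carries more information
print(information(1.0))   # a certain event carries none
```

The output grows as the probability shrinks, which is the "rare events carry more information" property stated above.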

Machine Learning Preprocessing

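Preprocessing
It is often said that preprocessing is 80% of machine learning. This article summarizes common preprocessing techniques.

Handling Missing Values
When some values in the data are blank, either delete the affected records or fill them in with substitute values. The key point is deciding how the missing values should be treated. In pandas, functions such as fillna and dropna handle this easily.

Checking for missing values
df.isnull().sum()

Dealing with missing values
Fill with the mean: df = df.fillna(df.mean(numeric_only=True))
Fill with the median: df = df.fillna(df.median(numeric_only=True))
Fill with the mode: df = df.fillna(df.mode().iloc[0])
Delete records with missing data using dropna: df = df.dropna()

Handling Categorical Data

Undersampling
In classification, when one category has far more records than the others, remove part of that category's data.

One-Hot Encoding (Dummy Variables)
Dummy encoding means, for example, taking a company category and representing it with a 0/1 column per category.

Categorical data:

| Company |
| --- |
| Amazon |
| Facebook |
| Google |

One-Hot Encoding:

| Amazon | Facebook | Google |
| --- | --- | --- |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |

Target Encoding
This method groups the data by class and replaces each value with its frequency of occurrence. For a binary True/False variable, each value is replaced by its probability of occurrence. (Replacing a value with its own frequency is more commonly called frequency encoding; target encoding usually substitutes the mean of the target variable within each category.)

| A Class (original) | A Class (encoded) |
| --- | --- |
| False | 0.66 |
| True | 0.33 |
| False | 0.66 |

Normalization and Standardization

Normalization
Normalization rescales the data so that the minimum becomes 0 and the maximum becomes 1. Note, however, that if the data contains an outlier, the outlier becomes the maximum and the remaining values are squeezed toward 0.

$ X_{NORM} = \frac{X_i - X_{min}}{X_{max} - X_{min}} $ ...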
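The effect of an outlier on min-max normalization can be seen in a short sketch. It uses plain Python lists rather than a pandas DataFrame to stay self-contained; the function name `min_max_normalize` and the sample values are illustrative.

```python
def min_max_normalize(values):
    """Rescale values so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]

evenly_spread = [10, 20, 30, 40, 50]
with_outlier = [10, 20, 30, 40, 1000]

print(min_max_normalize(evenly_spread))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_normalize(with_outlier))   # all but the outlier fall below 0.04
```

In the second list the outlier takes the value 1 and every other point is compressed into a narrow band near 0, which is the caveat described above.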