Firstly, let’s take a moment to talk about a very important (and maybe slightly mysterious) word: 👉 Feature.
In data science and machine learning, a feature is just a fancy word for: A column in your table that describes something about your data.
For example, if you’re building a model to predict house prices, features might include:
These features feed into your models, visualizations, and decisions. This process of turning messy, real-world data into usable, structured information is called data engineering (the work of a machine learning engineer also covers this).

🛠️ A typical workflow for a data scientist (DS) or machine learning engineer (MLE) looks like this:
Data collection → Raw data → Preprocessing → Feature extraction → Model / Analysis
```{admonition} Features of Special Format
:class: dropdown
In other fields, “features” can look very different:

- For example, in the original Transformer, each token is represented as a 512-dimensional vector.
- In GIS, think of roads, buildings, rivers, or tram stops, all stored with geometry and fields in shapefiles or geodatabases. And just like in other fields, we still care about their properties, but now they’re stored in attribute tables.
```
> Take it easy — imagine you're the computer. For every item in a dataset, the things you store or know about it are called features.
Features can be:
- Built-in (directly collected from raw data)
- Extracted, filtered, or converted from original information (e.g., converting time to day of week, [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html))
- Learned through machine learning or deep learning models — though these learned features can sometimes be a bit mysterious 🤖✨
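
As a minimal sketch of the "extracted" case, the snippet below derives a day-of-week feature from a timestamp. The records, field names, and the `add_day_of_week` helper are made up for illustration and use only the Python standard library.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical raw records: each has one built-in feature, a timestamp string.
records = [
    {"id": 1, "timestamp": "2024-03-04 08:15:00"},
    {"id": 2, "timestamp": "2024-03-09 17:40:00"},
]

def add_day_of_week(record):
    """Return a copy of the record with an extracted 'day_of_week' feature."""
    parsed = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    # weekday() returns 0 for Monday through 6 for Sunday.
    return {**record, "day_of_week": DAY_NAMES[parsed.weekday()]}

enriched = [add_day_of_week(r) for r in records]
for row in enriched:
    print(row["id"], row["day_of_week"])
```

The original records are left unchanged; the new column lives only in `enriched`, which mirrors how an extracted feature sits alongside the raw data it came from.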
---
### What Are Features and Fields in a Shapefile?
In this exercise, a **feature** specifically refers to a **geometry object** — such as a point, line, or polygon — representing a real-world location. Each feature typically consists of two parts:
- **Geometry**:
Describes where it is, including spatial shape and location.
Also includes auxiliary information like the Coordinate Reference System (CRS).
- **Attribute Fields**:
Describe what it is — these are the additional properties or metadata stored in a table (e.g., name, type, status, ID, etc.).
> If you read or import shapefiles (or other spatial formats) in your program, you’ll notice that **all the information** — including geometry-related data — is stored as attributes. So in a way, everything becomes a **feature describing the data** — whether it’s where something is (geometry) or what it is (attributes). From the program’s perspective, it’s just a structured collection of information.
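
To make the geometry/attribute split concrete, here is a small sketch that writes one feature as a plain Python dictionary in GeoJSON-style layout and then flattens it into a single row. This is only an illustration: a shapefile stores the same two parts in its own binary files, and the coordinates, field values, and the `flatten_feature` helper are invented for this example.

```python
# One feature in GeoJSON-style form: a geometry plus attribute fields.
# All values below are made up for illustration.
tram_stop = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [8.5417, 47.3769]},  # lon, lat
    "properties": {"name": "Example Stop", "type": "tram_stop", "status": "active"},
}

def flatten_feature(feature):
    """Merge geometry and attribute fields into one flat row."""
    row = dict(feature["properties"])          # what it is
    row["geometry_type"] = feature["geometry"]["type"]      # where it is
    row["coordinates"] = feature["geometry"]["coordinates"]
    return row

row = flatten_feature(tram_stop)
print(sorted(row))
```

After flattening, geometry and attributes are just columns of the same row, which is roughly how a program sees a layer once it has been loaded into a table.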
```{admonition} ⚠️ Reminders
Always check the **CRS (coordinate reference system)** when working with multiple layers or datasets. A mismatched CRS is a common source of confusion!
```
In the lecture, we mentioned three key points that make geospatial data different from other types of data:
- **Spatial and Temporal Dependence.** Example: Traffic congestion in one district affects neighboring areas; deforestation in one region impacts climate patterns elsewhere.
- **Spatial Heterogeneity.** Example: Rainfall patterns or land use can differ dramatically from one region to another.
- **Place-Sensitive User Interest.** Example: Local air quality reports are much more relevant to nearby residents.
```{admonition} 🔬 Dive more
In probability theory (e.g., Bayesian estimation), the assumption that samples are i.i.d. (independent and identically distributed) is fundamental.
But due to spatial and temporal dependence, this assumption often fails in geospatial contexts.
👉 How can we address this? → Use models that account for spatial autocorrelation rather than assuming independence.
We can only collect samples at specific locations — e.g., we have limited and static observations about weather conditions.
👉 How can we estimate values in unsampled areas? → Use spatial interpolation techniques such as Kriging or IDW.
How can we quantify spatial heterogeneity?
👉 Use variograms, local statistics (e.g., Local Moran’s I), or model non-stationarity with techniques like GWR (Geographically Weighted Regression).
```
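
The interpolation idea mentioned above can be sketched with inverse distance weighting (IDW), where each sample contributes in proportion to 1 / distance^power. This is a minimal pure-Python illustration, not a production implementation: the station coordinates and values are invented, the `idw` function name and `power=2.0` default are assumptions, and distances are treated as planar.

```python
import math

def idw(samples, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) samples using IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sample point
        weight = 1.0 / distance ** power  # closer samples get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical weather stations: (x, y, measured value).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw(stations, (1.0, 0.0))
print(round(estimate, 2))
```

Because the weights are positive and normalized, the estimate always stays between the smallest and largest sampled value. Kriging, by contrast, also models the spatial covariance between samples, which this sketch does not attempt.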
In today’s exercise, you’ll get hands-on experience with how to interact with these features and fields — the fundamental building blocks of geospatial data.
Detailed instructions are in {download}`Lesson 1 <../doc/Lesson 1.docx>`.
Data:

- Students.gdb
- orthophoto.tif

```{admonition} 🔬 Coding Hints
- GeoPandas, Shapely, Rasterio (for TIFF), and Pyproj.
- GDAL or OGR for both vector and raster data handling.
```

[Edit Features Using Editor Tools in ArcGIS Pro | Beginners Guide](https://www.youtube.com/watch?v=i-HDJaZw6dU&ab_channel=TerraSpatial)