Geo-Information

E1: Selecting and Editing Features

Warm Up

Firstly, let’s take a moment to talk about a very important (and maybe slightly mysterious) word: 👉 Feature.

### What Does “Feature” Mean in Data Science?

In data science and machine learning, a feature is just a fancy word for a column in your table that describes something about your data.

For example, if you’re building a model to predict house prices, features might include:
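
As a purely illustrative sketch (the column names and values below are hypothetical, not taken from a real dataset), a small house-price table can be written in plain Python, with one dictionary per house and one key per feature:

```python
# Hypothetical house-price table: each dict is one house (a row),
# and each key is a feature (a column). All names and values are made up.
houses = [
    {"area_m2": 85.0, "bedrooms": 2, "year_built": 1998, "price": 310_000},
    {"area_m2": 120.5, "bedrooms": 4, "year_built": 2012, "price": 495_000},
    {"area_m2": 60.0, "bedrooms": 1, "year_built": 1975, "price": 215_000},
]

target = "price"  # the value a model would try to predict

# Every column except the target is a candidate input feature.
feature_names = [name for name in houses[0] if name != target]


def column(rows, name):
    """Return all values of one feature (one column of the table)."""
    return [row[name] for row in rows]


print(feature_names)
print(column(houses, "bedrooms"))
```

Reading a single column, as `column(houses, "bedrooms")` does, is what "looking at one feature" means in practice.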

These features feed into your models, visualizations, and decisions. The process of turning messy, real-world data into usable, structured information is called data engineering (machine learning engineers also cover this). 🛠️ A typical data science (DS) or machine learning engineering (MLE) workflow looks like this:

Data collection → Raw data → Preprocessing → Feature extraction → Model / Analysis
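
The workflow above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the sensor readings, the field names, and the helper functions (`preprocess`, `extract_features`, `analyse`) are all hypothetical, and the "model" is just an average.

```python
from datetime import datetime

# Raw data: hypothetical sensor readings with text timestamps and a missing value.
raw = [
    {"time": "2024-03-04 08:15", "temp_c": "12.5"},
    {"time": "2024-03-05 17:40", "temp_c": ""},  # missing reading
    {"time": "2024-03-09 11:05", "temp_c": "15.0"},
]

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def preprocess(records):
    """Drop incomplete records and convert strings to proper types."""
    clean = []
    for rec in records:
        if not rec["temp_c"]:
            continue  # skip records without a temperature
        clean.append({
            "time": datetime.strptime(rec["time"], "%Y-%m-%d %H:%M"),
            "temp_c": float(rec["temp_c"]),
        })
    return clean


def extract_features(records):
    """Derive a new feature: the day of the week of each timestamp."""
    return [
        {"weekday": WEEKDAYS[rec["time"].weekday()], "temp_c": rec["temp_c"]}
        for rec in records
    ]


def analyse(rows):
    """Stand-in for the model/analysis step: the mean temperature."""
    return sum(row["temp_c"] for row in rows) / len(rows)


features = extract_features(preprocess(raw))
mean_temp = analyse(features)
print(features, mean_temp)
```

Each function corresponds to one arrow in the workflow, so a step can be replaced (for example, a real model instead of `analyse`) without touching the others.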

```{admonition} Features of Special Format
:class: dropdown
In other fields, “features” can look very different.
```

And just like in other fields, we still care about their properties, but now they’re stored in attribute tables.


> Take it easy: imagine you're the computer. For every item in a dataset, the things you store or know about it are called features.

Features can be:
- Built-in (directly collected from raw data)
- Extracted, filtered, or converted from original information (e.g., converting time to day of week, [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html))
- Learned through machine learning or deep learning models — though these learned features can sometimes be a bit mysterious 🤖✨

---

### What Are Features and What Are Fields in a Shapefile?
In this exercise, a **feature** specifically refers to a **geometry object** (such as a point, line, or polygon) representing a real-world location. Each feature typically consists of two parts:

- **Geometry**: Describes *where* it is, including spatial shape and location. It also comes with auxiliary information such as the Coordinate Reference System (CRS).
- **Attribute Fields**: Describe *what* it is. These are the additional properties or metadata stored in a table (e.g., name, type, status, ID).

> If you read or import shapefiles (or other spatial formats) in your program, you’ll notice that **all the information**, including geometry-related data, is stored as attributes. So in a way, everything becomes a **feature describing the data**, whether it’s where something is (geometry) or what it is (attributes). From the program’s perspective, it’s just a structured collection of information.
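
To make the "geometry plus attributes" idea concrete, here is a minimal sketch in plain Python. It uses a GeoJSON-like structure for readability, which is an assumption for illustration and not how a shapefile is stored on disk; the layer, station names, and coordinates are all hypothetical.

```python
# A hypothetical point layer in a GeoJSON-like structure.
layer = {
    "crs": "EPSG:4326",  # assumed: one CRS shared by the whole layer
    "features": [
        {
            "geometry": {"type": "Point", "coordinates": (13.405, 52.52)},
            "properties": {"id": 1, "name": "Station A", "status": "open"},
        },
        {
            "geometry": {"type": "Point", "coordinates": (11.582, 48.135)},
            "properties": {"id": 2, "name": "Station B", "status": "closed"},
        },
    ],
}


def to_rows(layer):
    """Flatten features into table rows: attribute fields plus a geometry column."""
    return [
        {**feature["properties"], "geometry": feature["geometry"]}
        for feature in layer["features"]
    ]


rows = to_rows(layer)

# Selecting features by attribute is then a simple filter on the rows.
open_names = [row["name"] for row in rows if row["status"] == "open"]
print(open_names)
```

After flattening, the geometry sits next to the attribute fields as one more column, which is the "everything is stored as attributes" view described above.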
```{admonition} ⚠️ Reminders
Always check the **CRS (coordinate reference system)** when working with multiple layers or datasets; a mismatched CRS is a common source of confusion!
```

### What Makes Geoinformation Special?

In the lecture, we mentioned three key points that make geospatial data different from other types of data:

  1. **Spatial and Temporal Dependence.** Example: traffic congestion in one district affects neighboring areas; deforestation in one region impacts climate patterns elsewhere.

  2. **Spatial Heterogeneity.** Example: rainfall patterns or land use can differ dramatically from one region to another.

  3. **Place-Sensitive User Interest.** Example: local air quality reports are much more relevant to nearby residents.

```{admonition} 🔬 Dive more
```


Task

Descriptions

Detailed instructions are in {download}`Lesson 1 <../doc/Lesson 1.docx>`.

You can also click here to take a look.

Data

Overview

Advanced Task

```{admonition} 🔬 Coding Hints
```


Materials