Getting Started with SQL
A comprehensive guide on the basics of SQL and learning resources to become a better SQL programmer.
Introduction to SQL
Structured Query Language (SQL) is the standard language used to interact with relational databases. It allows users to create, manipulate, and query data stored in a relational database management system (RDBMS). SQL has become the backbone of many software applications, enabling efficient data handling, from small-scale personal databases to large-scale enterprise systems.
SQL is used to:
- Create and modify database structures such as tables, indexes, and schemas
- Insert, update, delete, and retrieve data
- Query large databases efficiently to extract specific data, which is one of SQL’s strongest features
- Manage database permissions so that only authorized users can access or modify data
Overview of SQL’s Different Categories
1. Data Definition Language (DDL)
DDL focuses on defining and managing a database’s structure: creating, altering, and deleting tables, indexes, views, and other database objects. Unlike Data Manipulation Language (DML) commands, which deal with the data itself, DDL commands are concerned with metadata, that is, the database’s structure and design. DDL is used when you create, modify, or delete a table, as well as to define relationships between tables. In many database systems (MySQL and Oracle, for example) DDL statements are committed automatically and cannot be rolled back; others, such as PostgreSQL, allow most DDL statements to run inside a transaction, so check how your database behaves.
- The CREATE command is used to create new database objects such as tables, indexes, and views. It defines the structure of these objects and the relationships between them.
- The ALTER command is used to modify an existing database object. You can use it to add, delete, or modify columns in a table, or to change the properties of a database object.
- The DROP command is used to delete an existing database object. When you drop a table, all the data within it is also deleted, so it should be used with caution.
- The TRUNCATE command removes all rows from a table, effectively resetting it to its empty state, but it does not delete the table structure. This command is often faster than the DELETE command because it doesn’t generate individual row delete operations.
Examples of Each Command in a Simple Database Schema
To better understand how these DDL commands work in practice, let’s consider a simple database schema for a company that tracks employees and departments.
Creating a Database Schema
CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,
    DepartmentName VARCHAR(100)
);

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    HireDate DATE,
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
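To try this schema without installing a database server, one option is Python's built-in sqlite3 module with an in-memory SQLite database. The following is a minimal sketch under that assumption; the sample rows (an Engineering department, John Doe) are made up for illustration, and SQLite only enforces foreign keys once the PRAGMA shown is enabled. It illustrates how the foreign key rejects an employee whose department does not exist.

```python
import sqlite3

# In-memory SQLite database; an assumption made only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,
    DepartmentName VARCHAR(100)
);
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    HireDate DATE,
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
""")

# Hypothetical sample data: one department and one employee assigned to it.
conn.execute("INSERT INTO Departments VALUES (1, 'Engineering')")
conn.execute("INSERT INTO Employees VALUES (1, 'John', 'Doe', '2023-06-01', 1)")

# Department 99 does not exist, so the foreign key should reject this row.
try:
    conn.execute("INSERT INTO Employees VALUES (2, 'Jane', 'Smith', '2024-03-10', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(fk_rejected, employee_count)
```

Other database systems enforce foreign keys by default, so the PRAGMA line is specific to SQLite.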
In this schema, we have two tables: Departments and Employees. The Departments table contains information about different departments, and the Employees table contains employee details. The DepartmentID in the Employees table is a foreign key that references the DepartmentID in the Departments table, establishing a relationship between the two tables.
Altering the Schema
ALTER TABLE Employees
ADD Email VARCHAR(100);
After creating the Employees table, we realized that we also need to store the employees’ email addresses. The ALTER command allows us to add the Email column to the existing Employees table.
Dropping a Table
DROP TABLE Departments;
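If the company decides to restructure and no longer needs the Departments table, the DROP command can be used to remove it from the database entirely. Because the Employees table holds a foreign key that references Departments, most database systems will reject this statement until that constraint, or the Employees table itself, has been dropped.
Truncating a Table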
TRUNCATE TABLE Employees;
If the company wants to retain the Employees table structure but clear out all the current employee records (perhaps to prepare for a new data import), the TRUNCATE command is ideal.
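The DDL steps above can be walked through end to end with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, which differs from other systems in two ways worth flagging: SQLite has no TRUNCATE statement, so a DELETE without a WHERE clause stands in for it here, and the ALTER statement is written with the optional COLUMN keyword. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn on foreign key checks
conn.executescript("""
CREATE TABLE Departments (DepartmentID INT PRIMARY KEY, DepartmentName VARCHAR(100));
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    HireDate DATE,
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
INSERT INTO Departments VALUES (1, 'Engineering');
INSERT INTO Employees VALUES (1, 'John', 'Doe', '2023-06-01', 1);
""")

# ALTER: add the Email column, then list the table's column names.
conn.execute("ALTER TABLE Employees ADD COLUMN Email VARCHAR(100)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employees)")]

# SQLite has no TRUNCATE; DELETE without WHERE empties the table but keeps its structure.
conn.execute("DELETE FROM Employees")
conn.commit()
remaining_rows = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

# DROP: remove the referencing table first, then the table it references.
conn.execute("DROP TABLE Employees")
conn.execute("DROP TABLE Departments")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns, remaining_rows, tables)
```

Dropping Employees before Departments sidesteps the foreign key dependency between the two tables.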
2. Data Manipulation Language (DML)
DML focuses on managing and manipulating the data stored in a database’s tables. While Data Definition Language (DDL) is concerned with the database’s structure, Data Manipulation Language (DML) works directly with the data itself, allowing you to insert new entries, update existing data, and delete records that are no longer required. DML commands are typically executed within a transaction, so their changes can be rolled back until the transaction is committed. Note that many database clients run in autocommit mode by default, in which case each statement is committed immediately unless you start a transaction explicitly.
- The INSERT command is used to add new records (rows) to a database table. Each new row is inserted as a separate entry, with the data specified for each column in the table.
- The UPDATE command is used to modify existing records in a database table. It allows you to change the values of one or more columns for rows that match a specific condition.
- The DELETE command is used to remove one or more rows from a database table. The rows to be deleted are identified by a condition specified in the WHERE clause.
Examples of Each Command in a Simple Database Schema
To better understand how DML commands work, let’s continue using our simple database schema from the DDL section, focusing on the Employees table.
Inserting Data into the Table
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate, DepartmentID, Email)
VALUES (2, 'Jane', 'Smith', '2024-03-10', 1, 'jane.smith@example.com');
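This INSERT command adds a new record for Jane Smith to the Employees table, capturing her personal details and associating her with a department. Because DepartmentID is a foreign key, a department with DepartmentID 1 must already exist in the Departments table.
Updating Existing Data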
UPDATE Employees
SET DepartmentID = 3
WHERE EmployeeID = 2;
After Jane Smith transfers to a different department, this UPDATE command changes her DepartmentID to 3 in the Employees table, reflecting her new department.
Deleting Data from the Table
DELETE FROM Employees
WHERE EmployeeID = 2;
If Jane Smith leaves the company, this DELETE command removes her record from the Employees table, ensuring that her data is no longer stored in the system.
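The three DML commands can be run in sequence to see their effect. The sketch below assumes Python's built-in sqlite3 module with an in-memory database and, to stay short, a simplified Employees table without the foreign key. The ? placeholders are a feature of the Python driver rather than of SQL itself; they keep values separate from the statement text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified table for this sketch: same columns as above, foreign key omitted.
conn.execute("""
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    HireDate DATE,
    DepartmentID INT,
    Email VARCHAR(100)
)
""")

# INSERT: add Jane Smith's record using placeholders for the values.
conn.execute(
    "INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate, DepartmentID, Email) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (2, "Jane", "Smith", "2024-03-10", 1, "jane.smith@example.com"),
)

# UPDATE: rowcount reports how many rows the statement changed.
updated = conn.execute("UPDATE Employees SET DepartmentID = 3 WHERE EmployeeID = 2").rowcount
department_after_update = conn.execute(
    "SELECT DepartmentID FROM Employees WHERE EmployeeID = 2"
).fetchone()[0]

# DELETE: without the WHERE clause this would remove every row in the table.
deleted = conn.execute("DELETE FROM Employees WHERE EmployeeID = 2").rowcount
conn.commit()

rows_left = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(updated, department_after_update, deleted, rows_left)
```

Checking the affected row count after an UPDATE or DELETE is a cheap way to confirm that the WHERE condition matched what you expected.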
3. Data Query Language (DQL)
DQL is used to query and retrieve information from a database. DML modifies data, whereas DQL selects and displays data contained in database tables. The SELECT command is the foundation of DQL, and knowing it well is essential for efficient data querying. Unlike other SQL commands, DQL commands are typically read-only, which means they do not modify the data but rather obtain it for viewing or subsequent processing.
Examples of the SELECT Command in a Simple Database Schema
To demonstrate how the SELECT command works in practice, let’s continue using our simple database schema with the Employees and Departments tables.
Basic Data Retrieval
SELECT EmployeeID, FirstName, LastName FROM Employees;
This basic SELECT statement fetches the EmployeeID, FirstName, and LastName columns from the Employees table, showing a list of all employees.
Filtering Data with WHERE
SELECT FirstName, LastName FROM Employees
WHERE HireDate > '2023-01-01';
This query retrieves the first and last names of employees hired after January 1, 2023, filtering the results based on the hire date.
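A quick way to see the filter at work is to run it against a few rows. This sketch assumes Python's built-in sqlite3 module and three hypothetical employees, one of them hired exactly on January 1, 2023, to show that the strict > comparison excludes the boundary date. SQLite stores these dates as ISO-8601 text, which compares correctly in this format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50), HireDate DATE)"
)

# Hypothetical sample rows, including one hired exactly on the cutoff date.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "2022-05-15"),
        (2, "Jane", "Smith", "2024-03-10"),
        (3, "Ana", "Lopez", "2023-01-01"),
    ],
)

# Same filter as above, with the cutoff passed as a parameter.
recent_hires = conn.execute(
    "SELECT FirstName, LastName FROM Employees WHERE HireDate > ?",
    ("2023-01-01",),
).fetchall()
print(recent_hires)
```

To include employees hired on the cutoff date itself, the condition would use >= instead of >.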
Using Aggregate Functions
SELECT AVG(Salary) AS AverageSalary FROM Employees;
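This query calculates the average salary of all employees in the Employees table, providing a single summary value labeled AverageSalary. Note that it assumes the Employees table has a Salary column, which the schema created earlier does not define; it could be added with ALTER TABLE.
Joining Tables to Retrieve Related Data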
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
This query joins the Employees and Departments tables to display each employee’s first name, last name, and the name of their department.
Sorting Results with ORDER BY
SELECT FirstName, LastName FROM Employees
ORDER BY HireDate DESC;
This query retrieves the first and last names of employees, sorting the results by HireDate in descending order, so the most recently hired employees appear first.
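The aggregate, join, and sorting queries can also be combined in one runnable sketch. It assumes Python's built-in sqlite3 module, two hypothetical departments and employees, and a Salary column that is not part of the schema defined earlier and is added here only so that AVG has something to work on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (DepartmentID INT PRIMARY KEY, DepartmentName VARCHAR(100));
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    HireDate DATE,
    DepartmentID INT,
    Salary DECIMAL(10, 2),  -- assumed column; not in the schema defined earlier
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
INSERT INTO Departments VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO Employees VALUES
    (1, 'John', 'Doe',   '2022-05-15', 1, 50000),
    (2, 'Jane', 'Smith', '2024-03-10', 2, 70000);
""")

# Aggregate: a single summary value across all rows.
average_salary = conn.execute(
    "SELECT AVG(Salary) AS AverageSalary FROM Employees"
).fetchone()[0]

# Join plus sort: each employee with their department, most recent hire first.
joined = conn.execute("""
    SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
    FROM Employees
    INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID
    ORDER BY Employees.HireDate DESC
""").fetchall()
print(average_salary, joined)
```

An INNER JOIN only returns employees whose DepartmentID matches a row in Departments; employees without a department would need a LEFT JOIN to appear in the result.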
4. Data Control Language (DCL) and Transaction Control Language (TCL)
Data Control Language (DCL) and Transaction Control Language (TCL) are two important subsets of SQL that focus on managing database access and controlling transactions. While DCL deals with permissions and security, TCL ensures the integrity and consistency of database transactions. The main DCL commands are GRANT and REVOKE, while the main TCL commands are COMMIT, ROLLBACK, and SAVEPOINT.
- The GRANT command is used to assign permissions to users or roles, allowing them to perform specific actions on database objects such as tables, views, or procedures.
- The REVOKE command is used to remove previously granted permissions from a user or role, thereby restricting their access to certain database objects.
- The COMMIT command is used to save all the changes made during the current transaction to the database. Once a transaction is committed, the changes become permanent and visible to other users.
- The ROLLBACK command undoes all changes made during the current transaction, reverting the database to its state before the transaction began. It is useful for discarding unwanted changes or recovering from errors.
- The SAVEPOINT command is used to set a point within a transaction to which you can later roll back. This allows for partial rollback within a transaction without undoing the entire transaction.
Examples of DCL & TCL Commands in Practice
Granting Permissions
GRANT UPDATE ON Employees TO hr_manager;
This command grants the hr_manager role the ability to update records in the Employees table, allowing HR managers to make changes to employee data.
Revoking Permissions
REVOKE SELECT ON Employees FROM temp_user;
If temp_user should no longer have read access to the Employees table, this command revokes their SELECT permission.
Using COMMIT in a Transaction
BEGIN TRANSACTION;
UPDATE Employees SET Salary = Salary * 1.05 WHERE DepartmentID = 1;
COMMIT;
This sequence starts a transaction, updates the salaries of all employees in department 1 by increasing them by 5%, and then commits the changes, making them permanent.
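The same sequence can be tried out with a short sketch. It assumes Python's built-in sqlite3 module, a cut-down Employees table with an assumed Salary column, and three hypothetical rows. Passing isolation_level=None tells the Python driver not to open transactions on its own, so the explicit BEGIN TRANSACTION and COMMIT statements are the only transaction control in play.

```python
import sqlite3

# isolation_level=None: the driver leaves transaction control to the statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, DepartmentID INT, Salary REAL)")

# Hypothetical rows: two employees in department 1, one in department 2.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 1, 1000.0), (2, 1, 2000.0), (3, 2, 3000.0)],
)

conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE Employees SET Salary = Salary * 1.05 WHERE DepartmentID = 1")
open_before_commit = conn.in_transaction  # a transaction is open at this point
conn.execute("COMMIT")
open_after_commit = conn.in_transaction   # the transaction has ended

salaries = [row[0] for row in conn.execute("SELECT Salary FROM Employees ORDER BY EmployeeID")]
print(open_before_commit, open_after_commit, salaries)
```

Only the two department 1 salaries change; the department 2 row is untouched because it does not match the WHERE condition. The exact keyword for starting a transaction varies between database systems.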
Using ROLLBACK to Undo Changes
BEGIN TRANSACTION;
DELETE FROM Employees WHERE EmployeeID = 10;
ROLLBACK;
After starting a transaction and deleting an employee record with EmployeeID = 10, the ROLLBACK command is issued, undoing the deletion and restoring the record.
Using SAVEPOINT
BEGIN TRANSACTION;
UPDATE Employees SET Salary = Salary * 1.10 WHERE EmployeeID = 5;
SAVEPOINT after_first_update;
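In this transaction, after updating the salary for EmployeeID = 5, a savepoint is created. Any statements that follow can later be undone by rolling back to after_first_update, without losing the salary update made before it. The transaction stays open until it is committed or rolled back.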
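To show what the savepoint makes possible, the sketch below continues the transaction with a second change, rolls back to the savepoint, and then commits. It assumes Python's built-in sqlite3 module, a cut-down Employees table with an assumed Salary column, and two hypothetical rows; the DELETE of employee 6 is an invented follow-up step, not part of the original example.

```python
import sqlite3

# isolation_level=None: the driver leaves transaction control to the statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, Salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(5, 1000.0), (6, 2000.0)])

conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE Employees SET Salary = Salary * 1.10 WHERE EmployeeID = 5")
conn.execute("SAVEPOINT after_first_update")

# Hypothetical second change that we then decide to discard.
conn.execute("DELETE FROM Employees WHERE EmployeeID = 6")

# Undo only the work done after the savepoint; the salary update survives.
conn.execute("ROLLBACK TO SAVEPOINT after_first_update")
conn.execute("COMMIT")

rows = conn.execute("SELECT EmployeeID, Salary FROM Employees ORDER BY EmployeeID").fetchall()
print(rows)
```

After the commit, employee 6 is still present and employee 5 keeps the raised salary, which is the partial rollback that a plain ROLLBACK cannot provide.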
Why Mastering SQL is Important
SQL serves as the primary tool for interacting with relational databases. By understanding SQL deeply, you gain the ability to write optimized queries that can handle complex data operations, improve database performance, and ensure data accuracy and integrity. As a data engineer, mastering SQL empowers you to build robust data pipelines, automate data processing tasks, and integrate different data sources. Ultimately, SQL proficiency enhances your ability to contribute to data-intensive projects, making you a more valuable asset in any data-driven organization.
Summary
This blog does not cover everything there is to know about SQL. It is meant as an overview before you dive into more advanced SQL commands. SQL is a broad language with many commands and features to discover. Mastering it requires constant practice, which you can get by working through online lessons and solving SQL problems. In the section below, I’ll attach some learning resources to help you on your journey. Engaging with these problems will not only solidify your understanding but also make you a more proficient SQL programmer.