Getting Started with SQL
A comprehensive guide on the basics of SQL and learning resources to become a better SQL programmer.
Introduction to SQL
Structured Query Language (SQL) is the standard language used to interact with databases. It allows users to create, manipulate, and query data stored in a RDBMS. SQL has become the backbone of many software applications, enabling efficient data handling, from small-scale personal databases to large-scale enterprise systems.
Some functions of SQL include:
- Creation and modification of database structures like tables, indexes, and schemas
- Used to insert, update, delete, and retrieve data from a database
- SQL’s query capabilities are one of its strongest features, allowing users to extract specific data from large databases efficiently
- Allows you to manage database permissions, ensuring that only authorized users can access or modify data
Overview of SQL’s Different Categories
1. Data Definition Language (DDL)
DDL focuses on defining and managing a database’s structure. This covers building, editing, and deleting tables, indexes, views, and other database structures. Unlike Data Manipulation Language (DML) commands, which deal with real data, DDL instructions are concerned with metadata—the database’s structure and design. DDL is used when you create, modify, or delete a table, as well as to define relationships between tables. These commands are executed as standalone statements and do not require committing because DDL commands make modifications that are automatically committed.
The
CREATEcommand is used to create new database objects such as tables, indexes, and views. It defines the structure and the relationships between these objects.The
ALTERcommand is used to modify an existing database object. You can use it to add, delete, or modify columns in a table, or to change the properties of a database object.The
DROPcommand is used to delete an existing database object. When you drop a table, all the data within it is also deleted, so it should be used with caution.The
TRUNCATEcommand removes all rows from a table, effectively resetting it to its empty state, but it does not delete the table structure. This command is often faster than using theDELETEcommand because it doesn’t generate individual row delete operations.
Examples of Each Command in a Simple Database Schema
To better understand how these DDL commands work in practice, let’s consider a simple database schema for a company that tracks employees and departments.
Creating a Database Schema
1 2 3 4 5 6 7 8 9 10 11 12 13
CREATE TABLE Departments ( DepartmentID INT PRIMARY KEY, DepartmentName VARCHAR(100) ); CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50), HireDate DATE, DepartmentID INT, FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID) );
In this schema, we have two tables:
DepartmentsandEmployees. TheDepartmentstable contains information about different departments, and theEmployeestable contains employee details. TheDepartmentIDin theEmployeestable is a foreign key that references theDepartmentIDin theDepartmentstable, establishing a relationship between the two tables.Altering the Schema
1 2
ALTER TABLE Employees ADD Email VARCHAR(100);
After creating the
Employeestable, we realized that we also need to store the employees’ email addresses. TheALTERcommand allows us to add theEmailcolumn to the existingEmployeestable.Dropping a Table
1
DROP TABLE Departments;
If the company decides to restructure and no longer needs the
Departmentstable, theDROPcommand can be used to remove it from the database entirely.Truncating a Table
1
TRUNCATE TABLE Employees;
If the company wants to retain the
Employeestable structure but clear out all the current employee records (perhaps to prepare for a new data import), theTRUNCATEcommand is ideal.
2. Data Manipulation Language (DML)
DDL focuses on managing and manipulating the data stored in a database’s tables. While Data Definition Language (DDL) is concerned with the database’s structure, Data Management Language (DML) works directly with the data itself, allowing you to insert new entries, update current data, and delete records that are no longer required. Unlike DDL commands, DML commands do not automatically commit changes to the database; instead, they are typically conducted within a transaction, allowing changes to be reversed if necessary.
The
INSERTcommand is used to add new records (rows) to a database table. Each new row is inserted as a separate entry, with the data specified for each column in the table.The
UPDATEcommand is used to modify existing records in a database table. It allows you to change the values of one or more columns for rows that match a specific condition.The
DELETEcommand is used to remove one or more rows from a database table. The rows to be deleted are identified by a condition specified in the WHERE clause.
Examples of Each Command in a Simple Database Schema
To better understand how DML commands work, let’s continue using our simple database schema from the DDL section, focusing on the Employees table.
Inserting Data into the Table
1 2
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate, DepartmentID, Email) VALUES (2, 'Jane', 'Smith', '2024-03-10', 1, 'jane.smith@example.com');
This
INSERTcommand adds a new record for Jane Smith into theEmployeestable, capturing her personal details and associating her with a department.Update Existing Data
1 2 3
UPDATE Employees SET DepartmentID = 3 WHERE EmployeeID = 2;
After Jane Smith transfers to a different department, this
UPDATEcommand changes herDepartmentIDto 3 in theEmployeestable, reflecting her new department.Deleting Data from the Table
1 2
DELETE FROM Employees WHERE EmployeeID = 2;
If Jane Smith leaves the company, this
DELETEcommand removes her record from theEmployeestable, ensuring that her data is no longer stored in the system.
3. Data Query Language (DQL)
DQL is used to query and retrieve information from a database. DML modifies data, whereas DQL selects and displays data contained in database tables. The SELECT command is the foundation of DQL and knowing it is required for efficient data querying. Unlike other SQL queries, DQL commands are typically read-only, which means they do not modify the data but rather obtain it for viewing or subsequent processing.
Examples of SELECT Command in a Simple Database Schema
To demonstrate how the SELECT command works in practice, let’s continue using our simple database schema with the Employees and Departments tables.
Basic Data Retrieval
1
SELECT EmployeeID, FirstName, LastName FROM Employees;
This basic
SELECTstatement fetches theEmployeeID,FirstName, andLastNamecolumns from theEmployeestable, showing a list of all employees.Filtering Data with
WHERE1 2
SELECT FirstName, LastName FROM Employees WHERE HireDate > '2023-01-01';
This query retrieves the first and last names of employees hired after January 1, 2023, filtering the results based on the hire date.
Using Aggregate Functions
1
SELECT AVG(Salary) AS AverageSalary FROM Employees;
This query calculates the average salary of all employees in the Employees table, providing a single summary value labeled as
AverageSalary.Joining Tables to Retrieve Related Data
1 2 3
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName FROM Employees INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
This query joins the
EmployeesandDepartmentstables to display each employee’s first name, last name, and the name of their department.Sorting Results with
ORDER BY1 2
SELECT FirstName, LastName FROM Employees ORDER BY HireDate DESC;
This query retrieves the first and last names of employees, sorting the results by
HireDatein descending order, so the most recently hired employees appear first.
4. Data Control Language (DCL) and Transaction Control Language (TCL)
Data Control Language (DCL) and Transaction Control Language (TCL) are two important subsets of SQL that focus on managing database access and controlling transactions. While DCL deals with permissions and security, TCL ensures the integrity and consistency of database transactions. The main DCL commands are GRANT and REVOKE while main commands of TCL are COMMIT, ROLLBACK and SAVEPOINT.
- The
GRANTcommand is used to assign permissions to users or roles, allowing them to perform specific actions on database objects such as tables, views, or procedures. - The
REVOKEcommand is used to remove previously granted permissions from a user or role, thereby restricting their access to certain database objects. - The
COMMITcommand is used to save all the changes made during the current transaction to the database. Once a transaction is committed, the changes become permanent and visible to other users. - The
ROLLBACKcommand undoes all changes made during the current transaction, reverting the database to its state before the transaction began. It is useful for discarding unwanted changes or recovering from errors. - The
SAVEPOINTcommand is used to set a point within a transaction to which you can later roll back. This allows for partial rollback within a transaction without undoing the entire transaction.
Examples of DCL & TCL Commands in Practice
Granting Permissions
1
GRANT UPDATE ON Employees TO hr_manager;
This command grants the
hr_managerrole the ability to update records in theEmployeestable, allowing HR managers to make changes to employee data.Revoking Permissions
1
REVOKE SELECT ON Employees FROM temp_user;
If
temp_usershould no longer have read access to theEmployeestable, this command revokes theirSELECTpermission.Using
COMMITin a Transaction1 2 3
BEGIN TRANSACTION; UPDATE Employees SET Salary = Salary * 1.05 WHERE DepartmentID = 1; COMMIT;
This sequence starts a transaction, updates the salaries of all employees in department 1 by increasing them by 5%, and then commits the changes, making them permanent.
Using
ROLLBACKto undo changes1 2 3
BEGIN TRANSACTION; DELETE FROM Employees WHERE EmployeeID = 10; ROLLBACK;
After starting a transaction and deleting an employee record with
EmployeeID= 10, theROLLBACKcommand is issued, undoing the deletion and restoring the record.Using
SAVEPOINT1 2 3
BEGIN TRANSACTION; UPDATE Employees SET Salary = Salary * 1.10 WHERE EmployeeID = 5; SAVEPOINT after_first_update;
In this transaction, after updating the salary for
EmployeeID= 5, a savepoint is created.
Why Mastering SQL is Important
SQL serves as the primary tool for interacting with relational databases. By understanding SQL deeply, you gain the ability to write optimized queries that can handle complex data operations, improve database performance, and ensure data accuracy and integrity. As a data engineer, mastering SQL empowers you to build robust data pipelines, automate data processing tasks, and integrate different data sources. Ultimately, SQL proficiency enhances your ability to contribute to data-intensive projects, making you a more valuable asset in any data-driven organization.
Summary
Certainly, this blog does not cover all there is to know about SQL. It was created to provide you with an overview before diving into more advanced SQL commands. SQL is a broad language with several commands and procedures to discover. Mastering SQL requires constant practice, which may be accomplished by working through online lessons and solving SQL problems. In teh below section, I’ll attach some learning resources to help you on your journey. Engaging with these problems will not only solidify your understanding but also make you a more proficient SQL programmer.