The Lahman baseball database contains multiple tables on MLB players and their batting, pitching, and fielding statistics, going back as far as records exist. It's a fun and easy dataset for learning how to write queries. Best of all, I found a copy already packaged for SQL Server that will have you up and running in minutes. There's plenty of information online about querying the Lahman data, and there are courses on edX specifically on sabermetrics and on querying the Lahman data in SQL or R.
1) Go to https://www.sqlskills.com/sql-server-resources/sql-server-demos/
2) Download the baseball stats sample database, baseball_db.zip, and unzip.
3) Start SQL Server Management Studio, right-click 'Databases' and select 'Restore Database'.
4) On the General tab, select 'Device' and click the '...' button.
5) Click 'Add' and navigate to the downloaded BaseballData.bak.
6) Select 'OK'.
7) The backup will be restored as a database on your server.
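If you prefer a script to the dialog, the same restore can be sketched in T-SQL. This is a minimal sketch, not the method described on the download page: the C:\Temp path is a hypothetical location for the unzipped .bak file, and the restore may need a WITH MOVE clause if the folders recorded in the backup do not exist on your machine.

```sql
-- Hypothetical path: change it to wherever you unzipped BaseballData.bak.
-- The SQL Server service account must be able to read this folder.

-- List the data and log files stored in the backup
-- (logical names and the paths they were backed up from).
RESTORE FILELISTONLY
FROM DISK = N'C:\Temp\BaseballData.bak';

-- Restore the backup as a database named BaseballData,
-- the name the verification query expects.
-- Assumption: the original file paths reported by FILELISTONLY are valid
-- on this server; if not, add WITH MOVE '<logical name>' TO '<new path>'
-- for each file listed.
RESTORE DATABASE BaseballData
FROM DISK = N'C:\Temp\BaseballData.bak'
WITH RECOVERY;
```

Running RESTORE FILELISTONLY first costs nothing and shows whether the file locations need to be changed before the actual restore.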
Verification
1) Run the query:
USE BaseballData;
SELECT b.*
FROM dbo.players p
INNER JOIN dbo.batting b ON p.playerid = b.playerID
WHERE p.namelast = 'Utley' AND p.namefirst = 'Chase'
ORDER BY b.yearID ASC;
2) Results should have batting information.
3) Search Google for: chase utley baseball statistics
4) Compare specific numbers in the fields against those results to validate that the data aligns with standard sources. Always be skeptical of your data and independently validate it when possible.
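Career totals are often easier to compare against a reference site than individual season rows. The query below is a sketch that reuses the join from step 1. The column names G, AB, H and HR follow the standard Lahman batting table and are an assumption about this particular copy of the database, so adjust them if yours differ.

```sql
USE BaseballData;

-- Sum a few counting stats over every batting row for the player.
-- Assumption: G, AB, H and HR exist in dbo.batting under these names
-- and are stored as numeric columns.
SELECT COUNT(*)      AS batting_rows,
       MIN(b.yearID) AS first_year,
       MAX(b.yearID) AS last_year,
       SUM(b.G)      AS games,
       SUM(b.AB)     AS at_bats,
       SUM(b.H)      AS hits,
       SUM(b.HR)     AS home_runs
FROM dbo.players AS p
INNER JOIN dbo.batting AS b ON p.playerid = b.playerID
WHERE p.namelast = 'Utley' AND p.namefirst = 'Chase';
```

If the totals match a trusted source but a single season does not, check whether the player changed teams that year, since the standard Lahman layout stores one batting row per team stint.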