Datasets:
text stringlengths 45 702k | id stringlengths 47 47 | url stringlengths 14 2.3k | dataset stringclasses 1 value |
|---|---|---|---|
Welcome to Gaia! ::
Black Death Goddess's avatar
Interesting Consumer
16,350 Points
• Trick or Treat 100
• Battle: Cleric 100
• Battle: Level Up 200
Sooooo, no answer on the Halloween items in the gold shops then?
Daranigan's avatar
Dangerous Hunter
12,850 Points
• Millionaire 200
• Ultimate Player 200
... | <urn:uuid:731a22a3-8429-4a65-a08d-262668edfc6f> | http://www.gaiaonline.com/forum/ask-the-admin-archives/ask-the-admin-10-01-2012/t.82583617_541/ | mlfoundations/dclm-baseline-1.0-parquet |
CSN Houston, Bankruptcy and Why the Rockets Aren't Just Innocent Victims
Don't expect the Rockets or Astros on your TV anytime soon
Odds are that if you've flown an airline in the past decade, you've flown on an airline going through Chapter 11 bankruptcy. This is the popular bankruptcy, the one that lets the airline ... | <urn:uuid:8f8f0167-f1a8-4338-9427-550dd02be0b3> | http://blogs.houstonpress.com/hairballs/2014/02/csn_houston_rockets_bankruptcy.php | mlfoundations/dclm-baseline-1.0-parquet |
MUNCIE, Ind. -- Miami rallied from a double-point loss to take four singles victories, giving the RedHawks the 2013 MAC Tournament Championship, 4-3 over Bowling Green. The title is the third in the last five years for Miami, which had dropped 4-3 decisions in the finals in each of the last two championships.
The win ... | <urn:uuid:5afc9f17-3ac8-41b8-a8b5-f6995e6ed115> | http://www.muredhawks.com/ViewArticle.dbml?SPID=87606&DB_LANG=C&DB_OEM_ID=26100&ATCLID=207462862 | mlfoundations/dclm-baseline-1.0-parquet |
A Stall Team (Peaked at #4 on UU Leaderboard)
Discussion in 'Past Gen Teams' started by Jubilee, Jul 30, 2010.
1. Jubilee
is a Contributor Alumnus
Jun 20, 2009
I began a few months ago to really get into the UU tier. It was just way more fun to me than the boring OU with every team being the same... ... | <urn:uuid:a833fdfc-e71d-44e8-b849-926dfcbec111> | http://www.smogon.com/forums/threads/a-stall-team-peaked-at-4-on-uu-leaderboard.76069/ | mlfoundations/dclm-baseline-1.0-parquet |
18 year old, seeking advice!
• Hey guys!
I just signed up for this, because I feel like I need support! I have recently lost a lot of weight. I took a 'break' week in which I ate whatever I wanted and didn't worry about working out. I just weighed myself and I gained ten pounds, in 8 days. My mom is saying it's... | <urn:uuid:b96824e8-9b91-461b-a2db-efef8b46122b> | http://bodyforlife.com/community/boards/bfl/f/14/p/1098/13410.aspx | mlfoundations/dclm-baseline-1.0-parquet |
No Online Dating Site Can Match Up Your Inner Crazy
Online dating's said to be the future of relationships, now that we're all too busy to meet people in real life. But claims that websites can match you with your ideal partner using scientific algorithms are bull, according to a team of psychologists. Because not eve... | <urn:uuid:7567ca1d-5d91-4f1b-aee8-464ab8d6ed1e> | http://gizmodo.com/5882657/no-online-dating-site-can-match-up-your-inner-crazy | mlfoundations/dclm-baseline-1.0-parquet |
I've answered a Java question today. The answer was long and included a considerable amount of code. When I tried to post the answer, I got a cryptic error:
enter image description here
I've copied my text to an external editor, and tried posting parts using binary bisection. After some trial and error, the problem b... | <urn:uuid:eba79e9d-ca78-4b98-8271-3cc302d26676> | http://meta.stackoverflow.com/questions/168923/proper-error-message-for-lmgtfy-banning | mlfoundations/dclm-baseline-1.0-parquet |
What's involved in a car tune-up?
July 28, 2009 2:30 PM Subscribe
Car care: (1) How often, generally speaking, should I change my spark plugs and plug wires? (2) When a vehicle gets a "tune up," what, exactly, does this mean? What gets tuned? Thanks!
posted by jackypaper to Travel & Transportation (12 answers total)... | <urn:uuid:d9d5d107-488d-459f-801e-ae9c79eb48c5> | http://ask.metafilter.com/128643/Whats-involved-in-a-car-tuneup | mlfoundations/dclm-baseline-1.0-parquet |
Manip, although you are correct, this has nothing to do with why the String object is immutable. If I append a few characters to a StringBuilder, the same move/copy/free thing is done, but the StringBuilder class is not immutable.
Let's take a look at an example with a method that both the StringBuilder and String cla... | <urn:uuid:f9ca165f-c788-4a81-87a8-f4bb7f99a3f9> | http://channel9.msdn.com/Forums/TechOff/58729-Why-are-string-types-immutable-in-C/020d1b2a169d4c7891b49dea011daa7a | mlfoundations/dclm-baseline-1.0-parquet |
Take the 2-minute tour ×
I am currently creating my own framework in C++ (MSVS 2008) which exports a dll with a bunch of functions for a user of my framework to use/call. In the beginning, when my project was still small, all worked fine. I compiled my project. A MyFramework.dll and MyFramework.lib were spit out. I pr... | <urn:uuid:162923dd-622e-421b-a2ec-fe28d6947733> | http://stackoverflow.com/questions/1571409/creating-a-dll-which-links-to-another-dll-msvs2008-c | mlfoundations/dclm-baseline-1.0-parquet |
Documentation Center
• Trials
• Product Updates
Copy graphics objects and their descendants
new_handle = copyobj(h,p)
copyobj creates copies of graphics objects. The copies are identical to the original objects except the copies have different values for their Parent property and a new handle. The new parent... | <urn:uuid:40fe98c0-3907-4ff6-acf2-faf16dda6684> | http://www.mathworks.it/it/help/matlab/ref/copyobj.html?s_tid=gn_loc_drop&nocookie=true | mlfoundations/dclm-baseline-1.0-parquet |
The Nizkor Project: Remembering the Holocaust (Shoah)
Shofar FTP Archive File: orgs/american/freemen/duke-on-freemen
From Thu Jun 6 07:52:52 PDT 1996
Article: 41336 of alt.revisionism
From: (Rich Graves)
Newsgroups: alt.activism,alt.conspiracy,alt.politics.nationalism.white,alt.politics.white-power,,alt.revisionism,... | <urn:uuid:1a09d95f-d9d6-4788-bd20-c1e6ff31d749> | http://www.nizkor.org/ftp.cgi/orgs/ftp.py?orgs/american/freemen/duke-on-freemen | mlfoundations/dclm-baseline-1.0-parquet |
Browsing named entities in Rebellion Record: a Diary of American Events: Documents and Narratives, Volume 10. (ed. Frank Moore). You can also browse the collection for Vallandigham or search for Vallandigham in all documents.
Your search returned 11 results in 1 document section:
ffectual putting down of this rebelli... | <urn:uuid:9c48f613-4b45-4138-a531-e803bc22e457> | http://www.perseus.tufts.edu/hopper/nebrowser?id=vallandigham&query=Perseus:text:2001.05.0101 | mlfoundations/dclm-baseline-1.0-parquet |
Log in
Free trial
Article excerpt
Key Words: Al-rihla; Medieval Muslim Travelers (MMT); Hajj; Place and space; Positionality
The period between 750 and 1258 C.E. in Medieval Islamic history is characterized as the Golden Age of Muslim civilization during which four Islamic dynasties were established: the Umayyad... | <urn:uuid:a774b278-039b-43f1-99d0-ffee7c245ec9> | http://www.questia.com/library/journal/1P3-2895010521/knowledge-culture-and-positionality-analysis-of | mlfoundations/dclm-baseline-1.0-parquet |
How Can Los Angeles Adapt to Coming Climate Change?
Climate change can’t alter the blue skies or access to the beach and mountains, but it will pose four tangible threats: The summers will grow hotter, the air will be smoggier, there will be more fires, and there will be much less water
© / Janne Ahvo
Editor's Note:... | <urn:uuid:344bcced-68ae-45ac-8429-ef1aca847ed6> | http://www.scientificamerican.com/article/los-angeles-adapt-to-climate-change/ | mlfoundations/dclm-baseline-1.0-parquet |
Debian Weekly News - November 15th, 2005
Debian Weekly News
Debian Weekly News - November 15th, 2005
the Debian community. Members of the Debian-Edu sub-project have
[1]proposed codenames for the upcoming Skolelinux release such as
Terra, Tellus and Oslo. Adrian von Bidder was [2]looking for very old
Debian install... | <urn:uuid:8a159cc5-f72e-4663-ba68-416b98b8c983> | https://lists.debian.org/debian-news/2005/msg00053.html | mlfoundations/dclm-baseline-1.0-parquet |
Carmel en Reclaiming a Coastal Garden <!--paging_filter--><p class="MsoNormal" style="MARGIN: 0in 0in 0pt">The ocean pulls us to its edge with a primeval force. Toes in the sand, face caressed by sea breezes, we worship the beach as the symbol of idleness and renewal. But as a habitat, the beach is no picnic. The sand ... | <urn:uuid:c1517d30-aa31-4bed-99e8-10e92be14211> | http://www.gardendesign.com/tag/carmel/feed | mlfoundations/dclm-baseline-1.0-parquet |
One Ritz-Carlton Drive, Dana Point, California 92629 USA
All Girls Surf Getaway
Why should guys have all the fun? Surfing is one of the most amazing sports on the planet. The connection with nature, the lifestyle surrounding it and the positive impact on your physical, emotional and spiritual being is amazing. It is ... | <urn:uuid:95ad4bd1-e5fb-41fd-a6b8-966c416c6f9c> | http://www.ritzcarlton.com/en/Properties/LagunaNiguel/Reservations/Packages/Detail/all_girls_surf_getaway_day.htm?WBCMODE=PresentationUnpublishedDefault%2CPresentationUnpublishedDefault%2CPresentationUnpublishedDefault%2CPresentationUnpublishedDefaultDefault.htm | mlfoundations/dclm-baseline-1.0-parquet |
RSS Feeds
Romney ad advantage doesn't tell the whole story
Thursday - 10/18/2012, 12:16pm ET
Associated Press
NEW YORK (AP) - Independent groups working to elect Republican Mitt Romney have helped him match or even exceed President Barack Obama's TV ad spending in dozens of media markets in battleground states. Bu... | <urn:uuid:e0c2c772-8781-4fe8-8ea7-1cd95164a855> | http://www.wtop.com/278/3083301/Romney-ad-advantage-doesnt-tell-the-whole-story | mlfoundations/dclm-baseline-1.0-parquet |
El Goonish Shive – Delta
By brokenhero
Author's note: All characters except members of the Cross family owned and copyrighted by Dan Shive. I do not have Dan's permission to write this. I write it for my own enjoyment and the enjoyment of others. So enjoy!
Act One: "Introduction"
-Thursday night, 8:30 p.m.-
"Boys,... | <urn:uuid:7f2b7b23-f3a7-4cb3-9dad-9b145cf0ab1e> | https://www.fanfiction.net/s/4124836/1/El-Goonish-Shive-Delta | mlfoundations/dclm-baseline-1.0-parquet |
Research company x business model, game dev company
Job Description
1. A quick market analysis based on Porters five forces analysis.
2. A qiuck business model canvas, Not more than 1 A4.
3. I want to know, is there any money in this, can they earn money in the future? Do they have any impact on the market
I need... | <urn:uuid:fd5321fc-9778-4ea1-9873-93a416a82701> | https://www.odesk.com/o/jobs/job/_~016dfe57037205d189/ | mlfoundations/dclm-baseline-1.0-parquet |
Take the 2-minute tour ×
Warning : totally noob question...
I just started using ubuntu at home, and i love it, but there are some basic stuff that i don't know how to do and is annoying me...
When I install a package using sudo apt-get install ... I don't even know where the installed package is. For some packages ... | <urn:uuid:e6ba24e3-79f4-44ee-8012-c7d77d35c310> | http://askubuntu.com/questions/229705/finding-an-apt-get-installation?answertab=active | mlfoundations/dclm-baseline-1.0-parquet |
"Catfish: The TV Show"
Credit: MTV.com
"Catfish: The TV Show"
by Jordan Armstrong / KVUE.com
Bio | Email | Follow: @majordyrules
Posted on December 7, 2012 at 4:09 PM
Updated Friday, Jan 18 at 11:45 AM
Have you seen "Catfish: The TV Show"?
First let me ask you, have you seen “Catfish” the documentary? It was ... | <urn:uuid:5947d8da-44c2-4461-8edb-3e9471d27aef> | http://www.kvue.com/entertainment/have-you-seen/Catfish-The-TV-Show-182581161.html?ref=next | mlfoundations/dclm-baseline-1.0-parquet |
Friday 14 March 2014
Saudi delays execution of seven men
Executions jump in 2011, driven by ME: Amnesty
The number of executions carried out around the world jumped last year, largely due to a surge in use of the death penalty in Iran, Iraq and Saudi Arabia, Amnesty International said on Tuesday. The rights g... | <urn:uuid:14a119c7-7802-49e7-b354-96eafde9057f> | http://www.tradearabia.com/articles/tag/47887 | mlfoundations/dclm-baseline-1.0-parquet |
FinePDFs-Edu 50BT + DCLM 30BT + FineWeb-Edu 20BT
A ~100 billion token pretraining mixture combining three high-quality English data sources in a 50-30-20 ratio, using the educational subset of FinePDFs.
Part of the Smol-Data collection — tried and tested mixes for strong pretraining. Inspired by optimal dataset mixing.
Dataset Description
| Component | Source | Tokens |
|---|---|---|
| FinePDFs-Edu 100BT | FinePDFs-Edu | ~50B |
| DCLM 100BT | DCLM-Baseline 1.0 | ~30B |
| FineWeb-Edu 100BT | FineWeb-Edu | ~20B |
The schema is reduced to the intersection of columns across all three sources: text, id, url, and dataset.
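The column intersection described above can be sketched as follows. This is a minimal illustration, not the actual pipeline code: the per-source column sets, including the extra columns `page_count`, `language_score` and `edu_score`, are hypothetical placeholders, and only the four shared names (`text`, `id`, `url`, `dataset`) come from this card.

```python
# Hypothetical column sets per source; only text/id/url/dataset are taken
# from the dataset card, the remaining names are made up for illustration.
SOURCE_COLUMNS = {
    "finepdfs_edu": {"text", "id", "url", "dataset", "page_count"},
    "dclm": {"text", "id", "url", "dataset", "language_score"},
    "fineweb_edu": {"text", "id", "url", "dataset", "edu_score"},
}


def shared_columns(column_sets):
    """Return the columns present in every source."""
    sets = list(column_sets)
    return set.intersection(*sets) if sets else set()


def project(record, keep):
    """Drop every field of `record` that is not in `keep`."""
    return {key: value for key, value in record.items() if key in keep}


KEEP = shared_columns(SOURCE_COLUMNS.values())

# A made-up record carrying one source-specific field that gets dropped.
example = {
    "text": "hello",
    "id": "doc-0",
    "url": "http://example.com",
    "dataset": "dclm",
    "language_score": 0.97,
}
reduced = project(example, KEEP)
```

Projecting every record onto the shared set keeps the three components concatenable under a single schema, at the cost of discarding source-specific metadata.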
A pre-shuffled version is available at HuggingFaceFW/finepdfs_edu_50BT-dclm_30BT-fineweb_edu_20BT-shuffled.
How It Was Created
The dataset was generated using datatrove with the smol_data.py script. Each component was subsampled from its respective 100BT subset at the target fraction (0.5, 0.3, 0.2) using a SamplerFilter with seed 42. Components were written sequentially via Slurm job dependencies to avoid concurrent commits.
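The idea behind seeded rate-based subsampling can be sketched with the standard library alone. This is a stand-in for illustration under stated assumptions, not datatrove's `SamplerFilter` and not the `smol_data.py` script: the function name `subsample` and the integer "documents" are hypothetical, and the sketch assumes each document is kept independently with probability equal to the target fraction.

```python
import random


def subsample(docs, rate, seed=42):
    """Yield each document independently with probability `rate`.

    A fixed seed makes the selection reproducible across runs, as long as
    the documents arrive in the same order.
    """
    rng = random.Random(seed)
    for doc in docs:
        if rng.random() < rate:
            yield doc


# Hypothetical stand-in for one 100BT component: 10,000 numbered documents.
docs = list(range(10_000))

# Target fractions from the card: 0.5, 0.3 and 0.2.
kept_half = list(subsample(docs, rate=0.5))
kept_third = list(subsample(docs, rate=0.3))
kept_fifth = list(subsample(docs, rate=0.2))
```

Sampling by document rather than by token means the resulting token counts only approximate the targets, which is consistent with the "~" in the table above.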
Usage
from datasets import load_dataset
ds = load_dataset("HuggingFaceFW/finepdfs_edu_50BT-dclm_30BT-fineweb_edu_20BT", split="train", streaming=True)
for sample in ds:
    print(sample["text"][:200])
    break
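To check which sources a stream actually contains, the `dataset` column can be tallied over the first few samples. The helper below is a hypothetical sketch (`source_shares` is not part of the `datasets` library); it accepts any iterable of dict-like samples, so it can be tried on in-memory records before being pointed at the streamed dataset.

```python
from collections import Counter
from itertools import islice


def source_shares(samples, limit=1000):
    """Return the fraction of the first `limit` samples per `dataset` value."""
    counts = Counter(sample["dataset"] for sample in islice(samples, limit))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Made-up records standing in for streamed samples.
fake_stream = [
    {"text": "a", "dataset": "source-a"},
    {"text": "b", "dataset": "source-a"},
    {"text": "c", "dataset": "source-b"},
    {"text": "d", "dataset": "source-c"},
]
shares = source_shares(fake_stream)
```

Two caveats apply. These are document shares, not token shares, so they need not match the 50-30-20 token ratio. And because the components were written sequentially, the head of the unshuffled stream may come from a single source; the pre-shuffled version is likely the better target for this kind of check.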
Citation
@misc{niklaus2026smoldata,
title={SmolData},
author={Joel Niklaus and Hynek Kydl{\'\i}{\v{c}}ek},
year={2026},
publisher={Hugging Face},
journal={Hugging Face repository},
howpublished={\url{https://huggingface.co/collections/HuggingFaceFW/smol-data}}
}